Article

SIR-DCGAN: An Attention-Guided Robust Watermarking Method for Remote Sensing Image Protection Using Deep Convolutional Generative Adversarial Networks

School of Information Science and Technology, Shihezi University, Shihezi 832003, China
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(9), 1853; https://doi.org/10.3390/electronics14091853
Submission received: 15 March 2025 / Revised: 24 April 2025 / Accepted: 25 April 2025 / Published: 1 May 2025

Abstract

Ensuring the security of remote sensing images is essential to prevent unauthorized access, tampering, and misuse. Deep learning-based digital watermarking offers a promising solution by embedding imperceptible information to protect data integrity. This paper proposes SIR-DCGAN, an attention-guided robust watermarking method for remote sensing image protection. It incorporates an IR-FFM feature fusion module to enhance feature reuse across different layers and an SE-AM attention mechanism to emphasize critical watermark features. Additionally, a noise simulation sub-network is introduced to improve resistance against common and combined attacks. The proposed method achieves high imperceptibility and robustness while maintaining low computational cost. Extensive experiments on both remote sensing and natural image datasets validate its effectiveness, with performance consistently surpassing existing approaches. These results demonstrate the practicality and reliability of SIR-DCGAN for secure image distribution and copyright protection.

1. Introduction

In the field of image security [1], guaranteeing the reliability and authenticity of transmitted data is fundamental to safeguarding information. Although numerous protective techniques have been suggested, such as image encryption, these approaches often prove inadequate when applied to remote sensing images. While encryption can successfully block unauthorized access, it frequently disrupts subsequent data analysis and practical use. On the other hand, digital watermarking [2] provides a more sophisticated solution by embedding invisible marks that maintain visual quality while allowing for copyright verification.
This issue is particularly evident in the realm of remote sensing. Remote sensing images, which are used in satellite monitoring, military intelligence, and environmental observation, are typically high-resolution and multispectral, meaning that even slight distortions or unauthorized modifications can result in substantial misinterpretations. Unlike conventional multimedia images, remote sensing data requires a security solution that minimizes perceptual degradation while maintaining resilience to various distortions, channel noise, and deliberate attacks. In this regard, digital watermarking presents a compelling solution, offering a balance between robustness, embedding capacity, and imperceptibility.
In recent years, digital watermarking has garnered significant academic attention as a means of copyright protection for remote sensing imagery. A variety of algorithms specifically designed for this purpose have emerged, many of which are rooted in traditional frequency-based approaches, including those employing cosine domain transformations, multi-scale wavelet analysis, and matrix factorization techniques such as singular value decomposition (SVD) [3,4,5]. These methods utilize spectral transformations and decomposition principles to achieve improvements in the imperceptibility, resilience, and security of the watermark. However, conventional watermarking strategies still encounter multiple challenges in real-world scenarios.
  • Susceptibility to distortion: many classical techniques show weak resistance to common perturbations, including compression artifacts (e.g., JPEG), random noise interference, and spatial transformations, resulting in the partial or complete degradation of embedded watermark signals.
  • Low adaptability of handcrafted features: these solutions typically depend on manually engineered embedding patterns and robustness heuristics, which are often insufficient for the high complexity and variability inherent in remote sensing imagery and the diverse threat models involved.
  • High computational demands: some watermarking systems, although robust, impose significant computational overhead, limiting their feasibility for real-time use in large-scale remote sensing applications.
Recent breakthroughs in deep learning have opened up novel avenues for improving the robustness, imperceptibility, and adaptive capabilities of digital watermarking systems. A growing body of research in image processing demonstrates the potential of deep learning models to surpass the limitations of conventional watermarking approaches. Deep learning-based watermarking techniques for remote sensing images can generally be categorized into two main branches. Among them, convolutional neural networks (CNNs) [6] and generative adversarial networks (GANs) [7] have become prominent tools in this domain. One of the pioneering works by Haribabu et al. [8] introduced a CNN-based model that embedded watermark information through an encoder–decoder structure inspired by autoencoders. Subsequently, Mun et al. [9] introduced an iterative learning framework built upon CNNs, further boosting watermark robustness. These neural network-based solutions can be viewed as evolved variants of frequency-domain approaches and are now integral to many modern watermarking strategies. In a representative example, Ahmadi et al. [10] presented ReDMark, an end-to-end framework employing residual CNNs and incorporating simulated attacks as differentiable transformations. This technique enhances both security and robustness by dispersing watermark signals across a broader spatial domain. Meanwhile, Luo et al. [11] introduced a deep-learning watermarking method that does not rely on predefined distortion models during training. Instead, it gains resilience through adversarial learning combined with channel coding, demonstrating superior generalization against unforeseen perturbations. Despite these advancements, difficulties persist in accurately extracting semantically meaningful image features and effectively integrating them into the watermark embedding process, which continues to constrain overall performance under noise and attack scenarios.
In the realm of generative adversarial networks (GANs), a wide array of image watermarking approaches have been proposed, leveraging the adversarial interplay between generator and discriminator components to achieve a favorable balance among embedding strength, stealth, and robustness [12]. One of the earliest milestones in this direction is the HiDDeN framework introduced by Zhu et al. [13], which adopts an encode-perturb-decode architecture. In this setup, the encoder embeds watermark signals into images with minimal visual disruption, while the decoder is trained to accurately retrieve the watermark even under distortions. Hao et al. [14] explored GANs specifically in remote sensing contexts by designing a model where the generator is responsible for both embedding and recovering watermark information under varying noise levels, while the discriminator not only detects watermark presence but also introduces simulated degradations. To encourage watermark placement in frequency bands less noticeable to the human eye, a high-pass filter was applied prior to the discriminator, thereby improving both stealth and durability.
Building on these efforts, Liu et al. [15,16,17,18] devised a modular dual-stage learning pipeline consisting of a clean-condition adversarial training phase followed by a distortion-aware refinement process. This architecture integrates a deeply layered redundant encoding module [19] that enhances information redundancy during embedding and maintains resilience under challenging noise conditions. Jia et al. [20] further tackled real-world degradation scenarios by introducing a mini-batch training approach that incorporates both real and synthetically compressed images using JPEG-based artifacts, thereby improving generalization. In a similar vein, Huang et al. [21] advanced the field with an attention-driven adversarial watermarking framework (ARWGAN), where attention layers actively guide watermark placement toward regions of lower perceptual significance. Lastly, Fernandez et al. [22] proposed a fully self-supervised embedding paradigm, which uses unsupervised feature extraction under various data augmentations to generate robust, fused representations of image features and watermark signals, enhancing overall stability across perturbations.
The aforementioned research demonstrates that convolutional and generative adversarial network-based watermarking models can effectively integrate the stages of embedding and retrieval into a unified learning framework. This end-to-end optimization bypasses the constraints imposed by manually engineered features in conventional approaches. Nonetheless, the majority of such methods have been developed for general-purpose natural images, leaving the domain of remote sensing relatively underexplored. Remote sensing imagery presents a set of distinct challenges, including ultra-high spatial resolution, multispectral content, diverse scene structures, and the demand for robustness under dynamically changing conditions, which differentiate it substantially from standard image datasets. These unique demands highlight the necessity for watermarking strategies specifically tailored to the characteristics of remote sensing data. Accordingly, this work is dedicated to addressing that gap through targeted algorithmic innovation.
Existing GAN-based watermarking methods, including HiDDeN, ReDMark, and ARWGAN, typically target general image domains and lack specific adaptation to remote sensing scenarios. In contrast, our method leverages a DCGAN-based architecture enhanced with a multi-scale feature fusion module and attention mechanisms tailored to the challenges of high-resolution remote sensing images. In particular, deep convolutional generative adversarial networks (DCGANs) offer unique advantages over conventional CNNs and GANs in remote sensing image watermarking. A DCGAN combines the feature extraction strength of CNNs with the adversarial learning mechanism of GANs, making it more effective in image generation and classification tasks. Unlike the fully connected networks used in the original GAN, a DCGAN introduces convolutional and transposed convolutional layers in both the generator and discriminator, enabling superior multi-scale feature extraction suited to the high-resolution and multispectral nature of remote sensing images.
A DCGAN’s adversarial training mechanism enhances both the imperceptibility and robustness of embedded watermarks. It also improves training stability and image quality, ensuring resistance to noise, cropping, compression, and other common attacks while preserving watermark integrity. These attributes make the DCGAN a promising solution for secure remote sensing image protection.
Recent studies have demonstrated the DCGAN’s significant potential in remote sensing applications. Sürücü et al. [23] developed a multispectral DCGAN (MS-DCGAN) model to generate synthetic multispectral images and proposed a TransStacking model to distinguish between fake and real images with high accuracy. Wu et al. [24] introduced a DCGAN-based image colorization method for remote sensing imagery, incorporating multi-scale convolution and residual networks to enhance both visual quality and quantitative metrics. Jia et al. [25] combined a DCGAN with graph convolutional networks (GCNs) to develop the DGCGAN model, improving classification accuracy and robustness. Zhu et al. [26] enhanced sample diversity in aquaculture information extraction using an improved DCGAN algorithm. Shanmugam et al. [27] proposed DcGAN-HAD, a dual-discriminator conditional DCGAN for anomaly detection in hyperspectral images, which significantly improved detection accuracy and efficiency through optimized training.
These advances illustrate the DCGAN’s promising potential in remote sensing image watermarking, particularly in feature extraction, image generation, and resilience under complex conditions. This study adopts a DCGAN-based approach to explore robust, imperceptible, and efficient watermarking solutions for remote sensing imagery.
Recent advancements in image protection, such as the semantically enhanced selective encryption method proposed by Liu et al. [28], leverage semantic segmentation and parallel computing to achieve efficient visual obfuscation. While effective for privacy-preserving applications, such encryption-based approaches often modify pixel values and risk destroying spatial–spectral integrity, making them less suitable for scenarios requiring scientific analysis or image fidelity, such as remote sensing.
In contrast, Zhang et al. [29] introduced a frequency-domain attention-guided adaptive watermarking model that improves robustness under signal distortions by embedding in the DCT domain. However, the fixed-frequency embedding approach may struggle with high-frequency variability in real-world remote sensing images. Furthermore, frequency-domain techniques often lack spatial adaptability, which limits their ability to preserve semantic consistency across multi-scale geospatial regions.
To address these limitations, our method employs a spatial-domain generative strategy using a DCGAN enhanced with a multi-scale feature fusion module and attention-guided embeddings. This design enables our framework to dynamically adapt watermark placement based on spatial semantics while preserving spectral integrity, traits that are particularly critical for remote sensing applications.
In conclusion, while frameworks like HiDDeN have made notable strides in improving the imperceptibility of embedded watermarks, they often fall short in extracting semantically rich and robust features, especially when applied to remote sensing data. These models typically fail to make full use of the distinctive spectral and spatial patterns inherent in remote sensing imagery, which limits their ability to withstand noise interference and ultimately undermines watermark reliability. To overcome these shortcomings, we propose a novel watermarking framework for remote sensing images by incorporating both an attention-driven mechanism and a multi-scale feature fusion strategy within a deep convolutional GAN architecture. Rather than relying on traditional feature pipelines, our method seeks to enhance the depth and robustness of learned representations, thereby improving resistance to various noise perturbations. By jointly optimizing feature extraction and adaptive embedding through specialized modules, this work offers a more resilient and context-aware solution for protecting the integrity of remote sensing imagery.
To tackle the limited feature representation and suboptimal robustness observed in existing deep learning-based watermarking techniques, this study puts forward a novel robust watermarking framework tailored for remote sensing imagery. The method integrates an attention-driven strategy with multi-scale feature fusion, all constructed within a deep convolutional generative adversarial network paradigm. Compared to conventional approaches, this work makes the following key contributions:
  • We design a unified framework for watermarking remote sensing images by coupling a DCGAN-based backbone with an attention enhancement module. This combination not only improves the imperceptibility of the embedded watermark but also strengthens its resilience against degradation. The incorporation of a domain-aware feature extraction unit allows for more effective utilization of spatial–spectral patterns during watermark encoding.
  • A dedicated multi-level feature fusion module is proposed to boost the stability and anti-perturbation capacity of the embedding process. By capturing hierarchical remote sensing image features across scales, this module ensures that the watermark is distributed in a way that balances invisibility and robustness, particularly under compression and noise-based attacks.
  • An adaptive attention mechanism is incorporated to refine the spatial distribution of the watermark. This component automatically identifies less perceptually sensitive yet semantically stable regions, allowing the watermark to be embedded in areas where it remains concealed while preserving robustness under complex image transformations.
  • The decoding architecture is optimized to better withstand interference from noisy or lossy environments. By refining the extraction pipeline, the framework achieves reliable watermark recovery even under severe compression or transmission noise, thereby enhancing real-world applicability.
Comprehensive experimental validation shows that the proposed method consistently surpasses prior models across multiple evaluation metrics, including imperceptibility, robustness, and resistance to distortion. Notably, it maintains stable performance under adversarial conditions such as JPEG compression and noise contamination, demonstrating its viability for robust and secure remote sensing image copyright protection.

2. Methods

In this study, we present SIR-DCGAN, a remote sensing image watermarking strategy built upon a deep convolutional generative adversarial framework. The proposed model is designed to strike a dual objective: enhancing both the imperceptibility and resilience of embedded watermarks. The DCGAN backbone is selected for its reliable adversarial training behavior and proven capacity in producing high-fidelity images. Within this architecture, the generator module embeds the watermark into the input image, while the discriminator learns to identify discrepancies between authentic and watermarked versions. Through this adversarial interplay, the generator is incentivized to produce outputs that are visually indistinct from the original inputs while remaining resilient under distortion.
To achieve a refined balance between accurate watermark insertion, robustness during retrieval, and minimal visual disruption to the host image, we construct a cascaded architecture comprising an embedding module, a simulated attack unit, and a decoding network. These components are trained jointly in an end-to-end fashion, allowing the system to co-optimize visual quality and robustness throughout the learning process. To overcome the limitations of conventional approaches in capturing multi-scale semantics, we incorporate the Inception-ResNet [30] feature fusion module (IR-FFM), which blends multi-scale convolution filters with cross-layer dense connections. This design promotes more effective interaction between shallow and deep features, accelerating convergence and strengthening the resistance of the watermark to noise perturbations.
In addition, to minimize visual degradation typically caused by embeddings, we develop a squeeze-and-excitation attention module (SE-AM). This component generates global attention masks by extracting context-aware features, thus guiding the embedding process toward texture-dense yet perceptually tolerant regions. This selective reinforcement of embedding zones significantly improves the visual fidelity of the output image while preserving the watermark.
To further enhance robustness under compound attack scenarios, a dedicated noise simulation network is introduced. This module emulates a variety of real-world distortions, including additive noise, compression artifacts, and cropping, and injects them into the training pipeline. The model receives feedback from these simulated distortions during training, enabling it to iteratively refine its embedding strategy and adapt to evolving threat patterns. Through this dynamic optimization mechanism, the watermarking model becomes increasingly resilient to diverse forms of interference.
Quantitative evaluations confirm that the proposed framework delivers superior performance under a wide range of attack conditions, including JPEG compression, random noise, and hybrid perturbations. At the same time, it maintains high visual quality and efficient embeddings, offering a practical and robust solution for securing remote sensing image content.
SIR-DCGAN achieves high-quality watermark embedding and accurate extraction through the joint optimization of an encoder, a noise sub-network, a decoder, and a discriminator. During implementation, the original input I_co and the watermark M are fed into the encoder, which outputs a watermarked image I_en. After I_en is attacked by the noise layer, it is passed to the decoder, which extracts the embedded watermark M'. Simultaneously, the discriminator classifies both the host image and the watermarked image to ensure the quality of the watermark embedding. Throughout training, by optimizing the generation loss, decoding loss, and discrimination loss, the model simultaneously achieves imperceptible watermark embedding and highly robust extraction.
The total loss guiding the training process is constructed as follows:
L_all = λ · L_adversarial + MSE(y, x_h) + MSE(x_s, x_s')
Here, the coefficient λ balances the influence of the adversarial loss within the total objective. The term MSE(y, x_h) measures the mean squared error between the original host image x_h and the generated watermarked image y, thereby constraining the distortion introduced during watermark embedding and preserving visual fidelity. Similarly, MSE(x_s, x_s') evaluates the discrepancy between the watermarked image before attack, x_s, and its degraded version x_s' after noise perturbation, thus quantifying the model's resistance to distortion and its ability to retain watermark information. The formulation of the adversarial loss component L_adversarial is given as follows:
L_adversarial = −[ y · log(D(x_s)) + (1 − y) · log(1 − D(G(x_s))) ]
where D(·) and G(·) denote the outputs of the discriminator and generator networks, respectively; y is the ground-truth label; D(x_s) represents the probability assigned by the discriminator to the real watermarked image x_s; and D(G(x_s)) denotes the probability assigned to the generated watermarked image G(x_s).
The design of the loss components simultaneously constrains visual fidelity and promotes embedding resilience, aiming to minimize perceptual differences between the original and watermarked images while enhancing resistance to distortions. This joint optimization strategy ensures that the generated watermarked outputs maintain both high quality and robustness.
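To make the joint objective concrete, the following PyTorch-style sketch assembles the three terms above from the generator's perspective. The function and variable names (discriminator, x_host, x_marked, x_attacked) and the default value of λ are illustrative assumptions rather than the authors' released implementation, and the adversarial term is written here as a standard binary cross-entropy.

```python
import torch
import torch.nn.functional as F

def total_loss(discriminator, x_host, x_marked, x_attacked, lam=1e-3):
    """Joint objective: adversarial term + embedding distortion + attack distortion."""
    # Adversarial term: push the discriminator toward labelling the
    # watermarked image as a host image (label 1).
    pred = discriminator(x_marked)
    adv = F.binary_cross_entropy(pred, torch.ones_like(pred))
    # Distortion introduced by embedding: MSE(y, x_h).
    embed_mse = F.mse_loss(x_marked, x_host)
    # Robustness term: MSE between the watermarked image and its attacked version.
    robust_mse = F.mse_loss(x_attacked, x_marked)
    return lam * adv + embed_mse + robust_mse
```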
In summary, by incorporating the SE-AM and IR-FFM modules, as illustrated in Figure 1, SIR-DCGAN not only effectively enhances the invisibility and embedding robustness of the watermark but also significantly improves the model's adaptability to complex attack scenarios. Experimental results demonstrate that, compared with state-of-the-art (SOTA) watermarking models, SIR-DCGAN exhibits marked advantages in both watermark quality and resistance to attacks.

2.1. Encoder

The encoder structure is shown in Figure 2. The input consists of a host image I_in of size C × H × W and a binary watermark M ∈ {0, 1}^L of length L. The output is the watermarked image I_en with the same dimensions as I_in. The core modules are the squeeze-and-excitation network attention module and the Inception-ResNet feature fusion module.
The squeeze-and-excitation network attention module generates an attention mask M_A to optimize the watermark embedding region, balancing watermark imperceptibility and robustness. The module extracts deep features to generate M_A, guiding watermark embedding into texture-rich but visually inconspicuous areas, thereby reducing distortion and enhancing robustness. It consists of Layer_V, which extracts deep features, and Layer_A, which generates the probability distribution over feature channels via SoftMax, ultimately producing M_A.
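A minimal squeeze-and-excitation style block in the spirit of SE-AM is sketched below; the reduction ratio and layer names are assumptions, and only the channel re-weighting path described above (squeeze, excitation, SoftMax mask M_A) is shown.

```python
import torch
import torch.nn as nn

class SEAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)           # global context per channel
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Softmax(dim=1),                           # channel probability distribution, as in Layer_A
        )

    def forward(self, feat):                             # feat: (B, C, H, W)
        b, c, _, _ = feat.shape
        w = self.excite(self.squeeze(feat).view(b, c))   # attention mask M_A over channels
        return feat * w.view(b, c, 1, 1)                 # re-weight features for embedding
```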
IR-FFM extracts both shallow and deep features through dense connections, integrating them with the watermark to improve robustness. Layer_D extracts and reuses image features, while Layer_F further refines the feature distribution, ultimately generating the watermarked features F_W. Combined with M_A, the adjusted features are represented as F_AW = F_W × M_A. The encoder ensures high-quality watermark embedding by minimizing the following loss function:
L_E = MSE(I_in, I_en) = ||I_in − I_en||_2^2 / (C × H × W)
Here, I_in and I_en denote the original host image and the corresponding watermarked version. The term ||I_in − I_en||_2^2 measures the Euclidean distance in pixel space, capturing the visual perturbation caused by embedding the watermark. To maintain consistency across images of different dimensions, this loss is normalized by the total number of pixels, C × H × W. Minimizing the embedding loss L_E guides the encoder to retain the visual fidelity of the host image, ensuring that the watermark remains imperceptible while sustaining robustness under potential attacks. This design contributes to producing high-quality watermarked images with minimal visual artifacts.
Table 1 shows the architecture of the generator in the proposed SIR-DCGAN framework. The network begins with a latent noise vector, which is passed through a series of transposed convolutional layers to generate a 64 × 64 output image. Two custom-designed modules—IR-FFM and SE-AM—are integrated after the first and second deconvolutional layers, respectively, to enhance feature fusion and attention allocation. The IR-FFM module improves multi-scale semantic retention, while the SE-AM module adaptively highlights regions suitable for watermark embedding. This architecture balances robustness and imperceptibility while maintaining a lightweight design.
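The following sketch illustrates an Inception-ResNet style fusion block of the kind IR-FFM builds on: parallel convolutions with different kernel sizes capture multi-scale features, and a residual connection reuses shallow features. The channel split and kernel sizes are illustrative assumptions, not the exact Table 1 configuration.

```python
import torch
import torch.nn as nn

class InceptionResidualFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        branch_ch = channels // 4
        # Parallel branches with increasing receptive fields.
        self.b1 = nn.Conv2d(channels, branch_ch, kernel_size=1)
        self.b3 = nn.Conv2d(channels, branch_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(channels, branch_ch, kernel_size=5, padding=2)
        self.b7 = nn.Conv2d(channels, branch_ch, kernel_size=7, padding=3)
        self.merge = nn.Conv2d(branch_ch * 4, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        multi = torch.cat([self.b1(x), self.b3(x), self.b5(x), self.b7(x)], dim=1)
        return self.act(x + self.merge(multi))   # residual path reuses shallow features
```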

2.2. Noise Network

The noise sub-network applies multiple image distortions to the watermarked image I_en, including Crop, Cropout, Dropout, JPEG compression, Gaussian blur, and Rotate, to enhance the model's robustness against image degradation. Specifically, the Identity operation denotes no attack, where I_en remains unchanged. The Crop operation randomly removes a square region of the image with proportion P. Cropout and Dropout replace the cropped or randomly selected pixel regions with the corresponding regions from the host image I_co. Gaussian blur applies a convolution kernel with width r = 3 and variance σ = 1 to blur I_en. The Rotate operation rotates I_en counterclockwise by α = 15°. JPEG compression is performed using a quality factor Q ∈ (0, 100).
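As a rough illustration of the noise sub-network, the sketch below applies one randomly chosen distortion per training step. Parameter values follow the text (kernel width 3, σ = 1, 15° rotation), while the helper structure, the crop implementation, and the omission of the differentiable JPEG branch are simplifying assumptions.

```python
import random
import torch
import torchvision.transforms.functional as TF

def apply_random_attack(i_en, i_co, crop_ratio=0.3, dropout_p=0.3):
    attack = random.choice(["identity", "crop", "dropout", "gaussian_blur", "rotate"])
    if attack == "identity":
        return i_en
    if attack == "crop":                      # remove a random square region
        h, w = i_en.shape[-2:]
        size = int(min(h, w) * crop_ratio)
        y, x = random.randrange(h - size), random.randrange(w - size)
        out = i_en.clone()
        out[..., y:y + size, x:x + size] = 0
        return out
    if attack == "dropout":                   # replace random pixels with host pixels
        mask = (torch.rand_like(i_en) < dropout_p).float()
        return mask * i_co + (1 - mask) * i_en
    if attack == "gaussian_blur":             # kernel width 3, sigma 1
        return TF.gaussian_blur(i_en, kernel_size=3, sigma=1.0)
    return TF.rotate(i_en, angle=15.0)        # counter-clockwise rotation by 15 degrees
```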
Due to the non-differentiability of quantization and rounding operations in JPEG compression during training, a differentiable approximation is required for gradient propagation. Common approaches, such as JPEG-Mask and JPEG-Drop, randomly zero out high-frequency discrete cosine transform (DCT) coefficients. Furthermore, to approximate the rounding operation and ensure differentiability, the following transformation is applied:
x_approx = ⌊x⌋ + (x − ⌊x⌋)^3
where ⌊x⌋ denotes the floor of x and the cubic term (x − ⌊x⌋)^3 smooths the rounding operation, ensuring differentiability for gradient propagation. This approximation enables gradient-based optimization in the presence of JPEG compression.
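A minimal differentiable rounding helper based on this approximation might look as follows; it can be dropped into a simulated JPEG pipeline so that gradients propagate through the quantization step.

```python
import torch

def soft_round(x: torch.Tensor) -> torch.Tensor:
    # floor(x) is treated as a constant by autograd, so the gradient flows
    # through the cubic term (x - floor(x)) ** 3.
    return torch.floor(x) + (x - torch.floor(x)) ** 3
```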

2.3. Decoder

The decoder structure is illustrated in Figure 3. It operates as the inverse process of the encoder, excluding the attention mechanism (AM), and primarily consists of multiple Inception-ResNet modules that extract the watermark from the distorted watermarked image I_no. Since the watermark is redundantly embedded across the entire image, with a single bit of information corresponding to modifications in multiple pixels, the Inception-ResNet module is introduced to enhance extraction accuracy. This module integrates convolutional layers with different kernel sizes to capture multi-scale feature information, improving the recovery of the extracted watermark M'. The decoder optimizes its parameters θ_D by minimizing the decoding loss L_D, defined as
L_D = MSE(M, M') = ||M − M'||_2^2 / L
Here, M refers to the watermark prior to embedding, while M' corresponds to the one retrieved after decoding, and L denotes the bit length of the full watermark sequence. This component of the loss function calculates the average squared deviation between the original and extracted watermark data, effectively assessing the decoder's recovery precision. Minimizing L_D guides the decoder to achieve reliable extraction performance, maintaining high reconstruction integrity even when the watermarked image undergoes distortion or degradation.
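A short sketch of the decoding loss and bit recovery is given below; thresholding the decoder output at 0.5 to obtain the binary watermark M' is an assumption not stated explicitly in the text.

```python
import torch
import torch.nn.functional as F

def decoding_loss(wm_true: torch.Tensor, wm_pred: torch.Tensor) -> torch.Tensor:
    # Mean squared deviation between the embedded and decoded watermark,
    # averaged over the L watermark bits.
    return F.mse_loss(wm_pred, wm_true)

def extract_bits(wm_pred: torch.Tensor) -> torch.Tensor:
    # Hypothetical hard decision: threshold at 0.5 to recover the binary watermark M'.
    return (wm_pred > 0.5).float()
```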

2.4. Discriminator

Figure 4 shows the architecture of the discriminator, which receives both the host image I_co and the watermarked image I_en as inputs. These images pass through three convolutional layers, each with 64 filters and a 3 × 3 kernel, followed by global average pooling and a final fully connected layer. The output is a value between 0 and 1, where values near 1 indicate a host image and values close to 0 correspond to a watermarked version. To stabilize the training process, spectral normalization (SN) is applied. Throughout training, the encoder functions as the generator, aiming to produce watermarked images that are indistinguishable from the original host images, making it challenging for the discriminator to differentiate between the two.
The generator loss L G and the classification loss L A are defined as follows:
L_G = log(1 − A(I_en))
L_A = log(1 − A(I_co)) + log(A(I_en))
where I_co denotes the host image, I_en represents the watermarked image, and A(·) refers to the discriminator's output, which estimates the likelihood that an image is a host image. The generator loss L_G encourages the encoder to create watermarked images that closely resemble the host images by reducing the discriminator's ability to classify them correctly. Simultaneously, the classification loss L_A ensures that the discriminator learns to distinguish between host and watermarked images effectively. Minimizing both losses improves the visual quality of the watermarked images while also enhancing the discriminator's classification accuracy.
Table 2 outlines the architecture of the discriminator in the proposed SIR-DCGAN model. It adopts a standard DCGAN-based design, which progressively downsamples the input image through a series of convolutional layers. Each layer is followed by LeakyReLU activation and batch normalization (except the first), ensuring stable training and effective feature extraction. The final sigmoid output produces a probability indicating whether the input image is real or generated. This architecture supports the adversarial training process to improve watermark imperceptibility and authenticity.
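For illustration, a compact discriminator in the spirit of this description is sketched below, using three 3 × 3 convolutions with 64 filters, spectral normalization, global average pooling, and a sigmoid head. Where the text and Table 2 differ (e.g., spectral versus batch normalization), the choices here are assumptions.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class Discriminator(nn.Module):
    def __init__(self, in_ch=1):
        super().__init__()
        self.features = nn.Sequential(
            spectral_norm(nn.Conv2d(in_ch, 64, 3, padding=1)), nn.LeakyReLU(0.2, inplace=True),
            spectral_norm(nn.Conv2d(64, 64, 3, padding=1)),    nn.LeakyReLU(0.2, inplace=True),
            spectral_norm(nn.Conv2d(64, 64, 3, padding=1)),    nn.LeakyReLU(0.2, inplace=True),
        )
        self.head = nn.Sequential(nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, img):                          # img: (B, C, H, W)
        feat = self.features(img).mean(dim=(2, 3))   # global average pooling
        return self.head(feat)                       # probability that the input is a host image
```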

3. Experiments

The experimental procedure in this work is organized into several stages. In the initial phase, the Cats and Dogs dataset [31] and the SIRI-WHU remote sensing image dataset [32] are used for training. Of the total images, 80% are designated for training, and the remaining 20% are set aside for testing. To assess the model's ability to generalize, additional datasets such as COCO [33], VOC2012 [34], and CelebA [35] are incorporated. The proposed method is designed to handle grayscale images of arbitrary dimensions for both the watermark and the cover image. These images are resized to a uniform 128 × 128 resolution and normalized to a pixel range of 0 to 1.
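A minimal data preparation sketch consistent with this setup is shown below; the dataset path, the use of ImageFolder, and the loader structure are placeholders/assumptions.

```python
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Grayscale conversion, resizing to 128 x 128, and scaling to [0, 1].
transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((128, 128)),
    transforms.ToTensor(),  # maps pixel values to the [0, 1] range
])

# "path/to/SIRI-WHU" is a placeholder; any image-folder layout works.
dataset = datasets.ImageFolder("path/to/SIRI-WHU", transform=transform)
n_train = int(0.8 * len(dataset))                      # 80/20 train-test split
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
```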
In the model training phase, SIR-DCGAN, implemented in PyTorch 2.0, is trained with the Adam optimizer and a learning rate of 0.0003. The model's performance is evaluated using PSNR and SSIM for watermark invisibility, while normalized correlation (NC) is employed to assess robustness against image processing attacks. The results show that SIR-DCGAN achieves strong performance, with PSNR and SSIM values of 40.93 dB and 0.9957, respectively, indicating minimal perceptible difference between the watermarked and original images. Additionally, the model's average processing time is 0.37 s, faster than comparable methods.
A comparison with other existing methods demonstrates that SIR-DCGAN consistently achieves superior PSNR and SSIM scores, while its NC values stay above 0.9 across a range of image processing attacks, confirming its robustness. These experimental findings collectively validate the effectiveness of SIR-DCGAN for remote sensing image copyright protection and content authentication, highlighting its exceptional performance with low computational overhead.
Figure 5 provides a schematic overview of the watermark embedding process, highlighting the imperceptibility of SIR-DCGAN. It includes the original image, the embedded watermark, a representation of the attention mechanism, and the final watermarked image. As illustrated in Figure 5a,d, the original and watermarked images appear nearly identical, demonstrating the effective concealment of the watermark. Figure 5c visualizes the attention mechanism applied to various regions, with different colors signifying varying levels of attention. For instance, red areas, which are rich in texture, are designated for higher watermark embedding intensity, while other areas receive lower intensity. The embedding process is outlined in steps (a) through (d). At the receiver's end, the concealed watermark serves as proof of authenticity and document integrity, allowing verification without ambiguity.
This experiment uses noise attacks to validate the robustness of the proposed method. The first row of Figure 6 shows the watermarked image I_en obtained after embedding the watermark into the host image under joint attacks from multiple noise layers. The second row displays the corresponding noisy image I_no obtained after the watermarked image I_en undergoes the noise layer attacks. The third row presents the difference image |I_en − I_no|, with the grayscale space [36] appropriately magnified.
This study employs a two-stage training approach, where the decoder parameters are fixed. This method allows real JPEG compression attacks to be directly integrated into the training process, enhancing the model’s robustness against actual JPEG compression. By incorporating these compression distortions during training, the model learns to effectively handle real-world JPEG compression, improving its ability to maintain watermark integrity under such conditions.
The hyperparameters [37] used for network training are summarized in Table 3.

4. Results and Discussion

This section presents a detailed performance assessment of the proposed watermarking approach across various evaluation criteria.
To begin with, we assess the watermark imperceptibility by calculating PSNR and SSIM scores and benchmark the results against the approaches proposed by Ahmadi et al., Jia et al., and Huang et al.
Subsequently, we examine the model’s resilience to noise by computing the average watermark embedding accuracy (BA) and contrast it with the outcomes achieved by Mahapatra et al., Jia et al., and Madhu et al., illustrating its robustness under diverse interference scenarios.
Lastly, the model’s computational efficiency is evaluated through its execution time, with a comparative analysis against other existing techniques [38], underscoring both its practicality and speed.

4.1. Invisibility

To assess the perceptual transparency of the proposed watermarking technique, we utilize two standard evaluation indicators: the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM). These indicators are commonly applied to determine how closely the watermarked image approximates the original, thereby measuring the degree of visual distortion introduced during watermark insertion. A higher value in either metric typically signifies that the embedded watermark causes minimal perceptible change, preserving the overall visual fidelity of the host image. PSNR quantifies the difference in pixel values between the original and watermarked images, while SSIM assesses the structural similarity by considering luminance, contrast, and texture. Together, these metrics provide a comprehensive evaluation of watermark invisibility, ensuring that the watermarking process maintains high visual quality.
The choice of evaluation metrics in this work is grounded in both their theoretical robustness and frequent citation in the literature on general and remote sensing image watermarking. PSNR serves to capture signal fidelity at the pixel level, making it well suited for assessing minor distortions. SSIM, by contrast, focuses on preserving spatial structures, which is critical for images rich in textures and man-made features, as often seen in remote sensing applications. Additionally, normalized correlation (NC) is employed to quantify the consistency between the recovered watermark and its original version, especially when the image has been subjected to various forms of degradation. Together, PSNR, SSIM, and NC offer a multi-perspective evaluation of both watermark invisibility and durability. This comprehensive set of metrics ensures that the performance of the proposed watermarking method is rigorously assessed across multiple dimensions. While PSNR and SSIM predominantly measure visual quality, NC highlights the robustness of watermark retrieval, ensuring that both aspects are adequately considered in the evaluation process. Consequently, these metrics provide a balanced approach to gauge the efficacy of the watermarking technique in real-world applications.
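For reference, hedged implementations of the three metrics follow; PSNR and NC use their standard definitions, and SSIM is delegated to scikit-image, which may differ slightly from the authors' exact evaluation scripts.

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(orig: np.ndarray, marked: np.ndarray, peak: float = 1.0) -> float:
    # Peak signal-to-noise ratio in dB for images normalized to [0, peak].
    mse = np.mean((orig - marked) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def ssim(orig: np.ndarray, marked: np.ndarray) -> float:
    # Structural similarity via scikit-image, assuming images in [0, 1].
    return structural_similarity(orig, marked, data_range=1.0)

def nc(wm_true: np.ndarray, wm_extracted: np.ndarray) -> float:
    # Normalized correlation between the embedded and extracted watermarks.
    num = np.sum(wm_true * wm_extracted)
    den = np.sqrt(np.sum(wm_true ** 2)) * np.sqrt(np.sum(wm_extracted ** 2))
    return float(num / den)
```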
To demonstrate the effectiveness of the proposed model, we conducted an experimental verification of watermark invisibility using both PSNR and SSIM metrics. Table 4 presents the comparison results between our method and existing state-of-the-art (SOTA) watermarking models. We compared the performance of methods proposed by Ahmadi et al. [10], Jia et al. [20], Huang et al. [21], Madhu et al. [39], Wei et al. [40], and Mahapatra et al. [41], using PSNR (in dB) and SSIM as evaluation criteria to verify the improvement in watermark invisibility achieved by the network structure proposed in this study.
Our method achieves its best PSNR and SSIM values on the SIRI-WHU remote sensing image dataset, exceeding 39.99 dB and 0.9958, respectively, higher than on the other datasets. This indicates that our method is well suited to remote sensing images, which often contain complex textures and spatial features that are critical in this domain. Although the PSNR value on the Cats and Dogs dataset and the SSIM value on the CelebA dataset are relatively lower than on the other datasets, they still represent improvements over existing methods. These differences can be attributed to the unique characteristics of each dataset, such as variations in image content, resolution, and complexity. Therefore, these lower values do not imply poor performance; rather, they indicate that the method performs slightly less well on these datasets than on the remote sensing data, which is more closely aligned with the intended application of the watermarking technique.
It is worth emphasizing that the proposed approach consistently achieves higher PSNR and SSIM scores across all benchmark datasets utilized in this research, demonstrating a clear performance advantage over previously published techniques. This consistent outperformance highlights the robustness and generalizability of the proposed method across diverse image types. This enhancement in both metrics underscores the method’s effectiveness in embedding watermarks with minimal visual degradation, affirming its suitability for high-fidelity digital watermarking scenarios. These results also suggest that the method is adaptable to various image conditions, reinforcing its potential for practical deployment in real-world applications.
PSNR [42] and SSIM [43] serve as indicators of visual similarity between the watermarked output and the original cover image, offering insights into the imperceptibility of the watermarking process. The comparative results provided in the accompanying table reveal that the network architecture proposed in this work achieves superior results when compared to leading existing frameworks. As illustrative examples, the highest PSNR outcomes achieved by the competing methods developed by Ahmadi et al., Jia et al., Huang et al., Madhu et al., Wei et al., and Mahapatra et al. are 35.93 dB, 36.64 dB, 35.87 dB, 37.54 dB, 37.91 dB, and 31.33 dB, respectively, all of which fall short of the performance attained by our method.
Table 4 presents a comprehensive evaluation of the proposed method’s performance across various datasets, showcasing its remarkable effectiveness and versatility. In particular, the method achieves an impressive PSNR value of 40.93 and an SSIM score of 0.9957 when tested on the SIRI-WHU remote sensing image dataset. These results not only emphasize the model’s exceptional ability to preserve visual quality but also underline its capacity to maintain structural coherence in remote sensing images. The high PSNR value signifies that the watermarked images are nearly indistinguishable from the original images, while the SSIM score further affirms that the embedding process causes minimal structural degradation, preserving the essential features of the image.
Moreover, despite being designed with a focus on remote sensing imagery, the proposed model also performs strongly on several general-purpose datasets, such as COCO, VOC2012, and CelebA, outperforming existing methods in these domains. This finding reinforces the model's capability on remote sensing images while underscoring its generalization across a wide range of image types and tasks. Excelling on these general-purpose datasets, which are traditionally used in conventional computer vision tasks, illustrates the model's robustness and versatility: it is not limited to a specific application area but adapts to different types of visual data while maintaining accuracy, visual quality, and structural integrity. Such adaptability is crucial for practical deployment, where datasets come from varied sources with differing characteristics, yet the model still meets the demands of both remote sensing and general-purpose tasks.
To deepen the understanding of the relationship between the two key quality metrics, PSNR and SSIM, Figure 7 provides a visual representation of their joint distribution. This figure offers a detailed view of the correlation between these metrics, helping to elucidate the interplay between visual imperceptibility and structural integrity in watermarked images. The joint distribution depicted in Figure 7 reveals denser red regions, which represent areas where particular combinations of PSNR and SSIM values occur more frequently. These areas are indicative of the regions where the watermarked images perform exceptionally well, preserving both the quality and the structure of the original host images.
The PSNR values generally fall within the range of 35 to 44, with higher PSNR values suggesting that the watermarked images closely resemble the original host images, showing little to no visible distortion. The SSIM values, on the other hand, typically range from 0.94 to 1.0, with values approaching 1 signifying that the watermark embedding process has had minimal impact on the structural features of the image. This range of SSIM values further illustrates the model’s ability to retain the structural integrity of the watermarked images, ensuring that the watermark does not distort critical features like edges, textures, or shapes.
The noticeable clustering of PSNR and SSIM values within these ranges reinforces the conclusion that the proposed method strikes an optimal balance between embedding a strong watermark and preserving the quality and structural fidelity of the watermarked images. This consistent performance across both metrics demonstrates the model’s robustness and effectiveness in achieving imperceptible watermarks while maintaining high levels of image quality and structural coherence. As a result, this method stands out as a highly effective solution for watermarking in remote sensing images, offering a promising approach for safeguarding the integrity and confidentiality of sensitive remote sensing data while ensuring minimal visual and structural degradation.

4.2. Robustness

This section examines the robustness [44] of the proposed watermarking method against various image distortions, specifically assessing its resilience to different types of attacks. Watermark accuracy is quantified using the normalized correlation (NC) metric. When compared to leading methods in the field, our approach demonstrates a clear advantage in terms of robustness. Experiments across multiple datasets show its strong performance despite a range of noise and manipulation types.
To further assess the model’s effectiveness, we calculate watermark embedding accuracy (BA) for test images. The results are compared with the approaches by Mahapatra et al. [41], Jia et al. [20], and Madhu et al. [39], considering different noise levels, as shown in Figure 8. In panels (a)–(e), the BA of our model consistently outperforms others, particularly in scenarios involving Gaussian blur, Resize, and JPEG compression. This emphasizes the superior robustness of our method to various distortions. While other models, such as those by Mahapatra et al., Jia et al., and Madhu et al., produce higher image quality, our model achieves better BA, underscoring the importance of the noise sub-network in defending against attacks. Notably, Mahapatra et al.’s model exhibits a BA below 75% in certain conditions, such as with Gaussian blur (δ ≥ 1.25) and JPEG compression (Q ≤ 90), highlighting its limited capacity to protect against image distortions.
To further evaluate robustness, the impact of multiple noise types combined is tested, as illustrated in Figure 9a–d. The findings demonstrate that our model consistently outperforms the other methods in terms of BA, especially under more complex distortion conditions. This highlights the strength of our model not only in handling individual noise types but also in managing more intricate scenarios with multiple distortions. In conclusion, the results confirm that our approach excels in both isolated and combined noise environments, reinforcing its effectiveness and resilience.
Figure 10 showcases the resilience of our method against a range of image processing attacks. We tested the system under 20 different distortions, including filtering, geometric changes, histogram equalization, and varying levels of additive noise. In each case, the watermark was successfully retrieved using the proposed extraction technique. The results demonstrate that the NC values remain above 0.85 in all tested scenarios, with most cases exceeding 0.90, reflecting the robustness of the method. As anticipated, the NC values gradually decrease with higher noise intensities, especially under strong Gaussian or salt-and-pepper noise.
Table 5 provides a detailed comparison of the robustness of the proposed approach against recent state-of-the-art methods by Mahapatra et al. [41], Jia et al. [20], and Madhu et al. [39] using normalized correlation (NC) scores. The results demonstrate that the proposed method consistently performs well across a variety of image processing attacks, such as Gaussian noise, salt-and-pepper noise, speckle noise, histogram equalization, sharpening, and rotation distortions. Particularly, when subjected to median filtering, Gaussian low-pass filtering, and motion blur, the method achieves NC values of 0.9574, 0.9494, and 0.9473, respectively. These results highlight the method’s excellent robustness in preserving watermark integrity under different attack scenarios, showcasing its superior performance over the existing methods.

4.3. Computational Cost Analysis

This section evaluates the computational efficiency of the proposed watermarking method by measuring its execution time. Computational cost plays a vital role in assessing the practicality of watermarking algorithms, as it affects both the watermark embedding and extraction processes. To make this assessment, we compare the average execution time of our method with other advanced techniques, focusing on the time taken for both embedding and extraction. The results highlight the efficiency of our approach, demonstrating its potential for real-time applications.
A further evaluation of the proposed method’s performance was carried out through a computational time comparison, as presented in Table 6. The method’s overall execution time is 0.38 s. Specifically, watermark embedding takes around 0.014 s, while the extraction process requires approximately 0.023 s. These times are significantly shorter than those of comparable methods.
To further elaborate on the computational advantages shown in Table 6, we examine the design and complexity of our approach. The SIR-DCGAN framework utilizes a DCGAN architecture, which is known for its efficient convolutional structure and quick training convergence. This lightweight design inherently minimizes computational demands, offering a clear advantage over heavier CNN- or transformer-based watermarking systems.
Additionally, two custom-designed modules, namely IR-FFM (the Inception-ResNet feature fusion module) and SE-AM (the squeeze-and-excitation attention mechanism), are carefully optimized for remote sensing scenarios. Unlike traditional attention and fusion blocks that rely on dense tensor operations or global attention across all pixels, IR-FFM adopts multi-scale kernel fusion combined with residual structures to extract hierarchical features more efficiently. Similarly, SE-AM decouples spatial and channel-wise attention and concentrates embedding strength in visually insensitive regions, minimizing redundant computations.
Moreover, our training configuration (learning rate = 0.0015, β1 = 0.9, and β2 = 0.999) enables rapid convergence with only 1000 epochs and a moderate batch size of 32, avoiding prolonged training cycles. Together with a low overall parameter count (approximately 1.42 million trainable parameters), these design choices significantly reduce FLOPs during both the embedding and extraction phases. As a result, our model achieves fast inference with an average execution time of 0.37 s, including only 0.014 s for embedding and 0.023 s for extraction, which is markedly faster than other state-of-the-art methods.

4.3.1. Evaluation Conditions

All execution time comparisons were conducted under the same hardware and software environment to ensure fairness and consistency. Specifically, all methods were run on an Intel Core i7-14700kf CPU with 32 GB of RAM and an NVIDIA RTX 4090 GPU, under Windows 11 and Python 3.10 with CUDA 11.7 and PyTorch 2.0. Execution time was measured using the same Python-based timing functions to avoid discrepancies caused by framework differences. When execution times were not available from the source code, we re-implemented baseline methods based on their original papers and ensured consistent inference configurations. For the comparison with other methods, we ensured that all algorithms were executed under similar conditions by using the same software libraries and hardware specifications to eliminate any potential bias caused by system configurations.
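A simple timing harness consistent with these conditions is sketched below; the warm-up count, run count, and model/input names are assumptions.

```python
import time
import torch

def time_inference(model: torch.nn.Module, x: torch.Tensor,
                   device: str = "cuda", warmup: int = 10, runs: int = 100) -> float:
    # Average per-batch inference time in seconds, measured after warm-up
    # and with explicit GPU synchronization for fair comparison.
    model = model.eval().to(device)
    x = x.to(device)
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs
```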
In terms of algorithm implementation, we used Python with PyTorch for our method, while the comparison methods (Huang et al., 2021 [21]; Samee et al., 2020 [45]; etc.) were implemented using OpenCV (https://opencv.org/) and Matlab (https://www.mathworks.com/products/matlab.html), which may have different levels of optimization for speed. These discrepancies in tools and libraries could potentially influence the execution times. However, we believe that our method's superior performance stems primarily from the architectural optimizations discussed earlier, such as the lightweight DCGAN backbone and the IR-FFM and SE-AM modules.
In addition to the above empirical runtime observations, we provide a theoretical analysis of architectural efficiency to further support the claimed computational advantages.
To this end, we compare the number of trainable parameters and the estimated number of floating-point operations (FLOPs) between our SIR-DCGAN framework and three representative deep learning-based watermarking methods: Huang et al. [21], Samee et al. [45], and Yu et al. [46]. These works represent commonly adopted encoder–decoder or GAN-based watermarking schemes. As shown in Table 7, our model uses fewer parameters and achieves significantly lower FLOPs, reflecting the efficiency of the lightweight DCGAN backbone and our task-specific attention (SE-AM) and fusion (IR-FFM) modules.
Furthermore, to provide a theoretical foundation for the observed performance gap, we include a time complexity analysis using the Big-O notation. For traditional hybrid watermarking algorithms such as DWT-SVD, the embedding and extraction procedures often rely on wavelet transforms and matrix decomposition, resulting in a complexity of approximately O(n2 log n). Deep learning-based methods typically fall into O(n3) time due to their multi-layer convolutional computations over n × n images.
In contrast, the SIR-DCGAN framework benefits from a more compact encoder–decoder design with limited convolutional depth and selective activation via the SE-AM module. This enables an effective inference complexity of O(n2) in practice. Such architectural streamlining reduces memory and compute overhead while maintaining high accuracy, making our model particularly well suited for real-time remote sensing applications where computational resources may be constrained.
These findings, together with the empirical measurements presented earlier, validate that the proposed model achieves both high efficiency and competitive performance from theoretical and practical perspectives.

4.3.2. Ablation Study on the IR-FFM and SE-AM Modules

To further evaluate the individual contributions of the IR-FFM and SE-AM modules in enhancing watermark robustness, we conducted an ablation study by progressively removing each component from the full SIR-DCGAN architecture. Specifically, we compare four model variants: (i) a version without IR-FFM, (ii) a version without SE-AM, (iii) a plain DCGAN backbone without either component, and (iv) the full SIR-DCGAN framework.
We report the watermarking performance of each variant on the AID dataset (256 × 256) in Table 8, using PSNR, SSIM, and NC as evaluation metrics. The results show that removing either component degrades both visual quality and extraction accuracy, while the full model achieves the best performance across all metrics. This demonstrates the effectiveness and necessity of both modules.
To further validate the role of each module under distortion, we evaluate the NC values of these variants under JPEG compression (quality = 30). As shown in Figure 11, the full model exhibits significantly higher robustness against lossy compression, while variants lacking either or both modules suffer more pronounced degradation. This indicates that both IR-FFM and SE-AM not only improve embedding quality but also contribute to the overall resilience of the watermarking process.

4.4. Validation of Commonality of Multiple Remote Sensing Datasets

To further demonstrate the robustness and generalization ability of the proposed SIR-DCGAN watermarking framework, we extend our experiments to additional remote sensing datasets and image resolutions, which is in line with reviewer suggestions. Specifically, we conduct cross-dataset validation by testing two new publicly available remote sensing datasets, namely AID (Aerial Image Dataset) and NWPU-RESISC45 (Northwestern Polytechnical University Remote Sensing Image Scene Classification). Moreover, we evaluate the model’s performance at a higher resolution of 256 × 256 pixels to confirm its scalability and adaptability to larger image sizes, which are common in real-world remote sensing applications.
In addition to the originally used UC Merced Land Use Dataset, we select the following datasets for this experiment: AID, which contains 10,000 aerial scene images across 30 classes with spatial resolutions ranging from 0.5 to 8 m, and NWPU-RESISC45, which includes 31,500 remote sensing images categorized into 45 scene classes. From each dataset, we randomly select 1000 images for training and 200 for testing. All images are resized to 256 × 256 pixels. For consistency, the same watermark embedding and extraction pipeline is used without any retraining or parameter adjustment.
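The data preparation for this experiment amounts to random sampling and resizing; a minimal sketch is given below. The directory layout, file extensions, and function name are illustrative assumptions, not the datasets' actual organization or our released script.

```python
# Minimal sketch (assumed file layout): sample 1000 training and 200 test images
# per dataset and resize them to 256 x 256 pixels.
import random
from pathlib import Path
from PIL import Image

def sample_and_resize(src_dir: str, dst_dir: str, n_train: int = 1000,
                      n_test: int = 200, size: int = 256, seed: int = 0) -> None:
    paths = sorted(Path(src_dir).rglob("*.jpg")) + sorted(Path(src_dir).rglob("*.tif"))
    random.Random(seed).shuffle(paths)
    splits = {"train": paths[:n_train], "test": paths[n_train:n_train + n_test]}
    for split, items in splits.items():
        out = Path(dst_dir) / split
        out.mkdir(parents=True, exist_ok=True)
        for p in items:
            img = Image.open(p).convert("RGB").resize((size, size))  # bicubic by default
            img.save(out / (p.stem + ".png"))  # save losslessly to avoid extra compression
```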
We employ the same evaluation metrics as previous experiments: PSNR (peak signal-to-noise ratio) to assess watermark invisibility, SSIM (structural similarity index) to measure structural fidelity, and NC (normalized correlation) to evaluate watermark extraction accuracy.
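For completeness, PSNR and SSIM can be computed with widely used open-source implementations; the snippet below uses scikit-image, which is one common choice rather than necessarily the exact implementation used in our experiments. NC is computed as in the earlier sketch.

```python
# Minimal sketch: PSNR and SSIM between the host and watermarked images (8-bit RGB).
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_quality(host: np.ndarray, marked: np.ndarray) -> dict:
    psnr = peak_signal_noise_ratio(host, marked, data_range=255)
    ssim = structural_similarity(host, marked, channel_axis=-1, data_range=255)
    return {"PSNR": psnr, "SSIM": ssim}
```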
Table 9 reports the quantitative results across all datasets. It is evident that the proposed method maintains high performance across the datasets and resolutions. Even with larger image sizes and different distributions, SIR-DCGAN achieves only a marginal decrease in PSNR and SSIM compared to the baseline dataset while maintaining an NC above 0.996. This demonstrates the method’s strong ability to adapt to diverse remote sensing image characteristics without compromising performance.
To further assess the performance of our method in real-world applications, we conduct a comparative analysis using the AID and NWPU-RESISC45 datasets, both at a resolution of 256 × 256. The comparison involves traditional DWT-SVD [47], a recent deep learning-based U-Net model [48], and a cutting-edge ARWGAN approach [21]. The results, as presented in Table 10, show that SIR-DCGAN consistently outperforms the other techniques, yielding the highest PSNR, SSIM, and NC values across both datasets. These findings highlight the method’s superior watermark invisibility, robustness against a variety of image distortions, and resilience to degradation.
Overall, this extended evaluation confirms that our model is not overfitted to the UC Merced dataset. Its performance remains stable and reliable across different remote sensing datasets, image resolutions, and distortion types. These findings validate the general applicability and robustness of our approach in real-world remote sensing image protection scenarios.

5. Conclusions

This paper addresses the vulnerability of remote sensing images to tampering, attacks, and distortion during transmission and storage. We propose SIR-DCGAN, an attention-guided feature fusion watermarking method for remote sensing images based on a deep convolutional generative adversarial network (DCGAN). A series of experiments was conducted to validate the effectiveness of this method, with the following results:
First, in terms of invisibility, the proposed approach outperforms existing methods, as evidenced by evaluations based on PSNR and SSIM. On the SIRI-WHU remote sensing image dataset, the method attains a PSNR of 40.93 dB and an SSIM of 0.9957, reflecting its exceptional ability to maintain image quality after embedding the watermark. These findings underscore the suitability and practicality of SIR-DCGAN for remote sensing image applications.
Second, in terms of robustness, the method demonstrates excellent resistance to interference, as evidenced by bit accuracy (BA) and normalized correlation (NC) values. The method outperforms existing approaches, particularly under Gaussian blur and JPEG compression, where its BA is significantly higher. This validates the method’s robust protection of watermarks in complex environments.
Finally, in terms of computational cost, the method is efficient, with an average execution time of 0.37 s, making it practical for real-world applications. Compared with other similar approaches, both the watermark embedding and extraction times are notably reduced, further enhancing the method’s practicality.
Although the proposed method shows impressive performance in securing remote sensing images, it does have limitations, such as being a non-blind watermarking method that requires the host image for extraction. Future work will focus on optimizing the model to support a broader range of watermark types and exploring the potential of blind watermarking techniques for securing remote sensing images.
We also recognize the ethical implications of digital watermarking technologies when applied in sensitive domains such as remote sensing. While our proposed SIR-DCGAN framework is designed solely for lawful and academic purposes, including copyright protection, provenance verification, and secure information embedding, we emphasize that responsible deployment is essential. To mitigate the risks of misuse, we discourage application in contexts involving deceptive image manipulation or non-consensual watermark embedding. In future work, we also plan to explore reversible watermarking and access-controlled watermark operations to enhance transparency, traceability, and user accountability.
In addition to the strong empirical performance in terms of runtime, we attribute the significant improvement in computational efficiency to several key architectural optimizations. These include the use of a lightweight DCGAN-based backbone that reduces convolutional depth without compromising feature expressiveness, as well as the incorporation of the IR-FFM feature fusion module and the SE-AM attention mechanism. Together, these components streamline the embedding and extraction processes by minimizing redundant computations and targeting perceptually insensitive regions, resulting in both faster inference and reduced resource consumption.
In summary, SIR-DCGAN provides an efficient and robust solution for securing remote sensing images. It offers promising applications in remote sensing data copyright protection, content authentication, and secure transmission, laying a solid technological foundation for these fields.

Author Contributions

Conceptualization, S.P., X.Y., M.D. and P.L.; methodology, S.P. and X.Y.; software, S.P.; validation, S.P., X.Y. and P.L.; formal analysis, S.P.; investigation, S.P.; resources, S.P.; data curation, X.Y.; writing—original draft preparation, X.Y.; writing—review and editing, S.P.; visualization, M.D.; supervision, P.L.; project administration, S.P. and X.Y.; funding acquisition, X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Science and Technology Project of the Xinjiang Production and Construction Corps (Bingtuan), under the project titled “Research on Spatial Optimization Methods for Multi-functional Ecological Protection Forests in Southern Xinjiang” (Approval Number: S2022AB6909; Task Order Number: 2023CB008-22). The project was undertaken by Shihezi University from 10 May 2023 to 10 May 2026.

Data Availability Statement

The original data presented in the study are openly available in the following datasets: [Cats and Dogs dataset] accessed on 14 March 2025 [https://www.microsoft.com/en-us/download/details.aspx?id=54765]; [SIRI-WHU Remote Sensing Image dataset] accessed on 14 March 2025 [https://irip-buaa.github.io/posts/SIRI-WHU/]; [COCO dataset] accessed on 14 March 2025 [https://cocodataset.org/]; [VOC2012 dataset] accessed on 14 March 2025 [http://host.robots.ox.ac.uk/pascal/VOC/voc2012/]; [CelebA dataset] accessed on 14 March 2025 [https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html].

Acknowledgments

This research was supported by Shihezi University under grant number 2023CB008-22. We are grateful for this financial support.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AM: Attention Mechanism
BA: Bit Accuracy (Watermark Embedding Accuracy)
NC: Normalized Correlation (Watermark Extraction Correlation)
PSNR: Peak Signal-to-Noise Ratio
SSIM: Structural Similarity Index Measure
CNN: Convolutional Neural Network
FFM: Feature Fusion Module
GAN: Generative Adversarial Network
DCGAN: Deep Convolutional Generative Adversarial Network
SE-AM: Squeeze-and-Excitation Attention Mechanism
IR-FFM: Information-Retaining Feature Fusion Module
SIR-DCGAN: Spatial Information Retention Deep Convolutional Generative Adversarial Network

References

1. Ye, C.; Tan, S.; Wang, J.; Shi, L.; Zuo, Q.; Feng, W. Social Image Security with Encryption and Watermarking in Hybrid Domains. Entropy 2025, 27, 276.
2. Ferik, B.; Laimeche, L.; Meraoumia, A.; Laouid, A.; AlShaikh, M.; Chait, K.; Hammoudeh, M. An Efficient Semi-blind Watermarking Technique Based on ACM and DWT for Mitigating Integrity Attacks. Arab. J. Sci. Eng. 2025, 1–21.
3. Kricha, A.; Kricha, Z.; Sakly, A. Robust image watermarking in DCT domain using optimal inter-block difference. Concurr. Comput. Pract. Exp. 2023, 35, e7857.
4. Hebbache, K.; Aiadi, O.; Khaldi, B.; Benziane, A. Blind Medical Image Watermarking Based on LBP–DWT for Telemedicine Applications. Circuits Syst. Signal Process. 2025, 1–26.
5. Alshoura, W.H.; Alawida, M. Secure and flexible image watermarking using IWT, SVD, and chaos models for robustness and imperceptibility. Sci. Rep. 2025, 15, 7231.
6. Cong, R.; Yang, N.; Li, C.; Fu, H.; Zhao, Y.; Huang, Q.; Kwong, S. Global-and-local collaborative learning for co-salient object detection. IEEE Trans. Cybern. 2022, 53, 1920–1931.
7. Shi, J.; Zhang, Z.; Tan, C.; Liu, X.; Lei, Y. Unsupervised multiple change detection in remote sensing images via generative representation learning network. IEEE Geosci. Remote Sens. Lett. 2021, 19, 5001505.
8. Haribabu, K.; Subrahmanyam, G.R.K.S.; Mishra, D. A robust digital image watermarking technique using auto encoder based convolutional neural networks. In Proceedings of the 2015 IEEE Workshop on Computational Intelligence: Theories, Applications and Future Directions (WCI), Chennai, India, 10–11 December 2015; pp. 1–6.
9. Mun, S.M.; Nam, S.H.; Jang, H.; Kim, D.; Lee, H. Finding robust domain from attacks: A learning framework for blind watermarking. Neurocomputing 2019, 337, 191–202.
10. Ahmadi, M.; Norouzi, A.; Karimi, N.; Samavi, S.; Emami, A. ReDMark: Framework for residual diffusion watermarking based on deep networks. Expert Syst. Appl. 2020, 146, 113157.
11. Luo, X.; Zhan, R.; Chang, H.; Yang, F.; Milanfar, P. Distortion agnostic deep watermarking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 13548–13557.
12. Huang, Z.; Zhang, J.; Zhang, Y.; Shan, H. DU-GAN: Generative adversarial networks with dual-domain U-Net-based discriminators for low-dose CT denoising. IEEE Trans. Instrum. Meas. 2021, 71, 4500512.
13. Zhu, J.; Kaplan, R.; Johnson, J.; Li, F.-F. HiDDeN: Hiding data with deep networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 657–672.
14. Hao, K.; Feng, G.; Zhang, X. Robust image watermarking based on generative adversarial network. China Commun. 2020, 17, 131–140.
15. Liu, Y.; Guo, M.; Zhang, J.; Zhu, Y.; Xie, X. A novel two-stage separable deep learning framework for practical blind watermarking. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 1509–1517.
16. Xu, S.; Li, Z.; Zhang, Z.; Liu, J. An End-to-End Robust Video Steganography Model Based on a Multi-Scale Neural Network. Electronics 2022, 11, 4102.
17. Ma, R.; Guo, M.; Hou, Y.; Yang, F.; Li, Y.; Jia, H.; Xie, X. Towards blind watermarking: Combining invertible and non-invertible mechanisms. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 1532–1542.
18. Zhong, X.; Huang, P.C.; Mastorakis, S.; Shih, F.Y. An automated and robust image watermarking scheme based on deep neural networks. IEEE Trans. Multimed. 2020, 23, 1951–1961.
19. Aberdam, A.; Sulam, J.; Elad, M. Multi-layer sparse coding: The holistic way. SIAM J. Math. Data Sci. 2019, 1, 46–77.
20. Jia, Z.; Fang, H.; Zhang, W. MBRS: Enhancing robustness of DNN-based watermarking by mini-batch of real and simulated JPEG compression. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, China, 20–24 October 2021; pp. 41–49.
21. Huang, J.; Luo, T.; Li, L.; Yang, G.; Chen, B. ARWGAN: Attention-guided robust image watermarking model based on GAN. IEEE Trans. Instrum. Meas. 2023, 72, 5018417.
22. Fernandez, P.; Sablayrolles, A.; Furon, T.; Jégou, H.; Douze, M. Watermarking images in self-supervised latent spaces. In Proceedings of the ICASSP 2022—IEEE International Conference on Acoustics, Speech and Signal Processing, Singapore, 23–27 May 2022; pp. 3054–3058.
23. Sürücü, S.; Diri, B. A hybrid approach for the detection of images generated with multi generator MS-DCGAN. Eng. Sci. Technol. Int. J. 2025, 63, 101969.
24. Wu, M.; Jin, X.; Jiang, Q.; Lee, S.-J.; Liang, W.; Lin, G.; Yao, S. Remote sensing image colorization using symmetrical multi-scale DCGAN in YUV color space. Vis. Comput. 2021, 37, 1707–1729.
25. Jia, N.; Tian, X.; Gao, W.; Jiao, L. Deep graph-convolutional generative adversarial network for semi-supervised learning on graphs. Remote Sens. 2023, 15, 3172.
26. Zhu, H.; Lu, Z.; Zhang, C.; Yang, Y.; Zhu, G.; Zhang, Y.; Liu, H. Remote sensing classification of offshore seaweed aquaculture farms on sample dataset amplification and semantic segmentation model. Remote Sens. 2023, 15, 4423.
27. Shanmugam, P.; Amali, S.A.M.J. Dual-discriminator conditional generative adversarial network optimized with hybrid manta ray foraging optimization and volcano eruption algorithm for hyperspectral anomaly detection. Expert Syst. Appl. 2024, 238, 122058.
28. Liu, B.; Song, W.; Zheng, M.; Fu, C.; Chen, J.; Wang, X. Semantically enhanced selective image encryption scheme with parallel computing. Expert Syst. Appl. 2025, 279, 127404.
29. Zhang, H.; Kone, M.M.K.; Ma, X.-Q.; Zhou, N.-R. Frequency-domain attention-guided adaptive robust watermarking model. J. Frankl. Inst. 2025, 362, 107511.
30. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. Proc. AAAI Conf. Artif. Intell. 2017, 31, 1.
31. Elson, J.; Douceur, J.R.; Howell, J.; Saul, J. Asirra: A CAPTCHA that exploits interest-aligned manual image categorization. In Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS), Alexandria, VA, USA, 28–31 October 2007; pp. 366–374.
32. Yuezhong, C.; Jiaqing, W.; Heng, L. Self-Attention Multilayer Feature Fusion Based on Long Connection. Adv. Multimed. 2022, 2022, 9973814.
33. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Computer Vision—ECCV 2014; Springer: Cham, Switzerland, 2014; pp. 740–755.
34. Everingham, M.; Eslami, S.M.A.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal visual object classes challenge: A retrospective. Int. J. Comput. Vis. 2015, 111, 98–136.
35. Liu, Z.; Luo, P.; Wang, X.; Tang, X. Large-scale CelebFaces Attributes (CelebA) Dataset. Available online: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html (accessed on 15 August 2018).
36. Larson, L.R.; Barger, B.; Ogletree, S.; Torquati, J.; Rosenberg, S.; Gaither, C.J.; Bartz, J.M.; Gardner, A.; Moody, E.; Schutte, A. Gray space and green space proximity associated with higher anxiety in youth with autism. Health Place 2018, 53, 94–102.
37. Yang, L.; Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 2020, 415, 295–316.
38. Amrit, P.; Singh, A.K.; Singh, M.P.; Husain, A.; Khan, R.A. EmbedR-Net: Using CNN to embed mark with recovery through deep convolutional GAN for secure e-health systems. IEEE Trans. Consum. Electron. 2023, 69, 1017–1022.
39. Madhu, B.; Holi, G. CNN approach for medical image authentication. Indian J. Sci. Technol. 2021, 14, 351–360.
40. Wei, Q.; Wang, H.; Zhang, G. A robust image watermarking approach using cycle variational autoencoder. Secur. Commun. Netw. 2020, 2020, 8869096.
41. Mahapatra, D.; Amrit, P.; Singh, O.P.; Singh, A.K.; Agrawal, A.K. Autoencoder-Convolutional Neural Network-Based Embedding and Extraction Model for Image Watermarking. J. Electron. Imaging 2022, 32, 021604.
42. Huynh-Thu, Q.; Ghanbari, M. The Accuracy of PSNR in Predicting Video Quality for Different Video Scenes and Frame Rates. Telecommun. Syst. 2012, 49, 35–48.
43. Hore, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369.
44. Weisberg, M. Robustness Analysis. Philos. Sci. 2006, 73, 730–742.
45. Samee, M.K.; Gotze, J. Increased robustness and security of digital watermarking using DS-CDMA. In Proceedings of the 2007 IEEE International Symposium on Signal Processing and Information Technology, Giza, Egypt, 15–18 December 2007; pp. 185–189.
46. Yu, J.J.; Wang, F.L.; Zhao, L.T. A low power and complexity watermarking algorithm in DS-CDMA communication. In Proceedings of the 2010 3rd International Conference on Computer Science and Information Technology, Chengdu, China, 9–11 July 2010; Volume 9, pp. 547–551.
47. Bhandari, A.K.; Soni, V.; Kumar, A.; Singh, G. Cuckoo search algorithm based satellite image contrast and brightness enhancement using DWT–SVD. ISA Trans. 2014, 53, 1286–1296.
48. Xiang, Y.; Nurmemet, I.; Lv, X.; Yu, X.; Gu, A.; Aihaiti, A.; Li, S. Multi-Source Attention U-Net: A Novel Deep Learning Framework for the Land Use and Soil Salinization Classification of Keriya Oasis in China with RADARSAT-2 and Landsat-8 Data. Land 2025, 14, 649.
Figure 1. Architecture of SIR-DCGAN.
Figure 2. Overview of the encoder architecture, with x and y indicating input and output sizes, respectively.
Figure 3. Structure of the decoder.
Figure 4. Structure of the discriminator.
Figure 5. Watermark embedding illustrations demonstrating the subjective invisibility of SIR-DCGAN. (a) Original image; (b) watermark; (c) attention CAM visualization; (d) watermarked image.
Figure 6. Illustrations of different noise attacks: the watermarked image I_en obtained after embedding the watermark into the host image under joint attacks from multiple noise layers; the noisy image I_co obtained after I_en passes through the noise-layer attacks; and the difference image |I_en − I_co|, with the grayscale range appropriately magnified.
Figure 7. Joint PSNR-SSIM distribution. The high-density red regions indicate a strong correlation between PSNR and SSIM, demonstrating that the watermarked images maintain high invisibility and quality. PSNR values range from 35 to 44, and SSIM values are mostly between 0.94 and 1.0, indicating strong structural similarity with the host image.
Figure 8. Watermarking accuracy (BA) under different noise types: (a) Gaussian blur, (b) Resize, (c) JPEG compression, (d) salt-and-pepper, and (e) speckle. The proposed model outperforms the methods of Mahapatra et al. [41], Jia et al. [20], and Madhu et al. [39], especially under Gaussian blur, Resize, and JPEG compression, showing superior robustness.
Figure 9. Robustness of the proposed model under combined noise with different noise intensities: (a) JPEG(50) + cropping, (b) JPEG(50) + deletion, (c) JPEG(50) + Gaussian blur, and (d) JPEG(50) + Resize. The proposed model outperforms the methods of Mahapatra et al. [41], Jia et al. [20], and Madhu et al. [39] in most combined-noise scenarios, exhibiting robust performance even under complex distortion conditions.
Figure 10. Normalized correlation (NC) scores of the proposed watermarking approach across 20 distinct image processing attacks. To improve clarity, the results are arranged in ascending order of NC. The method shows remarkable robustness, maintaining NC values above 0.85 in all instances, with the majority of cases exceeding 0.90, even as distortion types and intensities vary.
Figure 11. NC under a JPEG attack for different architectural variants.
Table 1. Generator architecture of the proposed SIR-DCGAN model.
Layer | Type | Kernel/Stride/Padding | Input Size | Output Size | Activation | Notes
Input z | Latent vector | - | (100,) | - | - | Random noise input
Dense + Reshape | Dense + Reshape | - | 100 | 4 × 4 × 512 | ReLU | Fully connected, reshaped to a feature map
Deconv1 | ConvTranspose2D | 4 × 4 / 2 / 1 | 4 × 4 × 512 | 8 × 8 × 256 | ReLU | -
IR-FFM Module | Custom block | Multi-scale | 8 × 8 × 256 | 8 × 8 × 256 | ReLU | Proposed feature fusion module
Deconv2 | ConvTranspose2D | 4 × 4 / 2 / 1 | 8 × 8 × 256 | 16 × 16 × 128 | ReLU | -
SE-AM Module | Attention block | - | 16 × 16 × 128 | 16 × 16 × 128 | Sigmoid | Proposed attention mechanism
Deconv3 | ConvTranspose2D | 4 × 4 / 2 / 1 | 16 × 16 × 128 | 32 × 32 × 64 | ReLU | -
Deconv4 | ConvTranspose2D | 4 × 4 / 2 / 1 | 32 × 32 × 64 | 64 × 64 × 3 | Tanh | Final output image (RGB, normalized)
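To make the layer listing concrete, the following is a minimal PyTorch sketch that assembles the generator exactly as laid out in Table 1. It is not the released code: the IR-FFM block is left as a placeholder (nn.Identity) and SEAM is a generic squeeze-and-excitation stand-in, since the internals of both modules are described earlier in the paper.

```python
# Minimal sketch of the Table 1 generator (placeholders for IR-FFM and SE-AM).
import torch
import torch.nn as nn

class SEAM(nn.Module):
    # Generic squeeze-and-excitation style gating (assumed form, not the paper's exact module).
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )
    def forward(self, x):
        w = self.fc(x).view(x.size(0), -1, 1, 1)
        return x * w

class Generator(nn.Module):
    def __init__(self, z_dim: int = 100):
        super().__init__()
        self.fc = nn.Linear(z_dim, 4 * 4 * 512)                 # Dense + Reshape
        self.deconv1 = nn.ConvTranspose2d(512, 256, 4, 2, 1)    # 4x4 -> 8x8
        self.ir_ffm = nn.Identity()                              # placeholder for IR-FFM
        self.deconv2 = nn.ConvTranspose2d(256, 128, 4, 2, 1)    # 8x8 -> 16x16
        self.se_am = SEAM(128)                                   # stand-in for SE-AM
        self.deconv3 = nn.ConvTranspose2d(128, 64, 4, 2, 1)     # 16x16 -> 32x32
        self.deconv4 = nn.ConvTranspose2d(64, 3, 4, 2, 1)       # 32x32 -> 64x64
        self.relu, self.tanh = nn.ReLU(inplace=True), nn.Tanh()
    def forward(self, z):
        x = self.relu(self.fc(z)).view(-1, 512, 4, 4)
        x = self.relu(self.deconv1(x))
        x = self.relu(self.ir_ffm(x))
        x = self.relu(self.deconv2(x))
        x = self.se_am(x)
        x = self.relu(self.deconv3(x))
        return self.tanh(self.deconv4(x))
```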
Table 2. Discriminator architecture of the proposed SIR-DCGAN model.
Layer | Type | Kernel/Stride/Padding | Input Size | Output Size | Activation | Notes
Input | Image | - | 64 × 64 × 3 | - | - | Watermarked or original image
Conv1 | Conv2D | 4 × 4 / 2 / 1 | 64 × 64 × 3 | 32 × 32 × 64 | LeakyReLU (0.2) | Basic feature extraction
Conv2 | Conv2D | 4 × 4 / 2 / 1 | 32 × 32 × 64 | 16 × 16 × 128 | LeakyReLU (0.2) | BatchNorm included
Conv3 | Conv2D | 4 × 4 / 2 / 1 | 16 × 16 × 128 | 8 × 8 × 256 | LeakyReLU (0.2) | BatchNorm included
Conv4 | Conv2D | 4 × 4 / 2 / 1 | 8 × 8 × 256 | 4 × 4 × 512 | LeakyReLU (0.2) | BatchNorm included
Flatten + Output | Dense (Linear) | - | 4 × 4 × 512 | 1 | Sigmoid | Binary real/fake classification output
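Similarly, a minimal PyTorch sketch of the discriminator in Table 2 (an assumed DCGAN-style implementation, not the released code) is shown below.

```python
# Minimal sketch of the Table 2 discriminator.
import torch.nn as nn

discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),                              # 64x64 -> 32x32
    nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),       # -> 16x16
    nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),      # -> 8x8
    nn.Conv2d(256, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.LeakyReLU(0.2, inplace=True),      # -> 4x4
    nn.Flatten(), nn.Linear(4 * 4 * 512, 1), nn.Sigmoid(),                                   # real/fake score
)
```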
Table 3. Hyperparameters used in network training.
Parameter Name | Parameter Value | Description
Optimizer | ADAM | An adaptive learning rate optimization algorithm that combines the advantages of AdaGrad and RMSProp, adjusting the learning rate using first-moment and second-moment estimates of the gradient.
Learning Rate | 0.0015 | Controls how much the model weights are changed in response to the estimated error at each update.
Beta 1 | 0.9 | Decay rate of the first-moment (momentum) estimate in the ADAM optimizer.
Beta 2 | 0.999 | Decay rate of the second-moment estimate, contributing to the stability of the learning process.
Loss Function | MSE and MAE | MSE (mean squared error) measures the average squared difference between predicted and actual values; MAE (mean absolute error) measures the average absolute difference.
Training Epochs | 1000 | Number of complete passes over the training set during training.
Batch Size | 32 | Number of training examples processed in one forward/backward pass through the network.
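The configuration in Table 3 maps directly onto a standard PyTorch training setup; the snippet below is a hedged sketch in which `model` is a placeholder module rather than the actual SIR-DCGAN network.

```python
# Minimal sketch of the Table 3 training configuration.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)   # placeholder module for illustration only
optimizer = torch.optim.Adam(model.parameters(), lr=0.0015, betas=(0.9, 0.999))
mse_loss = nn.MSELoss()   # mean squared error term
mae_loss = nn.L1Loss()    # mean absolute error term
NUM_EPOCHS, BATCH_SIZE = 1000, 32
```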
Table 4. Comparison of PSNR and SSIM metrics between the proposed watermarking method and current leading techniques.
Method | Host Image | Watermark Image | PSNR (Existing) | SSIM (Existing) | PSNR (Ours) | SSIM (Ours)
Ahmadi et al. [10] | COCO | Random | 35.93 | 0.9660 | 39.43 | 0.9911
Jia et al. [20] | VOC2012 | Binary | 36.64 | 0.8938 | 39.71 | 0.9897
Huang et al. [21] | SIRI-WHU | Random | 35.87 | 0.9688 | 40.93 | 0.9957
Madhu et al. [39] | SIRI-WHU | Random | 37.54 | 0.9807 | 39.99 | 0.9958
Wei et al. [40] | CelebA | CIFAR-10 | 37.91 | 0.979 | 38.99 | 0.9819
Mahapatra et al. [41] | Cats and Dogs | Random | 31.33 | 0.9941 | 38.89 | 0.9881
Table 5. Comparison of NC robustness scores.
Attack | Mahapatra et al. [41] | Jia et al. [20] | Madhu et al. [39] | Ours
Salt and Pepper (0.001) | 0.9896 | 0.9353 | 0.9993
Speckle (0.001) | 0.9918 | 0.990
Motion Blur | 0.9480 | 0.9473
Histogram Equalization | 0.9250 | 0.9830 | 0.9903
Sharpening | 0.9607 | 0.9891
Rotation (45) | 0.3895 | 0.7657 | 0.9387
Median Filter (3 × 3) | 0.9877 | 0.979 | 0.9176 | 0.9574
Gaussian Noise (0.001) | 0.9866 | 0.9328 | 0.9987
Gaussian Low-Pass Filter (3 × 3) | 0.951 | 0.9494
Table 6. Comparison of execution time.
Scheme | Execution Time (s)
Huang et al. [21] | 4.3
Samee et al. [45] | 16.1
Yu et al. [46] | 4.5
Ours | 0.38
Table 7. Comparison of parameter count and FLOPs across different watermarking methods.
Method | Parameters (Millions) | FLOPs (GigaOps)
Huang et al. [21] | 5.71 | 21.38
Samee et al. [45] | 6.12 | 23.94
Yu et al. [46] | 7.05 | 26.78
SIR-DCGAN (Ours) | 4.21 | 12.47
Table 8. Ablation study of the IR-FFM and SE-AM modules on the AID dataset (256 × 256).
Model Variant | PSNR (dB) | SSIM | NC
w/o IR-FFM | 39.58 | 0.963 | 0.9814
w/o SE-AM | 39.81 | 0.966 | 0.9841
w/o both (baseline DCGAN) | 39.02 | 0.959 | 0.9753
Full SIR-DCGAN | 40.63 | 0.973 | 0.9967
Table 9. Cross-dataset and high-resolution generalization results of SIR-DCGAN.
Dataset | Image Size | PSNR (dB) | SSIM | NC
UC Merced (baseline) | 128 × 128 | 41.72 | 0.978 | 0.9982
AID | 256 × 256 | 40.63 | 0.973 | 0.9967
NWPU-RESISC45 | 256 × 256 | 40.81 | 0.975 | 0.9973
Table 10. Comparison with other methods on the AID and NWPU-RESISC45 datasets (256 × 256).
Method | Dataset | PSNR (dB) | SSIM | NC
DWT-SVD [47] | AID | 38.12 | 0.942 | 0.9736
U-Net [48] | AID | 39.64 | 0.954 | 0.9815
ARWGAN [21] | AID | 40.21 | 0.961 | 0.9887
SIR-DCGAN | AID | 40.63 | 0.973 | 0.9967
DWT-SVD [47] | NWPU-RESISC45 | 38.41 | 0.945 | 0.9758
U-Net [48] | NWPU-RESISC45 | 39.73 | 0.958 | 0.9832
ARWGAN [21] | NWPU-RESISC45 | 40.32 | 0.964 | 0.9902
SIR-DCGAN | NWPU-RESISC45 | 40.81 | 0.975 | 0.9973
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
