1. Introduction
With the rapid advancement of the Internet, the transmission of digital media across networks has become widespread in many industries. During network-based transmission, preventing unauthorized access to protected content has become a critical and widely discussed issue [1]. Within the realm of intellectual property protection, cryptography, steganography, and digital watermarking are three widely adopted techniques for copyright authentication in digital media ecosystems [2,3,4]. Among these techniques, digital watermarking embeds watermark information into digital media in a manner that remains highly imperceptible to the human visual system [5]. After network transmission and potential processing distortions, the embedded watermark can still be reliably extracted and recognized, thereby enabling robust copyright protection of digital content [6].
In traditional digital watermarking, methods are typically categorized into spatial-domain and frequency-domain approaches according to the embedding domain of the watermark [7,8]. Spatial-domain watermarking embeds watermark information by directly modifying the pixel values of the cover image [9]. This approach offers fast computation and low complexity but suffers from limited embedding capacity and poor robustness against common image processing attacks [10]. For instance, Li et al. [11] proposed a blind spatial-domain watermarking scheme in which the host image is divided into blocks. By selecting blocks with lower standard deviations and further partitioning them into four sub-blocks, the watermark is embedded by adjusting the direct-current coefficients of three selected sub-blocks. This method demonstrates significant advantages in imperceptibility compared with existing techniques. In contrast, frequency-domain watermarking transforms the cover image into the frequency domain and embeds the watermark within the transformed coefficients [12]. Although this technique significantly improves robustness, especially against compression and noise attacks, it typically incurs higher computational costs [13]. For instance, AbdElHaleem et al. [14] transformed images into the YCbCr color space and applied the Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT) to the Y channel for watermark embedding. The method also employs a fractional-order Lorenz system for watermark encryption, thereby enhancing the security of the watermarking scheme. Bounceu et al. [15] proposed a watermarking method that integrates DWT with Singular Value Decomposition (SVD) to enhance the robustness and stability of the embedded watermark. Experimental results indicate that the method achieves excellent imperceptibility, making it well suited for medical image applications. Gao et al. [16] integrated the Integer Wavelet Transform (IWT) with Zernike moments, extracting features from the IWT low-frequency sub-band to achieve high robustness against geometric transformations and common attacks.
Deep learning-based watermarking algorithms have also attracted considerable attention. Relying on the powerful feature extraction and fitting capabilities of deep learning, neural network watermarking algorithms demonstrate excellent performance in many scenarios. Research in this field has evolved from foundational end-to-end architectures to specialized cross-media applications. Zhu et al. [17] proposed the HiDDeN framework, which employs joint encoder–decoder training and introduces differentiable approximations of non-differentiable distortions such as JPEG compression. To enhance the practicality of such models, Liu et al. [18] developed a two-stage separable deep learning framework, addressing the slow convergence and noise-simulation limitations of earlier one-stage end-to-end methods. As the field moved toward physical-world applications, Wengrowski and Dana [19] introduced Light Field Messaging to model the complex camera-display transfer function. Similarly, Tancik et al. [20] presented StegaStamp, which enables the robust encoding of hyperlinks into physical photographs by simulating a wide range of spatial and photometric distortions. More recently, to counter the complex distortions of screen-shooting, Fang et al. [21] proposed PIMoG, which decomposes the noise layer into perspective, illumination, and moiré distortions, achieving superior extraction accuracy. Building upon these representative works, recent specialized innovations have further refined these mechanisms. Qiao et al. [22] proposed a scalable universal adversarial watermark approach; by extending the defense range of pre-watermark mechanisms, their method effectively counters new forgery models while maintaining low computational costs. Wu et al. [23] developed a robust framework based on multi-layer watermark feature fusion. This architecture allows arbitrary depthwise stacking to associate with watermarks, demonstrating superior invisibility and generalization capabilities, particularly in few-shot learning scenarios. Furthermore, deep learning methods have shown remarkable robustness against cross-media attacks, such as print-camera and screen-shooting processes, which are traditionally challenging. Qin et al. [24] designed a network architecture incorporating deep noise simulation and constrained learning, significantly reducing distortion and enhancing robustness over the print-camera channel. Similarly, targeting screen-shooting resilience, Guo et al. [25] proposed a double-branch network; by assigning Gaussian-distributed weights to the encoder branches, their scheme balances visual quality and robustness against screen-capture distortions. Cao et al. [26] proposed an end-to-end framework combining DCT-domain channel attention and adversarial training to resist screen-shooting attacks. By employing a training strategy based on Generative Adversarial Networks, their model generates universal watermark masks that achieve a superior balance between imperceptibility and robustness. Although these neural network-based methods achieve excellent performance, they often require substantial computational resources, large training datasets, and long training times. In contrast, traditional methods, particularly those based on efficient transforms such as the Walsh–Hadamard Transform (WHT), offer low complexity, blind extraction without training, and ease of hardware implementation. Therefore, optimizing traditional algorithms for specific application scenarios remains a valuable research direction.
With the rapid advancement of the multimedia big data era, color images have garnered increasing attention due to their large information capacity and superior visual quality [27,28,29]. These advantages have led to their widespread application in real-world scenarios. Over the past decades, most digital watermarking research has focused on binary images [30] and grayscale images [31,32], with relatively limited attention given to dual color image watermarking. In recent years, research on dual color image watermarking, where both the cover image and the watermark are in color, has advanced significantly, leading to numerous novel algorithms tailored to such scenarios. For instance, Su et al. [33] proposed a blind color image watermarking scheme that integrates a graph-based transform. The method leverages the structural properties of graphs to efficiently extract stable transform coefficients in the spatial domain and incorporates particle swarm optimization to adaptively optimize the embedding strength. Zhang et al. [34] proposed a blind color image watermarking algorithm based on dual quaternion QR decomposition. By introducing Arnold scrambling for watermark protection, the method combines a dual quaternion matrix representation of color images with a dual-structure preservation algorithm to enhance computational efficiency. Wang et al. [28] proposed a watermarking algorithm based on the split quaternion matrix model, which effectively analyzes the complex spectral correlations among the RGB channels of color images. The method is specifically optimized to address the inherent complexities of color image processing, demonstrating strong robustness and high perceptual quality.
In contemporary digital watermarking, WHT-based algorithms represent an important research direction, and numerous studies have explored embedding strategies that exploit the statistical and structural properties of WHT coefficients. Chen et al. [35] utilized the property that the first row of WHT coefficients concentrates the majority of the energy, embedding watermark information into elements of that row. The embedding is achieved by adjusting specific coefficient pairs within the first row, and experimental results demonstrate strong robustness against common image processing operations. To reduce the perceptual distortion caused by coefficient modification, Reddy et al. [36] proposed a strategy that minimizes the range of coefficient perturbations: watermark bits are embedded into the first and second columns of the WHT coefficients, effectively constraining the affected coefficient region. Experiments show that this method maintains good performance under median filtering, JPEG compression, and noise attacks. Unlike these robustness-oriented approaches, Prabha et al. [37] focused on improving imperceptibility. Their algorithm embeds watermark information by slightly modifying the coefficients in the third and fourth rows, which are less perceptually sensitive, thereby achieving high visual quality after embedding. However, prioritizing imperceptibility inevitably reduces robustness.
Imperceptibility and robustness are two fundamental performance metrics of digital watermarking systems [38]. However, their inherent trade-off dictates that enhancing imperceptibility typically compromises robustness, and vice versa. Striking an optimal balance between these competing requirements remains a pivotal challenge in contemporary digital watermarking research [39,40]. The choice of embedding regions within the cover image plays a crucial role in determining the overall performance of a watermarking system [41]. Embedding watermarks in high-texture regions enhances robustness against signal processing attacks, albeit often at the expense of visual imperceptibility; conversely, embedding in smooth areas improves visual transparency but substantially reduces resilience to malicious manipulations [42]. This trade-off has driven extensive research into adaptive region selection strategies aimed at balancing imperceptibility and robustness. For example, Kumari et al. [43] proposed a block selection strategy based on low variance, where image blocks with minimal pixel variation are identified as suitable embedding regions. The selection is further optimized using an Enhanced Tunicate Swarm Algorithm, refined by the Sine Cosine Algorithm, to locate the blocks with the lowest variance and least visual complexity for watermark embedding.
The performance of digital watermarking techniques primarily depends on several key indicators, including imperceptibility, robustness, security, and embedding capacity. With the widespread use of color images, it has become increasingly important to design watermarking algorithms that can effectively process color images while maintaining both high imperceptibility and strong robustness.
To address this challenge, this paper proposes a novel watermarking algorithm. In the proposed method, the color image is first decomposed into its R, G, and B channels, each of which is further divided into non-overlapping 4 × 4 blocks. Candidate embedding blocks are then selected based on entropy calculations, followed by the application of the WHT to the selected blocks. Subsequently, the embedding and extraction of the color watermark are performed according to the differences between paired WHT coefficients in the frequency domain. This strategy fully exploits the energy compaction property of the WHT, embedding watermark information into the cover image by quantizing and adjusting the coefficient differences. As a result, the proposed method achieves a significant improvement in watermark robustness while ensuring imperceptibility to the human visual system.
The main contributions of this paper are summarized as follows:
A WHT-based watermarking algorithm that achieves a superior balance between imperceptibility and robustness is proposed. Experimental results demonstrate that the proposed method exhibits good performance in both aspects. In particular, the algorithm consistently enables accurate watermark extraction under various attacks, outperforming state-of-the-art methods.
An entropy-based block selection mechanism is employed to identify optimal regions for embedding, which enhances the imperceptibility of the watermarking algorithm. In addition, the watermark image is encrypted using the Logistic chaotic map to further enhance security.
A difference-based embedding position selection strategy is proposed, which selects coefficient pairs with the smallest original differences for watermark embedding. This approach effectively minimizes embedding-induced distortion in the WHT coefficients, thereby preserving the high visual quality of the watermarked image.
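To make the pipeline concrete, the following sketch illustrates the three core steps in NumPy: entropy scoring of a 4 × 4 block, the forward/inverse WHT, and quantization-based embedding of one bit into the difference of a WHT coefficient pair with step T. This is a simplified illustration under stated assumptions, not the exact published algorithm: the fixed coefficient pair C[0,1]/C[0,2] and the symmetric adjustment are illustrative, whereas the proposed method selects the pair with the smallest original difference.

```python
import numpy as np

def wht4():
    # 4x4 Walsh-Hadamard matrix (Sylvester construction), entries +/-1
    H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
    return np.kron(H2, H2)

def block_entropy(block):
    # Shannon entropy of the 8-bit pixel histogram of one block
    hist = np.bincount(block.astype(np.uint8).ravel(), minlength=256)
    p = hist[hist > 0] / block.size
    return -np.sum(p * np.log2(p))

def embed_bit(block, bit, T=8):
    """Embed one bit by quantizing the difference of a WHT coefficient pair."""
    H = wht4()
    C = H @ block @ H / 4.0              # scaled so the same operation inverts it
    d = C[0, 1] - C[0, 2]                # illustrative coefficient pair
    q = T * np.round(d / T)              # nearest quantization lattice point
    d_new = q + (T / 4 if bit == 1 else -T / 4)
    delta = (d_new - d) / 2.0
    C[0, 1] += delta                     # split the adjustment across the pair
    C[0, 2] -= delta
    return H @ C @ H / 4.0               # inverse WHT

def extract_bit(block, T=8):
    # blind extraction: only the step size T is needed
    H = wht4()
    C = H @ block @ H / 4.0
    return 1 if (C[0, 1] - C[0, 2]) % T < T / 2 else 0
```

In the full scheme, candidate blocks of each channel would first be ranked by `block_entropy` before `embed_bit` is applied; extraction is blind, requiring only T.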
The remainder of this paper is organized as follows: Section 2 presents the fundamental theories and mathematical derivations underlying the proposed algorithm. Section 3 details the procedures for watermark embedding and extraction. Section 4 describes the experimental setup, including dataset information and simulation results, and provides an in-depth analysis of the experimental data. Finally, Section 5 concludes the paper.
4. Experimental Results
The benchmark methods [35,36,37] selected for comparison in this section are all based on a Hadamard Transform framework, in which a Hadamard matrix is left-multiplied before watermark embedding in the transform domain. This common methodological foundation ensures an objective and equitable basis for comparison with the proposed algorithm. In the subsequent imperceptibility and robustness experiments, both the proposed algorithm and the benchmark methods are evaluated under identical experimental conditions. All experiments were conducted on RGB cover images and RGB watermarks, with implementations in MATLAB R2024a. In the chaotic encryption phase, the control parameter and the initial value of the Logistic map are chosen so that the system operates in a fully chaotic state. The specific sources of the datasets are detailed in the Data Availability Statement.
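For reference, the Logistic map used in the encryption phase is x_{n+1} = μ·x_n·(1 − x_n), which behaves chaotically for μ close to 4. The minimal scrambling sketch below is illustrative only: the parameter values μ = 3.99 and x₀ = 0.7 are placeholders rather than the paper's settings, and the rank-order permutation is one common way of turning the chaotic sequence into a key-dependent pixel shuffle.

```python
import numpy as np

def logistic_sequence(n, mu=3.99, x0=0.7):
    # iterate the Logistic map x_{k+1} = mu * x_k * (1 - x_k)
    xs = np.empty(n)
    x = x0
    for k in range(n):
        x = mu * x * (1 - x)
        xs[k] = x
    return xs

def scramble(img, mu=3.99, x0=0.7):
    """Permute pixels by the rank order of a chaotic sequence (key = mu, x0)."""
    flat = img.ravel()
    perm = np.argsort(logistic_sequence(flat.size, mu, x0))
    return flat[perm].reshape(img.shape), perm

def unscramble(img, perm):
    # invert the permutation by scattering values back to their original slots
    flat = np.empty(img.size, dtype=img.dtype)
    flat[perm] = img.ravel()
    return flat.reshape(img.shape)
```

Because the permutation is fully determined by (μ, x₀), the receiver can regenerate it from the key alone, keeping extraction blind.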
Figure 6 displays the cover images and watermarks used in this study.
To evaluate the imperceptibility of the proposed algorithm, the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) are employed, whereas the Normalized Correlation (NC) and Bit Error Rate (BER) are adopted to assess its robustness.
PSNR is employed to quantify the pixel-level distortion between the cover image $I$ and the watermarked image $I^{*}$. A higher PSNR value indicates greater similarity between the two images. For color images, the overall PSNR is calculated as shown in Equation (13):

$$\mathrm{PSNR} = \frac{1}{3}\sum_{i=1}^{3}\mathrm{PSNR}_i \tag{13}$$

where the PSNR of the $i$-th channel is given by Equation (14):

$$\mathrm{PSNR}_i = 10\log_{10}\frac{255^{2}}{\frac{1}{MN}\sum_{x=1}^{M}\sum_{y=1}^{N}\left[I_i(x,y)-I^{*}_i(x,y)\right]^{2}} \tag{14}$$

where $i = 1, 2, 3$ correspond to the R, G, and B channels, respectively; $M$ and $N$ denote the number of rows and columns of the color image; $I_i(x,y)$ represents the pixel value of the cover image $I$ at coordinates $(x,y)$ in the $i$-th channel; and $I^{*}_i(x,y)$ represents the corresponding pixel value of the watermarked image $I^{*}$.
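A direct NumPy transcription of Equations (13) and (14), assuming 8-bit RGB arrays with the channel on the last axis:

```python
import numpy as np

def color_psnr(I, Iw):
    """Mean of per-channel PSNRs for 8-bit RGB images (Eqs. 13-14)."""
    I = I.astype(np.float64)
    Iw = Iw.astype(np.float64)
    psnrs = []
    for i in range(3):                       # R, G, B channels
        mse = np.mean((I[..., i] - Iw[..., i]) ** 2)
        psnrs.append(10 * np.log10(255.0 ** 2 / mse) if mse > 0 else np.inf)
    return np.mean(psnrs)
```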
SSIM evaluates the similarity between two images based on luminance, contrast, and structure. Along with PSNR, it is one of the most widely adopted metrics for assessing imperceptibility in image watermarking. An SSIM value closer to 1 indicates a higher degree of similarity between the two images. The SSIM is defined as shown in Equation (15):

$$\mathrm{SSIM}(I, I^{*}) = \left[l(I, I^{*})\right]^{\alpha}\left[c(I, I^{*})\right]^{\beta}\left[s(I, I^{*})\right]^{\gamma} \tag{15}$$

where $l(I, I^{*})$, $c(I, I^{*})$, and $s(I, I^{*})$ represent the luminance, contrast, and structural comparisons, respectively, and $\alpha$, $\beta$, and $\gamma$ are weighting exponents.
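With the common choice α = β = γ = 1 and the usual stabilizing constants C₁ = (0.01L)² and C₂ = (0.03L)², the three comparison terms combine into the familiar closed form. The sketch below computes a single global (non-windowed) SSIM for simplicity; standard practice averages this quantity over local sliding windows.

```python
import numpy as np

def global_ssim(x, y, L=255):
    """Single-window SSIM with alpha = beta = gamma = 1 (Eq. 15, simplified).
    The standard metric averages this over local sliding windows."""
    x = x.astype(np.float64).ravel()
    y = y.astype(np.float64).ravel()
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = np.mean((x - mx) * (y - my))       # covariance term
    return ((2 * mx * my + C1) * (2 * cxy + C2)) / (
        (mx ** 2 + my ** 2 + C1) * (vx + vy + C2))
```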
NC measures the correlation between the original watermark $w$ and the extracted watermark $w^{*}$. An NC value closer to 1 indicates that the watermark has been accurately and completely extracted, reflecting strong resistance to attacks. The NC is defined as shown in Equation (16):

$$\mathrm{NC} = \frac{\sum_{i=1}^{3}\sum_{x=1}^{M}\sum_{y=1}^{N} w_i(x,y)\, w^{*}_i(x,y)}{\sqrt{\sum_{i=1}^{3}\sum_{x=1}^{M}\sum_{y=1}^{N} w_i(x,y)^{2}}\,\sqrt{\sum_{i=1}^{3}\sum_{x=1}^{M}\sum_{y=1}^{N} w^{*}_i(x,y)^{2}}} \tag{16}$$

where $i = 1, 2, 3$ correspond to the R, G, and B channels, while $M$ and $N$ denote the number of rows and columns of the watermark image, respectively. $w_i(x,y)$ and $w^{*}_i(x,y)$ represent the pixel values at coordinates $(x,y)$ in the $i$-th channel of the original watermark $w$ and the extracted watermark $w^{*}$, respectively.
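Equation (16) in NumPy form, summing over all three channels at once:

```python
import numpy as np

def nc(w, w_ext):
    """Normalized correlation between original and extracted watermarks (Eq. 16)."""
    w = w.astype(np.float64).ravel()
    w_ext = w_ext.astype(np.float64).ravel()
    denom = np.sqrt(np.sum(w * w)) * np.sqrt(np.sum(w_ext * w_ext))
    return np.sum(w * w_ext) / denom
```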
BER directly quantifies the extent of errors in the extracted watermark caused by attacks on the watermarked image. A lower BER indicates stronger robustness of the watermarking algorithm, with BER = 0 meaning perfect extraction. The BER is defined as shown in Equation (17):

$$\mathrm{BER} = \frac{1}{3MN}\sum_{i=1}^{3}\sum_{x=1}^{M}\sum_{y=1}^{N}\left[w_i(x,y)\oplus w^{*}_i(x,y)\right] \tag{17}$$

where $i = 1, 2, 3$ correspond to the R, G, and B channels of the color image, $\oplus$ denotes the XOR operation, and $M$ and $N$ represent the number of rows and columns of the watermark image, respectively. $w_i(x,y)$ and $w^{*}_i(x,y)$ denote the pixel values at coordinates $(x,y)$ in the $i$-th channel of the original watermark $w$ and the extracted watermark $w^{*}$, respectively.
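Equation (17) in NumPy form, assuming binary (0/1) watermark bit arrays:

```python
import numpy as np

def ber(w_bits, w_ext_bits):
    """Bit error rate via XOR over the three binary watermark channels (Eq. 17)."""
    diff = w_bits.astype(np.uint8) ^ w_ext_bits.astype(np.uint8)
    return float(np.mean(diff))
```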
To determine the optimal quantization step size, simulation experiments were conducted using the entire USC-SIPI Image Database to ensure statistical reliability. Since attacks within the same category exhibit similar characteristics, representative attacks are selected from common types of image processing operations, including compression, geometric distortions, and enhancement attacks. Specifically, JPEG2000 compression (CR = 4), Scaling (0.9), and Gaussian noise (0.03%) are employed to simulate typical attacks. The quantization step size T is gradually increased during the experiment, and both imperceptibility (evaluated by PSNR and SSIM) and robustness (evaluated by NC) are assessed.
The experimental results, calculated as statistical averages, are illustrated in Figure 7, where the three NC curves denote the average NC values under the JPEG2000, Scaling, and Gaussian noise attacks, respectively. The PSNR and SSIM values are computed without any attack. As the quantization step size T increases, the PSNR and SSIM values decrease, while the NC values increase. A sensitivity analysis reveals a consistent trend across all three attack types: robustness improves rapidly at lower T values and subsequently stabilizes. This trend implies that a larger T enhances robustness at the cost of reduced imperceptibility, and it shows how T can be selected to maximize robustness while maintaining acceptable imperceptibility. Based on these global statistics, T = 8 is chosen as the quantization step size: it provides a sufficient robustness margin against noise and compression attacks while keeping the average PSNR above 35 dB, a level generally regarded as indicating good visual quality [44].
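The influence of T follows from the quantization geometry: after embedding, the coefficient difference lies T/4 away from the nearest decision boundary, so any attack that perturbs the difference by less than T/4 cannot flip the extracted bit, and a larger T therefore widens this safety margin. The scalar sketch below demonstrates the margin with an illustrative QIM-style rule, not the paper's exact equations.

```python
import numpy as np

def qim_embed(d, bit, T):
    # snap the coefficient difference onto the sub-lattice encoding `bit`
    q = T * np.round(d / T)
    return q + (T / 4 if bit == 1 else -T / 4)

def qim_extract(d, T):
    # decision rule: residues in [0, T/2) decode as 1, otherwise 0
    return 1 if (d % T) < T / 2 else 0

T = 8
d = qim_embed(13.7, 1, T)              # lands at 18.0 for T = 8
safe = all(qim_extract(d + e, T) == 1  # perturbations with |e| < T/4 never flip the bit
           for e in np.linspace(-1.9, 1.9, 41))
```

Doubling T doubles the tolerated perturbation, but also doubles the coefficient adjustment at embedding time, which is exactly the imperceptibility cost observed in Figure 7.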
4.1. Imperceptibility Analysis
In digital watermarking, good imperceptibility requires that the watermarked image be visually indistinguishable from the original cover image. To evaluate the imperceptibility of the proposed algorithm, we conducted an analysis using the images shown in Figure 6 as the cover and watermark images. The proposed algorithm is compared with the algorithms of [35,36,37], all of which are based on the Hadamard Transform framework. The experimental results are presented in Table 3. Guided by the widely accepted benchmarks that PSNR > 35 dB indicates good visual quality [44] and SSIM > 0.93 is considered acceptable [45], we analyze the performance as follows. For all tested cover images, the proposed algorithm consistently achieves PSNR values greater than 35 dB and SSIM values above 0.96, indicating stable performance. Although Algorithms [35,36] achieve higher SSIM values in certain cases, their PSNR values sometimes fall below 35 dB, reflecting less stable performance. Considering that SSIM > 0.93 is generally regarded as acceptable, the proposed algorithm provides higher and more stable PSNR performance while maintaining high SSIM values. Overall, in terms of imperceptibility, the proposed algorithm outperforms Algorithms [35,36]. Although Algorithm [37] demonstrates excellent imperceptibility, with both PSNR and SSIM values significantly higher than those of the compared methods, it exhibits a relatively high BER for watermark extraction even in the absence of attacks and shows weak robustness against various attacks. This suggests that Algorithm [37] overemphasizes imperceptibility at the expense of accurate watermark recovery.
In addition to ensuring imperceptibility, one of the core objectives of a watermarking algorithm is the complete extraction of the watermark information. Table 4 compares the NC and BER results of the proposed method and Algorithms [35,36,37]. For all tested images, the proposed method yields NC = 1 and BER = 0, meaning the extracted watermarks are identical to the original and recovery is perfect. In contrast, Algorithms [35,36,37] fail to achieve complete extraction for certain watermarked images.
4.2. Robustness Analysis
Digital images transmitted over networks are vulnerable to various processing operations and malicious attacks. Therefore, when designing a robust watermarking scheme, it is essential to ensure that the watermark remains both extractable and recognizable after attacks. Consequently, robustness against attacks has become one of the core criteria for evaluating the effectiveness of watermarking algorithms.
A series of robustness experiments are conducted using the representative images in Figure 6 as the cover images and watermarks. Table 5 summarizes the parameters for each attack type and presents the NC values obtained by comparing the original watermark with the extracted watermarks. To evaluate robustness quantitatively, we adopt the following criteria: an NC value greater than 0.90 indicates high-fidelity recovery, while an NC value between 0.70 and 0.90 implies that the watermark content is clearly recognizable and acceptable. The results demonstrate that the proposed algorithm successfully extracts recognizable watermarks under a wide range of attacks.
4.3. Robustness Comparison
This section presents robustness comparison experiments using the cover and watermark images in Figure 6, comparing the proposed method with the state-of-the-art algorithms [35,36,37].
4.3.1. Robustness Against Common Image Processing Attacks
Noise attacks are among the most common types of image distortions, with Gaussian noise and Speckle noise being two representative forms.
Figure 8 and Figure 9 illustrate the NC values of the extracted watermarks under Gaussian and Speckle noise for the different watermarking algorithms. The experimental results demonstrate that, under both types of noise attack, the proposed method consistently achieves the highest NC values, indicating superior robustness compared with the benchmark methods.
JPEG2000 is a widely adopted image compression standard.
Figure 10 shows the NC values of the extracted watermarks under JPEG2000 compression for the different watermarking algorithms. The results show that the proposed method yields slightly lower NC values than Algorithms [35,36], but higher NC values than Algorithm [37]. Importantly, the NC values of the proposed method remain above 0.9 across all tested compression parameters, indicating that the extracted watermarks are still clearly recognizable. This behavior arises because JPEG2000 compression shifts pixel differences into the high-frequency components and discards part of this information through coarse quantization. Since the proposed method embeds the watermark into WHT coefficient pairs with small differences, it is inherently more sensitive to such pixel variations, which explains why its robustness advantage is less pronounced under JPEG2000 compression.
Filtering represents another major category of image attacks, among which Gaussian low-pass filtering is widely adopted. Figure 11 compares the NC values of the extracted watermarks for the different algorithms under Gaussian low-pass filtering. The results show a clear performance advantage for the proposed method, which achieves higher NC values than the competing methods, verifying its effectiveness against Gaussian low-pass filtering attacks.
4.3.2. Robustness Against Geometric Attacks
Cropping, Scaling, and Rotation are typical geometric attacks. Figure 12 illustrates the NC values of the extracted watermarks for the different algorithms under cropping attacks; the proposed method achieves significantly higher NC values than the other methods. Figure 13 presents the NC values under scaling attacks, where the proposed method again consistently achieves the highest NC values. In summary, the proposed algorithm demonstrates strong robustness against both Cropping and Scaling attacks.
For the two smallest tested rotation angles, the proposed method achieves higher NC values than Algorithm [35] but slightly lower values than Algorithms [36,37]. However, since all algorithms yield NC values above 0.9 at these angles and the extracted watermarks remain clearly recognizable, the differences between the methods are insignificant. At the next tested angle, the proposed method outperforms all compared methods in terms of NC values. When the rotation angle increases further, the proposed method achieves higher NC values than Algorithms [35,37], though slightly lower than Algorithm [36] (Figure 14). Overall, the proposed method can effectively extract watermarks subjected to various rotation angles.
4.3.3. Robustness Against Image Enhancement Attacks
Brightening, Darkening, and Sharpening are typical image enhancement attacks. Figure 15, Figure 16 and Figure 17 illustrate the watermark extraction results of the different algorithms under these three attacks. The experimental results demonstrate that, across various attack parameters, the proposed method consistently achieves higher NC values than all competing methods, indicating superior robustness against these three types of image enhancement attack.
4.3.4. Robustness Comparison Under Aligned Imperceptibility
It is well established in watermarking research that imperceptibility and robustness are conflicting attributes, so a strictly fair comparison requires normalizing visual quality before evaluating robustness. However, the baseline methods [35,36,37] exhibit widely varying PSNR levels (ranging from ≈33 dB to ≈52 dB) due to their fixed parameter settings, making direct comparison difficult.
To address this, we dynamically adjusted the quantization step size T of the proposed algorithm to strictly align its PSNR with that of each baseline method. The experimental results are presented in Table 6.
These results conclusively demonstrate that the proposed algorithm optimizes the trade-off between imperceptibility and robustness more effectively than the state-of-the-art methods. The proposed scheme consistently yields higher watermark extraction accuracy under the same visual quality constraints.
4.4. Embedding Capacity Analysis
In this section, the embedding capacity of the proposed algorithm is compared with the state-of-the-art image watermarking algorithms [35,36,37]. With the exception of [35], all algorithms achieve a maximum embedding capacity exceeding 0.25 bits per pixel. During watermark embedding, [35,36,37] and the proposed method all partition the cover image into 4 × 4 blocks; however, ref. [35] embeds 2 bits per block, whereas [36,37] and the proposed method embed 4 bits per block. Since each 4 × 4 block contains 16 pixels, embedding 4 bits per block corresponds to 4/16 = 0.25 bits per pixel per channel, whereas 2 bits per block yields only 0.125 bits per pixel per channel. Table 7 presents the maximum embedding capacity of the different watermarking algorithms.
4.5. Real-Time Test
To evaluate the computational efficiency of the proposed scheme, the execution time for watermark embedding and extraction was measured. The experiments were conducted on a computer equipped with an Intel Core i7-9750H CPU (Intel Corporation, Santa Clara, CA, USA), 16 GB RAM, and MATLAB R2024a. The average execution time was calculated over 1000 independent runs on the test cover images and watermarks. Table 8 compares the average execution times of the proposed scheme with those of methods [35,36,37].
It can be observed that the proposed method requires slightly more time compared to the referenced algorithms. This marginal increase in computational cost is primarily attributed to the Logistic chaotic encryption and the entropy calculation and sorting. However, considering the significant improvements in robustness and security demonstrated in previous sections, this trade-off is well-justified. Furthermore, the total execution time remains within the order of seconds, ensuring the algorithm’s feasibility for practical applications.