Blind Watermarking for Hiding Color Images in Color Images with Super-Resolution Enhancement

This paper presents a novel approach for directly hiding the pixel values of a small color watermark in a carrier color image. Watermark embedding is achieved by modulating the gap of paired coefficient magnitudes in the discrete cosine transform domain according to the intended pixel value, and watermark extraction is the process of regaining and regulating the gap distance back to the intensity value. In a comparison study of robustness against commonly encountered attacks, the proposed scheme outperformed seven watermarking schemes in terms of zero-normalized cross-correlation (ZNCC). To render a better visual rendition of the recovered color watermark, a generative adversarial network (GAN) was introduced to perform image denoising and super-resolution reconstruction. Except for JPEG compression attacks, the proposed scheme generally resulted in ZNCCs higher than 0.65. The employed GAN contributed to a noticeable improvement in perceptual quality, which is also manifested as high-level ZNCCs of no less than 0.78.


Introduction
Nowadays, multimedia data are stored in a digital form suitable for access and distribution on the Internet. Among the available types of multimedia (e.g., text, images, audio, and video), color images are the most frequently used data on openly accessible network platforms [1,2]. While an extensive amount of image data is continuously transmitted over networks, illicit usage of color images has resulted in problems such as unauthorized or unlicensed copying, intellectual property infringements, and malicious tampering. The issue of copyright violation has drawn considerable attention from researchers and content providers who continuously seek feasible countermeasures to protect their ownership and copyright material. To this end, digital watermarking is an effective means of securing data content [3][4][5]. It plays a pivotal role in information security and has broad application prospects.
Image watermarking is the process of embedding proprietary information into carrier images via the use of specific data processing algorithms. The embedded information (or simply a watermark) is supposedly imperceptible and sufficiently robust to withstand unintentional modifications and/or intentional attacks. In addition, the watermark information should be retrievable and easy to verify whether the inspected image has tampering or infringement problems. Imperceptibility, robustness, and capacity are three mutually conflicting properties of image watermarking [6]. The balance between these often relies on the adjustment of the embedding strength of the watermark. Given that the payload capacity is fixed, increasing the embedding strength may improve the robustness but impair the imperceptibility. To date, it has remained a challenge to find an ideal watermarking scheme that can comprehensively address the three requirements.
Image watermarking can be divided into blind and non-blind categories. Those belonging to the non-blind category require the original image and/or additional information for watermark extraction, whereas blind ones require neither. As the original image is not always accessible during the stage of watermark extraction, the blind approach is most prevalent in practical applications. Image watermarking schemes can also be classified based on the domain chosen to hide the watermark. Transform-domain approaches are widely used as they can take advantage of an image's intrinsic characteristics and human visual perception. Commonly observed transformations for image watermarking comprise the discrete Fourier transform (DFT) [7][8][9], discrete wavelet transform (DWT) [10][11][12][13], discrete cosine transform (DCT) [14][15][16][17][18], and various matrix decompositions (MD) [19][20][21][22][23][24]. Combinations of the abovementioned transforms were also attempted with encouraging results [25][26][27][28][29].
Although watermarks can be any form of digital data conveying ownership information, the most frequently found watermarks in the literature were image logos. This can be attributed to the fact that the performance of an image watermarking scheme can be easily identified through visual inspection of the recovered watermarks. Our literature survey indicates that there has been no shortage of high-capacity color watermarking schemes proposed in recent years. The following are some examples. In [9], Zhang et al. exploited the unique features of the direct current (DC) component of the DFT. Each binary bit was determined by manipulating the inequality relationship between the DC components of two adjacent image blocks of size 2 × 2. In the original literature, the authors hid two 24-bit color images of 32 × 32 pixels in a carrier color image of size 512 × 512, but it is possible to embed 64 × 64 × 3 × 8 bits into the host image if every 2 × 2 block is used to perform watermarking.
After taking the Haar transform on each image block, Liu et al. [30] chose the coefficient with the largest energy to perform quantization. Given that one bit was hidden in every two 2 × 2 blocks, the maximum allowable capacity was the exact number of 64 × 64 × 3 × 8 bits. As a result, the proposed watermarking scheme achieved a payload capacity of 3/8 (= 64 × 64 × 24/512 × 512) bit per pixel (bpp).
The concept of embedding watermark bits into the transformed coefficients derivable from another domain was also found in [27,31]. In [27], Su et al. designed a combineddomain watermarking scheme that nominally modulated the first-level approximation coefficients of the DWT. An essential feature of this scheme lies in the acquisition of the targeted DWT coefficient in the spatial domain. Such a design exploits not only the computational efficiency of spatial-domain watermarking schemes but also the strong robustness of transform-domain schemes. Yuan et al. adopted a similar strategy [31], which explores the DC component of a 2D DCT in the spatial domain to achieve largecapacity watermarking. Here, a binary bit is characterized by the relativity of the DC coefficients between adjacent blocks. For these two aforementioned watermarking schemes, the resultant payload capacity can reach as high as 3/8 bpp as long as the transformed coefficients are derived from block matrixes of 2 × 2 pixels.
Following the division of the carrier image into block matrices, Chen et al. [32] presented an efficient blind watermarking algorithm using the Walsh-Hadamard transform (WHT). A binary bit can be embedded into a WHT matrix block of size 4 × 4 by fine-tuning the paired coefficients in the first row of the transformed matrix. Regarding the class of MDbased image watermarking, Hsu et al. [23] proposed a high-capacity QR decomposition (QRD)-based image watermarking algorithm that manipulates two pairs of elements drawn from the first column of the orthogonal matrix after applying the QRD to a block of size 4 × 4 pixels. Meanwhile, Liu et al. [24] proposed embedding watermark information into the maximum eigenvalue of the Schur decomposition of selected matrices using quaternary coding. This scheme claimed to be efficient because multiple bits were accommodated in the quaternary code. As these forgoing three approaches proceeded from block matrixes of size 4 × 4, a high capacity at a level of 3/8 bpp is feasible if each block can contain 2 bits. Although the aforementioned high-capacity blind color image watermarking schemes have achieved certain degrees of success, their robustness against severe attacks could not yet match acceptable expectations. Hence, this study aims to develop a scheme capable of performing efficient high-capacity blind color image watermarking, along with the use of a generative adversarial network (GAN) to enhance the visual quality of extracted watermarks. Consequently, the contributions in this study include two aspects: (1) the development of a novel blind watermarking scheme that enables a direct and effective embedding of 8-bit unsigned pixel values, which facilitates the goal of hiding color images in color images; (2) the incorporation of a GAN to enhance the watermark images retrieved from the watermarked carrier images, thus making the watermarks more visually recognizable.
The remainder of this paper is organized as follows. Section 2 outlines the technical background pertaining to the fundamentals of DCT and super-resolution (SR). Section 3 elucidates the procedures for watermark embedding and extraction in the DCT domain. Section 4 presents experimental results. Improvements owing to the incorporation of the GAN are also reported. Finally, conclusions are presented in Section 5.

DCT-Based Watermarking
Among the transformations used in digital image processing, the DCT has drawn considerable attention as it can efficiently compact the signal energy at low frequencies [33]. This property makes the DCT highly suitable for applications in image compression and watermarking. The two-dimensional (2D) DCT performed on an image matrix acts the same as a one-dimensional (1D) DCT performed along a single dimension, followed by another 1D DCT in the other dimension. The 2D DCT for an image block matrix X = {X mn } M×N and output DCT matrix Y = Y pq M×N is defined as where X mn denotes the (m, n) th pixel value of the image block, Y pq represents the frequency component at position (p, q), and M and N correspond to the row and column size of X, respectively. After the 2D DCT, the output matrix Y containing direct-current (DC) and alternating-current (AC) components can be obtained. The DC coefficient situated at the top-left corner of the resulting DCT matrix signifies the average intensity of the image block. Depending on the distance from the DC component, the remaining coefficients in Y can be classified as low-, middle-, and high-frequency ranges. DCT-based image watermarking is referred to as the modification of DCT coefficients according to specific rules imposed by watermarking. Once the watermarking process is complete, the application of the inverse DCT (IDCT) can recover the image block with concealed watermark information. The formula for the 2D IDCT is expressed as follows:

Super-Resolution Reconstruction
SR is a technique used to reproduce images with higher resolution. This technique can be employed as a post-process to enhance the watermark image. In this study, we only consider single-image SR (SISR) that attempts to reconstruct high resolution from a single low-resolution watermark image. SISR is challenging because high-frequency details are already lost in the low-resolution image. Owing to the advancement of deep learning, reestablishing the complex mapping between low-and high-resolution images is possible using very deep neural networks.
As demonstrated in [34][35][36][37], several methods developed based on convolutional networks have shown promising results. Ledig et al. [38] not only tackled the SISR problem using a deep residual network (ResNet) but also introduced a GAN framework to acquire photo-realistic natural images. Figure 1 illustrates the GAN structure for carrying out SISR. The generator is a 16-block deep ResNet, where the function is denoted by G θ G (·), capable of recovering a high-resolution image I * HR ( G θ G (I LR )) from a low-resolution image I LR . The discriminator, which is expressed as D θ D (·), renders the probability whether the input is from either the reference or generator distributions. Subsequently, the GAN-based network is optimized to solve an adversarial min-max problem as follows: where θ G and θ D correspond to the model parameters associated with the generator and discriminator, respectively. Further, E[·] denotes the expectation operator based on a probability distribution, and P(·) is a probability density function. The basic idea behind the above formulation is to train the generative model G θ G (·) to fool the discriminator D θ D (·), which is supposedly trained to distinguish between model-generated and real images. It was reported in [38] that, with the GAN-based framework, using perceptual loss in the discriminator model contributed to significant improvements in perceptual quality.
As demonstrated in [34][35][36][37], several methods developed based on convo works have shown promising results. Ledig et al. [38] not only tackled the S using a deep residual network (ResNet) but also introduced a GAN framewo photo-realistic natural images. Figure 1 illustrates the GAN structure for SISR.    Figure 2 depicts the entire process of watermark embedding and extraction. The watermarks used in this study were downsized color images of 64 × 64 pixels. To ensure information security, each watermark image was scrambled using the Arnold transform [39] before embedding. When extracting the watermark, one needs a correct key to descramble the watermark to restore the actual content.  Figure 2 depicts the entire process of watermark embedding and extraction. The watermarks used in this study were downsized color images of 64 64 × pixels. To ensure information security, each watermark image was scrambled using the Arnold transform [39] before embedding. When extracting the watermark, one needs a correct key to descramble the watermark to restore the actual content.

Watermark Embedding via the DCT-MGA Scheme
The proposed DCT-based magnitude gap adjustment (DCT-MGA) scheme starts by partitioning the carrier image into non-overlapping blocks with a size of 8 8 × . The next step is to take the 2D DCT of each block in the R , G , and B channels individually. After obtaining the required DCT coefficients, watermark embedding is a joint operation of two involved coefficients such that the gap between the two coefficient magnitudes matches the desired length. Let , such that the gap between them can reflect the intended pixel value. For this purpose, we first define the following parameters: ( , , ) 127.5 255 where low m and high m are the lower and higher levels of the two magnitudes, respectively, and ˆ( , , ) w i j k denotes the ( , ) th i j pixel value of the scrambled watermark in the

Watermark Embedding via the DCT-MGA Scheme
The proposed DCT-based magnitude gap adjustment (DCT-MGA) scheme starts by partitioning the carrier image into non-overlapping blocks with a size of 8 × 8. The next step is to take the 2D DCT of each block in the R, G, and B channels individually. After obtaining the required DCT coefficients, watermark embedding is a joint operation of two involved coefficients such that the gap between the two coefficient magnitudes matches the desired length.
Let Y p 1 ,q 1 ,r 1 and Y p 2 ,q 2 ,r 2 denote the two DCT coefficients that participate in the embedding process. Further, (p 1 , q 1 ) and (p 2 , q 2 ) represent the coordinates of the paired coefficients, while r 1 and r 2 signify the channel sources for each respective coefficient. For example, (p 1 , q 1 , r 1 ) =(3, 0, 1) and (p 2 , q 2 , r 2 ) =(3, 1, 1) were selected to embed the red color pixel values. Watermark embedding involves the manipulation of two coefficient magnitudes, namely, Y p 1 ,q 1 ,r 1 and Y p 2 ,q 2 ,r 2 , such that the gap between them can reflect the intended pixel value. For this purpose, we first define the following parameters: where m low and m high are the lower and higher levels of the two magnitudes, respectively, andŵ(i, j, k) denotes the (i, j) th pixel value of the scrambled watermark in the k th channel. A value of 255 in the denominator of Equation (6) is the maximum allowable level of an 8-bit unsigned integer; thus, the midline of the dynamic range becomes 127.5. Further, α signifies the embedding strength. Consequently, the gap (symbolized as ρ) reflects a scaled quantity deviating from the midline, with a distance proportional toŵ(i, j, k). After determining the gap ρ, the two magnitudes are adjusted accordingly.
Equation (7) adjusts m low in the least-squares sense, but the resulting outputm low must be positive to conform with the definition of magnitude. Equation (8) confines the gap betweenm high andm low to exactly ρ. After determiningm high andm low , we reassign the two involved coefficients as: where sgn(·) denotes a sign function. This embedding process is repeated for every block and channel involved. After replacing the modified coefficients in the DCT block matrix, we used the IDCT to obtain the watermarked image block.
In this study, we deliberately chose three pairs of adjacent DCT coefficients to embed three pixel values in 8-bit unsigned integer format. While watermarks hidden at high frequencies are vulnerable to high-frequency attacks (e.g., low-pass filtering and image compression), watermarks at low frequencies cannot resist attacks such as unsharp filtering and histogram equalization. A compromise can be reached by selecting DCT coefficients situated in the middle frequencies. Figure 3 illustrates the locations of the three coefficient pairs selected to embed the three pixel values in this study. Specifically, the three designated where the first two indices in each coordinate representation indicate the location of the 2D DCT matrix. The third index signifies the color channels with 1, 2, and 3 corresponding to the red, green, and blue colors, respectively. The motives for these pair arrangements are twofold. First, the watermarking process generally requires an increase in one magnitude but a decrease in the other for each pair. As the energy of each color component remains roughly balanced, modifications of a pair of DCT coefficients in adjacent locations with the same color are expected to yield less variation in color intensity. Second, because textual patterns of adjacent DCT coefficients retain partial similarity, the image reconstructed after watermarking can show more structural similarities.

Watermark Extraction and Regulation
Watermark extraction is the process of retrieving a watermark from a watermarked color image. The right half of Figure 2 shows the extraction procedure. The first two steps involve partitioning the watermarked color image into non-overlapping blocks of size 8 8 × and then applying 2D DCT to these blocks. For each set of { } pqr Y  derived from an image block, we select three pairs of DCT coefficients (as shown in Figure 3) and then compute the pixel value using the following formula:

Watermark Extraction and Regulation
Watermark extraction is the process of retrieving a watermark from a watermarked color image. The right half of Figure 2 shows the extraction procedure. The first two steps involve partitioning the watermarked color image into non-overlapping blocks of size 8 × 8 and then applying 2D DCT to these blocks. For each set of Y pqr derived from an image block, we select three pairs of DCT coefficients (as shown in Figure 3) and then compute the pixel value using the following formula: where w(i, j, k) denotes the retrieved (i, j) th pixel value in the k th channel and the tilde implies that the watermark image may have suffered from attacks. In our experiments, it was observed that, under certain attacks, the extracted watermarks might appear brighter or darker than the original watermarks. Interestingly, watermarks with altered luminance are still visually recognizable, suggesting that the attacks have merely caused the pixel values to be rescaled. Restoring the retrieved pixel values back to the normal range is expected to provide a better view of the watermark. Consequently, a simple remedial algorithm is introduced below to rectify the recovered watermark.
According to Equation (10), the output is obtained by adding a deviation term (i.e., d 255 α Y p 1 ,q 1 ,r 1 − Y p 2 ,q 2 ,r 2 ) to the midline value 127.5. Because w(i, j, k) can only fall between 0 and 255, the absolute value of the deviation term (i.e., |d|) must not exceed 127.5. Luminance regulation begins with the sorting of |d|'s gathered from every channel. Then, the average in the range 90-95% of the sorted |d|'s, termed η, serves as a gauge to mark the upper margin. Depending on the resulting η, the deviation term is proportionally restrained to a range between ψ low and ψ up as follows: where w reg (i, j, k) denotes the regulated outcome from the initially retrieved pixel value w(i, j, k). Terms ψ up and ψ low respectively represent the upper and lower boundaries of the expected η. These two boundaries can be deduced from the dataset of possible watermarks. In this manner, the regulator restrains the luminance of the color watermark to a normal value. Note that the scale adjustment (as given in Equation (11)) has no influence on the textural content; however, it makes the watermark image more visually distinguishable. After gathering all pixels w reg (i, j, k) , the last step of watermark extraction is to use the decryption key to discern the watermark image.

Performance Evaluation
The test materials comprise 16 color images of 512 × 512 pixels collected from the USC-SIPI [40] and CVG-UGR [41] image databases. Figure 4 shows the 16 color images in a 4 × 4 array. The watermarks were obtained by downsampling the 16 color images to a size of 64 × 64 pixels. To accommodate such a watermark in a normal-size carrier image, each 8 × 8-sized image block must contain three 8-bit unsigned integer values. The payload capacity was thereby 3/8 (= (64 × 64 × 3 × 8)/(512 × 512)) bpp. Consequently, there are 256 possible combinations between the carrier images and intended watermarks. To enhance the security of the watermark information, each color channel of the color watermark image was scrambled before embedding using the Arnold transform [39]. To this end, the retrieved watermark must undergo a descrambling process after watermark extraction. Figure 5 presents the test watermarks with their scrambled versions examined in our experiments.
To render a balanced performance between robustness and imperceptibility, this study set the embedding strength α (as defined in Equation (6)) at 90 for the proposed DCT-MGA. The two boundaries for luminance regulation were chosen as ψ up = 83 and ψ low = 120, which correspond to the mean value plus/minus the standard deviation of the η's derived from all the images in the IAPR TC-12 database [42]. In addition to the proposed DCT-MGA, seven recently published blind color image watermarking schemes were considered for performance evaluation and comparison. For simplicity, the seven schemes are abbreviated hereinafter as QRMM22 [23], WHT21 [32], Haar21 [30], Schur21 [24], DFT20 [9], DCT20 [31], and DWT20 [27], with the last two digits signifying the publication year.
are 256 possible combinations between the carrier images and intended watermarks. To enhance the security of the watermark information, each color channel of the color watermark image was scrambled before embedding using the Arnold transform [39]. To this end, the retrieved watermark must undergo a descrambling process after watermark extraction. Figure 5 presents the test watermarks with their scrambled versions examined in our experiments.  To render a balanced performance between robustness and imperceptibility, this study set the embedding strength α (as defined in Equation (6) , which correspond to the mean value plus/minus the standard deviation of the η 's derived from all the images in the IAPR TC-12 database [42]. In addition to the pro-

Imperceptibility Test
When determining the embedding strength, trade-offs exist amid imperceptibility, robustness, and capacity. In general, a weak embedding strength is conducive to imperceptibility but detrimental to robustness. Raising the embedding strength can reinforce the watermark hidden in the carrier image but may also impair the visual quality of the watermarked image. To probe the influence of embedding strength, we conducted a pilot experiment to determine an adequate embedding strength that achieves balance in imperceptibility and robustness.
To assess the distortion of the carrier image due to the watermarking process, we adopted the peak signal-to-noise ratio (PSNR) and mean structural similarity (mSSIM) metrics given below to assess the degradation of the image quality.
where I = {I m,n,t } N row ×N col ×3 andÎ = Î m,n,t N row ×N col ×3 , respectively, represent the original and watermarked images of M row × N col pixels with a depth of three color channels.
B l,k,t andB l,k,t in Equation (13) correspond to the (l, k) th windows in the t th channel acquired from I andÎ, respectively. The denominator value L × K × 3 indicates the number of image blocks. The function SSIM(·) is responsible for measuring the degree of similarity between B r,s,t andB r,s,t .
SSIM B l,k,t ,B l,k,t = 2µ B l,k,t µB l,k,t where µ B l,k,t and σ 2 B l,k,t , respectively, represent the mean and variance of B l,k,t ; σ B l,k,tBl,k,t denotes the covariance between B l,k,t andB l,k,t ; k 1 and k 2 are values introduced to ensure the stability of SSIM. By default, these values are k 1 = 0.01 and k 2 = 0.02. Figure 6 depicts the average PSNRs and mSSIMs for α in the range 60-100 with increments of 2. Based on the experimental results shown in Figure 6, we chose α as 90 as this choice led to a PSNR near 38.5 dB and an mSSIM of approximately 0.96. In general, watermarked images with this PSNR and mSSIM are deemed to possess satisfactory quality.
Note that the seven watermarking schemes under comparison were intended to hide binary information, but not all of them were initially designed to attain a payload capacity as high as 3/8 bpp. If such a high capacity is under request, the original embedding strength specified in the literature may lead to unacceptable PSNR and mSSIM outcomes. Consequently, while implementing the compared schemes, we had to adjust the parameters relevant to the embedding strength such that the PSNRs and mSSIMs fell within an acceptable range.  Figure 6 depicts the average PSNRs and mSSIMs for α in the range 60-100 with increments of 2. Based on the experimental results shown in Figure 6, we chose α as 90 as this choice led to a PSNR near 38.5 dB and an mSSIM of approximately 0.96. In general, watermarked images with this PSNR and mSSIM are deemed to possess satisfactory quality. Note that the seven watermarking schemes under comparison were intended to hide binary information, but not all of them were initially designed to attain a payload capacity as high as 3 / 8 bpp. If such a high capacity is under request, the original embedding strength specified in the literature may lead to unacceptable PSNR and mSSIM outcomes. Consequently, while implementing the compared schemes, we had to adjust the parameters relevant to the embedding strength such that the PSNRs and mSSIMs fell within an acceptable range. Table 1 presents the statistical results measured for every possible combination of test images and watermarks. With the use of the specified embedding strength, the average PSNR and mSSIM obtained from the proposed DCT-MGA were 38.48 dB and 0.959, respectively, both of which ranked the highest among all compared schemes. The PSNRs  Table 1 presents the statistical results measured for every possible combination of test images and watermarks. With the use of the specified embedding strength, the average PSNR and mSSIM obtained from the proposed DCT-MGA were 38.48 dB and 0.959, respectively, both of which ranked the highest among all compared schemes. The PSNRs acquired by DCT20 and DFT20 fell below 33 dB, suggesting that these two schemes may cause excessive pixel modification. Ironically, the mSSIMs resulting from these two schemes were not the worst. This can be attributed to the fact that watermark embedding merely affects the DC component, which causes less damage to the textural structure of the carrier image. The worst mSSIM value generally occurred in the case of DWT20. Despite the average PSNR obtained from DWT20 remained above 36 dB, the resulting mSSIM unexpectedly dropped to 0.925. This is thought to be due to the watermark being embedded in a relatively wide frequency sub-band.

Robustness Test
The second phase of the performance evaluation focused on the robustness of the compared watermarking schemes in the presence of commonly encountered attacks. The types of attacks considered in this study included image compression, noise corruption, filtering, histogram equalization, geometric correction, cropping, and luminance adjustment. Table 2 lists the specifications for the intended attacks. As discussed in Section 3, the proposed DCT-MGA directly embeds 8-bit unsigned pixel values into the specific DCT coefficients. By contrast, all other compared watermarking schemes can only embed binary bits. Because the embedding data type of the proposed DCT-MGA differs from the others that perform binary watermarking, such a difference imposes difficulties in performance comparison. One feasible assessment metric applicable to either type of watermarking is zero-normalized cross-correlation (ZNCC), which characterizes the similarity of two data sequences without concerning the data type. For a watermark constituted by pixel values, ZNCC is defined as (15) where w v denotes the mean of the pixel values obtained from a watermark image W v = {w v } of size n L × n K (=64 × 64) with n ch (=3) channels, and the tilde signifies retrieval after a possible attack.
Alternatively, the ZNCC for the watermark composed of binary bits is (16) where w b denotes the mean of the binary values obtained from a watermark image of n L × n K × n ch × n b bits. Here, n L = n K = 64, n ch = 3 and n b = 8. As presented in Table 3, the proposed DCT-MGA rendered satisfactory ZNCCs that were normally larger than 0.65, except for JPEG compression. The lowest ZNCC occurred in Case A.2, where the quality factor was set to 40. Nonetheless, the seemingly disappointing score of 0.315 was still the highest among the eight compared schemes. In addition, except for cases D (salt-and-pepper noise corruption) and J (histogram equalization), the DCT-MGA generally obtained the highest rank. Other schemes may perform well, but they cannot handle all the types of attacks considered in this study. In Table 4, two typical watermarks, retrieved from the watermarked "Lena" color image for all the different schemes in the presence of various attacks, are presented as extracted watermark examples. A simple visual inspection of these watermarks can easily demonstrate the superiority of the proposed DCT-MGA.

Watermark Enhancement
Because the recovered watermark contains only 64 × 64 pixels, such a size makes it blurry when zoomed in. A possible method to improve the quality of the image watermark is to enhance the clarity and resolution using deep learning techniques. In this study, we adopted the SR-GAN framework developed in [38] to perform image denoising and SR reconstruction after recovering watermarks from the watermarked images. The employed generator utilizes a residual network (ResNet) capable of converting low-resolution images (64 × 64 in size) to high-resolution images (of size 256 × 256). SR-ResNet can operate independently to improve the visual quality of the watermark; however, the output images are often overly smoothed [38,43]. With the exploitation of the GAN, the generator is trained to fool the discriminator rather than to minimize the difference between the desired and generated image outputs. It was reported in [38] that, with the adoption of a perceptual loss function, the SR-GAN is empowered to recover photo-realistic textures from downsampled images.
The watermarks used for training the GAN comprise 1000 color images taken from the IAPR TC-12 database [42]. The four images (i.e., "Lena," "Baboon," "Peppers," and "F16") shown on the top row of Figure 4 served as the carrier images in a simulation of watermark extraction under attacks. Prior to network training, we adopted a preprocessing method similar to image augmentation to expand the number of image samples. In addition to the ideal case without any attack, we applied every attack listed in Table 2 to the four watermarked images and recovered the watermarks using Equation (10). The descrambled watermark (of size 64 × 64) and its original high-resolution image (of size 256 × 256) form an input-output sample pair. Overall, 76,000 (4 × 19 × 1000) watermark samples were used in the GAN training.
The improvement due to the SR-GAN is demonstrated in Figure 7, which presents an exemplary watermark retrieved from the watermarked "Lena" image under various attacks. Each subplot in Figure 7 comprises three parts: the left one shows the retrieved watermark on the upper-left corner and its up-sampled version; the middle one presents the high-resolution image resulting from the SR-ResNet trained based on the mean squared error (MSE) between the recovered high-resolution image and the ground truth; the right one corresponds to the so-called "photo-realistic" images rendered by the SR-GAN. As evidenced by this demonstration, SR-ResNet inherits the merits of convolutional neural networks, which not only enhance the resolution but also suppress the noise caused by attacks. Nonetheless, the use of MSE as a loss function in the SR-ResNet tends to yield overly smooth textures. With the involvement of the GAN, the generator network can render perceptually satisfying images that match the expected resolution.
As also revealed in Figure 7, the effect due to JPEG compression deserves special attention. It can be found in Subplots A.1 and A.2 that the extracted watermarks were not only noisier but also had chromatic aberrations in the associated colors. Such a phenomenon can be attributed to the poor preservation of the chroma information in the high-frequency region. Thanks to the SR-ResNet and SR-GAN, the problems with the noise, chroma, and limited resolution could be substantially ameliorated.
To determine whether SR-ResNet and SR-GAN are conducive to the recognition of the extracted watermark, we computed the ZNCCs between the recovered and real watermarks in the presence of all the attacks considered in this study. Figure 8 presents the results as a bar chart, wherein each attack category comprises four bins, denoted as "w64," "BIw256," "SR-ResNetw256," and "SR-GANw256." Specifically, "w64" indicates the results derived from watermarks of size 64 × 64, while "BIw256" corresponds to the results from the bicubically interpolated watermarks of size 256 × 256. Meanwhile, "SR-ResNetw256" and "SR-GANw256" represent those obtained from the SR-ResNet and SR-GAN, respectively. Several conclusions can be drawn from Figure 8. First, the interpolated watermarks did not improve the ZNCCs. Second, SR-ResNet substantially enhanced the ZNCC when the watermarks were severely damaged by malicious attacks. This can be attributed to the denoising capability of the ResNet. Third, the ZNCCs in the case of "SR-GANw256" appeared somewhat less than those observed in "SR-ResNetw256." Such a result is not difficult to understand as the SR-GAN tends to rebuild high-frequency details that are different from the original content based on what has been learned in the training stage. Thus, the added details tend to be detrimental to the similarity measure in the pixel space.  Table 2.
To determine whether SR-ResNet and SR-GAN are conducive to the recognition of the extracted watermark, we computed the ZNCCs between the recovered and real watermarks in the presence of all the attacks considered in this study. Figure 8 presents the results as a bar chart, wherein each attack category comprises four bins, denoted as "w64," Figure 7. Super-resolved watermarks retrieved from the watermarked "Lena" image under various attacks. The three images from left to right in each cell correspond to the results of "w64/BIw256," "SR-ResNetw256," and "SR-GANw256," respectively. The label underneath each subfigure corresponds to the attack type listed in Table 2.
It is noted that the GAN merely plays a supporting role to denoise and super-resolve the small-sized watermark extracted from a watermarked color image, presumably after an attack. The purpose of the GAN is to make the recovered watermark more visually recognizable. It is certainly not a problem incorporating the GAN with other watermarking schemes. Although the experimental data were not presented or visualized in detail, our experiences with the other schemes followed a similar trend observed in Figures 7 and 8. Specifically, the ResNet generally made a considerable contribution when the original ZNCC was low; the involvement of the GAN conduced to image clarity perceived by the naked eye but did not necessarily improve the ZNCC. Providing the extracted watermarks were more accurate and precise, the subsequent ResNet and GAN could have the opportunity to enhance the images with the desired resolution and fidelity. It appears that the development of a competent GAN without compromising the ZNCC is also an imperative concern in watermark enhancement. GAN, respectively. Several conclusions can be drawn from Figure 8. First, the interpolated watermarks did not improve the ZNCCs. Second, SR-ResNet substantially enhanced the ZNCC when the watermarks were severely damaged by malicious attacks. This can be attributed to the denoising capability of the ResNet. Third, the ZNCCs in the case of "SR-GANw256" appeared somewhat less than those observed in "SR-ResNetw256." Such a result is not difficult to understand as the SR-GAN tends to rebuild high-frequency details that are different from the original content based on what has been learned in the training stage. Thus, the added details tend to be detrimental to the similarity measure in the pixel space. It is noted that the GAN merely plays a supporting role to denoise and super-resolve the small-sized watermark extracted from a watermarked color image, presumably after an attack. The purpose of the GAN is to make the recovered watermark more visually recognizable. It is certainly not a problem incorporating the GAN with other watermarking schemes. Although the experimental data were not presented or visualized in detail, our experiences with the other schemes followed a similar trend observed in Figures 7  and 8. Specifically, the ResNet generally made a considerable contribution when the original ZNCC was low; the involvement of the GAN conduced to image clarity perceived by the naked eye but did not necessarily improve the ZNCC. Providing the extracted watermarks were more accurate and precise, the subsequent ResNet and GAN could have the opportunity to enhance the images with the desired resolution and fidelity. It appears that the development of a competent GAN without compromising the ZNCC is also an imperative concern in watermark enhancement.

Conclusions
We introduced a new type of blind image watermarking scheme that directly embeds a color watermark into a carrier color image. Watermark embedding was implemented by manipulating the magnitudes of paired 2D DCT coefficients, whereas the extraction of the embedded watermark was resolved from the relative positions of these two coefficient magnitudes. To offset the stretching effect on the coefficient magnitudes owing to attacks, we also attached a regulator to restrain the pixel values within a legitimate range.
In a comparison of seven recently developed schemes engaged in high-capacity color-image watermarking, the experimental results prove the superiority of the proposed

Conclusions
We introduced a new type of blind image watermarking scheme that directly embeds a color watermark into a carrier color image. Watermark embedding was implemented by manipulating the magnitudes of paired 2D DCT coefficients, whereas the extraction of the embedded watermark was resolved from the relative positions of these two coefficient magnitudes. To offset the stretching effect on the coefficient magnitudes owing to attacks, we also attached a regulator to restrain the pixel values within a legitimate range.
In a comparison of seven recently developed schemes engaged in high-capacity colorimage watermarking, the experimental results prove the superiority of the proposed DCT-MGA scheme in terms of both imperceptibility and robustness. Overall, DCT-MGA makes it possible and reliable to embed a color watermark of size 64 × 64 into a carrier image of size 512 × 512 with satisfactory performance. In addition to DCT-MGA, a GAN was employed to enhance the extracted watermark. As demonstrated by the improvement in ZNCC, the generator network renders a color watermark image with a higher resolution and less noise. Nonetheless, the DCT-MGA scheme still lacks sufficient resistance to withstand JPEG compression and median filtering attacks. Developing a numerically embeddable watermarking scheme that survives JPEG compression and median filtering is a topic worthy of further study.

Conflicts of Interest:
The authors declare no conflict of interest.