Spatial-Perceptual Embedding with Robust Just Noticeable Difference Model for Color Image Watermarking

In the robust image watermarking framework, watermarks are usually embedded in the direct current (DC) coefficients of the discrete cosine transform (DCT) domain, since the DC coefficients have a larger perceptual capacity than any alternating current (AC) coefficient. However, DC coefficients are also often excluded from watermark embedding to avoid block artifacts in watermarked images. Studies on human vision suggest that exploiting perceptual characteristics can achieve better image fidelity. From this perspective, we propose a novel spatial-perceptual embedding algorithm for color image watermarking that is guided by a robust just-noticeable-difference (JND) model. A logarithmic transform function is used for quantization embedding, and an adaptive quantization step is modeled by incorporating partial AC coefficients. The novelty and effectiveness of the proposed framework lie in the JND perceptual guidance applied to spatial pixels. Experiments validate that the proposed watermarking algorithm delivers significantly better performance.


Introduction
Nowadays, with the rapid development of information technology, the storage, replication, and dissemination of digital multimedia have become easier, and multimedia copyright protection has become a problem of great importance. As a branch of information-hiding technology, digital watermarking plays a significant role in a wide range of applications, including content protection, digital rights management (DRM), and media system monitoring [1][2][3]. Digital watermarking protects copyright by hiding watermark information in the original digital content. In recent years, digital watermarking, especially image watermarking, has attracted wide attention from researchers in the field of data hiding [4][5][6].
Digital watermarking technology, especially robust watermarking, has been widely used and developed for more than twenty years [7,8]. According to the processing domain of the original image, existing robust watermarking techniques can be divided into two categories: spatial-domain watermarking and transform-domain watermarking. Spatial-domain watermarking embeds the watermark information by directly modifying the image pixel values. Transform-domain watermarking converts the image into a frequency-domain space and embeds the watermark information by modifying the frequency-domain coefficients. In general, transform-domain image watermarking algorithms first map the image into the new domain by a transform.

JND Modeling
JND, which refers to the minimum visibility threshold of the HVS, is determined by the underlying physiological and psychophysical mechanisms. The existing JND models are divided into two categories according to the domain in which the JND threshold is calculated. One category is the pixel-wise domain, which directly calculates the JND threshold for an image pixel [22,33]. The other category is the subband domain, for example, the DCT domain [21,24].
It is noted that the Contrast Sensitivity Function (CSF), Luminance Adaptation (LA), and Contrast Masking (CM) are important contributing factors for JND in images. Pixel-domain JND models are obtained directly in the pixel domain and consider only the LA and CM effects. As the HVS depends strongly on contrast sensitivity, existing DCT-based JND models estimate contrast sensitivity using the CSF in the DCT domain. Because it incorporates the spatial CSF, the DCT-domain JND model is more consistent with the HVS than a pixel-domain JND model. In addition, the DCT-based JND model is more robust when applied to the proposed watermarking algorithm than the pixel-based JND model, as can be clearly seen in Section 2.1.4. Recently, a sophisticated perceptual DCT-based JND model [24], which builds a regularity-based CM factor, was proposed. The corresponding product form for the k-th block of size 8 × 8 can be expressed as

$$T^{k}_{JND}(x, y) = T^{k}_{base}(x, y) \cdot F^{k}_{LA}(x, y) \cdot F^{k}_{CM}(x, y),$$

where T^k_JND(x, y) is the JND threshold for the k-th block, T^k_base(x, y) is the base threshold based on the spatial CSF for the k-th block, and the modulation factors are the LA factor F^k_LA(x, y) and the CM factor F^k_CM(x, y); x and y are the indices (x, y = 0, 1, 2, ..., 7) within a block.
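As a minimal sketch of this product form, the snippet below combines three factor maps for a single 8 × 8 block. The array values are placeholders; in the paper the real T_base, F_LA, and F_CM come from Equations (2), (9), and (14), respectively.

```python
import numpy as np

# Placeholder factor maps for one 8x8 block; the actual maps come from
# Equations (2), (9) and (14) of the paper.
rng = np.random.default_rng(0)
T_base = rng.uniform(2.0, 10.0, (8, 8))   # baseline CSF threshold
F_LA = rng.uniform(0.8, 1.5, (8, 8))      # luminance adaptation factor
F_CM = rng.uniform(0.5, 2.0, (8, 8))      # contrast masking factor

# Product form: T_JND(x, y) = T_base(x, y) * F_LA(x, y) * F_CM(x, y)
T_JND = T_base * F_LA * F_CM
```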

The Baseline CSF Threshold
In addition to the spatial frequency effect on the CSF value, several further factors, such as the oblique effect and the spatial summation effect, must also be considered [34]. Thus, for the k-th DCT block, the corresponding baseline sensitivity is given by Equation (2), where s is a parameter set to 0.25 in [34], G is the number of gray levels, set to 255, and L_max and L_min are the display luminance values corresponding to the maximum and minimum gray levels.
∂_x and ∂_y are DCT normalization factors, which can be obtained by Equations (3) and (4). A(ω) is the CSF in [35], which can be expressed as

$$A(\omega) = a\,(b + c\,\omega)\,e^{-(c\,\omega)^{d}},$$

where ω (cycles/degree) is the spatial frequency and the empirical constants a, b, c, and d are set to 2.6, 0.0192, 0.114, and 1.1 [24], respectively. The spatial CSF curve is shown in Figure 1. The spatial CSF describes the sensitivity of the HVS to spatial frequency; the pixel-based JND cannot incorporate the CSF because the spatial CSF can be estimated only in the frequency domain. The DCT-based JND model can incorporate the spatial CSF effect of the HVS into the JND modeling, and thus usually shows better performance than pixel-domain JND models. Figure 1. The spatial Contrast Sensitivity Function (CSF) curve with the spatial frequency (cycles/degree) [35].
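With the constants stated above, the CSF can be sketched directly; the frequency grid is illustrative only. The curve is band-pass, peaking at mid spatial frequencies, which is the behavior Figure 1 depicts.

```python
import numpy as np

# CSF of [35] with the empirical constants of [24]:
# A(w) = a * (b + c*w) * exp(-(c*w)**d)
a, b, c, d = 2.6, 0.0192, 0.114, 1.1

def csf(w):
    """Contrast sensitivity at spatial frequency w (cycles/degree)."""
    return a * (b + c * w) * np.exp(-(c * w) ** d)

freqs = np.linspace(0.1, 40.0, 400)       # illustrative frequency grid
peak = freqs[np.argmax(csf(freqs))]       # band-pass shape: mid-frequency peak
```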
For the (x, y) sub-band of the DCT block, the corresponding frequency ω(x, y) can be calculated from the horizontal/vertical length of a pixel in degrees of visual angle, where M is the size of the non-overlapped DCT block (M = 8 in this case), θ is the ratio of the viewing distance to the screen height, and h is the number of pixels in the screen height. The term 1/(r + (1 − r) · cos² φ_{x,y}) accounts for the oblique effect, and r is set to 0.6 in [36]. φ_{x,y} represents the direction angle of the corresponding DCT component, which is expressed as

Luminance Adaptation
The luminance masking threshold usually depends on the background brightness of a local region: the brighter the background, the higher the masking value. The luminance modulation factor F_LA, which follows [21], is calculated by Equation (9), where µ_p is the average intensity value of the block.

Contrast Masking
When the human visual system observes image blocks, different visual masking effects appear depending on the type of block. Therefore, it is necessary to classify image blocks reasonably and compute their contrast-masking effect accordingly. We adopt three AC coefficients to divide the image blocks into three types (smooth, edge, and texture), as detailed in Section 3.3. Combined with the DCT-coefficient-based direction classification of [24], 11 types of image blocks are obtained. Since the HVS has different sensitivities in different regions, the contrast masking values also vary with the block type. Three masking patterns for the direction of image blocks in [24] are given as where the constant β is set to 0.1. The extent of the contrast masking effect [24] is measured as where the constants 1, 2, and 3 reflect the different influence weights of the block types on the contrast-masking evaluation. The constant 0.75 represents the sensitivity of the HVS to high-frequency information, and the constant 0.25 reflects its sensitivity to low-frequency information; these values are determined in [24]. The influence of in-band masking and inter-band masking on contrast masking is also considered; the final contrast masking factor [37] is calculated by Equation (14) for x² + y² ≤ 16 in smooth and orderly-edge blocks, where D_k(x, y) is the (x, y)-th DCT coefficient of the k-th block and σ is set to 0.36 [37].

Robust JND Model
With all the considerations mentioned above, a sophisticated perceptual JND model is obtained that not only perceives a variety of pixel changes in a spatial block [33] but also keeps the partial AC coefficients invariant under watermark embedding. Because block classification operates on partial AC values, if a pixel-domain JND model were applied to the watermarking algorithm, the block classification would be inaccurate when extracting the watermark information. Although this also holds for the DCT-domain JND model, the corresponding AC coefficient values in T^k_JND(x, y) can be set to zero, which compensates for the robustness. Therefore, the JND threshold for a DCT subband in (15) is improved as

Watermark Embedding Scheme
Because of the high correlation among the three channels of the RGB space, there is much redundant information between the color channels, so the YCbCr space is used to embed the watermark information. A given RGB color image is converted from the RGB color space into the YCbCr color space, and the Y channel, which matters most to the HVS, is selected for embedding the watermark information. Different from Su's embedding algorithm [18], we use a logarithmic DM embedding algorithm. The flow diagram of the watermarking scheme is given in Figure 2.

Firstly, the RGB color image I is converted into a YCbCr image by Equation (16). It has been demonstrated that the DC component can be used to place watermarks for better robustness of the watermarking scheme [17], since the common image-processing procedures by which watermarked images may be attacked, such as noise, compression, and filtering, change DC components less than AC components. The Y channel is therefore divided into non-overlapped blocks of size 8 × 8, and for the k-th block the DC coefficient A_k is obtained by the DCT. The DCT, a compression method of the JPEG standard, has been widely used in watermarking schemes: an image in the spatial domain is converted into the DCT domain by the 2D block DCT, and an image in the DCT domain can be inverted back to the original image by the inverse 2D DCT. For the k-th image block R_k(i, j) (i, j = 0, 1, 2, ..., 7), the 2D DCT is given as

$$D_k(x, y) = \partial_x \partial_y \sum_{i=0}^{7} \sum_{j=0}^{7} R_k(i, j) \cos\frac{(2i+1)x\pi}{16} \cos\frac{(2j+1)y\pi}{16},$$

where x and y are the horizontal and vertical frequencies (x, y = 0, 1, 2, ..., 7) and ∂_x and ∂_y are the DCT normalization factors obtained by Equations (3) and (4). Then, A_k is transformed according to the following novel logarithmic function, where A_k is the DC coefficient of the k-th image block, C_0 is the mean intensity of the whole image, and µ is a parameter set to 0.02 here.
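The block DCT step can be sketched as follows. This is a generic orthonormal 8 × 8 DCT, not the paper's code, and the constant test block is illustrative; for a constant block the DC coefficient A_k equals 8 times the block mean.

```python
import numpy as np

M = 8
# Orthonormal 8x8 DCT-II matrix; the row-dependent scale factors play the
# role of the normalization factors called "∂x" and "∂y" in the text.
C = np.array([[np.sqrt((1.0 if x == 0 else 2.0) / M) *
               np.cos((2 * i + 1) * x * np.pi / (2 * M))
               for i in range(M)] for x in range(M)])

def dct2(block):
    """2D DCT of one 8x8 block: D = C R C^T."""
    return C @ block @ C.T

def idct2(coeffs):
    """Inverse 2D DCT: R = C^T D C."""
    return C.T @ coeffs @ C

# For a constant block, the DC coefficient A_k equals 8 * mean(block).
block = np.full((M, M), 100.0)
A_k = dct2(block)[0, 0]
```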
The transformed signal Y_k is then used for watermark embedding by dither modulation (DM) according to the watermark bit m, as follows, where ∆_k is the adaptive quantization step for the k-th image block and d_m is the dither signal corresponding to the watermark bit m.
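The DM quantizer can be sketched as below. The step ∆ = 4 and the dither values are illustrative assumptions (in the paper ∆_k is adaptive per block); the embedded value snaps to the lattice shifted by the dither of the chosen bit.

```python
import numpy as np

def dm_quantize(y, delta, m, dither):
    """Dither modulation: quantize y onto the lattice shifted by d_m,
    i.e. y_w = delta * round((y - d_m) / delta) + d_m."""
    d = dither[m]
    return delta * np.round((y - d) / delta) + d

delta = 4.0                          # illustrative fixed step
dither = {0: 0.0, 1: delta / 2.0}    # a common dither choice
y_w = dm_quantize(17.3, delta, m=1, dither=dither)
```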
The watermarked DC coefficient A_k^w is obtained by applying the inverse transform to the quantized data Y_k^w, as follows. Thus, the modification E_k in Equation (21), obtained by logarithmic DM for the k-th block, represents the energy of the watermark information, where A_k is the DC coefficient of the k-th image block and A_k^w is the watermarked DC coefficient after modification.
The summed modification M · E_k for the k-th block distributes the energy of the embedded data over the spatial block [18]. In [18], every pixel in a block receives the same modification amount, which ignores how the HVS perceives different pixel values. The smooth, edge, and texture areas of the image are not well considered, and the poor correlation with the HVS inevitably leads to distortion in the spatial domain. JND gives us a promising way to guide the pixel changes: the perceptual pixel changes are more consistent with the HVS, and the watermarked pixel is obtained according to JND perceptual guidance instead of a uniform change per pixel. Using the inverse transformation from the DCT domain to the pixel domain, the energy is allocated to each pixel with this cross-domain JND operation. Consequently, M · E_k is distributed over all pixels in the k-th block under the guidance of the cross-domain JND in Equation (22), where R_k^*(i, j) is the pixel of the watermarked image block after embedding, R_k(i, j) is the original pixel of the k-th image block, (i, j) are the spatial indices (i, j = 0, 1, 2, ..., 7) within the block, and the sum of the inverse-DCT JND thresholds T_k^*JND(i, j) is the total JND threshold of the k-th block. Repeating the same operation for all non-overlapped blocks embeds the watermark information in the Y channel. Then, the YCbCr watermarked image is transformed to the RGB color watermarked image I_w as follows. The main steps of the watermark embedding scheme are described in Algorithm 1.
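A minimal sketch of this JND-guided allocation, assuming Equation (22) normalizes the pixel-domain JND map so that the block's total pixel modification equals the allocated energy; the block and JND values are placeholders.

```python
import numpy as np

def jnd_guided_embed(block, total_mod, t_jnd):
    """Distribute the block's total modification over its pixels in
    proportion to the pixel-domain JND thresholds:
    R*(i, j) = R(i, j) + total_mod * T(i, j) / sum(T)."""
    return block + total_mod * t_jnd / t_jnd.sum()

rng = np.random.default_rng(1)
block = rng.uniform(0.0, 255.0, (8, 8))   # placeholder pixels
t_jnd = rng.uniform(1.0, 5.0, (8, 8))     # placeholder JND map
out = jnd_guided_embed(block, 64.0, t_jnd)
total = (out - block).sum()               # equals the allocated energy
```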

Algorithm 1 Watermark Embedding
Input: The host image, I; watermark message, m;
Output: Watermarked image I_w;
1: Transform the RGB color image to the YCbCr color space by Equation (16); the Y channel is regarded as the watermark embedding channel;
2: Divide the Y channel image into 8 × 8 non-overlapped blocks, and perform the DCT on each block;
3: for all blocks do
4:     Use the three AC coefficients to obtain the adaptive quantization step by Equations (26)-(29);
5:     Estimate the perceptual JND factors, including the spatial CSF effect, luminance adaptation, and contrast masking, by Equations (2), (9), and (14), respectively;
6:     Calculate the final robust JND model by Equation (15);
7:     Obtain the DC coefficient by Equation (17); one part of the watermark message m is embedded into the DC coefficient by Equations (18);
       …
       Generate the modified block B*;
10: end for
11: Generate the watermarked image Y by collecting all the modified blocks B*;
12: Generate the watermarked color image by concatenating the modified Y with the Cb and Cr image channels, and convert the color space from YCbCr to RGB by Equation (23);
13: return Watermarked image I_w;

Watermark Extracting Scheme
Firstly, the RGB color watermarked image I_w is converted into a YCbCr image by Equation (16), and the watermarked channel Y is divided into non-overlapped 8 × 8 pixel blocks. In the extraction process, for the k-th block, the DC coefficient A_k^w is obtained by Equation (17). Then, the DC coefficient is transformed into the signal Y_k^w by the logarithmic function in Equation (24), where A_k^w is the DC coefficient of the k-th watermarked block, C_w0 is the mean intensity of the whole image, and µ is a parameter set to 0.02 here. The watermark is then detected from the transformed signal Y_k^w according to the minimum distance in Equation (25), where m is the extracted watermark bit of the block, ∆ is the adaptive quantization step, and d_m is the dither signal corresponding to the watermark bit m. Repeating the same operation for all non-overlapped blocks extracts the complete watermark information. The main steps of the watermark extracting scheme are described in Algorithm 2.
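The minimum-distance detector can be sketched as follows, reusing the dithered lattice of the embedding stage; the step and dither values are illustrative. The decoded bit is the one whose dithered lattice point lies closest to the received value.

```python
import numpy as np

def dm_detect(y_w, delta, dither):
    """Decode the bit whose dithered quantization lattice lies closest
    to the received value (minimum-distance decoding)."""
    best_m, best_dist = None, np.inf
    for m, d in dither.items():
        q = delta * np.round((y_w - d) / delta) + d
        if abs(y_w - q) < best_dist:
            best_m, best_dist = m, abs(y_w - q)
    return best_m

delta = 4.0
dither = {0: 0.0, 1: delta / 2.0}
# Embed bit 1, perturb slightly (a mild attack), then detect.
y_w = delta * np.round((17.3 - dither[1]) / delta) + dither[1]
m_hat = dm_detect(y_w + 0.7, delta, dither)
```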

Algorithm 2 Watermark Extracting
Input: The received watermarked image, I_w;
Output: Watermark message m;
1: Transform the color image from the RGB color space to the YCbCr color space by Equation (16); select the channel Y as the main host image;
2: Divide the host image into 8 × 8 non-overlapped blocks;
3: for all blocks do
4:     Use the three AC coefficients to obtain the adaptive quantization step by Equations (26)-(29);
5:     Obtain the DC coefficient by the DCT and transform it by the logarithmic function in Equation (24);
6:     Extract one part of the watermark message by Equation (25);

Adaptive Quantization Step
Human eyes are usually very sensitive to distortion in smooth areas or around edges, so the quantization step should be weighted in the smooth and edge blocks. The Canny operator is usually used to classify image blocks, but in a watermarking system, the block classification of an attacked watermarked image can differ from that of the unattacked host image. Based on these considerations, an adaptive quantization step for the different block types is introduced, where the block type is measured by partial AC coefficients. Three AC coefficients are selected for the edge strength, and the corresponding edge strength is used to compute an edge density value that classifies the image blocks accurately, as in [26]. Thus, the new edge strength of a block is defined as in Equation (26), where B_k(x, y) represents the AC coefficient, which can be calculated by Equation (17). The new edge density is then given by Equation (27), where c is set to 10^(-8) to avoid a zero denominator. The normalization in Equation (27) resists the volumetric attack, since the density remains unchanged under this attack. In this regard, the image block types are defined by Equation (28). Therefore, we can obtain adaptive quantization steps for the different block types; the adjustment with different masking weights for the k-th image block is expressed by Equation (29), where ∆_k is the fixed quantization step of the k-th image block, which controls the image quality, and the newly weighted ∆_k is the adaptive quantization step. The different weight factors not only yield better perceptual quality but also make the quantization step robust to the volumetric attack.
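A hedged sketch of the block classification and step weighting: the AC positions, the thresholds t1 and t2, and the per-type weights are assumptions standing in for Equations (26)-(29), but the DC normalization illustrates why the classification survives a volumetric (amplitude-scaling) attack.

```python
import numpy as np

# The AC positions, thresholds t1/t2, and weights below are illustrative
# assumptions, not the paper's exact Equations (26)-(29).
def classify_block(D, c=1e-8, t1=0.1, t2=0.3):
    """Edge strength from three low-frequency AC coefficients, normalized
    by the DC coefficient so the density is invariant to amplitude scaling."""
    strength = abs(D[0, 1]) + abs(D[1, 0]) + abs(D[1, 1])
    density = strength / (abs(D[0, 0]) + c)
    if density < t1:
        return "smooth"
    if density < t2:
        return "edge"
    return "texture"

def adaptive_step(base_step, block_type):
    """Smaller quantization steps where the eye is more sensitive."""
    weights = {"smooth": 0.7, "edge": 0.85, "texture": 1.0}
    return base_step * weights[block_type]

D = np.zeros((8, 8))
D[0, 0], D[0, 1] = 800.0, 4.0        # a nearly flat block
btype = classify_block(D)
step = adaptive_step(4.0, btype)
# A volumetric (scaling) attack leaves the classification unchanged:
same = classify_block(0.5 * D) == btype
```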

Experiments and Results Analysis
In this section, we present and discuss the experimental results. To evaluate the performance of the proposed watermarking scheme, we performed experiments with original code in MATLAB R2016b on a 64-bit Windows 10 operating system with 16 GB of memory and a 3.20 GHz Intel processor.
In this experiment, the 24-bit color RGB images with the size of 512 × 512 are selected as the host images from the CVG-UGR [38] image database, as shown in Figure 3. The original watermark is the binary image with the size 64 × 64, as shown in Figure 4.
To evaluate the imperceptibility of the proposed method, the peak signal-to-noise ratio (PSNR) and the visual-saliency-based index (VSI) are utilized as performance metrics. PSNR in Equation (29) measures the similarity between the host image and the watermarked image:

$$PSNR = 10 \log_{10} \frac{255^2}{MSE},$$

where MSE is the mean square error between the original color image and the watermarked image.
As an image quality assessment metric, the VSI shows good visual performance in measuring the quality difference between an original image and a distorted image. Thus, we use the VSI between the original image and the watermarked image, where V_1 and V_2 represent the VS maps extracted from the original image I and the watermarked image I_W, and S is the local similarity of I and I_W, as described in [39].
To test the robustness of the watermarking method, the bit error rate (BER) is computed for comparison; a BER close to 0 shows that the watermarking algorithm is robust to the attack. The BER is given as

$$BER = \frac{\sum m \oplus m'}{Area},$$

where m is the original watermark, m' is the extracted watermark, and Area is the size of the watermark image.
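The two metrics can be sketched as follows; the toy images and watermark bits are illustrative.

```python
import numpy as np

def psnr(original, watermarked):
    """PSNR = 10 * log10(255^2 / MSE) for 8-bit images."""
    mse = np.mean((original - watermarked) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

def ber(m, m_hat):
    """Fraction of watermark bits that were decoded incorrectly."""
    return float(np.mean(m != m_hat))

orig = np.full((8, 8), 128.0)
marked = orig + 2.0                       # uniform distortion -> MSE = 4
p = psnr(orig, marked)                    # roughly 42 dB
b = ber(np.array([0, 1, 1, 0]), np.array([0, 1, 0, 0]))
```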

Comparison with Different JND Models within Watermarking Framework
To evaluate the robustness of our proposed JND model in the watermarking framework, two existing JND models, Bae's model [21] and Zhang's model [40], were used to guide the spatial perceptual embedding for comparison. These two JND models also underwent the same AC-coefficient zeroing and cross-domain operations. The testing images were standard color images of size 512 × 512, as shown in Figure 3. The watermark information was embedded into the Y channel; a binary watermark of size 64 × 64 was embedded into the cover images, as shown in Figure 4. With the watermarked image quality fixed to PSNR = 42 dB, we tested the robustness of the different JND models within the watermarking framework.
To compare the robustness of the proposed JND model with the other DCT-domain JND models within the watermarking framework, several kinds of attacks were applied: Gaussian Noise (GN) with zero mean and different variances, JPEG compression with quality factors from 40 to 50, Salt and Pepper noise (SPN) with different densities, Rotation (RO) by 60°, and Gaussian filtering (GF) with a 3 × 3 window. Table 1 shows the average BER values of the watermarked images under these attacks. When the watermarked images were attacked by Gaussian noise and Salt and Pepper noise, the BER of the proposed JND model was significantly lower than that of the other two JND models, indicating that the proposed model performs much better against noise attacks. As shown in Table 1, when the JPEG compression quality is 40, the BER of the proposed JND model is about 0.87%. When the watermarked images are contaminated by a 60° rotation, the average BER values of the three JND models in Table 1 show that the proposed model always has the lowest BER. Under Gaussian filtering with a 3 × 3 window, the BER obtained by our method did not exceed 3.5%. Overall, the watermarking framework based on the proposed JND model has excellent robustness.

Imperceptibility Test for Watermarking Scheme
PSNR and VSI, which are consistent with the HVS, serve as objective tests of the invisibility of the proposed method. Because the visual quality of a watermarked image varies with the quantization steps, the PSNR between the watermarked image and the original image was fixed at 42 dB. The VSI and BER values obtained by our method are shown in Figure 5. The closer the VSI value is to 1, the better the image quality; at VSI = 0.9920, the difference between the original image and the watermarked image cannot be perceived by the HVS. In Figure 5, the VSI values of our method are all close to 1, indicating good visual quality. Moreover, the BER values show that the watermark can be extracted perfectly by our method in the absence of attacks.

Comparison with Individual Quantization-Based Watermarking Methods
It can be seen that the proposed image watermarking method has good visual perceptual quality with HVS in Figure 5. Moreover, robustness is an important metric for the watermarking scheme. In this section, in order to show the robustness of the proposed method, some basic attacks are selected to test the robustness, and we compared it with other quantization-based methods: [26,32,41,42]. To ensure the fairness of the experiment, the Y channel of the color image was used as the embedding position, and we fixed the watermarked image quality with PSNR = 42 dB by adjusting the quantization steps. To better present the robustness of the proposed method, we show the average BER values of the testing watermarked images under some common attacks.
Adding noise is the most common image-processing operation for verifying robustness. We chose the Gaussian noise attack and the Salt and Pepper noise attack to test the robustness of the proposed method. Gaussian white noise with a mean of 0 and different standard deviations was used to attack the watermarked image. Table 2 gives the average BER values for Gaussian noise with different factors; our method clearly has lower BER values under this attack. Table 2 also shows the average BER values after adding Salt and Pepper noise of different densities. Although the BER of our method is higher than that of [26] when the factor is 0.0005, it is only about 0.05% higher. For the other noise factors, the average BER of our method is lower than those of [26,32,41,42]. As can be seen from Table 2, the proposed scheme is robust against noise attacks and exhibits good performance.

Since JPEG is the most popular image format transmitted over the internet, the watermarked image was compressed with JPEG quality factors from 30 to 100 in steps of 10. Table 3 lists the partial experimental results for compression factors from 40 to 60. When the compression factor is 40, the bit error rate obtained by our method is about 2.5% lower than that of [26]. Thus, our method performs better than the others against JPEG compression attacks.

Filtering is one of the classical attacks on watermarked images. Since a watermark can be weakened by filtering, we selected Gaussian filtering (GF) and Median filtering (MF) as filtering attacks, each with a 3 × 3 window. Table 4 shows the average experimental results under the GF and MF attacks.
Overall, the proposed method has the best robustness performance under filtering attacks. For the rotation attack, the watermarked image is first rotated clockwise by a certain number of degrees and then counter-clockwise by the same number of degrees. In this experiment, the watermarked image was rotated 15°, 30°, and 60° clockwise and then restored to its original orientation. Table 5 shows the average BER values at the different rotation angles; the average BER of our method does not exceed 0.6%, which proves its better performance. In practice, a watermarked image may be contaminated by multiple attacks, so we further compared some combined attacks, as shown in Table 6. The testing images were first compressed with JPEG quality = 50 and then attacked by Gaussian noise. Our method again shows better robustness.

Comparison with Spatial-Uniform Embedding-Based Watermarking Methods
In order to compare the proposed scheme with the existing spatial uniform embedding methods for transform-domain quantization watermarking [18,19], a binary watermark of size 32 × 32 is used. In [18], a simple preprocessing of the 32 × 32 binary watermark yields 64 × 64 actual bits, which are embedded into 64 × 64 blocks. A selective mechanism is used in [19], where 32 × 32 blocks are adaptively selected from the original 128 × 128 blocks. In this section, we compare the proposed scheme with [18,19]. To ensure fairness, the proposed method is compared with the existing image watermarking of [18] at the same image quality, and the Y channel of the color image is utilized to embed the watermark. As in [18], an optimum sub-watermark is obtained here by combining four sub-watermarks. The BER is computed for an objective performance comparison.
To test the robustness of the proposed method, the watermarked images were attacked by common image-processing operations (such as adding noise, JPEG compression, filtering, and combined attacks) and geometric distortions (such as scaling and rotation) with different factors. Table 7 lists the partial average BER values for various attacks at the fixed image quality PSNR = 42 dB. The BERs for Gaussian noise (GN) with variance 0.0025 and Salt and Pepper noise (SPN) with density 0.0025 are given in Table 7; our method clearly produces a lower BER than the others under noise attacks. For JPEG compression, we show the results with JPEG factor = 30 in Table 7. Under the filtering attack, both our method and [18] show good robustness. The scaling operation, a common geometric attack, was applied by scaling the watermarked image from 25% to 200% in increments of 25%; we give the average BER when the watermarked image is reduced by 25% and then restored to its original size. For the volumetric attack (VA), we give only the BER for the factor 0.5; it can be clearly seen that [18,19] cannot resist this operation. For rotation (RO), the watermarked image is first rotated clockwise by 25° and then counter-clockwise by the same angle; the BER obtained by our method is about 6% lower than that of [19]. Finally, under the combined attack of JPEG = 50 followed by GN with variance 0.002 (J+GN), the BER of our method does not exceed 0.8%. Thus, the robustness of our method is clearly better than that of [18,19].

Conclusions
This paper proposes a new spatial-perceptual embedding scheme guided by a robust JND model for color image watermarking. The DC coefficient of each block is used as the cover coefficient for watermark embedding, and the DC coefficients are quantized by DM with an adaptive quantization step obtained from partial AC coefficients. More importantly, the pixel modification amounts are shaped by the inverse-DCT JND thresholds, so that the watermarked image is more consistent with the perceptual characteristics of the HVS. Experimental results demonstrate that the proposed scheme is robust against common image-processing attacks such as noise addition, compression, and the volumetric attack. However, the JND model used here considers only the Y channel and ignores color information; an improved JND model with color information can therefore be developed in future work. We will also consider realizing the pixel updating of the watermarking framework jointly with other image enhancement tasks.