Trace Concealment Histogram-Shifting-Based Reversible Data Hiding with Improved Skipping Embedding and High-Precision Edge Predictor (ChinaMFS 2022)

Abstract: Reversible data hiding (RDH) is a special class of steganography, in which the cover image can be perfectly recovered upon the extraction of the secret data. However, most image-based RDH schemes focus on improving capacity–distortion performance. In this paper, we propose a novel RDH scheme which not only effectively conceals the traces left by histogram shifting (HS) but also improves capacity–distortion performance. First, a high-precision edge predictor, LS-ET (Least Square predictor with Edge Type), is proposed. The predictor divides pixels into five types, i.e., weak edge, horizontal edge, vertical edge, positive diagonal edge, and negative diagonal edge. Different types of target pixels utilize different training pixels with stronger local consistency to improve accuracy. Then, a novel prediction-based HS framework is designed to conceal embedding traces in the stego images. Finally, we improve both the data-coding method and the skipping embedding strategy to improve the image quality. Experimental results demonstrate that the capacity–distortion performance of the proposed scheme outperforms the other trace concealment schemes and is comparable to the state-of-the-art schemes utilizing sorting technique, multiple histogram modification, and excellent LS-based predictors. Moreover, it can conceal the embedding traces left by the traditional HS schemes to a certain extent, reducing the risk of being steganalyzed.


Introduction
Data hiding realizes copyright protection and content authentication by hiding secret data in digital media covers, and it is an important technique in the field of information security. On some occasions the media cover must retain high fidelity and even slight distortion is intolerable, which is why reversible data hiding (RDH) technology came into being. RDH offers exact recovery of both the secret message and the original cover media without any distortion. Because of this feature, many RDH algorithms have been proposed in the last few decades. Existing RDH algorithms in the image spatial domain are mainly based on lossless compression, difference expansion (DE), and histogram shifting (HS).
The RDH algorithm based on HS was originally proposed by Ni et al. [1], which created embedding space by manipulating the image histogram. However, the embedding capacity depended on the number of pixels belonging to the peak bin in the histogram. A few years later, the idea of prediction was used in HS algorithms to increase the embedding capacity. In this case, the prediction error histogram is used, which is sharply distributed and peaks near zero. The algorithm in [2] used the median edge detector (MED) and modified prediction error values of 0 and −1 in order to embed data.
Generally, the accuracy of the predictor affects the embedding capacity and the quality of the stego image, and many predictors have been proposed in recent years. The gradient adjusted predictor (GAP) was designed by Fallahpour [3]; it used more neighboring pixels to obtain the prediction values, so it was more suitable for images with complex textures. An improvement over this algorithm was proposed by Coltuc [4]; a simplified GAP (SGAP) was designed to reduce the time complexity. Sachnev et al. [5] designed a rhombus predictor, calculating the average of the four neighboring pixels as the prediction value, which showed the highest prediction accuracy among all fixed predictors.
However, the weights of the above-mentioned predictors are fixed. In recent years, adaptive predictors have been proposed to improve the capacity-distortion performance of HS algorithms. Dragoi and Coltuc [6] proposed an HS algorithm on the basis of local prediction (LP). The prediction value was a weighted combination over a square block centered on the target pixel, and the weights were adaptively calculated through the least square (LS) method. Hwang et al. [7] proposed a predictor using the least absolute shrinkage and selection operator (LASSO). The LASSO predictor penalized and removed the pixels in the prediction context that did not affect the target pixels. Wang et al. [8] used weighted prediction and watermarking simulation (LP-WPWS) to improve LP. It increased the number of training pixels in the LP predictor, divided the prediction context into a watermarked region and an original region, and applied different weights to the pixels in the two regions to further improve the accuracy of prediction. However, it did not take into account the characteristics of the pixel itself in the prediction process. Wang et al. [9] proposed the ridge regression predictor to solve the overfitting problem of LS and obtained a smaller prediction error. However, it did not investigate the selection of appropriate training pixels and prediction context.
In addition to designing high-precision predictors, another way to improve the HS algorithm is to investigate new embedding methods. Wang et al. [10] proposed a general RDH framework using multiple histogram shifting (MH_RDH), which formulated the rate allocation among multiple histograms as a rate-distortion optimization problem and solved it using an evolutionary algorithm. Experimental results showed that the method could considerably increase the payload. Kim et al. [11] used a pair of extreme predictions to generate two skewed histograms. Only the pixels from the peak and the short tail of each skewed histogram were used for embedding, which decreased the distortion because fewer pixels were shifted. Choi et al. [12] proposed a novel HS method that skips some pixels between the peak and the zero bin, which decreased the shifting distortion.
Recently, some algorithms have increased the occurrence of '0' in the secret binary stream by coding the secret data, thereby improving the performance of HS algorithms. Yang et al. [13] proposed a message sparse representation to decrease the embedding distortion by decreasing the number of '1's in the binary stream. In 2020, Xie et al. [14] proposed a signed-digit representation, and Peng et al. [15] proposed a remainder-storage-based EMD (RSBEMD) method. Those representation methods would not cause message expansion, i.e., the size of the message would not be enlarged. However, new symbols such as '−1' and '2' would appear in the coded message stream, which would cause additional shifting distortion.
However, these algorithms leave quite obvious traces after embedding and therefore risk exposing the data-hiding action. In particular, there is a steganalysis algorithm [16] specifically designed for HS. Attackers can easily detect the existence of secret data from stego images, then intercept those images and decipher the secret data. Various previous works have also noticed the detectability issue of RDH [17,18]. In 2020, Dong et al. [19] proposed a method that could effectively conceal the traces left by HS. However, it caused a serious loss of image quality.
In this work, an embedding trace concealment histogram-shifting-based reversible data-hiding algorithm is proposed. We propose a novel HS framework using the improved data-coding method and the improved skipping embedding strategy. The histograms of the stego images maintain a shape similar to that of the original histograms. Therefore, the embedding traces are concealed to a certain extent. Moreover, we propose LS-ET (Least Square predictor with Edge Type), in which the local texture characteristic of the target pixel can be fully utilized. The predictor uses only the half-context of the target pixel, for the flexibility of the embedding method and for low computation, but its prediction accuracy is comparable to that of full-context predictors. Meanwhile, the algorithm has a good capacity-distortion performance.
The main contributions of this paper are as follows: (1) It proposes a novel HS framework, which combines the improved data-coding method and the improved skipping embedding method. It aims to conceal the embedding traces while still enabling the stego image histograms to conform to the Laplacian distribution.
(2) In the framework, two distortions of the HS-based data-hiding algorithm are considered, namely expanding distortion and shifting distortion, which results in the algorithm's good capacity-distortion performance.
(3) It proposes LS-ET to improve prediction accuracy. Unlike traditional LS-based predictors that use fixed neighbors as the training pixels, the proposed LS-ET predictor uses, as training pixels, non-fixed pixels that have stronger local consistency with the target pixel.
(4) It improves the message sparse representation method. The improved data-coding method reverses the judgment condition of the original method, which yields higher stego image quality when the secret data is in image form.
(5) It improves the skipping embedding method to conceal the embedding traces by adaptively determining the embedding position and skipping position on the basis of the prediction error histogram and the given payload.
The remainder of this paper is organized as follows. Section 2 analyzes the embedding traces left by traditional histogram-shifting algorithms. In Section 3, the proposed LS-ET predictor is described in detail. The framework of the proposed scheme is given in Section 4. Section 5 presents the experimental results and comparisons, and Section 6 concludes this paper.

The Analysis of Embedding Traces Left by HS Algorithms
For prediction-based HS algorithms, the following steps achieve data hiding. First, calculate the prediction value p(i,j) for each pixel x(i,j) of the original image using a predictor. Then, compute the prediction error e(i,j) by
$$e(i,j) = x(i,j) - p(i,j)$$
Finally, the prediction error is modified according to the secret data to give the modified prediction error e'(i,j), where data denotes 1 bit of the secret binary stream to be embedded. The prediction errors that belong to [a,b] are called expandable errors, which will be expanded to embed 1 secret data bit. The distortion produced is called expanding distortion. It can be seen that the '1's in the secret binary stream are the cause of expanding distortion. Other prediction errors will be shifted to the left or the right to make room for expansion, and the distortion caused is called shifting distortion.
It is well known that the prediction error can be effectively modeled with a zero-mean Laplacian distribution [20]. Assume that the length of the secret data is the same as the number of expandable errors; Figure 1 gives an example of the HS method. As shown in Figure 1, the original histogram conforms to a Laplacian distribution, while the after-embedding histogram becomes irregular. In the after-embedding histogram, there are some consecutive bins that are lower than the bins on either side of them. Such obvious embedding traces are easily noticed by attackers. In addition, the secret data is usually encrypted before embedding, so bits '0' and '1' are equally distributed. In this case, the adjacent bins that carry secret data have almost the same height, which is considered to be an embedding trace (this phenomenon is termed 'flat ground'). With this trace, attackers can not only detect the existence of the secret data in the stego images but may even decipher its content. Wang et al. [16] designed an effective steganalysis method for HS based on whether the histogram of the given image exhibits the 'flat ground' phenomenon. The detection accuracy of this steganalysis method at a payload of 0.1 bpp is up to 97.67%. In Section 4, we propose a novel HS framework to conceal the embedding trace mentioned above.
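The 'flat ground' trace can be reproduced with a small simulation. The following is a minimal sketch under stated assumptions, not the scheme of any cited paper: it uses synthetic Laplacian prediction errors, a single embedding bin at 0 with right shifting, and uniformly random payload bits whose count equals the number of expandable errors. The helper names `laplacian_errors` and `traditional_hs_embed` are hypothetical.

```python
import random
from collections import Counter


def laplacian_errors(n, scale, seed=0):
    """Draw n integer prediction errors from a zero-mean Laplacian (assumed model)."""
    rng = random.Random(seed)
    # The difference of two i.i.d. exponential variables is Laplacian with zero mean.
    return [round(rng.expovariate(1 / scale) - rng.expovariate(1 / scale))
            for _ in range(n)]


def traditional_hs_embed(errors, bits, peak=0):
    """Embed bits at bin `peak`; every error greater than `peak` is shifted right by one."""
    it = iter(bits)
    out = []
    for e in errors:
        if e == peak:
            out.append(e + next(it, 0))   # expandable error: stays for '0', moves for '1'
        elif e > peak:
            out.append(e + 1)             # shifted to make room
        else:
            out.append(e)                 # left side is untouched in this sketch
    return out


errors = laplacian_errors(20000, scale=2.0)
before = Counter(errors)

bit_rng = random.Random(1)
bits = [bit_rng.randint(0, 1) for _ in range(before[0])]  # encrypted-like payload

after = Counter(traditional_hs_embed(errors, bits))
print("before:", [before[k] for k in range(-2, 4)])
print("after: ", [after[k] for k in range(-2, 4)])
```

After embedding, bins 0 and 1 share the original peak almost equally, while bin 2 holds what used to be bin 1. This pair of near-equal neighboring bins is the flat region that a histogram-based detector can look for.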
Mathematics 2022, 10, 4249

Proposed LS-ET Predictor
The least square (LS) predictor is an adaptive prediction technique that exploits local image patterns. The predictor employs an adapted linear combination of the prediction coefficients, on the basis of a training set of causal neighbors, in order to compute the prediction value using
$$p(i,j) = \sum_{k} w(k)\, x_k$$
where $x_k$ is the k-th neighbor in the prediction context and w(k) is its corresponding weight. If we use the previous m causal pixels for the adaptation of the k-th order predictor coefficients, the vector of adapted coefficients w is calculated by solving the overdetermined linear system of equations using
$$w = (A^{T} A)^{-1} (A^{T} y)$$
where the size of the training matrix A is m × k, and y is an m × 1 vector containing the previous m causal pixels.
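As an illustration of the LS step, the sketch below solves an overdetermined system A w ≈ y with NumPy and compares the result with the normal-equation solution. The matrix sizes, the synthetic data, and the function name `ls_weights` are assumptions chosen for the example only.

```python
import numpy as np


def ls_weights(A, y):
    """Return the least-squares solution w of the overdetermined system A w ≈ y."""
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


rng = np.random.default_rng(0)
m, k = 12, 4                                   # m training pixels, k-th order predictor
A = rng.integers(0, 256, size=(m, k)).astype(float)
w_true = np.array([0.5, 0.25, 0.125, 0.125])   # synthetic weights for the demo
y = A @ w_true                                 # noise-free targets

w = ls_weights(A, y)
# Same solution via the normal equations (A^T A) w = A^T y.
w_normal = np.linalg.solve(A.T @ A, A.T @ y)
print(np.round(w, 6), np.round(w_normal, 6))
```

`np.linalg.lstsq` is usually preferable to forming the normal equations explicitly, because it stays well behaved when the training matrix is close to rank deficient, which can happen in smooth image regions.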
Traditional LS-based predictors use fixed neighbors as the training pixels. Figure 2 shows the training pixels used by LP [6] and the ridge regression predictor [9]; the shaded pixels are the training pixels of the target pixel x(i,j). Because the training pixels are fixed, these predictors do not fully utilize the target pixel's characteristics. According to the prediction rule of the LS predictor, the prediction accuracy depends on how well the trained prediction weights apply to the target pixel. Therefore, the training pixels should have strong local consistency with the target pixel. The proposed LS-ET predictor uses different training pixels based on the characteristic of the target pixel.
Firstly, calculate four directional gradients of the target pixel x(i,j): horizontal gradient $d_h$, vertical gradient $d_v$, positive diagonal gradient $d_{pd}$, and negative diagonal gradient $d_{np}$ by Equations (5)-(8). According to the four directional gradients of the target pixel x(i,j), the pixel is assigned to one of five types, namely weak edge, horizontal edge, vertical edge, positive diagonal edge, and negative diagonal edge, by Equations (9) and (10), where Th is the threshold for dividing types, which is discussed in Section 5.1.
In LS-ET, different training pixels are selected according to the gradient values, so that they have edge characteristics similar to those of the target pixel.
Consider a B × B block centered on the target pixel as the prediction block. The optimal B will be determined by experiments, which will be presented in Section 5.1. In the following, consider B = 7 as an example.
Type I: The target pixel belongs to the weak edge; then, in the B × B prediction block, select the top left ⌊(B × B)/2⌋ neighboring pixels of the target pixel as the training pixels.
Type II: The target pixel belongs to the horizontal edge; then, in the B × B prediction block, select the ⌊B/2⌋ neighboring pixels in the horizontal direction to the left of the target pixel as the training pixels.
Type III: The target pixel belongs to the vertical edge; then, in the B × B prediction block, select the ⌊B/2⌋ neighboring pixels located vertically above the target pixel as the training pixels.
Type IV: The target pixel belongs to the positive diagonal edge; then, in the B × B prediction block, select the ⌊B/2⌋ neighboring pixels in the positive diagonal direction above the target pixel as the training pixels.
Type V: The target pixel belongs to the negative diagonal edge; then, in the B × B prediction block, select the ⌊B/2⌋ neighboring pixels in the negative diagonal direction above the target pixel as the training pixels.
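The five selection rules can be sketched as a function that returns the (row, column) offsets of the training pixels relative to the target pixel. This is an illustrative reading of the rules, with several assumptions: the counts are taken as the integer parts of B × B / 2 and B / 2, the weak-edge set is read as all pixels of the block that precede the target in raster order, the positive diagonal is taken as the up-right direction, and the negative diagonal as the up-left direction. The name `training_offsets` and the type labels are hypothetical.

```python
def training_offsets(edge_type, B=7):
    """Offsets (di, dj) of the training pixels for a target pixel at (0, 0)."""
    half = B // 2
    if edge_type == "weak":
        # Assumed reading of Type I: every block pixel before the target in raster order.
        above = [(di, dj) for di in range(-half, 0) for dj in range(-half, half + 1)]
        left = [(0, dj) for dj in range(-half, 0)]
        return above + left
    if edge_type == "horizontal":
        return [(0, -k) for k in range(1, half + 1)]    # Type II: pixels to the left
    if edge_type == "vertical":
        return [(-k, 0) for k in range(1, half + 1)]    # Type III: pixels above
    if edge_type == "positive_diagonal":
        return [(-k, k) for k in range(1, half + 1)]    # Type IV: up-right (assumption)
    if edge_type == "negative_diagonal":
        return [(-k, -k) for k in range(1, half + 1)]   # Type V: up-left (assumption)
    raise ValueError(f"unknown edge type: {edge_type}")


for name in ("weak", "horizontal", "vertical",
             "positive_diagonal", "negative_diagonal"):
    print(name, len(training_offsets(name, B=7)))
```

With B = 7 this gives 24 training pixels for the weak-edge type and 3 for each directional type.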
The skipping embedding method used in this paper needs to generate the prediction error histogram of the entire image first, in order to determine the skipping position, and only then performs the embedding operation. As a result, our scheme cannot embed in a pixel immediately after predicting it, as other schemes do. This means that the prediction context cannot contain pixels that already carry embedded data in the prediction process. Similarly, in the extraction and recovery process, the prediction context also needs to consist of original pixels, so as to ensure the complete extraction of the data and the lossless recovery of the image. Therefore, the predictor uses only the half-context as the prediction context, namely x(i − 1,j − 1), x(i − 1,j), x(i − 1,j + 1), and x(i,j − 1), rather than the 8 neighboring pixels (full-context).
Figure 3a-e respectively illustrate the training pixel and prediction context of target pixel which belongs to the weak edge, horizontal edge, vertical edge, positive diagonal edge, and negative diagonal edge.The shadow pixels are the training pixels of the target pixel, and the 4 neighboring pixels in the green border are the prediction context of the target pixel.
Matrix Y is the training matrix composed of training pixels. Using Equation (11), the predicted matrix $P_Y$ of Y is obtained:
$$P_Y = C_Y W \tag{11}$$
In Equation (11), $C_Y$ is the prediction matrix, which is composed of the prediction context of all training pixels, and W is the weight matrix.
For example, if the target pixel x(i,j) belongs to the horizontal edge, the training matrix Y and the prediction matrix $C_Y$ are as shown in Equations (12) and (13). The optimized weight matrix W is obtained by least square approximation, and the loss function of the least square method is shown in Equation (14):
$$J(W) = \lVert Y - C_Y W \rVert_2^2 \tag{14}$$
Calculate the predicted value p(i,j) using the weight matrix W and the prediction context matrix C, as shown in Equation (15):
$$p(i,j) = C\, W \tag{15}$$
where the prediction context matrix C consists of 4 neighboring pixels and the constant 1, as shown in Equation (16):
$$C = [\, x(i-1,j-1),\; x(i-1,j),\; x(i-1,j+1),\; x(i,j-1),\; 1 \,] \tag{16}$$

Proposed Framework for Concealing Embedding Traces
Almost all state-of-the-art prediction-based HS algorithms focus only on improving the capacity-distortion performance. However, they leave obvious embedding traces in the after-embedding histogram. Thus, we propose an HS framework aimed at concealing the embedding traces by keeping the after-embedding histogram in a similar shape to the original histogram. The framework combines the improved data-coding method and the improved skipping embedding strategy. It can flexibly select the predictor and the encoding method to obtain high-quality stego images while reducing the risk of being detected by attackers. The detailed process is as follows.

Data Hiding
The data-hiding process consists of three parts: (1) prediction based on the proposed LS-ET; (2) improved data coding; and (3) improved histogram-shifting embedding with skipping. Figure 4 illustrates the data-hiding process of the framework.


Prediction
Scan the pixels in raster scan order and calculate the predicted value p with the proposed LS-ET predictor. Any prediction method can be used in this step. We use the proposed LS-ET predictor in the subsequent experiments to improve prediction accuracy.
Compute the prediction error by:
$$e(i,j) = x(i,j) - p(i,j)$$
Then, generate the prediction error histogram of the original image.
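Building the prediction error histogram can be sketched as follows. The rounding of the predicted value to the nearest integer is an assumption, since the text does not state how fractional predictions are handled; the function name `prediction_error_histogram` and the tiny arrays are hypothetical.

```python
import numpy as np


def prediction_error_histogram(x, p):
    """Return the integer prediction errors e = x - round(p) and their histogram."""
    # Assumption: predictions are rounded to the nearest integer before subtraction.
    e = x.astype(np.int64) - np.rint(p).astype(np.int64)
    values, counts = np.unique(e, return_counts=True)
    return e, dict(zip(values.tolist(), counts.tolist()))


x = np.array([[10, 12], [11, 11]], dtype=np.uint8)      # original pixels
p = np.array([[10.2, 11.0], [11.4, 12.6]])              # predicted values
e, hist = prediction_error_histogram(x, p)
print(e.tolist(), hist)
```

Casting to a signed integer type before subtracting avoids the wrap-around that unsigned 8-bit arithmetic would otherwise produce for negative errors.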

Improved Data Coding
Yang et al. [13] proposed a message sparse representation method to reduce the number of '1's in the secret data, since the '1's lead to expanding distortion. We make a simple improvement to the sparse representation for higher image quality. The detailed procedure for encoding the data is as follows.
Set the parameter r (r ≥ 0) to represent the sparse rate. The coding rate is defined as
$$R = \frac{L}{CL}$$
where L is the length of the original data and CL is the length of the sparse representation data, i.e., the coded data. Experimental results show that the parameter r is inversely proportional to the coding rate R. Initialize pos1 = 0 and pos2 = 0. pos1 is used to record the last cover symbol that has been coded, and pos2 is used to record the number of message bits that have been coded.

It is observed that secret data may be in the form of images, in which case runs of consecutive '0's or '1's occur in the data with high probability. If Data(pos2 + 2:pos2 + r + 1) contains more '0's in Case 2, the value of (Data(pos2 + 2:pos2 + r + 1))_dec is smaller, that is, the coded data is shorter, and fewer pixels will be used for embedding data in the embedding process. When that happens, it is more likely that the value of Data(pos2 + 1) is 0. Therefore, we set the condition for Case 2 to Data(pos2 + 1) = 0, which is the opposite of the condition in the original coding method of [13]. In Yang's method [13], if Data(pos2 + 1) = 0, Case 1 is executed, and if Data(pos2 + 1) = 1, Case 2 is executed. The experimental results in Section 5 demonstrate the effectiveness of this change. The improvement in this paper obtains higher stego image quality when the secret data is in image form. In addition, it does not affect the experimental results when the '1's and '0's are randomly distributed in the secret data.

Improved Skipping Embedding
Skipping embedding is a novel HS method that decreases the shifting distortion to improve image quality. The embedding process of Choi's method [12] is outlined as follows. Firstly, set the peak and minimum (zero) histogram indices as (IdxP, IdxZ). Then, set the skipping index as IdxSkip and shift the pixels in [IdxP + IdxSkip + 1, IdxZ − 1]. Finally, modify the pixels at the IdxP position for data hiding. If a 0 is embedded, the pixel does not change; if a 1 is embedded, the pixel at IdxP is increased to IdxP + IdxSkip + 1.
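The outline of Choi's method can be sketched on a list of histogram values. This is an illustrative reading of the three steps above, not the reference implementation of [12]: it assumes the zero bin lies to the right of the peak and that shifted values move right by one; the function name `choi_skip_embed` and the sample values are hypothetical.

```python
def choi_skip_embed(values, bits, idx_p, idx_z, idx_skip):
    """Embed bits at bin idx_p, skipping idx_skip bins to the right of the peak."""
    assert idx_z > idx_p + idx_skip, "sketch assumes the zero bin is right of the peak"
    target = idx_p + idx_skip + 1          # where an embedded '1' lands
    it = iter(bits)
    out = []
    for v in values:
        if v == idx_p:
            out.append(target if next(it, 0) == 1 else v)
        elif target <= v <= idx_z - 1:
            out.append(v + 1)              # shifted to empty the target bin
        else:
            out.append(v)                  # skipped bins and all others are unchanged
    return out


values = [3, 3, 3, 4, 5, 6, 3, 7]          # peak at 3, bin 8 is empty
stego = choi_skip_embed(values, bits=[1, 0, 1, 0], idx_p=3, idx_z=8, idx_skip=1)
print(stego)
```

In this example the value 4 lies in the skipped range and is not modified, whereas without skipping it would have been shifted as well.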
Choi's method [12] considered the skipping position to obtain higher image quality, but it did not focus on the deformation of the histogram. In this paper, we improve the skipping embedding method with the aim of concealing the embedding traces of existing HS methods.
First, the embedding position and skipping position are determined in the prediction error histogram of the original image.
Search the embedding position $e_e$ starting from the sides and progressing to the middle of the histogram; stop searching when the stopping condition is satisfied. Here, h(x) denotes the number of pixels whose prediction error value is x, $e_e$ denotes the embedding position, i.e., the value of the expandable error, and length(CData) is the length of the given payload.
Because the prediction error histogram conforms to the Laplacian distribution, the number of prediction errors near both sides of the histogram is small, whereas the number of prediction errors near the middle of the histogram is large. The embedding position is searched from both sides to the middle of the histogram to shorten the distance between the embedding position and the skipping position and to decrease the expanding distortion in the subsequent embedding process.
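The outside-in search for the embedding position can be sketched as below. The stopping condition used here, h(e_e) ≥ length(CData), is an assumption made for illustration, because the exact condition is not reproduced in the text above; the order in which the two sides are tested is also assumed. The function name `find_embedding_position` and the sample histogram are hypothetical.

```python
def find_embedding_position(hist, payload_len):
    """Scan bins from both ends toward the middle and return the first bin
    whose height can hold the payload (assumed stopping condition)."""
    lo, hi = min(hist), max(hist)
    while lo <= hi:
        for candidate in (hi, lo):         # right side first, then left (assumed order)
            if hist.get(candidate, 0) >= payload_len:
                return candidate
        lo += 1
        hi -= 1
    return None                            # no single bin is large enough


hist = {-3: 2, -2: 5, -1: 20, 0: 40, 1: 22, 2: 6, 3: 1}
print(find_embedding_position(hist, 15))
print(find_embedding_position(hist, 30))
print(find_embedding_position(hist, 100))
```

Because the search starts at the tails, a small payload is placed in a low bin far from the peak rather than in the peak bin itself, so the embedding bin is only as tall as the payload requires.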
Search the skipping position $e_s$ starting from the embedding position $e_e$ and progressing to one side of the histogram. If $e_e$ is negative, search toward the left side of the histogram; otherwise, search toward the right side. Stop searching when the stopping condition is satisfied, where length(CData_1) is the number of '1's in the given payload.
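Then, modify the prediction error according to the coded data cdata by
$$e'(i,j) = \begin{cases} e_s, & \text{if } e(i,j) = e_e \text{ and } cdata = 1 \\ e(i,j) + 1, & \text{if } e(i,j) \ge e_s > e_e \\ e(i,j) - 1, & \text{if } e(i,j) \le e_s < e_e \\ e(i,j), & \text{otherwise} \end{cases}$$
For the prediction errors that fulfill e(i,j) = $e_e$, if cdata is '0', no shifting is performed; otherwise, the errors are shifted to the skipping position. For the prediction errors that fulfill e(i,j) ≥ $e_s$ > $e_e$ or e(i,j) ≤ $e_s$ < $e_e$, the errors are shifted by one to the right or to the left, respectively, to make room. Repeat the above until the secret data is fully embedded.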
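The modification rule described above can be sketched on a flat list of prediction errors. This is a minimal sketch of the rule as described in the text, with hypothetical names (`skip_embed`, `errors`, `cdata`); once the payload is exhausted, the remaining errors at the embedding position are left unchanged, which is an assumption about how the end of the payload is handled.

```python
def skip_embed(errors, cdata, e_e, e_s):
    """Embed coded bits at errors equal to e_e; a '1' moves the error to e_s."""
    bits = iter(cdata)
    out = []
    for e in errors:
        if e == e_e:
            bit = next(bits, 0)            # assumption: no change after the payload ends
            out.append(e_s if bit == 1 else e)
        elif e_s > e_e and e >= e_s:
            out.append(e + 1)              # shift right to free the skipping position
        elif e_s < e_e and e <= e_s:
            out.append(e - 1)              # shift left to free the skipping position
        else:
            out.append(e)                  # errors between e_e and e_s are skipped
    return out


errors = [0, 1, 2, 1, 3, -1, 1, 2]
stego_errors = skip_embed(errors, cdata=[1, 0, 1], e_e=1, e_s=3)
print(stego_errors)
```

Errors with value 2 lie between the embedding position 1 and the skipping position 3 and are left untouched, which is where the reduction in shifting distortion comes from.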
The original skipping embedding method [12] uses only the peak bin for embedding, and its skipping position is fixed. In contrast, the improved skipping embedding method carefully designs the selection of the embedding position and the skipping position. Both positions are determined adaptively according to the prediction error histogram and the given payload in order to conceal the embedding traces. This ensures that the after-embedding histogram still maintains the general shape of the original histogram and avoids the 'flat ground' phenomenon, so that the scheme can resist steganalysis to a certain extent, especially HS-based steganalysis. At the same time, the method reduces the distortion generated in the embedding process to improve image quality.
Figure 5 compares the traditional HS methods with the improved HS method that combines the improved data-coding method and the improved skipping embedding strategy. Compared with traditional HS schemes, where all prediction errors between the embedding position and the zero bin need to be shifted, the prediction errors in the proposed scheme between the embedding position and the skipping position are not modified. Hence, the proposed scheme causes less shifting distortion. Because of the improved data coding, the number of prediction errors that are skipped is increased, which further highlights the advantage of skipping embedding in reducing shifting distortion. Meanwhile, the after-embedding histogram maintains a similar shape to the original histogram.
With the predicted value p and the modified prediction error e', the stego pixel x' can be obtained by:
$$x'(i,j) = p(i,j) + e'(i,j)$$
It is worth mentioning that several rows and columns at the outer border of the image serve as reference pixels in the prediction process and are excluded from secret data embedding. In order to ensure blind extraction, side information is embedded into the first row by LSB replacement, and the least significant bits of the first few pixels are reserved and appended to the secret binary stream. Since overflows or underflows may occur after embedding, we use a location map LM to record the overflow and underflow locations and process the image accordingly. If one pixel is modified to 0 or 255, we label it with '1'; otherwise, we label it with '0'. In addition, LM is losslessly compressed into CLM using the algorithm in [21] to reduce its size. In this paper, side information includes:
1. The length of side information: $L_{side}$.
2. The number of embedding times: ET.
4. Size of the original secret data.
5. The length of CLM: $L_{CLM}$.
6. The compressed location map: CLM.
Finally, the stego image I' is generated.

Data Extraction and Image Recovery
For a received stego image I', the LSBs of the first $L_{aux}$ pixels are extracted to retrieve the side information. According to the side information, LM is decompressed to restore the pixels. Figure 6 illustrates the data extraction and image recovery process. The predicted value p for each pixel is obtained by using the proposed LS-ET predictor as in the hiding process.
The prediction error e'(i,j) is obtained by Equation (26):

$$e'(i,j) = x'(i,j) - p(i,j) \tag{26}$$

According to the received embedding position e_e and the skipping position e_s, a bit of data cdata that was hidden in the pixel can be extracted by

$$cdata = \begin{cases} 0, & \text{if } e'(i,j) = e_e \\ 1, & \text{if } e'(i,j) = e_s \end{cases} \tag{27}$$

The original prediction error e(i,j) is obtained by

$$e(i,j) = \begin{cases} e_e, & \text{if } e'(i,j) = e_s \\ e'(i,j) - 1, & \text{if } e'(i,j) > e_s \ \&\ e_e < e_s \\ e'(i,j) + 1, & \text{if } e'(i,j) < e_s \ \&\ e_e > e_s \\ e'(i,j), & \text{else} \end{cases} \tag{28}$$

The original pixel x(i,j) can be recovered by

$$x(i,j) = p(i,j) + e(i,j)$$

After performing the above processes for all pixels in the stego image, the data CData hidden in the image is extracted and the original image I is recovered.
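The extraction and recovery rules above can be sketched for a single prediction error as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `embed_error` and `recover_error` are hypothetical, the embedding rule follows the skipping-embedding description (prediction errors between the embedding position e_e and the skipping position e_s are left unchanged), and e_e ≠ e_s is assumed.

```python
def embed_error(e, e_e, e_s, bit=None):
    """Return the modified prediction error e' (hypothetical helper).

    e   : original prediction error
    e_e : embedding position, e_s : skipping position (assumed e_e != e_s)
    bit : coded data bit, only used when e == e_e
    """
    if e == e_e:
        # Expandable bin: stays at e_e for bit 0, jumps to e_s for bit 1.
        return e_s if bit == 1 else e_e
    if e_e < e_s and e >= e_s:
        return e + 1  # shift right to vacate the bin at e_s
    if e_e > e_s and e <= e_s:
        return e - 1  # shift left to vacate the bin at e_s
    return e          # errors between e_e and e_s are skipped


def recover_error(e_mod, e_e, e_s):
    """Return (original error, extracted bit or None) from a modified error."""
    if e_mod == e_s:
        return e_e, 1
    if e_mod == e_e:
        return e_e, 0
    if e_e < e_s and e_mod > e_s:
        return e_mod - 1, None
    if e_e > e_s and e_mod < e_s:
        return e_mod + 1, None
    return e_mod, None


# Round trip for one pixel with predicted value p.
p, x = 120, 120                      # prediction error e = 0
e_mod = embed_error(x - p, e_e=0, e_s=3, bit=1)
x_stego = p + e_mod                  # stego pixel x' = p + e'
e, bit = recover_error(x_stego - p, e_e=0, e_s=3)
x_rec = p + e                        # recovered pixel x = p + e
```

In this sketch the pair (e_e, e_s) is the only information the receiver needs besides the predictor, which is why both positions must be recoverable from the stego image.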
Finally, the extracted data CData is decoded to obtain the original secret data Data. The detailed procedure for decoding is as follows.
Initialize pos1 = 0 and pos2 = 0. pos1 is used to record the last cover symbol that has been decoded, and pos2 is used to record the number of message bits that have been decoded.
Figure 7 lists the PSNR (peak signal-to-noise ratio) under different parameters for the image 'Lena' with embedding rates from 0.1 bpp to 0.6 bpp. It is observed that a 9 × 9 block best exploits the local characteristics with as few reference pixels as possible. The block size B = 9 and the threshold Th = 21 achieve desirable performance. Similar results are observed for other test images. Therefore, B = 9 and Th = 21 are adopted as the preferred parameters in our scheme and are used in the following experiments. Th is an important threshold for pixel classification, so its value affects prediction accuracy, and the prediction accuracy should vary slowly with Th. When Th takes different values, the shape of the prediction error histogram changes accordingly, which may change the embedding or skipping position, since both are adaptively selected according to the shape of the histogram. A slight difference in the choice of the two positions, even a difference of a single bin, may have a great impact on the stego image quality. Thus, some of the sharp changes in PSNR in Figure 7 are caused by changes in the embedding position or skipping position.
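For reference, the two quantities on the axes of such plots can be computed as in the sketch below. It assumes the usual PSNR definition for 8-bit grayscale images (peak value 255) and an embedding rate measured as payload bits per pixel; the text does not restate either definition, and the function names are hypothetical.

```python
import math


def psnr(cover, stego, peak=255):
    """PSNR in dB between two equally sized 2-D lists of pixel values."""
    sq_diffs = [
        (c - s) ** 2
        for row_c, row_s in zip(cover, stego)
        for c, s in zip(row_c, row_s)
    ]
    mse = sum(sq_diffs) / len(sq_diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


def embedding_rate(payload_bits, height, width):
    """Embedding rate in bits per pixel (bpp)."""
    return payload_bits / (height * width)


cover = [[10, 20], [30, 40]]
stego = [[11, 21], [31, 41]]   # every pixel changed by one gray level
quality = psnr(cover, stego)    # MSE = 1
rate = embedding_rate(26214, 512, 512)
```

Because every pixel modified by one gray level contributes 1 to the squared-error sum, reducing the number of shifted prediction errors raises the PSNR directly.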

Parameter r for Data Coding
The parameter r represents the sparse rate, which directly affects the coding rate and the coding effect. Figure 8 shows the effect of r on the capacity-distortion performance of the algorithm, where r ∈ [1, 5]. As can be seen from Figure 8, the algorithm performance is acceptable when r is 1, 2, or 3. If the value of r is too large, the coded data is too long. Although the number of '1's in the coded data is small, the long coded data produces a large amount of shifting distortion and a considerable loss of image quality, and the data may not even fit in the image.
At r = 1 and r = 2, the algorithm performance is similar for the 'Lena', 'Airplane', and 'Boat' images. However, the image quality of the complex image 'Baboon' deteriorates at r = 2 because the prediction error histogram of a complex image is flatter (the number of pixels per bin is relatively small). If the coded data is too long, many bins are required as expandable bins, resulting in a large amount of shifting distortion. Thus, the image quality of complex images is more sensitive to changes in r. Moreover, the smaller the r, the better the capacity-distortion performance of the algorithm.
Figure 9 shows the prediction error distributions for 'Lena' at r = 1, 2, and 3 under embedding rates (ER) of 0.1 bpp, 0.4 bpp, and 0.7 bpp. As can be seen, the three prediction error histograms all conform to the Laplacian distribution. Furthermore, the larger the r, the better the performance of concealing the embedding traces. When r = 3, since the number of '1's in the coded data is very small, there is no significant change in either bin in the embedding process, so the after-embedding histogram maintains a shape similar to that of the original histogram, even if the embedding capacity is very large.
Considering all of the above, the algorithm uses r = 2 as the preferred parameter, which is used in the following experiments.

Analysis of the Simple Improvement on Data Coding
Table 1 compares the proportions of '0's and '1's in the original data and in the coded data. Herein, the original data are generated randomly, resulting in a uniform distribution of '1's and '0's. For the original data, the proportions of '0's and '1's are each approximately 50%. After coding, the proportion of '0's rises to approximately 84.6%, and the proportion of '1's is reduced to approximately 15.4%. The proportion of '1's after coding is thus 34.6 percentage points lower than before coding. The decrease in the number of '1's reduces the embedding distortion and therefore improves the image quality.
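As a quick check of the figures quoted above, the drop in the share of '1's can be written both as an absolute and as a relative change. Note that each proportion is relative to the length of its own stream, and the coded stream need not have the same length as the original one.

```latex
\[
50\% - 15.4\% = 34.6 \text{ percentage points},
\qquad
\frac{50 - 15.4}{50} = 0.692 \approx 69.2\%.
\]
```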
Table 1. Comparison between the original data and the coded data.

Figure 10 compares the bpp-PSNR performance of the sparse coding in [13] and the improved sparse coding. Figure 10a,b show the bpp-PSNR comparisons when the secret data is an image and a pseudo-random bit stream, respectively. The secret image is the classic watermark image 'flower', shown in Figure 10c, and the cover image is 'Lena'. When the secret data is an image, the proposed algorithm with the simple improvement on the sparse representation achieves better PSNR than that of [13]; the PSNR is increased by approximately 4 dB. This is because the probability of consecutive '0's in the image data is too high, so the data coded by the original sparse representation is too long. When the secret data is a random binary stream of '0's and '1's, the bpp-PSNR performances of the two algorithms are basically the same, as shown in Figure 10b.

Anti-Steganalysis of the Proposed Framework
In order to demonstrate the trace concealment performance of our proposed framework, it is compared with the traditional HS algorithm and the skipping HS algorithm. The experiments are conducted on the Bossbase1.01 database. Taking the widely used test images Lena and Baboon as examples, Figure 11 illustrates the results at embedding rates (ER) of 0.1 bpp, 0.4 bpp, and 0.7 bpp.
In Figure 11, the blue line shows the hypothesized Laplacian model. The prediction error histogram of the original image is smooth, and the distribution of the prediction error is well fitted by the Laplacian model. However, the prediction error histograms of the traditional HS algorithm and the skipping HS algorithm become more irregular and deviate greatly from the hypothesized Laplacian model. At low embedding rates, the histograms can barely maintain a shape similar to that of the original histogram. At higher embedding rates, the middle of the histogram, which should exhibit an obvious peak, becomes a large flat area ('flat ground'). The embedding traces can be easily identified, so it is difficult to ensure the security of the data hidden in the image. In contrast, the prediction error histogram of our proposed trace concealment algorithm is relatively smooth and resembles the original histogram. Even though the prediction error histogram inevitably becomes flatter after data embedding, it still preserves the bell shape resembling the Laplacian distribution, thus achieving the purpose of concealing the embedding traces. The trace concealment effect in Baboon is not as good as that in Lena because, for such complex textured images, adjacent bins of the prediction error histogram contain similar numbers of prediction errors; it is therefore more difficult to keep the histogram smooth after hiding data. Even so, the proposed trace concealment algorithm performs better than the other two algorithms.
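One simple way to quantify how far a prediction error histogram departs from a Laplacian shape is sketched below. This is only an illustration of the idea and is not the steganalysis method of [16]: the function names are hypothetical, the Laplacian parameters are the standard maximum-likelihood estimates (median for the location, mean absolute deviation for the scale), and the continuous density is evaluated at integer bins as an approximation.

```python
import math
from collections import Counter
from statistics import median


def laplace_fit(errors):
    """Maximum-likelihood Laplacian parameters (location mu, scale b)."""
    mu = median(errors)
    b = sum(abs(e - mu) for e in errors) / len(errors)
    return mu, b


def laplace_deviation(errors):
    """Sum over bins of |empirical frequency - fitted Laplacian density|."""
    mu, b = laplace_fit(errors)
    if b == 0:
        raise ValueError("all prediction errors are identical")
    n = len(errors)
    hist = Counter(errors)
    total = 0.0
    for k in range(min(errors), max(errors) + 1):
        model = math.exp(-abs(k - mu) / b) / (2 * b)
        total += abs(hist.get(k, 0) / n - model)
    return total


# Synthetic example: a peaked histogram versus one with a flattened centre.
counts = {k: round(1000 * math.exp(-abs(k) / 2)) for k in range(-10, 11)}
peaked = [k for k, c in counts.items() for _ in range(c)]

flat_counts = dict(counts)
centre = sum(counts[k] for k in range(-2, 3)) // 5
for k in range(-2, 3):
    flat_counts[k] = centre          # imitate the 'flat ground' effect
flat = [k for k, c in flat_counts.items() for _ in range(c)]

dev_peaked = laplace_deviation(peaked)
dev_flat = laplace_deviation(flat)
```

In this synthetic example the flattened histogram yields the larger deviation, which mirrors the 'flat ground' observation: the more the central peak is levelled, the worse the Laplacian fit.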
Next, we demonstrate the performance of the proposed framework against steganalysis targeting HS-based methods.
Wang et al. [16] designed an effective steganalysis method for HS, which shows excellent steganalysis ability when the 'flat ground' phenomenon occurs in the after-embedding histogram and when the histogram conforms to the Laplacian distribution.
Table 2 lists the identification error rates of the steganalysis method [16] for the traditional HS algorithm, the skipping HS algorithm, and the proposed algorithm at different embedding rates. The general Bossbase1.01 database is employed in the experiment, and 1000 images of size 512 × 512 are randomly selected from its 10,000 images. The identification error rate is the proportion of stego images, among all test images after data hiding, that the steganalysis method fails to identify as containing hidden data. A high identification error rate indicates that an algorithm resists steganalysis well. As Table 2 shows, the proposed trace concealment algorithm resists steganalysis better than the traditional HS algorithm and the skipping HS algorithm.

Capacity-Distortion Performance Comparison
The proposed RDH scheme focuses not only on improving anti-steganalysis ability but also on capacity-distortion performance. It addresses the two embedding distortions that decrease the image quality of HS algorithms: the expanding distortion is reduced by the improved data-coding method, and the shifting distortion is reduced by the improved skipping embedding strategy. Moreover, due to the improved data-coding method, the skipping distance is increased, which further highlights the advantage of skipping embedding in reducing shifting distortion. Figure 12 shows the capacity-distortion performance comparison with the rhombus predictor with sorting technique (Sachnev et al.) [5], LP (Dragoi et al.) [6], the ridge regression predictor (Wang et al. (Ridge)) [9], the multiple histogram algorithm (Wang et al. (MH)) [10], and the trace concealment algorithm (Dong et al.) [19]. The slightly worse capacity-distortion performance of the proposed algorithm at 0.1 bpp in Figure 12 arises because, at low embedding rates, no expandable bin in the prediction error histogram has a count close to the length of the payload. Therefore, the selected embedding position may be closer to the middle of the histogram, resulting in more unnecessary shifting distortion. This degradation at low embedding rates does not occur in all images but usually in smooth images. As can be seen from Figure 12, compared with the state-of-the-art algorithms that focus only on improving capacity-distortion performance, our proposed algorithm achieves similar capacity-distortion performance and superior anti-steganalysis ability. Moreover, compared with the trace concealment algorithm in [19], which loses too much image quality, our algorithm also achieves superior image quality.

Figure 2. Training pixels of the target pixel x(i,j): (a) LP and (b) Ridge regression predictor.

Figure 3a-e respectively illustrate the training pixels and prediction context of a target pixel that belongs to the weak edge, horizontal edge, vertical edge, positive diagonal edge, and negative diagonal edge. The shaded pixels are the training pixels of the target pixel, and the 4 neighboring pixels in the green border are the prediction context of the target pixel.

Figure 3. Training pixels and prediction context of the target pixel x(i,j): (a) Weak edge; (b) Horizontal edge; (c) Vertical edge; (d) Positive diagonal edge; and (e) Negative diagonal edge.

Mathematics 2022, 10, 4249

Figure 5. The comparison of the traditional HS methods with the proposed framework.

Figure 6. The data extraction and image recovery process.

Figure 10. Comparisons between the sparse coding in [13] and the improved sparse coding: (a) secret data is in image form; (b) secret data is generated randomly; and (c) flower.
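$$e'(i,j) = \begin{cases} e_e, & \text{if } e(i,j) = e_e \ \&\ cdata = 0 \\ e_s, & \text{if } e(i,j) = e_e \ \&\ cdata = 1 \\ e(i,j) + 1, & \text{if } e(i,j) \ge e_s \ \&\ e_e < e_s \\ e(i,j) - 1, & \text{if } e(i,j) \le e_s \ \&\ e_e > e_s \\ e(i,j), & \text{else} \end{cases}$$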

Table 2. Performance of resisting steganalysis for HS algorithms.