An Improved Integer Transform Combining with an Irregular Block Partition

After an in-depth study of existing reversible data hiding (RDH) methods based on Alattar's integer transform, we find that the commonly used way of obtaining the difference value list of an image block may lead to high embedding distortion. To this end, we propose an improved RDH method based on Alattar's transform. Firstly, an irregular block partition method, which makes full use of the high correlation between two neighboring pixels, is proposed to increase the embedding performance. Specifically, each image block is composed of a center pixel and several pixels surrounding it. The difference value list is then created by using the center pixel to predict each surrounding pixel. Since the center pixel is highly correlated with each pixel surrounding it, a sharp difference value histogram is generated. Secondly, the mean value of an image block in Alattar's integer transform is invariant under embedding, and can therefore be used to improve the estimation of a block's local complexity. Finally, two-layer embedding is incorporated into our scheme to optimize the embedding performance. Experimental results show that our method is effective.


Introduction
In the field of information security, data hiding has been widely developed to hide secret bits in cover images for copyright protection, image authentication, etc. However, most data hiding techniques focus only on the correct extraction of the hidden data, while neglecting lossless recovery of the cover image. Lossless recovery of cover images is required in applications such as remote sensing, law enforcement and archive management. Reversible data hiding (RDH), a special kind of data hiding, was introduced to satisfy the requirements of these applications. Its distinguishing feature is that the original image can be completely recovered without any distortion after the embedded bits are extracted.
In the past decade, RDH has been extensively studied and a large number of RDH schemes have been proposed. Here, we categorize the existing RDH schemes into five main classes according to the techniques they use: lossless compression [1], difference expansion (DE) [2], histogram shifting (HS) [3], prediction-error expansion (PEE) [4] and integer transform [5][6][7][8]. In the early stage of RDH development, lossless compression was the most frequently used technique, in which secret bits are embedded into the vacant space generated by compressing the image. However, lossless compression cannot provide high embedding capacity, and the demand for high embedding capacity became more and more urgent. In 2003, Tian proposed an RDH method called DE, which greatly improved the embedding capacity [2]. The key idea of DE-based schemes is to expand the difference value between two adjacent pixels to create a vacant least significant bit (LSB), and then embed one data bit into this empty LSB. HS was first proposed by Ni et al., who select the maximum and minimum points of the histogram for embedding [9]. The maximum modification in HS is one grayscale unit. Afterwards, some improvements on HS were proposed [10]. After studying DE in depth, Thodi et al. found that DE can be further improved to obtain a sharper difference histogram. To this end, Thodi et al. introduced prediction into DE, so that a sharper prediction-error histogram is generated, which helps to further improve the embedding performance. Later, various studies on how to improve the prediction accuracy were carried out [11][12][13][14][15][16][17][18][19][20][21][22][23][24][25]. The schemes based on integer transform share the feature that more than two pixels are grouped into a unit, and this unit is then processed to carry data via an integer transform [5][6][7][8].
Compared with Alattar's transform [5], Weng et al.'s method [8] preselects the blocks located in smooth regions for data embedding. Therefore, the smoothness estimation of blocks is very important for increasing the embedding performance. Alattar's method has the characteristic that the mean value of a block is kept unaltered after embedding. Weng et al. made full use of this characteristic and utilized the mean value of a block along with the pixels surrounding this block to estimate the block's smoothness. Thus, the application of the invariant mean value helps to increase the embedding performance. In Weng et al.'s method, there is still room for improvement. They obtain the difference list by calculating the difference values between every two neighboring pixels in a block. Since two neighboring pixels are highly correlated, a sharply distributed difference histogram is generated. However, in the process of calculating the watermarked pixels, they overlook that the accumulated sum of difference values can be very large and cannot be ignored, especially when most difference values within a block are positive integers or most of them are negative integers. More importantly, the accumulated sum results in large modifications to the original pixels. That is to say, the difference between the original and the watermarked pixels is very large. As an extension of Weng et al.'s method, we propose an improved RDH method based on Alattar's transform. Firstly, an irregular block partition method is proposed, which makes full use of the high correlation between two neighboring pixels while reducing the embedding distortion. Specifically, each image block is composed of a center pixel and its surrounding pixels. The difference value list is then created by using the center pixel to predict each pixel surrounding it. Since the center pixel is highly correlated with its surrounding pixels, a sharp difference value histogram is generated. Secondly, the mean value of an image block in Alattar's integer transform is invariant under embedding and can therefore be used to improve the estimation of a block's local complexity. Finally, two-layer embedding is incorporated into our scheme to optimize the embedding performance. Extensive experimental results show that the proposed method has better embedding performance.

Weng et al.'s Method
Here, we give a brief introduction to Weng et al.'s method. In their method, the original image I is partitioned into non-overlapping image blocks of size n. All the pixels in a block are arranged into a one-dimensional pixel list x by scanning odd rows from left to right and even rows from right to left (see Figure 1). The mean value x̄ of x and the n − 1 difference values are calculated as in Equation (1), where x_1, x_2, ..., x_{n−1}, x_n are the n pixels of the pixel list x, and x̄ is the mean value of x. The inverse transform of Equation (1) is formulated in Equation (2), where y = {y_1, y_2, ..., y_{n−1}, y_n} represents the watermarked pixel list. From Equation (2), we know that the difference value d_i, i ∈ {1, ..., n − 1}, is magnified by n − i times in the process of calculating y_1 when d_i is generated by two neighboring pixels. This is the reason why difference values calculated between every two neighboring pixels result in high embedding distortion. The rest of this section explains this in detail.
If each of the n − 1 difference values is expanded to carry one bit of data, then Equation (2) can be converted into Equation (3). In order to illustrate the effect of difference expansion on embedding distortion, Equation (3) needs to be further simplified. To this end, the sum of the pixels, Σ x_i, is replaced by k_2 + n x̄, which simplifies Equation (3) into Equation (4), where the notation A_t denotes an additional term. After simplification, it can be seen clearly that Equation (4) is composed of two parts: an embedding process and an additional term A_t. In the embedding process, x̄ is regarded as a prediction value to predict each pixel within a block. The value of A_t is related to the distortion (i.e., the magnitude by which each pixel in x is modified). From the formula for A_t, we know that A_t is mainly determined by the to-be-embedded bits w_i and by k_2, where i ∈ {1, 2, ..., n − 1}. In the rest of this paper, for simplicity of description, the range of i is no longer restated. Specifically, since k_2 ∈ {0, 1, ..., n − 1}, A_t reaches its maximum when all to-be-embedded bits are 1. Even if A_t does not reach this maximum, A_t is still very large because w_i is magnified by n − i times. This maximum is determined predominantly by the block size n. In short, the bigger the block size n is, the larger A_t is, and the higher the distortion becomes.
According to the description above, blocks with a large size n cannot be used in Weng et al.'s method. Therefore, blocks of size 4 are utilized in their method. However, even if n is set to 4, the maximum of A_t is 3, which still leads to a large decline in PSNR. In addition, when n is 4, the maximum embedding rate only reaches 0.75 bpp (bits per pixel). In other words, there is still much room for improvement in Weng et al.'s method. To this end, our method focuses on increasing the block size while reducing the embedding distortion as much as possible by adopting a different way of obtaining the difference values.
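The transform discussed in this section can be sketched as follows. This is a minimal illustration rather than Weng et al.'s implementation: since Equations (1) to (3) are not reproduced here, it assumes the sign convention d_i = x_{i+1} − x_i, a floor-based mean, and the usual expansion d_i' = 2 d_i + w_i. All function names are hypothetical.

```python
def forward_neighbor(pixels):
    """Return (floor mean, differences between consecutive pixels).

    Assumption: d_i = x_{i+1} - x_i (the sign convention is not given above).
    """
    n = len(pixels)
    mean = sum(pixels) // n
    diffs = [pixels[i + 1] - pixels[i] for i in range(n - 1)]
    return mean, diffs


def inverse_neighbor(mean, diffs):
    """Rebuild the pixel list from the mean and the consecutive differences."""
    n = len(diffs) + 1
    # d_i is shared by n - i pixels, hence the weight (n - i) on d_i.
    weighted = sum((n - i) * d for i, d in enumerate(diffs, start=1))
    first = mean - weighted // n
    pixels = [first]
    for d in diffs:
        pixels.append(pixels[-1] + d)
    return pixels


def expand_all(pixels, bits):
    """Expand every difference to carry one bit (len(bits) must be n - 1)."""
    mean, diffs = forward_neighbor(pixels)
    expanded = [2 * d + w for d, w in zip(diffs, bits)]
    return inverse_neighbor(mean, expanded)


def recover_all(marked):
    """Extract the bits and restore the original pixel list."""
    mean, expanded = forward_neighbor(marked)
    bits = [d % 2 for d in expanded]    # Python's % is non-negative here
    diffs = [d // 2 for d in expanded]  # floor division undoes 2*d + w
    return inverse_neighbor(mean, diffs), bits
```

With the block [100, 102, 101, 105] and the bits [1, 0, 1], the sketch keeps the block mean unchanged and restores the block exactly, while the first and last pixels move by 3 and 4 grey levels, respectively, which is the accumulation effect described above.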

Alattar's Method
Alattar's integer transform is formulated in Equation (5), where the notation l represents the average value of an n-sized image block, and h_1, h_2, ..., h_{n−1} are the n − 1 difference values of this block.

The Proposed Scheme
According to the description in Section 2, difference values created from two neighboring pixels result in large embedding distortion. To resolve this problem, we need to change the way the difference values are obtained. However, in order to maintain high visual quality, the high correlation between two adjacent pixels needs to be preserved. To achieve this goal, an irregular block partition strategy is proposed in our method, in which each block is composed of a center pixel and its neighboring pixels. With the irregular block partition strategy, the center pixel is adjacent to each pixel surrounding it. After block partition, the center pixel is utilized to predict each pixel surrounding it, so that a sharp prediction-error histogram is generated. Finally, two-layer embedding is utilized to further increase the embedding performance. Here, we give a detailed description of each part of the proposed method (i.e., irregular block partition, block selection and two-layer embedding), and the performance analysis in Section 3.2 explains why our method can achieve better performance than Weng et al.'s.

Irregular Block Partition
The key idea of the irregular block partition is to design different partition methods for blocks of different sizes so as to increase the correlation between the center pixel and its neighbors. In our scheme, an n-sized block has its own particular partition strategy, and each block has a center pixel and its surrounding pixels, as shown in Figure 2. For a 4-sized image block, x_{1,1} is the center pixel, and x_{1,2}, x_{2,1} and x_{2,2} are the three surrounding pixels of x_{1,1} (Figure 2a). The blocks can also be a combination of 5-sized and 7-sized blocks (Figure 2b). In the rest of this paper, for simplicity, we use n = 5 to indicate the block partition in Figure 2b. For a 9-sized block, the center pixel x_{2,2} is surrounded by eight neighboring pixels (Figure 2c).
In fact, our method can also be used with a generic block size n. However, the resulting embedding performance may be unsatisfactory. To this end, we propose the irregular block partition. From Figure 2, one can observe that all three partition methods make full use of the high correlation between the center pixel and its neighboring pixels, so that good embedding performance can be achieved. An image I of a given size is partitioned into non-overlapping blocks according to one of the three partition methods in Figure 2. Each partition method yields its own highest PSNR at a given payload, so there are three PSNR candidates. Among these three candidates, the highest PSNR is taken as the final PSNR, and the block partition method corresponding to it is selected as the optimal block size.
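The selection rule above can be sketched as follows. This is an illustrative sketch under stated assumptions: images are flat lists of 8-bit grey levels, every candidate watermarked image carries the same payload, and the names `psnr` and `best_partition` are hypothetical. PSNR is computed with the standard definition 10 log10(255² / MSE).

```python
import math


def psnr(original, marked, peak=255):
    """Peak signal-to-noise ratio (dB) between two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, marked)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def best_partition(original, candidates):
    """Return the label of the candidate with the highest PSNR.

    `candidates` maps a partition label (e.g. "n=4", "n=5", "n=9") to the
    watermarked image obtained with that partition at the same payload.
    """
    return max(candidates, key=lambda name: psnr(original, candidates[name]))
```

For instance, `best_partition(img, {"n=4": a, "n=5": b, "n=9": c})` returns the label whose watermarked image is closest to `img` in the mean-squared-error sense.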
Figure 2. Irregular block partition: (a) a 4-sized block; (b) a 5-sized and 7-sized block; (c) a 9-sized block.

Performance Analysis
After block partition, for an n-sized (e.g., 4, 5 or 9) image block, the center pixel x_c is used to predict each pixel surrounding x_c, so that the n − 1 difference values d_1, d_2, ..., d_{n−1} are generated as in Equation (9). The inverse transform of Equation (9) is given in Equation (10). During the embedding process, each original difference value is compared with a predefined threshold pT_h to determine how it is modified. Specifically, if a difference value falls into [−pT_h, pT_h), it is suitable for expansion and is embedded with one bit of data. Otherwise, it is shifted by pT_h to reduce the embedding distortion (see Equation (11)). Substituting all modified difference values of Equation (11) into Equation (10) yields the watermarked pixel array y = {y_c, y_2, ..., y_{n−1}, y_n} in Equation (12), where y_c represents the modified center pixel.
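The expand-or-shift rule just described can be sketched for a single block as follows. This is a minimal sketch, not the authors' code: it assumes d_i = x_i − x_c, a floor-based block mean, expansion d' = 2d + w for d in [−pT_h, pT_h), and a shift of pT_h away from zero otherwise. The function names and the parameter `p_th` are hypothetical, and overflow or underflow checks are omitted.

```python
def embed_block(center, neighbors, bits, p_th):
    """Embed bits into one block; returns (y_c, watermarked neighbors).

    `bits` must hold at least as many bits as there are expandable differences.
    """
    n = len(neighbors) + 1
    mean = (center + sum(neighbors)) // n           # invariant under embedding
    bit_iter = iter(bits)
    modified = []
    for x in neighbors:
        d = x - center                              # assumed sign convention
        if -p_th <= d < p_th:
            modified.append(2 * d + next(bit_iter))  # expansion
        elif d >= p_th:
            modified.append(d + p_th)               # shift right
        else:
            modified.append(d - p_th)               # shift left
    y_c = mean - sum(modified) // n
    return y_c, [y_c + d for d in modified]


def extract_block(y_c, y_neighbors, p_th):
    """Return (x_c, original neighbors, extracted bits) for one block."""
    n = len(y_neighbors) + 1
    mean = (y_c + sum(y_neighbors)) // n
    diffs, bits = [], []
    for y in y_neighbors:
        d = y - y_c
        if -2 * p_th <= d < 2 * p_th:               # was expanded
            bits.append(d % 2)
            diffs.append(d // 2)
        elif d >= 2 * p_th:                         # was shifted right
            diffs.append(d - p_th)
        else:                                       # was shifted left
            diffs.append(d + p_th)
    x_c = mean - sum(diffs) // n
    return x_c, [x_c + d for d in diffs], bits
```

For a 9-sized block with center 100, neighbors [101, 99, 103, 100, 98, 120, 100, 102] and p_th = 3, six differences fall into [−3, 3) and carry one bit each, the two remaining differences are shifted, and `extract_block` restores both the block and the bits.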
Here, in order to explain clearly why the prediction method used in our scheme introduces less distortion, all the difference values within a block are assumed to be expanded to carry one bit of data each, so that our transform can be reorganized into a new equation, Equation (13), which can be simplified in the same way as Equation (3). To distinguish it from A_t, we use another notation, A_s, to denote the additional term. It can be clearly seen that the difference between Equation (4) and Equation (13) lies in the additional term. A_s is smaller than A_t because w_i in A_s is not magnified by n − i times. Since k_2 ∈ {0, 1, ..., n − 1} and w_i ∈ {0, 1}, the range of A_s is bounded independently of n. As mentioned above, A_t in Equation (4) depends on the block size n. When n is set to a large value, the distortion introduced by A_t is very large. In contrast, the bound on A_s does not depend on n: whatever n is, its visual effect on the host image is small and fixed. Even if the same n is used in Weng et al.'s method and in ours, the maximum of A_t is larger than that of A_s. Thanks to this advantage, n can be set to 3 × 3 in our method. A 9-sized block is then able to carry at most 8 data bits, so the maximal embedding rate approaches 8/9 ≈ 0.89 bpp.
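The comparison between the two additional terms can be checked numerically. The closed forms below are not copied from Equations (4) and (13); they are derived here from the transform definitions under the assumptions d_i = x_{i+1} − x_i for the neighboring-pixel scheme, d_i = x_i − x_c for the center-pixel scheme, k_2 = (Σ x_i) mod n, and full expansion d' = 2d + w. Under these assumptions, A_t = ⌊(2 k_2 + Σ (n − i) w_i) / n⌋ and A_s = ⌊(2 k_2 + Σ w_i) / n⌋. Function names are hypothetical.

```python
from itertools import product


def additional_term_neighbor(k2, bits):
    """A_t under the assumed neighboring-difference convention."""
    n = len(bits) + 1
    weighted = sum((n - i) * w for i, w in enumerate(bits, start=1))
    return (2 * k2 + weighted) // n


def additional_term_center(k2, bits):
    """A_s under the assumed center-pixel convention (no n - i weighting)."""
    n = len(bits) + 1
    return (2 * k2 + sum(bits)) // n


def max_terms(n):
    """Exhaustively search k2 and all bit patterns for the largest A_t, A_s."""
    best_t = best_s = 0
    for k2 in range(n):
        for bits in product((0, 1), repeat=n - 1):
            best_t = max(best_t, additional_term_neighbor(k2, bits))
            best_s = max(best_s, additional_term_center(k2, bits))
    return best_t, best_s
```

Under these assumptions, `max_terms(4)` gives a maximum A_t of 3, which agrees with the value quoted for n = 4 in Section 2, while the maximum A_s stays at 2 for both n = 4 and n = 9.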

Block Selection
Designing an accurate measure of local smoothness is an important factor in improving the embedding performance. In Alattar's integer transform, although each pixel in a block may be changed in the data embedding process, the mean value remains unaltered. The mean value x̄ represents the average value of all pixels in a block. Therefore, it can help to increase the evaluation accuracy of the local complexity. To this end, the mean value is utilized along with the neighborhood (see Figure 3) to evaluate the local complexity. The local complexity, denoted by ∆, is computed as in Equation (15), where u denotes the mean value of x̄ and the r + s + 1 neighboring pixels.
Figure 3. An r × s-sized block is marked in black, and its neighborhood is composed of r + s + 1 pixels marked in green, where n = r × s.
If the block complexity ∆ is smaller than a predefined threshold vT_h, the block is identified as lying within a smooth region. Otherwise, it is considered to be located in a rough region. It is well known that smooth blocks introduce lower distortion than complex blocks at the same payload. In order to keep the embedding distortion low, the blocks in complex regions are usually not used in the data embedding process. Smooth blocks cannot all be used for data embedding either, because of the overflow or underflow problem. Specifically, after data embedding, each modified pixel in a smooth block must fall into the range [0, 255]. Otherwise, overflow or underflow happens and this smooth block cannot be used for data embedding. In order to clearly identify these unused smooth blocks, a location map is created. Since the complex blocks can easily be identified by comparing ∆ with vT_h, it is not necessary to record them in this map. Therefore, the map is marked by 1 when x ∈ E_s, where E_s denotes the smooth blocks that can be embedded without overflow or underflow. Otherwise, the map is marked by 0 when x ∈ O_s1, where O_s1 = {x ∉ D : ∆ < vT_h}. After the location map is generated, it is losslessly compressed by an arithmetic encoder to create a compressed bitstream L, whose length is L_S.
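The block selection logic can be sketched as follows. Since Equation (15) is not reproduced above, this sketch assumes that ∆ is the mean squared deviation of x̄ and the r + s + 1 neighbors from their common mean u, and that the neighborhood consists of the r + 1 pixels in the column to the right of the block plus the s pixels in the row below it. Both choices, and the names `local_complexity` and `classify_block`, are assumptions for illustration. Whether a block is free of overflow and underflow is passed in as a flag.

```python
def local_complexity(img, top, left, r, s):
    """Variance-like complexity of an r x s block (assumed form of Delta)."""
    block = [img[top + i][left + j] for i in range(r) for j in range(s)]
    mean = sum(block) // len(block)              # invariant block mean
    right = [img[top + i][left + s] for i in range(r + 1)]
    below = [img[top + r][left + j] for j in range(s)]
    samples = [mean] + right + below             # mean plus r + s + 1 neighbors
    u = sum(samples) / len(samples)
    return sum((v - u) ** 2 for v in samples) / len(samples)


def classify_block(img, top, left, r, s, v_th, overflow_free):
    """Return the location-map entry: None (complex, not recorded), 1 or 0."""
    if local_complexity(img, top, left, r, s) >= v_th:
        return None                              # complex block, no map entry
    return 1 if overflow_free else 0
```

Because only the block mean and pixels outside the block enter the computation, the decoder obtains the same ∆ from the watermarked image as long as the neighbors are unmodified at decoding time.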

Two-Layer Embedding
The key idea of two-layer embedding is to perform the embedding process twice, using a different embedding mode each time. Here, we give an example to explain what an embedding mode is. In this example, the block size is set to 4, that is, n is equal to 2 × 2. When n is 4, there exist four modes, in each of which some pixels are excluded from block partition, while the remaining ones are involved in block partition. From Figure 4, it can be seen that, in each mode, all the pixels marked with white circles are used for data embedding, while the pixels marked with black circles are excluded from data embedding. More specifically, the pixels used for data embedding are partitioned into n-sized image blocks. In our method, a different mode is selected in each layer, which further increases the embedding performance. In practice, our scheme benefits significantly from two-layer embedding, which provides better embedding performance than one-layer embedding. With two-layer embedding, each layer carries only a part of the payload, so that the thresholds vT_h and pT_h can be set to smaller values. We use Lena and Barbara as test images to demonstrate the advantage of two-layer embedding. Table 1 compares the optimal thresholds, PSNR and payload size (bpp) of one-layer and two-layer embedding on Lena and Barbara.

Embedding and Extraction Procedures
This section describes the embedding and extraction procedures.

Embedding Procedure
According to the description above, some information, e.g., vT_h, pT_h and the compressed location map L, helps the decoder to recover the host image and extract the hidden data. Without it, reversibility cannot be ensured. Therefore, this information needs to be embedded into the host image for blind extraction. It is composed of the compressed location map L (L_S bits), vT_h (8 bits), pT_h (8 bits), the block size (4 bits for r and 4 bits for s), E (23 bits) and an end-of-symbol marker (EOS) (8 bits). Since the image may not be completely traversed in each embedding layer, some additional information needs to be recorded, e.g., the location (i.e., row number and column number) where the embedding process is terminated in each embedding layer. E is created to contain this additional information, and is composed of nine bits (row number), nine bits (column number), four bits (representing the type of embedding mode) and one bit (distinguishing one-layer from two-layer embedding, with 0 and 1 denoting one-layer and two-layer embedding, respectively). For example, for n ∈ {4, 9}, nine embedding modes are provided by employing 9-sized blocks, so that four bits are required. In addition, the EOS is put at the end of the overhead information. In general, the size of the overhead information is L_Σ = L_S + 55 bits.
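The bit budget above (8 + 8 + 4 + 4 + 23 + 8 = 55 fixed bits plus L_S) can be illustrated with a small packing sketch. The field widths come from the text, but the field order, the big-endian bit order, the EOS value and all names (`FIELDS`, `to_bits`, `pack_overhead`) are assumptions made only for illustration.

```python
# (name, width in bits); E is split into row, col, mode and two_layer (9+9+4+1).
FIELDS = [
    ("v_th", 8), ("p_th", 8), ("r", 4), ("s", 4),
    ("row", 9), ("col", 9), ("mode", 4), ("two_layer", 1),
    ("eos", 8),
]


def to_bits(value, width):
    """Big-endian list of `width` bits for a non-negative integer."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return [(value >> k) & 1 for k in reversed(range(width))]


def pack_overhead(location_map_bits, values):
    """Concatenate the compressed location map and the fixed-width fields."""
    bits = list(location_map_bits)
    for name, width in FIELDS:
        bits += to_bits(values[name], width)
    return bits
```

A call such as `pack_overhead(compressed_map, {"v_th": 12, "p_th": 3, "r": 3, "s": 3, "row": 511, "col": 0, "mode": 8, "two_layer": 1, "eos": 255})` returns a list whose length is the length of `compressed_map` plus 55.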
The size of the overhead information largely depends on the size of the compressed location map, L_S. Table 2 lists the size of the compressed location map L_S under different payload sizes for Lena and Baboon. As illustrated in Table 2, the location map can be losslessly compressed into a very short bitstream whatever the payload is. Even if the required payload is large, L_S is still very small. Therefore, the overhead information occupies a very small proportion of the payload. The detailed data embedding process is as follows.
Input: Host image I, block size r × s, threshold vT_h, threshold pT_h, to-be-embedded watermark.
Output: Watermarked image I_w.
(1) One-layer embedding
Data bits embedding
For each block in E_s, perform Equation (11) and Equation (12). Blocks without r + s + 1 adjacent pixels are ignored in the embedding procedure to ensure reversibility. We use WM_{<pT_h} to denote the number of data bits embedded into the host image, which equals the number of difference values belonging to [−pT_h, pT_h).

Overhead information embedding
The overhead information is obtained according to the description above. Suppose P_C denotes the required payload, which is partitioned into two parts corresponding to the first and second embedding layers, respectively. P_L stands for the to-be-embedded payload of the current layer, while WM_{<pT_h} − L_Σ represents the maximal embedding capacity. Firstly, P_L is embedded into the blocks in E_s according to the data bits embedding step. Secondly, for the first L_Σ modified pixels, we collect their LSBs (least significant bits) and append them to the payload P_L. In this way, their LSB positions become vacant and can be occupied by the overhead information. Finally, the rest of the payload P_L along with the L_Σ LSBs are embedded into the remaining blocks in E_s according to the data bits embedding step.
(2) Watermarked image obtaining
If the payload P_C can be satisfied by one-layer embedding, a watermarked image I_w is created after (1) is performed.
Otherwise, two-layer embedding is adopted to achieve the required payload P_C. The remaining payload is defined as P_L = P_C − (WM_{<pT_h} − L_Σ). We then set P_C = P_L and repeat (1) for the second-layer embedding.

Step 1: Overhead information extraction
The LSBs of the watermarked pixels are collected into a binary sequence χ. The EOS symbol is located in χ. All the bits before the EOS are decompressed by an arithmetic decoder so as to retrieve the original location map. The location map is recompressed to obtain L_S. The additional information after the L_S bits is then extracted field by field according to the length of each field.

Step 2: Data extraction and original image recovery
Once the row and column numbers at which the embedding procedure terminated are obtained, the last modified block is determined. To ensure reversibility, the data extraction process must be performed in the inverse order of data embedding. That is to say, the last modified block is processed first, while the first modified block is processed last.
The blocks whose local complexity ∆ belongs to [vT_h, +∞) are kept unchanged. The blocks whose local complexity ∆ falls into (−∞, vT_h) and whose corresponding entries in the location map are 0 also remain unchanged. When the entries of the blocks with ∆ in (−∞, vT_h) are recorded as 1, these blocks can be completely restored according to Equations (10) and (16). In addition, the blocks without r + s + 1 neighbors are skipped during the extraction process. After the original L_Σ LSBs are fully extracted in each embedding layer, they are used to replace the LSBs of the first L_Σ modified pixels of the data embedding process, so that these pixels are restored. If the extracted bit used for identifying one-layer or two-layer embedding is 0, the original image is correctly recovered after Steps 1 and 2 are performed. Otherwise, we repeat Steps 1 and 2 for the second-layer extraction.

Experimental Results
In the experiments, we compare our method with five RDH schemes proposed by Alattar [5], Wang et al. [6], Peng et al. [7], Weng et al. [8] and Luo et al. [16] so as to demonstrate that our method is effective. Figure 5 compares the capacity-distortion performance of the six RDH schemes on six images: Airplane, Baboon, Barbara, Goldhill, Sailboat and Lena.
By introducing block selection and irregular block partition, our method achieves better performance than Alattar's method, as the experimental results confirm. The integer transform proposed by Wang et al. can be regarded as a prediction process, in which the mean value of a block is used to predict each pixel in this block. Their method is able to achieve high visual quality because there is no additional term in Wang et al.'s scheme. However, when a block is used for data embedding, all the pixels must be expanded to carry data bits even if the distortion introduced by the modified pixels is huge. In addition, in their method it is very difficult to compress the location map that distinguishes the available blocks from the unused ones. In our method, the differences between every two pixels can be controlled by pT_h so as to keep high visual quality. Thanks to the invariant mean value, only the locations of the blocks identified as lying within smooth regions need to be recorded. Consequently, in comparison to Wang et al.'s method, our scheme requires a smaller location map, which can be compressed remarkably well. In short, our method achieves higher performance than Wang et al.'s.
As for Airplane, Barbara, Goldhill and Sailboat in Figure 5, our method obtains superior capacity-distortion performance at all embedding rates in contrast to Weng et al.'s. More specifically, our algorithm yields a slight increase in PSNR when low capacity is required, especially at embedding rates smaller than 0.3 bpp, and provides significantly better performance at embedding rates larger than 0.3 bpp. The experimental results on Baboon and Lena show that the distortion generated by our method is slightly higher than Weng et al.'s at low embedding rates. In contrast, at higher embedding rates (0.3 bpp for Baboon and 0.5 bpp for Lena), the performance exceeds that of Weng et al.'s method. In brief, the new algorithm achieves better embedding performance, especially when larger embedding capacity is required. This is because the additional term in our method is smaller, which yields less distortion and better visual quality at a given embedding rate. Apart from small-sized blocks, larger-sized ones are also adopted in this paper. In this way, the embedding capacity increases dramatically, while the distortion produced by the modification remains low for the pixels located in smooth regions. In addition, the prediction patterns employed in our method compensate for the weakness of large-sized blocks, i.e., weak intra-block correlation. In other words, more accurate prediction is achieved and a sharper difference histogram is formed.

Conclusions
In this paper, we propose an RDH method based on Alattar's integer transform. The irregular block partition is adopted to increase the correlation between two pixels within a block. Since the mean value remains unchanged before and after data hiding, it is utilized to improve the estimation accuracy of block complexity. In addition, two-layer embedding is introduced to enhance the embedding performance. The experimental results show that the proposed method achieves better embedding performance than prior state-of-the-art works.

Figure 1. Taking a 4 × 4-sized block as an example, a two-dimensional image block is arranged into a one-dimensional pixel list according to the arrow direction.

Figure 5. Comparison of capacity-distortion performance among six reversible data hiding (RDH) schemes on six images: Airplane, Baboon, Barbara, Goldhill, Sailboat and Lena.

Table 1. Performance comparison between one-layer embedding and two-layer embedding on Lena and Barbara.

Table 2. Size of the compressed location map L_S under different payload sizes for Lena and Baboon. Columns: payload (in bpp), L_S (in bits).
As an extension of Wang et al.'s method, Peng et al.'s method embeds more than one bit into each pixel in a block. Therefore, the problems in Wang et al.'s method are extended to Peng et al.'s. Thus, our method is superior to Peng et al.'s.