Joint Adaptive Coding and Reversible Data Hiding for AMBTC Compressed Images

Abstract: This paper proposes a joint coding and reversible data hiding method for absolute moment block truncation coding (AMBTC) compressed images. Existing methods use a predictor to predict the quantization levels of AMBTC codes. Equal-length indicators, secret bits, and prediction errors are concatenated to construct the output code stream. However, the quantization levels might not correlate highly with their neighbors for predictive coding, and the use of equal-length indicators might impede the coding efficiency. The proposed method uses a reversible integer transform to represent the quantization levels by their means and differences, which is advantageous for predictive coding. Moreover, the prediction errors are better classified into symmetrical encoding cases using an adaptive classification technique. The lengths of the indicators and of the bits representing the prediction errors are properly assigned according to the classified results. Experiments show that the proposed method offers the lowest bitrate for a variety of images when compared with existing state-of-the-art works.


Introduction
The rapid development of internet technology has made data security a key consideration for data transmission and content protection. While most digital data can be transmitted over the internet, transmitted data are also exposed to the risks of illegal access or interception. A solution to these problems is to employ a data hiding method, in which secret data are embedded into a digital medium to conceal the presence of the embedment, or to protect the transmitted contents. Data hiding methods for images can be classified into irreversible [1][2][3] and reversible [4][5][6][7] methods. The cover images of both methods are distorted to obtain stego images. However, the distortions of irreversible methods are permanent, and thus, they are not suitable for applications where no distortion is allowed. In contrast, reversible methods recover the original images from the distorted versions after extracting the secret data. Since the original image can be completely recovered, reversible data hiding (RDH) methods have special applications, and much research has been devoted to investigating data hiding methods of this type.
Reversible data hiding methods can be applied to images in the spatial or compressed domains. Spatial domain methods embed data into a cover image by altering its pixel values. The difference expansion scheme proposed by Tian [8] and the histogram shifting scheme proposed by Ni et al. [9] are two well-known methods of this type. Many recent works [4][5][6][7] have exploited the merits of [8,9] to propose improved methods with better embedding efficiency. In contrast, compressed domain methods modify the coefficients of compressed codes [10], or use the joint neighborhood coding (JNC) scheme [11] to embed data bits into the compressed code stream. Up to now, RDH methods have been applied to vector quantization (VQ) [12,13], joint photographic experts group (JPEG) [14], and absolute moment block truncation coding (AMBTC) [15][16][17][18][19] compressed images. AMBTC [20] is a lossy compression method which re-represents an image block by a bitmap and two quantization levels. As AMBTC requires insignificant compaction costs while achieving satisfactory image quality, some researchers have investigated RDH methods and their applications [21][22][23] in this format.
Zhang et al. [16] in 2013 proposed an RDH method for AMBTC compressed images. In their method, the different quantization levels are classified into eight cases, which in turn are recorded by three-bit indicators. By re-representing the AMBTC codes, secret data are embedded into the compressed code stream. Sun et al. [17] also proposed an efficient method using JNC. This method predicts pixel values and obtains prediction errors, which are then classified into four cases. Secret bits, two-bit indicators, and prediction errors are concatenated to construct the output code stream. Hong et al. [18] in 2017 modified the prediction and classification rules of Sun et al.'s work and proposed an improved method with a lower bitrate. However, their method also uses two-bit indicators to represent the four categories of prediction errors. The use of equal-length indicators might not efficiently represent the encoding cases, leading to an increase in bitrate. In 2018, Chang et al. [19] also proposed an interesting method to improve Sun et al.'s work. In their method, a neighboring quantization level of the current quantization level, x, is selected as the prediction value, p, to predict x. The prediction error is obtained by performing the exclusive OR operation between the binary representations of p and x, and the result is then converted to its decimal representation. Since the calculated prediction errors are all positive, no sign bits are required to record them.
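Concretely, Chang et al.'s sign-free prediction error reduces to a bitwise XOR of the two values. A minimal illustrative sketch (the function name is ours, not from [19]):

```python
def xor_prediction_error(x, p):
    """Chang et al.-style prediction error: XOR the binary
    representations of x and p and read the result as a decimal.
    The result is always non-negative, so no sign bit is needed."""
    return x ^ p

# Values that differ in few low-order bits give small errors,
# e.g. 150 (10010110) vs. 148 (10010100) differ only in bit 1.
small_error = xor_prediction_error(150, 148)
```

Note that, unlike the signed error x − p, the XOR error can be large even when x and p are numerically close (e.g. 128 vs. 127), which is one trade-off of this representation.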
In this paper, we propose a new RDH method for AMBTC compressed codes. With the proposed method, the AMBTC compressed image can be recovered, and the embedded data can be completely extracted. Unlike the aforementioned methods that embed data into quantization levels, the proposed method transforms the quantization levels into means and differences, which are used for carrying data bits. Moreover, the classifications of prediction errors are adaptively assigned, and varied-length indicators are utilized to effectively reduce the bitrate. The experimental results show that the proposed method achieves the lowest bitrate when compared to recently published works. The rest of this paper is organized as follows. Section 2 briefly introduces the AMBTC compression method as well as Hong et al.'s work. Section 3 presents the proposed method. The experimental results and discussions are given in Section 4, while concluding remarks are addressed in the last section.

Related Works
This section briefly introduces the AMBTC compression method. Hong et al.'s recently published work is also introduced in this section.

AMBTC Compression Technique
The AMBTC compression technique partitions an image, I, into blocks, {I_i}_{i=1}^N, of size r × r, where N is the total number of blocks, and calculates the average value, v_i, of each block. AMBTC uses the lower and upper quantization levels, denoted by a_i and b_i, respectively, and a bitmap, B_i, to represent the compressed code of block I_i. a_i is calculated by rounding the average of the pixels in I_i with values smaller than v_i, and b_i by rounding the average of the pixels with values larger than or equal to v_i. The j-th bit in B_i, denoted by B_{i,j}, is set to '1' if I_{i,j} ≥ v_i; otherwise, B_{i,j} is set to '0'. Every block is processed in the same manner, and the final AMBTC code, {a_i, b_i, B_i}_{i=1}^N, is obtained. The decoding of AMBTC codes can be done by simply replacing the bits valued '0' by a_i and the bits valued '1' by b_i. A simple example is given below. Let I_i = [23, 45, 46, 47; 43, 47, 77, 80; 88, 86, 78, 90; 78, 80, 68, 70] be an image block to be compressed, where the semicolon denotes the end-of-row operator.

Hong et al.'s Method
In 2017, Hong et al. [18] proposed an efficient data hiding method for AMBTC compressed images. Their method uses the median edge detection (MED) predictor to obtain the prediction values as follows. Let x be the to-be-predicted upper or lower quantization level, and let x_u, x_l, and x_ul be the quantization levels to the upper, left, and upper left of x, respectively. The MED predictor predicts x using the following rule to obtain the prediction value, p:

p = min(x_u, x_l), if x_ul ≥ max(x_u, x_l);
p = max(x_u, x_l), if x_ul ≤ min(x_u, x_l);
p = x_u + x_l − x_ul, otherwise. (1)

The prediction error, e, calculated by e = x − p, is classified into four categories using the centralized error division (CED) technique. Two-bit indicators, '00', '01', '10', and '11', are used to specify the four corresponding encoding cases. The first case occurs when e = 0. In this case, the prediction error needs no bits to be recorded; the n-bit secret data, {s_j}_{j=1}^n, and the indicator '00' are concatenated, and the result, {s_j}_{j=1}^n||00, is output to the code stream, CS. The second case occurs if 1 ≤ e ≤ 2^β, in which the prediction error is additionally recorded with β bits. The remaining two cases handle negative and large prediction errors, the latter being recorded by the eight-bit binary representation of x. The detailed encoding and decoding procedures can be found in [18].
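The MED rule above (also used by the proposed method) can be sketched as a small function; this is a minimal illustrative version, with the three neighbors passed in explicitly:

```python
def med_predict(x_u, x_l, x_ul):
    """Median edge detection (MED) predictor.

    x_u, x_l, x_ul: the values above, to the left, and above-left
    of the element being predicted."""
    if x_ul >= max(x_u, x_l):
        return min(x_u, x_l)       # edge suspected: take the smaller neighbor
    if x_ul <= min(x_u, x_l):
        return max(x_u, x_l)       # edge in the other direction: larger neighbor
    return x_u + x_l - x_ul        # smooth region: planar (gradient) prediction
```

For example, with neighbors (x_u, x_l, x_ul) = (86, 90, 88) the region is smooth and the planar branch predicts 86 + 90 − 88 = 88.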

Proposed Method
The existing RDH methods for AMBTC compressed images [16][17][18][19] all perform predictions on the quantization levels of AMBTC codes. The prediction errors are classified into categories, and the secret data bits, indicators, and prediction errors are then concatenated and output to the code stream. For the JNC technique, better prediction often results in a smaller bitrate, since fewer bits are required to record the prediction errors. However, the existing methods all perform the prediction on the two quantization levels, which might not correlate enough to obtain a good prediction result. Moreover, the classification of prediction errors does not consider their distributions and assigns the same classification rules to different images. Improper classification might lead to a significant increase in bitrate. The proposed method transforms pairs of quantization levels into their means and differences using the reversible integer transform (RIT). The transformed means and differences are generally more correlated than the quantization levels, and thus, they are more suitable for predictive coding. The prediction errors are adaptively classified according to their distribution, ensuring that fewer bits are required to record them. The techniques used in this paper are separately introduced in the following two subsections.

Reversible Integer Transform of Quantization Levels
A pair of quantization levels (a_i, b_i) can be transformed into their mean, m_i, and difference, d_i, using the transform:

m_i = ⌊(a_i + b_i)/2⌋, d_i = b_i − a_i, (2)

where ⌊·⌋ is the floor operator. The transformation in Equation (2) is reversible, since the pair of quantization levels, (a_i, b_i), can be recovered by the inverse transform:

a_i = m_i − ⌊d_i/2⌋, b_i = m_i + ⌈d_i/2⌉. (3)

Since the mean operation has a tendency to reduce noise, and the difference operation tends to reduce the variation in neighboring quantization levels, the transformed means and differences are often more correlated than the quantization levels. Therefore, the prediction performed on the means and differences will improve the prediction accuracy. Figure 1 shows the distribution of prediction errors for the Lena and Baboon images using the MED predictor. In this figure, the red line with circle marks is the result when the predictor is applied on the upper and lower quantization levels, whereas the histogram is the prediction result when the predictor is applied on the transformed means and differences. As seen from Figure 1, the peak of the histogram is higher than that of the red circle line for both the Lena and Baboon images. In fact, experiments on other test images show similar trends, indicating that the transformed means and differences are more amenable to predictive coding.
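The transform pair can be sketched as follows (a minimal illustration, assuming b_i ≥ a_i, as AMBTC guarantees, so that d_i is non-negative):

```python
def rit_forward(a, b):
    """Map a quantization-level pair (a, b) to (mean, difference)."""
    m = (a + b) // 2          # floor of the average
    d = b - a                 # non-negative difference, since b >= a
    return m, d

def rit_inverse(m, d):
    """Recover (a, b) from (m, d); floor and ceiling split d around m."""
    a = m - d // 2            # subtract the floor half of the difference
    b = m + (d + 1) // 2      # add the ceiling half of the difference
    return a, b
```

The floor/ceiling split is what makes the integer transform lossless: the two halves of d always sum back to d exactly, whether d is even or odd.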

Adaptive Case Classification Technique
In Hong et al.'s CED technique [18], the encoding of a prediction error, e, is classified into four cases. If 1 ≤ e ≤ 2^β (case 2), e is encoded by β bits plus a two-bit indicator; therefore, a total of 2 + β bits are required to encode a case-2 prediction error. Figure 2 shows a typical prediction error histogram: the distribution has a sharp peak at zero and decays exponentially towards both sides. Each prediction error falling in case 2 of [18] requires β bits to record its value. However, the occurring frequencies of the prediction errors in the range 1 ≤ e ≤ 2^β vary significantly. Within this range, the occurring frequency is generally highest at e = 1 and decreases exponentially towards e = 2^β. Therefore, recording the prediction errors in this range with β bits, despite the unbalanced frequencies, is likely to increase the bitrate. The encoding of case-3 prediction errors in [18] has a similar problem.

We propose an adaptive case classification (ACC) technique to better classify the encoding cases. By dividing cases 2 and 3 of the CED technique into sub-cases, the overall encoding efficiency can be significantly increased. Let x be the to-be-encoded element and p be the prediction value of x obtained using the MED predictor. The prediction error, e, is calculated by e = x − p. As in the CED, the ACC classifies e = 0 into case 1, and no bits are required to record the prediction error. Case 2 in the proposed ACC technique is sub-divided into case 2a (1 ≤ e ≤ 2^α) and case 2b (2^α + 1 ≤ e ≤ 2^α + 2^β), where α and β (α < β) are the numbers of bits used to record the prediction errors of cases 2a and 2b, respectively. Similarly, case 3a (−2^α ≤ e ≤ −1) and case 3b (−(2^α + 2^β) ≤ e ≤ −(2^α + 1)) are the sub-cases of case 3, and the ACC uses α and β bits, respectively, to record the prediction errors of these two cases. Finally, case 4 occurs when e ≥ 2^α + 2^β + 1 or e ≤ −(2^α + 2^β + 1); in this case, we directly record the eight-bit binary representation of x. We use two- or three-bit indicators to indicate the encoding cases. The indicator used for each case, the associated range of prediction errors, and the corresponding code length
(excluding the secret data) are shown in Table 1.
Table 1. Indicators and the ranges of the cases.

Case | Indicator | Range of Prediction Errors | Code Length
1    | '00'      | e = 0                                        | 2
2a   | '010'     | 1 ≤ e ≤ 2^α                                  | 3 + α
2b   | '011'     | 2^α + 1 ≤ e ≤ 2^α + 2^β                      | 3 + β
3a   | '100'     | −2^α ≤ e ≤ −1                                | 3 + α
3b   | '101'     | −(2^α + 2^β) ≤ e ≤ −(2^α + 1)                | 3 + β
4    | '11'      | e ≥ 2^α + 2^β + 1 or e ≤ −(2^α + 2^β + 1)    | 2 + 8
The advantages of using the ACC over the CED technique are as follows. When 1 ≤ e ≤ 2^α, β bits are required to record the prediction errors in the CED technique; however, the ACC technique only requires α bits plus one extra bit to distinguish the sub-case. As a result, the ACC technique saves β − α − 1 bits if 1 ≤ e ≤ 2^α. When 2^α + 1 ≤ e ≤ 2^β, both techniques require β bits to record the prediction errors; however, the ACC technique requires one more bit to distinguish the sub-case. When 2^β + 1 ≤ e ≤ 2^α + 2^β, the CED technique requires eight bits to record the prediction errors, whereas the ACC technique requires only β bits; therefore, in this range, the ACC technique saves 8 − β − 1 bits. Although the analyses above focus on case 2 of Hong et al.'s CED technique, the results also hold for case 3 of their method, since the prediction error histograms are often symmetrically distributed.
Since all of the prediction errors can be obtained by pre-scanning the transformed means and differences, the bitrate can be pre-estimated prior to the encoding and embedding processes. Therefore, the best parameters, α and β, that minimize the bitrate can be simply obtained by varying their values within a small range while pre-estimating the bitrate. According to our experiments, searching 1 ≤ α ≤ 3 and 3 ≤ β ≤ 5 is sufficient to obtain the best values. Figure 3 shows the effect of the proposed ACC technique applied on the Lena image with α = 2 and β = 4. In this figure, the red dots and red circles represent the positions where the proposed method saves β − α − 1 = 1 and 8 − β − 1 = 3 bits when recording the prediction errors, respectively. In contrast, the blue cross marks represent the positions where one more bit is required in the proposed ACC technique to record the prediction errors. As shown in this figure, the number of red dots and circles is significantly larger than that of the blue cross marks, indicating that the ACC technique indeed effectively reduces the bitrate.
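The per-error code length and the exhaustive parameter search described above can be sketched as follows (illustrative code, not the authors' implementation; the code lengths follow the two- and three-bit indicator scheme of Table 1):

```python
def acc_code_length(e, alpha, beta):
    """Bits needed to record prediction error e: indicator plus payload,
    excluding the embedded secret bits."""
    if e == 0:
        return 2                                     # case 1: two-bit indicator only
    if 1 <= abs(e) <= 2 ** alpha:
        return 3 + alpha                             # cases 2a/3a: 3-bit indicator + alpha bits
    if 2 ** alpha + 1 <= abs(e) <= 2 ** alpha + 2 ** beta:
        return 3 + beta                              # cases 2b/3b: 3-bit indicator + beta bits
    return 2 + 8                                     # case 4: 2-bit indicator + raw 8-bit value

def best_parameters(errors):
    """Pre-scan all prediction errors and pick the (alpha, beta) pair,
    with alpha < beta, that minimizes the total code length."""
    candidates = [(a, b) for a in range(1, 4) for b in range(3, 6) if a < b]
    return min(candidates,
               key=lambda p: sum(acc_code_length(e, *p) for e in errors))
```

Because the search space is only a handful of (α, β) pairs, this pre-estimation adds negligible cost compared to the encoding pass itself.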
Symmetry 2018, 10, x FOR PEER REVIEW 6 of 14

The Embedding Procedures
In this section, we describe the embedment of secret data, S, in detail.Let a b B  be the AMBTC codes to be embedded.An empty array, CS, is initialized to store the code stream.The MED predictor described in Section 2.2 is employed to predict the visited elements.The step-by-step embedding procedures are listed as follows.
Step 1: Transform the quantization levels, m  , and differences, 1 { } N i i d  , using the RIT technique, as described in Section 3.1.

The Embedding Procedures
In this section, we describe the embedment of the secret data, S, in detail. Let {a_i, b_i, B_i}_{i=1}^N be the AMBTC codes to be embedded. An empty array, CS, is initialized to store the code stream. The MED predictor described in Section 2.2 is employed to predict the visited elements. The step-by-step embedding procedures are listed as follows.
Step 1: Transform the quantization levels, {a_i}_{i=1}^N and {b_i}_{i=1}^N, into means, {m_i}_{i=1}^N, and differences, {d_i}_{i=1}^N, using the RIT technique, as described in Section 3.1.
Step 2: Visit {m_i}_{i=1}^N and {d_i}_{i=1}^N sequentially, and use the ACC technique described in Section 3.2 to find the best α and β values, such that the estimated code length is minimal.
Step 3: Convert the elements in the first row and the first column of {m_i}_{i=1}^N and {d_i}_{i=1}^N to their eight-bit binary representations. The converted results are appended to the CS.
Step 4: Scan the remaining elements in {m_i}_{i=1}^N in the raster scanning order and use the MED predictor (Equation (1)) to predict each scanned m_i. Let p_i be the prediction value, and calculate the prediction error e_i = m_i − p_i.
Step 5: Extract the n-bit secret data, {s_j}_{j=1}^n, from S. In accordance with e_i, one of the encoding cases is applied:
Case 1: If e_i = 0, append the bits {s_j}_{j=1}^n||00 to the code stream, CS.
Case 2a: If 1 ≤ e_i ≤ 2^α, append the bits {s_j}_{j=1}^n||010||(e_i − 1)_2^α to the CS, where (y)_2^k is the k-bit binary representation of y.
Case 2b: If 2^α + 1 ≤ e_i ≤ 2^α + 2^β, append the bits {s_j}_{j=1}^n||011||(e_i − 2^α − 1)_2^β to the CS.
Case 3a: If −2^α ≤ e_i ≤ −1, append the bits {s_j}_{j=1}^n||100||(−e_i − 1)_2^α to the CS.
Case 3b: If −(2^α + 2^β) ≤ e_i ≤ −(2^α + 1), append the bits {s_j}_{j=1}^n||101||(−e_i − 2^α − 1)_2^β to the CS.
Case 4: Otherwise, append the bits {s_j}_{j=1}^n||11||(m_i)_2^8 to the CS.
Step 6: Repeat Steps 4 and 5 until all of the means, {m_i}_{i=1}^N, are processed.
Step 7: Use the same procedures listed in Steps 4-6 to encode the differences, {d_i}_{i=1}^N, and append the encoded results to the CS.
Step 8: Append the bitmap, {B_i}_{i=1}^N, to the CS, to construct the final code stream, CS_f.
We use a simple example to illustrate the embedding procedures of the proposed method. Suppose the AMBTC codes consist of 3 × 3 upper and lower quantization levels, as shown in Figure 4a,b, respectively. Let S = {s_j}_{j=1}^8 = '10111001' be the secret data, and suppose α = 2 and β = 4. Firstly, we convert the quantization levels into means, {m_i}_{i=1}^9, and differences, {d_i}_{i=1}^9, using Equation (2). The converted results are shown in Figure 4c,d.
To encode the means {m_i}_{i=1}^9 and embed S, the elements in the first row and the first column (the elements numbered 1, 2, 3, 4, and 7) of {m_i}_{i=1}^9 are converted to their eight-bit binary representations. In this example, we only focus on the encoding of the elements m_5, m_6, m_8, and m_9 (the elements numbered 5, 6, 8, and 9). To encode m_5 = 88, we use Equation (1) to predict m_5 and obtain the prediction value, p_5 = 88. Therefore, we have the prediction error, e_5 = 0, which is classified into case 1, and the associated indicator is '00'. Two bits, {s_j}_{j=1}^2 = '10', are extracted from S, and {s_j}_{j=1}^2 and the case 1 indicator are concatenated to obtain the codes for the fifth element, 10||00. The prediction value, p_6, of m_6 = 87 is 86, and thus, e_6 = 87 − 86 = 1. Since 1 ≤ e_6 ≤ 2^α = 4, case 2a is applied to encode m_6; the third and fourth bits of S, '11', are extracted, and the codes for the sixth element are 11||010||00. The remaining elements, m_8 and m_9, can be encoded in a similar manner, but the detailed steps are omitted in this example.
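The worked example can be reproduced with a short sketch of the case-encoding step (illustrative code; the indicator assignments follow the decoding example in Section 3.4). The prediction values p_5 = 88, p_6 = 86, and p_8 = 90 are taken from the text; p_9 = 90 is an assumption made here purely so that m_9 = 62 falls into case 4, since its true value is not given:

```python
def to_bits(value, width):
    """Fixed-width binary representation (y)_2^k of a non-negative integer."""
    return format(value, '0{}b'.format(width))

def acc_encode(m, p, secret, alpha=2, beta=4):
    """Encode one mean m with prediction p, prepending its secret chunk."""
    e = m - p
    if e == 0:
        return secret + '00'                                          # case 1
    if 1 <= e <= 2 ** alpha:
        return secret + '010' + to_bits(e - 1, alpha)                 # case 2a
    if 2 ** alpha + 1 <= e <= 2 ** alpha + 2 ** beta:
        return secret + '011' + to_bits(e - 2 ** alpha - 1, beta)     # case 2b
    if -2 ** alpha <= e <= -1:
        return secret + '100' + to_bits(-e - 1, alpha)                # case 3a
    if -(2 ** alpha + 2 ** beta) <= e <= -(2 ** alpha + 1):
        return secret + '101' + to_bits(-e - 2 ** alpha - 1, beta)    # case 3b
    return secret + '11' + to_bits(m, 8)                              # case 4: raw value

means = [88, 87, 83, 62]            # m_5, m_6, m_8, m_9 from Figure 4c
preds = [88, 86, 90, 90]            # p_9 = 90 is an assumption for illustration
secret = ['10', '11', '10', '01']   # S = '10111001' split into two-bit chunks
stream = ''.join(acc_encode(m, p, s) for m, p, s in zip(means, preds, secret))
```

The resulting stream matches the one decoded in Section 3.4, segment by segment: 1000, 1101000, 101010010, and 011100111110.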

The Extraction and Recovery Procedures
Once the receiver has the final code stream, CS_f, and the embedding parameters, α, β, and n, the embedded secret data, S, can be extracted, and the original AMBTC codes, {a_i, b_i, B_i}_{i=1}^N, can be recovered. The detailed extraction and recovery procedures are listed as follows.
Step 1: Prepare empty arrays {a_i}, {b_i}, {m_i}, {d_i}, and S for storing the reconstructed lower quantization levels, upper quantization levels, means, differences, and secret data, respectively.
Step 2: Read eight bits sequentially from CS_f and convert them into integers. Place the converted integers in the first row and the first column of {m_i}_{i=1}^N.
Step 3: Read the next n bits from CS_f and append them to S.
Step 4: Visit the unrecovered means, m_i, in {m_i}_{i=1}^N in the raster scanning order. Use Equation (1) to predict m_i and obtain the prediction value, p_i.
Step 5: Read the next two bits, {t_j}_{j=1}^2, from CS_f to identify the encoding case; if {t_j}_{j=1}^2 is '01' or '10', read one more bit to distinguish the sub-case. According to the identified case, read the following α, β, or eight bits to obtain the prediction error, e_i, or the raw value of m_i, and recover m_i = p_i + e_i (or take the eight-bit value directly in case 4).
Step 6: Perform Steps 3-5 until all of the means, {m_i}_{i=1}^N, are recovered.
Step 7: Recover the differences, {d_i}_{i=1}^N, and extract the data bits embedded in {d_i}_{i=1}^N. The procedures are similar to Steps 2-6.
Step 8: Extract the remaining bits in CS_f and rearrange them to obtain {B_i}_{i=1}^N. Transform {m_i}_{i=1}^N and {d_i}_{i=1}^N into {a_i}_{i=1}^N and {b_i}_{i=1}^N using Equation (3); the original AMBTC codes, {a_i, b_i, B_i}_{i=1}^N, can then be reconstructed.
We continue the example given in Section 3.3 to illustrate the extraction of S and the recovery of the quantization levels. Suppose the first row and the first column of the means {m_i}_{i=1}^9 have been recovered (see Figure 4c), and the code stream, CS, to be decoded is 1000||1101000||101010010||011100111110. The first two bits, '10', are read from CS and placed in an empty array, S. The MED predictor is used to predict m_5; we have p_5 = 88. The next two bits are '00', and thus, the prediction error is e_5 = 0; therefore, m_5 = p_5 = 88. The next two bits, '11', are read and appended to S. The next two bits, '01', are read from CS, and the next bit read is '0'; the indicator '010' identifies case 2a, and the following two bits, '00', give e_6 = 0 + 1 = 1; thus, m_6 = p_6 + e_6 = 86 + 1 = 87. The next two bits, '10', are read and appended to S. The indicator bits '101' identify case 3b, and the following four bits, '0010', give the error magnitude 2 + 2^α + 1 = 7. Therefore, m_8 = p_8 − e_8 = 90 − 7 = 83, where p_8 is obtained using the MED predictor. Finally, two bits, '01', are read from CS and appended to S. The next two bits are '11', and thus, m_9 can be obtained by converting the next eight bits to their decimal representation; we have m_9 = 62. Therefore, the means, {m_i}_{i=1}^9, are recovered (Figure 4c). Similarly, the differences, {d_i}_{i=1}^9, can also be recovered (Figure 4d). Finally, according to Equation (3), the original upper and lower quantization levels can be recovered from {m_i}_{i=1}^9 and {d_i}_{i=1}^9. The recovered quantization levels are shown in Figure 4a,b.
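The decoding walkthrough can be checked with a small sketch (illustrative code). In the real procedure each prediction comes from already-recovered neighbors; here the prediction values from the text are hardcoded, and p_9 is never needed because m_9 is a case-4 element:

```python
def acc_decode(stream, preds, n=2, alpha=2, beta=4):
    """Split out the secret chunks and recover the means from the code stream."""
    pos, secret, means = 0, '', []
    for p in preds:
        secret += stream[pos:pos + n]; pos += n           # n secret bits precede each element
        if stream[pos:pos + 2] == '00':                   # case 1: e = 0
            pos += 2
            means.append(p)
        elif stream[pos:pos + 2] == '11':                 # case 4: raw eight-bit value
            pos += 2
            means.append(int(stream[pos:pos + 8], 2)); pos += 8
        else:                                             # three-bit indicator: sub-cases
            ind = stream[pos:pos + 3]; pos += 3
            width = alpha if ind in ('010', '100') else beta
            offset = 1 if ind in ('010', '100') else 2 ** alpha + 1
            e = int(stream[pos:pos + width], 2) + offset; pos += width
            means.append(p + e if ind in ('010', '011') else p - e)
    return secret, means

stream = '1000' + '1101000' + '101010010' + '011100111110'
secret, means = acc_decode(stream, preds=[88, 86, 90, None])
```

Running this on the example stream reproduces the recovered means 88, 87, 83, and 62, and reassembles the secret S = '10111001'.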

Experimental Results
In this section, we conduct several experiments and compare the results with other state-of-the-art works to evaluate the performance of the proposed method. The six 512 × 512 grayscale images shown in Figure 5, including Lena, Tiffany, Jet, Peppers, Stream, and Baboon, were used as the test images to generate the AMBTC codes. These images can be obtained from [24].

The peak signal-to-noise ratio (PSNR) metric was used to measure the quality of the AMBTC compressed images. To make a fair comparison, the pure bitrate metric, defined by BR_p = (|CS_f| − |S|) / (total number of pixels of the original image), was used to measure the embedding performance, where |CS_f| and |S| are the lengths of the final code stream, CS_f, and the secret data, S, respectively. In fact, the pure bitrate measures the number of bits required to record a pixel of the original image; therefore, this metric should be as low as possible.
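The pure bitrate can be computed as sketched below (a trivial helper, assuming, as described above, that the embedded payload is discounted from the code stream length):

```python
def pure_bitrate(cs_len, secret_len, width, height):
    """BR_p: bits needed per original pixel, after subtracting the
    embedded secret payload from the final code stream length."""
    return (cs_len - secret_len) / (width * height)

# A 512 x 512 image whose code stream is 524,288 bits with no payload
# costs exactly 2 bits per pixel.
example_brp = pure_bitrate(524288, 0, 512, 512)
```

Subtracting |S| is what makes the comparison fair: a method that inflates its code stream merely to carry more secret bits does not get credit for a lower storage cost.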

Performance Evaluation of the Proposed Method
Figure 6a,b shows the pure bitrate, BR_p, versus the parameter β under various α for the Lena and Baboon images. As seen from the figures, setting α = 1 and β = 3 for the Lena image achieved the lowest bitrate, whereas α = 3 and β = 4 had to be set for the Baboon image to achieve the best result. This is because the prediction error histogram of the Lena image is sharper than that of the Baboon image, forcing the ACC technique to select smaller α and β values; in contrast, the histogram of the Baboon image is flatter, and thus, larger α and β are selected. Table 2 gives the best α and β values for the six test images. Notice that the values of α and β tended to be smaller for smooth images than for complex images. The proposed method classifies prediction errors into four encoding cases, and each case requires a different number of bits to record the prediction error. Figure 7 shows the distributions of these cases when the ACC technique is applied on the transformed means with α = 1 and β = 3. In this figure, the red circles, blue crosses, green dots, and black squares represent means encoded using cases 1, 2a or 3a, 2b or 3b, and 4, respectively.
Figure 7 shows that the smooth parts of the Lena images are filled with red circles and blue crosses, indicating that they are encoding using case 1, 2a, or 3a.The texture areas or edges of the Lena image are filled with black squares, meaning that the complex parts of the Lena image are encoded by case 4. It is interesting to note that most of the green dots are in the vicinity of the black squares.This indicates that the areas occupied by the green dots are less complex than those of black squares and thus, the corresponding means are encoded by case 2b or 3b.
The proposed method uses the RIT technique to convert quantization levels into means and differences, and the ACC technique is then applied to encode the transformed values.To see how these two techniques affect the encoding performance, the pure bitrate is plotted in Figure 8 for all test images under various combinations of techniques.In this figure, the lines labeled 'CED' indicate that the CED technique with 4   was applied on the prediction errors obtained from predicting the quantization levels.In contrast, 'CED + RIT' represents that the CED was applied on the prediction errors obtained from the proposed RIT technique.The line labeled 'ACC' indicates that the ACC technique was applied on the two quantization levels, and 'ACC + RIT' is the result when both ACC and RIT techniques have been applied.For the ACC technique, the best  and  listed in Table 2 were used.Table 2 gives the best α and β values for the six test images.Notice that the values of α and β tended to be smaller for smooth images than those of complex images.The proposed method classifies prediction errors into four encoding cases, and each case requires a different number of bits to record the prediction error.Figure 7 shows the distributions of these cases when the ACC technique is applied on the transformed means with α = 1 and β = 3.In this figure, the red circles, blue crosses, green dots, and black squares representing the corresponding means are encoded using case 1, 2a or 3a, 2b or 3b, and 4, respectively.
Figure 7 shows that the smooth parts of the Lena images are filled with red circles and blue crosses, indicating that they are encoding using case 1, 2a, or 3a.The texture areas or edges of the Lena image are filled with black squares, meaning that the complex parts of the Lena image are encoded by case 4. It is interesting to note that most of the green dots are in the vicinity of the black squares.This indicates that the areas occupied by the green dots are less complex than those of black squares and thus, the corresponding means are encoded by case 2b or 3b.
The proposed method uses the RIT technique to convert quantization levels into means and differences, and the ACC technique is then applied to encode the transformed values.To see how these two techniques affect the encoding performance, the pure bitrate is plotted in Figure 8 for all test images under various combinations of techniques.In this figure, the lines labeled 'CED' indicate that the CED technique with β = 4 was applied on the prediction errors obtained from predicting the quantization levels.In contrast, 'CED + RIT' represents that the CED was applied on the prediction errors obtained from the proposed RIT technique.The line labeled 'ACC' indicates that the ACC technique was applied on the two quantization levels, and 'ACC + RIT' is the result when both ACC and RIT techniques have been applied.For the ACC technique, the best α and β listed in Table 2 were used.
As seen from Figure 8, the bitrate obtained by the 'CED' method was the largest.When the CED together with the RIT technique was applied, all the bitrates of the six test images were reduced, meaning that the transformed means and differences are more suitable for predictive coding than the original quantization levels.However, when the ACC technique was applied on the quantization levels, the reduction in bitrate was more significant, indicating that the ACC technique reduces the bitrate considerably.The best result was achieved when both ACC and RIT techniques were applied.It is interesting to note that the improvement of smooth images is more significant than that of complex images when using the proposed 'ACC + RIT' technique.The reason for this is that the proposed method saves 1     bits when the predication errors are categorized as case 2a or 3a but requires one additional bit when they are categorized as case 2b or 3b.Since the prediction errors categorized as case 2a or 3a are more frequent in smooth images than complex images, we can infer that the proposed method works better for smooth images.As seen from Figure 8, the bitrate obtained by the 'CED' method was the largest.When the CED together with the RIT technique was applied, all the bitrates of the six test images were reduced, meaning that the transformed means and differences are more suitable for predictive coding than the original quantization levels.However, when the ACC technique was applied on the quantization levels, the reduction in bitrate was more significant, indicating that the ACC technique reduces the bitrate considerably.The best result was achieved when both ACC and RIT techniques were applied.It is interesting to note that the improvement of smooth images is more significant than that of complex images when using the proposed 'ACC + RIT' technique.The reason for this is that the proposed method saves 1     bits when the predication errors are categorized as case 2a or 3a 
but requires one additional bit when they are categorized as case 2b or 3b.Since the prediction errors categorized as case 2a or 3a are more frequent in smooth images than complex images, we can infer that the proposed method works better for smooth images.As seen from Figure 8, the bitrate obtained by the 'CED' method was the largest.When the CED together with the RIT technique was applied, all the bitrates of the six test images were reduced, meaning that the transformed means and differences are more suitable for predictive coding than the original quantization levels.However, when the ACC technique was applied on the quantization Symmetry 2018, 10, 254 12 of 14 levels, the reduction in bitrate was more significant, indicating that the ACC technique reduces the bitrate considerably.The best result was achieved when both ACC and RIT techniques were applied.It is interesting to note that the improvement of smooth images is more significant than that of complex images when using the proposed 'ACC + RIT' technique.The reason for this is that the proposed method saves β − α − 1 bits when the predication errors are categorized as case 2a or 3a but requires one additional bit when they are categorized as case 2b or 3b.Since the prediction errors categorized as case 2a or 3a are more frequent in smooth images than complex images, we can infer that the proposed method works better for smooth images.

Comparison with Other Works
In this section, we compare the proposed method with some related works that have been published recently, including Zhang et al.'s [16], Sun et al.'s [17], Hong et al.'s [18], and Chang et al.'s [19] methods. In Zhang et al.'s method, eight cases for coding the differences are implemented to achieve the best result. In Sun et al.'s and Hong et al.'s methods, the best embedding parameters are set to obtain the lowest bitrate, as suggested in their original papers. In Chang et al.'s method, the exclusive OR operation is employed to generate the prediction errors. In the proposed method, the best α and β that minimize the bitrate are selected for embedment. All the settings ensured that the best performance could be achieved for each compared method. The results are shown in Table 3. In Table 3, the PSNR metric measures the image quality of the AMBTC compressed images. The payload was designed such that each quantization level carries two data bits, apart from methods that can only carry one bit per quantization level (e.g., Zhang et al.'s method). The embedding efficiency is defined by |S| / |CS_f|, which indicates how many secret data bits can be carried per bit of code stream; therefore, the larger the embedding efficiency, the better the performance. We also compared the pure bitrate of all the methods. The results show that the proposed method achieves the highest embedding efficiency with the lowest pure bitrate. This is due to the subtle transformation of the quantization levels into means and differences, and the application of predictive coding together with the ACC technique in the transformed domain. In contrast, the other compared methods all perform the prediction on the original quantization levels, and their encoding cases are not adaptively designed. Therefore, their pure bitrates are higher than that of the proposed method.
To further evaluate the performance, we also performed the test on 200 grayscale images with a size of 512 × 512 obtained from [25].To make a better comparison, we obtained the pure bitrate of each method, and sorted the pure bitrates obtained by the proposed method in ascending order.
The bitrates of the other compared methods were then rearranged using the sorted order of the proposed method. The results are shown in Figure 9. Since the pure bitrate of a smooth image is lower than that of a complex one, we can surmise that the images on the left-hand side of Figure 9 are smoother than those on the right-hand side. As seen from Figure 9, the reduction in pure bitrate was more significant for smooth images than for complex ones. Nevertheless, the proposed method offered the lowest pure bitrate for all of the 200 test images, indicating that the proposed RIT and ACC techniques are indeed more efficient than other state-of-the-art works.

Conclusions
In this paper, we proposed an efficient data hiding method for AMBTC compressed images. The quantization levels were first transformed into means and differences. The adaptive classification technique was then applied to classify the prediction errors into four cases, with two sub-cases each, to efficiently encode the prediction errors. Because the classification range is determined by the distribution of the prediction errors, the size of the resultant code stream is effectively reduced. The experimental results show that the proposed method not only successfully recovers the original AMBTC codes and extracts the embedded data, but also provides the lowest bitrate when compared to other state-of-the-art works.

Figure 1. Comparison of the distribution of prediction errors.
A case-2 prediction error e is encoded by β bits plus a two-bit indicator; therefore, a total of β + 2 bits are required to encode a case-2 prediction error. Figure 2 shows a typical prediction error histogram: the distribution has a sharp peak at zero and decays exponentially towards both sides.

Figure 2. Case classification of centralized error division (CED) and the proposed adaptive case classification (ACC).


where  and  (   ) are the number of bits used to record the prediction errors of cases 2a and 2b, respectively.Similarly, case 3a ( ) are the subcases of case 3, and the ACC respectively uses  and  bits to record the prediction errors of these two cases.Finally, case 4 occurs when 2

Figure 3. Bit-saving comparisons of the ACC and CED techniques.
sequentially, and use the ACC technique described in Section 3.2 to find the best α and β values such that the estimated code length is minimal. Step 3: Convert the elements in the first row and the first column of {m_i}_{i=1}^{N} to their eight-bit binary representations. The converted results are appended to the CS. Step 4: Scan the remaining elements of {m_i}_{i=1}^{N} in raster scanning order and use the MED predictor (Equation (1)) to predict each scanned m_i. Let p_i be the prediction value and calculate the prediction error e_i = m_i − p_i.
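Step 4 relies on the MED predictor. For reference, the standard median edge detector (as used in JPEG-LS) can be sketched as follows, where left, top, and top_left are the causal neighbors of the current mean:

```python
def med_predict(left: int, top: int, top_left: int) -> int:
    """Median edge detector (MED) prediction from causal neighbors."""
    if top_left >= max(left, top):
        # A vertical/horizontal edge is likely; pick the smaller neighbor.
        return min(left, top)
    elif top_left <= min(left, top):
        return max(left, top)
    else:
        # Smooth region: planar (gradient) prediction.
        return left + top - top_left

print(med_predict(88, 90, 92))  # -> 88
print(med_predict(88, 90, 86))  # -> 90
print(med_predict(88, 90, 89))  # -> 89
```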

and append the encoded result to the CS. Step 8: Append the bitmap, {B_i}_{i=1}^{N}, to the CS to construct the final code stream, CS_f. We use a simple example to illustrate the embedding procedure of the proposed method. Suppose the AMBTC codes consist of 3 × 3 upper and lower quantization levels, as shown in Figure 4a,b, respectively. Firstly, we convert the quantization levels into means and differences using Equation (2). The converted results are shown in Figure 4c,d.
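Equation (2) itself is not reproduced in this excerpt; the following sketch shows a common reversible integer (mean/difference) transform of the kind described, under the assumption that the mean is the floor of the average and the difference is a − b:

```python
def rit_forward(a: int, b: int):
    """Transform quantization levels (a, b) into (mean, difference)."""
    m = (a + b) // 2   # floor of the average
    d = a - b          # signed difference
    return m, d

def rit_inverse(m: int, d: int):
    """Recover the original quantization levels exactly."""
    a = m + (d + 1) // 2
    b = a - d
    return a, b

# The round trip is lossless for all integer pairs, e.g.:
print(rit_forward(90, 85))                 # -> (87, 5)
print(rit_inverse(*rit_forward(90, 85)))   # -> (90, 85)
```

Because the inverse recovers (a, b) exactly, predictive coding can operate on the typically smaller, more correlated (m, d) values without sacrificing reversibility.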

Figure 4. Quantization levels and transformed means and differences. (a) Upper quantization levels; (b) Lower quantization levels; (c) Means and (d) Differences.

To embed S, the elements in the first row and the first column (elements numbered 1, 2, 3, 4, and 7) are converted to their eight-bit binary representations. In this example, we only focus on the encoding of the elements m_5, m_6, m_8, and m_9 (the elements numbered 5, 6, 8, and 9). To encode m_5 = 88, we use Equation (1) to predict m_5 and obtain the prediction value p_5 = 88. Therefore, we have the prediction error e_5 = 0, which is classified into case 1. The secret bits and the indicator are concatenated to obtain the codes for the fifth element, 10||00.


Figure 6. Comparisons of pure bitrate under various combinations of α and β.

Figure 7. The distribution of encoding cases for the Lena image.

Figure 8. Pure bitrate comparisons of combinations of techniques.


Table 1. Indicators and the range of cases.
Therefore, m_6 is encoded using case 2a. The next α = 2 bits, {t_j}_{j=1}^{2} = '00', are read from CS, from which we obtain e_6 = 1. Since p_6 = 86, we have m_6 = p_6 + e_6 = 87. The next two bits are read from CS and appended to S. Because the next three bits from CS are '101', m_8 is encoded using case 3b. The next β = 4 bits, {t_j}_{j=1}^{4} = '0010', are read from CS, from which we obtain e_8. The next two bits are read from CS and appended to S.

Table 2. The combination of α and β to achieve the lowest bitrates.


Table 3. Comparisons with other works.