Lossless Data Hiding in VQ Compressed Images Using Adaptive Prediction Difference Coding

Sisheng Chen; Jui-Chuan Liu; Ching-Chun Chang; Chin-Chen Chang

doi:10.3390/electronics13173532

¹ School of Big Data and Artificial Intelligence, Fujian Polytechnic Normal University, Fuzhou 350300, China

² Department of Information Engineering and Computer Science, Feng Chia University, Taichung 40724, Taiwan

³ Information and Communication Security Research Center, Feng Chia University, Taichung 40724, Taiwan

* Authors to whom correspondence should be addressed.

Electronics 2024, 13(17), 3532; https://doi.org/10.3390/electronics13173532

This article belongs to the Special Issue Recent Advances in Information Security and Data Privacy

Abstract

Data hiding in digital images is an important covert communication technique. This paper studies lossless data hiding in the image compression domain. We present a novel lossless data hiding scheme for vector quantization (VQ) compressed images using adaptive prediction difference coding. A modified adaptive index rearrangement (AIR) is presented to rearrange a codebook and thus enhance the correlation of adjacent indices in the index tables of cover images. Then, a predictor based on the improved median edge detection is used to predict the indices while retaining the first index. The prediction differences are calculated using the exclusive OR (XOR) operation, and the vacancy capacity of each prediction difference type is evaluated. An adaptive prediction difference coding method based on the vacancy capacities of the prediction difference types is presented to encode the prediction difference table. As a result, the original index table is compressed, and the secret data are embedded into the vacated room. The experimental results demonstrate that the proposed scheme can reduce the pure compression rate compared with the related works.

Keywords:

lossless data hiding; vector quantization; prediction difference; adaptive encoding

1. Introduction

Data hiding in digital images is a technology that embeds additional data into cover images based on the redundancy of natural images. It is a technique used in information security to protect data from unauthorized access or modification. The typical application scenarios for data hiding are covert communication and image authentication. In covert communication, the secret data are transmitted under the cover of meaningful images or image coding streams, making them less likely to be suspected and intercepted. In image authentication, additional authentication data are embedded into original images to protect them from unauthorized modification without increasing the communication cost.

Over the past two decades, data-hiding technology has developed rapidly [1]. There are three main research directions: data hiding in the spatial domain [2,3,4,5], data hiding in the compression domain [6,7,8], and data hiding in encrypted images [9,10,11]. The data hiding schemes based on the spatial domain embed the secret data into cover images by directly modifying the pixel values. The classical techniques include least significant bit (LSB) substitution [12], difference expansion [2,13], histogram shifting [3,14], exploiting modification direction [4], matrix embedding [15], and so on [16,17,18]. To reduce communication and storage costs, digital images are usually compressed to smaller sizes. Therefore, data hiding schemes for compressed images are widely studied. These schemes perform their data embedding either in the image compression code or during the process of image compression. In this research topic, the most studied image compression methods are block truncation coding (BTC) [8], vector quantization (VQ) [19,20,21], and the joint photographic experts group (JPEG) standard [22,23,24]. When the image owner and the data hider are different entities, the image owner first encrypts the image to protect privacy and then sends the encrypted image to the data hider. The data hider then performs the data hiding in the encrypted image [25,26,27].

Compression domain-based data hiding is gaining increasing attention because it is more aligned with the storage and transmission formats of real-world images. Vector quantization (VQ) is a lossy compression technique that garners attention due to its high compression rate, considerable decompressed image quality, and simplicity. Data hiding in VQ compressed images includes lossy and lossless methods. A lossless data hiding method for VQ compressed images can reconstruct the original index table. In [6], Chang et al. proposed a lossless data-hiding method for VQ-compressed images via joint neighboring coding. They selected a neighbor index to predict the current index and then encoded the prediction errors to eliminate the redundancies in the index table. Later, to improve the embedding capacity, Wang and Lu [28] incorporated more neighboring indices into the prediction of the current index. However, the performance improvement of this method was limited. In [29], Kieu and Rudder applied the MED method [30] to predict an index and used indicators to label prediction-error classifications. The above methods mainly focused on improving the predictor. In [20], Hong et al. presented an adaptive index rearrangement (AIR) technique that sorts the codebook so that neighboring indices have similar values after rearrangement. They also proposed a new scheme combining the AIR technique, the least square estimator, and adaptive coding techniques to improve the embedding rate. Building on AIR, Li et al. [21] proposed a difference-index-based method using difference transformation and mapping. Zhang et al. [31] introduced the Tabu search algorithm to search for optimal rearranged indices and used a linear regression method to predict the indices. Compared to Hong et al. [20], Zhang et al. [31] achieved lower pure compression rates.

In this paper, we propose a novel lossless data hiding scheme using adaptive prediction difference coding to further lower the pure compression rate of the final VQ stream. We first adjust the AIR method to better suit the improved MED predictor by adding the downward diagonal direction when calculating the adjacency matrix, and by combining index selection and position determination in each iteration. Next, we use the improved MED method to predict the indices in raster scanning order based on the first original index. Then, we present an adaptive prediction difference coding method to compress the VQ stream and vacate the room for data embedding. The contributions of this paper are as follows. (1) A more efficient binary tree-based coding system is presented, and it is used for generating indicators to label different types of prediction differences. (2) An adaptive indicator assignment method, called adaptive prediction difference coding, is presented for prediction difference coding, which assigns indicators to different types of prediction differences based on the distribution of the defined vacancy capacity. (3) A novel VQ-based data hiding scheme using the adaptive prediction difference coding is proposed, which can provide a lower pure compression rate compared to related works.

The rest of the paper is organized as follows. Section 2 reviews some related works. In Section 3, we describe the details of the proposed scheme. The experimental results and analysis are shown in Section 4. We conclude this paper in Section 5.

2. Related Works

In this section, we first introduce the VQ image compression technique. Then, we briefly review related lossless data hiding schemes for VQ compressed images that are based on codebook rearrangement.

2.1. VQ-Based Image Compression

Vector quantization (VQ) [32,33] is an important image compression technique. In VQ-based image compression, the original image $I$ of size $W \times H$ is first partitioned into a sequence of non-overlapping blocks of size $a \times b$, that is, $I = \{IB_{i,j} \mid 1 \leq i \leq W/a,\ 1 \leq j \leq H/b\}$, where $IB_{i,j}$ denotes the image block in the $i$-th column and $j$-th row. Then, a codebook is established and represented as $B = \{CW_0, CW_1, \dots, CW_{L-1}\}$, where $CW_i = (cw_{i,1}, cw_{i,2}, \dots, cw_{i,a \times b})$ is a codeword, $L$ is the size of the codebook, $cw_{i,j} \in [0, 255]$, $0 \leq i \leq L-1$, and $1 \leq j \leq a \times b$. The pixel values in the image block $IB_{i,j}$ form an $(a \times b)$-dimensional vector, represented as $v = (p_1, p_2, \dots, p_{a \times b})$. In the image encoding phase, the distance between the pixel value vector $v$ and each codeword $CW_i$ is calculated as follows:

$$d(v, CW_i) = \left[\sum_{j=1}^{a \times b} (p_j - cw_{i,j})^2\right]^{\frac{1}{2}} \tag{1}$$

The pixel value vector $v$ is then mapped to the codeword $CW_{i^*}$ with the minimum distance, that is,

$$i^* = \underset{0 \leq i \leq L-1}{\arg\min}\ d(v, CW_i) \tag{2}$$

Finally, the codeword $CW_{i^*}$ is the quantization result of the image block $IB_{i,j}$, and its index $i^*$ can be represented by a $\log_2 L$-bit binary number. Once the quantization operation has been performed on all image blocks, an index table $IT$ is generated, which is the coding result of the original image. Figure 1 shows an illustration of VQ-based image encoding, where the codebook size $L$ is 256, $a$ is 4, and $b$ is 4. During image decoding, we obtain the codeword $CW_{i^*}$ from the codebook to reconstruct the corresponding image block. The codebook therefore plays a decisive role in determining the quality of the recovered image. A larger codebook results in better image quality after recovery, at the cost of less compression, since each index needs more bits.

Figure 1. An example of the VQ-based image encoding ($L = 256$).
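The encoding rule in Equations (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `nearest_codeword` and `vq_encode`, the toy codebook, and the toy image are hypothetical, and the code assumes that $a$ is the block width, $b$ is the block height, and ties are broken in favor of the smaller index.

```python
import math

def nearest_codeword(v, codebook):
    """Return the index i* of the codeword closest to v (Equations (1) and (2))."""
    best_i, best_d = 0, math.inf
    for i, cw in enumerate(codebook):
        # Euclidean distance between the pixel vector and codeword i.
        d = math.sqrt(sum((p - c) ** 2 for p, c in zip(v, cw)))
        if d < best_d:  # strict '<' keeps the smaller index on ties (assumption)
            best_i, best_d = i, d
    return best_i

def vq_encode(image, codebook, a, b):
    """Split an image (list of pixel rows) into blocks a wide and b high,
    and map every block to the index of its nearest codeword."""
    height, width = len(image), len(image[0])
    table = []
    for top in range(0, height, b):
        row = []
        for left in range(0, width, a):
            v = [image[top + r][left + c] for r in range(b) for c in range(a)]
            row.append(nearest_codeword(v, codebook))
        table.append(row)
    return table

# Hypothetical toy data: three codewords for 2 x 2 blocks and a 4 x 4 image.
codebook = [(0, 0, 0, 0), (100, 100, 100, 100), (255, 255, 255, 255)]
image = [
    [3, 1, 250, 255],
    [0, 2, 251, 254],
    [98, 101, 10, 5],
    [99, 102, 0, 7],
]
index_table = vq_encode(image, codebook, a=2, b=2)
print(index_table)
```

Each entry of `index_table` is the index of the codeword that replaces the corresponding block, which is the role the index table $IT$ plays in the text.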

2.2. Lossless Data Hiding Based on VQ Codebook Rearrangement

There are two classes of VQ-based lossless data hiding methods. The first class is to embed data in original index tables. In such schemes, a stego image can be decoded from a marked index table directly according to a standard codebook. The disadvantage of the first class’s method is that the embedding capacity is low because the redundancies of the original image are eliminated after VQ encoding. The second class is data embedding based on the rearranged codebook. It applies a codebook rearrangement technique to transfer the redundancies from a natural image to its index table and then compresses the index table to vacate the room for data embedding.

In [6], Chang et al. proposed the first VQ-based data hiding utilizing codebook rearrangement. Chang et al. [6] rearranged the codewords of a standard codebook by sorting the intensity of the codewords (SIC). The index table of a cover image, generated according to the rearranged codebook, retains a certain amount of redundancy from the original image. It generated the predicted value of an index from its neighboring indices: the one to the left, above, on the main diagonal, or on the secondary diagonal. The prediction process was carried out in raster order, and predefined indicators were used to label different cases of prediction errors. The index table generated according to the rearranged codebook was based on the SIC technique. It did not fully exploit the redundancies of the indices because it did not consider adjacent relationships in the cover image. To address this issue, Hong et al. [20] presented a different codebook rearrangement technique, called adaptive index rearrangement (AIR), which considered the adjacent relationships of indices. In the process of codebook rearrangement using the AIR technique, an occurrence frequency matrix was first constructed for a given index table. An element in the matrix denoted the occurrence frequency of two corresponding indices appearing in adjacent positions. Then, the new indices of the codewords were determined through step-by-step iteration based on the occurrence frequency matrix. In [20], the index table of a cover image, generated according to the rearranged codebook using AIR, improved the spatial correlations of indices and enhanced the efficiency of prediction error coding. In [21], Li et al. proposed a two-stage joint data embedding method. They first performed steganographic embedding on the standard index table and then calculated the difference index table based on the stego standard index table. In the second stage, data hiding was performed on the difference index table. 
In fact, existing methods can enhance the embedding capacity by using a steganographic embedding process similar to their first-stage embedding. Therefore, the focus of this method is to evaluate the performance of data hiding on the difference index table. Zhang et al. [31] introduced the Tabu search algorithm to find the optimal rearranged indices based on the occurrence frequency matrix. Compared to AIR, the Tabu search algorithm can slightly reduce the complexity of the index table. However, it can be seen that the key to reducing the pure compression rate of an image’s VQ stream lies in the index prediction method and the prediction difference coding method. In this paper, we present an adaptive indicator assignment method according to the vacancy capacities of prediction difference types. It improves the pure compression rate of data hiding based on VQ index table compression.

3. Proposed Scheme

In this section, a lossless data hiding scheme for VQ compressed images using adaptive prediction difference encoding is proposed. We first provide an overview of the proposed scheme. Afterward, we present the detailed procedures of data embedding, data extraction, and cover image recovery. Lastly, an example is illustrated for easier understanding of the proposed scheme.

3.1. Overview

The flowchart of the proposed lossless data hiding scheme in VQ index tables is outlined in Figure 2. We first conduct a rearrangement of the codewords in the codebook based on an adjusted AIR (adAIR) technique. According to the rearranged codebook, the original image is compressed into a VQ index table. Due to redundancy characteristics in the index table, we apply the improved MED (iMED) method to predict the index values and calculate the corresponding prediction differences between the original indices and the predicted indices. Based on the vacancy capacity of each prediction difference type, we present a prediction difference coding method to encode the prediction differences, thereby vacating embedding room for data hiding. The secret data are first encrypted and then embedded into the vacated room in the index table to generate the marked VQ index table. At the receiving end, the receiver can extract data based on the coding rules of the prediction differences and decode the prediction differences. Finally, we losslessly recover the VQ index table and then decompress it to reconstruct the cover image.

Figure 2. Flowchart of the proposed scheme.

3.2. Codebook Rearrangement

The original standard codebook is denoted by $B = \{CW_0, CW_1, \dots, CW_{L-1}\}$, where $CW_i = (cw_{i,1}, cw_{i,2}, \dots, cw_{i,a \times b})$ for $0 \leq i \leq L-1$. $L$ is the size of the codebook and is typically a power of 2. Here, we set $L$ to be $2^t$, where $t$ is a positive integer. According to the codebook, the original image $I$ with a size of $W \times H$ is compressed into an index table $IT_o$ using the VQ compression algorithm. Let $IT_o(i,j)$ denote the value in the index table $IT_o$ at the $i$-th row and $j$-th column, where $1 \leq i \leq W/a$, $1 \leq j \leq H/b$, and its range is $[0, 2^t - 1]$. It is the index value of the codeword corresponding to the image block located in the $i$-th row and $j$-th column. We use an adjusted AIR algorithm to rearrange the codebook. The details of the adjusted AIR are described as follows.

Step 1. Calculate the adjacency matrix $G$ with a size of $L \times L$, where the element $G(i,j)$ denotes the occurrence frequency of adjacency between index $i$ and index $j$ in the index table $IT_o$. The adjacent directions include the horizontal, vertical, and downward diagonal.

Step 2. Construct a new index list $SL$ to record the new order of the codewords. The original index list of the codebook is denoted as $OL$, which is represented as $OL = \{0, 1, \dots, L-1\}$. Let us assume that $G(i_1, i_2)$ is the maximum value of $G$. Then, $SL$ is initialized as $SL = \{i_1, i_2\}$, and $i_1$ and $i_2$ are deleted from $OL$. That is, $OL$ is updated as $OL = OL \setminus \{i_1, i_2\}$.

Step 3. Perform iterative loops until $OL$ is empty. For each iteration, use Equation (3) to search for an index $i_L^*$ from $OL$ that has the maximal occurrence of adjacency to the indices in the left two-thirds of $SL$, and the corresponding maximum value is represented as $\alpha_L$. Meanwhile, use Equation (4) to search for an index $i_R^*$ from $OL$ that has the maximal occurrence of adjacency to the indices in the right two-thirds of $SL$, and the corresponding maximum value is represented as $\alpha_R$. If $\alpha_L \leq \alpha_R$, then $i_R^*$ is appended to the right of $SL$ and $OL$ is updated by deleting $i_R^*$, that is, $OL = OL \setminus \{i_R^*\}$; otherwise, $i_L^*$ is added to the left of $SL$ and $OL$ is updated as $OL = OL \setminus \{i_L^*\}$.

$$i_L^* = \arg\max_{i \in OL} \sum_{j=1}^{\lfloor 2|SL|/3 \rfloor} G(i, SL(j)). \tag{3}$$

$$i_R^* = \arg\max_{i \in OL} \sum_{j=\lfloor |SL|/3 \rfloor}^{|SL|} G(i, SL(j)). \tag{4}$$

Here, we search for two indices that have the maximum occurrence of adjacency within the left or right two-thirds of the list $SL$. The goal is to select an index with a high occurrence of adjacencies to $SL$ while determining the side to which the selected index should be added. Under this asymmetric computation, we ensure that the selected index maintains a high correlation with the side it is added to, while also maintaining a high overall correlation.

The final new index list $SL$ is represented as $SL = \{\tau(0), \tau(1), \dots, \tau(L-1)\}$, where $\tau$ is a one-to-one mapping from $\{0, 1, \dots, L-1\}$ to $\{0, 1, \dots, L-1\}$. Therefore, the rearranged codebook is denoted by $B_s = \{CW_{\tau(0)}, CW_{\tau(1)}, \dots, CW_{\tau(L-1)}\}$. According to the rearranged codebook $B_s$, the original image $I$ is compressed into a new index table $IT_s$ using the VQ compression algorithm.
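Steps 1 to 3 can be sketched as follows. This is a minimal illustration rather than the authors' implementation, and it relies on several assumptions that the text does not settle: adjacency counts are symmetric, an index adjacent to itself is not counted (otherwise the seed pair could be a single index), the lower summation bound of Equation (4) is clamped to 1, and ties are resolved by list order. The names `adjacency_matrix`, `adjusted_air`, and `remap` are hypothetical.

```python
def adjacency_matrix(table, L):
    """Step 1: G[i][j] counts how often indices i and j are adjacent in the
    horizontal, vertical, and downward diagonal directions."""
    G = [[0] * L for _ in range(L)]
    rows, cols = len(table), len(table[0])
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (1, 1)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    i, j = table[r][c], table[rr][cc]
                    if i != j:  # assumption: self-adjacency is ignored
                        G[i][j] += 1
                        G[j][i] += 1
    return G

def adjusted_air(G):
    """Steps 2 and 3: build the new index list SL from the adjacency matrix."""
    L = len(G)
    # Step 2: seed SL with the most frequently adjacent pair of distinct indices.
    i1, i2 = max(((i, j) for i in range(L) for j in range(L) if i != j),
                 key=lambda p: G[p[0]][p[1]])
    SL = [i1, i2]
    OL = [i for i in range(L) if i not in (i1, i2)]
    # Step 3: grow SL on the left or on the right until OL is empty.
    while OL:
        n = len(SL)
        left = SL[: (2 * n) // 3]           # SL(1) .. SL(floor(2n/3)), Eq. (3)
        right = SL[max(1, n // 3) - 1:]     # SL(floor(n/3)) .. SL(n), Eq. (4), clamped
        def score(i, side):
            return sum(G[i][j] for j in side)
        i_left = max(OL, key=lambda i: score(i, left))
        i_right = max(OL, key=lambda i: score(i, right))
        if score(i_left, left) <= score(i_right, right):
            SL.append(i_right)
            OL.remove(i_right)
        else:
            SL.insert(0, i_left)
            OL.remove(i_left)
    return SL

def remap(table, SL):
    """Rewrite an index table so that old index SL[k] becomes new index k."""
    new_of_old = {old: new for new, old in enumerate(SL)}
    return [[new_of_old[v] for v in row] for row in table]

# Hypothetical 2 x 2 index table over a codebook of size 4.
toy_table = [[0, 1], [2, 3]]
G = adjacency_matrix(toy_table, 4)
SL = adjusted_air(G)
toy_rearranged = remap(toy_table, SL)
print(SL, toy_rearranged)
```

Because new entries are only ever added at the two ends of `SL`, the seed pair stays adjacent in the final order, which matches the intent of keeping frequently adjacent codewords close together.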

3.3. Prediction Difference Coding and Data Embedding

In the original image, adjacent image blocks exhibit a strong correlation, resulting in a high correlation between their corresponding VQ indices. Because the codebook has been rearranged, similar codewords are brought together. Therefore, the VQ index table obtained after VQ compression also exhibits high redundancy, similar to that of the original image. Based on the redundancy of the VQ index table, we apply the improved MED [30] as the predictor to predict the indices and calculate the difference between the real indices and the predicted values. Furthermore, we present a prediction difference coding method to encode the prediction differences, thereby vacating the embedding room in the index table for data hiding.

3.3.1. Prediction Difference Calculation

For the VQ index table $IT_s$ of the original image $I$, we calculate the predicted values of the indices in $IT_s$ in raster order and then compute the corresponding prediction differences to obtain a prediction difference table $D$. $IT_s(i,j)$ denotes the index located in the $i$-th row and $j$-th column, where $1 \leq i \leq W/a$, $1 \leq j \leq H/b$, and the predicted value $\hat{IT}_s(i,j)$ of the index $IT_s(i,j)$ is calculated as follows:

$$\hat{IT}_s(i,j) = \begin{cases} IT_s(i, j-1), & \text{if } i = 1,\ j > 1 \\ IT_s(i-1, j), & \text{if } j = 1,\ i > 1 \\ P(IT_s(i,j)), & \text{if } i > 1,\ j > 1 \end{cases} \tag{5}$$

and the function $P$ is defined as follows:

$$P(IT_s(i,j)) = \begin{cases} \min\{x, y\}, & \text{if } z \geq \max\{x, y\} \\ \max\{x, y\}, & \text{if } z \leq \min\{x, y\} \\ x + y - z, & \text{otherwise} \end{cases} \tag{6}$$

where $x = IT_s(i, j-1)$, $y = IT_s(i-1, j)$, and $z = IT_s(i-1, j-1)$. Then, calculate the prediction difference $D(i,j)$ between the real index value and the predicted value using the XOR operation as follows:

$$D(i,j) = \hat{IT}_s(i,j) \oplus IT_s(i,j) \tag{7}$$
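A short sketch of Equations (5) to (7) is given below. It implements the equations exactly as printed above; the function names and the 2 × 2 toy table are hypothetical, and the first index is simply copied into the predicted table so that its difference is 0.

```python
def med(x, y, z):
    """Equation (6): x is the left, y the upper, z the upper-left neighbor."""
    if z >= max(x, y):
        return min(x, y)
    if z <= min(x, y):
        return max(x, y)
    return x + y - z

def predict_table(table):
    """Equation (5): predict every index in raster order from original neighbors."""
    rows, cols = len(table), len(table[0])
    pred = [[0] * cols for _ in range(rows)]
    pred[0][0] = table[0][0]  # the first index is retained, not predicted
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            if i == 0:
                pred[i][j] = table[i][j - 1]
            elif j == 0:
                pred[i][j] = table[i - 1][j]
            else:
                pred[i][j] = med(table[i][j - 1], table[i - 1][j], table[i - 1][j - 1])
    return pred

def difference_table(table, pred):
    """Equation (7): bitwise XOR of predicted and original indices."""
    return [[p ^ v for p, v in zip(prow, trow)] for prow, trow in zip(pred, table)]

# Hypothetical toy index table.
toy = [[21, 27], [18, 21]]
toy_pred = predict_table(toy)
toy_diff = difference_table(toy, toy_pred)
print(toy_pred, toy_diff)
```

Using XOR instead of subtraction keeps every difference inside $[0, 2^t - 1]$, and the position of its highest set bit directly tells how many leading bits of the index and its prediction agree.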

3.3.2. Prediction Difference Coding

Based on the prediction difference, we can determine the number of consecutive identical high significant bits (HSBs) between an original index and its predicted value. The number of consecutive identical HSBs between $IT_s(i,j)$ and $\hat{IT}_s(i,j)$ is denoted by $N(i,j)$ and can be calculated as follows:

$$N(i,j) = t - \lceil \log_2 (D(i,j) + 1) \rceil. \tag{8}$$

The value range of $N(i,j)$ is from 0 to $t$. Therefore, $N(i,j)$ can take on $(t+1)$ possible values. To reduce the cost of storing the prediction differences, we apply adaptive binary tree encoding to label the different possible values of $N(i,j)$.
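Equation (8) is easy to evaluate with integer arithmetic only. The sketch below assumes non-negative integer differences and uses the fact that, for such integers, $\lceil \log_2 (D + 1) \rceil$ equals the number of bits needed to write $D$ in binary; the function name `identical_hsbs` is hypothetical.

```python
import math

def identical_hsbs(d, t):
    """Equation (8): number of leading bits shared by an index and its prediction.
    For a non-negative integer d, ceil(log2(d + 1)) equals d.bit_length()."""
    return t - d.bit_length()

# Cross-check the integer shortcut against the formula as written, for t = 8.
for d in range(256):
    assert identical_hsbs(d, 8) == 8 - math.ceil(math.log2(d + 1))

print([identical_hsbs(d, 8) for d in (0, 1, 7, 14, 255)])
```

For $t = 8$, a difference of 0 gives $N = 8$ (all bits agree), while any difference of 128 or more gives $N = 0$.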

Let us assume that the total number of types used to label possible values is $k$. Figure 3 shows two binary coding trees, which are designed according to whether the number of leaf nodes $k$ is odd or even. A path from the root node to each leaf node represents a unique code that will be used to label a type of possible value. Thus, if $k$ is an odd number, then the $k$ codes are $\{00, 01, 100, 101, \dots, \underbrace{1 \dots 1}_{c}00, \underbrace{1 \dots 1}_{c}01, \underbrace{1 \dots 1}_{c}10, \underbrace{1 \dots 1}_{c}110, \underbrace{1 \dots 1}_{c}111\}$, where $c = (k-1)/2 - 2$; if $k$ is an even number, then the $k$ codes are $\{00, 01, 100, 101, \dots, \underbrace{1 \dots 1}_{c}00, \underbrace{1 \dots 1}_{c}01, \underbrace{1 \dots 1}_{c}10, \underbrace{1 \dots 1}_{c}11\}$, where $c = k/2 - 2$. The difference between the two cases is at the bottom portions of the coding trees.

Figure 3. Binary coding tree with k types.
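The two code sets can be generated programmatically, as in the sketch below. The function name `tree_codes` is hypothetical, and the code assumes $k \geq 4$ so that $c$ is non-negative. For $k = 8$ the total code length is 26 bits and for $k = 7$ it is 21 bits, which agrees with the indicator sequence lengths $(t^2 + 6t - 8)/4$ and $(t^2 + 6t - 7)/4$ stated in Section 3.3.3.

```python
def tree_codes(k):
    """Return the k prefix-free codes of the binary coding tree, shortest first."""
    if k < 4:
        raise ValueError("this sketch assumes k >= 4")
    odd = k % 2 == 1
    c = (k - 1) // 2 - 2 if odd else k // 2 - 2
    codes = []
    # Upper levels of the tree: a run of m ones followed by 00 or 01.
    for m in range(c):
        codes += ["1" * m + "00", "1" * m + "01"]
    # Bottom of the tree: the only part that depends on the parity of k.
    prefix = "1" * c
    codes += [prefix + "00", prefix + "01", prefix + "10"]
    codes += [prefix + "110", prefix + "111"] if odd else [prefix + "11"]
    return codes

def is_prefix_free(codes):
    """True if no code is a proper prefix of another code."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

print(tree_codes(8))
print(tree_codes(7))
```

Prefix-freeness is what allows a decoder to split the indicator from the rest of a marked index without any length field.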

In our scheme, we classify the prediction differences into $t$ types, as shown in Table 1. Type 1 denotes that the prediction difference is 0, meaning that the number of identical HSBs between the original index and its predicted value is $t$. Type 2 denotes that the number of identical HSBs is $(t - 1)$. Type $i$ denotes that the prediction difference is within $[2^{i-2}, 2^{i-1})$, meaning that the number of identical HSBs is $t - i + 1$, where $3 \leq i \leq t - 1$. Finally, the prediction differences greater than $2^{t-2} - 1$ are classified into Type $t$, where the number of identical HSBs is 0 or 1. To assign the binary tree codes to the $t$ types of prediction differences, we first define a vacancy capacity for the $i$-th type as follows:

$$TC(i) = h(i) \times (t - i + 1), \tag{9}$$

where $h(i)$ represents the occurrence frequency of type $i$ in the entire prediction difference table $D$. Then, we sort the vacancy capacity sequence $TC$ in descending order to obtain a sorted sequence $(TC(\sigma(1)), TC(\sigma(2)), \dots, TC(\sigma(t)))$, where $\sigma$ is a one-to-one mapping from $\{1, 2, \dots, t\}$ to $\{1, 2, \dots, t\}$, and $TC(\sigma(i)) \geq TC(\sigma(j))$ when $i < j$. The designed codes are assigned to the $t$ types of prediction differences as indicators according to the order $(\sigma(1), \sigma(2), \dots, \sigma(t))$, which is shown in Table 2.

Table 1. Classification of the prediction differences.

Table 2. The coding rules for prediction difference types.
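The classification and the adaptive assignment can be illustrated as follows. This is a sketch under stated assumptions: the list of differences is made up, the code list is the $k = 8$ set from the coding tree, equal vacancy capacities are ordered by the smaller type number (the text does not say how ties are resolved), and the names `diff_type` and `assign_indicators` are hypothetical.

```python
from collections import Counter

CODES_8 = ["00", "01", "100", "101", "1100", "1101", "1110", "1111"]

def diff_type(d, t):
    """Type of a prediction difference: 1 for d = 0, 2 for d = 1,
    i for 2^(i-2) <= d < 2^(i-1), and t for every larger difference."""
    return min(d.bit_length() + 1, t)

def assign_indicators(diffs, t, codes):
    """Sort the types by vacancy capacity TC(i) = h(i) * (t - i + 1), Equation (9),
    and hand out the codes in that order, so high-capacity types get short codes."""
    h = Counter(diff_type(d, t) for d in diffs)
    tc = {i: h[i] * (t - i + 1) for i in range(1, t + 1)}
    # sorted() is stable, so ties keep the smaller type first (assumption).
    order = sorted(range(1, t + 1), key=lambda i: -tc[i])
    return order, dict(zip(order, codes))

# Hypothetical prediction differences for t = 8.
toy_diffs = [0, 2, 3, 2, 3, 1, 6, 200]
order, indicator = assign_indicators(toy_diffs, 8, CODES_8)
print(order)
print(indicator)
```

In this toy input, Type 3 occurs four times and saves six bits each time, so it outranks Type 1 and receives the first two-bit code.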

3.3.3. Data Embedding

In the index table $IT_s$, each index value is represented by a $t$-bit binary number. According to the prediction method, we can recover an index using the prediction difference. Therefore, we only need to record the prediction difference for each index except for the first one. We first calculate the vacancy capacity of each prediction difference type in the index table, and then determine the type order $(\sigma(1), \sigma(2), \dots, \sigma(t))$. Using the adaptive prediction difference coding, we can reduce the storage cost of an index table and vacate the embedding room for data hiding. In the data embedding process, we use indicators to label the types of the prediction differences and embed secret data in the vacated room.

For each index value $IT_s(i,j)$ except for the first one, the data embedding procedure is as follows:

Step 1. Calculate the predicted value $\hat{IT}_s(i,j)$ of the index value $IT_s(i,j)$ by Equation (5).

Step 2. Calculate the prediction difference $D(i,j)$ by Equation (7) and then determine $N(i,j)$ by Equation (8), which is the number of consecutive identical HSBs between $IT_s(i,j)$ and $\hat{IT}_s(i,j)$.

Step 3. Convert the prediction difference $D(i,j)$ to the corresponding indicator according to the coding rules in Table 2, denoted as $HC(i,j)$, with its length denoted as $LC(i,j)$.

Step 4. Determine the number of least significant bits (LSBs) of $IT_s(i,j)$ that must be retained after coding, denoted as $LR(i,j)$. Because we can be certain that the $(N(i,j) + 1)$-th HSB of $IT_s(i,j)$ differs from that of $\hat{IT}_s(i,j)$, there is no need to record the $(N(i,j) + 1)$-th HSB of $IT_s(i,j)$. Therefore, $LR(i,j)$ can be calculated as follows:

$$LR(i,j) = \begin{cases} 0, & \text{if } N(i,j) = t \\ t - N(i,j) - 1, & \text{if } 2 \leq N(i,j) < t \\ t, & \text{if } N(i,j) \leq 1 \end{cases} \tag{10}$$

Retain the $LR(i,j)$ LSBs of $IT_s(i,j)$, denoted as $RLSB(i,j)$.

Step 5. Calculate the number of vacated bits after prediction difference coding, denoted as $VB(i,j)$. After coding, the prediction difference is converted into two parts: an indicator and the retained LSBs. Therefore, $VB(i,j)$ can be calculated as follows:

$$VB(i,j) = t - LR(i,j) - LC(i,j). \tag{11}$$

If $VB(i,j)$ is greater than 0, we have $VB(i,j)$ bits of room available for embedding secret data. If $VB(i,j)$ is less than 0, we record the $|VB(i,j)|$ LSBs of $IT_s(i,j)$ in the auxiliary information sequence so that the index can be restored during the recovery process.

Repeat Step 1 to Step 5 until all the indices are processed. The determined indicators of Type 1 to Type $t$ are concatenated into an indicator sequence, denoted as $IS$. We use $IS$ to substitute the first $\|IS\|$ bits of the prediction differences, and the original prediction difference bits are recorded in the auxiliary information sequence, where $\|IS\|$ is the length of $IS$. It is $(t^2 + 6t - 8)/4$ when $t$ is even and $(t^2 + 6t - 7)/4$ when $t$ is odd. Concatenate the auxiliary information sequence with the secret data. All the data are encrypted with a data-hiding key. Then, generate the marked index $\dot{IT}_s(i,j)$ after embedding $IS$ into $IT_s$. If $VB(i,j) > 0$, $VB(i,j)$ bits of encrypted secret data, denoted as $s(i,j)$, are embedded into $IT_s(i,j)$. Concatenate the indicator $HC(i,j)$, the retained LSBs $RLSB(i,j)$, and the encrypted data $s(i,j)$ to generate the $t$-bit marked index value $\dot{IT}_s(i,j)$, that is,

$$\dot{IT}_s(i,j) = HC(i,j) \,\|\, RLSB(i,j) \,\|\, s(i,j). \tag{12}$$

If $VB(i,j) = 0$, then the $t$-bit marked index value $\dot{IT}_s(i,j)$ is generated as follows:

$$\dot{IT}_s(i,j) = HC(i,j) \,\|\, RLSB(i,j). \tag{13}$$

If $VB(i,j) < 0$, the $(LR(i,j) - |VB(i,j)|)$ HSBs of $RLSB(i,j)$, denoted as $RLSB_H(i,j)$, are concatenated with the indicator $HC(i,j)$ to generate the marked index value $\dot{IT}_s(i,j)$, that is,

$$\dot{IT}_s(i,j) = HC(i,j) \,\|\, RLSB_H(i,j). \tag{14}$$

The remaining $|VB(i,j)|$ LSBs of $RLSB(i,j)$ are recorded in the auxiliary information sequence.
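The per-index coding in Steps 2 to 5 and Equations (10) to (14) can be sketched as below. The sketch makes several simplifying assumptions: bit strings are Python `str` objects, the secret bits are taken as already encrypted, a short secret is padded with zeros, the indicator sequence $IS$ is not embedded, and the indicator assignment is the identity order for $t = 8$. The names `retained_lsbs`, `encode_index`, and `IDENTITY_8` are hypothetical.

```python
def retained_lsbs(n, t):
    """Equation (10): number of LSBs of the index that must be kept."""
    if n == t:
        return 0
    if n >= 2:
        return t - n - 1
    return t

def encode_index(value, pred, t, indicator_of_type, secret_bits):
    """Code one index. Returns (marked t-bit string, bits that overflow into the
    auxiliary information sequence, number of secret bits embedded)."""
    d = value ^ pred                          # Equation (7)
    n = t - d.bit_length()                    # Equation (8)
    hc = indicator_of_type[min(d.bit_length() + 1, t)]
    lr = retained_lsbs(n, t)
    rlsb = format(value, f"0{t}b")[t - lr:]   # the lr least significant bits
    vb = t - lr - len(hc)                     # Equation (11)
    if vb > 0:                                # Equation (12)
        s = secret_bits[:vb].ljust(vb, "0")   # zero padding is an assumption
        return hc + rlsb + s, "", vb
    if vb == 0:                               # Equation (13)
        return hc + rlsb, "", 0
    keep = lr + vb                            # lr - |vb| bits still fit
    return hc + rlsb[:keep], rlsb[keep:], 0   # Equation (14)

# Identity assignment of the k = 8 codes to Types 1..8 (assumed for illustration).
IDENTITY_8 = dict(enumerate(
    ["00", "01", "100", "101", "1100", "1101", "1110", "1111"], start=1))

print(encode_index(21, 21, 8, IDENTITY_8, "101101"))   # exact prediction
print(encode_index(27, 21, 8, IDENTITY_8, "1"))        # four identical HSBs
print(encode_index(30, 62, 8, IDENTITY_8, ""))         # VB = -1
print(encode_index(200, 21, 8, IDENTITY_8, ""))        # no identical HSBs
```

Every marked string has exactly $t$ bits, so the marked index table keeps the size of the original one; only the split between indicator, retained bits, and payload changes from index to index.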

Once all the encrypted secret data are embedded into the index table, the marked index table $\dot{IT}_s$ is obtained. Then, $\dot{IT}_s$ and the rearranged codebook $B_s$ constitute the marked VQ compressed stream, which is then transmitted to the receiver or stored.

3.4. Data Extraction and Cover Image Recovery

After receiving the marked VQ compressed stream, the receiver can obtain the marked index table $\dot{IT}_s$ and the rearranged codebook $B_s$. We can extract the secret data and recover the original VQ index table with the shared data-hiding key.

According to the marked index table and the codebook, the size of the codebook, and hence the value of $t$, can be obtained. The length of the indicator sequence $\|IS\|$ can be calculated according to $t$, and then the indicator sequence $IS$ can be extracted from the $\|IS\|$ bits following $\dot{IT}_s(1,1)$. The prediction difference coding rule table shown in Table 2 can then be constructed. The data extraction and index recovery procedure for each marked index $\dot{IT}_s(i,j)$ after $IS$ is described as follows:

Step 1. Convert $\dot{IT}_s(i,j)$ into a $t$-bit binary string. According to the coding tree, separate the indicator $HC(i,j)$ from the $t$-bit binary string, starting from the most significant bit. The length of $HC(i,j)$ is represented as $LC(i,j)$.

Step 2. According to the prediction difference coding rule table, obtain the number of consecutive identical HSBs $N(i,j)$ based on $HC(i,j)$.

Step 3. Calculate $LR(i,j)$, the length of the retained LSBs of $IT_s(i,j)$, using Equation (10). Then, calculate $VB(i,j)$, the number of vacated bits after prediction difference coding, using Equation (11).

Step 4. If $VB(i,j) > 0$, we extract $VB(i,j)$ bits of message from the $VB(i,j)$ LSBs of $\dot{IT}_s(i,j)$, denoted as $s(i,j)$. If $VB(i,j) \leq 0$, there is no secret bit embedded into $\dot{IT}_s(i,j)$.

When all the embedded data are extracted, we decrypt them with the data-hiding key and split them into two parts: the secret message and the auxiliary information sequence. Next, recover the first $\|IS\|$ prediction difference bits from the top $\|IS\|$ bits of the auxiliary information sequence. After that, delete the top $\|IS\|$ bits from the auxiliary information sequence.

We recover the index table $IT_s$ in raster scan order, starting from $IT_s(1,2)$. To recover the index $IT_s(i,j)$, we calculate the predicted index $\hat{IT}_s(i,j)$ using Equation (5), and then follow one of the three cases below:

Case 1: If $N(i,j) = t$, then $IT_s(i,j) = \hat{IT}_s(i,j)$.

Case 2: If $2 \leq N(i,j) < t$, then set the first $N(i,j)$ HSBs of $IT_s(i,j)$ to be the same as those of $\hat{IT}_s(i,j)$ and set the $(N(i,j) + 1)$-th HSB of $IT_s(i,j)$ to be different from that of $\hat{IT}_s(i,j)$. Next, if $VB(i,j) \geq 0$, then the $(t - N(i,j) - 1)$ LSBs of $IT_s(i,j)$ are set to the $(t - N(i,j) - 1)$ bits following $HC(i,j)$ in $\dot{IT}_s(i,j)$; if $VB(i,j) < 0$, then the $|VB(i,j)|$ LSBs of $IT_s(i,j)$ are set to the top $|VB(i,j)|$ bits of the auxiliary information sequence, and the middle $(t - N(i,j) - 1 - |VB(i,j)|)$ bits of $IT_s(i,j)$ are set to the remaining $(t - LC(i,j))$ bits of $\dot{IT}_s(i,j)$.

Case 3: If $N(i, j) < 2$, then set the $(t - LC(i, j))$ HSBs of ${IT}_{s}(i, j)$ to the remaining $(t - LC(i, j))$ bits of ${\dot{IT}}_{S}(i, j)$ after the indicator $HC(i, j)$, and set the $LC(i, j)$ LSBs of ${IT}_{s}(i, j)$ to the top $LC(i, j)$ bits of the auxiliary information sequence.

In the latter two cases, we delete the corresponding top bits from the auxiliary information sequence before moving on to recover the next index.
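The three recovery cases can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `recover_index`, the bit-string representation of marked indices and of the auxiliary information sequence, and the argument `indicator_len` (standing for $LC(i, j)$) are all hypothetical, and the number of identical HSBs `n` is assumed to have already been decoded from the indicator.

```python
def recover_index(marked_bits, predicted, n, indicator_len, aux, t=8):
    """Recover one original index and return (index, remaining_aux).

    marked_bits   : t-bit marked index as a '0'/'1' string (assumed representation)
    predicted     : predicted index (integer)
    n             : number of identical HSBs decoded from the indicator
    indicator_len : length of the indicator, LC(i, j)
    aux           : auxiliary information sequence as a '0'/'1' string
    """
    pred_bits = format(predicted, f"0{t}b")

    if n == t:  # Case 1: the index equals its prediction
        return predicted, aux

    if n >= 2:  # Case 2: copy n HSBs, flip the next one, restore the LSBs
        flipped = "1" if pred_bits[n] == "0" else "0"
        retained = t - n - 1
        vb = t - indicator_len - retained
        if vb >= 0:
            lsbs = marked_bits[indicator_len:indicator_len + retained]
        else:
            kept = marked_bits[indicator_len:]  # t - LC bits kept in the marked index
            lsbs = kept + aux[:-vb]             # |VB| bits come from aux
            aux = aux[-vb:]                     # delete the used aux bits
        return int(pred_bits[:n] + flipped + lsbs, 2), aux

    # Case 3: n < 2, the prediction is not used at all
    hsbs = marked_bits[indicator_len:]
    lsbs = aux[:indicator_len]
    return int(hsbs + lsbs, 2), aux[indicator_len:]


# Hypothetical inputs with t = 8 and 4-bit indicators.
print(recover_index("11001011", 24, 4, 4, ""))
print(recover_index("11101011", 64, 2, 4, "01"))
print(recover_index("11111011", 0, 0, 4, "0011"))
```

Returning the shortened auxiliary sequence together with the index keeps the "delete the used top bits" step explicit, so the caller can pass the remainder on when recovering the next index.
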

3.5. Example Illustration

In this subsection, we present an example to illustrate the process of data embedding, data extraction, and index recovery. Let us assume that the size of the codebook is 256, that is, $t = 8$. The prediction differences are classified into 8 types according to Table 1. Based on the coding tree with $k = 8$ and the vacancy capacities of the prediction difference types, we can generate the coding rules for the prediction differences. We assume that $(σ(1), σ(2), \dots, σ(8))$ is $(1, 2, 3, 4, 5, 6, 7, 8)$, and the corresponding coding rules are shown in Table 3. We use the set of indicators {00, 01, 100, 101, 1100, 1101, 1110, 1111} to label Type 1 through Type 8. For Type 8, we use ‘1111’ to indicate the case where the number of consecutive identical HSBs is either 0 or 1. Table 3 also lists the corresponding numbers of retained LSBs and vacated bits after prediction difference encoding. We illustrate the secret data embedding only; the indicator sequence and the auxiliary information sequence are excluded.

Table 3. Coding rules of prediction differences with t = 8.
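The entries of Table 3 follow from the indicator lengths. The sketch below rebuilds them under the assumptions of this example: $σ$ is the identity, Type $i$ (for $i < t$) has $t - i + 1$ identical HSBs and retains $\max(t - N - 1, 0)$ LSBs, and the last type retains the whole index. The function name `coding_rules` is hypothetical.

```python
def coding_rules(indicators, t):
    """Return rows (type, identical HSBs, indicator, retained LSBs, vacated bits).

    Assumes type i has t - i + 1 identical HSBs and that the last type
    (at most 1 identical HSB) must retain all t bits of the index.
    """
    rows = []
    for type_id, indicator in enumerate(indicators, start=1):
        is_last = type_id == len(indicators)
        identical = t - type_id + 1  # for the last type this means "<= 1"
        retained = t if is_last else max(t - identical - 1, 0)
        vacated = t - len(indicator) - retained
        rows.append((type_id, identical, indicator, retained, vacated))
    return rows


INDICATORS = ["00", "01", "100", "101", "1100", "1101", "1110", "1111"]
for row in coding_rules(INDICATORS, t=8):
    print(row)
```

A negative value in the last column means that the indicator plus the retained bits do not fit into $t$ bits, so the overflow has to go into the auxiliary information sequence.
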

Assuming the index table ${IT}_{s}$ is (21, 27, 27, 30; 18, 21, 21, 31; 18, 21, 31, 30; 20, 21, 29, 29), the procedure of data embedding is shown in Figure 4. The predicted index table $\hat{{IT}_{s}}$ is (21, 21, 27, 27; 21, 24, 21, 30; 18, 21, 31, 31; 18, 20, 31, 31), where the first index remains unchanged. We calculate the prediction difference table $D$ by Equation (7) and then calculate the numbers of consecutive identical HSBs between the original indices and the predicted indices by Equation (8); the results are shown in $N$. Based on the numbers of identical HSBs, we determine the corresponding indicators, the numbers of retained LSBs, and the vacated bits after prediction difference encoding, which are shown in $HC$, $LR$, and $VB$, respectively. Because $VB(i, j) > 0$ for every $(i, j)$ with $1 \leq i, j \leq 4$ and $(i, j) \neq (1, 1)$, each marked index ${\dot{IT}}_{S}(i, j)$ consists of $HC(i, j)$, the $LR(i, j)$ LSBs of ${IT}_{s}(i, j)$, and $VB(i, j)$ bits of the encrypted secret message. The final marked index table is ${\dot{IT}}_{S}$.

Figure 4. Example illustration of data embedding (t = 8).
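The embedding example can be replayed numerically. The sketch below is an illustration only: it takes the two index tables from the text and the coding rules of Table 3, and it assumes that the number of identical HSBs equals the number of leading zeros of the XOR difference in its $t$-bit representation. The helper names and the single secret bit used in the usage line are hypothetical.

```python
T = 8
IT = [[21, 27, 27, 30], [18, 21, 21, 31], [18, 21, 31, 30], [20, 21, 29, 29]]
PRED = [[21, 21, 27, 27], [21, 24, 21, 30], [18, 21, 31, 31], [18, 20, 31, 31]]

# Table 3, keyed by the number of identical HSBs: N -> (indicator, retained LSBs).
RULES = {8: ("00", 0), 7: ("01", 0), 6: ("100", 1), 5: ("101", 2),
         4: ("1100", 3), 3: ("1101", 4), 2: ("1110", 5)}
TYPE8 = ("1111", 8)  # N <= 1


def identical_hsbs(index, predicted, t=T):
    """Leading zeros of the XOR difference in t bits (assumed form of N)."""
    return t - (index ^ predicted).bit_length()


def vacated_bits(n, t=T):
    indicator, retained = RULES.get(n, TYPE8)
    return t - len(indicator) - retained


def mark_index(index, predicted, secret_bits, t=T):
    """Build indicator + retained LSBs + secret bits for an index with VB > 0."""
    n = identical_hsbs(index, predicted, t)
    indicator, retained = RULES.get(n, TYPE8)
    vb = vacated_bits(n, t)
    if vb <= 0:
        raise ValueError("no vacated room in this index")
    bits = format(index, f"0{t}b")
    return indicator + bits[t - retained:] + secret_bits[:vb]


# Entry (1, 1) is the retained first index and is not coded in the scheme.
N = [[identical_hsbs(a, p) for a, p in zip(ra, rp)] for ra, rp in zip(IT, PRED)]
VB = [[vacated_bits(n) for n in row] for row in N]
print(N)
print(VB)
print(mark_index(21, 24, "1"))  # index (2, 2) with one hypothetical secret bit
```
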

The data extraction and the index recovery are illustrated in Figure 5. The marked indices are converted into 8-bit binaries. Assume that the indicator sequence has been extracted. We can obtain the indicator of each index and then determine the number of retained LSBs in each index, which are shown in $HC$ and $LR$, respectively. Based on $HC$ and $LR$, we can calculate the number of secret data bits embedded in each marked index, as shown in $VB$. According to $VB$, we extract the embedded message from the corresponding LSBs of the marked indices. After the data extraction, we can recover the original indices in raster scan order. Let us assume that the indices before ${IT}_{s}(2, 2)$ have been recovered. To recover ${IT}_{s}(2, 2)$, we first calculate its predicted index $\hat{{IT}_{s}}(2, 2)$, which is 24. We convert ${\dot{IT}}_{S}(2, 2)$ and $\hat{{IT}_{s}}(2, 2)$ into 8-bit binaries. According to the indicator sequence, the indicator is $HC(2, 2) = {(1100)}_{2}$. Then, based on the coding rules, we obtain $N(2, 2) = 4$ and $LR(2, 2) = 3$. Because $N(2, 2) = 4$, the 4 HSBs of ${IT}_{s}(2, 2)$ are the same as those of $\hat{{IT}_{s}}(2, 2)$, and the 5th HSB differs from the 5th HSB of $\hat{{IT}_{s}}(2, 2)$. Since $LR(2, 2) = 3$, the 3 LSBs of ${IT}_{s}(2, 2)$ are the 3 bits following $HC(2, 2)$ in ${\dot{IT}}_{S}(2, 2)$. Therefore, ${IT}_{s}(2, 2) = {(00010101)}_{2} = 21$. We then move on to recover the next index until we reach ${IT}_{s}(4, 4)$, which is the final index.

Figure 5. Illustration of data extraction and index recovery.
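The recovery of ${IT}_{s}(2, 2)$ can also be written out bit by bit. The worked equation below assumes, in line with Table 3, that the indicator for $N = 4$ is 1100 and that the three bits following it in the marked index are 101, which is what embedding would have stored as the 3 LSBs of 21.

```latex
\begin{aligned}
\hat{{IT}_{s}}(2,2) &= 24 = (00011000)_2, \\
{IT}_{s}(2,2) &= \underbrace{0001}_{\text{4 HSBs copied}}\;
                 \underbrace{0}_{\text{5th HSB flipped}}\;
                 \underbrace{101}_{\text{3 retained LSBs}}
               = (00010101)_2 = 21, \\
{IT}_{s}(2,2) \oplus \hat{{IT}_{s}}(2,2) &= 21 \oplus 24 = 13 = (00001101)_2 .
\end{aligned}
```

The XOR difference has four leading zeros in its 8-bit form, which agrees with $N(2, 2) = 4$.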

4. Experimental Results and Analysis

To evaluate the performance of the proposed scheme, we conducted several experiments. We first evaluated the correlations of the indices after the codebook rearrangement. Next, we analyzed the distribution of the prediction differences to evaluate the efficiency of the predictor. Finally, we evaluated the pure compression rate and the embedding rate of the proposed scheme.

4.1. Experimental Settings

In the experiments, we took eight standard test images of size $512 \times 512$ from the USC-SIPI image database [34], namely Airplane, Lena, Tiffany, Peppers, Lake, Boat, Baboon, and Goldhill, together with the 24 images from the Kodak dataset [35], to evaluate the performance of the proposed scheme. The size of the codebook was set to 128, 256, 512, and 1024. The block size was set to $4 \times 4$. Three metrics are used to evaluate the performance of the proposed scheme: the compression rate, the pure compression rate, and the embedding rate. The compression rate $CR$ is defined as $\| \dot{IT} \| / (W \times H)$ (bpp), where $\| \dot{IT} \|$ is the final size of the VQ index table and $W \times H$ is the size of the original cover image. The pure compression rate $PCR$ is defined as the ratio of the total size of all indicators and auxiliary information to the size of the cover image. $PCR$ is used to evaluate the efficiency of the predictor and the prediction difference coding rules; a lower $PCR$ indicates better performance. The embedding rate denotes the average number of secret bits embedded into each index.
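As a quick check of the compression rate definition, the sketch below computes $CR$ for the four codebook sizes. It assumes that every index occupies exactly $t$ bits, with $t$ the number of bits needed to address the codebook, that the image dimensions are multiples of the block side, and that the marked index table has the same size as the original one. The function name is hypothetical.

```python
def vq_compression_rate(codebook_size: int, block_side: int = 4) -> float:
    """Bits per pixel of a VQ index table with one t-bit index per block."""
    t = (codebook_size - 1).bit_length()  # bits needed to address the codebook
    return t / (block_side * block_side)


for size in (128, 256, 512, 1024):
    print(size, vq_compression_rate(size))
```
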

4.2. Performance Analysis

In the experiments, we projected all the codewords onto a base vector, initially rearranged the codewords by sorting the projected values, and then used the adAIR to obtain a codebook consisting of the final rearranged codewords. With the codebook rearrangement, we expect to enhance the correlations between adjacent indices and thereby benefit the iMED prediction. We use the average block complexity $C$ to evaluate the correlations between adjacent indices, which is defined as follows:

C = \frac{1}{(W - 1) * (H - 1)} \sum_{i = 2}^{W} \sum_{j = 2}^{H} c (i, j),

(15)

and

c (i, j) = \frac{1}{3} (|I T (i, j) - I T (i, j - 1)| + |I T (i, j) - I T (i - 1, j)| + |I T (i, j) - I T (i - 1, j - 1)|),

(16)

where $IT(i, j)$ denotes the index located at $(i, j)$ in the index table $IT$, and $c(i, j)$ denotes the average absolute difference between that index and its left, upper, and upper-left neighbors. Figure 6 visualizes the index tables obtained with the standard codebook and the rearranged codebook on six test images, with a codebook size of 256. For each test image, the first visualization is based on the standard codebook and the second on the rearranged codebook. We also present the corresponding average block complexity $C$ for each index table. To further demonstrate the efficiency of the codebook rearrangement in reducing the block complexity, we also conducted experiments on the 24 Kodak images, as shown in Figure 7. The experimental results show that the correlation of adjacent indices in the index table is significantly enhanced after the codebook rearrangement, which benefits the index prediction.
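Equations (15) and (16) translate directly into code. The sketch below is a plain-Python illustration in which the index table is assumed to be a list of equally long rows; the function name is hypothetical.

```python
def average_block_complexity(index_table):
    """Average of c(i, j) over all positions that have left, upper,
    and upper-left neighbors, following Equations (15) and (16)."""
    rows, cols = len(index_table), len(index_table[0])
    total = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            cur = index_table[i][j]
            total += (abs(cur - index_table[i][j - 1])
                      + abs(cur - index_table[i - 1][j])
                      + abs(cur - index_table[i - 1][j - 1])) / 3
    return total / ((rows - 1) * (cols - 1))


# The 4 x 4 index table from the example in Section 3.5.
example = [[21, 27, 27, 30], [18, 21, 21, 31], [18, 21, 31, 30], [20, 21, 29, 29]]
print(average_block_complexity(example))
```

A lower value means that neighboring indices are closer to each other, which is the property the codebook rearrangement is meant to improve.
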

Figure 6. Visualization of the index tables based on the standard codebook and the rearranged codebook of size 256 on test images.

Figure 7. Average block complexity comparison on index tables using the standard codebook and the rearranged codebook on 24 Kodak images.

We assign indicators to each prediction difference type based on the vacancy capacities. The indicators generated by the binary coding tree are assigned in descending order of vacancy capacity. In Figure 8, we show the vacancy capacities of each prediction difference type on four test images using different codebook sizes. When the codebook size is 256, the vacancy capacities of the prediction difference types generally decrease from left to right. When the codebook size is 512 or 1024, the vacancy capacities vary across different test images. Type 1 has the largest vacancy capacity among all types in most test images. The relative order of the vacancy capacities of the other types varies across test images. Therefore, it is necessary to adaptively assign the indicators for different cover images based on their vacancy capacities.

Figure 8. Vacancy capacities of each prediction type on test images.

In the proposed scheme, the maximum length of an indicator is $t / 2$ when $t$ is even and $(t + 1) / 2$ when $t$ is odd, where $t$ is the length of the binary representation of an index. Therefore, room for data embedding can be vacated in index $IT(i, j)$ after prediction difference coding whenever $N(i, j) > ⌈t / 2⌉ - 1$, regardless of which indicator is assigned to its prediction difference type. The proposed indicator assignment strategy can maximize the embedding capacity. According to the coding rules in Equations (12)–(14), and keeping the same compression rate as the original VQ compression stream, the embedding rate $ER$ and the pure compression rate $PCR$ of the proposed scheme can be calculated as follows:

E R = \frac{1}{(W / 4) \times (H / 4)} [\sum_{i = 1}^{t} h (i) \times (t - L (i) - R (i))],

(17)

and

P C R = [\sum_{i = 1}^{t} h (i) \times (L (i) + R (i))] / (W \times H),

(18)

where $h(i)$ represents the occurrence frequency of type $i$ in the whole prediction difference table, $L(i)$ represents the length of the indicator for type $i$, and $R(i)$ represents the number of LSBs that must be retained after encoding type $i$. We evaluate the embedding rate and the pure compression rate on four standard test images and the 24 Kodak images. Figure 9 shows the embedding rates on six standard test images with different codebook sizes. In the experiment, the corresponding compression rates are 0.4375, 0.5000, 0.5625, and 0.6250 when the codebook sizes are 128, 256, 512, and 1024, respectively. That is, the VQ compression stream after data embedding has the same length as the original cover index table. When the codebook size is 256 or 512, we obtain a higher embedding rate. We can also observe that Tiffany and Airplane have higher embedding rates. This is because, after the codebook rearrangement, their index tables have lower complexity, which is validated by the experimental results in Figure 6. Figure 10 shows the pure compression rates on the six standard test images. The upper bounds of the PCR are 0.4375, 0.5000, 0.5625, and 0.6250 when the codebook sizes are 128, 256, 512, and 1024, respectively. The results show that the proposed scheme can vacate sufficient room for data embedding. To further demonstrate the performance, we also tested the proposed scheme on the 24 Kodak images, and the results are shown in Figure 11. The average embedding rates are 3.25 and 3.20, and the corresponding average pure compression rates are 0.2971 and 0.3625 when the codebook sizes are 256 and 512, respectively.
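Equations (17) and (18) can be evaluated from a histogram of prediction difference types. The sketch below is illustrative only: the function name is hypothetical, the indicator lengths and retained bits are those of Table 3, and the histogram describes a made-up $16 \times 16$ image with 16 indices rather than any measured data.

```python
def embedding_and_pure_compression_rate(h, L, R, t, width, height, block=4):
    """Return (ER, PCR) following Equations (17) and (18).

    h[i] : number of indices of type i + 1
    L[i] : indicator length of type i + 1
    R[i] : retained LSBs of type i + 1
    """
    num_indices = (width // block) * (height // block)
    vacated = sum(hi * (t - li - ri) for hi, li, ri in zip(h, L, R))
    occupied = sum(hi * (li + ri) for hi, li, ri in zip(h, L, R))
    return vacated / num_indices, occupied / (width * height)


L = [2, 2, 3, 3, 4, 4, 4, 4]  # indicator lengths from Table 3
R = [0, 0, 1, 2, 3, 4, 5, 8]  # retained LSBs from Table 3
h = [6, 4, 2, 2, 1, 1, 0, 0]  # hypothetical histogram, 16 indices in total

er, pcr = embedding_and_pure_compression_rate(h, L, R, t=8, width=16, height=16)
print(er, pcr)
```

When the histogram covers every index, $ER / 16 + PCR$ equals $t / 16$ for $4 \times 4$ blocks, which is the compression rate of the original index table; this is why the $PCR$ upper bounds quoted above coincide with the compression rates.
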

Figure 9. Embedding rates of test images under four different codebook sizes.

Figure 10. Pure compression rates of test images under four codebook sizes.

Figure 11. Performances on 24 Kodak images with codebook sizes of 256 and 512.

4.3. Comparison Experiments

In this subsection, we compare the performance of the proposed scheme with some related works, including Kieu and Rudder [29], Hong et al. [20], Li et al. [21], and Zhang et al. [31]. The main idea of these schemes is to rearrange the codebook and then compress the original VQ compression stream based on the redundancy of adjacent indices, thereby reducing the pure compression rate. Therefore, a lower pure compression rate indicates better performance. Table 4 shows the pure compression rates of the related works on the standard test images. Kieu and Rudder [29] used the SIC [6] method to re-sort the codebook and used a 2- or 3-bit indicator to label the prediction errors. SIC [6] did not fully exploit the correlations of the index table; therefore, the pure compression rates on complex images such as Baboon are not ideal. Hong et al. [20] used the AIR technique to improve the performance on complex images. The schemes by Li et al. [21] and Zhang et al. [31], as well as the proposed scheme, all build on the AIR technique. Li et al. [21] conducted two-stage data embedding in cover images. To ensure a fair comparison, we only consider data hiding in the difference index table. In the experiments, we used the difference index table in place of our prediction difference table, while using the AIR to rearrange the codebook. The experimental results show that the proposed method has lower pure compression rates on the test images than the scheme based on difference index tables [21]. Compared to Zhang et al. [31], the proposed method achieves lower pure compression rates on the test images, except for Baboon. Although Zhang et al. [31] used a weight-controlled predictor, it is difficult to match different regions with a single group of weights. To further demonstrate the performance of the proposed method, we performed comparison experiments on the 24 Kodak images with the method proposed by Zhang et al. [31]. In the experiments, we resized the images to $512 \times 512$ as well. The experimental results are shown in Figure 12. Among the 24 test images, our method achieves a lower pure compression rate on 19 of them. The average pure compression rate of our method is 0.2972, while that of Zhang et al. [31] is 0.3034. The comprehensive experimental results indicate that the proposed method is efficient and can provide a satisfactory pure compression rate on images except those with highly complex textures.

Table 4. Comparison of the pure compression rates of related works with a codebook size of 256.

Figure 12. Comparison of the pure compression rate over 24 Kodak images with Zhang et al. [31].

5. Conclusions

In this paper, we present a lossless data hiding scheme for VQ compressed images using an adaptive prediction difference encoding method. To enhance the correlations of adjacent indices, we use a modified AIR technique to rearrange the codebook. For the index tables based on the rearranged codebook, the improved MED predictor is used to generate the prediction difference table using XOR operations. We define a vacancy capacity and present a coding tree to adaptively encode the prediction differences for different cover images. The prediction difference indicators, the retained bits, and the encrypted data are concatenated to generate the marked VQ index tables. The experimental results on the adjacent indices show the efficiency of the codebook rearrangement. The vacancy capacity distribution analysis shows the necessity of adaptive indicator assignment. The experiments on the embedding rate and the pure compression rate show that the proposed data hiding scheme achieves satisfactory performance and outperforms some related works.

Author Contributions

Conceptualization, C.-C.C. (Chin-Chen Chang), J.-C.L. and S.C.; methodology, S.C.; software, S.C.; validation, C.-C.C. (Chin-Chen Chang), J.-C.L. and S.C.; formal analysis, C.-C.C. (Ching-Chun Chang) and C.-C.C. (Chin-Chen Chang); investigation, S.C. and J.-C.L.; resources, C.-C.C. (Chin-Chen Chang); data curation, S.C.; writing—original draft preparation, S.C.; writing—review and editing, J.-C.L. and C.-C.C. (Chin-Chen Chang); visualization, S.C. and J.-C.L.; supervision, C.-C.C. (Ching-Chun Chang) and C.-C.C. (Chin-Chen Chang); project administration, C.-C.C. (Ching-Chun Chang) and C.-C.C. (Chin-Chen Chang). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All of the data generated during this study are included in this published article. The datasets analyzed during the current study are available in the USC-SIPI and Kodak repositories, which can be accessed at the following links: http://sipi.usc.edu/database/ (accessed on 1 September 2020) and http://www.r0k.us/graphics/kodak/ (accessed on 4 October 2022).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Shi, Y.Q.; Li, X.; Zhang, X.; Wu, H.T.; Ma, B. Reversible data hiding: Advances in the past two decades. IEEE Access 2016, 4, 3210–3237.
2. Tian, J. Reversible Data Embedding Using a Difference Expansion. IEEE Trans. Circuits Syst. Video Technol. 2003, 13, 890–896.
3. Ni, Z.; Shi, Y.Q.; Ansari, N.; Su, W. Reversible data hiding. IEEE Trans. Circuits Syst. Video Technol. 2006, 16, 354–361.
4. Zhang, X.; Wang, S. Efficient steganographic embedding by exploiting modification direction. IEEE Commun. Lett. 2006, 10, 781–783.
5. Xiong, X. An Adaptive Bit Allocation Strategy for Minimizing Embedding Distortion in Interpolated Images Used for Reversible Data Hiding. IEEE Internet Things J. 2024, 11, 20088–20098.
6. Chang, C.C.; Kieu, T.D.; Wu, W.C. A lossless data embedding technique by joint neighboring coding. Pattern Recognit. 2009, 42, 1597–1603.
7. Zhang, Y.; Luo, X.; Yang, C.; Ye, D.; Liu, F. A framework of adaptive steganography resisting JPEG compression and detection. Secur. Commun. Netw. 2016, 9, 2957–2971.
8. Zhang, X.; Pan, Z.; Zhou, Q.; Fan, G.; Dong, J. A reversible data hiding method based on bitmap prediction for AMBTC compressed hyperspectral images. J. Inf. Secur. Appl. 2024, 81, 103697.
9. Puech, W.; Chaumont, M.; Strauss, O. A reversible data hiding method for encrypted images. In Security, Forensics, Steganography, and Watermarking of Multimedia Contents X; SPIE: San Jose, CA, USA, 2008; Volume 6819, pp. 534–542.
10. Zhang, X. Reversible data hiding in encrypted image. IEEE Signal Process. Lett. 2011, 18, 255–258.
11. Fu, Z.; Chai, X.; Tang, Z.; He, X.; Gan, Z.; Cao, G. Adaptive embedding combining LBE and IBBE for high-capacity reversible data hiding in encrypted images. Signal Process. 2024, 216, 109299.
12. Chan, C.K.; Cheng, L.M. Hiding data in images by simple LSB substitution. Pattern Recognit. 2004, 37, 469–474.
13. He, W.; Cai, Z. Reversible Data Hiding Based on Dual Pairwise Prediction-Error Expansion. IEEE Trans. Image Process. 2021, 30, 5045–5055.
14. Kim, S.; Qu, X.; Sachnev, V.; Kim, H.J. Skewed Histogram Shifting for Reversible Data Hiding Using a Pair of Extreme Predictions. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 3236–3246.
15. Fridrich, J.; Soukal, D. Matrix embedding for large payloads. IEEE Trans. Inf. Forensics Secur. 2006, 1, 390–395.
16. Zhang, T.; Li, X.; Qi, W.; Guo, Z. Location-Based PVO and Adaptive Pairwise Modification for Efficient Reversible Data Hiding. IEEE Trans. Inf. Forensics Secur. 2020, 15, 2306–2319.
17. Wang, Y.; Xiong, G.; He, W. High-capacity reversible data hiding in encrypted images based on pixel-value-ordering and histogram shifting. Expert Syst. Appl. 2023, 211, 118600.
18. Wu, Y.; Hu, R.; Xiang, S. PVO-based Reversible Data Hiding Using Global Sorting and Fixed 2D Mapping Modification. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 618–631.
19. Qin, C.; Chang, C.C.; Chiu, Y.P. A novel joint data-hiding and compression scheme based on SMVQ and image inpainting. IEEE Trans. Image Process. 2014, 23, 969–978.
20. Hong, W.; Zhou, X.; Lou, D.C.; Chen, T.S.; Li, Y. Joint image coding and lossless data hiding in VQ indices using adaptive coding techniques. Inf. Sci. 2018, 463–464, 245–260.
21. Li, Y.; Chang, C.C.; He, M. High Capacity Reversible Data Hiding for VQ-Compressed Images Based on Difference Transformation and Mapping Technique. IEEE Access 2020, 8, 32226–32245.
22. Mobasseri, B.G.; Berger, R.J.; Marcinak, M.P.; Naikraikar, Y.J. Data embedding in JPEG bitstream by code mapping. IEEE Trans. Image Process. 2010, 19, 958–966.
23. Tang, W.; Yao, H.; Le, Y.; Qin, C. Reversible data hiding for JPEG images based on block difference model and Laplacian distribution estimation. Signal Process. 2023, 212, 109130.
24. Weng, S.; Zhou, Y.; Zhang, T.; Xiao, M.; Zhao, Y. Reversible Data Hiding for JPEG Images With Adaptive Multiple Two-Dimensional Histogram and Mapping Generation. IEEE Trans. Multimed. 2023, 25, 8738–8752.
25. Ma, K.; Zhang, W.; Zhao, X.; Yu, N.; Li, F. Reversible data hiding in encrypted images by reserving room before encryption. IEEE Trans. Inf. Forensics Secur. 2013, 8, 553–562.
26. Chen, S.; Chang, C.C. Reversible data hiding in encrypted images using block-based adaptive MSBs prediction. J. Inf. Secur. Appl. 2022, 69, 103297.
27. Yang, Y.; Chen, F.; Tai, H.M.; He, H.; Qu, L. Reversible data hiding in encrypted image based on key-controlled balanced Huffman coding. J. Inf. Secur. Appl. 2024, 84, 103833.
28. Wang, J.X.; Lu, Z.M. A path optional lossless data hiding scheme based on VQ joint neighboring coding. Inf. Sci. 2009, 179, 3332–3348.
29. Kieu, T.D.; Rudder, A. A reversible steganographic scheme for VQ indices based on joint neighboring and predictive coding. Multimed. Tools Appl. 2016, 75, 13705–13731.
30. Weinberger, M.J.; Seroussi, G.; Sapiro, G. From LOCO-I to the JPEG-LS standard. IEEE Int. Conf. Image Process. 1999, 4, 68–72.
31. Zhang, T.; Weng, S.; Wu, Z.; Lin, J.; Hong, W. Adaptive encoding based lossless data hiding method for VQ compressed images using tabu search. Inf. Sci. 2022, 602, 128–142.
32. Linde, Y.; Buzo, A.; Gray, R.M. An algorithm for vector quantization. IEEE Trans. Commun. 1980, 28, 84–95.
33. Nasrabadi, N.M.; King, R.A. Image Coding Using Vector Quantization: A Review. IEEE Trans. Commun. 1988, 36, 957–971.
34. Weber, A.G. The USC-SIPI Image Database: Version 5; USC Viterbi School Eng.; Signal Image Processing Institute: Los Angeles, CA, USA, 2006.
35. Kodak. Kodak Lossless True Color Image Suite. Available online: http://r0k.us/graphics/kodak/index.html (accessed on 4 October 2022).

Figure 1. An example of the VQ-based image encoding ($L = 256$).

Figure 2. Flowchart of the proposed scheme.

Figure 3. Binary coding tree with k types.

Table 1. Classification of the prediction differences.

Type	1	2	3	4	$\dots$	$t - 2$	$t - 1$	$t$
Prediction difference	0	1	$[2, 4)$	$[4, 8)$	$\dots$	$[2^{t - 4}, 2^{t - 3})$	$[2^{t - 3}, 2^{t - 2})$	$\geq 2^{t - 2}$
Number of identical HSBs	$t$	$t - 1$	$t - 2$	$t - 3$	$\dots$	3	2	$\leq 1$
Frequency	$h (1)$	$h (2)$	$h (3)$	$h (4)$	$\dots$	$h (t - 2)$	$h (t - 1)$	$h (t)$

Table 2. The coding rules for prediction difference types.

Type	$σ (1)$	$σ (2)$	$σ (3)$	$σ (4)$	$\dots$	$σ (t - 3)$	$σ (t - 2)$	$σ (t - 1)$	$σ (t)$
Indicator (t is even)	00	01	100	101	$\dots$	$\underbrace{11 \dots 1}_{t / 2 - 2} 00$	$\underbrace{11 \dots 1}_{t / 2 - 2} 01$	$\underbrace{11 \dots 1}_{t / 2 - 2} 10$	$\underbrace{11 \dots 1}_{t / 2 - 2} 11$
Indicator (t is odd)	00	01	100	101	$\dots$	$\underbrace{11 \dots 1}_{(t - 1) / 2 - 2} 01$	$\underbrace{11 \dots 1}_{(t - 1) / 2 - 2} 10$	$\underbrace{11 \dots 1}_{(t - 1) / 2 - 2} 110$	$\underbrace{11 \dots 1}_{(t - 1) / 2 - 2} 111$

Table 3. Coding rules of prediction differences with t = 8.

Type ID	Identical HSBs	Indicator	Retained LSBs	Vacated Room (bits)
1	8	00	0	6
2	7	01	0	6
3	6	100	1	4
4	5	101	2	3
5	4	1100	3	1
6	3	1101	4	0
7	2	1110	5	−1
8	$\leq 1$	1111	8	−4

Table 4. Comparison of the pure compression rates of related works with a codebook size of 256.

Image	Airplane	Lena	Tiffany	Peppers	Boat	Baboon	Goldhill
Kieu and Rudder [29]	0.307	0.349	0.283	0.335	0.342	0.492	0.389
Hong et al. [20]	0.317	0.308	0.260	0.316	0.345	0.443	0.351
Li et al. [21] (difference index)	0.316	0.321	0.248	0.316	0.343	0.448	0.335
Zhang et al. [31]	0.316	0.302	0.256	0.309	0.339	0.437	0.341
Proposed	0.298	0.281	0.234	0.299	0.324	0.440	0.328

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
