Two-Layer Reversible Data Hiding for VQ-Compressed Images Based on De-Clustering and Indicator-Free Search-Order Coding

Abstract: During the transmission of digital images, secret messages can be embedded using data hiding techniques, which transfer private information without drawing the attention of eavesdroppers. To reduce the amount of transmitted data, image compression methods are widely applied, and hiding secret data in compressed images has recently become an active research topic. In this paper, we apply the de-clustering concept and the indicator-free search-order coding (IFSOC) technique to hide information in vector quantization (VQ) compressed images. Experimental results show that, compared with state-of-the-art methods, the proposed two-layer reversible data hiding scheme for the IFSOC-encoded VQ index table can hide a large amount of secret data at a relatively low bit rate and with high security.


Introduction
In the era of fifth generation (5G) mobile networks, information security draws more attention than ever, and various data hiding schemes and secure communication techniques have been proposed [1][2][3][4]. A data hiding technique embeds secret messages into a carrier, such as audio, video, or a digital image. Upon receiving the marked carrier, the receiver extracts the secret data and recovers the carrier. Not all existing data hiding methods can losslessly recover the carrier. A method that can perfectly recover the carrier is classified as a reversible data hiding (RDH) method [5][6][7][8]; otherwise, it is irreversible [9][10][11][12].
The most popular carriers are digital images. Since there are various compression formats for digital images, including lossy compression [13][14][15] and lossless compression [16,17], an efficient data hiding technique should be designed according to the features of the target format. In this research, we focus on data embedding in VQ-compressed images.
Before VQ compression, a codebook must be established. The LBG (Linde-Buzo-Gray) algorithm [18] is commonly applied to train a codebook. The compression process starts by partitioning the given image into disjoint blocks. Then, each block is encoded by finding its best-fitting codeword in the codebook and representing the block with the index of that codeword. As a result, the given image is encoded into an index table, which is also called the VQ compressed code.
Some methods have been proposed to further compress the VQ code [16,19]. In 1992, Kim proposed side-match vector quantization (SMVQ) [19]. Only the first row and the first column of the index table are recorded. In the decompression phase, the blocks of the first row and the first column are decompressed first. Then, the remaining image blocks are predicted by side-matching with the codewords in the codebook. The compression ratio is very high; however, the quality of the decompressed image is poor.
In 2004, Chang et al. proposed a reversible scheme [20] to embed secret data in search-order coding (SOC) encoded VQ indices. The indicator bit of each SOC-encoded index is directly replaced by a secret bit. When the indicator bit is flipped, the format of the index code that follows it must be exchanged correspondingly. The embedding rate is satisfactory; however, exchanging the index format severely expands the code length.
After that, improved versions [21][22][23] were proposed. In the original version of SOC data hiding, switching an uncompressible code into a camouflaged compressed code requires a reserved code followed by the original index, which greatly expands the file size and reduces the effectiveness of SOC compression. In 2011, Rahmani et al. proposed an improved version [21], which leaves the uncompressible index unembedded and thus releases the reserved code. In 2016, Qin and Hu proposed a data hiding scheme [23] based on improved search-order coded (ISOC) VQ indices, which further improves the compression ability and the embedding capacity. These studies make different tradeoffs between embedding rate and file expansion based on the same scheme of indicator bit replacement. In recent years, RDH schemes for the VQ index table based on other compression methods for VQ have also been proposed [24][25][26][27].
Another line of RDH schemes for the VQ index table applies the de-clustering concept, which was proposed by Chang et al. in 2006 [28]. With the help of SMVQ, a VQ index can be switched between the two members of a dissimilar pair to embed a secret bit without expanding the code length. Later, different kinds of clustering techniques [29,30] were proposed to improve the hiding capacity of the de-clustering approach.
In this paper, we incorporate the de-clustering technique with an SOC-based method to improve data hiding capacity and information security. A two-layer RDH scheme for VQ index table is proposed. The de-clustering concept is applied to embed the first layer of secret data. Then, an indicator-free SOC is applied to compress the de-clustered index table. During compression, the second layer of secret data is embedded to camouflage a regular VQ index table. Experimental results demonstrate that the proposed scheme can hide a satisfactory amount of secret data with high security as expected.
The rest of this paper is organized as follows. VQ compression and SOC encoding techniques are introduced in Section 2. The proposed two-layer RDH scheme is presented in Section 3. Experimental results and comparisons with related works are provided in Section 4. Finally, conclusions are made in Section 5.

Related Work
The target of our embedding scheme is the SOC-encoded VQ index table. To explain its features, we first introduce VQ image compression and the corresponding SOC method.

VQ Image Compression
Vector quantization (VQ) is a simple and efficient image compression technique. It has a high compression ratio with an adjustable bit rate and fidelity. Therefore, it is very commonly applied in real applications.
Before compression, a codebook must be established. First, three to five typical images are selected; ideally, both smooth and complex images are included. Then, each image is divided into non-overlapping blocks of size w × h. Each block is treated as a one-dimensional vector with K = w × h components. According to the desired codebook size n, 256 for example, all blocks are clustered into n groups. Finally, the mean vector of each group is recorded as a codeword. The collection of n codewords forms the codebook.
The compression process is illustrated in Figure 1. The image to be compressed is divided into blocks of size w × h in the same way as in the codebook training process. Then, each block is rearranged into a vector and compared with the codewords in the codebook. The serial number of the closest codeword is recorded in the index table. As a result, the given image is compressed into an index table.
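The encoding step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `vq_encode`, the list-of-lists image layout, and the use of the squared Euclidean distance as the measure of the "closest" codeword are assumptions, and the image dimensions are assumed to be multiples of the block size.

```python
def vq_encode(image, codebook, w=4, h=4):
    """Return the VQ index table of `image` (a list of pixel rows).

    Assumptions (not from the paper): squared Euclidean distance selects the
    closest codeword, and the image size is a multiple of the block size.
    """
    index_table = []
    for top in range(0, len(image), h):
        row_of_indices = []
        for left in range(0, len(image[0]), w):
            # Rearrange the w x h block into a one-dimensional vector.
            vector = [image[top + i][left + j] for i in range(h) for j in range(w)]
            # Serial number of the closest codeword.
            best = min(
                range(len(codebook)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(vector, codebook[k])),
            )
            row_of_indices.append(best)
        index_table.append(row_of_indices)
    return index_table
```

With a codebook of 256 codewords, each entry of the returned table fits in eight bits, which is the representation that the later embedding steps operate on.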
A typical setting of parameters is to compress image blocks sized 4 × 4 with a codebook of length 256. Under such circumstances, the compression ratio is 16:1. To get a better fidelity of approximation, codebooks with length 512 or 1024 are also applicable. Of course, the compression ratio is reduced at the same time.

Search-Order Coding (SOC)
In 1996, search-order coding (SOC) [17] was proposed to exploit the high similarity between index values within a local region to further compress the VQ index table. The process is illustrated in Figure 2. Assume that the black-dotted element is the current index under processing. Since the indices are recorded in raster-scan order, the indices in the gray area are available for reference. The order of searching is as labeled in Figure 3. When a matched index is found within a predefined search range, the current index is encoded with the serial number of the matched index. Serial numbers are assigned along the search path, with repeated index values skipped to improve the coding capacity.
To distinguish between SOC representable and non-representable indices, an additional indicator bit is prefixed to an SOC serial number or an original index. The existing RDH schemes for SOC-encoded VQ indices focus on replacing the indicator bit with a secret bit and modifying its following code to fit the data type indicated. In cases where the indicator bit mismatches with the secret bit, the succeeding code may be significantly lengthened.
An example of SOC is illustrated in Figure 3, where the index 36 at the center is under processing and the gray portion contains the referable indices that have been compressed earlier. Suppose we apply four bits to represent the serial number of indices along the search path and an indicator bit valued 1 to indicate a compressible index. The first encountered index 30 is labeled as 0000. The next index is also 30. According to the labeling rule, repeated indices are left unlabeled. The process then proceeds step by step and labels the indices 31, 32, 33, 34, 35, (31), 37, 38, 40, 41, 39, 42, 43, 44, and 36 as 0001 to 1110. The parenthesized 31 is also a repeated index and is therefore skipped. As a result, the current index 36 is compressed into the five bits 11110, where the first bit 1 is the indicator bit.
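The labeling rule of this example can be reproduced with a short sketch. The function name `soc_encode`, the representation of the search path as a flat list of already-visited index values, and the eight-bit length of an uncompressed index are illustrative assumptions; the indicator value 0 for a non-representable index is inferred from the statement that 1 marks a compressible one.

```python
def soc_encode(target, search_path, soc_bits=4, index_bits=8):
    """Encode one index with classic SOC (indicator bit followed by a code).

    `search_path` lists the referable index values in search order
    (an assumed representation of the path labeled in Figure 3).
    """
    distinct = []  # distinct index values met so far, in search order
    for value in search_path:
        if value in distinct:
            continue  # repeated indices are left unlabeled
        if len(distinct) == (1 << soc_bits):
            break  # all serial numbers of the search range are used up
        distinct.append(value)
        if value == target:
            label = len(distinct) - 1  # serial numbers start at 0000
            return "1" + format(label, f"0{soc_bits}b")
    # No match in the search range: indicator 0 and the original index.
    return "0" + format(target, f"0{index_bits}b")


path = [30, 30, 31, 32, 33, 34, 35, 31, 37, 38, 40, 41, 39, 42, 43, 44, 36]
print(soc_encode(36, path))  # 11110
```

The printed code matches the five bits derived in the example: the indicator bit 1 followed by the serial number 1110.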

Proposed Scheme
The proposed data hiding scheme for the VQ index table is based on the concept of de-clustering and the indicator-free SOC. The proposed RDH scheme includes two layers. In the first layer of data embedding, the de-clustering concept is leveraged with the help of side-match evaluation. The second layer of data embedding is executed during the indicator-free SOC compression of the VQ indices, by appending secret bits to an SOC-compressed VQ index so that it camouflages a regular VQ index. The complete flowchart is shown in Figure 4, where the processing procedures for both the data hider and the receiver are illustrated. In the following subsections, the side-match evaluation, the first layer of data embedding, the indicator-free SOC compression with the second layer of data embedding, the data extraction, and the index table recovery are described in detail.

The Side-Match Evaluation
For convenience, we apply a small codebook of length 16 in all illustrative examples, although typical lengths are 256, 512, or 1024 in most real applications. In addition, the head and rear codewords, i.e., CW0 and CW15, are reserved for a special purpose explained later. The reserved codewords are determined after codebook training. After obtaining the codebook, principal component analysis (PCA) is used to determine the projection values of the fourteen codewords along the first component, as shown in Figure 5. The codebook is denoted by $C_V = \{V_1, V_2, \ldots, V_{14}\}$, where $V_j$ represents a codeword of 16 pixel values in the codebook of size 14. The first principal component $W$ of the codewords in the codebook is defined by

$$W = \arg\max_{\|w\| = 1} \sum_{j=1}^{14} \left( (V_j - \bar{V}) \cdot w \right)^2,$$

where $\bar{V}$ denotes the mean of the codewords.
The projections of the codewords onto this component constitute a data set with maximum variance. Then, the codewords are sorted in ascending order of projection value. If the codebook shown in Figure 6 is a PCA-sorted one, we further divide the codewords into two clusters. The upper half is cluster 1, while the lower half is cluster 0. In addition, a one-to-one mapping between codewords at corresponding positions within the two clusters is defined as illustrated in the figure.
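The sorting, clustering, and mapping steps can be sketched in a few lines. The sketch is only one possible realization under stated assumptions: the first principal component is approximated by power iteration on the centered codewords, its sign is fixed by starting from a uniform positive vector (so brighter codewords tend to receive larger projections), and the names `first_principal_component` and `build_clusters` are hypothetical.

```python
import math


def first_principal_component(vectors, iterations=100):
    """Approximate the first principal component by power iteration."""
    n, k = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(k)]
    centered = [[v[i] - mean[i] for i in range(k)] for v in vectors]
    w = [1.0 / math.sqrt(k)] * k  # deterministic start vector (assumption)
    for _ in range(iterations):
        # Multiply w by the scatter matrix: sum_j (x_j . w) x_j
        proj = [sum(x[i] * w[i] for i in range(k)) for x in centered]
        nxt = [sum(p * x[i] for p, x in zip(proj, centered)) for i in range(k)]
        norm = math.sqrt(sum(c * c for c in nxt))
        if norm == 0.0:
            break
        w = [c / norm for c in nxt]
    return mean, w


def build_clusters(codebook):
    """Sort codewords by projection, split them in half, and pair them up."""
    mean, w = first_principal_component(codebook)
    k = len(w)
    proj = [sum((v[i] - mean[i]) * w[i] for i in range(k)) for v in codebook]
    order = sorted(range(len(codebook)), key=lambda j: proj[j])
    half = len(order) // 2
    label, partner = {}, {}
    # Upper half of the sorted codebook is cluster 1, lower half is cluster 0;
    # codewords at the same position within their clusters are mapped to each other.
    for a, b in zip(order[:half], order[half:]):
        label[a], label[b] = 1, 0
        partner[a], partner[b] = b, a
    return order, label, partner
```

For the fourteen-codeword example, this pairing maps the first codeword of the sorted codebook to the eighth, the second to the ninth, and so on, which agrees with the counterpart pairs used in the examples of this section.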
Future Internet 2021, 13, x FOR PEER REVIEW 5 of 19

Next, the side-match evaluation is defined as illustrated in Figure 7. For a block in the VQ image, for example 'CW1', the pixel values along its border with the upper and left blocks are collected to form two vectors: one consists of the inside pixels and the other of the outside pixels. For the illustrated case, $v_{in} = [125, 130, 133, 130, 129, 126, 140]$ and $v_{out} = [(132 + 128)/2, 135, 133, 133, 132, 133, 138]$, where the corner pixel is compared with the average value of its top and left neighbors. The distance between the two vectors is denoted by $d_t$. By replacing the codeword with its counterpart 'CW8' according to the one-to-one mapping, another distance $d_f$ can be obtained. Normally, $d_t$ is much smaller than $d_f$, since neighboring image blocks are usually similar to each other. Based on this property, a switch between counterparts can be easily detected and can therefore be exploited to embed secret data.
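The side-match evaluation can be sketched as below. Since the distance formula is not reproduced in this text, the Euclidean distance is used here as an assumption; the function name `side_match_distance` and the block layout (lists of pixel rows) are likewise illustrative.

```python
import math


def side_match_distance(block, upper, left):
    """Distance between the border pixels of `block` and its outside neighbors.

    `block`, `upper`, and `left` are decoded blocks given as lists of pixel
    rows. The Euclidean distance is an assumption made for this sketch.
    """
    h, w = len(block), len(block[0])
    # Inside pixels: corner, rest of the top row, rest of the left column.
    v_in = [block[0][0]]
    v_in += [block[0][c] for c in range(1, w)]
    v_in += [block[r][0] for r in range(1, h)]
    # Outside pixels: the corner is matched with the average of its top and
    # left neighbors, the other pixels with the adjacent outside pixel.
    v_out = [(upper[-1][0] + left[0][-1]) / 2]
    v_out += [upper[-1][c] for c in range(1, w)]
    v_out += [left[r][-1] for r in range(1, h)]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v_in, v_out)))
```

Under this assumption, the two vectors of the illustrated case differ by (5, 5, 0, 3, 3, 7, 2) in absolute value, which gives a distance of 11.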

The First Layer of Data Embedding Process
The first layer of data embedding in the proposed scheme is based on the side-match evaluation. Since the two codewords of a mapping pair defined between the two clusters are sufficiently dissimilar, de-clustering by switching between them leads to an abnormal side-match value. The embedding rules are quite simple. The cluster labels '1' and '0' are exploited to indicate the embedded binary bit. When the label of the index currently being processed matches the binary bit to be embedded, the index is kept unchanged; when they do not match, the index is switched to its counterpart codeword according to the predefined mapping. In this way, the secret data can be embedded without lengthening the index code. To execute the side-match evaluation, the indices in the first row and the first column of the index table are treated as seed indices and kept unchanged. The first layer of data embedding is applied to the remaining indices.
Note that real VQ image blocks are not always 'normal' under side-match evaluation. In such an abnormal circumstance, the reserved codewords, i.e., the head and rear codewords, take over. When '0' is to be embedded, the head codeword is inserted; when '1' is to be embedded, the rear codeword is inserted. The insertion of a reserved codeword indicates two things: (1) the secret bit is the one the reserved codeword represents, and (2) the current image block is abnormal, and its index is pushed behind the reserved codeword. Since the abnormal situation is thus known, the actual index of the current image block can still be embedded with a secret bit according to the same rule. Thus, an abnormal image block leads to a double code length; however, the embedded data is also doubled. In addition, the binary secret data is stream encrypted before embedding. The algorithm for the proposed secret data embedding scheme is summarized as Algorithm 1.
Algorithm 1. The first layer of data embedding for the VQ index table. Input: cover image, specialized codebook $C_V = \{V_j \mid j = 1, 2, \ldots, n - 2\}$, codeword mapping function, secret data.
We use three typical examples to demonstrate the embedding process of the proposed data hiding scheme. The first example is shown in Figure 8. The index value is 'CW5' and the secret bit to be embedded is '1'. Since $d_t = d(cw_5) < d(cw_{12}) = d_f$ and the cluster label matches the secret bit, i.e., $L(cw_5) = 1 = b$, the secret bit is embedded without any modification. The second example is shown in Figure 9. The index value is 'CW7' and the secret bit to be embedded is '0'. Again, this is a normal image block according to the side-match evaluation. Since the cluster label mismatches the secret bit, i.e., $L(cw_7) = 1 \neq 0 = b$, the secret bit is embedded by recording its counterpart 'CW14' obtained through the predefined mapping between clusters. The third example is shown in Figure 10. In this case, the side-match evaluation, $d_t = d(cw_{11}) > d(cw_4) = d_f$, indicates the occurrence of an abnormal block. Since the secret bit to be embedded is '1', the rear codeword 'CW15' is recorded. In addition, the next secret bit '0' is embedded by recording the matched current index 'CW11'.
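The three examples can be traced with a small sketch of the per-index embedding rule. It assumes the sixteen-codeword illustrative codebook, with cluster labels and counterparts inferred from the examples (CW1 to CW7 in cluster 1, CW8 to CW14 in cluster 0, and CWk paired with CW(k+7)); the function name `embed_first_layer` and the precomputed `is_normal` flag, which stands for the outcome of the side-match comparison, are hypothetical.

```python
HEAD, REAR = 0, 15  # reserved codewords CW0 and CW15
# Cluster labels and counterparts of the illustrative codebook
# (inferred from the examples, not taken from Algorithm 1).
LABEL = {k: 1 if k <= 7 else 0 for k in range(1, 15)}
PARTNER = {k: k + 7 if k <= 7 else k - 7 for k in range(1, 15)}


def embed_first_layer(index, is_normal, bits):
    """Return the marked indices emitted for one original index.

    `is_normal` is True when the side-match distance of `index` is smaller
    than that of its counterpart; `bits` is an iterator over secret bits.
    """
    marked = []
    if not is_normal:
        # An abnormal block first emits a reserved codeword carrying one bit.
        marked.append(REAR if next(bits) == 1 else HEAD)
    bit = next(bits)
    # Keep the index if its cluster label equals the bit, otherwise switch.
    marked.append(index if LABEL[index] == bit else PARTNER[index])
    return marked


print(embed_first_layer(5, True, iter([1])))       # [5]
print(embed_first_layer(7, True, iter([0])))       # [14]
print(embed_first_layer(11, False, iter([1, 0])))  # [15, 11]
```

The three printed results correspond to the examples of Figures 8, 9, and 10, respectively.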

The SOC Compression and the Second Layer of Data Embedding Processes
After the first layer of secret data has been embedded, the stego index table is sent to the indicator-free SOC compression process. To improve the efficiency of SOC, the indicator-elimination technique proposed by Chang et al. [31] is adopted. Two examples illustrate SOC with the indicator-free method. The code length is set to four bits; however, the head and rear codes '0000' and '1111' are reserved for special usage. For a compressible index, as shown in Figure 11, the coding is the same as the method introduced in Section 2.2, except that the labels start with '0001' instead of '0000'. In addition, four secret bits are appended so that the code camouflages a regular VQ index. For an incompressible index, as shown in Figure 12, no matched index can be found in the predefined range of '0001' to '1110'. One secret bit is embedded by emitting a reserved code, '0000' for '0' and '1111' for '1', and the actual index is appended behind it. To avoid confusion in the decoding process, the head and rear indices of the first layer of embedding are treated as incompressible indices. The overall scheme of the proposed SOC and second layer of data embedding is illustrated in Figure 13, where the code length of VQ compression is assumed to be eight bits. The algorithm for SOC compression and the second layer of data embedding is given as Algorithm 2.
Note that the tracing path of indices near the boundary of the index table may run out of bounds. In such a case, the tracing is stopped and the index under processing is treated as incompressible. For instance, the indices in the first column are always incompressible according to this rule.
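A minimal sketch of the second-layer rule for a single index is given below. It is not Algorithm 2: the search path is passed in as a flat list of referable index values, the number of appended secret bits is assumed to equal the VQ code length minus the SOC code length (four bits in the setting of Figure 13), and the names `soc_label` and `embed_second_layer` are hypothetical.

```python
def soc_label(target, search_path, soc_bits=4):
    """Serial number (starting at 1) of `target` on the path, or None."""
    max_label = (1 << soc_bits) - 2  # all-zero and all-one codes are reserved
    distinct = []
    for value in search_path:
        if value in distinct:
            continue  # repeated indices are left unlabeled
        if len(distinct) == max_label:
            return None  # predefined range exhausted without a match
        distinct.append(value)
        if value == target:
            return len(distinct)
    return None


def embed_second_layer(target, search_path, bits,
                       soc_bits=4, index_bits=8, reserved=()):
    """Emit the indicator-free code of one index and embed secret bits.

    `reserved` holds the head and rear indices of the first layer, which are
    always treated as incompressible; `bits` iterates over secret bits.
    """
    label = None if target in reserved else soc_label(target, search_path, soc_bits)
    if label is not None:
        # Compressible: SOC label plus enough secret bits to reach the
        # length of a regular VQ index (assumed: index_bits - soc_bits).
        payload = "".join(str(next(bits)) for _ in range(index_bits - soc_bits))
        return format(label, f"0{soc_bits}b") + payload
    # Incompressible: the reserved code carries one secret bit,
    # followed by the actual index.
    flag = ("1" if next(bits) == 1 else "0") * soc_bits
    return flag + format(target, f"0{index_bits}b")
```

Because no label ever equals '0000' or '1111', a decoder can tell the two cases apart from the first four bits alone, which is what removes the need for an indicator bit.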

Secret Data Extraction and Index Table Recovery Processes
The secret data extraction rules follow directly from the embedding rules. At the receiver end, the extraction of the second-layer secret data and the SOC decoding are executed first. Then, the indices are processed sequentially to determine the embedded first-layer data and the original index values. The cluster label of the marked index is retrieved as the secret bit. The side-match values of the marked index and its counterpart are evaluated and compared to determine the original index value. When a reserved index is encountered, the head index represents '0', while the rear index represents '1'. In addition, after this secret bit has been extracted, the succeeding index is recovered according to the abnormal rule. The algorithm for secret data extraction and index table recovery is summarized as Algorithm 3.
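The first step at the receiver, separating the second-layer secret bits from the SOC codes, can be sketched as follows. The sketch assumes the code layout described for the second layer (four-bit SOC codes, eight-bit VQ indices, and four secret bits after each compressible code); the function name `parse_second_layer` and the token format are hypothetical, and resolving each SOC label back to an index value would additionally require the search path, which is omitted here.

```python
def parse_second_layer(bitstream, soc_bits=4, index_bits=8):
    """Split an indicator-free SOC bitstream into tokens and secret bits."""
    tokens, secret = [], []
    pos = 0
    while pos < len(bitstream):
        code = bitstream[pos:pos + soc_bits]
        pos += soc_bits
        if code in ("0" * soc_bits, "1" * soc_bits):
            # Reserved code: one secret bit, then the actual index follows.
            secret.append(int(code[0]))
            tokens.append(("index", int(bitstream[pos:pos + index_bits], 2)))
            pos += index_bits
        else:
            # SOC label followed by the appended secret bits.
            extra = index_bits - soc_bits
            secret.extend(int(c) for c in bitstream[pos:pos + extra])
            tokens.append(("soc", int(code, 2)))
            pos += extra
    return tokens, secret


tokens, secret = parse_second_layer("01101011" + "111100100100")
print(tokens)  # [('soc', 6), ('index', 36)]
print(secret)  # [1, 0, 1, 1, 1]
```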
The overall scheme of phase 1 process is illustrated in Figure 14.
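To make the second-layer rules concrete, the per-index IFSOC coding and its inversion at the receiver can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' Algorithm 2: the function names are hypothetical, the list `candidates` (distinct indices met along the search path, nearest first) is taken as a given input because the search path is defined elsewhere in the paper, the head and rear indices are assumed to be 0 and the all-ones index, and the secret string is assumed to be long enough.

```python
def ifsoc_encode(index, candidates, secret, soc_len=4, idx_len=8):
    """Encode one index; return (code_bits, secret_bits_consumed).

    candidates: hypothetical search-order list of distinct indices.
    secret: string of '0'/'1' characters still waiting to be embedded.
    """
    max_pos = 2 ** soc_len - 2           # labels '0001'..'1110' -> 14 slots
    payload = idx_len - soc_len          # 4 secret bits for 8-bit indices
    reserved_index = index in (0, 2 ** idx_len - 1)  # assumed head/rear
    if not reserved_index and index in candidates[:max_pos]:
        label = candidates.index(index) + 1          # labels start at '0001'
        return format(label, f"0{soc_len}b") + secret[:payload], payload
    # Incompressible: a reserved code carries one secret bit, then the index.
    head, rear = "0" * soc_len, "1" * soc_len
    code = head if secret[0] == "0" else rear
    return code + format(index, f"0{idx_len}b"), 1


def ifsoc_decode(bits, pos, candidates, soc_len=4, idx_len=8):
    """Decode one index starting at bits[pos].

    Returns (index, extracted_secret_bits, next_position).
    """
    code = bits[pos:pos + soc_len]
    pos += soc_len
    if code == "0" * soc_len or code == "1" * soc_len:
        secret = "0" if code[0] == "0" else "1"
        index = int(bits[pos:pos + idx_len], 2)
        return index, secret, pos + idx_len
    payload = idx_len - soc_len
    index = candidates[int(code, 2) - 1]
    return index, bits[pos:pos + payload], pos + payload
```

With `candidates = [37, 12, 200]`, the index 12 is encoded together with the secret bits `1011` as the eight bits `00101011`, which is indistinguishable in length from a plain 8-bit VQ index, whereas an unmatched index costs twelve bits and carries a single secret bit.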
The three examples presented for the first-layer embedding process are reused to demonstrate secret data extraction and index table recovery in the second phase. The first example is shown in Figure 15. The marked index is 'CW5', and the embedded secret bit is given by its cluster label, b = L(cw5) = 1. Then, its counterpart 'CW12' is retrieved, and their side-match values are compared. Since d(cw5) < d(cw12), 'CW5' is recorded in the recovered index table. The second example is shown in Figure 16. The marked index is 'CW14', whose cluster label 0 is retrieved as the secret bit. The side-match values of 'CW14' and its counterpart 'CW7' are calculated and compared. Since 'CW7' best matches the current environment, it is recorded in the recovered index table. The final example is shown in Figure 17. A rear index 'CW15' is encountered, which means that a secret bit '1' is embedded and the current block is abnormal. Its succeeding index 'CW11' is then retrieved, and the cluster label 0 of 'CW11' is extracted as the second secret bit. Finally, the side-match values of 'CW11' and 'CW4' are compared. The mismatched index 'CW11' is recorded, since the current block is indicated as an abnormal one.
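The first-layer extraction rule walked through in these three examples can be sketched as follows. This is an illustrative sketch rather than Algorithm 3 itself: the function name, the dictionaries `label` and `counterpart`, and the callable `side_match` are hypothetical stand-ins for the cluster labels, the de-clustering pairs, and the side-match distance d. The numeric distances in the usage lines are made up so that they reproduce the orderings stated in the examples.

```python
def extract_first_layer(marked, label, counterpart, side_match, head, rear):
    """Recover (secret_bits, original_indices) from a marked index stream."""
    secret, recovered = [], []
    abnormal = False
    for idx in marked:
        if idx in (head, rear):
            # A reserved index carries one bit and flags the next block.
            secret.append(0 if idx == head else 1)
            abnormal = True
            continue
        secret.append(label[idx])          # cluster label is the secret bit
        pos = len(recovered)               # position of the current block
        pair = (idx, counterpart[idx])
        # Normal block: keep the better side match; abnormal: the worse one.
        pick = max if abnormal else min
        recovered.append(pick(pair, key=lambda c: side_match(c, pos)))
        abnormal = False
    return secret, recovered


# Usage mirroring Figures 15-17 (distances are illustrative only).
label = {5: 1, 12: 0, 14: 0, 7: 1, 11: 0, 4: 1}
counterpart = {5: 12, 12: 5, 14: 7, 7: 14, 11: 4, 4: 11}
d = {5: 10, 12: 40, 14: 35, 7: 8, 11: 50, 4: 20}
bits, table = extract_first_layer(
    [5, 14, 15, 11], label, counterpart, lambda c, pos: d[c], head=0, rear=15
)
print(bits, table)  # [1, 0, 1, 0] [5, 7, 11]
```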

Experimental Results
In our experiment, nine standard gray-level images, namely (a) Tank, (b) Bridge, (c) Elaine, (d) Lena, (e) Peppers, (f) Wine, (g) Goldhill, (h) Bird, and (i) Baboon, as shown in Figure 18, are used; each is of size 512 × 512. VQ compression is executed with a codebook of length 256, and all images are divided into blocks of size 4 × 4. Images of different features, both smooth and complex, are included. The implementation platform is a personal computer with an Intel Core i5-9400F CPU at 2.90 GHz, 8 GB of RAM, and the Windows 10 Professional operating system. The outputs of three processing steps of the data hider are investigated: the visual quality of VQ images, the effectiveness of side-match evaluation, and the compression performance of the SOC technique.

Visual Quality of VQ Images
VQ compression is a widely applied image compression technique. The major difference in our application is that, although the codebook has the typical length of 256, only 254 codewords are available during the training process. The remaining two codewords, the head and the rear ones, are dummies reserved for special usage. Based on this formulation, the quality of VQ approximation is evaluated. We adopt the peak signal-to-noise ratio (PSNR) as the performance metric, which is defined by

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} \ \text{(dB)},$$

and the mean-square-error (MSE) is defined by

$$\mathrm{MSE} = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( I_{ij} - I'_{ij} \right)^2,$$

where $H \times W$ is the image size. The notations $I_{ij}$ and $I'_{ij}$ represent the pixels of the original image and the VQ-decompressed image, respectively. The PSNR values of the VQ images produced with our specialized codebook are listed in Table 1. The image quality is very close to that of conventional VQ compression with a codebook of length 256.
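For readers who wish to reproduce the quality measurement, PSNR is commonly computed as 10·log10(peak² / MSE), where MSE is the mean of the squared pixel differences. The following is a minimal sketch assuming 8-bit gray levels (peak value 255) and images given as nested lists; the function names are illustrative and not taken from the paper.

```python
import math


def mse(original, decoded):
    """Mean squared error between two equally sized gray-level images."""
    h, w = len(original), len(original[0])
    total = sum(
        (original[i][j] - decoded[i][j]) ** 2
        for i in range(h)
        for j in range(w)
    )
    return total / (h * w)


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(original, decoded)
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)


# Tiny 2 x 2 example: two pixels are off by one gray level.
a = [[100, 100], [100, 100]]
b = [[101, 99], [100, 100]]
print(mse(a, b))  # 0.5
```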

Effectiveness of Side-Match Evaluation
The performance of the proposed scheme is highly dependent on the side-match evaluation. To assess the effectiveness of side-match evaluation for different images, the number of abnormal blocks detected in each test image is listed in Table 2. In our experiment, each 512 × 512 test image is divided into 128 × 128 = 16,384 blocks in total. In the worst case, the 'Baboon' image, only 2.3% of the image blocks are detected to be abnormal. In the best case, the 'Bridge' image, only one block is abnormal. These results support the good performance of the proposed reversible data hiding scheme.

Performance of SOC Encoding
After the index values are switched between clusters during embedding, the marked index table is sent to SOC encoding. Since the switching for embedding intentionally disrupts the continuity between adjacent blocks to ensure reversibility, this operation reduces the applicability of SOC encoding. In our experiment, the assigned length of the search-order code is four bits. Based on this parameter setting, the numbers of SOC compressible and incompressible blocks for the test images are listed in Table 3. The compressible percentage ranges from 25.5% for Baboon to 88.3% for Bridge and is highly dependent on the features of the cover image. An index in a smooth image is more likely to find the same value among its neighbors and be compressed by SOC. On the other hand, an index in a complex image is frequently incompressible. The total file size in bytes and the image bit rate in bits per pixel are also listed in the table. The resulting file size is larger for an image with more incompressible blocks. The embedding capacity in bits for the different layers of the proposed scheme is listed in Table 4.
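The dependence of file size and second-layer capacity on the block counts can be estimated with a short calculation. The sketch below is a back-of-the-envelope estimate, not the procedure used to produce Tables 3 and 4. It assumes that every compressible index costs eight bits (a 4-bit search-order code plus four secret bits), that every incompressible index costs twelve bits (a 4-bit reserved code plus the 8-bit index) and carries one secret bit, and that there is no header overhead. The function name and the block counts in the usage lines are hypothetical.

```python
import math


def ifsoc_size_estimate(n_comp, n_incomp, width=512, height=512,
                        soc_len=4, idx_len=8):
    """Estimate coded size and second-layer capacity from block counts."""
    comp_bits = idx_len                 # SOC code + appended secret bits
    incomp_bits = soc_len + idx_len     # reserved code + raw index
    total_bits = n_comp * comp_bits + n_incomp * incomp_bits
    capacity = n_comp * (idx_len - soc_len) + n_incomp  # secret bits
    return {
        "file_bytes": math.ceil(total_bits / 8),
        "total_bpp": total_bits / (width * height),
        "layer2_bits": capacity,
    }


# Hypothetical split of 16,384 indices (not taken from Table 3).
est = ifsoc_size_estimate(n_comp=10000, n_incomp=6384)
print(est["file_bytes"], est["layer2_bits"])  # 19576 46384
```

Under these assumptions, a table in which every index is compressible comes out at exactly 0.5 bits per pixel, the same as an unmarked 8-bit index table, while each incompressible index adds four extra bits.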

Comparison with Related Works
Since the proposed two-layer RDH scheme for VQ index tables is a hybrid of de-clustering and SOC-based schemes, our experimental results are compared with both series of schemes. The first series consists of the de-clustering approaches. Table 5 compares the embedding capacity (EC) and file size of the proposed scheme with those of the schemes proposed in [28,30]. Although these schemes are all based on the VQ index table, the file sizes of their marked index tables differ. Usually, a larger file size is required to embed more secret data. To make a fair comparison, the EC and file size are converted to the total bitrate (total BR) and data bitrate (data BR), as listed in Table 6.
The secret data are usually assumed to be completely random and are therefore regarded as incompressible. Consequently, the image bitrate (image BR), which represents the compression efficiency of a scheme, can be estimated by subtracting the data bitrate from the total bitrate of a marked VQ index table. These performance evaluations are plotted in Figure 19. As shown in the figure, the proposed scheme can embed a large amount of data with a comparable file size. More specifically, in addition to the first layer of de-clustering embedding, SOC compression in the second embedding phase efficiently vacates a large amount of room for hiding data. Figure 19b shows that the proposed scheme successfully camouflages an uncompressed VQ index table.
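The conversion described above can be written out explicitly. The expressions below are a sketch of the presumable computation, assuming that the file size $S$ is measured in bytes, the embedding capacity EC in bits, and that all rates are normalized by the number of pixels $H \times W$ of the cover image (512 × 512 in the experiments); the symbols $S$, $H$, and $W$ are introduced here for illustration only.

$$\text{total BR} = \frac{8S}{H \times W}, \qquad \text{data BR} = \frac{\mathrm{EC}}{H \times W}, \qquad \text{image BR} = \text{total BR} - \text{data BR} = \frac{8S - \mathrm{EC}}{H \times W}.$$

For instance, under these assumptions a marked index table of 16,384 bytes carrying 32,768 secret bits would give a total BR of 0.5 bpp, a data BR of 0.125 bpp, and an image BR of 0.375 bpp.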
The second series to be compared consists of the SOC-based approaches, including [20,21,23]. The EC and file size are listed in Table 7. Again, the total BR, data BR, and image BR are compared and listed in Table 8. The table values are plotted in Figure 20 for ease of observation. As shown in the figure, the proposed scheme provides a large EC, since the room vacated by compression is exploited to embed secret data. Meanwhile, Figure 20d shows that its compression efficiency is comparable with that of the related works. In addition, the IFSOC applied in our scheme is free of indicators. The SOC-based schemes hide secret data directly in the indicators, which are exposed to eavesdroppers. Therefore, the proposed indicator-free RDH scheme is more secure than the related works.

Conclusions
A two-layer RDH scheme for VQ index tables based on de-clustering and IFSOC is proposed. After data embedding by de-clustering, IFSOC is applied to compress the marked index table and embed the second layer of data at the same time. The total embedding capacity is satisfactory. In addition, the compression after de-clustering embedding increases the security of the first layer data, while the indicator-free SOC secures the second layer data.
Experimental results show that the proposed RDH scheme outperforms state-of-the-art schemes under the overall evaluation of embedding capacity and image bit rate. The hybrid RDH scheme is a promising approach: compressing a VQ index and then appending secret bits successfully camouflages the original VQ index and creates a high embedding capacity.

Data Availability Statement:
The authors confirm that the data supporting the findings of this study are available within the article.