Multi-Bit Data Hiding Scheme for Compressing Secret Messages

The goal of data hiding techniques usually involves two issues: embedding capacity and image quality. Consequently, in order to achieve high embedding capacity and good image quality, this paper proposes a data hiding scheme that combines run-length encoding (RLE) with multi-bit embedding. This work has three major contributions. First, the embedding capacity is increased by 62% because the secret message is compressed before it is embedded into the cover image. Secondly, the proposed scheme keeps the characteristics of the multi-bit generalized exploiting modification direction (MGEMD) method, which effectively reduces the number of modified pixels in the cover image and maintains good stego image quality. Finally, the proposed scheme can resist modern steganalysis methods, such as RS steganalysis and SPAM (subtractive pixel adjacency matrix), and is compared with the MiPOD (minimizing the power of the optimal detector) scheme. From our simulation results and security discussions, we obtain the following results. First, there are no perceivable differences between the cover images and stego images under human inspection. For example, the average PSNR of the stego images is about 44.61 dB when a secret message of 80,000 bits is embedded into test cover images (such as airplane, baboon and Lena) of size 512 × 512. Secondly,

Introduction
Networks are ubiquitous in modern life, and more and more things are digital, such as photos, videos, music, documents and personal information. Therefore, how to protect digital information is an important issue. Cryptography and steganography are two popular technologies used to protect digital products. In cryptography, a key is used to encrypt data into unintelligible ciphertext, and the same key or a different one is then used to decrypt it. Common cryptographic algorithms include the Advanced Encryption Standard (AES), the Data Encryption Standard (DES) and RSA, along with hash functions such as MD5. In general, cryptographic technologies provide a certain level of security, but they cannot protect the content once the ciphertext has been decrypted. Therefore, steganography technologies have been developed. Steganography technologies can be classified into watermarking and data hiding [1].
Digital watermarking technology can, in general, be divided into two categories [2]: visible and invisible watermarking. The advantage of a visible watermark is that the human eye can discern it, so no algorithm is needed to view the information that identifies the data source or the owner. The disadvantage of a visible watermark is that the image is altered by the watermark, and the watermark is easily overwritten or removed by signal processing. Watermarking techniques can also be divided into fragile and robust watermarks. A fragile watermark is primarily used to protect the integrity of the image: the slightest modification to media carrying a fragile watermark destroys the watermark, so a missing watermark indicates tampering. A robust watermark survives a designated class of transformations. An example application of a robust watermark is carrying copy and access control information: the media may be compressed, cropped or otherwise transformed, but the watermark information survives.
Data hiding uses digital signal processing and digital imaging technologies to hide secret data without noticeably reducing the quality of the cover image. The hidden information is not readily apparent and can take any form (text, images or video). Data hiding techniques operate in either the spatial domain or the transform domain, and they aim to produce high-quality stego images whose changes cannot be detected by the human eye. Data hiding can also be classified as irreversible or reversible. Reversible data hiding is lossless: the original cover image can be reconstructed from the stego image after the secret message is extracted.
Watermarking techniques typically modify coefficients in the frequency domain [3], but such modifications introduce more distortion. Therefore, data hiding is usually performed in the spatial domain, which causes less distortion than watermarking and allows fast processing. For data hiding in the spatial domain, least significant bit (LSB) replacement is a classic scheme [4], but its embedding capacity is one bit per pixel (bpp) and it offers little security. The exploiting modification direction (EMD) [5] scheme can embed 1.16 bpp when each group contains two pixels.
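The 1.16 bpp figure follows from how EMD packs data: each group of n pixels carries one digit in base 2n + 1, i.e., $\log_2(2n+1)/n$ bits per pixel. A short Python sketch of that arithmetic (the function name `emd_capacity_bpp` is ours, for illustration only):

```python
import math


def emd_capacity_bpp(n: int) -> float:
    """Embedding rate of EMD: one (2n+1)-ary digit per group of n pixels."""
    return math.log2(2 * n + 1) / n


for n in range(2, 6):
    # n = 2 gives about 1.16 bpp; n >= 3 drops below 1 bpp
    print(f"n = {n}: {emd_capacity_bpp(n):.3f} bpp")
```

The same function also shows why the rate falls as the groups grow, which is the limitation GEMD and MGEMD later address.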
Data compression [6] saves storage space and speeds up network transmission. The raw data are processed by mathematical algorithms to reduce their size, the reduced data are transmitted, and decompression recreates the original data at the receiver (exactly for lossless methods, approximately for lossy ones). There are two types of data compression. One is lossless compression, such as the PCX, GIF, TIFF, TGA and PNG image formats, the ZIP and RAR archive formats, run-length encoding, Huffman coding and Lempel-Ziv-Welch (LZW) coding. The other is lossy compression, such as JPEG (Joint Photographic Experts Group), VQ (vector quantization) [7][8][9] and SMVQ (side match vector quantization) [10,11]. Therefore, data compression is a good way to reduce the size of the secret message.
To leverage the advantages of compression, we propose a data hiding scheme that can embed more secret data: the secret message is compressed first and then embedded with a multi-bit data hiding method. The proposed scheme effectively reduces the size of the secret message and also adopts the multi-bit generalized exploiting modification direction (MGEMD) [12] method, both of which increase the embedding capacity.
In Section 2, we review previous work on run-length encoding (RLE) and multi-bit embedding. Section 3 describes the proposed method in detail and then presents a modified method that speeds up MGEMD embedding, together with a discussion of the overflow/underflow problem and its solution. Experimental results are given in Section 4. Finally, conclusions are given in Section 5.

Related Work
This section covers two main areas of related work. For data compression, we review the RLE method. Then, three data hiding methods [5,12,13] based on extraction functions are introduced.

Run-Length Encoding
Run-length encoding (RLE) [14] is a well-known, simple and fast form of lossless data compression in which a run of consecutive identical data values is stored as a single value and a count. RLE is most effective when the source contains long runs of the same character or binary digit, which makes it well suited to compressing a binary secret message. For example, the secret message 00001011101 is encoded as 0(4)1(1)0(1)1(3)0(1)1(1).
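A minimal Python sketch of binary RLE that reproduces the example above; the `(bit, run_length)` pair representation and the function names are illustrative choices, not the paper's storage format.

```python
from __future__ import annotations

from itertools import groupby


def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Collapse a binary string into (bit, run_length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (bit, run_length) pairs back into the original binary string."""
    return "".join(bit * length for bit, length in runs)


runs = rle_encode("00001011101")
print("".join(f"{b}({n})" for b, n in runs))  # 0(4)1(1)0(1)1(3)0(1)1(1)
assert rle_decode(runs) == "00001011101"      # lossless round trip
```

Because runs in a binary stream always alternate between 0 and 1, the leading bit plus the list of lengths is enough to rebuild the stream, so storing every bit value is redundant.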
In 2006, Chang et al. [15] proposed two image steganographic methods based on run lengths: BRL (hiding bitmap files by run-length), which focuses on binary images, and GRL (hiding general data files by run-length). The main idea of these methods is to use RLE to increase the embedding capacity of the SMVQ method [16]. For binary images, Agaian and Cherukuri [17] also proposed run-length-based steganography. Their algorithm alters pixels in the embeddable blocks of the cover image according to the run-length characteristics and characteristic values of each block, and it improves both the security of the embedded data and the embedding capacity. In addition, Lee et al. [18] proposed steganographic access control for data hiding using run-length encoding and modulo operations in 2011. Their high-capacity scheme with access control converts sharp bit streams into smooth bit streams and embeds them into the cover image. Because the modulus is fixed in this scheme, its embedding capacity is limited.
Accordingly, using RLE to increase the embedding capacity of data hiding is worthwhile. In particular, the compression ratio is higher when binary images (black-and-white pictures) contain long runs of ones and zeros. We use both binary images and gray-scale images in the experiments in this paper, and the results show a good compression ratio and improved embedding capacity.

Exploiting Modification Direction Method
The exploiting modification direction (EMD) [5] method was proposed by Zhang and Wang in 2006. It offers a higher embedding capacity than 1-LSB replacement. In EMD, the cover pixels are divided into groups of n pixels, and at most one pixel in each group is modified, by −1 or +1 (or not at all). To achieve this, Zhang and Wang define the extraction function in Equation (1):
$$f(g_1, g_2, \ldots, g_n) = \left[ \sum_{i=1}^{n} (g_i \times i) \right] \bmod (2n + 1), \quad (1)$$
where $g_i$ is the value of pixel i and n is the number of pixels in a group. For example, when n = 2, two pixels $g_1$ and $g_2$ are considered, and the extraction function is $f(g_1, g_2) = (1 \times g_1 + 2 \times g_2) \bmod 5$, so each group carries one five-ary digit. According to their analysis, the best hiding performance is obtained with five-ary digits. However, the embedding capacity decreases as the number of pixels in each group increases; specifically, it falls below 1 bpp (bits per pixel) when each group contains three or more pixels.
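The Python sketch below implements Equation (1) together with the usual EMD embedding rule: compute $d = (s - f) \bmod (2n+1)$, then add 1 to pixel d if $d \le n$, or subtract 1 from pixel $2n + 1 - d$ otherwise. Function names are illustrative, and boundary handling at 0 and 255 is deliberately left out.

```python
from __future__ import annotations


def emd_extract(group: list[int]) -> int:
    """Equation (1): f = (1*g_1 + 2*g_2 + ... + n*g_n) mod (2n + 1)."""
    n = len(group)
    return sum(i * g for i, g in enumerate(group, start=1)) % (2 * n + 1)


def emd_embed(group: list[int], digit: int) -> list[int]:
    """Hide one (2n+1)-ary digit, changing at most one pixel by +1 or -1.

    Overflow/underflow at 0 or 255 is not handled in this sketch.
    """
    n = len(group)
    base = 2 * n + 1
    d = (digit - emd_extract(group)) % base
    stego = list(group)
    if 1 <= d <= n:
        stego[d - 1] += 1         # pixel d gets +1: the weighted sum grows by d
    elif d > n:
        stego[base - d - 1] -= 1  # pixel 2n+1-d gets -1: the sum grows by d modulo 2n+1
    return stego


print(emd_embed([10, 20], 3))  # [10, 19]
```

For the example group (10, 20), $f = (10 + 40) \bmod 5 = 0$, so hiding the digit 3 lowers the second pixel by one and the receiver recomputes $f = (10 + 38) \bmod 5 = 3$.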

Generalized Exploiting Modification Direction
In order to improve the embedding capacity and to embed binary secret data directly, Kuo and Wang proposed a data hiding method based on generalized exploiting modification direction (the GEMD scheme) [13]. The main idea of the GEMD scheme is that each (n + 1)-bit binary secret message can be hidden in n adjacent pixels of the cover image. The new extraction function $f_b(g_1, g_2, \ldots, g_n)$ is defined as Equation (2):
$$f_b(g_1, g_2, \ldots, g_n) = \left[ \sum_{i=1}^{n} g_i \times (2^i - 1) \right] \bmod 2^{n+1}. \quad (2)$$

Multi-Bit Generalized Exploiting Modification Direction
In 2012, Kuo et al. also proposed the multi-bit GEMD (MGEMD) [12] method, which increases the embedding capacity by using an adaptive k. MGEMD chooses n, the number of pixels in a group, and k, the number of secret bits carried by each pixel; each group can also carry one extra bit, so a group of n pixels embeds nk + 1 secret bits. MGEMD's extraction function is shown as Equation (3):
$$f(g_1, g_2, \ldots, g_n) = \left[ \sum_{i=1}^{n} g_i \times c_i \right] \bmod 2^{nk+1}, \quad (3)$$
where the weight $c_i$ is:
$$c_i = \sum_{j=0}^{i-1} 2^{jk} = \frac{2^{ik} - 1}{2^k - 1}. \quad (4)$$
For example, $c_1 = 1$, $c_2 = 9$, $c_3 = 73$ and $c_4 = 585$ when k = 3 and n = 4, from Equation (4). The main difference between the MGEMD and GEMD schemes is that the modulus is changed from $2^{n+1}$ to $2^{nk+1}$ in order to increase the embedding capacity of the MGEMD scheme.
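The Python sketch below builds the weights of Equation (4) with the equivalent recurrence $c_1 = 1$, $c_i = 2^k c_{i-1} + 1$, which reproduces the example values 1, 9, 73 and 585; with k = 1 the same weights become $2^i - 1$, the GEMD weights. The function `embed_bruteforce` is a hypothetical stand-in, not the MGEMD embedding rule: it assumes each pixel may change by at most $2^k - 1$ (a bound inferred from the overflow/underflow handling in Section 3) and simply searches for the change vector with the smallest squared error.

```python
from __future__ import annotations

from itertools import product


def mgemd_weights(n: int, k: int) -> list[int]:
    """Weights of Equation (4), built with the recurrence c_i = 2^k * c_(i-1) + 1."""
    weights = [1]
    for _ in range(n - 1):
        weights.append((weights[-1] << k) + 1)
    return weights


def mgemd_extract(group: list[int], k: int) -> int:
    """Equation (3): weighted pixel sum modulo 2^(nk+1)."""
    n = len(group)
    c = mgemd_weights(n, k)
    return sum(ci * g for ci, g in zip(c, group)) % (1 << (n * k + 1))


def embed_bruteforce(group: list[int], k: int, digit: int) -> list[int]:
    """HYPOTHETICAL stand-in for MGEMD embedding (not the paper's rule).

    Tries every change vector with |delta_i| <= 2^k - 1 and keeps the one with
    the smallest squared error whose extraction value equals `digit`.
    Only practical for small n and k.
    """
    n = len(group)
    c = mgemd_weights(n, k)
    modulus = 1 << (n * k + 1)
    target = (digit - mgemd_extract(group, k)) % modulus
    step = (1 << k) - 1
    best = None
    for delta in product(range(-step, step + 1), repeat=n):
        if sum(ci * d for ci, d in zip(c, delta)) % modulus == target:
            cost = sum(d * d for d in delta)
            if best is None or cost < best[0]:
                best = (cost, delta)
    if best is None:
        raise ValueError("no change vector found; try n >= 2")
    return [g + d for g, d in zip(group, best[1])]


print(mgemd_weights(4, 3))                 # [1, 9, 73, 585]
print(mgemd_weights(4, 1))                 # [1, 3, 7, 15]  (GEMD weights 2^i - 1)
print(mgemd_extract([10, 19, 5, 9], k=3))  # 5811
```

The exhaustive search makes the extraction side easy to check, but it grows as $(2^{k+1} - 1)^n$, which is exactly the cost the speed-up tables in Section 3 are meant to avoid.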

The Proposed Scheme
As a rule, the goals of data hiding techniques are security, capacity, robustness, imperceptibility, unambiguousness and non-removability. Data hiding techniques focus on increasing the embedding capacity while keeping stego image quality high. However, the differences between the cover image and the stego image become more significant as the amount of embedded secret data increases. Thus, how to enhance the embedding capacity while maintaining stego image quality is a very important issue. To address this issue, a scheme with high embedding capacity and good image quality is proposed in this section.

Multi-Bit Data Hiding Scheme for Compressing Secret Messages
In data hiding schemes, unambiguousness means that, after the stego image is transmitted, the receiver extracts exactly the same secret message that the sender embedded. To support this attribute, we need a lossless compression method so that the original data can be recovered. RLE is suitable, since it is simple, fast and lossless. The proposed scheme consists of three phases (the secret image compression phase, the MGEMD phase and the embedding phase), whose flowchart is shown in Figure 1.
Figure 1. Flowchart of the proposed scheme.

Secret Image Compression Phase
In the proposed scheme, data compression is used to decrease the size of the secret message, which effectively increases the embedding capacity. Compressing information before delivery also reduces the cost of transmission over a bandwidth-limited network. The proposed scheme is combined with MGEMD to further increase the embedding capacity; as a result, the multi-bit data hiding scheme for compressing secret messages achieves about twice the embedding capacity. The maximum run length and the total-length information produced by RLE are also treated as secret messages and hidden in the cover image.
Algorithm 1. The multi-bit data hiding scheme for compressing secret messages.
Input: A cover image and a gray-level secret image (S)
Output: A stego image (I′)
Step 1. Transform the gray-level secret image into a binary stream.
Step 2. Compress the binary stream with RLE to obtain the new secret message (S′), which includes the maximum run length, the total-length information and the secret messages to be embedded (s).
Step 3. Check whether the high-order (first) bit of the new secret message is zero or one. This tells the receiver whether the RLE sequence begins with zero or one, and it is recorded by setting the LSB of the first cover pixel to this bit. At the same time, count the zeros and ones to record the total-length information.
Step 4. Find parameters n and k such that $2^{nk+1} >$ Max(runs) and n ≥ k. The quotient (Q) and remainder (R) are calculated from the total-length information (L) using Equation (5).
Step 5. From the second pixel to the last, divide the pixels into non-overlapping groups of n adjacent pixels $(x_1, x_2, \ldots, x_n)$.
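Steps 3 and 4 can be sketched in Python as follows. Equation (5) is not reproduced here, so the quotient/remainder split of the total length is omitted, and the tie-breaking rule in the hypothetical helper `choose_parameters` (smallest n·k, then smallest k, with n ≥ 2 and an upper limit `max_n`) is our assumption; Algorithm 1 only states the two constraints.

```python
from __future__ import annotations

from itertools import groupby


def leading_bit_and_runs(bits: str) -> tuple[int, list[int]]:
    """Step 3: the first bit of the stream plus the length of every run."""
    return int(bits[0]), [len(list(run)) for _, run in groupby(bits)]


def choose_parameters(max_run: int, max_n: int = 8) -> tuple[int, int]:
    """Step 4 (hypothetical tie-break): the (n, k) with the smallest n*k such that
    2^(nk+1) > max_run and n >= k; we also assume n >= 2."""
    candidates = [
        (n * k, k, n)
        for n in range(2, max_n + 1)
        for k in range(1, n + 1)
        if 2 ** (n * k + 1) > max_run
    ]
    if not candidates:
        raise ValueError("max_run too large for max_n")
    _, k, n = min(candidates)
    return n, k


def set_flag_pixel(first_pixel: int, leading_bit: int) -> int:
    """Step 3: force the LSB of the first cover pixel to the leading bit."""
    return (first_pixel & ~1) | leading_bit


bit, runs = leading_bit_and_runs("00001011101")
print(bit, runs, max(runs))          # 0 [4, 1, 1, 3, 1, 1] 4
print(choose_parameters(max(runs)))  # (2, 1)
print(set_flag_pixel(155, 1))        # 155 (already odd, so unchanged)
```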
Example 1 (Step 3): The least significant bit of the first pixel is already equal to one, so the pixel is not modified, i.e., the first pixel remains 155.

Data Embedding
Before embedding, the secret message is transformed into a binary stream. The binary stream is then compressed with lossless RLE to reduce its size. Finally, the compressed binary stream is hidden with the MGEMD scheme. The complete procedure is given in Algorithm 1.
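The first step of this pipeline (gray levels to a binary stream, and back again at the receiver) can be sketched in Python as below. The most-significant-bit-first order is an assumption, since the bit order is not stated; counting runs on a toy input shows why smooth regions suit RLE.

```python
from __future__ import annotations

from itertools import groupby


def pixels_to_bits(pixels: list[int]) -> str:
    """Write each 8-bit gray level as eight binary digits, most significant bit first."""
    return "".join(format(p, "08b") for p in pixels)


def bits_to_pixels(bits: str) -> list[int]:
    """Inverse of pixels_to_bits: read the stream back eight bits at a time."""
    return [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]


secret = [0, 0, 255, 255, 155]        # toy "secret image" pixels
stream = pixels_to_bits(secret)
run_count = sum(1 for _ in groupby(stream))
print(len(stream), run_count)         # 40 6
assert bits_to_pixels(stream) == secret
```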

Speeding up the Modified Method
In this subsection, we describe the characteristics of MGEMD and then use them to speed up the embedding process. The MGEMD scheme divides the embedding computation into three cases: $D < 2^{nk}$, $D = 2^{nk}$ and $D > 2^{nk}$. To speed up embedding, we propose the embedding formulas shown in Tables 1 and 2 for $D > 2^{nk}$ and $D < 2^{nk}$, respectively. As examples, we assume $s_1 = 5429$ and $s_2 = 6643$ with k = 3 and n = 4, both embedded into the same cover pixels (10, 19, 5, 9), in Tables 1 and 2, respectively.
Table 1. Speeding up the embedding method when $D > 2^{nk}$.
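To see which case each example falls into, the Python sketch below computes the extraction value of Equation (3) and the difference D. We assume $D = (s - f) \bmod 2^{nk+1}$, as in the EMD family; the closed-form formulas of Tables 1 and 2 are not reproduced. Under that assumption, $s_1$ falls in the $D > 2^{nk}$ case and $s_2$ in the $D < 2^{nk}$ case.

```python
from __future__ import annotations


def mgemd_weights(n: int, k: int) -> list[int]:
    """c_1 = 1, c_i = 2^k * c_(i-1) + 1 (Equation (4))."""
    weights = [1]
    for _ in range(n - 1):
        weights.append((weights[-1] << k) + 1)
    return weights


def classify(secret: int, group: list[int], k: int) -> tuple[int, int, str]:
    """Return the extraction value f, the difference D and which case D falls in."""
    n = len(group)
    modulus = 1 << (n * k + 1)
    f = sum(c * g for c, g in zip(mgemd_weights(n, k), group)) % modulus
    d = (secret - f) % modulus        # assumed definition of D
    half = 1 << (n * k)               # 2^(nk)
    if d < half:
        case = "D < 2^nk"
    elif d == half:
        case = "D = 2^nk"
    else:
        case = "D > 2^nk"
    return f, d, case


for s in (5429, 6643):
    print(s, classify(s, [10, 19, 5, 9], k=3))
# 5429 (5811, 7810, 'D > 2^nk')
# 6643 (5811, 832, 'D < 2^nk')
```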

The Solution for the Overflow/Underflow Problems
Unfortunately, overflow/underflow problems may occur in the stego pixel values after applying MGEMD: a stego pixel value may exceed the maximum value 255 or fall below the minimum value 0 of a gray-level image. The proposed scheme therefore handles this problem as follows. In our scheme, k bits are embedded in each cover pixel, and a pixel value may change by up to $2^k - 1$. Pixels whose values lie in $[0, 2^k - 2]$ or $[255 - (2^k - 2), 255]$ are therefore at risk. To avoid the problem, a cover pixel whose value is between 0 and $2^k - 2$ is modified to $2^k - 1$; similarly, a cover pixel whose value is between $255 - (2^k - 2)$ and 255 is modified to $255 - (2^k - 1)$. This solves the overflow/underflow problems during the embedding phase.
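This rule amounts to a per-pixel pre-adjustment before embedding. A minimal Python sketch, assuming a pixel changes by at most $2^k - 1$ during embedding (the name `preadjust` is ours):

```python
def preadjust(pixel: int, k: int) -> int:
    """Move a cover pixel away from 0 and 255 before embedding k bits."""
    margin = (1 << k) - 1                 # largest assumed change, 2^k - 1
    if pixel <= margin - 1:               # 0 .. 2^k - 2 could underflow
        return margin                     # -> 2^k - 1
    if pixel >= 255 - (margin - 1):       # 255 - (2^k - 2) .. 255 could overflow
        return 255 - margin               # -> 255 - (2^k - 1)
    return pixel


print([preadjust(p, 3) for p in (0, 6, 7, 100, 248, 249, 255)])
# [7, 7, 7, 100, 248, 248, 248]
```

Because extraction reads only the stego pixels, the receiver does not need to undo this adjustment.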

Data Extracting Phase
In the data extraction process, some information must be agreed with the sender, such as the values of n and k. The stego image is then transmitted from the sender to the receiver, who recovers the secret message with the following steps.
Algorithm 2. Data extracting.
Input: A stego image (I′)
Output: The secret image (S)
Step 1. Check whether the LSB of the first pixel is zero or one; this tells whether the RLE sequence begins with zero or one.
Step 2. Divide the second pixel to the last pixel of the stego image into non-overlapping groups of n adjacent pixels $(x_1, x_2, \ldots, x_n)$, and compute the secret value t of each group by using Equation (3).
Step 3. Transform the decoded (RLE-expanded) binary stream into eight-bit gray-level values to reconstruct the secret image.
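Steps 1 and 2 of Algorithm 2 can be sketched in Python as follows, using the weights of Equation (4). Applied to the first four stego pixels of Example 2 with (n, k) = (3, 3), the flag is 1 and the first group (156, 156, 161) evaluates to t = 1 under Equation (3). How the t values are then split into the maximum run, total-length information and run lengths (Equation (5)) is not shown, and ignoring trailing pixels that do not fill a whole group is our simplification.

```python
from __future__ import annotations


def mgemd_weights(n: int, k: int) -> list[int]:
    """c_1 = 1, c_i = 2^k * c_(i-1) + 1 (Equation (4))."""
    weights = [1]
    for _ in range(n - 1):
        weights.append((weights[-1] << k) + 1)
    return weights


def extract(stego: list[int], n: int, k: int) -> tuple[int, list[int]]:
    """Steps 1-2 of Algorithm 2: leading-bit flag, then one value t per group."""
    leading_bit = stego[0] & 1                      # Step 1: LSB of the first pixel
    weights = mgemd_weights(n, k)
    modulus = 1 << (n * k + 1)
    values = []
    for start in range(1, len(stego) - n + 1, n):  # Step 2: groups from pixel 2 on
        group = stego[start:start + n]
        values.append(sum(c * g for c, g in zip(weights, group)) % modulus)
    return leading_bit, values


print(extract([155, 156, 156, 161], n=3, k=3))  # (1, [1])
```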
Example 2: Continuing Example 1, the receiver receives the stego image (155, 156, 156, 161, ...) and (n, k) = (3, 3) from the sender. After the data extraction process, the secret message is recovered and the secret image is reconstructed. By following these steps, the hidden secret image can be extracted from the stego image.
Example 2 (Step 2): From Step 1, we know that the RLE sequence begins with one.

Experimental Results
In this section, experimental results are given to verify the proposed scheme in terms of the increase in embedding capacity, the image quality of the stego images, the reduction in the rate of modified pixels, and image steganalysis. Eight common eight-bit gray-level images and eight binary images of size 512 × 512 are tested. The eight gray-level cover images are airplane, baboon, boat, Elaine, Gold Hill, Lena, peppers and Tiffany, as shown in Figure 2. Two gray-level secret images (bridge and pentagon) of size 100 × 100 are shown in Figure 3, and the eight binary images are shown in Figure 4. In the maximum embedding capacity test, we also use a pseudo-random number generator (PRNG) to generate random bits as the secret message. The embedding capacity (bpp) and image quality (PSNR: peak signal-to-noise ratio) are two important criteria for evaluating stego images in a data hiding system. The PSNR is defined as Equation (6):
$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} \ (\mathrm{dB}), \quad (6)$$
where the MSE (mean square error) is defined as Equation (7):
$$\mathrm{MSE} = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left[ C(i,j) - S(i,j) \right]^2. \quad (7)$$
where $M \times N$ is the image size, and C(i, j) and S(i, j) are the pixel values of the cover (original) image and the stego image, respectively. The eight stego images produced by the proposed method and by MGEMD are shown in Figures 5 and 6, respectively. We also compare the PSNR and the number of non-modified pixels of the proposed scheme and MGEMD; the results are shown in Tables 3 and 4 for embedding capacities of 80,000 bits and 262,144 bits, respectively. Furthermore, the proposed scheme is compared with [15,18] for binary image payloads of about 100,000, 200,000 and 400,000 bits, as shown in Table 5. The comparison tables show that the proposed scheme offers high capacity and good image quality compared with the other schemes. According to Figure 5, there are no human-perceivable differences between the cover images and the stego images produced by the proposed scheme. Table 3 shows that the proposed scheme achieves better image quality than the MGEMD scheme; in addition, it modifies fewer pixels than the MGEMD scheme. Note: k is the run length of the bitmap files by run-length (BRL) scheme.
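Equations (6) and (7) translate directly into code. A minimal Python sketch for 8-bit gray-level images stored as nested lists (a practical implementation would more likely use an array library):

```python
from __future__ import annotations

import math


def mse(cover: list[list[int]], stego: list[list[int]]) -> float:
    """Equation (7): mean squared pixel difference over an M x N image."""
    m, n = len(cover), len(cover[0])
    total = sum((cover[i][j] - stego[i][j]) ** 2 for i in range(m) for j in range(n))
    return total / (m * n)


def psnr(cover: list[list[int]], stego: list[list[int]]) -> float:
    """Equation (6): PSNR in dB for 8-bit images; infinite for identical images."""
    error = mse(cover, stego)
    return math.inf if error == 0 else 10 * math.log10(255 ** 2 / error)


cover = [[10, 20], [30, 40]]
stego = [[11, 20], [30, 39]]
print(round(psnr(cover, stego), 2))  # 51.14
```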

Maximum Embedding Capacity
In this subsection, all pixels of the cover image are used for embedding. In our simulations, the RLE compression ratio of binary images is better than that of gray-scale images, because binary images contain many long runs of zeros or ones. The test cover images are shown in Figure 2, and the secret messages were generated by a PRNG. For example, if the MGEMD scheme (with n = 3 and k = 3) is used with uncompressed secret messages on a 512 × 512 image, the number of groups is 87,381 (⌊512 × 512 ÷ 3⌋), so the fixed embedding capacity is 873,810 bits (87,381 × (nk + 1) = 87,381 × 10). However, according to our simulation results, the embedding capacity of the proposed scheme is 1,757,500 bits, i.e., about twice the capacity obtained without RLE. Furthermore, using the MGEMD method, we compare the embedding capacity with and without compressing the secret data; the result is shown in Figure 7.
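The capacity arithmetic in this example can be checked with a few lines of Python; the 1,757,500-bit figure is the simulation result quoted in the text, not something the snippet computes.

```python
width = height = 512
n, k = 3, 3

groups = (width * height) // n            # complete n-pixel groups
fixed_capacity = groups * (n * k + 1)     # MGEMD without compression, in bits
reported_rle_capacity = 1_757_500         # simulation result quoted for the proposed scheme

print(groups)                                            # 87381
print(fixed_capacity)                                    # 873810
print(round(reported_rle_capacity / fixed_capacity, 2))  # 2.01
```

Reserving the first pixel for the RLE flag does not change the count, since (262,144 − 1) ÷ 3 is also exactly 87,381.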

Image Steganalysis
In general, whether a stego image carries a secret cannot be judged by the human eye: when the stego image quality exceeds 33 dB, the human eye is unable to tell whether the image contains secrets. Therefore, an effective way to detect embedded secret messages is necessary. For image steganalysis, regular-singular steganalysis (RS steganalysis) is a common method for detecting stego images. To show that the proposed scheme has good security, we apply this detection method to the stego images generated by our scheme. In RS steganalysis, n adjacent pixels $(x_1, x_2, \ldots, x_n)$ are selected as a pixel group. Then, the discrimination function DF, defined as
$$DF(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n-1} \left| x_{i+1} - x_i \right|,$$
is applied to quantify the smoothness or regularity of each pixel group. The flipping functions of RS steganalysis are used to define three types of pixel groups: regular (R), singular (S) and unusable (U). The percentages of regular and singular groups under the masks M = [1 0 0 1] and −M = [−1 0 0 −1] are denoted $R_m$, $R_{-m}$, $S_m$ and $S_{-m}$. The statistical hypotheses of RS steganalysis are $R_m \cong R_{-m}$ and $S_m \cong S_{-m}$; a test image that satisfies them passes the steganalysis, i.e., the $R_m$ curve overlaps the $R_{-m}$ curve and the $S_m$ curve overlaps the $S_{-m}$ curve. Using the RS steganalysis method, we test LSB-based methods and our proposed method. The simulation results are shown in Figure 8. According to Figures 8a to 8d, $R_m \cong R_{-m}$ and $S_m \cong S_{-m}$ hold for our proposed scheme, whereas the 1-LSB, 2-LSB and 3-LSB methods reveal hidden information; here, 1-LSB, 2-LSB and 3-LSB mean that the secret message is hidden in the 1-LSB, 2-LSB and 3-LSB positions of each cover pixel, respectively. In addition, we evaluate our proposed method with the modern steganalysis tool SPAM (subtractive pixel adjacency matrix) [19]. For the SPAM test, we use 500 images from [20]: 250 cover images and 250 stego images are used for training. The test results for minimizing the power of the optimal detector (MiPOD) [21] and our proposed scheme are shown in Figure 9. The MiPOD scheme was proposed by Sedighi et al. in 2015. Its major contribution is that each pixel value is changed by at most ±1 when a secret message is embedded into the cover image under a Gaussian cover model [22,23]; Sedighi et al. also considered a novel detectability-limited sender and estimated the secure payload of each cover image.
In Figure 9, the vertical axis represents the error rate obtained by the SPAM method, and the horizontal axis represents the embedding rate (bpp), which ranges from 0.1 bpp to 6.7 bpp. Figure 9 shows that the error rate of the proposed method is better than that of the MiPOD scheme when the embedding rate exceeds 0.6 bpp, and that the proposed method also reaches higher embedding rates. Therefore, adapting the secret data embedding rate (as the MiPOD scheme does) could further reduce the probability of detection. Furthermore, the spatial rich model (SRM) [24] or maxSRM [25] could be used in future work to analyze the security of the proposed method against steganalysis.

Conclusions
A new data hiding technology is proposed in this paper, which combines the multi-bit data hiding scheme for compressing secret messages with a faster MGEMD embedding operation. The simulation results show that the proposed scheme increases the embedding capacity because the RLE compression ratio is high, although the ratio depends on the source data: binary images compress better than gray-scale images. Moreover, for the same secret message length, the proposed scheme produces better stego image quality than the MGEMD method and modifies fewer pixels. Additionally, according to our steganalysis and performance discussions, our scheme provides a higher embedding capacity than previous approaches, and it can resist visual attacks, RS steganalysis and SPAM.

Figure 8.

Figure 9. The simulation error rate between minimizing the power of the optimal detector (MiPOD) and the proposed scheme for subtractive pixel adjacency matrix (SPAM).

Table 3. Comparison of the PSNR and non-modified pixels.

Table 4. PSNR and non-modified pixels of binary image hiding.

Table 5. Comparison of the PSNR values for hiding binary images with different payloads.