Coverless Image Steganography Based on Generative Adversarial Network

Abstract: Traditional image steganography needs to modify the cover image to embed secret messages. However, the distortion of the cover image can easily be detected by steganalysis tools, which leads to the leakage of the secret message. Coverless steganography, which has the advantage of hiding secret messages without modification, has therefore become a topic of research in recent years. However, current coverless steganography still suffers from problems such as low capacity and poor image quality. To solve these problems, we use a generative adversarial network (GAN), an effective deep learning framework, to encode secret messages into the cover image and optimize the quality of the steganographic image through adversarial training. Experiments show that our model not only achieves a payload of 2.36 bits per pixel, but also successfully escapes the detection of steganalysis tools.


Introduction
Since the invention of the Internet, technology has developed rapidly. The emergence of multimedia information such as images, audio and video has brought convenience to society [1], but it has also resulted in the illegal wiretapping, interception, tampering or destruction of important and sensitive information related to politics, the military, finance and business, bringing huge losses to society. Therefore, information hiding technology has emerged [2,3]. With the development of this technology, the corresponding steganographic detection technology has also evolved. Traditional approaches, which introduce artifacts into the cover image, tend to be easily detected by automated steganalysis tools and, in extreme cases, by human eyes, which poses a challenge for information hiding.
To solve this problem, researchers proposed a new information hiding method, coverless steganography, in 2015. Compared with traditional approaches, which need to adopt a specified cover image for embedding the secret data, such as Highly Undetectable SteGO (HUGO) and JPEG compression [4][5][6][7], coverless steganography no longer modifies the cover images, which is why it is called coverless. It is achieved by mapping images to secret information. Even if the image is intercepted, it is hard to detect the presence of a message. Therefore, coverless steganography can naturally resist steganalysis tools. At present, existing coverless steganography is divided into two categories according to the steganographic principle: mapping-based [8,9] and synthesis-based methods [10]. Coverless image steganography based on mapping rules was first proposed by Zhou [11]. Each image represented an 8-bit sequence: the image was divided into nine blocks, and the feature sequence was calculated from the relationships between the mean pixel values of adjacent blocks. Zheng et al. [12] proposed an image steganography algorithm based on the scale-invariant feature transform (SIFT). Unlike Zhou, Zheng used feature sequences generated from SIFT features, which enhanced the robustness of the system. Recently, Zhou et al. [13] proposed a method based on SIFT and Bag-of-Features (BOF). Compared with Reference [11], this method better resists rotation, scaling, brightness changes and other attacks, but its ability to resist translation, filtering and shearing is still limited.
The instance-based texture synthesis algorithm, a hotspot of current texture synthesis research, synthesizes new texture images by resampling the original images. The new texture image can be of any size, and its local appearance is similar to the original image. Otori et al. [14,15] pioneered a steganographic algorithm based on pixel-based texture synthesis. First, they encoded the secret information into a colored dot pattern, and then automatically painted a pattern from the sample image onto the texture image, masking the dots' existence within the natural texture. Wu et al. [16] proposed an image steganography algorithm based on patch-based texture synthesis. An overlap area is generated during the synthesis process, and candidate blocks are ranked by the mean square error between the overlap area and each candidate block. Finally, the candidate block whose rank matches the secret information sequence number is synthesized into the overlapping area to hide the secret information. However, the method's hiding ability drops as the amount of information to hide grows. Inspired by the marble-deformation texture synthesis algorithm, Xu et al. [17] proposed a reconfigurable image steganography algorithm based on texture deformation. To hide the secret information, the secret image is reversibly twisted to synthesize different marble textures, but the robustness of the algorithm is limited.
Coverless information hiding is still a relatively new field. Compared with other information hiding technologies, its theoretical research and technical maturity still lag behind, and problems such as low hiding capacity and efficiency remain. With the advent of deep learning [18][19][20], new image steganography approaches are emerging [21][22][23][24]. The first set of deep learning approaches to steganography was from Baluja [22]. They used neural networks to combine a cover image and a secret message into a steganographic image, but their images showed strong spatial correlations, which convolutional neural network (CNN) training exploits to hide images within images. As a result, a model trained in this way cannot be applied to arbitrary data. The emergence of generative adversarial networks (GANs) [25] has provided new approaches to achieving image steganography.
We propose a novel approach which uses a CNN and a GAN to achieve coverless steganography. Our work makes the following contributions: (1) We propose a method of using a GAN to complete steganography tasks, whose relative payload is 2.36 bits per pixel. (2) We propose a measurement method to evaluate the image quality of deep-learning-based steganography algorithms, which allows direct comparison with traditional methods.
The rest of the paper is organized as follows: Section 2 briefly describes image steganography based on GAN. We elaborate on the details of our method in Section 3. Finally, Section 4 contains our experimental results, followed by conclusions in Section 5.

Image Steganography Based on GAN
At present, GAN has been applied to image steganography as follows. Volkhonskiy et al. [26] first proposed the Steganographic GAN (SGAN). SGAN adopted a deep GAN [27], which accounted for not only the authenticity of the generated images but also their resistance to detection. Based on SGAN, Shi [28] proposed SSGAN. The model structure of SSGAN was similar to that of SGAN, but Wasserstein GAN [29] was adopted as the network structure, which gave a faster training speed and higher image quality. The above two networks used a GAN to generate cover images, while the Hayes GAN model proposed by Hayes et al. [21] used adversarial learning to directly generate stego images. Zhu et al. [23] put forward another method of hiding data with deep networks by referring to Hayes GAN's structure. It is characterized by the robustness of the adversarial sample to image changes, so that the embedded information can be extracted with high accuracy under various cover attacks (Gaussian blur, pixel loss, cropping and JPEG compression). Tang et al. [30] proposed an adaptive steganographic distortion learning framework (ASDL) to learn the embedding cost. After several rounds of adversarial learning, the security of ASDL-GAN was continuously enhanced, but it has not surpassed traditional steganographic algorithms represented by S-UNIWARD [31]. Atique et al. [32] proposed another model based on an encoder-decoder to accomplish the same steganographic task; their secret images are grayscale, but the model suffered from problems such as color distortion and poor security of the secret images. Hayes et al. [21] and Zhu et al. [23] also made use of GANs: they used the mean squared error (MSE) for the encoder, the cross-entropy loss for the discriminator, and the mean squared error for the decoder, but their capacity was limited to 0.4 bits per pixel.
Zhang [33] proposed a method for hiding arbitrary binary data in images using a GAN, but the experimental results were not as ideal as designed. We are therefore inspired by the works of Baluja and Zhang, and aim to improve on their shortcomings.

Method
In general, steganography requires only two operations, encoding and decoding. Our model consists of three modules: (1) an Encoder network ε, which receives a cover image and a string of binary secret bits and generates a steganographic image; (2) a Decoder network G, which takes a steganographic image and attempts to recover the secret message; (3) a Discriminator network D, which is used to evaluate the quality of the steganographic images S.
So, the architecture of our model is shown in Figure 1.

Encoder Network
Firstly, we input the cover image C of size 3 × W × H and the secret information M ∈ {0, 1}^(Depth × W × H) into the Encoder network ε. M is a binary data tensor of shape Depth × W × H, where Depth is the number of bits that we try to hide in each pixel of the cover image and W × H is the size of the cover images. The encoded images should look visually similar to the cover images. The Encoder network ε performs two steps: (1) use a convolutional block Conv to process the cover image C to get a tensor a of size 32 × W × H; (2) concatenate the message M with a to form a tensor b, and then process b with a convolutional block Conv.
We then built two encoder models: (i) Basic model: we apply two convolution blocks Conv to tensor b successively to generate the steganographic image S. (ii) Dense model: we use skip connections [18] to map the features f generated by a former Dense Block to the features l generated by the latter Dense Block, as shown in Figure 1. We assume that using skip connections can improve the embedding rate.
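The two encoder steps, processing the cover and then concatenating the message channels, can be sketched in PyTorch as below. This is a minimal illustration, not the paper's exact architecture: the kernel sizes, activations, and output Tanh are our assumptions; only the channel count 32 and the concatenation step come from the text.

```python
import torch
import torch.nn as nn

class BasicEncoder(nn.Module):
    """Hypothetical sketch of the Basic model: process the cover image,
    concatenate the Depth-channel message, then apply further conv blocks."""
    def __init__(self, depth: int):
        super().__init__()
        # step (1): cover image C -> tensor a of size 32 x W x H
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.LeakyReLU())
        # step (2): concatenate message M with a, then conv blocks -> stego image S
        self.hide = nn.Sequential(
            nn.Conv2d(32 + depth, 32, kernel_size=3, padding=1), nn.LeakyReLU(),
            nn.Conv2d(32, 3, kernel_size=3, padding=1), nn.Tanh())

    def forward(self, cover: torch.Tensor, message: torch.Tensor) -> torch.Tensor:
        a = self.features(cover)            # (N, 32, W, H)
        b = torch.cat([a, message], dim=1)  # secret bits join as extra channels
        return self.hide(b)                 # steganographic image S, same size as C
```

The Dense model would differ only in that each later block also receives the feature maps of earlier blocks via concatenation (skip connections).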

Decoder Network
The Decoder network G takes the steganographic image S generated by the Encoder network ε and produces M′ = G(S), attempting to recover the secret information tensor M; residual bit errors are corrected with Reed-Solomon codes.

Discriminator Network
In order to provide feedback on the performance of the encoder ε and generate more realistic images, we introduced a discriminator network D, which can differentiate stego images S from cover images C.
XuNet is an image steganalyzer designed by Xu based on a CNN. To improve statistical modeling, it embeds an absolute-value activation (ABS) in the first convolutional layer, applies the TanH activation function in the shallow layers of the network to prevent overfitting, and adds batch normalization (BN) before each nonlinear activation layer. This well-designed CNN provides excellent detection performance in steganalysis; to our knowledge, it is the best-performing data-driven CNN steganalyzer for JPEG images. Therefore, we designed our steganalyzer based on XuNet and adjusted it to fit our models, as shown in Figure 2. The discriminator network D consists of five convolution blocks, an SPP block, and two fully connected layers with a scalar output. To produce scalar scores, we use adaptive mean pooling on the output of the convolution layers. In addition, we use a spatial pyramid pooling (SPP) module to replace the global average pooling layer. The SPP module [34] and its variants play a huge role in object detection and semantic segmentation models. It breaks through the limitation of fully connected layers, so that images of any size can be fed to the subsequent fully connected layers. At the same time, the SPP module extracts more features from different receptive fields, thereby improving performance. The detailed architecture of our steganalyzer is shown in Table 1.
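The key property of SPP, a fixed-length output from inputs of any size, can be illustrated with average pooling over a small pyramid of grids. The grid sizes (1, 2, 4) below are an assumption for illustration; the paper does not list its pyramid levels.

```python
import numpy as np

def spp(feature_map: np.ndarray, levels=(1, 2, 4)) -> np.ndarray:
    """Spatial pyramid pooling sketch: average-pool a (C, H, W) feature map
    over each g x g grid and concatenate, yielding a fixed-length vector."""
    c, h, w = feature_map.shape
    pooled = []
    for g in levels:
        for i in range(g):
            for j in range(g):
                cell = feature_map[:, i*h//g:(i+1)*h//g, j*w//g:(j+1)*w//g]
                pooled.append(cell.mean(axis=(1, 2)))  # (C,) per grid cell
    return np.concatenate(pooled)  # length C * (1 + 4 + 16) = 21C
```

Because the output length depends only on C and the pyramid levels, feature maps of different spatial sizes can all feed the same fully connected layers.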

The Objective Function
Let c denote one of the cover images C, which follow the probability distribution P. After a secret message M is embedded, the generated steganographic images S follow the probability distribution Q. The statistical detectability can be quantified by the KL divergence in formula (7) or the JS divergence in formula (8):

KL(P‖Q) = Σ_c P(c) log (P(c) / Q(c)), (7)

JS(P‖Q) = (1/2) KL(P‖(P + Q)/2) + (1/2) KL(Q‖(P + Q)/2). (8)

The KL divergence and the JS divergence are very basic quantities, which underpin optimal probabilistic steganalysis. The original GAN's goal is to minimize the JS divergence or the KL divergence [35]. GAN avoids the Markov chain learning mechanism, which distinguishes it from traditional probability generative models: those models generally require Markov chain sampling and design, a process with particularly high computational complexity, whereas GAN samples directly, improving efficiency and making its practical application scenarios more extensive. The Encoder network ε with noise z tries to generate images which are similar to the cover images C. The Discriminator network D receives the generated images and judges whether they are real examples or fake samples. The Discriminator network D and the Encoder network ε play the minimax game with the cost function (9):

min_ε max_D E_{c∼P}[log D(c)] + E_z[log(1 − D(ε(z)))]. (9)

D is trained to maximize the probability of assigning the correct label to both training examples and samples from ε. Therefore, GAN can be used to solve the problem of steganography.
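For discrete distributions, the two divergences can be computed directly; a small numpy sketch (the smoothing constant eps is our addition to avoid taking the log of zero):

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL(P || Q) for discrete probability vectors, as in formula (7)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def js(p, q):
    """JS(P || Q) = 1/2 KL(P || M) + 1/2 KL(Q || M) with M = (P + Q)/2,
    as in formula (8). Symmetric, and zero iff P = Q."""
    m = 0.5 * (np.asarray(p, float) + np.asarray(q, float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

A perfectly undetectable stego system has Q = P, so JS(P‖Q) = 0; the discriminator pushes the encoder toward this regime.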

Encoder-Decoder Loss
In order to optimize the encoder-decoder network, this section jointly optimizes three loss functions, as shown in Algorithm 1.
(1) The cross-entropy loss L_G is used to evaluate the decoding accuracy of the decoder network. (2) The mean square error L_s is used to measure the similarity between the steganographic image and the cover image, where W is the width and H is the height of the image. (3) The realness L_r of the steganographic image is evaluated using the discriminator. The training objective is to minimize the three losses jointly.
Algorithm 1. Training the encoder-decoder network.
1.  while val_G < threshold_G do
2.      Update ε and G using L_G + L_s + L_r.
3.      for n training epochs do
4.          if val_G < threshold_G then
5.              Update ε using L_s, G using L_G
6.          else if val_D < threshold_D then
7.          else
8.              Update ε using L_s + L_r, G using L_G
9.          Get val_G ← CrossEntropy of G
10.         Get val_D ← Cross-validation accuracy of D
11.         end if
12.     end for
13. done
14. return val_G
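The three terms combined in Algorithm 1 can be sketched numerically as below. This is a plain numpy illustration with equal weighting, which is our assumption; during training the losses are of course computed on network tensors, and the realness term comes from the discriminator D.

```python
import numpy as np

def decoding_loss(m, m_pred, eps=1e-12):
    """Cross entropy between true bits m and predicted bit probabilities (L_G)."""
    m = np.asarray(m, float)
    m_pred = np.clip(np.asarray(m_pred, float), eps, 1 - eps)
    return float(-np.mean(m * np.log(m_pred) + (1 - m) * np.log(1 - m_pred)))

def similarity_loss(cover, stego):
    """Mean square error between cover and steganographic image (L_s)."""
    cover, stego = np.asarray(cover, float), np.asarray(stego, float)
    return float(np.mean((cover - stego) ** 2))

def joint_loss(m, m_pred, cover, stego, realness):
    # L_G + L_s + L_r, where realness (L_r) is supplied by the discriminator
    return decoding_loss(m, m_pred) + similarity_loss(cover, stego) + realness
```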

Structural Similarity Index
Baluja [22] used the mean square error (MSE) between the pixels of the cover image and those of the generated image as the loss function. However, MSE only penalizes large errors in the corresponding pixels of the two images and ignores the underlying structure of the images. The human visual system (HVS) is more sensitive to changes of brightness and color in textureless areas, so our steganography GAN introduces the structural similarity index (SSIM) and its variant MS-SSIM [36] into the loss function.
The SSIM index compares similarity from three aspects: brightness δ, contrast γ and structure ρ. The similarity of two images is measured by formulas (14)-(16) respectively, where µ_x and µ_y are the pixel means of images x and y, θ_x and θ_y are the pixel standard deviations of images x and y, and θ_xy is the covariance of images x and y. In addition, c_1, c_2 and c_3 are three constants that prevent the denominators from going to zero and making the formulas meaningless:

δ(x, y) = (2 µ_x µ_y + c_1) / (µ_x² + µ_y² + c_1), (14)

γ(x, y) = (2 θ_x θ_y + c_2) / (θ_x² + θ_y² + c_2), (15)

ρ(x, y) = (θ_xy + c_3) / (θ_x θ_y + c_3). (16)

The general calculation of SSIM is shown in (17), where l > 0, m > 0 and n > 0 are the parameters used to adjust the relative importance of the three components:

SSIM(x, y) = δ(x, y)^l · γ(x, y)^m · ρ(x, y)^n. (17)

The value range of the SSIM index is [0, 1]: the higher the index, the more similar the two images. Steganography GAN therefore uses 1 − SSIM(x, y) as the loss function to measure the difference between two images. MS-SSIM is an enhanced variant of the SSIM index and is also introduced into the loss function.
Considering differences in both pixel values and structure, we join MSE, SSIM and MS-SSIM together in the mixed loss function L_D, where c represents the cover image, c′ is the steganographic image, M is the secret message and M′ is extracted from the steganographic image. α and β are hyperparameters that trade off the quality of the steganographic images against the cover images; we set α and β of the loss function to 0.5 and 0.3, respectively.
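For reference, a single-window version of the SSIM index can be computed over a whole image as below. The whole-image window, the exponents l = m = n = 1, and the common default constants are our simplifications; production implementations use sliding windows.

```python
import numpy as np

def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Whole-image SSIM with l = m = n = 1; choosing c3 = c2 / 2 folds the
    structure term into the contrast term, giving the usual two-factor form."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

In the loss function, 1 − ssim_global(c, c′) then penalizes structural differences between the cover and steganographic images.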

Experimental Results and Analysis
In this section, we will introduce our experiment details and results.

Evaluation Metrics
We take capacity, distortion, and secrecy into account. In this section, we will evaluate the performance of our model with the RS-BPP, PSNR and MS-SSIM.

Reed Solomon Bits Per Pixel
In the experiments, we adopt Reed-Solomon codes to accurately estimate the relative payload of our model. We call this metric the Reed-Solomon bits-per-pixel (RS-BPP), and note that it can be directly compared with traditional steganographic techniques because it represents the number of bits that can be reliably transmitted in an image, divided by the size of the image.
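One plausible way to turn decoding accuracy into this metric, which is our reading rather than a formula stated in the paper, follows from the Reed-Solomon error-correction bound: correcting errors at rate e costs roughly a 2e fraction of the payload in parity, leaving a (2a − 1) fraction for data at accuracy a.

```python
def rs_bpp(depth: int, accuracy: float) -> float:
    """Reliable bits per pixel after Reed-Solomon error correction.

    Correcting t symbol errors needs about 2t parity symbols, so at bit
    accuracy a only a (2a - 1) fraction of the Depth raw bits per pixel
    carries actual data.
    """
    return depth * (2 * accuracy - 1)
```

At accuracy 0.5 (random guessing) the reliable payload is zero, as expected.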

Peak Signal-to-Noise Ratio
Peak signal-to-noise ratio (PSNR) is a commonly used image quality metric whose purpose is to measure the distortion of an image; it has been shown to correlate with the average opinion scores of human experts [37].
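The standard definition, for 8-bit images with peak value 255, is 10 log10(peak² / MSE):

```python
import numpy as np

def psnr(cover, stego, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    err = np.mean((np.asarray(cover, float) - np.asarray(stego, float)) ** 2)
    if err == 0:
        return float("inf")  # identical images: no distortion at all
    return float(10 * np.log10(peak ** 2 / err))
```

Higher values indicate less distortion of the steganographic image relative to the cover.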

Training
In each iteration, we match each cover image C with a data tensor M consisting of a randomly generated sequence of Depth × W × H bits. This sequence is sampled from a Bernoulli distribution, M ∼ Ber(0.5). In addition, we use standard data augmentation in preprocessing, including horizontal flipping and random cropping of the cover image C. We use the Adam optimizer with a learning rate of 1e-4, clip the gradient norm to 0.25, clip the weights of the discriminator to [−0.1, 0.1], and train for 32 epochs.
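The payload sampling step can be sketched as follows (the fixed seed is ours, added for reproducibility):

```python
import numpy as np

def sample_message(depth: int, w: int, h: int, seed: int = 0) -> np.ndarray:
    """Draw a Depth x W x H tensor of secret bits, each ~ Ber(0.5)."""
    rng = np.random.default_rng(seed)
    return (rng.random((depth, w, h)) < 0.5).astype(np.uint8)
```

Because every bit is fair-coin random, roughly half the bits are ones, which is the hardest (maximum-entropy) payload for the encoder to hide.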
The experiments are conducted with the Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz, 64.00 GB RAM and one NVIDIA GeForce GTX 1080 Ti GPU.

Experimental Results
In our experiment, we used the Div2k dataset (https://data.vision.ee.ethz.ch/cvl/DIV2K) to train and evaluate our model with six different data depths, Depth ∈ {1, 2, . . . , 6}. We used 786 pictures for training and 100 pictures for validation. The data depth is the number of bits hidden per pixel, i.e., the randomly generated data tensor has shape Depth × W × H. The mean values of extraction accuracy, RS-BPP, PSNR and MS-SSIM on the test set are recorded in Tables 2-4. We randomly selected cover images from the Div2k dataset to generate the samples (b) and (d). As we can see in Figure 3, steganography GAN is an efficient method which generates images highly similar to the cover images (a) and (c). Tables 2 and 3 show the image quality measurements and the relative payload of the Basic and Dense models on the Div2k dataset. In all the experiments, our model shows the best performance on almost all indicators compared with Zhang's [33]; the Basic model in particular performs significantly better. Table 4 shows the extraction accuracy of the Decoder network when recovering secret information. Our Dense model is close to Zhang's, but the Basic model performs better.

Discussion and Conclusions
In this study, a GAN is used to synthesize the secret information and the cover image, so that the secret information is embedded at arbitrary positions of the composite image. On this basis, a performance metric for deep-learning-based steganography systems is proposed, which allows direct comparison with traditional steganography algorithms. Our models adopt different convolution methods, and the experimental results prove that our models have a high payload; in particular, the cover image is not modified in the process of hiding and extracting secret information, thus ensuring the security of the secret information. In future work, we will consider how to combine the GAN with relevance feedback, compensating for the lack of user intervention, to select cover images and increase the user's overall quality of experience. Steps for grouping relevant items together to make the system more efficient will also be investigated.

Conflicts of Interest:
The authors declare no conflict of interest.