Lossless Image Steganography Based on Invertible Neural Networks

Image steganography is a scheme that hides secret information in a cover image without being perceived. Most of the existing steganography methods are more concerned about the visual similarity between the stego image and the cover image, and they ignore the recovery accuracy of secret information. In this paper, the steganography method based on invertible neural networks is proposed, which can generate stego images with high invisibility and security and can achieve lossless recovery for secret information. In addition, this paper introduces a mapping module that can compress information actually embedded to improve the quality of the stego image and its antidetection ability. In order to restore message and prevent loss, the secret information is converted into a binary sequence and then embedded in the cover image through the forward operation of the invertible neural networks. This information will then be recovered from the stego image through the inverse operation of the invertible neural networks. Experimental results show that the proposed method in this paper has achieved competitive results in the visual quality and safety of stego images and achieved 100% accuracy in information extraction.


Introduction
In recent years, steganography has become the focus of information security technology. With the popularization of the Internet and the development of multimedia technology, media such as books, audio, video, and images have been continuously copied and rapidly disseminated, enriching people's lives. However, this is followed by increasingly rampant information eavesdropping, infringements, and piracy activities, which seriously damage the legitimate rights and interests of creators and cause people to worry about information security issues. In order to solve these security problems in the network environment, scientists first use encryption technology to encrypt the information to be transmitted and then send the encrypted information out. The receiver receives the encrypted data and decrypts the data through the decryption algorithm designed by the sender, thereby restoring the original communication data. This method makes it impossible for eavesdroppers to understand the eavesdropped information, thereby improving the security of network information transmission. However, encryption technology turns the original information into meaningless messy information, which attracts the attention of attackers. In order to solve this problem, information-hiding technology has been proposed and has attracted widespread attention from researchers.
Steganography is a scheme that makes it impossible for anyone other than the intended recipient to know the event of the information transmission or the content of the message by embedding secret information into the cover image. Unlike cryptography, which makes information unreadable through different transformations of the text, steganography also hides the existence of information [1]. Image steganography embeds secret information into the cover image without destroying the original image, making the stego image Entropy 2022, 24, 1762 3 of 14 prevent information loss, they retain details of the input data. Ideally, the extraction of secret information and the recovery of cover images will be completely lossless without taking into account the rounding error when saving stego images. However, the rounding error of saving the image cannot be avoided, and there will be some differences between the original information and the secret information extracted by the reverse calculation when using the invertible neural networks. This means that the lossless extraction cannot be realized. Therefore, we propose converting the secret information into binary sequence and then embed it into cover image and use the appropriate threshold to recover the secret information while extracting and to prevent significant loss. In addition, we design a mapping module to increase the hidden capacity while ensuring 100% recovery for the message.

Steganography
According to the working domain and the application scope of the steganography algorithm, the traditional image steganography can be divided into three categories: steganography algorithm based on spatial domain, steganography algorithm based on transformation domain, and adaptive steganography algorithm. The most commonly used algorithm in the spatial domain is the least significant bit algorithm [32,33], which embeds the binary sequence into the lowest k significant bits of cover image. However, this method can easily produce artifacts in stego images and can be judged as stego images by steganalysis algorithm [34]. In order to solve the problem of poor robustness and low security for spatial domain steganography, researchers consider embedding information in the transformation domain. Discrete cosine transforms [8,35,36] and wavelet transforms [6,32,37,38] are typically used to convert images from the spatial domain to the transformation domain and then convert the image back into the spatial domain after embedding the message. The information-hiding method of transform domain is improved in terms of robustness and antidetection ability compared with the spatial domain. The adaptive steganography algorithm [39][40][41][42] selectively embeds secret information into areas with more-complex textures and more-obvious edges by analyzing the texture characteristics of the cover image, which ensures a good visual effect for the stego image and improves the ability of resisting detection.
Recently, with the vigorous development of deep learning, some researchers have applied deep-learning networks to image steganography. Among them, Rehman [43] proposed to train the end-to-end network for image steganography and designed an encoder and a decoder to generate stego images and to extract secret information. Compared with the traditional steganography method, Rehman [43] implemented embedding gray-scale images into color images of the same size. As a result, the hiding capacity was greatly improved, but the resulting stego images experienced obvious color distortion. Hayes [44] then proposed adding a discriminator to the end-to-end network to distinguish whether an image was a stego image or a cover image and to improve the performance of the generator and discriminator through adversarial training. On the basis of previous research, Tang [45] proposed a novel operation, which could deceive analyzers with convolutional neural network structures while embedding secret information, achieving better security performance. Yang [46] applied the U-Net network architecture in the generator and combined it with the generation adversarial networks, achieving a more advanced effect. Liao [47] utilized the correlation of RGB channels to adaptively allocate the embedded capacity of each channel, achieving better performance in resisting modern color image steganalysis. Hu [30] pointed out that embedded-based image steganography inevitably left traces of modification that would be detected by increasingly advanced steganalysis algorithms. Therefore, Hu [30] proposed a nonembedded steganography that did not need to modify the cover image to improve the antianalysis ability of stego images. In addition, in order to obtain greater hidden capacity, Baluja [17] first proposed embedding a color image entirely into another color image of the same size by using deep neural networks and then proposed to improve the network structure to enhance the security performance of stego images [18]. The above neural network-based methods usually have a good antidetection ability, but few methods can achieve lossless extraction for secret information.

Invertible Neural Networks
In recent years, invertible neural networks had attracted the attention of many researchers because of their excellent performance. The invertible neural network maps a high-dimensional complex distribution Px to a simple latent distribution Pz through a series of reversible transformations and utilizes the neural network to learn the mapping relationship between distributions Px and Pz. The forward process of the framework takes the high-dimensional complex data x as the input and outputs data z that conforms to a simple distribution. The reverse process is the generation process of a generative model, taking the sampled data z as input to generate high-dimensional complex data x. Dinh [48] was the first to propose invertible neural networks, and the basic architecture of which was the affine coupling layer popularized by the RealNVP [49] model. In addition, Dinh [49] introduced convolutional layers in the coupled model to suit image processing tasks. On the basis of invertible neural networks, Kingma [50] proposed Glow, a simple type of generative stream using an invertible 1 × 1 convolution and was capable of efficiently synthesizing large and subjectively realistic images. The basic working principle of invertible neural networks is to divide the input data into two parts, which are converted by arbitrarily function and are coupled in an alternating manner. However, the arbitrary function does not need to be reversible by itself. A deeply invertible network consists of a series of the above building blocks. Invertible neural networks have three characteristics: the map from input to output is bi-beamed, the size of the network of input and output must be consistent, and the Jacobian of the network is not equal to 0.
There had been several studies that applied invertible neural networks to image steganography. Xu [51] proposed a learning-based approach that effectively improved the robustness of stego images while maintaining their imperceptibility and large capacity. Jing [52] applied invertible neural networks combined with a wavelet transform to the hiding of a single color image, which had greatly improved the visual effect and extraction accuracy compared with the traditional method of embedding and extraction using two networks, separately. Lu [53] proposed hiding multiple color images in a color image of the same size, which greatly increased the hidden capacity of image steganography. Guan [54] added an importance map module on the basis of invertible neural networks, which guided the hiding of the next image with the hidden results of the current image, avoiding the problems of contour shading and color distortion caused by multi-image hiding. This also improved the invisibility of secret information. The above methods are all more relevant to the hidden capacity of the image rather than lossless extraction of secret information. In addition, invertible neural networks have a wide range of applications in the areas of image compression [55], image rescaling [56,57], image-to-image translation [58], and video superresolution [59].

Overview
The purpose of the steganography method proposed in this paper is to generate highquality stego images and extract embedded secret information without loss. As is shown in Figure 1, the method of this paper converts the secret information into binary sequence and then uses invertible neural networks to hide the binary sequence in the cover image, generating the stego image and the key file z. The inverse transformation takes the rounded image stego' and key file z as input and extracts secret_rev and cover_rev through the reverse operation of the invertible neural networks. Because of the inevitable rounding error when the stego is saved, the secret information cannot be accurately restored after the reverse operation. This means that the extracted secret_rev is not a binary sequence containing 0 and 1. To solve this problem, we set a threshold t that maps the value greater than t in the secret_rev to 1 and the value less than t to 0, thus obtaining the secret information consistent with the message. In addition, we add a mapping module map, which increases the hidden capacity of the method from 1 bpp to 3 bpp by compressing the secret information.
rounded image stego' and key file z as input and extracts secret_rev and cover_rev through the reverse operation of the invertible neural networks. Because of the inevitable rounding error when the stego is saved, the secret information cannot be accurately restored after the reverse operation. This means that the extracted secret_rev is not a binary sequence containing 0 and 1. To solve this problem, we set a threshold t that maps the value greater than t in the secret_rev to 1 and the value less than t to 0, thus obtaining the secret information consistent with the message. In addition, we add a mapping module map, which increases the hidden capacity of the method from 1 bpp to 3 bpp by compressing the secret information.  Figure 1 shows the overall network architecture proposed in this paper. Unlike many of the methods that currently exist, the embedding and extraction process proposed in this paper uses the same network, which is INNs (invertible neural networks) proposed by Dinh [48]. On the one hand, this can avoid the laborious process of having to train the two networks of encoder and decoder, thus reducing the workload of training the model. On the other hand, the INN mentioned in this paper essentially uses the neural network to scale the value of the secret information and then adds the corresponding position to the cover image. This ensures the natural similarity between the stego image and the cover image.

Network Architecture
The η, ρ, and φ in reversible blocks can be arbitrarily complex functions. In this paper, we use three widely used dense blocks [60] to represent them. Different from the residual network that establishes a skip connection between the front layer and the back layer, the dense blocks establish a dense connection between all the previous layers and the back  Figure 1 shows the overall network architecture proposed in this paper. Unlike many of the methods that currently exist, the embedding and extraction process proposed in this paper uses the same network, which is INNs (invertible neural networks) proposed by Dinh [48]. On the one hand, this can avoid the laborious process of having to train the two networks of encoder and decoder, thus reducing the workload of training the model. On the other hand, the INN mentioned in this paper essentially uses the neural network to scale the value of the secret information and then adds the corresponding position to the cover image. This ensures the natural similarity between the stego image and the cover image.

Network Architecture
The η, ρ, and ϕ in reversible blocks can be arbitrarily complex functions. In this paper, we use three widely used dense blocks [60] to represent them. Different from the residual network that establishes a skip connection between the front layer and the back layer, the dense blocks establish a dense connection between all the previous layers and the back layer, and the feature reuse is achieved through the connection of features on channel. The specific structure is shown in Figure 2. The mapping module is designed to improve extraction accuracy. If the sum of all the values of the binary information is greater than 1/2 of its length, the mapping module will interchange the values of 0 and 1, thereby reducing the amount of information actually embedded and increasing the hidden capacity of the method proposed by this paper. Correspondingly, when the secret information is extracted, the threshold t remains unchanged, the value greater than t in the secret_rev is mapped to 0, and the value less than t is mapped to 1. In the experiment, we find that when the capacity increases to 3 bpp, then even if the above mapping method is used, it is difficult to achieve a loss-free extraction; that is, the maximum error of extracting information will exceed the threshold t, which will lead to errors in the extraction process. Therefore, we try to map the binary data 0 and 1 to 0 and 0.5, which further reduces the amount of actual embedded information while setting the threshold to 0.25. The experiment proves that this method is feasible. The mapping process is detailed in Algorithms 1 and 2. layer, and the feature reuse is achieved through the connection of features on channel. The specific structure is shown in Figure 2. The mapping module is designed to improve extraction accuracy. If the sum of all the values of the binary information is greater than 1/2 of its length, the mapping module will interchange the values of 0 and 1, thereby reducing the amount of information actually embedded and increasing the hidden capacity of the method proposed by this paper. Correspondingly, when the secret information is extracted, the threshold t remains unchanged, the value greater than t in the secret_rev is mapped to 0, and the value less than t is mapped to 1. In the experiment, we find that when the capacity increases to 3 bpp, then even if the above mapping method is used, it is difficult to achieve a loss-free extraction; that is, the maximum error of extracting information will exceed the threshold t, which will lead to errors in the extraction process. Therefore, we try to map the binary data 0 and 1 to 0 and 0.5, which further reduces the amount of actual embedded information while setting the threshold to 0.25. The experiment proves that this method is feasible. The mapping process is detailed in Algorithms 1 and 2.  As is shown in Figure 1, our network uses N reversible blocks of the same structure, which is the affine coupling layer proposed by Dinh [49]. Among them, the first reversible block takes the cover image and secret_map as input, outputting cover1 and secret1. Each subsequent reversible block takes the output of the previous reversible block as input, and the last reversible block outputs the final result stego and key z. The specific calculation process is as follows: sec ret i+1 = sec ret i exp(ρ(cover i+1 ) +η(cover i+1 (2) where ϕ(•), ρ(•), and η(•) represent the three dense blocks and represents the Hadamard product. In the reverse operation of a reversible block, the input is the output of the forward operation, which are the stego and the key z. It can be derived from its reverse calculation process from Equations (1) and (2): After the last inverse transformation, the extracted secret information secret_rev is mapped to a sequence of 0 and 1 according to the threshold.

Loss Function
The overall loss function of this paper consists of two parts: the loss caused by the difference between the stego image and the cover image, and the loss caused by the difference between the secret_rev and the secret_map.
In forward computation, we expect the stego image to be as similar as possible to the cover image. This guarantees good visuals and avoids detection by steganography, thereby improving its safety. The calculation process for hidden losses is as follows: Loss enc = N ∑ n=1 L(cover (n) , stego (n) ) (5) where N refers to the number of training images in a batch, and L is used to measure the difference between cover image and stego image. We use the l2 distance here. The l2 distance, also known as the Euclidean distance, is the square root of the sum of squares of all elements of a vector. In the reverse calculation process, we expect the secret_rev to be as similar as possible to the secret_map, ensuring that the error between the two is within the appropriate range. In this way, the message will be restored without loss. The calculation process for the extraction loss is as follows: The overall loss is calculated as follows: where α and β are the parameters that balance the two losses.

Experimental Settings
The proposed method uses two datasets, coco2017 [61] and pascalvoc2012 [62]. There are five annotation types in the coco dataset, and the pascalvoc dataset consists of 20 categories. We randomly select 1000 images as the training set in the coco2017 dataset, and use  40,670 images from the coco2017 test set and 17,124 images from pascalvoc2012 to evaluate the method. During the training and the testing process, we use the center cropping strategy to crop the image to a size of 128 × 128. In this experiment, we set the number of reversible blocks to 8, the batch size to 2, and the parameters of the population loss function α and β to 1 and 10, respectively.
We use three steganography methods as a contrast, one of which is the traditional steganography method, LSB-ibit; the next is the most commonly used steganography algorithm, Rehman [43], which contains encoding and decoding networks; and the third one is steganography algorithm HiNet, which is based on invertible neural networks [52]. In order to ensure the visual quality of the stego image, we set the embedding capacity of the LSB method to 1 bpp. In addition, Rehman [43] and HiNet [52] were originally used to hide images, where Rehman [43] was used to hide one gray-scale image and HiNet [52] was used to hide one color image. In order to ensure that the above two methods are consistent with the message-hiding configuration in this paper, we make some modifications to the input structure of these two network models so that the data they embed are binary messages. Finally, for a relatively unbiased comparison, we train and test Rehman [43] and HiNet [52] using the same datasets mentioned in this paper.
In this paper, the effect of the method is evaluated from four aspects: visual quality, embedding capacity, recovery accuracy, and detection accuracy. Among them, the visual quality is evaluated by the PSNR (peak signal-to-noise ratio) and the SSIM (structural similarity) index. Embedding capacity bpp refers to the number of bits embedded per pixel. RA (recovery accuracy) refers to the degree of similarity of message_rev to message, that is, the ratio of the equal number of bits of the two binary sequences to their length. Because the data extracted by HiNet [52] and Rehman [43] are not the integers 0 and 1, we round the extracted result. DA (detection accuracy) refers to the correct rate at which an image is determined to be a stego image using the steganalysis algorithm. Table 1 shows the PSNR and SSIM of LSB-1bit, Rehman [43], HiNet [52], and our method. As is shown in the table, both indicator values of our method are significantly better than the others. Specifically, on the coco2017 and pascalvoc2012 datasets, the PSNR of our method improves by 11.4114 db and 13.3013 db, respectively, compared with the second-best-performing results. This is because the proposed method compresses the input data, reduces the actual number of data embedded in the cover image, and greatly improves the hiding performance of the model. So we can achieve better results than the other three methods can. In addition, we can see similar improvements in the SIMM.  Table 2 demonstrates the embedding capacity (C) and recovery accuracy (RA) of LSB-1bit, Rehman [43], HiNet [52], and our method on the coco2017 and pascalvoc2012 datasets. As is shown in the table, the recovery accuracy of Rehman [43] is less than 60, while the HiNet [52] method has been greatly improved over Rehman [43]; LSB-1bit can achieve lossless recovery for secret information on both the coco2017 and the pascalvoc2012 datasets; and our method improves the hidden capacity and antidetection ability of images while ensuring lossless recovery. Among them, Rehman [43] and HiNet [52] cannot fully extract the information, because the secret information undergoes complex linear and nonlinear calculations during embedding and extraction, resulting in inevitable rounding errors. Further, our method can achieve lossless recovery of information because of the reversibility of the invertible neural network itself and, on the other hand, because the mapping module can restore the extracted data to the original binary data through the suitable threshold t. The combination of the above two modules enables our method to achieve a complete extraction of information.  Figure 3 shows the visual effect of stego-image-embedded secret information using our method and the residual plot between the stego image and the cover image. The residual plots of the fourth column are the results of 40× magnification, and the residual plots of the fifth column are the results of 200× magnification. As is shown in the figure, the stego image and the cover image of the proposed method are difficult to distinguish with naked human eye, and the residual plot between the two is almost purely black, which shows that the difference between our stego image and the cover image is extremely small.  Figure 4 shows the stego images generated by LSB-1bit, Rehman [43], HiNet [52], and our method, as well as the residual plot between the stego image and the cover image. The residual plots of the fourth row are the results of 40× magnification. As is shown in the figure, the HiNet [52] and the LSB-1bit have no obvious visual differences from the cover image. The stego image of the Rehman [43] have more-obvious color distortion, the  Figure 4 shows the stego images generated by LSB-1bit, Rehman [43], HiNet [52], and our method, as well as the residual plot between the stego image and the cover image. The residual plots of the fourth row are the results of 40× magnification. As is shown in the figure, the HiNet [52] and the LSB-1bit have no obvious visual differences from the cover image. The stego image of the Rehman [43] have more-obvious color distortion, the residual plot of which is also the most pronounced. Compared with the above three methods, our method not only recovers secret information without loss but also shows outstanding performance in visual effects. This is also mainly due to the compression of the input data, which reduces the actual number of embedded data, thus minimizing the residual between the stego image and the cover image. Comparison of residual plots of the cover images and the stego images using LSB-1bit, Rehman [43], HiNet [52], and our method.

Steganalysis
Steganalysis is the method of measuring the security of stego images. The mainstream steganalysis methods can be divided into statistical steganalysis methods and deep-learning-based steganalysis methods. We use the open-source tool StegExpose for statistical steganalysis. Specifically, we constantly modify the detection threshold on a scale of 0 to 1 to get different true positive rate and false positive rate values and connect all the resulting points into an ROC curve. In Figure 5, we can see that StegExpose does not work well on our method and that the detection accuracy is close to random guessing. This shows that the stego images generated by our method are highly secure and can resist the detection of traditional steganalysis. Comparison of residual plots of the cover images and the stego images using LSB-1bit, Rehman [43], HiNet [52], and our method.

Steganalysis
Steganalysis is the method of measuring the security of stego images. The mainstream steganalysis methods can be divided into statistical steganalysis methods and deep-learning-based steganalysis methods. We use the open-source tool StegExpose for statistical steganalysis. Specifically, we constantly modify the detection threshold on a scale of 0 to 1 to get different true positive rate and false positive rate values and connect all the resulting points into an ROC curve. In Figure 5, we can see that StegExpose does not work well on our method and that the detection accuracy is close to random guessing. This shows that the stego images generated by our method are highly secure and can resist the detection of traditional steganalysis. SRNet [63] is a steganography detector based on deep convolutional neural networks, to distinguish the stego image from the cover image. The closer the test result is to 50%, the better the steganography performs. The results of the steganalysis are shown in Table  3, from which we can see that the steganalysis results of the LSB-1bit method, the Rehman method [43], and the HiNet method [52] on both datasets are all above 96%, which suggests that the SRNet method can easily distinguish whether the image is a cover image or a stego image. However, our method has a detection rate of less than 52% on both datasets, which illustrates that our method is more secure than the other three methods.

Ablation Experiment
The mapping module proposed plays a key role in recovery accuracy and embedding capacity. In Table 4, w/ mapping means that the model contains a mapping module, and w/o mapping means that there is no mapping module in the model. As is shown in this table, when the embedding capacity is 1 bpp, it can be extracted without loss because the amount of embedded information is small enough. When we increase the embedding capacity to 3 bpp, only the addition of a mapping module can achieve 100% recovery accuracy, which illustrates the effectiveness of the proposed method.  SRNet [63] is a steganography detector based on deep convolutional neural networks, to distinguish the stego image from the cover image. The closer the test result is to 50%, the better the steganography performs. The results of the steganalysis are shown in Table 3, from which we can see that the steganalysis results of the LSB-1bit method, the Rehman method [43], and the HiNet method [52] on both datasets are all above 96%, which suggests that the SRNet method can easily distinguish whether the image is a cover image or a stego image. However, our method has a detection rate of less than 52% on both datasets, which illustrates that our method is more secure than the other three methods.

Ablation Experiment
The mapping module proposed plays a key role in recovery accuracy and embedding capacity. In Table 4, w/ mapping means that the model contains a mapping module, and w/o mapping means that there is no mapping module in the model. As is shown in this table, when the embedding capacity is 1 bpp, it can be extracted without loss because the amount of embedded information is small enough. When we increase the embedding capacity to 3 bpp, only the addition of a mapping module can achieve 100% recovery accuracy, which illustrates the effectiveness of the proposed method.

Conclusions
In this paper, we proposed an image steganography framework on the basis of invertible neural networks, which used forward and backward propagation of the same network to complete the embedding and extraction of information and achieve good visual effects for stego images and lossless recovery for secret information. Among them, we designed a mapping module to convert messages or any other type of data into binary form, and then we compressed the binary sequence to increase the hidden capacity of the method. Comprehensive experiments demonstrated that our method is highly competitive in extraction accuracy, safety, and visual effects, and it significantly both quantitatively and qualitatively outperforms other existing methods.