1. Introduction
Deep image steganography aims to conceal secret messages in cover images imperceptibly. The secret messages can be recovered only by the informed receiver and remain invisible to others, which allows them to be transmitted without being noticed [1,2]. Hence, image steganography has been applied in various domains, such as information security [3], data communication [1], and copyright protection [4].
In the image steganography task, the primary requirements are capacity, extraction error, and security. Embedded steganography (ES) generally selects an existing image as a cover and then embeds secret information into it with slight modifications. However, traditional ES methods [4,5] have limited payload capacity. To increase payload capacity, deep learning-based ES methods have recently been proposed that achieve both acceptable imperceptibility and a small extraction error for the secret message [6]. However, since all these ES methods modify the cover image, the modified image inevitably contains subtle traces (pseudo-shadows) of the secret message, especially under a high hiding payload. This creates a risk of exposing the secret message when the stego image is examined with steganalysis tools.
Instead of embedding the secret message into a cover image, steganography without embedding (SWE) is an emerging approach that hides a secret message without a cover image, which eliminates the modification traces left by ES methods. Thus, SWE has the unique advantage of reducing the risk that typical steganalysis exposes the secret message [7]. Although current SWE approaches have achieved remarkable results, they still have significant drawbacks. There are two types of SWE techniques. (1) Mapping-based methods transform the secret message into a sequence of image hashes selected from an existing image set [8,9]. These methods require fixed image mapping rules, which cannot accommodate a dynamically growing image set. (2) Generating-based methods synthesize images by feeding the secret message into a deep generator network, e.g., a generative adversarial network (GAN) [10,11]. However, due to the instability of the generative network and the irreversibility of the generative process, their payload capacity is extremely limited, which is a critical weakness when hiding large secret images. As shown in Table 1, the maximum hiding capacity of existing SWE works is 4 BPP, and the hidden payload can only be a bit string. Image-to-image steganography without embedding requires a hiding capacity of at least 24 BPP, because a same-size 8-bit RGB secret image carries 3 channels × 8 bits = 24 bits per pixel; hiding multiple images requires proportionally more. Moreover, it is difficult to minimize the message extraction error while maintaining the visual quality of the generated stego images [11].
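The 24 BPP threshold follows from simple arithmetic, sketched below under the assumption of 8-bit RGB secret images with the same height and width as the stego image (the helper name `required_bpp` and the default 256 × 256 size are ours, for illustration only). Each such secret image contributes 24 bits per stego pixel, so k images need 24k BPP, which is consistent with the 24–72 BPP range reported for DF-SWE.

```python
def required_bpp(num_images: int, secret_hw=(256, 256), stego_hw=(256, 256),
                 channels: int = 3, bits_per_channel: int = 8) -> float:
    """Bits per stego pixel needed to carry `num_images` secret images losslessly."""
    secret_bits = num_images * secret_hw[0] * secret_hw[1] * channels * bits_per_channel
    return secret_bits / (stego_hw[0] * stego_hw[1])


for k in (1, 2, 3):
    print(k, required_bpp(k))  # 24.0, 48.0, 72.0 bits per pixel
```

The same helper shows that hiding a secret image larger than the stego image raises the requirement further, e.g., a 128 × 128 secret image in a 64 × 64 stego image needs 96 BPP.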
In this paper, we propose a novel DF-SWE approach to tackle the above issues of current SWE methods. Unlike conventional GANs, which rely on implicit and non-invertible mappings between the latent and data spaces, our framework leverages OpenAI’s Glow [16], a flow-based generative model that explicitly constructs an invertible transformation between the input space and the latent space. Specifically, Glow introduces invertible 1 × 1 convolutions and affine coupling layers, which ensure that every transformation in the model is bijective and that the Jacobian determinant of each layer is tractable (i.e., can be computed exactly and efficiently). This property allows Glow to evaluate the exact log-likelihood of data samples, whereas GANs are trained adversarially without a likelihood objective. For steganography, flow-based models are advantageous because they offer stable and reversible mappings, while GANs often suffer from training instability and non-invertible transformations. The bijective structure of Glow enables accurate forward and inverse processes, allowing information to be hidden and recovered with high fidelity, which is difficult to achieve with the implicit, one-directional mappings of GANs. Our approach significantly enhances the payload capacity and can hide large images without cover images. To the best of our knowledge, DF-SWE is the first SWE method that can hide multiple secret images at once, which greatly extends the capability of SWE-based methods. In addition, DF-SWE reduces the extraction error, owing to the reversibility of the hiding and restoring processes provided by the invertible bijective mapping. Furthermore, DF-SWE maintains the quality of the generated stego images, which enhances the imperceptibility of the secret images.
In summary, DF-SWE achieves state-of-the-art steganographic performance in payload capacity, extraction error, and stealthiness when hiding large images. Intriguingly, DF-SWE also shows a domain generalization capability, which makes it applicable to privacy-critical, resource-limited scenarios.
The detailed contributions are as follows:
High payload capacity: DF-SWE works towards image-to-image generative steganography. Its payload capacity reaches 24–72 BPP, which is 8000–16,000 times higher than that of existing SWE methods. Moreover, DF-SWE is the first method to hide multiple secret images without embedding.
Low extraction error: We propose the reversible circulation of double flow to build a reversible bijective transformation between secret images and generated stego images. Because this circulation is an invertible process, a secret image can be recovered from a stego image in a nearly lossless manner.
Enhanced stealthiness: Experimental results show that DF-SWE achieves better hiding performance than prior steganography works, producing diverse and realistic stego images that minimize the risk of exposure. Meanwhile, DF-SWE also achieves better security against steganalysis detection.
Domain generalization: Our experiments show that, once trained, DF-SWE can be applied in the steganography of secret images from different domains without further model training or fine-tuning. This property makes DF-SWE the first domain-agnostic steganography method that can be applied to unseen private data and executed on resource-limited systems.
This paper is organized as follows. Section 2 introduces related work. Section 3 briefly describes the Glow model used as the backbone network. Section 4 elaborates on the proposed DF-SWE method. Section 5 presents and discusses the experimental results. Section 6 provides a discussion and outlines future work.
2. Related Work
Most existing steganographic approaches are embedded steganography (ES), which imperceptibly embeds secret information into a cover image by slightly modifying its content. However, these modifications introduce distortion into the stego image, especially when embedding color image data, which contains a large number of bits; this makes ES methods easier to detect by steganalysis. Steganography without embedding (SWE) was proposed to improve security, since it does not need to modify a cover image.
2.1. Embedded Steganography
Traditional ES methods: Least significant bit (LSB) steganography [17] modifies only the last few bits of each pixel value, so it does not cause visible changes to the image. LSB also has many variants [18,19]. For example, an information hiding technique [20] uses the least significant bits (LSBs) of each pixel of a grayscale image and adopts XOR features of the host image pixels. HUGO [21] was designed to minimize a properly defined distortion through an efficient coding algorithm. Steganographic algorithms exist not only in the spatial domain but also in the frequency domain, such as J-UNIWARD [22], UED [23], I-UED [24], and UERD [25].
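The LSB principle (and the 4-bit LSB baseline used in Section 5) can be sketched as follows: payload bits overwrite the n lowest bits of each pixel value in raster order, so each value changes by at most 2^n − 1. This is a generic textbook variant with hypothetical function names, not the specific algorithms of [17,18,19,20].

```python
import numpy as np


def lsb_embed(cover: np.ndarray, bits: np.ndarray, n_bits: int = 1) -> np.ndarray:
    """Write `bits` into the n_bits least significant bits of a uint8 image, in raster order."""
    flat = cover.ravel().copy()
    n_pix = -(-bits.size // n_bits)                     # ceil(len(bits) / n_bits)
    if n_pix > flat.size:
        raise ValueError("payload exceeds the LSB capacity of the cover")
    padded = np.zeros(n_pix * n_bits, dtype=np.int64)
    padded[: bits.size] = bits
    weights = 1 << np.arange(n_bits - 1, -1, -1)        # MSB first within each group
    values = (padded.reshape(n_pix, n_bits) * weights).sum(axis=1)
    keep = (0xFF << n_bits) & 0xFF                      # mask that clears the low n_bits
    flat[:n_pix] = (flat[:n_pix] & keep) | values
    return flat.reshape(cover.shape)


def lsb_extract(stego: np.ndarray, n_payload: int, n_bits: int = 1) -> np.ndarray:
    """Read back `n_payload` bits written by `lsb_embed`."""
    n_pix = -(-n_payload // n_bits)
    values = stego.ravel()[:n_pix].astype(np.int64) & ((1 << n_bits) - 1)
    shifts = np.arange(n_bits - 1, -1, -1)
    return ((values[:, None] >> shifts) & 1).ravel()[:n_payload].astype(np.uint8)


rng = np.random.default_rng(1)
cover = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
secret = rng.integers(0, 2, size=100, dtype=np.uint8)
stego = lsb_embed(cover, secret, n_bits=4)               # 4-bit LSB
assert np.array_equal(lsb_extract(stego, secret.size, n_bits=4), secret)
print(np.abs(stego.astype(int) - cover.astype(int)).max())  # at most 15 with 4 LSBs
```

Raising `n_bits` increases capacity linearly but also the maximum per-pixel change, which is the trade-off that makes high-payload LSB embedding easier to detect.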
Deep learning-based ES methods: Baluja [6] proposed an autoencoder architecture that places a full-size image in another image of the same size. Wu et al. [26] then proposed an encoder–decoder architecture in which the cover image and the secret image were concatenated using separable convolution with residual blocks (SCR). Zhang et al. [27] combined adversarial examples with steganography. Replacing the encoder–decoder architecture, CycleGAN-based methods [28,29] have also been proposed for image steganography. Furthermore, Zhang et al. [30] proposed ISGAN, which improved invisibility by hiding the secret image only in the Y channel of the cover image. Wang et al. [5] designed a GAN-based multi-level feature fusion procedure to capture texture information and semantic features. More recently, invertible networks have been proposed for image hiding. Owing to the reversibility of invertible networks, HiNet [31] significantly improves the restored quality of the secret image. Building on this, DeepMIH [32] was proposed to hide multiple images and achieved excellent performance compared with other ES methods.
2.2. Steganography Without Embedding (SWE)
Mapping-based SWE methods: In 2016, a bag-of-words (BOW) model was proposed to construct a mapping between a dictionary and words [8]. Zheng et al. [9] proposed robust image hashing, which computes scale-invariant feature transform (SIFT) points in nine sub-images. Cao et al. [33] divided the pixel values from 0 to 255 into 16 intervals and mapped each interval to a 4-bit string. Qiu et al. [34] first hashed the local binary pattern (LBP) features of the cover image and the secret image, and then matched the hashes to create the hidden image. Additionally, a CIS algorithm based on DenseNet feature mapping was proposed [35], which introduced deep learning to extract high-dimensional CNN features that are mapped into hash sequences. Based on GANs, a star generative adversarial network (StarGAN) was used to construct high-quality stego images with a mapping relationship [36].
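A simplified sketch in the spirit of the interval mapping of [33] shows how mapping-based SWE transmits bits without modifying any image: each 4-bit segment selects an existing image whose feature falls in the matching interval. Using the mean intensity as the image feature, and all function names, are our assumptions for illustration, not necessarily what Cao et al. used.

```python
import numpy as np


def pixel_to_bits(value: float) -> str:
    """Map a value in [0, 255] to one of 16 equal-width intervals, encoded as 4 bits."""
    return format(min(int(value) // 16, 15), "04b")


def image_hash(img: np.ndarray) -> str:
    """Hypothetical image feature: the interval of the mean intensity."""
    return pixel_to_bits(img.mean())


def select_images(secret_bits: str, database: list) -> list:
    """Sender: for each 4-bit segment, pick a database image whose hash matches."""
    index = {}
    for i, img in enumerate(database):
        index.setdefault(image_hash(img), i)
    segments = [secret_bits[i:i + 4] for i in range(0, len(secret_bits), 4)]
    return [index[s] for s in segments]  # KeyError if no image maps to a segment


def recover_bits(images: list) -> str:
    """Receiver: concatenate the hashes of the transmitted (unmodified) images."""
    return "".join(image_hash(img) for img in images)


# Toy database: 16 constant 8x8 images whose means fall in the middle of each interval.
database = [np.full((8, 8), 16 * k + 8, dtype=np.uint8) for k in range(16)]
secret = "1011000111110000"
chosen = select_images(secret, database)
print(chosen)  # [11, 1, 15, 0]
assert recover_bits([database[i] for i in chosen]) == secret
```

The toy also makes the limitations of mapping-based methods concrete: sender and receiver must share a fixed hash rule and a database covering every segment, and each transmitted image carries only 4 bits.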
Generating-based SWE methods: Stego-ACGAN was proposed to generate new meaningful normal images for hiding and extracting information [10]. In 2018, Hu et al. [12] mapped secret information into noise vectors and used DCGAN to generate a stego image. Zhu et al. [37] then proposed a coverless image steganography method based on an orthogonal generative adversarial network, adding constraints to the objective function to make model training more stable. To improve steganographic capacity and image quality, a GAN-based steganography without embedding that combines adversarial training techniques was proposed [38]. An attention-GAN model was then proposed for steganography without embedding [11]. Additionally, Liu et al. [14] proposed IDEAS based on GANs, which disentangles an image into structure and texture representations and uses the structure representation to improve secret message extraction. Different from GAN-based approaches, Generative Steganographic Flow (GSF) [39] builds a reversible bijective mapping between the input secret data and the generated stego images, treating stego image generation and secret data recovery as an invertible transformation. Zhou et al. [15] then proposed a secret-to-image reversible transformation (S2IRT), in which a large number of elements of the given secret message are arranged into corresponding positions to construct a high-dimensional vector, which is then mapped to a generated image. However, all of the aforementioned methods are limited to single-image steganography, whereas DF-SWE can perform multi-image steganography.
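A simplified sketch of the bits-to-noise idea behind generating-based methods such as [12]: each σ-bit segment selects one of 2^σ equal sub-intervals of [−1, 1], and a noise value is drawn inside it with a safety margin. The generator and the extractor network that recovers the noise from the stego image are omitted; σ, the margin, and the function names are illustrative assumptions, not the exact scheme of [12].

```python
import numpy as np


def bits_to_noise(bits, sigma=2, margin=0.1, rng=None):
    """Map each sigma-bit segment to a random point inside one of 2**sigma sub-intervals of [-1, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    width = 2.0 / 2 ** sigma
    segments = np.asarray(bits, dtype=np.int64).reshape(-1, sigma)
    weights = 1 << np.arange(sigma - 1, -1, -1)          # MSB first
    m = segments @ weights                               # segment value in [0, 2**sigma)
    u = rng.uniform(margin, 1.0 - margin, size=m.shape)  # stay away from interval edges
    return -1.0 + (m + u) * width


def noise_to_bits(z, sigma=2):
    """Inverse mapping: locate the sub-interval of each noise value and emit its bits."""
    width = 2.0 / 2 ** sigma
    m = np.clip(np.floor((np.asarray(z) + 1.0) / width), 0, 2 ** sigma - 1).astype(np.int64)
    shifts = np.arange(sigma - 1, -1, -1)
    return ((m[:, None] >> shifts) & 1).ravel()


rng = np.random.default_rng(0)
secret = rng.integers(0, 2, size=200)                  # 200 bits -> 100-dim noise vector
z = bits_to_noise(secret, sigma=2, margin=0.1, rng=rng)
# An imperfect extractor would return a slightly perturbed z; small errors are tolerated.
z_hat = z + rng.uniform(-0.04, 0.04, size=z.shape)
assert np.array_equal(noise_to_bits(z_hat, sigma=2), secret)
```

The margin trades payload for robustness: increasing σ packs more bits into each noise value but shrinks every sub-interval, so smaller extraction errors already flip bits. This is one way to see why such methods struggle at high payloads.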
2.3. Comparison with DF-SWE
Unlike these SWE methods, DF-SWE can hide multiple secret images in one image of the same size, providing higher hiding capacity without losing the naturalness of the stego images. Meanwhile, we build a reversible bijective transformation between the secret images and the generated stego images, which reduces the extraction error of the secret images.
3. Backbone Network
In this paper, we propose a double-flow-based model to build a reversible bijective transformation between secret images and a generated stego image. Our flow-based backbone network is Glow [16]. Flow-based models are commonly used in image generation tasks; they learn a bijective mapping between a latent space with a simple distribution and the image space with a complex distribution.
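In flow-based generative models, the generative process is defined as follows:

$$\mathbf{z} \sim p_{\theta}(\mathbf{z}), \qquad \mathbf{x} = g_{\theta}(\mathbf{z}), \tag{1}$$

where $\mathbf{z}$ is the latent variable and $p_{\theta}(\mathbf{z})$ is usually a multivariate Gaussian distribution $\mathcal{N}(\mathbf{z}; \mathbf{0}, \mathbf{I})$. The function $g_{\theta}$ is invertible, such that for a given datapoint $\mathbf{x}$, latent-variable inference is performed by $\mathbf{z} = f_{\theta}(\mathbf{x}) = g_{\theta}^{-1}(\mathbf{x})$. For brevity, we omit the subscript $\theta$ from $f_{\theta}$ and $g_{\theta}$. The function $f$ is composed of a sequence of transformations $f_1, f_2, \ldots, f_K$ applied in turn, i.e., $f = f_K \circ \cdots \circ f_2 \circ f_1$, such that the relationship between $\mathbf{x}$ and $\mathbf{z}$ can be written as follows:

$$\mathbf{x} \overset{f_1}{\longleftrightarrow} \mathbf{h}_1 \overset{f_2}{\longleftrightarrow} \mathbf{h}_2 \cdots \overset{f_K}{\longleftrightarrow} \mathbf{z}, \tag{2}$$

where $f_i$ is a reversible transformation function and $\mathbf{h}_i$ is the output of $f_i$, with $\mathbf{h}_0 = \mathbf{x}$ and $\mathbf{h}_K = \mathbf{z}$.

Under the change of variables of Equation (2), the probability density function of the model for a given datapoint can be written as

$$\log p_{\theta}(\mathbf{x}) = \log p_{\theta}(\mathbf{z}) + \log\left|\det\frac{d\mathbf{z}}{d\mathbf{x}}\right| = \log p_{\theta}(\mathbf{z}) + \sum_{i=1}^{K} \log\left|\det\frac{d\mathbf{h}_i}{d\mathbf{h}_{i-1}}\right|. \tag{3}$$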
The network architecture of Glow comprises three modules: the squeeze module, the flow module, and the split module. The squeeze module downsamples the feature maps by trading spatial resolution for channels, and the flow module processes the features; in Glow, each flow step consists of an activation normalization (ActNorm) layer, an invertible 1 × 1 convolution, and an affine coupling layer. The split module divides the features into two halves along the channel dimension, and one half is output as the latent tensor.
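The multi-scale squeeze/split structure can be illustrated with a NumPy sketch that tracks tensor shapes and checks that the factorization is lossless. The flow steps themselves are left out (treated as identity), as are the learned priors Glow places on the split-off halves; a 3 × 64 × 64 input and depth L = 3 are illustrative choices.

```python
import numpy as np


def squeeze(h):
    """(C, H, W) -> (4C, H/2, W/2): move each 2x2 spatial block into channels (lossless)."""
    c, hgt, wid = h.shape
    h = h.reshape(c, hgt // 2, 2, wid // 2, 2)
    return h.transpose(0, 2, 4, 1, 3).reshape(4 * c, hgt // 2, wid // 2)


def unsqueeze(h):
    """Exact inverse of `squeeze`."""
    c4, hgt, wid = h.shape
    h = h.reshape(c4 // 4, 2, 2, hgt, wid)
    return h.transpose(0, 3, 1, 4, 2).reshape(c4 // 4, 2 * hgt, 2 * wid)


def multiscale_latents(x, depth):
    """Glow-style squeeze/split over `depth` levels; flow steps are omitted (identity)."""
    latents, h = [], x
    for level in range(depth):
        h = squeeze(h)
        # Here Glow would apply K flow steps (ActNorm, invertible 1x1 conv, affine coupling).
        if level < depth - 1:
            half = h.shape[0] // 2
            latents.append(h[:half])  # split: half of the channels become this level's latent
            h = h[half:]              # the other half continues to the next level
        else:
            latents.append(h)         # the last level outputs all remaining channels
    return latents


def invert_multiscale(latents):
    h = latents[-1]
    for z in reversed(latents[:-1]):
        h = np.concatenate([z, unsqueeze(h)], axis=0)
    return unsqueeze(h)


x = np.random.default_rng(0).normal(size=(3, 64, 64))
zs = multiscale_latents(x, depth=3)
print([z.shape for z in zs])  # [(6, 32, 32), (12, 16, 16), (48, 8, 8)]
assert sum(z.size for z in zs) == x.size
assert np.array_equal(invert_multiscale(zs), x)
```

The shapes show that shallow levels emit larger (higher-resolution) latent tensors than deep levels, while the total number of latent elements always equals the number of input values.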
4. Methodology
DF-SWE builds a reversible circulation of double flow to hide secret images and generate stego images. The reversible circulation of double flow relies on three strategies: prior knowledge sampling, high-dimensional space replacement, and distribution consistency transformation. In the following, we first give the problem definition and the threat model, and then explain in detail the reversible circulation of double flow and the hiding and restoring processes of DF-SWE.
4.1. Problem Definition and Threat Model
Given a set of k secret images, an SWE encoder transforms the secret images into random noise (latent) vectors, and a transformation maps these vectors into the latent space of a generator for secret image hiding. Finally, the generator produces a stego image from the transformed noise. To maximize the reconstruction performance of the secret images, we use invertible functions for both the encoder and the generator. That is, after applying the inverse of the generator and the inverse of the transformation, the secret images can be revealed through the inverse of the encoder.
In our threat model, the attacker has access to a public training dataset for training the steganography model. During the attacking phase, the attacker gathers the secret images and generates the stego image by composing the encoder, the transformation, and the generator. Once the stego image is delivered to the recipient, the recipient recovers the secret images by applying the inverse of the same steganography model. Moreover, the trained model can be reused for various secret images, even those coming from different domains.
Inspired by Glow [16], DF-SWE uses a double-flow-based model to build a reversible bijective transformation between secret images and generated stego images. The DF-SWE network takes a secret image as input and generates a realistic stego image. It can later recover the hidden secret image directly from the stego image via the reversible transformation.
As illustrated in Figure 1, the key components of the DF-SWE network are the double-flow-based models and the reversible circulation of double flow. One flow-based model can be regarded as an encoder that encodes the secret images into multivariate Gaussian distributions, while the other can be seen as a generator that generates stego images from multivariate Gaussian distributions. Owing to the invertibility of flow models, the two models can also act as decoders to extract the secret images. By constructing a reversible circulation of double flow, the stego image can be generated from the secret images, and the secret images can be extracted from the stego image. More specifically, each of the two latent tensors is composed of L per-level latent tensors, where L is the depth of the architecture.
We use two Glow models to learn the multivariate Gaussian distributions of the secret images and of the stego image separately: one flow maps the secret images to their latent tensor, and the other maps a latent tensor to the stego image.
The existing flow model (Glow) implements a mapping between the distribution of z and that of the generated image. In contrast, large image steganography without embedding is a generative task from one image to another. Hence, the core task of image-to-image steganography without embedding is to construct a mapping between the secret image and the stego image that is reversible, so as to improve the extraction quality of the secret image. This task can be formulated as encoding the secret image with the first flow, applying a transformation t, and generating the stego image with the second flow, where t is a transformation from one multivariate Gaussian distribution to another. Consequently, the core task is to design t so that it constructs a reversible circulation in the double-flow model, hiding the secret image in the generated stego image while remaining invertible. Note that since t is invertible, the hiding process can be regarded as lossless under ideal conditions. For example, before applying the reversible transformation, one can use a decoupled encryption scheme [40] to first encrypt the secret image and then convert it into the stego image. In this way, even if an attacker has access to the public training dataset and attempts to invert the stego image back to the secret image, the cryptographic protection ensures that no valid information can be obtained.
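This decoupling can be illustrated with a minimal sketch in which the secret image is encrypted before it enters the hiding pipeline and decrypted after extraction. A keyed XOR stream stands in for the decoupled encryption scheme of [40], whose details are not reproduced here; the function name and key handling are hypothetical, and NumPy's generator is not a cryptographic primitive.

```python
import numpy as np


def keystream_xor(image: np.ndarray, key: int) -> np.ndarray:
    """XOR a uint8 image with a key-seeded byte stream; applying it twice with the same key
    restores the image. NumPy's PCG64 is NOT cryptographically secure: this only shows the
    encrypt -> hide -> reveal -> decrypt ordering and is a stand-in, not the scheme of [40]."""
    stream = np.random.default_rng(key).integers(0, 256, size=image.shape, dtype=np.uint8)
    return image ^ stream


rng = np.random.default_rng(7)
secret = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
cipher = keystream_xor(secret, key=2024)    # encrypt before hiding
# ... the ciphertext would be hidden with DF-SWE, transmitted, and revealed by the recipient ...
restored = keystream_xor(cipher, key=2024)  # decrypt after revealing
assert np.array_equal(restored, secret)
assert not np.array_equal(cipher, secret)
```

Because the steganographic model only ever sees the ciphertext, an attacker who inverts the stego image with a replicated model still obtains encrypted data. Whether a flow trained on natural images encodes such noise-like ciphertext with the same fidelity is a separate question that this sketch does not address.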
4.2. Reversible Circulation of Double Flow
To transmit the latent tensor of the secret images into that of the stego image while keeping the process reversible, we divide the design of the transformation t into three sub-problems:
How should the latent tensor of the stego image be initialized?
How can the secret latent tensor be transmitted into the stego latent tensor?
How can distortions in the generated stego images be reduced?
To solve these issues, we propose three techniques: prior knowledge sampling, high-dimensional space replacement, and distribution consistency transformation. We use the latent variable z and its variants to describe the circulation of the two flows at different stages, after different operations.
4.2.1. Prior Knowledge Sampling (PKS)
To initialize the latent tensor of the stego image, we exploit the prior knowledge of the Glow generator. First, z is sampled from a multivariate Gaussian distribution, and a normal image is generated from it by the generator's Glow model. During this generation, the model uses the prior knowledge stored in its Glow parameters, and the generation is irreversible. Next, we obtain the initialized stego latent tensor by passing the generated image through the sequence of invertible transformations of the same flow.
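The two prior knowledge sampling steps can be mimicked in a toy setting. We assume, for illustration only, that the generator is an elementwise invertible squashing followed by 8-bit quantization, and we treat quantization as the source of irreversibility; the functions below are hypothetical stand-ins for a Glow model, not DF-SWE's implementation.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def logit(p):
    return np.log(p) - np.log1p(-p)


def generate(z):
    """Toy generator: invertible squashing, then 8-bit quantization (the lossy step)."""
    return np.minimum(np.floor(256.0 * sigmoid(z)), 255).astype(np.uint8)


def encode(img):
    """Toy inverse flow: dequantize to bin centres, then invert the squashing exactly."""
    return logit((img.astype(np.float64) + 0.5) / 256.0)


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 8, 3))   # sample z from N(0, I)
x_n = generate(z)                # normal image produced from the prior
z_init = encode(x_n)             # re-encode: initialized stego latent
print(np.allclose(z_init, z))                  # False: quantization is not invertible
print(np.array_equal(generate(z_init), x_n))   # True: z_init reproduces the same image
```

In this toy, re-encoding moves the latent to a point that is exactly consistent with the rendered 8-bit image, so the initialized latent reflects what the generator can actually produce rather than the raw sample.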
4.2.2. High-Dimensional Space Replacement (HDSR)
To transmit the secret latent tensors into the latent tensor of the stego image while reducing stego image distortion, we propose high-dimensional space replacement.
In the backbone network (Glow), the feature maps at each of the L levels are divided into two halves along the channel dimension. One half is output as the latent tensor of that level, and the other half is passed on to the squeeze module of the next level. Hence, the multi-level latent tensor contains information about the image at different levels. As shown in Figure 1, the latent tensors of both the secret image and the stego image consist of these L per-level tensors. In particular, we find that the latent tensors from shallow layers have a greater effect on the reversibility of the image. If the latent tensor of the stego image is directly replaced with that of the secret image, the stego image is distorted because of the distribution differences between the two.
Since different latent tensors have different effects on image reconstruction, we propose high-dimensional space replacement, which replaces the high-dimensional distribution of the generated image with the low-dimensional distribution of the secret images. Our technique follows the principle of minimum information loss. As shown in Figure 2, the selected latent tensor of the generated image is replaced with the concatenated latent tensors of the secret images, so that the secret latents are circulated into the latent space of the stego image while limiting the impact of the secret images on stego image generation. During the secret image extraction phase, the replacement is reversed: the secret latent tensors are read back from the corresponding position of the stego latent tensor.
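The bookkeeping of the replacement, and why it is exactly reversible, can be sketched with NumPy. Which levels are exchanged and their shapes are hypothetical here (Figure 2 specifies DF-SWE's actual choice); the sketch only shows that concatenating secret tensors into one stego level and splitting them back is lossless, provided the receiver knows the level index and the part shapes. Distribution consistency transformation is not applied in this sketch.

```python
import numpy as np


def hdsr_replace(z_stego, secret_parts, target):
    """Replace level `target` of the stego latent list with the concatenated secret tensors."""
    flat = np.concatenate([p.ravel() for p in secret_parts])
    if flat.size != z_stego[target].size:
        raise ValueError("secret tensors must fill the replaced level exactly")
    out = [z.copy() for z in z_stego]
    out[target] = flat.reshape(z_stego[target].shape)
    return out


def hdsr_restore(z_stego_mod, target, part_shapes):
    """Inverse: read level `target` back and split it into the original secret tensors."""
    flat = z_stego_mod[target].ravel()
    parts, start = [], 0
    for shape in part_shapes:
        n = int(np.prod(shape))
        parts.append(flat[start:start + n].reshape(shape))
        start += n
    return parts


rng = np.random.default_rng(0)
# Hypothetical shapes: a two-level stego latent and two secret tensors whose
# total size (1024 + 512) matches the stego level being replaced (6 * 16 * 16).
z_stego = [rng.normal(size=(6, 16, 16)), rng.normal(size=(12, 8, 8))]
secret_parts = [rng.normal(size=(4, 16, 16)), rng.normal(size=(8, 8, 8))]
z_mod = hdsr_replace(z_stego, secret_parts, target=0)
recovered = hdsr_restore(z_mod, 0, [p.shape for p in secret_parts])
assert all(np.array_equal(a, b) for a, b in zip(recovered, secret_parts))
assert np.array_equal(z_mod[1], z_stego[1])  # untouched levels keep their initialized values
```

The original content of the replaced level is discarded, which is harmless because it came from a random prior sample; everything the receiver needs is carried by the replaced level itself.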
4.2.3. Distribution Consistency Transformation (DCT)
High-dimensional space replacement circulates the latent tensors of the secret image into the latent tensor of the stego image and reduces distortion in the generated stego image. To further improve image generation quality and reduce distortion, we propose distribution consistency transformation, which decreases the distribution discrepancy between the secret latent tensors and the stego latent tensor they replace.
As shown in Figure 2, distribution consistency transformation is applied within high-dimensional space replacement. Because flow-based generative models learn a reversible bijective transformation between images and a multivariate Gaussian, the latent tensors of both the secret image and the stego image approximately follow Gaussian distributions. Hence, the key statistics that characterize these distributions are their mean and variance.
Based on this, our proposed distribution consistency transformation is to maintain the consistency of the mean and variance between two distributions. Distribution consistency transformation is defined as follows:
Equations (8)–(10) reduce the distribution discrepancy between the secret latent tensors and the stego latent tensor. During the secret image extraction phase, the inverse of the distribution consistency transformation is expressed as Equation (11):
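To make the mean–variance matching concrete, the following NumPy sketch gives one natural reading of the idea; it is our illustration, not a reproduction of Equations (8)–(11). It assumes per-tensor statistics, and it assumes that the secret statistics are available at extraction (here they are simply passed along).

```python
import numpy as np


def dct_forward(z_secret, z_ref):
    """Shift and rescale z_secret so that its mean and standard deviation match z_ref."""
    mu_s, sd_s = z_secret.mean(), z_secret.std()
    mu_r, sd_r = z_ref.mean(), z_ref.std()
    z_t = (z_secret - mu_s) / sd_s * sd_r + mu_r
    return z_t, (mu_s, sd_s)


def dct_inverse(z_t, secret_stats):
    """Undo the matching. mu_r and sd_r are re-estimated from z_t, which has exactly
    those statistics; the secret statistics (mu_s, sd_s) must be supplied."""
    mu_s, sd_s = secret_stats
    mu_r, sd_r = z_t.mean(), z_t.std()
    return (z_t - mu_r) / sd_r * sd_s + mu_s


rng = np.random.default_rng(0)
z_secret = rng.normal(0.3, 1.4, size=(12, 16, 16))   # hypothetical off-prior secret latent
z_ref = rng.normal(0.0, 0.9, size=(12, 16, 16))      # stego latent being replaced
z_t, stats = dct_forward(z_secret, z_ref)
print(np.isclose(z_t.mean(), z_ref.mean()), np.isclose(z_t.std(), z_ref.std()))  # True True
assert np.allclose(dct_inverse(z_t, stats), z_secret)
```

A useful property of this form is that the target statistics need not be transmitted, because the transformed tensor carries them exactly; only the source statistics must be known to the receiver.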
4.3. Hiding and Restoring Processes
In this section, we describe the secret image hiding and restoring processes in detail. As shown in Figure 3, DF-SWE comprises two phases: a secret image hiding phase and an extraction phase.
4.3.1. Hiding Process
Figure 3a describes the hiding phase, which hides large images without embedding. The flow that encodes the secret images and the flow that generates the stego image are two different Glow models. First, in Step 1, the generator flow samples a latent z from a Gaussian distribution and generates an image by utilizing its prior knowledge. Based on this generated image, we use the reversible operation of the generator flow to obtain an initialized latent tensor that can better carry the secret flow. Second, in Step 2, the encoder flow encodes the secret image into its latent tensor through its reversible operation. Steps 1 and 2 are independent, so they can run in parallel or in either order. Third, in Step 3, high-dimensional space replacement and distribution consistency transformation pass the secret latent tensor into the generator's latent tensor; this step remains reversible so that the secret images can later be extracted. Finally, in Step 4, the generator flow generates the stego image from the modified latent tensor.
4.3.2. Restoring Process
As shown in Figure 3b, the extraction phase is the inverse of the hiding phase. Hence, DF-SWE can extract the secret image with high quality, because we construct an invertible mapping between the secret and stego images. First, in Step 5, the stego image is decoded into its latent tensor through the reversible operation of the generator flow. Then, in Step 6, the inverse operations of high-dimensional space replacement and distribution consistency transformation pass the recovered latent tensor back to the latent space of the encoder flow; these inverse operations are described in detail in Section 4.2.2 and Section 4.2.3. Finally, in Step 7, the encoder flow reconstructs the secret image with high quality.
5. Experimental Results
5.1. Experimental Setup
To demonstrate the superiority of DF-SWE, we compare it with six state-of-the-art SWE methods, namely DCGAN-Steg [12], SAGAN-Steg [11], SSteGAN [13], WGAN-Steg [41], CycleGAN [42], and CRoSS [43]. To verify the extraction quality of the secret images, we also compare DF-SWE with ES methods, including 4-bit LSB, Baluja [2], Weng et al. [44], and HiDDeN [1].
Our DF-SWE and baseline models are trained on the Bedroom dataset (a subset of LSUN with 3,033,042 color images) [45], LFW [46] (13,234 color images), and CelebA [47] (202,599 color images). We train DF-SWE with the hyper-parameter
.
L is the depth of the model. A deeper model generates higher-quality images, but the number of parameters and the computational cost also increase. Therefore, L can be set according to practical requirements. Additionally, the steganography process takes less than one second on an NVIDIA RTX 3090 GPU, with
. Therefore, DF-SWE is suitable for real-time applications.
We evaluate the hiding capacity of DF-SWE by bits per pixel (BPP), $\mathrm{BPP} = \frac{N}{H \times W}$, i.e., the number $N$ of message bits hidden per pixel of the stego image, where $H$ and $W$ are the height and width of the stego image. Meanwhile, we evaluate security by the detection error $P_e = \frac{1}{2}(P_{FA} + P_{MD})$, where $P_{FA}$ and $P_{MD}$ are the false-alarm and missed-detection probabilities of a steganalyzer, respectively. The optimal value of $P_e$ is 0.5, which corresponds to random guessing. In addition, we measure secret image extraction performance using peak signal-to-noise ratio (PSNR), root mean square error (RMSE), and the structural similarity index measure (SSIM). Larger PSNR and SSIM values and smaller RMSE values indicate higher image quality. These metrics are formulated as follows:
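RMSE: Root mean square error (RMSE) measures the difference between two images. Given two images X and Y with width W and height H, RMSE is formulated as follows:

$$\mathrm{RMSE}(X, Y) = \sqrt{\frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W}\big(X(i,j) - Y(i,j)\big)^{2}},$$

where $X(i,j)$ and $Y(i,j)$ indicate the pixels at position $(i,j)$ of images X and Y, respectively.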
PSNR: Peak signal-to-noise ratio (PSNR) is a widely used metric of image quality. PSNR is defined as follows:

$$\mathrm{PSNR}(X, Y) = 10\log_{10}\frac{R^{2}}{\mathrm{RMSE}(X, Y)^{2}},$$

where R is the maximum possible pixel value, which is usually set to 255.
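SSIM: The structural similarity index measure (SSIM) is another commonly used image quality assessment based on the degradation of structural information [48]. SSIM is computed from the means $\mu_X$ and $\mu_Y$, the variances $\sigma_X^{2}$ and $\sigma_Y^{2}$, and the covariance $\sigma_{XY}$, as follows:

$$\mathrm{SSIM}(X, Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^{2} + \mu_Y^{2} + c_1)(\sigma_X^{2} + \sigma_Y^{2} + c_2)},$$

where $c_1 = (k_1 L)^2$, $c_2 = (k_2 L)^2$, and L is the dynamic range of the pixel values. The default values are $k_1 = 0.01$ and $k_2 = 0.03$.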
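The metrics above can be sketched in NumPy as follows, together with the detection error Pe used for steganalysis. Two assumptions are worth stating: RMSE here averages over all pixels and channels, and `ssim_global` evaluates the SSIM formula once over the whole image, whereas the reference SSIM [48] computes it in local windows and averages the resulting map, so values will generally differ. Pe is computed from hard decisions; some works instead take the minimum over detection thresholds.

```python
import numpy as np


def rmse(x, y):
    d = np.asarray(x, dtype=np.float64) - np.asarray(y, dtype=np.float64)
    return float(np.sqrt(np.mean(d ** 2)))  # averages over all pixels (and channels)


def psnr(x, y, r=255.0):
    e = rmse(x, y)
    return float("inf") if e == 0 else float(20.0 * np.log10(r / e))


def ssim_global(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """SSIM evaluated once over the whole image (single window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1, c2 = (k1 * dynamic_range) ** 2, (k2 * dynamic_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = np.mean((x - mx) * (y - my))
    num = (2 * mx * my + c1) * (2 * cxy + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return float(num / den)


def detection_error(y_true, y_pred):
    """Pe = (P_FA + P_MD) / 2 from hard steganalyzer decisions (1 = stego, 0 = cover)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    p_fa = np.mean(y_pred[y_true == 0] == 1)  # covers wrongly flagged as stego
    p_md = np.mean(y_pred[y_true == 1] == 0)  # stego images missed
    return float(0.5 * (p_fa + p_md))


x = np.zeros((4, 4, 3), dtype=np.uint8)
y = x + 5
print(rmse(x, y), round(psnr(x, y), 2), ssim_global(x, x))  # 5.0 34.15 1.0
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(detection_error(labels, [0, 1, 0, 0, 1, 1, 0, 1]))   # 0.25
print(detection_error(labels, [0] * 8))                    # 0.5
```

The last line shows why 0.5 is the optimal Pe from the steganographer's perspective: a steganalyzer that always answers "cover" scores exactly the random-guess level.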
5.2. Evaluation by Image Hiding Quality
Figure 4 compares DF-SWE with other SWE methods on the Bedroom dataset (a subset of LSUN). Since SWE methods hide secret messages without embedding modifications and are resistant to typical steganalysis tools, visual quality is crucial. Figure 4 shows that the images generated by DF-SWE carry a higher payload and are more realistic, as measured by the Fréchet Inception Distance (FID), than those of the competitors. FID is a metric for image generation, and a lower FID score indicates more realistic generated images. The stego images generated by the competing SWE methods show noticeable distortions.
Examples of stego images generated by DF-SWE are given in
Figure 5 and
Figure 6, which show the hiding quality of images in sizes
and
, respectively. It can be observed that the stego images reveal no visible information about the secret images. The secret image can be extracted from the stego image only through the two trained flow models and the reversible circulation of double flow. These two models have hundreds of millions of parameters and different network structures, which makes recovering the secret images without them difficult.
Once trained, DF-SWE can be generalized to hiding images from various domains. Figure 5 and Figure 6 show secret images and generated stego images from different domains. For example, LFW-CelebA means that the secret image is randomly selected from the LFW dataset and the generated stego image follows the style of the CelebA dataset. From Figure 4, we can see that images generated by DF-SWE are more realistic and that the extracted secret images are nearly lossless.
5.3. The Extraction Quality Compared with Prevalent Methods
Table 2 lists the information extraction accuracy of different steganographic approaches, i.e., DCGAN-Steg, SAGAN-Steg, SSteGAN, WGAN-Steg, IDEAS, and S2IRT, as the hiding payload increases. This table shows that DF-SWE achieves much higher extraction accuracy than the other SWE approaches under different hiding payloads. The extraction accuracy of DF-SWE remains very high when the hiding payload ranges from 1 BPP to 4 BPP. Additionally, the proposed generative steganographic approach simultaneously achieves a high hiding capacity (up to 12 BPP) and accurate extraction of the secret message (almost
accuracy rate). Even when hiding images (BPP = 24), the extraction accuracy reaches 0.5124, and the pixel errors of the extracted images mostly lie in
. This is because DF-SWE builds an image-to-image reversible bijective mapping, which reduces the extraction error. In contrast, the extraction accuracy of other SWE methods decreases as the hiding payload increases. At high hiding capacity, existing SWE methods either fail to hide the secret message or generate twisted and distorted stego images. Thus, existing SWE methods cannot achieve accurate information extraction under high hiding payloads.
The extraction metrics of the different ES methods are given in Table 3, which reports the extraction quality of secret images in terms of PSNR, SSIM, and RMSE. The LSUN, CelebA, and LFW columns show the results on the corresponding datasets. Unlike DF-SWE, which builds an image-to-image reversible bijective mapping, existing SWE methods directly write the secret message into a latent space and generate the image from that latent space, which makes it difficult to balance hiding capacity and generation quality. Since existing SWE methods have low hiding capacity and cannot hide secret images of the same size, we compare DF-SWE with ES methods to verify the extraction quality of the secret images. In particular, ES methods usually extract better than SWE methods, because they hide the secret image in an existing cover image and do not need to synthesize a new one. On the contrary, SWE methods require a plausible visual quality of both the generated stego image and the recovered secret image. Table 3 shows that DF-SWE outperforms the compared methods, providing better secret image extraction quality.
5.4. Security Evaluation by Steganalysis
We horizontally compared the performance of DF-SWE with that of other methods against YeNet [49] in Table 4, and vertically evaluated DF-SWE against different steganalyzers (i.e., DFNet [50], ESNet [51], LWENet [52], and SiaStegNet [53]) in Table 5. To ensure a fair comparison, for the same steganalyzer we adopted models trained with identical hyperparameters (e.g., learning rate, number of training epochs, and dataset), and for different steganalyzers we used the same evaluation metrics. Steganalysis performance is quantified by the detection error (Pe). The optimal value of Pe is 0.5, at which point the steganalyzer (YeNet [49]) cannot distinguish the source of the images and can only guess at random. Most embedded steganography (ES) methods exhibit inadequate steganographic security, whereas the proposed steganography without embedding (DF-SWE) achieves superior security with higher Pe values. Compared with existing SWE schemes, DF-SWE achieves significant advances in multiple aspects; in particular, its payload is over 8000 times higher than that of its counterparts. As shown in Table 4 and Table 5, DF-SWE achieves Pe values that outperform most of the compared methods and maintains robust performance under different steganalyzers.
5.5. Multiple Image Hiding
Most image hiding methods can hide only one secret image within a cover image. However, such methods are not applicable when multiple images are integrated or sequentially related and cannot be separated. In image steganography without embedding in particular, no prior work has addressed multiple image hiding, and our method is the first to achieve multi-image hiding without embedding.
In this section, we present the results of DF-SWE for hiding multiple images of two different sizes in Figure 7 and Figure 8, respectively. In each figure, the i-th secret image is shown alongside the corresponding i-th image extracted from the stego image (Stego). Even though three secret images (i.e., i = 1, 2, 3) are hidden in the same stego image, the generated stego images remain natural, and the recovered secret images are nearly lossless. The extraction metrics for multiple-image hiding are given in Table 6.
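Section 6 reports a payload of 24–72 BPP. The arithmetic below shows that this range is consistent with hiding one to three 8-bit RGB secret images of the same spatial size as the stego image, since each such image contributes 3 channels × 8 bits = 24 bits per stego pixel. The equal-size assumption, the 64 × 64 example size, and the helper name `payload_bpp` are illustrative only and are not taken from the experimental setup.

```python
def payload_bpp(num_secret, secret_hw, stego_hw, channels=3, bit_depth=8):
    """Secret bits carried per stego-image pixel (bits per pixel, BPP)."""
    secret_bits = num_secret * secret_hw[0] * secret_hw[1] * channels * bit_depth
    return secret_bits / (stego_hw[0] * stego_hw[1])


# Hypothetical sizes: secret images as large as the stego image.
print(payload_bpp(1, (64, 64), (64, 64)))  # 24.0
print(payload_bpp(3, (64, 64), (64, 64)))  # 72.0
```

If the secret images were smaller than the stego image, the payload would shrink in proportion to their pixel count; for example, one 32 × 32 secret in a 64 × 64 stego image gives 6 BPP.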
5.6. Domain Generalization
We define domain generalization as the ability of a steganographic model to preserve its hiding–extraction functionality when applied to images drawn from distributions that differ from the training domain. Current image steganography usually requires the secret images to come from the same domain as the samples used to train the steganography model. However, training a separate steganography model for each new domain is expensive, and collecting training data from particular domains can be difficult because of data privacy or other concerns. Consequently, existing methods cannot perform image steganography when access to images from the secret images' domain is prohibited.
However, as depicted in the workflow of Figure 3, the entire DF-SWE process involves two models, used in Step 2 and Step 4. The first encodes the secret image, and the second generates the stego image. Hiding the secret information and generating the stego image are therefore decoupled. Moreover, the noise used to generate the stego image differs from the noise z (sampled from a random distribution) used to generate a normal image only in the part carrying low-density information, which is replaced (Step 3). For the generator, the distribution of this modified noise is nearly identical to that of z, so the generated stego image is almost indistinguishable from a normal generated image. Owing to these characteristics, DF-SWE inherently possesses domain generalization capability. The extraction metrics of DF-SWE on an unseen dataset (Stanford Dogs) are presented in Table 7. Although these metrics are not as good as those in Table 3, the method still maintains favorable visual quality. In Figure 9, the images in the three leftmost columns are from the Stanford Dogs dataset, and the other images are randomly selected from the Internet. All of them follow distributions entirely different from that of the images used to train DF-SWE. As Figure 9 shows, DF-SWE successfully hides and recovers these images with satisfactory visual quality. This property greatly extends the applicability of DF-SWE and, to the best of our knowledge, makes it the first domain-agnostic steganography method.
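The domain-generalization argument rests on extraction being a chain of invertible maps: invert the generator, read back the replaced part of the noise, then invert the secret encoder. The toy sketch below illustrates only that structure. It assumes random orthogonal matrices as stand-ins for the two flow models and an arbitrary index set `HIDE_IDX` for the replaced "low-density" part. All names and sizes are hypothetical, and the code is not the DF-SWE implementation. It shows that recovery does not depend on what the secret looks like, even one drawn from a distribution unlike anything the maps were "built" for.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_orthogonal(n, rng):
    """Random orthogonal matrix: a toy invertible stand-in for a flow model."""
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q


D, K = 256, 64             # latent size and secret size (hypothetical)
HIDE_IDX = np.arange(K)    # hypothetical positions of the replaced part of z

F = random_orthogonal(K, rng)  # stand-in for the secret encoder (Step 2)
G = random_orthogonal(D, rng)  # stand-in for the stego generator (Step 4)


def hide(secret, rng):
    h = F @ secret              # encode the secret into a latent code
    z = rng.standard_normal(D)  # noise that would produce a normal image
    z[HIDE_IDX] = h             # replace only the selected part (Step 3)
    return G @ z                # generate the toy "stego image"


def extract(stego):
    z = G.T @ stego             # invert the generator (orthogonal: inverse = transpose)
    return F.T @ z[HIDE_IDX]    # invert the encoder on the replaced part


# An "out-of-domain" secret: uniform pixel-like values.
secret = rng.uniform(0, 255, size=K)
recovered = extract(hide(secret, rng))
print(np.max(np.abs(recovered - secret)))  # on the order of floating-point round-off
```

The toy omits quantizing the stego image to 8-bit pixels, so it does not model the small extraction errors reported for DF-SWE. It also ignores stego realism: inserting raw values in the range 0–255 into Gaussian noise would make a real generator's output abnormal, which is what the latent replacement and distribution-matching tactics in the paper are meant to prevent.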
5.7. Ablation Experiment
Figure 10 presents an ablation analysis of the three tactics employed by DF-SWE: prior knowledge sampling, high-dimensional space replacement, and distribution consistency transformation. The first three, middle three, and last three columns show the effect of the different tactics on the LFW, CelebA, and LSUN datasets, respectively.
In the first row, which uses the direct replacement tactic, the generated stego images are abnormal, particularly in the first three columns. The key characteristic of direct replacement is that z is replaced directly with the latent code of the secret image, without using prior knowledge. The second row shows our proposed high-dimensional space replacement tactic, which uses the low-dimensional space of the secret latent code to replace the high-dimensional space of z. High-dimensional space replacement effectively generates realistic images, but it is inadequate for the abnormal images in the first three columns. Prior knowledge sampling is also our proposed method. In the third row, the latent is replaced with one that combines prior knowledge with a multivariate Gaussian latent variable z. The distribution consistency transformation is proposed to reduce the distortion caused by the difference between the two distributions. In the fourth row, z is replaced with the secret latent code after it has been modified by the distribution consistency transformation. In the last three columns, the generated stego images are more natural than those in the first row.
The fifth row combines our proposed prior knowledge sampling and distribution consistency transformation. Here, z is replaced with the prior-knowledge-sampled latent after it has been modified by the distribution consistency transformation. In the last three columns, the quality of the generated images improves significantly compared with the first row. In the sixth row, the first three columns indicate that high-dimensional space replacement and distribution consistency transformation alone cannot generate realistic images. A comparison of the second and seventh rows in the first three columns clearly shows that prior knowledge sampling effectively improves the quality of the generated stego images.
In summary, the ablation experiments verify the effectiveness of the proposed tactics in circulating the two latent flows while guaranteeing reversibility.
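The text does not specify how the distribution consistency transformation works, only that it reduces distortion caused by the mismatch between the secret latent and the generator's Gaussian noise. The sketch below uses one simple invertible choice, assumed purely for illustration: a per-dimension affine standardization whose statistics are estimated once and shared with the receiver. The helpers `fit_consistency`, `to_gaussian`, and `from_gaussian` and the synthetic latent statistics are hypothetical. The sketch shows the two properties such a transformation needs: the transformed values match N(0, 1) statistics, and the mapping can be undone exactly at extraction time.

```python
import numpy as np


def fit_consistency(latents):
    """Estimate per-dimension mean/std of secret latent codes (hypothetical helper)."""
    return latents.mean(axis=0), latents.std(axis=0) + 1e-8


def to_gaussian(h, mu, sigma):
    """Affine map so that h matches the N(0, 1) statistics of the noise z."""
    return (h - mu) / sigma


def from_gaussian(u, mu, sigma):
    """Exact inverse, applied after the latent is read back during extraction."""
    return u * sigma + mu


rng = np.random.default_rng(1)
# Synthetic secret latents whose statistics clearly differ from N(0, 1).
train_latents = rng.normal(loc=3.0, scale=0.2, size=(10_000, 64))
mu, sigma = fit_consistency(train_latents)

h = rng.normal(loc=3.0, scale=0.2, size=64)  # a new secret latent code
u = to_gaussian(h, mu, sigma)
print(h.mean(), h.std())  # mean near 3, std near 0.2: mismatched with z
print(u.mean(), u.std())  # mean near 0, std near 1: consistent with z
assert np.allclose(from_gaussian(u, mu, sigma), h)
```

A learned or nonlinear transformation could match the target distribution more closely. The affine version is shown only because its invertibility is easy to verify.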
6. Discussion and Future Work
In this paper, we propose a novel double-flow-based steganography without embedding (DF-SWE) method for hiding large images. Specifically, we propose the reversible circulation of double flow to build a reversible bijective transformation between secret images and generated stego images. The reversible circulation ensures a small extraction error for the secret images and high quality for the generated stego images. Importantly, DF-SWE is the first SWE method that enables hiding multiple large images in one stego image. Its payload capacity reaches 24–72 BPP, which is 8000–16,000 times that of other SWE methods. Because DF-SWE generates stego images directly, without a cover image, it greatly improves the security of the secret images. The experimental results show that DF-SWE outperforms the compared methods in hiding and recovery. Intriguingly, DF-SWE generalizes to hiding secret images from domains different from that of the training dataset. This property indicates that DF-SWE can be deployed in privacy-critical scenarios in which the secret images are hidden from the provider of DF-SWE. Despite its strong performance in secret image recovery, the method leaves room for improvement. For instance, it depends heavily on the Glow model and thus cannot perform steganography on high-resolution images (e.g., 256 × 256 or 512 × 512). In addition, it lacks robustness against Gaussian noise and JPEG lossy compression, and the extraction process is not completely lossless.
In future work, it will be important to further explore the potential of SWE for lossless secret image recovery and to investigate effective countermeasures against common perturbations such as noise attacks and JPEG lossy compression. Promising directions include adopting other reversible models similar to Glow and studying the internal mechanisms of the Glow model in depth.