Multisource Data Hiding in Digital Images

In this paper, we propose a new data-hiding framework: multisource data hiding, in which multiple senders (multiple sources) are able to transmit different secret data to a receiver via the same cover image symmetrically. We propose two multisource data-hiding schemes, i.e., separable and anonymous, according to different applications. In the separable scheme, the receiver can extract the secret data transmitted by all senders using the symmetrical data-hiding key. A sender cannot learn the content of secret data that it did not transmit itself (i.e., for which it is a non-source sender). In the anonymous scheme, it is unnecessary to extract all secret data on the receiver side. The content extracted by the receiver is a co-determined result of the secret data transmitted by all senders. Details of the secret data are unknown to the receiver and the non-source senders. In addition, the two proposed schemes achieve multisource data hiding without decreasing the undetectability of data hiding.


Introduction
The technology of data hiding has been well developed in recent years [1]; it allows secret data to be embedded into a given cover medium without causing serious distortion. Over the past decades, digital images have been widely transmitted over social networks, e.g., Twitter, Facebook (Meta), and WeChat, and have become the most popular media used for data hiding. Currently, data-hiding methods are designed with a single source: secret data are transmitted by one sender. In this paper, we aim to achieve data hiding with multiple sources: multiple senders transmit different secret data to a receiver using the same image.
Modern data hiding aims to minimize the distortion introduced into a given cover image by the modifications made during embedding. To achieve minimal distortion, a user-defined distortion function [2] is designed to assign an embedding cost to each cover element. The obtained embedding costs quantify the distortion caused by modification. After that, secret data are embedded into a given cover image with minimal theoretical distortion using a near-optimal steganographic code, e.g., STC (syndrome trellis coding) [3] or SPC (steganographic polar codes) [4]. A large number of distortion functions have been developed for digital images in the literature, e.g., HILL (high-pass, low-pass, and low-pass) [5], MiPOD (minimizing the power of optimal detector) [6], and UT-GAN (ternary embedding U-Net with generative adversarial networks) [7].
In the existing data-hiding framework, as shown in Figure 1a, a sender (single source) embeds secret data into a given cover medium. The obtained stego medium is then transmitted through public channels without drawing suspicion. In some situations, e.g., military intelligence collection, multiple spies (the senders) intend to transmit different intelligence (secret data) to their commander (the receiver). In addition, the medium containing the multiple secret data should be sent only once to guarantee satisfactory security and efficiency. In this situation, the intelligence should be embedded into the same given medium and then sent to the commander; therefore, multisource data hiding is desirable, as shown in Figure 1b.
In this paper, we propose two schemes to achieve multisource data hiding in separable and anonymous manners, respectively. In the separable scheme, the receiver can extract the secret data transmitted by all senders using the symmetrical data-hiding key. On the other hand, a sender cannot discover the content of secret data that it did not transmit itself (i.e., for which it is a non-source sender). In the anonymous scheme, the content extracted by the receiver is a co-determined result of all secret data instead of the details. Details of the secret data are unknown to the receiver and the non-source senders. To achieve the separable scheme, non-overlapping embedding locations are determined for the different senders. For the anonymous scheme, data embedding is executed in sequence, and data extraction is executed after the last embedding operation. More details are presented in Section 3.
The novelty and contributions of this paper are summarized as follows: (1) We propose a new concept called multisource data hiding, which is a new form of data hiding; it is an extension of existing data hiding rather than an application of it. (2) We propose two schemes that achieve multisource data hiding for different scenarios by improving the data-hiding coding; both are enriched versions of the existing data-hiding framework. (3) The two proposed schemes achieve the new function (multisource data hiding) with the same rate-distortion performance; experiments verify that our schemes do not decrease the undetectability of existing data hiding.

Related Work
In this section, we introduce some related work, including modern data hiding and steganalytic methods for digital images.

Modern Data Hiding
Modern data hiding aims to minimize the distortion introduced into a given cover image by the modifications made during the embedding process. The embedding distortion can be measured by a user-defined distortion function, e.g., HUGO (highly undetectable stego) [8], WOW (wavelet-obtained weights) [9], S-UNIWARD (spatial universal wavelet relative distortion) [10], HILL [5], MiPOD [6], and UT-GAN [7]. The distortion function is used to assign an embedding cost to each cover element. With assigned embedding costs $\rho = [\rho(1), \rho(2), \ldots, \rho(k)]^T$, secret data can be embedded into a given cover sequence $c = [c(1), c(2), \ldots, c(k)]^T \in \{0, 1\}^k$. Then, the theoretical minimal embedding distortion for a capacity of $L$ bits is
$$D_{\min} = \sum_{i=1}^{k} \pi(i)\,\rho(i), \tag{1}$$
where
$$\pi(i) = \frac{e^{-\lambda \rho(i)}}{1 + e^{-\lambda \rho(i)}} \tag{2}$$
is the probability of modifying $c(i)$, and $\lambda > 0$ is chosen so that the information entropy of the modification probabilities equals the capacity $L$, $L < k$, as shown in Equation (3):
$$L = -\sum_{i=1}^{k} \Big[ \pi(i) \log_2 \pi(i) + \big(1 - \pi(i)\big) \log_2 \big(1 - \pi(i)\big) \Big]. \tag{3}$$
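As a rough numerical illustration of the payload-limited formulation above, the following Python sketch finds λ by bisection so that the total binary entropy of the modification probabilities equals a target capacity L, and then evaluates the expected distortion. It assumes the standard binary form π(i) = exp(−λρ(i)) / (1 + exp(−λρ(i))) and strictly positive costs; the function names and the toy costs are illustrative and are not taken from the paper.

```python
import math


def modification_probabilities(costs, lam):
    """pi(i) = exp(-lam * rho(i)) / (1 + exp(-lam * rho(i))) for each cost rho(i)."""
    return [math.exp(-lam * r) / (1.0 + math.exp(-lam * r)) for r in costs]


def binary_entropy(p):
    """Binary entropy in bits; defined as 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def total_entropy(costs, lam):
    """Sum of binary entropies of all modification probabilities."""
    return sum(binary_entropy(p) for p in modification_probabilities(costs, lam))


def solve_lambda(costs, capacity_bits, iters=100):
    """Bisection on lam; for positive costs the entropy decreases as lam grows."""
    lo, hi = 0.0, 1.0
    while total_entropy(costs, hi) > capacity_bits:
        hi *= 2.0  # widen the bracket until the entropy drops below the target
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if total_entropy(costs, mid) > capacity_bits:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def minimal_distortion(costs, capacity_bits):
    """Return (lam, expected distortion) for the given costs and capacity."""
    lam = solve_lambda(costs, capacity_bits)
    probs = modification_probabilities(costs, lam)
    return lam, sum(p * r for p, r in zip(probs, costs))


# Toy example: k = 4 cover elements, capacity L = 2 bits (L < k).
toy_costs = [1.0, 2.0, 3.0, 4.0]
toy_lam, toy_d = minimal_distortion(toy_costs, 2.0)
print(f"lambda = {toy_lam:.4f}, expected distortion = {toy_d:.4f}")
```

Bisection is used here only because the entropy is monotone in λ for positive costs; any one-dimensional root finder would serve equally well.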
Within the distortion-minimization framework, improvements are achieved through the embedding costs $\rho$. At present, the embedding costs can be obtained by designing the distortion function, e.g., HILL and MiPOD, as introduced above, or by improving a designed distortion function, e.g., with the direction aggregation strategy [11], block artifact compensation [12], reference construction [13], and adversarial embedding [14]. With the given embedding costs, minimal distortion can be approximated by practical embedding with STC coding, in which secret bits $m = [m(1), m(2), \ldots, m(L)]^T \in \{0, 1\}^L$ are embedded into $c$ by modifying cover elements to fit Equation (4):
$$m = H s, \tag{4}$$
where $s$ is the corresponding stego sequence after embedding, the arithmetic is carried out modulo 2, and $H \in \{0, 1\}^{L \times k}$ is a low-density parity-check matrix depending on the embedding efficiency and payload. It is clear that $m$ can be directly extracted by the matrix computation in Equation (4). Meanwhile, the distortion $D$ caused by modification can be minimized with the distortion function.
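To make the syndrome relation concrete, the following Python sketch extracts a message as m = Hs (mod 2) and embeds by exhaustive search over all stego sequences, keeping the one with the lowest total modification cost. The exhaustive search is a stand-in that is only feasible for tiny k; it is not the trellis-based search that STC uses in practice. The matrix, cover, costs, and message are made-up toy values.

```python
from itertools import product


def syndrome(H, s):
    """Extraction: m = H s (mod 2), with H given as a list of rows."""
    return [sum(h * x for h, x in zip(row, s)) % 2 for row in H]


def embed_bruteforce(H, cover, costs, message):
    """Find the stego sequence with H s = m that minimizes the summed cost
    of the modified positions. Exponential in len(cover): illustration only."""
    best, best_cost = None, float("inf")
    for s in product((0, 1), repeat=len(cover)):
        if syndrome(H, s) != list(message):
            continue
        cost = sum(r for r, a, b in zip(costs, s, cover) if a != b)
        if cost < best_cost:
            best, best_cost = list(s), cost
    return best, best_cost


# Toy instance: L = 3 message bits, k = 6 cover bits.
H = [
    [1, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0, 1],
]
cover = [0, 1, 1, 0, 1, 0]
costs = [0.5, 2.0, 1.0, 3.0, 0.2, 0.8]
message = [1, 0, 1]

stego, cost = embed_bruteforce(H, cover, costs, message)
print(stego, cost)          # [0, 1, 1, 0, 0, 0] 0.2
print(syndrome(H, stego))   # [1, 0, 1]
```

In this toy instance only the cheapest element (cost 0.2) needs to be flipped, which shows how the costs steer modifications toward elements where changes are least harmful.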
In Section 3, we use the STC-based framework to achieve two multisource data-hiding schemes. This work is the symmetric counterpart of previous work on multichannel data hiding [15]. In this work, there are multiple senders and one receiver (multiple senders transmit different secret data to a receiver via the same cover image), whereas in [15] there is one sender and multiple receivers (a sender transmits different secret data to multiple receivers via the same cover image). The two frameworks are achieved using different strategies and are intended for different scenarios.

Steganalysis for Digital Images
As the adversarial technique of data hiding, modern steganalysis for digital images can be classified into two categories: handcrafted feature-based steganalysis and deep-learning-based steganalysis.
In handcrafted feature-based steganalysis [16], a large number of methods have been proposed to extract features of digital images [17][18][19][20]. With the steganalytic feature sets, the ensemble classifier [21] is popularly used to evaluate the feature property. The ensemble classifier consists of many FLD (Fisher linear discriminant)-based sub-learners with low complexity. The aggregation of decisions made by all FLD learners is used as the final decision of the ensemble classifier.
In deep-learning-based steganalysis, the phases of feature extraction and image classification are combined in a CNN (convolutional neural network) [22][23][24]. In [22], the basic high-pass filters defined in SRM (spatial rich model) are employed to initialize the weights in the first layer of the CNN. Moreover, a linear unit with a truncation threshold is used as the activation function of the steganalytic network. In [23], the popular residual network is first applied to steganalysis, which minimizes the use of external elements enforced by heuristics. The method in [24] makes full use of the embedding probability of data hiding. In Section 4, some of the above steganalytic tools are employed to examine the undetectability of data hiding.

Proposed Data-Hiding Schemes
In this paper, we focus on multisource data hiding for the applications of multiple senders. Based on the STC framework, two schemes are designed to achieve multisource data hiding in separable and anonymous manners, respectively, without decreasing the undetectability of data hiding.

Separable Multisource Data Hiding
In situations such as military intelligence collection, multiple spies (the senders) intend to transmit different intelligence (secret data) to their commander (the receiver). The image containing multiple secret data should be sent only once to guarantee satisfactory security and efficiency. To this end, the multiple pieces of intelligence should be embedded into the same cover image and then sent to the commander; therefore, separable multisource data hiding is desirable.
The procedure of the proposed separable multisource data-hiding scheme is shown in Figure 2. For $n$ senders, the $i$-th sender can transmit the $i$-th secret data $m_i$ to the receiver using the $i$-th data-hiding key $K_i$, $i \in \{1, 2, \ldots, n\}$, but cannot learn the content of the other parts of the secret data without the symmetrical (correct) data-hiding keys. The LSBs (least significant bits) of the pixels in the cover image are employed as the cover sequence $c = [c(1), c(2), \ldots, c(k)]^T \in \{0, 1\}^k$, which is divided into $n$ subsequences $c_i = [c_i(1), c_i(2), \ldots, c_i(\omega)]^T$, $\omega = \lfloor k/n \rfloor$. To guarantee that the locations of the elements in the $n$ subsequences are non-overlapping, $c_i(j) = c(\omega \times (i - 1) + j)$, $j \in \{1, 2, \ldots, \omega\}$. Then, the $n$ subsequences $\{c_1, c_2, \ldots, c_n\}$ are correspondingly assigned to the $n$ senders for data embedding. Before embedding, $\alpha$ elements ($1 \le \alpha < \omega$) in each subsequence $c_i$ are removed to obtain $\hat{c}_i$ using the corresponding data-hiding key $K_i$. That means the $i$-th sender embeds the $i$-th secret bits $m_i = [m_i(1), m_i(2), \ldots, m_i(L_i)]^T$ into $\hat{c}_i = [\hat{c}_i(1), \hat{c}_i(2), \ldots, \hat{c}_i(\omega - \alpha)]^T$. Since $\alpha$ of the $\omega$ elements are removed, there are $\binom{\omega}{\alpha}$ possibilities for $\hat{c}_i$, where $\binom{\omega}{\alpha}$ is the number of combinations. Satisfactory security of the secret data can be achieved since the number of possibilities is huge. For example, there are $2.74 \times 10^{33}$ possible $\hat{c}_i$ when $\alpha = 10$ and $\omega = 10{,}000$, and the number of pixels in an image is much larger than 10,000 (there are 262,144 pixels in an image sized 512 × 512). As a result, the embedded secret bits cannot be extracted without the corresponding data-hiding key, since the subsequence $\hat{c}_i$ obtained after removal is unknown.
During embedding, the $i$-th sender modifies $\hat{c}_i$ to obtain $\hat{s}_i$, which satisfies
$$m_i = \hat{H} \hat{s}_i, \tag{5}$$
where $\hat{H} \in \{0, 1\}^{L_i \times (\omega - \alpha)}$. In this way, the $i$-th secret bits $m_i$ can be embedded into $\hat{c}_i$. On the receiver side, all the secret bits $\{m_1, m_2, \ldots, m_n\}$ can be extracted using Equation (5) with the corresponding data-hiding keys.
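The partitioning and key-dependent removal described above can be sketched in Python as follows. How the key K i selects the α removed positions is not specified in the text, so seeding a pseudorandom generator with the key is purely an assumption made for illustration; the function names are hypothetical as well. The last lines check the quoted count of possible reduced subsequences for α = 10 and ω = 10,000.

```python
import math
import random


def split_cover(cover, n):
    """Split the cover into n non-overlapping subsequences of length
    omega = floor(k / n); subsequence i holds c(omega*(i-1)+1 .. omega*i)."""
    omega = len(cover) // n
    return [cover[omega * i: omega * (i + 1)] for i in range(n)]


def kept_positions(omega, alpha, key):
    """Indices that remain after removing alpha key-selected elements.
    Assumption: the key seeds a PRNG that picks the removed positions."""
    removed = set(random.Random(key).sample(range(omega), alpha))
    return [j for j in range(omega) if j not in removed]


def extract(H_hat, stego_sub, positions):
    """Receiver side: rebuild the reduced subsequence with the key-derived
    positions, then compute m_i = H_hat * s_hat_i (mod 2)."""
    reduced = [stego_sub[j] for j in positions]
    return [sum(h * x for h, x in zip(row, reduced)) % 2 for row in H_hat]


# Toy run: k = 12 cover bits shared by n = 3 senders (omega = 4, alpha = 1).
cover = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
subsequences = split_cover(cover, 3)
positions_sender_1 = kept_positions(omega=4, alpha=1, key=12345)
print(subsequences)
print(positions_sender_1)

# Number of ways to choose which alpha elements are removed.
print(f"{math.comb(10000, 10):.3e}")
```

A party without the key would have to guess which α positions were removed before the syndrome computation yields the correct bits, which is where the combinatorial count comes in.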

Anonymous Multisource Data Hiding
In some tasks, such as anonymous voting, only the final result is needed rather than the ballot content of each voter. In this case, the receiver does not need the content of each secret data transmitted by the individual senders. To this end, we propose an anonymous multisource data-hiding scheme, in which the data extracted by the receiver are a co-determined result of all secret data instead of the details. The procedure of the proposed scheme is shown in Figure 3, in which data embedding is executed in sequence, and data extraction is executed after the last embedding operation. The $n$ senders sequentially embed secret bits $\{m_1, m_2, \ldots, m_n\}$ into a given cover image using the same data-hiding key $K$. On the receiver side, a co-determined result $f(m_1, m_2, \ldots, m_n)$ of $\{m_1, m_2, \ldots, m_n\}$ is extracted, while the details of each $m_i$ are kept unknown. This is achieved by the data-hiding key $K_0$, which is held only by the receiver and the first sender symmetrically. Details are as follows.
In anonymous multisource data hiding, the LSBs (least significant bits) of the pixels in the cover image are also employed as the cover sequence $c = [c(1), c(2), \ldots, c(k)]^T \in \{0, 1\}^k$. In contrast to separable multisource data hiding, which divides $c$ into non-overlapping parts, in the anonymous scheme all $n$ senders employ the whole $c$ for embedding. In other words, the $i$-th sender embeds the $i$-th secret bits $m_i = [m_i(1), m_i(2), \ldots, m_i(L)]^T$.
For the task of anonymous voting, the final result (the data extracted by the receiver) is the summation of all ballots (the data embedded by the senders). That means […], where "mod(·)" stands for the modulo operator, "$\lfloor \cdot \rfloor$" denotes rounding down, and […]. Since the extracted data on the receiver side is $m_n$, that is, $\{m_n(1), m_n(2), \ldots, m_n(L)\}$, the value of $\gamma(n)$ can be obtained by […]. In addition, it can be deduced from Equations (7) and (8) that […], that is, […]. The values of $\{m_0(1), m_0(2), \ldots, m_0(L)\}$ are determined by the data-hiding key $K_0$, which is held by the receiver; therefore, the final result $f(m_1, m_2, \ldots, m_n)$ can be obtained by the receiver using […]. Thus, the co-determined result (the summation of all $\phi(m_i)$) can be obtained on the receiver side, while the value of each $\phi(m_i)$ is not revealed.
Since the number of secret bits is the same for all senders, we employ the repeatable data-hiding strategy [25] for embedding, which is also based on the STC framework. Using the repeatable strategy, the distortion caused by data hiding remains unchanged no matter how many times the embedding operation is executed. In this way, the undetectability of data hiding can be maintained during the $n$ rounds of embedding.
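The anonymous-voting idea can be illustrated with a masked running sum. The Python sketch below is only one plausible reading of the description above and should not be taken as the scheme's exact equations: it assumes that the key K 0 yields an L-bit mask, that each sender adds an integer ballot value to the number carried by the currently embedded bits, and that the receiver subtracts the mask at the end. The embedding and extraction steps themselves are abstracted away, and all function names are hypothetical.

```python
import random


def to_bits(value, L):
    """Bit j (j = 1..L) is mod(floor(value / 2^(j-1)), 2), least significant first."""
    return [(value // (2 ** j)) % 2 for j in range(L)]


def from_bits(bits):
    """Inverse of to_bits: sum of bit * 2^(j-1)."""
    return sum(b << j for j, b in enumerate(bits))


def mask_from_key(key0, L):
    """Assumption: the key K_0 seeds a PRNG that yields an L-bit mask."""
    return random.Random(key0).getrandbits(L)


def tally(ballots, key0, L):
    """Simulate sequential embedding and the final extraction by the receiver."""
    modulus = 2 ** L
    carried = to_bits(mask_from_key(key0, L), L)      # initial bits derived from K_0
    for ballot in ballots:                            # senders act one after another
        running = (from_bits(carried) + ballot) % modulus
        carried = to_bits(running, L)                 # bits the sender re-embeds
    gamma_n = from_bits(carried)                      # value carried by the last bits
    return (gamma_n - mask_from_key(key0, L)) % modulus


# Five voters, one-bit ballots, L = 8 bits (valid while the total stays below 2^L).
print(tally([1, 0, 1, 1, 0], key0=2024, L=8))  # 3
```

Because each intermediate value is offset by a mask known only to the holders of K 0, a sender who reads the bits left by its predecessors sees a masked partial sum rather than any individual ballot.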

Experimental Results
To verify the feasibility and effectiveness of our schemes, we conducted a number of experiments in this section. We first describe the experimental conditions and environments. After that, we provide the results and discussions about undetectability checked by some modern steganalytic tools. Finally, we discuss the computational complexity of our scheme.

Experiment Setup
In our experiments, the popular image dataset UCID [26] was employed, which contains 1338 color images sized 512 × 384. All the images in UCID were used as cover images. The popular data-hiding methods HILL [5] and MiPOD [6] were employed for data embedding.
To examine the undetectability of steganographic schemes, the steganalytic methods maxSRMd2 (selection-channel-aware rich model) [18] and SCRMQ1 (spatial color rich model) [20] were employed. One-half of the image features were employed for training, while the remaining half were employed for testing. The criterion to measure the undetectability of the data hiding was the minimal total error $P_E$ obtained from the testing sets [21], as shown in Equation (13):
$$P_E = \min_{P_{FA}} \frac{1}{2} \left( P_{FA} + P_{MD} \right), \tag{13}$$
where $P_{FA}$ is the false alarm rate and $P_{MD}$ is the missed detection rate. A higher $P_E$ stands for higher undetectability. All of the results are the average value of $P_E$ over 10 random tests.
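The minimal total error can be computed from detector outputs by sweeping a decision threshold, as in the following Python sketch. It assumes a detector that assigns each test image a real-valued score, with higher scores meaning "more likely stego"; the scores and the function name are invented for illustration and do not come from the experiments.

```python
def minimal_total_error(cover_scores, stego_scores):
    """P_E = min over thresholds of (P_FA + P_MD) / 2.

    An image is flagged as stego when its score is >= the threshold.
    P_FA: fraction of cover images flagged; P_MD: fraction of stego images missed.
    """
    thresholds = sorted(set(cover_scores) | set(stego_scores))
    thresholds.append(float("inf"))  # threshold that flags nothing
    best = 1.0
    for t in thresholds:
        p_fa = sum(s >= t for s in cover_scores) / len(cover_scores)
        p_md = sum(s < t for s in stego_scores) / len(stego_scores)
        best = min(best, (p_fa + p_md) / 2)
    return best


# Perfectly separable scores give P_E = 0; indistinguishable scores give P_E = 0.5.
print(minimal_total_error([0.1, 0.2, 0.3, 0.4], [0.6, 0.7, 0.8, 0.9]))  # 0.0
print(minimal_total_error([0.5, 0.5], [0.5, 0.5]))                      # 0.5
print(minimal_total_error([0.1, 0.4, 0.6, 0.2], [0.3, 0.7, 0.8, 0.9]))  # 0.125
```

A value near 0.5 therefore means the detector does no better than guessing, which is why a higher value is read as better undetectability.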

Undetectability
For different numbers of senders, the undetectability (values of $P_E$) of the proposed separable multisource data-hiding scheme is shown in Figure 4, where the horizontal axis represents the embedding payload (bits per pixel) and secret data are embedded using the baseline embedding algorithm HILL. The results indicate that the values of $P_E$ are close to each other for different numbers of senders; the slight fluctuations are caused by the randomness of testing. This means that the undetectability performance is independent of the number of senders. The reason is that undetectability is determined by the embedding algorithm and the payload, which is always $L/k$ bits per pixel regardless of the number of senders; therefore, increasing the number of senders does not change the undetectability of the data hiding.
Since our scheme is based on baseline embedding algorithms, e.g., HILL and MiPOD, it is necessary to verify that our scheme does not decrease the undetectability of the existing data-hiding methods. The undetectability comparisons between our scheme and the baseline embedding algorithms with 5 senders ($n = 5$) are shown in Figures 5 and 6, where "HILL-Separable" and "MiPOD-Separable" stand for our scheme with secret data embedded using HILL and MiPOD, respectively. It can be observed that our scheme, which achieves multisource data hiding, does not decrease the undetectability of the existing data-hiding methods. This is reasonable since the undetectability is determined by the embedding matrices, which are obtained using an existing data-hiding framework and kept unchanged in our schemes. This verifies that our scheme achieves multisource data hiding without decreasing the undetectability of the data hiding.
To further verify the effectiveness of our scheme, we considered cases with a large number of senders. With the embedding algorithm HILL and a payload of 0.5 bpp, the undetectability of our scheme for 100, 200, 300, 400, and 500 senders ($n = 100, 200, \ldots, 500$) is shown in Figure 7. The results indicate that the $P_E$ values for large numbers of senders are comparable with the case of one sender. That means increasing the number of senders does not degrade undetectability. This is reasonable since the undetectability of data hiding is determined by the embedding algorithm and the payload. In our scheme, these two factors are unchanged; therefore, our scheme achieves the multisource function without decreasing the undetectability of the existing data hiding.
For the proposed anonymous multisource data-hiding scheme, the repeatable data-hiding strategy was employed for embedding. With the repeatable strategy, the invariance of undetectability under repeated embedding has been theoretically proved in [25]. Thus, we do not demonstrate the undetectability performance of the anonymous scheme experimentally.

Computational Complexity
For data hiding, computational complexity is also an important indicator. We therefore conducted experiments to compare the computational complexity of our scheme with that of the baseline embedding algorithms. All 1338 images in UCID were employed for data embedding, and the average embedding time (s) per image is shown in Figure 8. As before, "HILL-Separable" and "MiPOD-Separable" stand for our scheme with secret data embedded using HILL and MiPOD, respectively. The results were obtained on a server with a 3.7 GHz CPU, 16 GB of memory, and 64-bit Windows 7, running MATLAB R2017b. It can be observed from Figure 8 that the computational complexity of our scheme is comparable to or lower than that of the baseline embedding algorithms. That means our scheme achieves the multisource function within the modern data-hiding framework without increasing the computational complexity. In addition, it can be noticed that the embedding time does not increase with a larger payload. This is because the computational complexity of a modern data-hiding framework is mainly determined by the distortion function. With the obtained embedding costs, secret data can be embedded quickly via near-optimal steganographic coding.

Conclusions
A new field of data hiding called multisource data hiding is explored in this paper. In multisource data hiding, multiple senders are able to transmit different secret data to a receiver via the same cover image. Two schemes are proposed to achieve multisource data hiding in separable and anonymous manners, respectively. In the separable scheme, the receiver can extract the secret data transmitted by all senders using the corresponding data-hiding key. In the anonymous scheme, the receiver aims to extract a co-determined result of the secret data transmitted by all senders, instead of the details of all secret data. The proposed schemes are suitable for many scenarios, e.g., military intelligence collection or anonymous voting. Experimental results show that the two schemes achieve multisource data hiding without decreasing the undetectability of data hiding.