Compressive Sampling with Multiple Bits Spread Spectrum-Based Data Hiding

Abstract: We propose a novel method for hiding data in an audio host using a compressive sampling technique. An over-complete dictionary represents a group of watermark bits. Each row of the dictionary is a Hadamard sequence representing multiple bits of the watermark. The singular values of the segment-based host audio, arranged in a diagonal matrix, are multiplied by the over-complete dictionary, producing a smaller matrix. In this way, the watermark is embedded into the audio as it is compressed. At the detector, we detect the watermark and reconstruct the audio. The proposed method therefore not only hides information but also compresses the audio host. Its applications include broadcast monitoring and biomedical signal recording: we can mark and secure the signal content by hiding the watermark inside the signal while compressing the signal for memory efficiency. We evaluate the performance in terms of payload, compression ratio, audio quality, and watermark quality. The proposed method can hide data imperceptibly at 729-5292 bps with a compression ratio of 1.47-4.84 and a perfectly detected watermark.


Introduction
At present, the exchange of data and information over the internet is increasing dramatically. As more people access the internet and more content becomes available, the volume of data accessed in a given time grows exponentially. With this growth, data-related crime also increases, including data falsification, data theft, unilateral claims of data ownership, data leaks, data fraud, and many other crimes related to internet data access. These problems cause growing losses for data owners, which in turn affect the state and, ultimately, its people. Crime on the internet thus benefits only certain parties and causes great loss to the wider community. Technology that secures data, including marking ownership rights and hiding important data sent over the internet, therefore becomes mandatory to prevent such losses.
As more data content is accessed, more memory capacity is needed. If the network infrastructure does not grow, the available network capacity also decreases because of the increased data traffic, and the power requirements of the network infrastructure increase. These conditions raise the problem of how to access data efficiently so that infrastructure and energy use are kept to a minimum. One technique that can address these problems is Compressive Sampling, or Compressed Sensing (CS). This technique acquires only part of the data or signal from the sensor and sends those samples; the receiver can then reconstruct a close approximation of the original data.
In this paper, we propose a technique that samples an audio signal and hides data in it at the same time, so that the sampled signal is smaller and also carries data embedded in the encoded output. With this technique, a signal recorded by sampling and stored in a cloud system is smaller in size and can be marked with hidden data in the same step. Broadcast monitoring is one example: signals are monitored in real time and the results are stored in the cloud. Such monitoring is more efficient if partial signal sampling is applied, so that the stored signal is smaller than the original. At the same time, data can be hidden in the signal at any given duration, either to secure the authenticity of the monitored signal or to index it by hiding its index in the encoded signal. Another example is the recording of biomedical signals sampled with several sensors, where an ownership mark or index is embedded into the encoded signal during acquisition. The recorded biomedical signal is then smaller than the original without a reduction in quality, and it carries a mark that secures it.
Combining CS for audio with data hiding is a rarely studied topic. The combination makes it possible to compress the audio and, at the same time, hide the watermark. Hua in [1] and Xin in [2] previously proposed applications of CS to audio combined with data hiding. In [2], Xin proposed a semi-fragile zero-watermarking method that first decomposes the audio in the wavelet domain and then applies the CS technique to the audio wavelet coefficients. The watermark is inserted in the measurement vector using the positive and negative signs of the matrix elements. As a result, the inserted watermark resists damage to samples of the signal. However, [2] neither explains the role of CS in reducing the signal size nor describes the audio reconstruction needed to determine the audio quality after embedding; Xin presented CS only as a data insertion technique with semi-fragile properties.
Griffin, in [3], proposed a CS method to compress sinusoidal signals. Griffin investigated whether CS can be used to compress sinusoidal audio at a low bit rate, because such audio models have a high degree of sparsity in the frequency domain. In the proposed method, Griffin applied CS to single-channel and multi-channel audio signals with sinusoidal characteristics only. Griffin stated that the aim of the research was not to develop an audio compression technique and compare it with existing ones, but to find out how far CS can be applied to reduce the size of audio files, with wireless sensor networks as the target application. The smallest compression ratio Griffin achieved is 5.4%. Spectral whitening was applied to the audio first, and CS was then applied to the whitened spectrum, which produced a very small compression ratio with good reconstruction quality.
Fakhr in [4] proposed an insertion method using CS techniques that first sparsifies the host audio and watermark signals using the Walsh-Hadamard Transform (WHT), the Discrete Cosine Transform (DCT), and the Karhunen-Loeve Transform (KLT). The watermark and the host audio are recovered by $\ell_1$-minimization reconstruction. Fakhr claimed that the technique withstands MP3 attacks down to a rate of 64 kbps with an 11 bps watermark payload, and reaches its highest payload of 172 bps against additive noise attacks. However, Fakhr uses CS as an insertion technique rather than for compression, and treats MP3 compression as an attack that reduces the size of the audio signal after the watermark is embedded.
In [1], Hua proposed a data hiding technique combined synthetically with CS. Suppose we define an over-complete dictionary $A \in \mathbb{R}^{p \times r}$, an uncompressed vector $z \in \mathbb{R}^{r \times 1}$, a watermark bit to be inserted $b \in \{-1, +1\}$, a watermark code sequence $w \in \mathbb{R}^{r \times 1}$, a compressed vector $y \in \mathbb{R}^{p \times 1}$, and $\alpha$ as the gain control of the watermark; then we have

$$y = A(z + \alpha b w).$$

Hua inserted $b$ additively into $z$ after multiplying it by $\alpha w$. In this paper, by contrast, we embed the watermark bits into the over-complete matrix $A$ and then multiply $A$ by the diagonal matrix of singular values of the host audio.
The data hiding technique proposed in this paper is based on Spread Spectrum (SS) with multiple orthogonal codes, as introduced by Xin in [5] for time-domain embedding and continued by Xiang for DCT-domain embedding in [6] and [7]. We use Hadamard codes as the sequences for multiple watermark bits because of their best code performance [8]. The matrix $A$ consists of $p$ Hadamard sequences that represent $p$ groups of multiple bits.
One way to make a signal sparse is a shrinkage technique applied to the output of Singular Value Decomposition (SVD). This technique truncates $U$, $S$, and $V$ at a specific rank, as also described in [9], [10], and [11]. The shrinkage yields a more compressed signal as the CS output but decreases the quality of the reconstructed signal. In this paper, we decompose a host signal using SVD and truncate the outputs $U$, $S$, and $V$ at a specific rank. We transform the truncated singular value matrix $S_r$ to the compressed domain $Y$ via an over-complete dictionary $A$ containing the SS-based hidden data. The matrices transmitted to the detector are thus $U_r$, $Y$, and $V_r$. At the receiver, we first detect the dictionary $A$ containing the hidden data and extract the hidden data from it. We can then also reconstruct the signal in the original domain. Note that the receiver needs only the compressed-domain signal, i.e., $U_r$, $Y$, and $V_r$; neither the dictionary nor the original data is needed for data detection and signal reconstruction.
We organize the rest of this paper as follows. Section II describes the sparsity of the singular values and the CS technique for audio compression. Section III explains the mathematical model and derivation of the audio watermarking, including the embedding, the extraction, the audio reconstruction process, and the effect of a noisy environment on the proposed method. Section IV discusses the simulation results, while Section V concludes the paper.

Sparse Singular Value and CS technique
The host signal, in the form of a vector $x = [x_1, x_2, \cdots, x_L] \in \mathbb{R}^{1 \times L}$, is converted to a 2-dimensional matrix $X \in \mathbb{R}^{M \times M}$, where $L = M^2$. The conversion arranges the $L$ samples of $x$ into the entries of $X$. The SVD of $X$ yields orthogonal matrices $U \in \mathbb{R}^{M \times M}$ and $V \in \mathbb{R}^{M \times M}$ and a diagonal matrix $S \in \mathbb{R}^{M \times M}$, related by

$$X = U S V^T,$$

where $S$ is a sparse diagonal matrix whose $M$ non-zero diagonal elements are the $M$ singular values. For compression, $U$, $S$, and $V$ can be truncated to $U_r = U[1, \ldots, M; 1, \ldots, r] \in \mathbb{R}^{M \times r}$, $S_r = S[1, \ldots, r; 1, \ldots, r] \in \mathbb{R}^{r \times r}$, and $V_r = V[1, \ldots, M; 1, \ldots, r] \in \mathbb{R}^{M \times r}$ with $r < M$. Then, we apply CS acquisition to $S_r$ as

$$Y = A S_r, \tag{4}$$

where $A \in \mathbb{R}^{p \times r}$ is an over-complete dictionary containing the SS-based encoded watermark and $Y \in \mathbb{R}^{p \times r}$ is the output of CS acquisition, which is smaller than $S$. The truncated matrix $S_r$ has the form

$$S_r = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_r \end{bmatrix}, \tag{5}$$

where $\sigma_1, \sigma_2, \ldots, \sigma_r$ are the singular values. The matrix $A$ is described later in Subsection 3.1. Finally, we have three matrices to transmit: $U_r$, $V_r$, and $Y$. From this result, we can calculate the Compression Ratio (CR) as the ratio between the original signal length and the transmitted signal length,

$$\mathrm{CR} = \frac{L_X}{L_T} = \frac{M^2}{2Mr + pr}, \tag{6}$$

where $L_X = M^2$ is the number of elements of $X$ and $L_T = 2Mr + pr$ is the total number of transmitted elements in $U_r$, $Y$, and $V_r$.
We can calculate the reconstructed audio matrix, with the same size as $X$, in the form

$$X_r = U_r S_r V_r^T, \tag{7}$$

where $X_r \in \mathbb{R}^{M \times M}$ but its element values differ slightly from those of $X$. The value of $r$ controls both the signal quality and the compression ratio: a lower $r$ gives a higher compression ratio but a worse signal quality. Finally, we obtain $\hat{x} = [\hat{x}_1, \hat{x}_2, \cdots, \hat{x}_{M^2}]$ as the reconstructed (decompressed) version of the signal by converting the 2-dimensional matrix $X_r$ back to a one-dimensional signal, and we can calculate the signal quality by comparing $x$ and $\hat{x}$.
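As a quick numerical check of the compression ratio in (6), the following minimal Python sketch counts the transmitted elements of $U_r$, $V_r$, and $Y$ and divides the host length $M^2$ by that count. The function names are illustrative only and are not part of the original method.

```python
def transmitted_elements(M, r, p):
    """Count the values sent to the receiver: U_r (M x r), V_r (M x r), Y (p x r)."""
    return 2 * M * r + p * r


def compression_ratio(M, r, p):
    """CR as in (6): original length M^2 over transmitted length 2Mr + pr."""
    return (M * M) / transmitted_elements(M, r, p)


# Example: how CR falls as the rank r grows for a fixed M, with p = r.
for rank in (2, 4, 8, 16):
    print(rank, round(compression_ratio(32, rank, rank), 3))
```

A value below 1 means the transmitted matrices hold more elements than the original segment, so no compression is achieved for that choice of $M$, $r$, and $p$.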

An Overcomplete Dictionary with SS-based Content
In the proposed method, we first convert the audio host to the frequency domain using the DCT before applying insertion and compression. At the receiver, after reconstruction (decompression), the audio is converted back to the time domain with the Inverse DCT (IDCT). The DCT and IDCT formulations used for this method are [12]

$$X(k) = w(k) \sum_{n=1}^{N_p} x(n) \cos\left(\frac{\pi (2n-1)(k-1)}{2 N_p}\right), \quad k = 1, 2, \ldots, N_p,$$

$$x(n) = \sum_{k=1}^{N_p} w(k) X(k) \cos\left(\frac{\pi (2n-1)(k-1)}{2 N_p}\right), \quad n = 1, 2, \ldots, N_p,$$

where $X(k)$ is the audio signal in the DCT domain, $x(n)$ is the audio signal in the time domain, and $N_p$ is the number of DCT points. The weight $w(k)$ is defined as

$$w(k) = \begin{cases} \dfrac{1}{\sqrt{N_p}}, & k = 1, \\ \sqrt{\dfrac{2}{N_p}}, & 2 \le k \le N_p. \end{cases}$$

In this paper, the orthogonal codes mapped to multiple watermark bits are Hadamard sequences taken from the Hadamard matrix. Denote the Hadamard matrix $H_r \in \{-1, +1\}^{r \times r}$, generated as in [13,14] by

$$H_r = \begin{bmatrix} H_{r/2} & H_{r/2} \\ H_{r/2} & -H_{r/2} \end{bmatrix}, \tag{11}$$

where $H_1 = [1]$. Let $H_r(j)$ be the $j$-th row of $H_r$; then the orthogonal Hadamard sequences $p_j$, where $j = 1, 2, \ldots, r$, are obtained from

$$p_j = H_r(j). \tag{12}$$

Let $A_0 \in \{-1, +1\}^{p \times r}$ be an SS-based content matrix, where $p < r$, let $p_{t_i} \in \mathbb{R}^{1 \times r}$ be the Hadamard sequence associated with the watermark bits in the $i$-th row of $A_0$, and let $\{t_1, t_2, \ldots, t_p\}$ be the set of Hadamard sequence indices, where $i$ is the row index of $A_0$. Thus $A_0$ contains the sequences $p_{t_i}$ as

$$A_0 = [p_{t_1}; p_{t_2}; \cdots; p_{t_p}], \tag{13}$$

where the semicolons in (13) place each $p_{t_i}$ on a different row. Since $A_0$ has $p$ rows, it holds $p$ Hadamard sequences. Thus, we have an over-complete dictionary $A \in \mathbb{R}^{p \times r}$ whose columns have unit norm, $\|a_m\|_2^2 = 1$, where $m = 1, 2, \ldots, r$.

A Hadamard sequence represents multiple watermark bits. If each Hadamard sequence carries $N_s$ watermark bits, then there are $N_p$ different possible Hadamard sequences, where $N_p = 2^{N_s}$. Note that the length of a Hadamard sequence, and hence the length of each row of $A$, is $r$, so $r = N_p$ because the Hadamard matrix in (11) is square.
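The recursive Hadamard construction referred to in (11) can be sketched in a few lines of Python. This is a minimal illustration using plain lists; the helper names `hadamard` and `dot` are chosen here and do not come from the paper.

```python
def hadamard(r):
    """Build the r x r Hadamard matrix by the recursive doubling rule, starting from [1]."""
    if r < 1 or r & (r - 1):
        raise ValueError("r must be a power of two")
    H = [[1]]
    while len(H) < r:
        # Top half repeats each row twice; bottom half appends the negated row.
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


H8 = hadamard(8)
# Every pair of distinct rows is orthogonal; each row has squared norm r.
print(all(dot(H8[i], H8[j]) == (8 if i == j else 0)
          for i in range(8) for j in range(8)))
```

The mutual orthogonality of the rows is the property that the correlation detector relies on later in the text.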
Denote $w_{t_i}$ as the watermark vector in the $i$-th segment of the watermark, with vector index (Hadamard index) $t_i$; then

$$w_{t_i} = [w_{t_i}(1), w_{t_i}(2), \ldots, w_{t_i}(N_s)],$$

where $w_{t_i}(l) \in \{-1, +1\}$ and $l = 1, 2, \ldots, N_s$. In multi-bit SS, the watermark vector $w_{t_i}$ is mapped to a Hadamard sequence $p_{t_i}$. For example, with $N_s = 3$ watermark bits per Hadamard sequence, $N_p = 2^{N_s} = 8$, and all watermark possibilities and their mapping to Hadamard sequences are displayed in Table 1.

Table 1. Watermarks and Hadamard sequences: example for $N_s = 3$, $N_p = 8$, and $r = 8$.

If we have 2 watermark segments (vectors) for which Table 1 gives $t_1 = 3$ and $t_2 = 5$, then

$$A_0 = [p_3; p_5].$$

The over-complete matrix $A_0$ carries $pN_s$ watermark bits for a host of length $M^2$, so we can compute the watermark payload $C$ in bps as

$$C = \frac{p N_s}{M^2} F_s, \tag{16}$$

where $F_s$ is the host signal sampling rate in samples/s. Since $N_s = \log_2 N_p = \log_2 r$, (16) becomes

$$C = \frac{p \log_2 r}{M^2} F_s. \tag{17}$$

Once $A$ is generated from the associated watermark bits, it is embedded into $S_r$ by the matrix multiplication in (4). The result $Y$ is not only a matrix smaller than $S_r$ but also carries the watermark bits. The matrix $S_r$ is a diagonal matrix reduced in size from the original $S$. From (4), (5), and (13), the equation $Y = A S_r$ can be expanded as

$$\begin{bmatrix} y_{t_1} \\ y_{t_2} \\ \vdots \\ y_{t_p} \end{bmatrix} = \begin{bmatrix} p_{t_1} \\ p_{t_2} \\ \vdots \\ p_{t_p} \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_r \end{bmatrix}, \tag{18}$$

where $y_{t_i} \in \mathbb{R}^{1 \times r}$ is row $i$ of $Y$, which corresponds to $p_{t_i}$, and $\sigma_1, \sigma_2, \ldots, \sigma_r$ are the singular values in $S_r$. Each row of $A$, i.e., $p_{t_i}$, is a vector of size $1 \times r$, and $S_r$ is a diagonal matrix of size $r \times r$. Thus, (18) simplifies row by row to

$$\begin{bmatrix} y_{t_1} \\ y_{t_2} \\ \vdots \\ y_{t_p} \end{bmatrix} = \begin{bmatrix} p_{t_1} S_r \\ p_{t_2} S_r \\ \vdots \\ p_{t_p} S_r \end{bmatrix},$$

which gives the simple vector expression

$$y_{t_i} = p_{t_i} S_r. \tag{20}$$
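The embedding step and the payload formula can be illustrated with a short Python sketch. Because the contents of Table 1 are not reproduced here, the bit-to-index rule in `bits_to_index` is a hypothetical mapping (bits read as a binary number, $+1$ as 1 and $-1$ as 0); the paper's actual table may order the sequences differently. Rows are left unnormalized, as in $A_0$, and all function names are illustrative.

```python
import math


def hadamard(r):
    """r x r Hadamard matrix by recursive doubling (r must be a power of two)."""
    H = [[1]]
    while len(H) < r:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def bits_to_index(bits):
    """Hypothetical mapping: read +1/-1 bits as a binary number, return a 1-based index."""
    value = 0
    for b in bits:
        value = 2 * value + (1 if b > 0 else 0)
    return value + 1


def embed(segments, sigma):
    """Return the rows y_t = p_t * diag(sigma), one row per watermark segment."""
    r = len(sigma)
    H = hadamard(r)
    rows = []
    for bits in segments:
        if 2 ** len(bits) != r:
            raise ValueError("each segment needs log2(r) bits")
        p_t = H[bits_to_index(bits) - 1]
        # Multiplying by a diagonal matrix scales element m of the row by sigma_m.
        rows.append([p * s for p, s in zip(p_t, sigma)])
    return rows


def payload_bps(M, r, p, fs=44100):
    """Payload C = p * log2(r) * Fs / M^2, in bits per second."""
    return p * math.log2(r) * fs / (M * M)


print(embed([[1, -1, 1]], [8, 7, 6, 5, 4, 3, 2, 1]))
print(round(payload_bps(11, 2, 2), 1))
```

Because $S_r$ is diagonal, the acquisition never needs a full matrix product: each row of $Y$ is the chosen Hadamard sequence with its signs applied to the singular values.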

Data and Dictionary Detection
Once we obtain the compressed and watermarked signal $Y$ (with rows $y_{t_i}$), it is transmitted to the receiver, where we obtain the received signal $\hat{Y}$ (with rows $\hat{y}_{t_i}$). The matrices $U_r$ and $V_r$ are received along with $\hat{y}_{t_i}$, as described in Section 2. One can choose either to decompress the signal or to extract the watermark. To decompress the signal, we need $A$, i.e., the sequences $p_{t_i}$ detected using (22), in order to reconstruct $S_r$ from $\hat{y}_{t_i}$. Clearly, whether we extract the watermark or decompress the signal, extracting $A$ from $\hat{y}_{t_i}$ is the first step at the receiver, since the compression and data hiding process is blind. Once we obtain $A$, we can extract the data, or we can reconstruct $S_r$ from $\hat{y}_{t_i}$ with the detected $A$ using (37), (39), (40), and (41). We then apply SVD reconstruction to $S_r$, $U_r$, and $V_r$ to obtain the square matrix $X_r$ using (7). Finally, we get the reconstructed signal $\hat{x}$ by converting the 2-dimensional matrix $X_r$ to a vector.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 10 May 2020 doi:10.20944/preprints202005.0172.v1

Table 2. Embedding Process
Step 1: Read a host signal $x(n)$ and transform it into the frequency domain by an $L$-point DCT, obtaining $X(k)$.
Step 2: Reshape the $L$ samples of $X(k)$ into a 2-D square matrix $X$ of size $M \times M$.
Step 3: Decompose $X$ into $U$, $S$, and $V$ using SVD.
Step 4: Reduce the size of $U$, $S$, and $V$ with rank $r$ to $U_r$, $S_r$, and $V_r$.
Step 5: Generate the matrix $A$ containing $p$ Hadamard sequences by mapping each group of watermark bits to its associated random Hadamard sequence using (13).
Step 6: Apply CS acquisition to $A$ and $S_r$ by (4), producing $Y$.
Step 7: Transmit the compressed signal with hidden data, represented by $U_r$, $Y$, and $V_r$.

Figure 2. Watermark Detection and Audio Decoding

Table 3. Detection and Reconstruction Process
Step 1: Detect $t_i$ from $\hat{Y}$ using (22) to extract the hidden data.
Step 2: Associate the detected $t_i$ with $p_{t_i}$ and form $A$ using (13).
Step 3: Reconstruct $S_r$ from $\hat{Y}$ using $A$ by (37), (39), (40), and (41).
Step 4: Combine $U_r$, $S_r$, and $V_r$ by SVD reconstruction to obtain the decompressed signal as the 2-D matrix $X_r$ by (7).
Step 5: Reshape the 2-D matrix $X_r$ into a 1-D vector, obtaining $\hat{X}(k)$.
Step 6: Transform $\hat{X}(k)$ to the time domain by an $L$-point IDCT, obtaining the reconstructed signal $\hat{x}(n)$.

To detect $p_{t_i}$, we correlate $\hat{y}_{t_i}$ with $p_j^T$ as

$$K_{ij} = \hat{y}_{t_i} p_j^T, \tag{21}$$

where $i = 1, 2, \ldots, p$ and $j = 1, 2, \ldots, N_p$. From (21), there is an index $j$ for which the correlation $K_{ij}$ is the highest, namely $j = t_i$. Thus, the formula to detect the index of the Hadamard sequence embedded into $y_{t_i}$ is

$$\hat{t}_i = \arg\max_{j} K_{ij}. \tag{22}$$

Once $\hat{t}_i$ is detected, we decode the detected Hadamard code to the associated watermark bits according to the one-to-one mapping between the index, the Hadamard code, and the watermark bits. To prove that the detection works, assume there is no attack, so $\hat{y}_{t_i} = y_{t_i}$. Then (21) is

$$K_{ij} = y_{t_i} p_j^T. \tag{23}$$

Substituting (20) into (23) results in

$$K_{ij} = p_{t_i} S_r p_j^T. \tag{24}$$

Assume that $t_i = j$, so $p_{t_i} = p_j$; then (24) is an autocorrelation,

$$K_a = p_j S_r p_j^T. \tag{25}$$

Assume that $p_j$ consists of the elements

$$p_j = [p_{j1}, p_{j2}, \ldots, p_{jr}]; \tag{26}$$

therefore (25) becomes

$$K_a = [p_{j1}, p_{j2}, \ldots, p_{jr}] \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_r \end{bmatrix} [p_{j1}, p_{j2}, \ldots, p_{jr}]^T. \tag{27}$$

Carrying out the matrix multiplication, (27) is

$$K_a = \sum_{i=1}^{r} \sigma_i p_{ji}^2. \tag{28}$$

Since $\sigma_i > 0$ and $p_{ji}^2 \ge 0$ for all $j$ and all $i$, (28) gives

$$K_a > 0. \tag{29}$$

If $p_{t_i} = p_k$ and $p_{t_i} \ne p_j$, then (24) is a cross-correlation,

$$K_c = p_k S_r p_j^T. \tag{30}$$

Since $p_k$ is mutually orthogonal with $p_j$, $K_a$ compares to $K_c$ through the inequality

$$K_a \gg K_c,$$

which means that, even with the singular values intervening, the autocorrelation of a Hadamard sequence with itself remains much higher than its cross-correlation with a different Hadamard sequence.
This confirms that the Hadamard sequence can be detected successfully. From (22), $\hat{t}_i$ is detected for all indices $\{t_1, t_2, \ldots, t_p\}$, which gives the associated watermark bits $\hat{w}_{t_i} = \{\hat{w}_{t_1}, \hat{w}_{t_2}, \ldots, \hat{w}_{t_p}\}$ and all Hadamard sequences $\hat{p}_{t_i} = \{\hat{p}_{t_1}, \hat{p}_{t_2}, \ldots, \hat{p}_{t_p}\}$, which form $\hat{A}$ using (13) and (14) as

$$\hat{A} = [\hat{p}_{t_1}; \hat{p}_{t_2}; \cdots; \hat{p}_{t_p}],$$

where $p$ is the number of rows of $A$. This procedure ensures that no dictionary needs to be transmitted to detect the hidden data or to reconstruct the signal. Since the associated watermark bits $\hat{w}_{t_i}$ are detected, we can calculate the Bit Error Rate (BER) as a robustness parameter. The BER in (34) is the number of detected watermark bits $\hat{w}_i$ that differ from the original watermark bits $w_i$, divided by the total number of watermark bits $L_w$.
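The correlation detector in (21) and (22) can be sketched as follows. This is a minimal Python illustration that assumes a noise-free channel and unnormalized Hadamard rows; the singular values in `sigma` are arbitrary positive numbers chosen for the example, and the helper names are not from the paper.

```python
def hadamard(r):
    """r x r Hadamard matrix by recursive doubling (r must be a power of two)."""
    H = [[1]]
    while len(H) < r:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def embed_rows(indices, sigma, H):
    """Rows y_t = p_t * diag(sigma) for the given 1-based Hadamard indices."""
    return [[p * s for p, s in zip(H[t - 1], sigma)] for t in indices]


def detect_indices(Y, H):
    """For each received row, pick the Hadamard index with the largest correlation."""
    detected = []
    for y in Y:
        scores = [sum(a * b for a, b in zip(y, p_j)) for p_j in H]
        detected.append(scores.index(max(scores)) + 1)  # back to 1-based index
    return detected


H = hadamard(8)
sigma = [9.0, 7.5, 6.0, 4.0, 3.0, 2.0, 1.0, 0.5]  # example singular values
Y = embed_rows([3, 5], sigma, H)
detected = detect_indices(Y, H)
print(detected)
```

With positive singular values, the matching sequence yields the sum of all $\sigma_m$, while any other Hadamard row flips the sign of half of the terms, so its score is strictly smaller.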

Security Model
A Hadamard matrix is easily generated, as described in (11). Anyone could therefore use the Hadamard matrix to reconstruct the dictionary, detect the hidden data, and reconstruct the audio. This leaves the watermark bits hidden in the host audio insecure, so we apply a procedure to secure the Hadamard matrix, as also discussed in [15][16][17]. Randomly selected rows and columns of the Hadamard matrix are multiplied by $-1$. Denote $l_i \in \{1, \ldots, r\}$ as an integer random permutation value, where $i = 1, 2, \ldots, N_l$ and $N_l$ is the number of generated integer random permutation values. Denote $H_s$ as the secured Hadamard matrix, $H_s(j)$ as the $j$-th row of $H_s$, and $H_s^T(j)$ as the $j$-th column of $H_s$; then, after the initial definition $H_s = H_r$, the security model of the Hadamard matrix is defined as

$$H_s(l_i) \leftarrow -H_s(l_i), \qquad H_s^T(l_i) \leftarrow -H_s^T(l_i). \tag{35}$$

The above procedure is repeated $N_l$ times, from $l_1$ to $l_{N_l}$. With the secured Hadamard matrix, (12) is replaced by

$$p_j = H_s(j). \tag{36}$$

Note that $H_s$ is needed not only in the embedding process but also in the detection/extraction process. However, $H_s$ does not have to be passed to the detector directly. We pass only the integer random permutation values $l_i$ to the detector as the security key. With the procedure in (35), $H_s$ can be regenerated at the detector using $l_i$ as the key. According to [15,16], the modified Hadamard matrix obtained using (35) has $(r!\,2^r)^2$ possibilities. For example, if $r = 16$, there are $1.88 \times 10^{36}$ possible modified Hadamard matrices. If the simulation needs 1 second to run the detection and reconstruction process with one Hadamard matrix, then trying all possible matrices takes $1.88 \times 10^{36}$ seconds, or $5.962 \times 10^{28}$ years. This indicates that the proposed security model is appropriate and meets the security requirement for the embedding and compression process.
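The keyed sign-flipping step and the key-space arithmetic above can be checked with a short Python sketch. It assumes one reading of the procedure: for every key value $l_i$, row $l_i$ and then column $l_i$ of the matrix are negated. The function names are illustrative, and the key `[2, 5, 7]` is an arbitrary example.

```python
import math


def hadamard(r):
    """r x r Hadamard matrix by recursive doubling (r must be a power of two)."""
    H = [[1]]
    while len(H) < r:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def secure_hadamard(H, key):
    """Negate row l and column l (1-based) for every l in the key; H itself is not modified."""
    Hs = [row[:] for row in H]
    for l in key:
        k = l - 1
        Hs[k] = [-v for v in Hs[k]]   # multiply row l by -1
        for row in Hs:
            row[k] = -row[k]          # multiply column l by -1
    return Hs


def key_space(r):
    """Number of modified matrices quoted in the text: (r! * 2^r)^2."""
    return (math.factorial(r) * 2 ** r) ** 2


H = hadamard(8)
Hs = secure_hadamard(H, [2, 5, 7])
seconds = key_space(16)
years = seconds / (365 * 24 * 3600)
print(f"{seconds:.3e} candidates, {years:.3e} years at one trial per second")
```

Negating whole rows or columns keeps the rows mutually orthogonal, so the correlation detector still works once the receiver regenerates the same matrix from the key.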

Signal Reconstruction
Once $A$ is obtained, the reconstruction of $S_r$ is solved simply by Orthogonal Matching Pursuit (OMP) [18], [19]. The reconstruction is carried out on each column of $\hat{Y}$ in sequence, with $A$ as the dictionary. Let $y_m$ be the vector taken from the $m$-th column of $\hat{Y}$ and $a_i$ the vector taken from the $i$-th column of $A$; then, in the general case, we find the position of the strongest atom as

$$q_m = \arg\max_{i} \left| a_i^T y_m \right|. \tag{37}$$

In the specific case considered here, where the matrix to be reconstructed is the diagonal singular value matrix, the positions of the strongest atoms are known in advance, and (37) simplifies to

$$q_m = m.$$

We then take the column of $A$ that gives the strongest atom,

$$\nabla = a_{q_m}. \tag{39}$$

We reconstruct the non-zero element of $S_r$ in column $m$ by the least-squares solution

$$\hat{\sigma}_m = (\nabla^T \nabla)^{-1} \nabla^T y_m. \tag{40}$$

This reconstruction procedure, comprising (37), (39), and (40), is repeated $r$ times with increasing $m$, thus obtaining

$$\hat{S}_r = \mathrm{diag}(\hat{\sigma}_1, \hat{\sigma}_2, \ldots, \hat{\sigma}_r). \tag{41}$$

The next step is to form the signal by SVD reconstruction, as described in (7). Finally, we can compute the signal quality.
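For the diagonal case described above, the per-column recovery reduces to a one-atom least-squares fit. The following Python sketch illustrates this under the assumption that column $m$ of the received matrix depends only on column $m$ of the dictionary; it is a simplified stand-in for a full OMP implementation, and the function names are chosen here for illustration.

```python
def acquire(A, sigma):
    """CS acquisition Y = A * diag(sigma): scale column m of A by sigma_m."""
    return [[a * s for a, s in zip(row, sigma)] for row in A]


def reconstruct_singular_values(Y, A):
    """Recover sigma_m from column m of Y by least squares on column m of A."""
    p, r = len(A), len(A[0])
    sigma_hat = []
    for m in range(r):
        atom = [A[i][m] for i in range(p)]   # known atom position: q_m = m
        y_m = [Y[i][m] for i in range(p)]
        num = sum(a * y for a, y in zip(atom, y_m))
        den = sum(a * a for a in atom)
        sigma_hat.append(num / den)
    return sigma_hat


A = [[1, -1, 1, -1],
     [1, 1, -1, -1]]          # two example Hadamard rows (unnormalized)
sigma = [4.0, 3.0, 2.0, 1.0]
Y = acquire(A, sigma)
print(reconstruct_singular_values(Y, A))
```

Dividing by the squared norm of the atom makes the estimate independent of whether the dictionary columns were normalized.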

Noisy Environment
Note that the compressed and watermarked audio in this paper is coded audio. A human cannot listen to the coded audio directly without decoding it first. This means that signal processing attacks against the coded audio are not the same as attacks against the real audio signal. Signal processing attacks against real audio signals are standardized in the Stirmark benchmark [20]. However, the Stirmark benchmark is not appropriate for evaluating the robustness of the proposed method, except for the additive noise attack. The additive noise attack is a signal processing attack that we can generally use to evaluate the robustness of joint watermarking and compression. In a real situation, additive noise arises at the receiver from the thermal condition of the hardware. In this subsection, we describe mathematically how the proposed method is robust to the additive noise attack. If the compressed and watermarked signal $y_{t_i}$ is subject to additive noise $n_i$, then (23) becomes

$$K_{ij} = (y_{t_i} + n_i) p_j^T = p_{t_i} S_r p_j^T + n_i p_j^T. \tag{42}$$

Assume $p_{t_i} = p_j$; then (42) becomes

$$K_a = p_j S_r p_j^T + n_i p_j^T. \tag{43}$$

Because $n_i$ is independent of $p_j^T$, we have $p_j S_r p_j^T \gg n_i p_j^T$, and (43) becomes

$$K_a \approx p_j S_r p_j^T. \tag{44}$$

Thus, the data inserted with the proposed method can be detected even in an additive noise environment. The performance of the proposed method under additive noise depends on the power ratio between the host audio and the additive noise, represented by the Signal to Noise power Ratio (SNR),

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{i=1}^{r} \|y_i\|^2}{\sum_{i=1}^{r} \|n_i\|^2},$$

where $i$ is the row index of $y$ and $n$, $y_i$ is the signal after CS compression at row $i$, $n_i$ is the noise at row $i$, and $r$ is the number of rows of $y$.
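To experiment with the additive noise case, one needs to measure the SNR of a noisy coded matrix and to scale a noise matrix to a chosen SNR. The Python sketch below shows both, computing power as the sum of squared elements over all rows; the helper names and the small example matrices are assumptions made for illustration.

```python
import math


def power(rows):
    """Sum of squared elements over all rows of a matrix."""
    return sum(v * v for row in rows for v in row)


def snr_db(Y, N):
    """SNR in dB between the coded signal Y and the noise N."""
    return 10.0 * math.log10(power(Y) / power(N))


def scale_noise_to_snr(Y, N, target_db):
    """Rescale the noise matrix N so that snr_db(Y, scaled N) equals target_db."""
    gain = math.sqrt(power(Y) / (power(N) * 10.0 ** (target_db / 10.0)))
    return [[gain * v for v in row] for row in N]


Y = [[3.0, -4.0], [0.0, 5.0]]
N = [[1.0, 0.0], [0.0, -1.0]]
N10 = scale_noise_to_snr(Y, N, 10.0)
print(round(snr_db(Y, N), 3), round(snr_db(Y, N10), 3))
```

The scaled noise can then be added element-wise to $Y$ before running the correlation detector, which gives a BER-versus-SNR curve for a chosen set of parameters.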

Discussion
The proposed method involves three signal processing tasks. The first is to encode the watermark into the secure Hadamard code. The second is to make the host audio a sparse signal. The third is to hide the coded watermark in the sparse signal by CS acquisition. There are thus two objects for performance analysis: the detected watermark and the audio reconstructed from the detected sparse signal. From the embedded watermark relative to the length of the host audio, we can calculate the watermark payload, as described in (17). From the host audio length relative to the coded and compressed audio, we can also calculate the CR of the sparse technique and CS, as described in (6).
Mathematically, we can determine the trade-off between the watermark payload and the CR from (17) and (6), respectively. The same three parameters, $M$, $r$, and $p$, affect both the payload and the CR, where $M$ is the square root of the host audio length, i.e., the number of rows/columns of the diagonal matrix $S$, $r$ is the number of rows/columns of the truncated diagonal matrix $S_r$, and $p$ is the number of samples of the compressed signal, i.e., the number of rows of the CS acquisition output $Y$. First, $p$, $r$, and $M^2$ occupy different positions in (17) and (6). In (17), $p$ and $r$ are in the numerator, so decreasing $p$ and $r$ lowers the payload. In (6), $p$ and $r$ are in the denominator, so decreasing $p$ and $r$ raises the CR. The parameter $M^2$ likewise changes position. This is a trade-off between payload and CR, in which we look for moderate values of $p$ and $r$ that give both a high payload and a high CR.
The three parameters satisfy $p \le r < M$. Referring to (6), this relation makes the denominator term $pr \ll 2Mr$ when $M$ is large, thus

$$\mathrm{CR} \approx \frac{M^2}{2Mr} = \frac{M}{2r}.$$

Note that CR must exceed 1 for compression, so $M/2r > 1$, or $r < M/2$. This means that, for compression, the diagonal matrix $S \in \mathbb{R}^{M \times M}$ must be truncated to less than half of its size. Consequently, the relation of the three parameters becomes

$$p \le r < \frac{M}{2}. \tag{47}$$

We can explore the three parameters within this relation. Next, we find the values of $p$ and $r$ for which (17) reaches the maximum payload. Both $p$ and $r$ are in the numerator of (17), so $r$ should be set to its maximum value, $M/2$, and $p$ should be set close to $r$ to obtain the maximum payload. Setting $r$ to its maximum value $M/2$ gives the minimum CR, however, so $r$ must be chosen carefully since it controls the trade-off between $C$ and CR. Because of its position in (17), $p$ should be set to its maximum value, which is $r$, to reach the maximum payload. If $p = r$, then the CS acquisition described in (4) produces an output with the same size as its input. This condition is still acceptable as long as the CR from (6) is more than 1, and CS acquisition still contributes to the watermarking process. Figure 3a displays the payload versus CR with $M \in \{34, 66, 98, \cdots, 482\}$ and $r \in \{0.01M, 0.02M, \cdots, 0.5M\}$. All combinations of $r$ and $M$ that satisfy the restriction (47) are plotted as magenta dots in Figure 3b. The blue dots in Figure 3a show the mapping between the payload from (17) and the CR from (6) where $p = r$, whereas the magenta plus signs show the mapping where $p = 1$. The red vertical dotted line marks the minimum CR of 1. The green horizontal dashed line marks the minimum payload of 20 bps [21]. The area with feasible payload and CR is therefore to the right of the red vertical dotted line and above the green horizontal dashed line.
Many blue dots have a higher payload and CR than the magenta plus signs, which means that $p = r$ offers many more opportunities to reach a high payload and CR than $p = 1$. The blue dots in Figure 3a with payload > 20 bps and CR > 1 are obtained from the $r$ and $M$ values marked with blue circles in Figure 3b. We therefore set $p = r$ for the experiments in the next section, with the $r$ and $M$ combinations selected from the blue circles in Figure 3b.
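The feasible region discussed above can be enumerated directly. The Python sketch below lists, for a given $M$ and with $p = r$, every integer rank $r < M/2$ whose CR from (6) exceeds 1 and whose payload from (17) exceeds the 20 bps minimum. It assumes $F_s = 44100$ samples/s and evaluates $\log_2 r$ for every integer $r$, without restricting $r$ to powers of two; the function name is illustrative.

```python
import math


def feasible_ranks(M, fs=44100, min_payload=20.0):
    """Ranks r (with p = r and r < M/2) that give CR > 1 and payload > min_payload bps."""
    ranks = []
    r = 1
    while r < M / 2:
        p = r
        cr = (M * M) / (2 * M * r + p * r)           # compression ratio, as in (6)
        payload = p * math.log2(r) * fs / (M * M)    # payload in bps, as in (17)
        if cr > 1 and payload > min_payload:
            ranks.append(r)
        r += 1
    return ranks


print(feasible_ranks(32))
```

For $p = r$ the term $pr$ is not negligible next to $2Mr$, so the largest feasible rank is somewhat below the $M/2$ bound suggested by the approximation $\mathrm{CR} \approx M/2r$.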

Experimental Result
We evaluate several aspects of the proposed method by simulation: audio quality, security, watermark quality, watermark payload, and compression ratio. The simulations run in Matlab on ASUS notebooks with the following specifications: Advanced Micro Devices (AMD) Fx with 12 compute cores, 16 GB of Random Access Memory (RAM), and the Windows 10 operating system. The test set consists of 50 mono host audio clips from different music genres, with a sampling rate of 44.1 kHz and 16-bit quantization. All clips are original wave files licensed as free audio files for research [22]. The simulation outputs reported in this section are averages over the simulation results. The evaluated performance parameters are the audio quality, the watermark robustness, the watermark payload, and the CR. The Objective Difference Grade (ODG), obtained with Perceptual Evaluation of Audio Quality (PEAQ) [23], represents the audio quality. Parameter C represents the watermark payload in bps, as described in (16). Parameter BER represents the watermark robustness, as given in (34). CR represents the compression ratio, as explained in (6). We measure the audio quality between the original host audio and the reconstructed audio. The reconstructed audio quality is affected by two factors: the truncation of the diagonal matrix and the CS acquisition. The truncation degrades the audio quality more than the CS acquisition does, because it discards audio signal information. The ODG ranges from -4 to 0, where -4 means the worst audio quality (the distortion is very annoying), -3 means the distortion is annoying, -2 means the distortion is slightly annoying, -1 means the distortion is perceptible but not annoying, and 0 means the best audio quality (the distortion is imperceptible) [23].

Audio Quality Performance in Relation with r, M, Payload and Compression Ratio
From Section 4, we select the M and r values that give CR > 1 and payload > 20 bps with p = r, as marked by the blue circles in Figure 3b. Using the selected values from M ∈ {34, 66, 98, ..., 482} and r ∈ {0.01M, 0.02M, ..., 0.5M}, we run the simulation on 5 host clips. The simulation consists of the embedding process, the data detection process, and the audio reconstruction process. It calculates the BER between the detected and the original watermark and then measures the quality of the reconstructed audio as ODG. The results are displayed in Figures 4a and 4b. Over all combinations of M and r with the 5 clips, the watermark is detected without error, i.e., BER = 0 on average. Figure 4a shows the trade-off between CR and payload, which follows a negative exponential relation. Red stars mark CR-payload pairs with ODG ≥ -1, while blue dots mark pairs with ODG < -1. The same red stars and blue dots are plotted in Figure 4b as ODG against M. The longer the audio segment processed for embedding and compression, the worse the reconstructed audio quality. For these 5 clips, good reconstructed audio quality (ODG ≥ -1) is obtained when M < 128 samples with certain values of r. M therefore does not need to be as large as 482 samples; values up to 128 samples are sufficient to achieve ODG ≥ -1, as Figure 4b shows. In addition, large M values lengthen the processing time of the insertion, detection, and reconstruction. Therefore, we repeat the simulation of Figures 4a and 4b on a finer grid, i.e., M ∈ {5, 6, ..., 128} and r ∈ {1, 2, ..., 64}, which is similar to r ∈ {0.0156M, 0.0234M, ..., 0.5M}, using 50 clips.
We average the audio quality results over the 50 clips, and all watermarks are perfectly detected. The simulation results are displayed in Figures 4c and 4d. Figure 4d shows that many more values of M between 5 and 128 yield ODG ≥ -1. The simulation in Figure 4c also reaches a high CR (up to 7.03) and a high payload (up to 8296 bps). To show which M and r produce these results, we also tabulate them: Tables 4, 5, and 6 list the 10 highest ODG, payload, and CR values, respectively, with the corresponding M and r. These results show that the audio quality, payload, and CR can be controlled by adjusting the M and r parameters.
We run the simulation on 50 clips with M = 32 and r ∈ {1, 2, ..., 16}, which is similar to r ∈ {0.03M, 0.06M, ..., 0.5M}, to see how the audio truncation affects the performance parameters. Figure 5a displays the simulation result. This case also produces a perfectly detected watermark, i.e., BER = 0 on average. The three performance parameters, i.e., ODG, CR, and payload, are averaged and plotted together on the y-axis against the normalized rank r/M ∈ {0.03, 0.06, ..., 0.5} on the x-axis. The black line with right-triangle markers shows the average ODG, which ranges from -1.16 to -0.16. The blue line with square markers shows the payload of the embedded watermark, which ranges from 172.26 to 44100 bps. The red line with circle markers shows the CR of the encoded audio, which ranges from 0.20 to 7.53. The red dash-dot horizontal line marks the minimum CR, i.e., CR = 1. Increasing the normalized rank r/M raises the ODG and the watermark payload but lowers the CR of the encoded audio. When the CR curve falls below the minimum CR, the CS process does not compress the audio signal overall; instead, it increases the length of the encoded signal. In this case, we can select a normalized rank of at most 0.2, i.e., r/M ≤ 0.2.

Complexity and Computational Time
The proposed data hiding and compression method consists of the following major components: in the embedding stage, the DCT, the multi-bit SS mapping, the Singular Value Decomposition (SVD), and the CS acquisition process; in the detection stage, the multi-bit SS de-mapping, the SVD reconstruction, the audio decoding via CS reconstruction, and the IDCT. Each component has a different complexity. The SVD process that obtains U ∈ R^{M×M}, S ∈ R^{M×M}, and V ∈ R^{M×M} from X ∈ R^{M×M} has a complexity of O(M^3) [24]. Recovering X from U, S, and V as in (3) has a complexity of O(M^{2.37}) [25]. The DCT and IDCT described in (8) and (9) have a complexity of O(N_p^2), where N_p is the number of DCT points and N_p = M in this case. The CS acquisition in (4), which is also the multi-bit SS embedding, has a complexity of O(p r^2). The multi-bit SS detection, as described in (22), has a complexity of O(r^3). Finally, the audio reconstruction by the OMP approach in (40) has a complexity of O(p^2 r). Because p ≤ r < M, the highest computational cost lies in the SVD, i.e., O(M^3), so the SVD dominates the overall complexity. This finding supports the use of a lower M value. However, we still need to measure the computational time by simulation to find a proper M value that avoids a very long processing time.
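As a rough illustration of why the SVD dominates, the sketch below evaluates the leading-order terms quoted above for one parameter set. The function name `complexity_terms` and the chosen values (M = 128, p = r = 20, borrowed from the security simulation) are illustrative assumptions; constant factors are ignored, so the numbers indicate relative order only, not actual operation counts.

```python
def complexity_terms(M, r, p):
    """Leading-order cost terms from the text, with constant factors ignored.

    M: segment length, r: truncated rank, p: number of dictionary rows.
    The dictionary keys are illustrative labels, not names from the paper.
    """
    return {
        "svd": M ** 3,                    # SVD of an M x M matrix
        "svd_reconstruction": M ** 2.37,  # recovering X from U, S, V
        "dct_idct": M ** 2,               # N_p = M point DCT / IDCT
        "cs_embedding": p * r ** 2,       # CS acquisition / SS embedding
        "ss_detection": r ** 3,           # multi-bit SS detection
        "omp_reconstruction": p ** 2 * r, # OMP audio reconstruction
    }


# Example values chosen for illustration; they satisfy p <= r < M.
terms = complexity_terms(M=128, r=20, p=20)
dominant = max(terms, key=terms.get)

for name, cost in sorted(terms.items(), key=lambda kv: -kv[1]):
    print(f"{name:>20s}: {cost:,.0f}")
print("dominant term:", dominant)
```

With these values the SVD term exceeds every other term by more than an order of magnitude, which matches the argument that reducing M is the most effective way to reduce the cost.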
We run a simulation to measure the computational time, which should reflect the complexity of the embedding and detection stages. We set M to powers of 2 from 16 to 1024 and use r = 0.125M, r = 0.25M, and r = 0.5M. We use 10 clips and average the processing time. The result is displayed in Figure 5a. The processing time increases sharply as M increases, whereas the ratio r/M has no significant impact on the computational time. Based on this figure, a lower M is recommended because of its low computational time. Moreover, as confirmed in Subsection 5.1, a lower M also has a significant positive impact on the reconstructed audio quality.

Security Analysis
In Section 3.3, two parameters affect the security of the model: N_l, the number of generated random integer permutation values, and r, the number of rows and columns of the truncated diagonal matrix S_r. The original Hadamard matrix is denoted as H_r, and the secured Hadamard matrix is denoted as H_s. We run the simulation with varying r and N_l to understand how much they affect the security performance. In a real situation, an attacker can try to break the security model by using the original Hadamard matrix to detect the watermark and reconstruct the audio, because the Hadamard matrix is simple to generate. We therefore encode with the secured Hadamard matrix and decode with the original Hadamard matrix to analyze the strength of the security model. If the security model works well, the detected watermark should ideally be damaged, i.e., the BER should be close to 0.5.
In the simulation, we assume p = r = 20 and M = 128 samples, and N_l varies from 0 to r. When N_l is zero, H_s = H_r. We use 5 clips and calculate the average BER after the watermark detection process, running 100 iterations for each clip. The simulation result is shown in Figure 6a. The worst detected watermark is obtained when N_l is half of r, and the watermark is perfectly detected when N_l = 0 or N_l = r = 20. We can restrict the value of N_l by setting a minimum accepted BER. We choose BER = 0.4 as the safe minimum because a detected watermark with BER < 0.3 can still be interpreted visually [26]. Therefore, we choose N_l > 6, or generally N_l > 0.3r, as the lower bound and N_l < 14, or generally N_l < 0.7r, as the upper bound, so that the detected watermark remains uninterpretable when an attacker tries to detect it with the original Hadamard matrix. Figure 6b shows the relation between BER and r and compares the detected watermark quality for different values of N_l/r. This simulation uses 50 clips with 10 iterations for each clip, and the range of r is [6, 30]. The worst watermark is detected when N_l/r = 0.5. The detected watermark quality improves as N_l/r decreases and as r increases. When N_l/r = 0.3, most of the BER values are above 0.4. This result confirms the restriction of N_l to the range [0.3r, 0.7r].
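The restriction on N_l can be written as a small helper. This is only a sketch of the arithmetic: the function name `allowed_nl` is hypothetical, and it uses the strict bounds 0.3r < N_l < 0.7r stated for r = 20 (N_l > 6 and N_l < 14), evaluated with integer arithmetic to avoid floating-point rounding at the boundaries.

```python
def allowed_nl(r):
    """Integer N_l values with 0.3*r < N_l < 0.7*r (strict bounds).

    The comparison is done as 3*r < 10*N_l < 7*r so that only integers
    are involved and boundary values such as N_l = 6 for r = 20 are
    excluded reliably.
    """
    return [n for n in range(r + 1) if 3 * r < 10 * n < 7 * r]


# For the simulated case r = 20 this gives N_l = 7, ..., 13.
print(allowed_nl(20))
```

If the closed interval [0.3r, 0.7r] is preferred, the two strict comparisons can be replaced by non-strict ones; the source states both forms.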

Noisy Environment
In a noisy environment, the proposed method is robust to additive noise attacks, as shown mathematically in Section 3.5. Nevertheless, it is necessary to quantify by simulation how robust the method is when additive noise attacks the encoded audio. We analyze two performance parameters affected by the additive noise: the detected watermark quality, represented by BER, and the reconstructed audio quality, represented by ODG. In the simulation, we use 50 clips with 50 iterations for each clip, M = 23, r = 6, and p = r. The input parameter of the simulation is the SNR of the additive noise, as described in (45), which ranges from 0 to 40 dB. The resulting ODG and BER values are averaged and displayed in Figure 7. Decreasing the noise power, i.e., increasing the SNR, improves both the reconstructed audio quality (higher ODG) and the detected watermark quality (lower BER).
We embed a watermark image showing the letters "ITB" at a resolution of 20 × 35 to illustrate how BER values should be interpreted. The detected watermarks are displayed in Table 7 for various BER values. We use one selected clip as the audio host with M = 256, r = 100, and p = r. The original watermark image corresponds to the very bottom of Table 7, where the BER is zero. We apply additive noise as the attack with SNR from 0 to 55 dB. The detected watermark is interpretable as "ITB" when the SNR is more than 25 dB, i.e., when its BER is less than 10%. Thus, the maximum acceptable BER for the detected watermark is 10%. In Figure 7, a BER below 10% is achieved at an SNR of 10 dB and above. This means that the detected watermark is already interpretable when the noise power is still one tenth of the signal power. The reconstructed audio is also robust to the additive noise, since the ODG already exceeds -1 at an SNR of 10 dB. These results confirm that the proposed method is robust to additive noise.

Table 7. Additive Noise Effect and The Detected Watermark in Certain SNR
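The noise attack can be sketched as follows. Equation (45) is not reproduced in this section, so the code assumes the standard definition SNR = 10 log10(P_signal / P_noise) and white Gaussian noise; the function names, the 440 Hz test tone, and the 44.1 kHz sampling rate are illustrative assumptions rather than the paper's setup.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise so the result has roughly the given SNR (dB).

    Assumes SNR = 10*log10(P_signal / P_noise), which may differ in detail
    from the definition used in (45).
    """
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_energy = sum(x * x for x in clean)
    noise_energy = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(signal_energy / noise_energy)


rng = random.Random(0)  # fixed seed for repeatability
# One second of a 440 Hz tone at 44.1 kHz as a stand-in for encoded audio.
clean = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
noisy = add_noise_at_snr(clean, 10.0, rng)
snr = measured_snr_db(clean, noisy)
print(f"target 10.0 dB, measured {snr:.2f} dB")
```

Sweeping `snr_db` over 0 to 40 dB and feeding each noisy signal to the detector and the reconstruction stage would give BER and ODG curves of the kind summarized in Figure 7.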
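To make the 10% threshold concrete, the sketch below computes the BER for a 20 × 35 binary watermark (700 bits) in which 70 bits are flipped. The random bit pattern stands in for the "ITB" image, which is not available here, and the function name `bit_error_rate` is an illustrative choice.

```python
import random


def bit_error_rate(original, detected):
    """Fraction of positions where the detected bits differ from the original."""
    if len(original) != len(detected):
        raise ValueError("bit sequences must have the same length")
    errors = sum(a != b for a, b in zip(original, detected))
    return errors / len(original)


rng = random.Random(1)
n_bits = 20 * 35  # watermark resolution stated in the text

# Placeholder watermark: random bits instead of the actual "ITB" image.
original = [rng.randint(0, 1) for _ in range(n_bits)]

# Flip 70 distinct bits to emulate detection errors caused by noise.
detected = list(original)
for idx in rng.sample(range(n_bits), 70):
    detected[idx] ^= 1

ber = bit_error_rate(original, detected)
print(f"{ber:.2%} of {n_bits} bits in error")
```

Under this reading, the 10% limit corresponds to at most 70 wrong pixels in the 700-pixel watermark image.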

Method Comparison to References
As described in Section 1, several references are related to the proposed method. Our method offers more benefits than these references. It can be used for both audio watermarking and audio steganography with compression, because the payload, the audio quality, and the compression ratio are controllable through its parameters. Besides, our method produces encoded audio that cannot be attacked by general signal processing attacks, e.g., the Stirmark benchmark, except for additive noise as described in Section 3.5. Table 8 provides a comprehensive comparison between our proposed method and the previous references that also used CS as the embedding or compression method and audio as the object to embed into or to compress. Among the references in Table 8, reference [3] proposed an audio compression scheme only, and reference [2] proposed a hiding method only. References [1] and [4] proposed both hiding and compression of the audio but did not analyze all performance parameters.

Conclusion
In this paper, we propose a novel audio watermarking method based on the CS technique that inserts the watermark into the host audio and simultaneously compresses the watermarked audio, so that the watermarked audio has a smaller size. We also secure the proposed method with a secured Hadamard matrix. Mathematical derivation shows that the proposed method works in both noiseless and noisy environments. Performance parameters such as payload, CR, ODG, and BER are reported in this paper. The experimental results show that the proposed method achieves high imperceptibility with a payload in the range 729-5292 bps and a compression ratio of 1.47-4.84. There is a trade-off between payload and CR, so the parameters can be chosen to match the requirements of a specific application.