Wavelet-Domain Information-Hiding Technology with High-Quality Audio Signals on MEMS Sensors

Due to the rapid development of sensor technology and the popularity of the Internet, not only has the amount of digital information transmission skyrocketed, but also its acquisition and dissemination has become easier. The study mainly investigates audio security issues with data compression for private data transmission on the Internet or MEMS (micro-electro-mechanical systems) audio sensor digital microphones. Imperceptibility, embedding capacity, and robustness are three main requirements for audio information-hiding techniques. To achieve the three main requirements, this study proposes a high-quality audio information-hiding technology in the wavelet domain. Due to the fact that wavelet domain provides a useful and robust platform for audio information hiding, this study applies multi-coefficients of discrete wavelet transform (DWT) to hide information. By considering a good, imperceptible concealment, we combine signal-to-noise ratio (SNR) with quantization embedding for these coefficients in a mathematical model. Moreover, amplitude-thresholding compression technology is combined in this model. Finally, the matrix-type Lagrange principle plays an essential role in solving the model so as to reduce the carrying capacity of network transmission while protecting personal copyright or private information. Based on the experimental results, we nearly maintained the original quality of the embedded audio by optimization of signal-to-noise ratio (SNR). Moreover, the proposed method has good robustness against common attacks.


Introduction
The uses of digital information transmission in Internet applications, artificial intelligence, and data sensing [1][2][3][4][5][6][7][8][9][10] are more frequent. In many cases, without permission from the legal owner, the digital information is often stolen, copied, or even turned into profit by criminal individuals. In general, an audio information-hiding technique should possess three properties: make the piece of hidden information imperceptible in the embedded audio, provide a signal-to-noise ratio (SNR) of 20 dB or more, and maintain the embedding capacity of at least 20 bps (bits per second) [11,12]. Moreover, hidden information is resistant to most attacks, which include re-sampling, MP3 compression, filtering, amplitude modification, time scaling, and so on [12][13][14][15].
Audio information-hiding techniques are classified according to their domain. These algorithms are categorized as time-domain techniques and transform-domain techniques. Discrete wavelet transform (DWT) is one practical transform domain for hiding audio information [13][14][15][16][17][18][19][20][21][22][23][24][25][26][27]. In the literature, several earlier procedures embedded watermarks into DWT low-frequency coefficients using the quantization-based technique so that they could obtain adaptive performance [15,20,24]. Chen et al. [15,24] proposed an optimization quantization approach to fixed-weighting DWT coefficients to gain high-quality modified audio and high robustness against many common attacks. Li et al. [27] proposed a new audio watermarking technique. They performed the norm ratio on fixed-scaling DWT coefficient quantization without considering the signal compression in the implementation. In addition, the quality of modified audio worsens with weighting variation, and hidden information is inadequately robust to time-scaling attacks.
This study proposes an optimization model to integrate optimization-based signal steganography [24] with threshold-based compression in the wavelet domain. Firstly, we utilized binary digits to store data and represent information. We modified the signal-tonoise ratio (SNR) and the amplitude-quantization rules in the wavelet domain as performance index and constraints. At the same time, to reduce the amount of the embedded audio data signal, we also employed threshold-based compression technology in the constrictions. Then, we obtained an optimization model that enhanced the audio quality in the information-hiding and compression processes. Secondly, the optimization model was solved by the matrix-type Lagrange principle and graphic illustration Accordingly, we performed information hiding and data compression on each audio signal for private information transmission on the Internet or MEMS (micro-electro-mechanical systems) audio sensor digital microphones. On the other end of the transmission process, the hidden information was extracted smoothly without either the original audio or the recovery of the compressed audio signal adopting a cubic spline. To demonstrate the quality of the proposed performance, we measured the appropriate threshold ε and embedding strength Q in our experiment. The proposed algorithm reduces the amount of carried network transmission but preserves the original audio signals and protects personal privacy.
The rest of this study is as follows: Section 2 presents the proposed method and introduces the embedding technique, optimization model, and compressions. The illustrations of the optimization model, presentation of the recovery method, and the extraction technique are in Section 3. Section 4 contains discissions of experimental results, and some remarks and conclusions are in Section 5.

Proposed Method
This section introduces the embedding technique, the extraction technique, and the compression of the proposed method. Figure 1 shows the block diagram of the proposed algorithm; further detailed introduction will appear in Sections 2.1 and 2.2. a new audio watermarking technique. They performed the norm ratio on fixed-scaling DWT coefficient quantization without considering the signal compression in the implementation. In addition, the quality of modified audio worsens with weighting variation, and hidden information is inadequately robust to time-scaling attacks. This study proposes an optimization model to integrate optimization-based signal steganography [24] with threshold-based compression in the wavelet domain. Firstly, we utilized binary digits to store data and represent information. We modified the signal-tonoise ratio (SNR) and the amplitude-quantization rules in the wavelet domain as performance index and constraints. At the same time, to reduce the amount of the embedded audio data signal, we also employed threshold-based compression technology in the constrictions. Then, we obtained an optimization model that enhanced the audio quality in the information-hiding and compression processes. Secondly, the optimization model was solved by the matrix-type Lagrange principle and graphic illustration Accordingly, we performed information hiding and data compression on each audio signal for private information transmission on the Internet or MEMS (micro-electro-mechanical systems) audio sensor digital microphones. On the other end of the transmission process, the hidden information was extracted smoothly without either the original audio or the recovery of the compressed audio signal adopting a cubic spline. To demonstrate the quality of the proposed performance, we measured the appropriate threshold ε and embedding strength Q in our experiment. The proposed algorithm reduces the amount of carried network transmission but preserves the original audio signals and protects personal privacy.
The rest of this study is as follows: Section 2 presents the proposed method and introduces the embedding technique, optimization model, and compressions. The illustrations of the optimization model, presentation of the recovery method, and the extraction technique are in Section 3. Section 4 contains discissions of experimental results, and some remarks and conclusions are in Section 5.

Proposed Method
This section introduces the embedding technique, the extraction technique, and the compression of the proposed method. Figure 1 shows the block diagram of the proposed algorithm; further detailed introduction will appear in Sections 2.1 and 2.2.

Embedding Technique
To embed the private information into the lowest DWT coefficients, we implemented DWT using the single prototype function ( ) x ψ . This function is regulated by a scaling parameter and a shift parameter [28,29]. The discrete normalized scaling and wavelet basis function was defined as

Embedding Technique
To embed the private information into the lowest DWT coefficients, we implemented DWT using the single prototype function ψ(x). This function is regulated by a scaling parameter and a shift parameter [28,29]. The discrete normalized scaling and wavelet basis function was defined as where i and n are the dilation and translation parameters, and h i and g i denote low-pass and high-pass filters. Orthogonal wavelet basis functions provide simple calculation of coefficient expansion and easily express audio signals S(t) ∈ L 2 (R) as a series expansion of orthogonal scaling functions and wavelets. Throughout this study, we used the host digital audio signal S(n), n ∈ N, to denote samples of the original audio signal at the nth sample time, and each piece of audio signal was cut into segments on which DWT was performed. As a result, the signal-to-noise ratio (SNR) SNR = −10 log 10 S(n) − S(n) 2 2 / S(n) 2 2 can be rewritten as where S(n) is the modified digital audio signal and the vector form consists of the n unknown absolute values of DWT coefficients with respect to the original DWT coefficient vector X n = |x 1 | |x 2 | · · · |x n | T in each segment.
For convenience, the secret information is usually stored as a binary sequence. To embed the binary bit "1 ∈ B" or "0 ∈ B" as shown in Figure 1, we performed DWT and then determined n unknown values of DWT coefficients, x 1 , x 2 , · · ·, x n . Accordingly, an optimization-based model for embedding the binary bit was proposed as follows.
We determined the vector X n such that the SNR = −10 log 10 X n − X n 2 2 / X n 2 2 is maximized. Due to the fact that all logarithmic functions are one-to-one, that is, for all x and y in the domain of logarithmic function, if log 10 x = log 10 y, then x = y. We defined a performance index of the form X n − X n 2 / X n 2 so that the binary sequence with binary bit "1 ∈ B" or "0 ∈ B" can be embedded by the proposed optimization model described below: where is the floor function; Q is the quantization size or embedding strength which is adopted as the secret key K; the compression constraint is described in Equations (6)-(8) in Section 2.2.

Compression Constraint
Solving the optimization models (4) and (5), the watermarked audio signal S with optimal SNR is obtained after applying the IDWT. To reduce the amount of data when trans-mitting on the Internet, we compressed the embedded audio signal s using the threshold compression method formulated as follows: where ε represents the threshold. To recover the signal {s i } N i=0 from the compressed signal {ŝ i } N i=0 , we used the cubic function, which is formulated as To ensure the recovery quality, we adjusted the compression threshold ε while considering Q to better fit the optimal SNR.

Proposed Optimization Solution in Embedding and Extraction Method
In this section, we solve the optimization problem described in models (4) and (5) in two steps. Since the optimization problems (4) and (5) are similar, we first solve (4) and then apply the optimal solution to (5) using the same method.

First Step in Finding the Optimal Solution
Applying Theorems A1 and A2 introduced in Appendix A, the Lagrange multiplier λ is utilized to combine (4a) and (4b) into a function F without any constraints, where setting Since A X n = u 1 , Equation (11) can be rewritten as Hence, the optimal solution of λ is Moreover, by substituting (13) into (10a), the optimal DWT coefficients are where the superscript * denotes the optimal result with respect to the corresponding variable.

Audio Recovery and Information Extraction
To extract the hidden confidential data, we first recover the signal {s i } N i=0 from the compressed signal {ŝ i } N i=0 using the cubic function, which is formulated as Next, we extract the hidden information from the DWT coefficients {c i } N i=0 of the recovered audio signal {s i } N i=0 according to the following steps: Split the test audio into segments and perform DWT on each segment. If , | x * n |} presents n consecutive DWT lowest-frequency coefficients, the binary sequence is extracted from X * n by the following proposed extraction technique: then the extracted value is 1.
then the extracted value is 0. Finally, the hidden information is recovered from the binary sequence. In addition, to closely monitor the accuracy of the extracted private data, its ratio of bit errors (BER) is measured to check if an attack occured. The BER is usually expressed as a percentage and can be formulated as BER =(B error /B total ) × 100%, where B error and B total denote the numbers of error binary bits and total binary bits during a tested period.

Application Scenarios of Our Proposal
The model and techniques proposed in this study combine information hiding and data compression of audio signals. During network transmission, if the amount of data (including the hidden private information) is large and the network speed is slow, the compression ratio can be increased to save transmission time; on the contrary, if the amount of data is small or the network speed is fast, one can just perform information hiding without performing data compression to improve the accuracy of the data transmission. This is the biggest difference between the proposed method and other methods.

Experimental Results
This section presents experimental results from testing the proposed algorithm. Without loss of generality, we investigate various forms of audio signals, such as love songs, symphonies, and dance and folkloric music. Ten songs per audio were averaged to evaluate the performance of the proposed method. These mono-type signals achieved a sampling rate of 44.1 kHz, which means there were 512,000 samples in each piece of selected information. They all came with a bit depth of 16 bits and 11.6 s in length. In the embedding procedure, each audio signal with 512,000 samples was initially cut into four segments of equal length; an 8-level discrete wavelet transform was performed on each evenly cut piece. This process ensured each piece of data had the total number of lowest-frequency coefficients 512000/(4·2 8 ) = 2000. The values for Q are 13,000 and 26,000 for n = 2 and 4, respectively. To show a better comparison, we also implemented the two methods listed in references [24,27]; the experimental results are shown to compare with our algorithm.

Embedding Capacity and Averaged SNR
As listed in Table 1, the embedding capacities for n = 2 and 4 are 1000 and 500 bits, which satisfy the IFPI requirement-providing at least 20 bps (200 bits/10 s) embedding capacity. However, if the group size is greater than 16, this requirement is violated. Since we aim to present an optimization model for DWT multi-coefficients in this study, the resulting SNR of the proposed method clearly shows our SNR is much better than those SNRs using the methods in [24,27].

Robustness Measurement
We used five types of common attacks: re-sampling, low-pass filtering, amplitude scaling, time scaling, and MP3 compression, to evaluate how robust the proposed algorithm is. The performance quality is measured according to the averaged BER and its standard deviation (SD). A detailed discussion is illustrated below.
(1) Re-sampling: In the re-sampling process, the sampling rate of an audio signal can be increased (up-sample) or decreased (down-sample) in three stages: (i) downsample, (ii) interpolation, and (iii) up-sample. We down-sampled the sampling rate of embedded audios from 44.1 kHz to 22.05 kHz, then up-sampled them from 22.05 kHz back to 44.1 kHz with a linear interpolation filter. A similar approach allowed the sampling rates to change from 44.1 kHz to 11.025 kHz and 8 kHz and regain the original rate of 44.1 kHz. Table 2 shows the BER of testing re-sampling on audio signals. One can see that when the re-sampling rate is 8 kHz, the proposed embedding method has lower BER than those from implementations in [24,27]. In those cases when the re-sampling rates are 22.05 kHz and 11.025 kHz, the proposed method shows comparable robustness. (2) Low-pass filtering: Table 3 presents the BER while testing low-pass filters with cutoff frequencies of 3 kHz and 5 kHz. The BER results show that models in [24,27] have slightly higher robustness. Since both references [24,27] also adopted quantizationbased embedding technique, the BER evaluation of the proposed method gives extremely similar results to theirs during the process of low-pass filtering. (3) MPEG Audio Layer-3 (MP3) compression: Table 4 shows the BER from testing MP3 compression with different bit rates on the embedded audio data. The BER values reflect that the proposed model has similar robustness to that in references [24,27]. (4) Amplitude scaling: Since the amplitude-scaling attack usually results in saturation, in this study, we selected four distinct values for the amplitude-scaling factor: 0.5, 0.8, 1.1, and 1.2. The experimental results in Table 5 confirm that the proposed algorithm is much more robust than the methods in references [24,27].
(5) Time scaling: Table 6 lists the BER from testing time-scaling attacks with a ±2% and ±5% range. The BER results show that our method has comparable robustness to those in references [24,27].
Based on the experimental outcomes and aforementioned discussions, the proposed method generally achieves high SNR and is almost zero-error against the amplitude-scaling attacks. However, it shows slightly lower robustness against low-pass filtering attacks and poor robustness against time-scaling attacks.

Compression Measurement
The purpose of compression is to have maximal compression ratio (CR) under maximal SNR, where CR is defined by CR = Data size before compression Data size after compression The threshold ε and the embedding strength Q directly affect compression ratio (CR) and SNR, respectively. As shown in Figure 2, it can be said that the larger the threshold and the embedding strength, the larger the compression ratio CR, that is, the better the compression effect, but the worse the SNR before and after compression. Data in Table 7 show that the CR and SNR obtained without embedding information (denoted by N) vary under distinct threshold values. Such a result is consistent with the fact that when CR increases, SNR worsens. While signals are embedded, the result of the investigation of the relationship between the two is also in Table 7. From the experimental outcomes, we observed two noteworthy findings. First, with the same threshold value, if the embedding strength is higher, the overall audio values vary less. Though SNR seems worse, it demonstrates a more powerful embedding strength and a better compression effect. Second, in cases where threshold values change, we see that if the threshold value is greater than the embedding strength, the CR value becomes higher. That is to say, the compression effect enhancesand the relative decompression effect worsens, but the total effectiveness remains almost unchanged. Data in Table 7 show that the CR and SNR obtained without embedding information (denoted by N) vary under distinct threshold values. Such a result is consistent with the fact that when CR increases, SNR worsens. While signals are embedded, the result of the investigation of the relationship between the two is also in Table 7. From the experimental outcomes, we observed two noteworthy findings. First, with the same threshold value, if the embedding strength is higher, the overall audio values vary less. Though SNR seems worse, it demonstrates a more powerful embedding strength and a better compression effect. Second, in cases where threshold values change, we see that if the threshold value is greater than the embedding strength, the CR value becomes higher. That is to say, the compression effect enhancesand the relative decompression effect worsens, but the total effectiveness remains almost unchanged.
Moreover, to better understand the relationship between CR and SNR, we used the graph in Figure 3 to find appropriate values of threshold ε and embedding strength Q. We changed the threshold ε to obtain the relationship between CR and SNR using different markers by keeping green and blue fixed to include all the Q values obtained in Table 7. By doing this, we made sure the effect between CR and SNR remained optimized.  . Changing the threshold ε to obtain the relationship between CR and SNR using different markers by keeping green and blue fixed to include all the Q values in Table 7.

Conclusions
This study proposes a method to seek the integration between the information-hiding process and data compression for five types of commonly seen audio signals. Under the proposed model, simulation results demonstrated that each piece of hidden audio signal attains high SNR and showed strong robustness. SNRs of most hidden audios were more than 35, and some were even higher than 40. On the other hand, most BERs were as low as 5% or less. In addition, we obtained the relationship between CR and SNR with embedded private information and observed two critical outcomes. First, with a fixed threshold value, a high embedding strength makes the differences between the overall audio values smaller. Such an algorithm shows better embedding strength and enhanced compression effects but reflects worse SNR values. Second, when playing with distinct threshold values, we found that if the threshold value ε is set higher than the embedding strength Q, the CR value drops. That means the compression effect becomes better and the relative decompression effect worsens, but the total effectiveness remains almost unchanged.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A Theorem A1. Let A be a matrix of size n × n. If X and X are n × 1 column vectors, then the following statement holds [30,31]: Theorem A2. Suppose that g is a continuously differentiable function of X on a subset of the domain of a function f. If X 0 minimizes (or maximizes) f(X) subjected to the constraint g(X) = 0, then ∇f (X 0 ) and ∇g(X 0 ) are parallel and one is a constant multiple of another. That is, if ∇g(X 0 ) = 0, then there exists a non-zero scalar λ such that ∇ f (X 0 ) = λ∇g(X 0 ).
Based on Theorem A2, if an augmented function is defined as H(X, λ) = f (X) + λg(X), then an optimal solution of the optimization problem is to compute the extreme of the unconstraint function H(X, λ). The necessary conditions to ensure the existence of the extreme of H are [30,31] ∂H ∂λ = 0, ∂H ∂X = 0