A Wavelet-Based Steganographic Method for Text Hiding in an Audio Signal

The developed method of steganographic hiding of text information in an audio signal based on the wavelet transform acquires a deep meaning in the conditions of the use by an attacker of deliberate unauthorized manipulations with a steganocoded audio signal to distort the text information embedded in it. Thus, increasing the robustness of the stego-system by compressing the steganocoded audio signal subject to the preservation of the integrity of text information, taking into account the features of the psychophysiological model of sound perception, is the main objective of this scientific research. The task of this scientific research is effectively solved using a multilevel discrete wavelet transform using adaptive block normalization of text information with subsequent recursive embedding in the low-frequency component of the audio signal and further scalar product of the obtained coefficients with the Daubechies wavelet filters. The results of the obtained experimental studies confirm the hypothesis, namely that it is proposed to use recursive embedding in the low-frequency component (approximating wavelet coefficients) followed by their scalar product with wavelet filters at each level of the wavelet decomposition, which will increase the average power of hidden data. It should be noted that upon analyzing the existing method, which is based on embedding text information in the high-frequency component (detailed wavelet coefficients), at the last level of the wavelet decomposition, we obtained the limit CR = 6, and in the developed, CR = 20, with full integrity of the text information in both cases. Therefore, the resistance of the stego-system is increased by 3.3 times to deliberate or passive compression of the audio signal in order to distort the embedded text information.


Introduction
Recently, scientific research in the field of wireless acoustic sensor networks solves very important technical problems. Many areas have been covered, such as self-localization of acoustic sensors, recognition and coding of audio signals, active noise control, and localization of sound sources [1,2]. This paper considers another important area in acoustic sensory systems, information security, which will allow use of a highly redundant audio signal that is received from acoustic sensors as a container for hiding text information in it, so that the classical problem of audio steganography is solved. It will also be quite relevant to apply the developed method in voice messengers, where a fake voice message is transmitted that hides a true text message. In this case, the attacker will not be able to recognize the essence of the hidden correspondence of users, and if we assume that the microphone of a mobile device will act as an acoustic sensor, then it is possible to mask hidden correspondence against the background of another audio conference in real time, which also can confuse the attacker.
It is necessary to remember the features of text recognition systems against the background of multimedia information (images, video), which is obtained from video sensors, where the recognized text can also be hidden in the audio signal of acoustic sensor networks.
It should be noted that if we slightly modify the developed method at the stage of processing hidden information before integrating it into the audio signal (adapt the method to another type of information), then it can be easily used not only for hiding text information, but also for hiding signal parameters (object recognition features), which are the result of the analysis, processing, and classification of information received from different types of network sensors (video and audio sensors). This type of hidden information is very common today in computer vision, speech, and video recognition. In this case, it is not the carrier signal itself (audio, video, images) that is subject to hiding in the audio signal, but its recognition features, depending on the specific classification task being solved. For example, the semantic parameters of speech or the biometric features of the voice can be hidden in the audio signal; if we are talking about recognizing video information or images, then there is an opportunity to hide the parameters that characterize the tracking of moving objects in time, identification of a person by photo, optical character recognition, and other such signal parameters.
To ensure the effective hiding of text information in an audio signal, a deep understanding of their amplitude-frequency characteristics [3] is required. This is because many factors will depend on the correct analysis of where in the amplitude-frequency component text information is to be integrated. The main ones are the effectiveness of the hiding (masking) itself, as well as the resistance of the steganocodec to audio container transcoding. A fundamental understanding of the spectral features of audio signals [4] will allow balancing between increasing the efficiency of hiding text information in an audio container and resistance to various compression algorithms of a steganographic audio file.
Therefore, the question arises whether the secret text information will be preserved without distortion when re-transcoding the steganographic audio file, and if so, what is the maximum value of the compression ratio at which the secret information maintains integrity? In particular, this question prompted the authors to write this article and develop one of the methods for steganographic hiding of text information in an audio signal [5][6][7], which will allow for answering the contradictions that have arisen using modern methods of digital audio signal processing and spectral analysis methods.

Problem Statement
The developed method of steganographic hiding of text information in an audio signal based on the wavelet transform [8] acquires a deep meaning in the conditions of the use by an attacker of deliberate unauthorized manipulations with a steganocoded audio signal to distort the text information embedded in it; that is, to make its semantic constructions illegible. The main form of these manipulations is the use of various algorithms for compressing the audio signal [9,10], but not to remove its uninformative components, which, according to the human psychophysiological model of sound perception, are beyond the threshold of audibility, and to remove the text information hidden in the audio signal by deliberately introducing distortions by the compression algorithm.
Thus, increasing the robustness of the stego-system to compression (reducing redundancy) of the steganocoded audio signal [11,12] subject to the preservation of the integrity of text information (genuine semantic structures), taking into account the features of the psychophysiological model of sound perception (hiding the very fact of text transmission by masking in acoustic signals), is the main objective of this scientific research.

Analysis of Existing Research and Formation of a Scientific Hypothesis
The task of this scientific research is effectively solved using a multilevel discrete wavelet transform [8,13] based on adaptive block normalization of text information with subsequent recursive embedding in the low-frequency component of the audio signal and further scalar product of the obtained coefficients with the Daubechies wavelet filters [14,15], which is a new approach in the field of steganography that makes the stego-system more resistant to transcoding. The difference between the developed method and the existing ones is that in existing steganographic methods of information hiding based on wavelet transform [16][17][18][19][20], text information is usually embedded in the high-frequency wavelet coefficients (HFWC) at the last level of the wavelet decomposition, and in the developed method, it is proposed to use recursive embedding in the low-frequency wavelet coefficients (LFWC) followed by scalar product with orthogonal Daubechies filters at each level of the wavelet decomposition, which allows for increasing the average power of hidden data. This will increase the critical compression threshold of the steganocoded audio signal, at which the text will begin to distort (the transmitted message will be different from the received).
The formalization of the mentioned statements is as follows: (1) An existing method that is used in many studies [16][17][18][19][20] in different configurations, where for the most part, we may apply the idea of text integration according to the expression T 1...L → Y j in Formula (1): (2) The proposed method differs significantly in the expression T 1...L−1 → Z j D i in Formula (5), which allows for increasing the average power of hidden text information due to the scalar product with the coefficients of the low-frequency wavelet filter D i : where A, A are input and output audio signals with number of samples l A ; T, T are input and output texts divided into 1, 2, . . . , L blocks depending on the number of wavelet decomposition levels L; Z k , Y k are wavelet coefficients of low and high frequencies in quantity l Z ; D,V,R,W are Daubechies filters of the N-th order low and high frequencies for decomposition and reconstruction; ↓ 2, ↑ 2 are operations of double thinning and excess; → is a symbol used to logically explain the operation of integrating text information into wavelet coefficients. The expression T 1...L−1 → Z j D i in Formula (5) shows that the integration → of blocks of text information T 1...L−1 into wavelet coefficients Z j occurs at the levels of the wavelet decomposition 1, . . . , L − 1 to their scalar product with a low-pass Daubechies filter D i , as opposed to the expression T 1...L → Y j in Formula (1), where integration → into wavelet coefficients Y j occurs after the scalar product with the high-pass Daubechies filter V i (3).
Extraction of text information T from an audio signal A occurs recursively depending on the number of levels of the wavelet decomposition L according to Formulas (2), (3), (5), and (6) in the existing [16][17][18][19][20] and proposed approaches, respectively. Thus, using the developed method, it is possible to allow an attacker to re-encode the audio signal with various lossy compression algorithms, but at the same time, the text information embedded in the audio signal maintains integrity. This statement is based on the fact that the current variety of existing compression algorithms [9][10][11][12] operates according to the same principle, namely, the elimination of the uninformative redundant component of the audio signal. Since the proposed method hides text information at medium frequencies and amplitudes of wavelet coefficients, and because this is its main feature, it can significantly increase the resistance of the stego-system to audio signal compression, taking into account the features of the psychophysiological model of sound perception. The only exceptions are those cases of completely deleting an audio file or applying critical compression with a complete loss of meaningful audio information. A quantitative assessment of the boundary values of critical compression occurrence will be obtained in an experimental study. Critical compression should be understood as the degree of compression at which text information is distorted or completely deleted (violation of semantic links) from the audio signal with a significant reduction in redundancy (compression). Then the main evaluation for the effectiveness of the proposed stego-system is the maximum degree of audio signal compression and the integrity of text information; that is, the highest compression ratio that maintains the full integrity of the semantic structures of the text.
Analysis of the literature [16][17][18][19][20][21][22][23][24][25][26] shows an almost complete absence of methods for embedding compression-resistant audio signals. One of the transformations that allows for such an embedding is the multilevel discrete wavelet transform, which has clear advantages in representing the local characteristics of the signal and takes into account the features of the psychophysiological model of sound perception. The proposed method increases the robustness of the stego-system to deliberate compression (elimination of highly informative features). We will show that the application of this approach in the development of the steganography algorithm, which is designed to achieve maximum robustness, can solve the main tasks of steganography, namely, minimization of introduced distortions and resistance to attacks by a passive intruder.
The next section is devoted to the presentation of all the main theoretical aspects of the proposed method, namely, (1) integrating text information into low-frequency wavelet coefficients of an audio signal followed by their scalar product with low-frequency and high-frequency orthogonal Daubechies wavelet filters for decomposition; (2) reconstructing of the audio signal with the text integrated into it by low-frequency and high-frequency wavelet coefficients; (3) extracting text information from low-frequency wavelet coefficients of the audio signal.

Presentation of the Proposed Method
Structural diagrams of the developed method of steganographic protection of text information based on the wavelet transform are shown in Figures 1 and 2. A detailed explanation of all the blocks on the diagram and their formal presentation are given below.
Any text information in English can be represented as an ASCII encoding, where all characters of the computer alphabet are numbered from 0 to 127, describing the ordinal number of a character in the binary number system of a seven-digit code from 0000000 to 1111111. Thus, we will form a set of numbers S = {0, 1, 2, . . . , 127} that correspond to each specific character according to the ASCII encoding. Then text information can be represented as a set T = {S i , S i , . . . , S i }, which corresponds to a sequence of numbers S i from the set S, where the occurrence of each S i in the set T is determined by the sequence of characters in the text i. Any text information in English can be represented as an ASCII encoding, where all characters of the computer alphabet are numbered from 0 to 127, describing the ordinal number of a character in the binary number system of a seven-digit code from 0000000 to 1111111. Thus, we will form a set of numbers , which corresponds to a sequence of numbers i S from the set S , where the occurrence of each i S in the set T is determined by the sequence of characters in the text i . So, given some text information 1, ,l T  , where l is the total number of characters to be hidden in the audio signal, it is necessary to perform an interleaving operation to remove statistical dependencies between characters in the text. This operation is implemented using a pseudo-random number generator (PRNG), which forms a sequence of l uniformly distributed numbers in the range ( ) 0;1 . So, given some text information T 1,...,l , where l is the total number of characters to be hidden in the audio signal, it is necessary to perform an interleaving operation to remove statistical dependencies between characters in the text. This operation is implemented using a pseudo-random number generator (PRNG), which forms a sequence of l uniformly distributed numbers in the range (0; 1).
Given a random variable, we often compute the expectation and variance, two important summary statistics. The expectation describes the average value, and the variance describes the spread (amount of variability) around the expectation.
Then, the mathematical expectation m r and variance D r of such a sequence, which consists of l pseudo-random numbers r i , should tend → to the following values In order to shuffle the characters of the set 1, ,l T  in a pseudo-random way, it necessary that the pseudo-random numbers 1, ,l x  that are generated by the PRNG are In order to shuffle the characters of the set T 1,...,l in a pseudo-random way, it is necessary that the pseudo-random numbers x 1,...,l that are generated by the PRNG are in the range (1; l), which is different from (0; 1). Numbers in the range (1; l) are equivalent to the indexes of each character of text information T 1,...,l .
To solve this problem, we can use the formula where r 1,...,l -pseudo-random numbers from the range (0; 1). The correctness of this transform is described as follows and is demonstrated in Figure 3. The correctness of this transform is described as follows ( ) and is demonstrated in Figure 3. to l . Thus, we can form a set of non-repeating numbers which will correspond to the new indexes of each character of text information 1, ,l T  .
This set of numbers 1 Key will correspond to Key 1, which is used at the stage of integrating text into an audio signal ( Figure 1) and at the stage of extracting text from an audio signal ( Figure 2).
Then, the operations of interleaving, which is used at the stage of integrating text into an audio signal (Figure 1), and de-interleaving, which is used at the stage of extracting text from an audio signal ( Figure 2), can be represented as follows: Since the low-frequency wavelet coefficients will increase their absolute power with each next level of decomposition, then the text information 1 Key T must be sorted in such a way that its integration into low-frequency wavelet coefficients occurs from the minimum min to the maximum max values in accordance with the expression Then x 1,...,l are pseudo-random numbers uniformly distributed in the range from 1 to l. Thus, we can form a set of non-repeating numbers which will correspond to the new indexes of each character of text information T 1,...,l . This set of numbers Key1 will correspond to Key 1, which is used at the stage of integrating text into an audio signal ( Figure 1) and at the stage of extracting text from an audio signal ( Figure 2). Then, the operations of interleaving, which is used at the stage of integrating text into an audio signal (Figure 1), and de-interleaving, which is used at the stage of extracting text from an audio signal (Figure 2), can be represented as follows: Since the low-frequency wavelet coefficients will increase their absolute power with each next level of decomposition, then the text information T Key1 must be sorted in such a way that its integration into low-frequency wavelet coefficients occurs from the minimum min to the maximum max values in accordance with the expression min T Key1 , . . . , max T Key1 ; this is the main task of applying the sorting operation.
So, having received text information T Key1 that was subject to the interleaving operation using Key1, it is necessary to perform a sorting operation from the minimum min to the maximum max value of the set of characters T Key1 .
We presented the input text information in the form of a set T = {S i , S i , . . . , S i }, where S = {0, 1, 2, . . . , 127} is a set of numbers that correspond to each specific character according to the ASCII encoding, and i is determined by the initial sequence of characters in the text. Therefore, the expression can be rewritten as T Key1 = {S i , S i , . . . , S i }, where i defines a sequence of numbers in the range from 0 to 127 depending on Key1.
Then, the operations of sorting, which is used at the stage of integrating text into an audio signal (Figure 1), and de-sorting, which is used at the stage of extracting text from an audio signal (Figure 2), can be written as follows: where Key2 1,...,l is the sequence of indexes of the set of characters T Key1 that was formed according to the expression min T Key1 , . . . , max T Key1 , which corresponds to Key 2 in Figures 1 and 2. So, having text information T Key2 that has undergone a sorting operation according to the condition min T Key1 , . . . , max T Key1 , it needs to be divided into L − 1 blocks, where L is a maximum number of levels of wavelet decomposition of the audio signal, since there is no text integration at the last level of wavelet decomposition ( Figure 1).
Then, the number of blocks b of text information T Key2 is determined by finding the maximum level of wavelet decomposition L of the audio signal, which can be expressed as follows: The correctness of this expression is confirmed by the fulfillment of the condition where l A is a number of samples of the audio signal, l F is a number of coefficients of the Daubechies wavelet filter, and the symbol ≈ characterizes the rounding down of a number L [27][28][29].
Then, the number of characters in one block l b of text information T Key2 is determined according to the expression where l T is a total number of characters of text information T Key2 that should be hidden in the audio signal.
It should be noted that the number of characters of text information in one block l b directly depends on the maximum level of wavelet decomposition L of the audio signal, as can be seen from Formulas (16)- (19). Then, finding the maximum level of wavelet decomposition L allows for uniformly integrating all blocks b of text information T Key2 at all decomposition levels 1, . . . , L − 1 to increase the resistance to audio signal compression, since with an increase in the decomposition level, the amplitude of the wavelet coefficients will increase and, accordingly, the amplitude of the text information integrated into them, due to the subsequent scalar product with a wavelet filter at each decomposition level 1, . . . , L − 1, which is a characteristic feature of the proposed method.
Thus text information T Key2 , which is divided into b blocks, where the number of characters in one block is l b , can be represented in the form of a set where which corresponds to the operation of dividing text information into blocks, which is used at the stage of integrating text into an audio signal, according to Figure 1. Then, the operation of combining blocks of text information Tb 1,...,b , which is used at the stage of extracting text from an audio signal, according to Figure 2, will look like this: At the final stage of preparing text information for integration into an audio signal, it is necessary to perform the normalization operation so that text information Tbn 1,...,b and audio signal An 1,...,l A are in the same normalization scale, namely, so that values of ASCII codes of text characters Tb 1,...,b and audio signal samples A 1,...,l A are in the range from 0 to 1.
Then, the restoration of the normalized text information Tbn 1,...,b and the audio signal An 1,...,l A to the original normalization (de-normalization) scale can be carried out according to the expressions Tb 1,...,b = Tbn 1,...,b · max(Tb 1,...,b ), where this sequence of operations corresponds to the blocks of normalization and denormalization, which are used at the stage of integrating text into an audio signal ( Figure 1) and extracting text from an audio signal ( Figure 2). Thus, we get blocks of normalized text information Tbn 1,...,b that are ready for integration into a normalized audio signal An 1,...,l A . However, since the integration does not take place in the audio signal An 1,...,l A itself, but in its low-frequency wavelet coefficients (LFWC) followed by their scalar product with low-frequency (LPF-D) and high-frequency (HPF-D) orthogonal Daubechies wavelet filters at each 1, . . . , L − 1 level of the wavelet decomposition, it is necessary to perform a wavelet transform of the audio signal An 1,...,l A and find the low-frequency (LFWC) and high-frequency (HFWC) wavelet coefficients for each 1, . . . , L level of the wavelet decomposition [30,31]. It should be noted that not only Daubechies filters can be used, but also other orthogonal wavelet filters, such as Coiflets, Symlets, or Meyer.
These are the main scientific results of the proposed method of steganographic hiding of text information in an audio signal based on the wavelet transform.

Results of Scientific Experimental Research
A computer model of the method of steganographic protection of text information based on the wavelet transform was modeled and studied in the MATLAB R2021b software and mathematical complex using a set of the following libraries: Signal Processing Toolbox, Wavelet Toolbox, Audio Toolbox, Text Analytics Toolbox, Filter Design HDL Coder, DSP System Toolbox, Communications Toolbox, Statistics and Machine Learning Toolbox.
In the experimental study, the initial audio signal for the proposed method of steganographic hiding of text information is a mono recording of the announcer in a male voice. The duration of mono recording is 91 s of the poem The Road Not Taken, by Robert Frost, in audio format WAV with a sampling rate of 44.1 kHz and a quantization bit depth of 16 bits per sample. Therefore, the stream of the bit sequence of audio data at the input of the computer model of the developed method will be-705. 6 Kbps, and the total amount of audio data will be-7.8 MB.
The audio signal was recorded using a sound card with a maximum sampling rate of 192 kHz, number of bits per sample of 24 bits/sample, and a signal-to-noise ratio of 116 dB using a unidirectional 16-bit condenser microphone with an audio sensitivity of 110 dB. Figure 4 shows the original audio signal before steganographic processing to embed secret text information, and Figure 5 shows the wavelet coefficients of the 17th level of decomposition, where the Daubechies function of the 12th order was used as a generating wavelet function. It should be noted that the optimal choice of the generating wavelet function and the number of decomposition levels are not trivial tasks, since the speech signal is a non-stationary process, and it is not possible to predict changes in its spectral component over time. Therefore, in practice, it is recommended to use the smoothest wavelet functions with a large number of zero moments (function order) and maximum number of possible levels of decomposition, which is determined through the energy of the signal under study and the wavelet function. This will make the wavelet spectrum of the speech signal most suitable for integrating text information.   As the initial text information (to be hidden) in the method under study, the poem The Road Not Taken by Robert Frost was used in text format TXT in the amount of 740 characters according to the rules of ASCII encoding, where 8 bits are allocated per character, from which it follows that the total amount of text information at the input of the computer model of the developed method will be 740 bytes. Figure 6 shows the original text information in symbolic form before steganographic embedding in order to hide it in the audio signal, taking into account the psychophysiological features of human hearing. Figure 7 also shows text information, but already encoded according to the ASCII encoding rules. It is the normalized values of ASCII codes that we must mask as best as possible in a highly redundant audio data stream, to hide the very fact of text transmission.   As the initial text information (to be hidden) in the method under study, the poem The Road Not Taken by Robert Frost was used in text format TXT in the amount of 740 characters according to the rules of ASCII encoding, where 8 bits are allocated per character, from which it follows that the total amount of text information at the input of the computer model of the developed method will be 740 bytes. Figure 6 shows the original text information in symbolic form before steganographic embedding in order to hide it in the audio signal, taking into account the psychophysiological features of human hearing. Figure 7 also shows text information, but already encoded according to the ASCII encoding rules. It is the normalized values of ASCII codes that we must mask as best as possible in a highly redundant audio data stream, to hide the very fact of text transmission. As the initial text information (to be hidden) in the method under study, the poem The Road Not Taken by Robert Frost was used in text format TXT in the amount of 740 characters according to the rules of ASCII encoding, where 8 bits are allocated per character, from which it follows that the total amount of text information at the input of the computer model of the developed method will be 740 bytes. Figure 6 shows the original text information in symbolic form before steganographic embedding in order to hide it in the audio signal, taking into account the psychophysiological features of human hearing. Figure 7 also shows text information, but already encoded according to the ASCII encoding rules. It is the normalized values of ASCII codes that we must mask as best as possible in a highly redundant audio data stream, to hide the very fact of text transmission.    Table 1 presents the results of an experimental study, namely, quantitative estimates of the effectiveness of the existing stego-system based on wavelet transform under conditions of passive or deliberate distortion of text information hidden in the audio signal were obtained by applying redundancy reduction methods (compression).    Table 1 presents the results of an experimental study, namely, quantitative estimates of the effectiveness of the existing stego-system based on wavelet transform under conditions of passive or deliberate distortion of text information hidden in the audio signal were obtained by applying redundancy reduction methods (compression).   Table 1 presents the results of an experimental study, namely, quantitative estimates of the effectiveness of the existing stego-system based on wavelet transform under conditions of passive or deliberate distortion of text information hidden in the audio signal were obtained by applying redundancy reduction methods (compression). The main task formulated earlier is to increase the robustness of the stego-system to compression algorithms, so that when compressing a steganocoded audio signal, the text information that is hidden inside it remains as complete as possible. Objective metrics are used to automate the processes of evaluating the effectiveness of embedding text information in an audio signal, which allow evaluating the distortions introduced by the stego-system into the original audio signal. As such, criteria for evaluating the effectiveness of the stego-system include objective metrics such as compression ratio (CR), correlation coefficient (CC), normalized root mean square error (NRMSE), and signal-to-noise ratio (SNR), peak signal-to-noise ratio (PSNR). It should be noted that CC, NRMSE, SNR, and PSNR are very sensitive to changes in the amplitude of the audio signal. Since it is the change in the amplitude of the audio signal that characterizes the degree of its distortion, this is exactly what we need to evaluate the quality of masking text information in an audio container, since this process entails signal distortion (amplitude distortion). Also, in this experimental study, Daubechies wavelet filters of the 12th order were used. This fact should be taken into account when interpreting the results obtained in CR, CC, NRMSE, SNR, and PSNR, which directly depend on the specific implementation of the audio signal, text information, and the selected wavelet filter, which will result in changes in the critical compression threshold in different versions of the experiment.
The obtained values of performance indicators should be interpreted as follows: with CR = 1, the steganocoded audio signal is not subjected to distortions introduced by the compression algorithm; at the same time, a very high psychophysiological model of sound perception (masking) is observed, which is confirmed by indicators CC = 0.9999, NRMSE = 0.0060, SNR = 36.7794, and PSNR = 63.0616. In this case, text information, when extracted from the audio signal, has ideal performance CC = 1, NRMSE = 0, SNR = ∞, and PSNR = ∞, and this means that text information has not been subjected to the slightest distortion and is completely integral. The infinity symbol ∞ in this case means an infinitely high value of the criterion. According to Table 1, the parsed text information will match the full copy of the input text; that is, at the output of the transformations, we will have a text of the form as in Figure 6. Figure 8 shows the wavelet coefficients after the audio signal is compressed by six times, but the integrity of the text information remains unchanged, which is ideal. PSNR = ∞, and this means that text information has not been subjected to the slightest distortion and is completely integral. The infinity symbol ∞ in this case means an infinitely high value of the criterion. According to Table 1, the parsed text information will match the full copy of the input text; that is, at the output of the transformations, we will have a text of the form as in Figure 6. Figure 8 shows the wavelet coefficients after the audio signal is compressed by six times, but the integrity of the text information remains unchanged, which is ideal. It should be noted that in the existing method, the indicator CR = 6 is the boundary value at which text information is not subjected to distortions of the compression algorithm; this can be seen by analyzing the values CC = 1, NRMSE = 0, SNR = ∞, and PSNR = ∞ while maintaining a sufficient quality indicator in terms of masking according to CC = 0.9861, NRMSE = 0.0681, SNR = 30.2330, and PSNR = 51.6068. In other words, at a compression level of six times, there are no audible differences between the original and steganocoded audio signals. This is the so-called 'critical level of compression', at which there is no distortion of text information, by raising the threshold above the critical compression level, distortion occurs.
For clarity, we present the values of the wavelet coefficients of the compressed steganocoded audio signal by a factor of 30 in Figure 9. From CC = 0.8632, NRMSE = 0.4984, SNR = 6.8333, and PSNR = 11.8327, it can be seen that under such conditions, it is not necessary to talk about the good sound quality of the audio signal. Also, due to the fact that there is a significant reduction in the redundancy of the steganocoded audio signal, it becomes problematic to maintain the integrity of text information in it. Consider what happens to text information with such compression. Figure 10 shows the recovered text information when the steganocoded audio signal is compressed by 30 times. According to the indicators from Table 1, with CR = 30, we have text distortion in proportion to the values CC = 0.7453, NRMSE = 0.6893, SNR = 4.0383, and PSNR = 10.958, which are sufficiently large distortions, the result of which is clearly visible in Figure 10. Consider what happens to text information with such compression. Figure 10 shows the recovered text information when the steganocoded audio signal is compressed by 30 times. According to the indicators from Table 1, with CR = 30, we have text distortion in proportion to the values CC = 0.7453, NRMSE = 0.6893, SNR = 4.0383, and PSNR = 10.958, which are sufficiently large distortions, the result of which is clearly visible in Figure 10.
As can be seen from the above, the existing method of steganographic hiding of text information in an audio signal based on the wavelet transform shows rather mediocre results in terms of compression resistance.
Let conduct an experimental study of the developed method and clearly see its advantage over the existing one. Table 2 presents the results of an experimental study of the developed method for hiding text information in an audio signal based on the wavelet transform, and as will be seen below, the proposed approach significantly increases the robustness of stego-system to the deliberate and passive elimination of redundancy to distort text information.

method).
Consider what happens to text information with such compression. Figure 10 shows the recovered text information when the steganocoded audio signal is compressed by 30 times. According to the indicators from Table 1, with CR = 30, we have text distortion in proportion to the values CC = 0.7453, NRMSE = 0.6893, SNR = 4.0383, and PSNR = 10.958, which are sufficiently large distortions, the result of which is clearly visible in Figure 10. As can be seen from the above, the existing method of steganographic hiding of text information in an audio signal based on the wavelet transform shows rather mediocre results in terms of compression resistance.
Let conduct an experimental study of the developed method and clearly see its advantage over the existing one.  In doing so CR = 1, we have CC = 0.9999, NRMSE = 0.0059, SNR = 36.7274, and PSNR = 63.7437, which corresponds to the high performance of the psychophysiological model of audio signal perception (masking efficiency), and CC = 1, NRMSE = 0, SNR = ∞, and PSNR = ∞ characterizes the integrity of text information extracted from the audio signal.
Very close attention should be paid to the results shown in Table 2 for CR = 20, namely CC = 0.9109, NRMSE = 0.2990, SNR = 16.3873, and PSNR = 28.3405: they characterize a strong distortion of the steganographic audio signal, but according to CC = 1, NRMSE = 0. SNR = ∞, PSNR = ∞ text information remains integrity. These results are quite remarkable, since when compressed by 20 times, the integrity of the text is preserved in full: it is this result that is significant in our study.
It should be remembered that, by analyzing the existing method, we obtained the boundary value CR = 6, and in the developed, CR = 20, with full integrity of text information in both cases. Then, we can make reasonable conclusions that by applying the developed method of steganographic hiding of text information in an audio signal, we will get a gain of 3.3 times compared to the existing method, thereby increasing the robustness of the stego-system to deliberate or passive compression of the audio signal in order to distort the embedded text information.
The wavelet coefficients of the steganocoded audio signal after 20-times compression are shown in Figure 11. According to Table 2, CR = 20 is a borderline result, above which text information will be distorted.  Figure 12 shows the wavelet coefficients of the steganocoded audio signal after compression by 30 times. Given such compression, according to the values of the metrics CC = 0.9133, NRMSE = 0.3433, SNR = 21.3553, and PSNR = 34.3475, it can be concluded that text information is distorted, but comparing them with the indicators in Table 1 at the same compression level CC = 0.7453, NRMSE = 0.6893, SNR = 4.0383, and PSNR = 10.958, we will come to the conclusion that objectively, we have many times gain in the fight against distortions, all other things being equal, using the developed method of steganographic hiding of text information in an audio signal.  Figure 13 shows text information with 30-fold compression of a steganocoded audio signal using the developed method. It is clearly seen that distortion occurs, but in comparison with the existing concealment method, the results of which are shown in Figure  10, we have a significant increase in the effective steganographic processing of audio signals to embed text information.
According to the results obtained in the experimental study, it is possible to draw reasonable conclusions that the proposed method of steganographic protection of text  Figure 12 shows the wavelet coefficients of the steganocoded audio signal after compression by 30 times. Given such compression, according to the values of the metrics CC = 0.9133, NRMSE = 0.3433, SNR = 21.3553, and PSNR = 34.3475, it can be concluded that text information is distorted, but comparing them with the indicators in Table 1 at the same compression level CC = 0.7453, NRMSE = 0.6893, SNR = 4.0383, and PSNR = 10.958, we will come to the conclusion that objectively, we have many times gain in the fight against distortions, all other things being equal, using the developed method of steganographic hiding of text information in an audio signal.  Figure 12 shows the wavelet coefficients of the steganocoded audio signal after compression by 30 times. Given such compression, according to the values of the metrics CC = 0.9133, NRMSE = 0.3433, SNR = 21.3553, and PSNR = 34.3475, it can be concluded that text information is distorted, but comparing them with the indicators in Table 1 at the same compression level CC = 0.7453, NRMSE = 0.6893, SNR = 4.0383, and PSNR = 10.958, we will come to the conclusion that objectively, we have many times gain in the fight against distortions, all other things being equal, using the developed method of steganographic hiding of text information in an audio signal.  Figure 13 shows text information with 30-fold compression of a steganocoded audio signal using the developed method. It is clearly seen that distortion occurs, but in comparison with the existing concealment method, the results of which are shown in Figure  10, we have a significant increase in the effective steganographic processing of audio signals to embed text information.
According to the results obtained in the experimental study, it is possible to draw reasonable conclusions that the proposed method of steganographic protection of text  Figure 13 shows text information with 30-fold compression of a steganocoded audio signal using the developed method. It is clearly seen that distortion occurs, but in comparison with the existing concealment method, the results of which are shown in Figure 10, we have a significant increase in the effective steganographic processing of audio signals to embed text information.

Conclusions
The developed method of steganographic hiding of text information in audio signal based on the wavelet transform increases the robustness of the stego-system to compression of the steganocoded audio signal, while maintaining the integrity of text information, taking into account the features of the psychophysiological model of sound perception.
The results of the obtained experimental studies confirm the hypothesis; namely, the proposal to use recursive embedding in the low-frequency region (approximating wavelet coefficients) followed by scalar product with wavelet function, which will increase the average power of hidden data. The results are given in Table 2 for CR = 20; namely, CC = 0.9109, NRMSE = 0.2990, SNR = 16.3873, and PSNR = 28.3405: they characterize a strong distortion of the steganographic audio signal, but according to CC = 1, NRMSE = 0, SNR = ∞, and PSNR = ∞, text information remains integrity. These results are quite remarkable, since when compressed by 20 times, the integrity of the text is preserved completely; this result is most significant in our study.
It should be noted that upon analyzing the existing method, which is based on embedding text information in the high-frequency component (detailed wavelet coefficients) we obtained the limit CR = 6, and in the developed, CR = 20, with full integrity of the text information in both cases. Thus, we can make reasonable conclusions that by applying the developed method of steganographic hiding of text information in audio signal, we will get a gain of 3.3 times compared to the existing method. Therefore, the resistance of the stego-system is increased by 3.3 times to deliberate or passive compression of the audio signal in order to distort the embedded text information. According to the results obtained in the experimental study, it is possible to draw reasonable conclusions that the proposed method of steganographic protection of text information is promising in this area and requires further research.

Conclusions
The developed method of steganographic hiding of text information in audio signal based on the wavelet transform increases the robustness of the stego-system to compression of the steganocoded audio signal, while maintaining the integrity of text information, taking into account the features of the psychophysiological model of sound perception.
The results of the obtained experimental studies confirm the hypothesis; namely, the proposal to use recursive embedding in the low-frequency region (approximating wavelet coefficients) followed by scalar product with wavelet function, which will increase the average power of hidden data. The results are given in Table 2 for CR = 20; namely, CC = 0.9109, NRMSE = 0.2990, SNR = 16.3873, and PSNR = 28.3405: they characterize a strong distortion of the steganographic audio signal, but according to CC = 1, NRMSE = 0, SNR = ∞, and PSNR = ∞, text information remains integrity. These results are quite remarkable, since when compressed by 20 times, the integrity of the text is preserved completely; this result is most significant in our study.
It should be noted that upon analyzing the existing method, which is based on embedding text information in the high-frequency component (detailed wavelet coefficients) we obtained the limit CR = 6, and in the developed, CR = 20, with full integrity of the text information in both cases. Thus, we can make reasonable conclusions that by applying the developed method of steganographic hiding of text information in audio signal, we will get a gain of 3.3 times compared to the existing method. Therefore, the resistance of the stego-system is increased by 3.3 times to deliberate or passive compression of the audio signal in order to distort the embedded text information.
The results obtained in this scientific study can be used to build systems for hiding text information in an audio file, but unlike existing methods, the developed method implements the proposed approach of the scalar product of the low-pass Daubechies filter with wavelet coefficients, where blocks of text information are already integrated. Therefore, there is an increase in the average power of low-frequency wavelet coefficients and an increase in the power of normalized ASCII codes of text information. At the same time, the developed method introduces more distortions into the signal than the existing methods, but in case of usage of audio signal with a high bitrate, we will get at the output a signal with indistinguishable quality. Because of this, we will increase by 3.3 times the resistance to intentional or unintentional compression of the output audio signal. Another disadvantage of the proposed method is that the amount of information that can be integrated into audio signal with equal measures of quality will be significantly less than in existing approaches, given the fact that the error will grow with each successive level of wavelet decomposition. Therefore, it must be emphasized that this approach will be very effective if not a large amount of data is hidden in the audio container; that is, with an increase in the amount of textual information that must be integrated into the audio signal, the effectiveness of this approach will decrease. In case of a need to hide a small amount of data, this approach will be many times more efficient than existing methods. The authors plan to consider specific quantitative assessments, at which this method will not be effective, in following scientific studies. Currently, we can conclude that when integrating text information with a volume of 740 bytes into audio signal with a volume of 7.8 MB, we get very decent results: an increase in the critical compression threshold of 3.3 times.