Simultaneous Audio Encryption and Compression Using Compressive Sensing Techniques

The development of coding schemes able to simultaneously encrypt and compress audio signals is a subject of active research because of the increasing need to transmit sensitive audio information over insecure communication channels. Several schemes have been developed that first compress the digital information and subsequently encrypt the result. These schemes compress and encrypt the information efficiently. However, they may compromise the information, because it can be accessed before encryption. To overcome this problem, a compressive sensing-based system that simultaneously compresses and encrypts audio signals is proposed, in which the audio signal is segmented into frames of 1024 samples and each frame is transformed into a sparse frame using the discrete cosine transform (DCT). Each frame is then multiplied by a different sensing matrix generated using the chaotic mixing scheme, which allows the proposed scheme to satisfy the extended Wyner secrecy (EWS) criterion. The evaluation results obtained using several genres of audio signals show that the proposed system simultaneously compresses and encrypts audio signals while satisfying the EWS criterion.


Introduction
The large amount of digital information transmitted over insecure channels has created the need for efficient schemes that increase the amount of information that can be transmitted over the existing channels while also improving the security of the transmitted information. To meet these two requirements, many efforts have been made to develop encoding schemes able to simultaneously compress and encrypt audio signals before their transmission over insecure communication channels [1,2]. These topics have attracted the attention of a significant number of researchers, leading to the development of several efficient schemes that first compress the information and subsequently encrypt it. These schemes simplify the encryption task, because the redundant information has already been eliminated by the compression operation. However, because the compressed information is stored before encryption, its security may be compromised, as it can be accessed before the encryption task is performed. To overcome this problem, several schemes have been proposed in which the information is first encrypted and the resulting information is then compressed [1]. The main disadvantage of such schemes is that a lossless compression scheme must be used.
At the receiver side, the sensing matrix required for decoding is generated using the secret keys $k_1$, $k_2$, and $k_3$. After the matrix $A$ is generated, the sensing matrix $A_0$, of size $m \times n$, used for compressing and encrypting the first block of the audio signal, is obtained using the chaotic mixing approach described in Section 2.1. Next, the first block of the input signal, $X_0(k) = X(k)$, $k = 1, 2, \ldots, n$, is extracted and transformed using the DCT to obtain a sparse representation of this block, $S_0$. Then, $S_0$ is multiplied by the sensing matrix $A_0$ to generate the compressed and encrypted version of the first block of the input signal, $y_0$, which is transmitted to the receiver side. In general, the input signal $X(k)$ is segmented into non-overlapping blocks given by $X_i(k) = X(k + in)$, $k = 1, 2, \ldots, n$; $i = 0, 1, 2, \ldots$, which are then transformed using the DCT to generate a sparse frame, $S_i$. Next, using the chaotic mixing method [13,14], the $m \times n$ sensing matrix of the i-th frame, $A_i$, is generated from the random matrix $A$. This makes it possible to generate a different sensing matrix for each frame without significantly increasing the computational complexity, while satisfying the extended Wyner secrecy (EWS) criterion [12]. Next, the sparse vector $S_i$, estimated using the DCT, is multiplied by the sensing matrix $A_i$ to obtain the encrypted frame, $y_i$, with a compression rate of $n/m$, which is sent to the reception side. In the receiver stage, shown in Figure 1b, the received information is decoded using the RSA decryption module, which recovers the values of $m$ and $n$ as well as the user's secret keys $k_1$, $k_2$, and $k_3$. These parameters are then used to generate the matrix $A$, and then, using the chaotic mixing method, the sensing matrix for the i-th block, $A_i$, is generated in the same way as in the encoding stage. Next, the sensing matrix $A_i$ and the received frame $y_i$ are fed into the CS recovery stage to obtain $\hat{S}_i$. Then, the inverse DCT (IDCT) of $\hat{S}_i$ is computed to obtain $\hat{X}_i$, which is concatenated with the previously decoded frames to estimate the decoded signal. The encoding and decoding process described above is performed for each frame of the input signal $X$. The following sections provide a description of each stage of the proposed system.

Sensing Matrix Generation
The sensing matrix, $A$, required in a CS-based audio compression system becomes the secret key of the proposed encryption system. To obtain sufficiently accurate signal decoding, the sensing matrix $A$ must satisfy the restricted isometry property (RIP), given by [6,15,16]

$(1 - \delta)\|S\|_2^2 \le \|AS\|_2^2 \le (1 + \delta)\|S\|_2^2$, (1)

where $0 \le \delta \le 1$. Because

$\|AS\|_2^2 = S^T A^T A S$ (2)

and assuming that $A^T A = \sigma^2 I$, from Equation (1), it follows that

$(1 - \delta)\|S\|_2^2 \le \sigma^2 \|S\|_2^2 \le (1 + \delta)\|S\|_2^2$. (3)

Thus, $A$ satisfies the RIP if $\sigma^2 = 1$. Then, if the sensing matrix, $A$, satisfies (3), the signal $S$ can be accurately recovered [6]. In the encoding stage, the sensing matrix $A$ is constructed using a pseudo-random number generator whose initial value is the key $k_1$ provided by the user, while in the decoding stage the same user key, $k_1$, is used to generate the sensing matrix $A$ required to decode $S$. To this end, first, $(L/2)^2$ pairs of uniformly distributed random numbers $(U_j, V_j)$, $j = 0, 1, \ldots, L/2 - 1$, are generated [16]. Next, using the Marsaglia polar method [16], these pairs are converted into $L^2$ Gaussian distributed random numbers, which are used to build the matrix $A$. The matrix $A$, described by (5) and (6), is used to generate the sensing matrices required to compress and encrypt the input signal, simultaneously satisfying the RIP [15,17] and the EWS criterion [10].
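The key-driven construction of the Gaussian matrix can be sketched as follows. This is a minimal illustration only: it assumes Python's `random.Random` as the pseudo-random generator (the text does not name one), and the `1/sqrt(rows)` scaling is an assumption chosen so that the expected energy of a vector is preserved. The function names are hypothetical.

```python
import math
import random


def marsaglia_pair(rng):
    """Return two independent N(0, 1) samples via the Marsaglia polar method."""
    while True:
        u = rng.uniform(-1.0, 1.0)
        v = rng.uniform(-1.0, 1.0)
        w = u * u + v * v
        if 0.0 < w < 1.0:  # keep only points strictly inside the unit disc
            factor = math.sqrt(-2.0 * math.log(w) / w)
            return u * factor, v * factor


def gaussian_matrix(seed_k1, rows, cols):
    """Build a rows x cols Gaussian matrix whose entries depend only on seed_k1."""
    rng = random.Random(seed_k1)   # the user key k1 acts as the generator seed
    scale = 1.0 / math.sqrt(rows)  # assumption: entry variance 1/rows
    values = []
    while len(values) < rows * cols:
        values.extend(marsaglia_pair(rng))
    values = values[: rows * cols]
    return [[scale * values[r * cols + c] for c in range(cols)]
            for r in range(rows)]
```

Because the generator is seeded with the key, the encoder and decoder obtain the same matrix from the same `seed_k1`, while a different key yields an unrelated matrix.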
To satisfy the EWS criterion [10], a different sensing matrix, which must also satisfy the RIP, has to be used in each frame. To generate a different sensing matrix for each frame, the chaotic mixing scheme [13,14] is applied to the random matrix $A$, as described in Figure 2 and Equations (7)-(15). The chaotic mixing method is used because it modifies only the positions of the matrix elements and not their values, mapping each position of the matrix to another position of the same matrix. To this end, the location $(x, y)$ of the element $A(x, y)$ is modified using the matrix $B_L$ of [13,14], which satisfies $\det(B_L) = 1$ and $\mathrm{tr}(B_L) = \lambda_1 + (1/\lambda_1)$, where $\lambda_1$ is the largest eigenvalue of $B_L$. Indeed, denoting by $\lambda_1$ and $\lambda_2$ the largest and smallest eigenvalues of $B_L$ [13,14], it follows that $\mathrm{tr}(B_L) = \lambda_1 + \lambda_2 = \lambda_1 + (1/\lambda_1)$ and $\det(B_L) = \lambda_1 \lambda_2 = 1$.
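A position-only scrambling of this kind can be sketched as below. The integer matrix `[[1, 1], [k, k + 1]]` used here is a common unit-determinant choice for chaotic mixing and is an assumption, as are the parameter names `k` and `iterations`; the matrix $B_L$ of the scheme may differ. Since the determinant is 1, the map is a bijection on the index grid, so values are only relocated, never altered.

```python
def chaotic_mix(matrix, k, iterations):
    """Permute the element positions of a square matrix.

    Each position (x, y) is sent to
        (x + y, k * x + (k + 1) * y)  modulo the matrix size,
    which corresponds to the unit-determinant matrix [[1, 1], [k, k + 1]].
    """
    size = len(matrix)
    current = [row[:] for row in matrix]
    for _ in range(iterations):
        mixed = [[None] * size for _ in range(size)]
        for x in range(size):
            for y in range(size):
                new_x = (x + y) % size
                new_y = (k * x + (k + 1) * y) % size
                mixed[new_x][new_y] = current[x][y]
        current = mixed
    return current


# Example: a 4 x 4 matrix scrambled with hypothetical parameters k=3, 2 iterations.
demo = [[4 * r + c for c in range(4)] for r in range(4)]
scrambled = chaotic_mix(demo, k=3, iterations=2)
```

Using a different number of iterations per frame is one way to derive a distinct matrix for each frame from a single random matrix.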

Public Key Encryption of the Secret User Keys
In the encoding stage of the proposed system, the compression parameters $m$ and $n$, together with the user secret keys $k_1$, $k_2$, and $k_3$, are first encrypted with the public key RSA algorithm [14,18-21], whose security depends on the difficulty of factoring large integers into their prime components. For the transmitter A to be able to send this information to the receiver B using the RSA algorithm, the receiver B must first send to the transmitter A its modulus $N_B = p_B q_B$, where $p_B$ and $q_B$ are two secret prime numbers of B, together with a non-secret public encryption exponent $E_B$. Additionally, the receiver generates its secret decryption exponent $D_B$. Thus, using the parameters received from B, the secret keys of the proposed scheme are transmitted by the encoder stage as [18,21]

$C_j = (k_j)^{E_B} \bmod N_B$; $j = 1, 2, 3$.

Next, using $D_B$, the receiver decrypts the secret keys sent by the transmitter, which are used for generating the sensing matrix $A_i$, as follows [18,21]:

$k_j = (C_j)^{D_B} \bmod N_B$,

where $D_B$ satisfies the relation

$E_B D_B \equiv 1 \pmod{(p_B - 1)(q_B - 1)}$,

where $p_B$ and $q_B$ are the two secret prime numbers generated at the reception side. Finally, the receiver sends to the transmitter a confirmation message, given by [18,21]

$C_M = M^{D_B} \bmod N_B$,

which is decrypted by the transmitter as follows [18,21]:

$M = (C_M)^{E_B} \bmod N_B$.

Because $E_B$ is public, the message $M$ can be recovered by any member of the network; however, only the receiver B can have sent the confirmation of the encryption of the message $M$.
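The key exchange above can be illustrated with a toy RSA example. The primes, exponent, key values, and message below are hypothetical and far too small for real use; a practical system would rely on a vetted cryptographic library with large primes and padding. The names `N_B`, `E_B`, and `D_B` follow the roles described in the text.

```python
def make_rsa_keys(p, q, e):
    """Return (modulus, public exponent, private exponent) for primes p and q."""
    modulus = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e, available since Python 3.8
    return modulus, e, d


def modexp(value, exponent, modulus):
    """Modular exponentiation, used for both encryption and decryption."""
    return pow(value, exponent, modulus)


# Receiver B publishes (N_B, E_B) and keeps D_B secret.
N_B, E_B, D_B = make_rsa_keys(61, 53, 17)

# Transmitter A encrypts hypothetical user keys k1, k2, k3 (each must be < N_B).
user_keys = [1234, 7, 42]
ciphertexts = [modexp(k, E_B, N_B) for k in user_keys]

# Receiver B recovers the keys with its private exponent.
recovered = [modexp(c, D_B, N_B) for c in ciphertexts]

# Confirmation: B transforms a message with D_B, and anyone can check it with E_B.
message = 99
confirmation = modexp(message, D_B, N_B)
checked = modexp(confirmation, E_B, N_B)
```

Here `recovered` equals `user_keys` and `checked` equals `message`, mirroring the decryption and confirmation steps.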

Encrypted and Compressed Signal
In the transmission stage, the input audio signal, $X(k)$, is first segmented into a set of non-overlapping frames, such that its i-th frame is given by $X_i(k) = X(k + in)$, where $1 \le k \le n$. $X_i$ is then transformed to the DCT domain, which provides a k-sparse representation with only $k \ll n$ nonzero terms, that is, $S_i = \Psi X_i$. Finally, the encrypted and compressed signal is computed by multiplying the sparse vector by the i-th sensing matrix $A_i$, of size $m \times n$. Thus, the i-th frame of the transmitted signal is given by [6,17,22]

$y_i = A_i S_i = A_i \Psi X_i$,

where $\Psi$ denotes the DCT basis functions. According to compressive sensing theory, $S_i$ can be reconstructed if the input signal is represented with at least $m$ samples, where $m \ge O(k \log n)$ [6,17,22].
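The encoding chain (framing, DCT, multiplication by the sensing matrix) can be sketched in a few lines. This is an illustration under assumptions: an orthonormal DCT-II is used for $\Psi$, the sensing matrix is passed in as a list of $m$ rows of length $n$, and the helper names are hypothetical. The direct DCT below costs $O(n^2)$ per frame and is meant for clarity, not speed.

```python
import math
import random


def dct_ortho(frame):
    """Orthonormal DCT-II of a list of samples (direct O(n^2) evaluation)."""
    n = len(frame)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(frame[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                  for t in range(n))
        coeffs.append(scale * acc)
    return coeffs


def split_frames(signal, n):
    """Split a signal into non-overlapping frames of n samples (tail dropped)."""
    usable = len(signal) - len(signal) % n
    return [signal[i:i + n] for i in range(0, usable, n)]


def encode_frame(frame, sensing):
    """Return y = A * DCT(frame), where sensing holds the m rows of A."""
    sparse = dct_ortho(frame)
    return [sum(a * s for a, s in zip(row, sparse)) for row in sensing]


# Example with a short frame length (the paper uses 1024 samples per frame).
n, m = 16, 8
rng = random.Random(0)  # stand-in for the key-driven matrix generator
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
signal = [math.cos(2 * math.pi * 3 * t / 32) for t in range(32)]
encoded = [encode_frame(f, A) for f in split_frames(signal, n)]
```

Each encoded frame has $m$ values instead of $n$, which is where the compression comes from.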

Decrypted and Decompressed Signal
The transmitted signal is decrypted and decompressed by minimizing the $\ell_1$ norm because, if the frame $S_i$ is sparse enough, the probability that the recovered signal is almost equal to the original one is very high [6,17,22]. That is, for a given sensing matrix $A_i \in \mathbb{R}^{m \times n}$ and a received vector $y_i \in \mathbb{R}^m$, the i-th transmitted sparse vector, $\hat{S}_i$, can be estimated as [6,17,22]

$\hat{S}_i = \arg\min_{S \in \mathbb{R}^n} \|S\|_1$ given that $y_i = A_i S$, (22)

using orthogonal matching pursuit (OMP) [6]. Finally, the transmitted frame $X_i$ is estimated by computing the inverse DCT of $\hat{S}_i$, that is, $\hat{X}_i = \Psi^{-1} \hat{S}_i$. Because $k, m \ll n$ and $S_i$ is sparse, it can be recovered with about $k \times n \times m$ operations [6], which makes the CS-based scheme a highly competitive compression-encryption system. However, as in any other encryption system, its security is of great importance. Thus, the next subsection presents a security analysis of the CS-based encryption system.
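A compact version of the OMP recovery step is sketched below. It is a greedy approximation to the sparse recovery problem rather than an exact $\ell_1$ solver, and it assumes that the sparsity level is known and that the selected columns are linearly independent. The least-squares step uses the normal equations for brevity; the helper names are hypothetical.

```python
def solve_linear(mat, rhs):
    """Solve a small square system by Gaussian elimination with pivoting."""
    size = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * size
    for r in range(size - 1, -1, -1):
        acc = aug[r][size] - sum(aug[r][c] * sol[c] for c in range(r + 1, size))
        sol[r] = acc / aug[r][r]
    return sol


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def omp(sensing, y, sparsity):
    """Estimate a sparse vector S with y = A S, where sensing holds the rows of A."""
    m, n = len(sensing), len(sensing[0])
    cols = [[sensing[r][c] for r in range(m)] for c in range(n)]
    residual = y[:]
    support, coeffs = [], []
    for _ in range(sparsity):
        # Pick the unused column most correlated with the current residual.
        scores = [abs(dot(col, residual)) for col in cols]
        for j in support:
            scores[j] = -1.0
        support.append(max(range(n), key=scores.__getitem__))
        # Least-squares fit of y on the selected columns (normal equations).
        gram = [[dot(cols[i], cols[j]) for j in support] for i in support]
        rhs = [dot(cols[i], y) for i in support]
        coeffs = solve_linear(gram, rhs)
        residual = [y[r] - sum(c * cols[j][r] for c, j in zip(coeffs, support))
                    for r in range(m)]
    estimate = [0.0] * n
    for c, j in zip(coeffs, support):
        estimate[j] = c
    return estimate
```

The decoded frame would then be obtained by applying the inverse DCT to the returned estimate.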

Security Analysis of CS-Based System
The security of the proposed system strongly depends on the sensing matrix used by an unauthorized decoder being sufficiently different from that used for encoding. To carry out this analysis, consider the binary hypothesis testing theory developed by Ramezani-Mayiami et al. [5], using the Neyman-Pearson test. Using this theory, it can be shown that, when the same sensing matrix, $A_i$, is used in both the encoding and decoding stages, the probability of correctly detecting the transmitted signal, $P_s$, is given by (23) [5]. Meanwhile, when two different matrices, $A_i$ and $B_i$, are used in the encoding and decoding stages, the probability of correct detection, $P_d$, is given by (26), where $\theta$ is the angle between the two corresponding sub-spaces. Because $0 \le \cos(\theta) \le 1$, from (23) and (26), it follows that, when these sub-spaces are orthogonal, that is, $\cos(\theta) = 0$, the decoded signal is completely useless because, in this situation, $P_d = 0.5$. From the information theoretic perspective, this situation corresponds to perfect secrecy [5].
Equations (23)-(26) show that, to correctly decode the incoming signal, the decoding sensing matrix must be the same as the encoding one. Thus, a possible attack is to try to estimate the sensing matrix from several received frames by means of blind signal separation methods, such as independent component analysis (ICA). It is therefore necessary to determine the conditions that increase the security against ICA or other blind separation analyses. To analyze the security of the CS-based cryptosystem, it is assumed that the secrecy of the sensing matrix $A$ is guaranteed. Consider the plain text to be k-sparse, where $k < n$, such that there are at most $k$ nonzero elements in a frame $S_i$ of length $n$, and the CS-based encoded signal is given by [6,10]

$y = AS$, (27)

where $y \in \mathbb{R}^m$ is the encoded vector, $A \in \mathbb{R}^{m \times n}$ is the sensing matrix, and $S \in \mathbb{R}^n$ is the input vector. Next, define $A = [a, b]$ and $S = [S_a^T, S_b^T]^T$, with $a \in \mathbb{R}^{m \times (n-k)}$, $b \in \mathbb{R}^{m \times k}$, $S_a \in \mathbb{R}^{(n-k) \times 1}$, and $S_b \in \mathbb{R}^{k \times 1}$, and adopt, without loss of generality, the assumptions (28)-(31) of [10]. Substituting (28)-(31) into (27) and using the Moore-Penrose inverse matrix gives the expressions (32) and (33) [10]. Next, the conditional entropy given $S$ is considered [10,19], where the entropy of $S$ and the conditional entropies satisfy (34)-(38), the last of which is the chain rule for the joint entropy involving $a$.
Substituting (38) and (35) into (36), and using (34), a further relation for this conditional entropy is obtained [10]. Assuming that the joint entropy is smaller than or equal to the sum of the entropies of its elements, and using the fact that all the elements given by (33) have the same distribution, Equation (41) follows, where $B$ is the number of bits used for representing an information sample, that is, an audio sample in an audio signal or a pixel in an image.
Next, consider the conditional mutual information with $S$, given $a$, which is expressed as the difference of two conditional entropies in (44) [6], and substitute (41) into (44). Consider also the mutual information between the encoded vector $y$ and the sensing matrix $A$, which can be expanded using the chain rule and the fact that $A = [a, b]$ [10]. Because $a$ is independent of $b$ and $S$ is independent of $A$, and because the elements of $A$ and $S$ are statistically independent, this expansion simplifies as shown in (50)-(60) [10]. Finally, assuming that the sub-matrix $b$ has $k \times m$ entries, which are mutually independent, Equations (68) and (69) follow, and from them the sum of the two conditional mutual information terms is bounded above by a sum of two entropies (70) [10].
From (70) and (68), and because mutual information is always non-negative, the sum of the mutual information terms approaches zero as $n$ increases. Thus, the CS-based joint encryption-compression system satisfies the EWS criterion [10] when the key is used only once.

Experimental Results
To evaluate the compression and encryption capability of the proposed algorithm, different genres of audio signals, such as Mexican, Caribbean, classical, pop, and rock music, as well as speech signals, were simultaneously compressed and encrypted using different compression rates. These signals were encoded and decoded using either the same or different sensing matrices. The tests performed to evaluate the security performance of the proposed system are described in the following subsections.

Waveform Plotting
One of the most common evaluations of system performance is waveform plotting, which allows a visual comparison of the original audio and the decrypted/decompressed signals. Figure 3a-e show the results for a 1.1 s violin audio segment from a Bach concert, sampled at 44 kHz with 16 bits/sample, that is, with a bit rate of 704 kb/s without compression; the original signal is plotted in Figure 3a. Figure 3b shows the decrypted signal when the same sensing matrix is used for both encryption and decryption, without compression. Figure 3c shows the decrypted signal when the sensing matrix used for encryption is different from that used for decryption; in this case, the original signal was also encoded without any compression. Figure 3d plots the decrypted/decompressed signal when the sensing matrices used for encryption/compression and decryption/decompression are the same; in this situation, the original signal was encoded with 176 kb/s. Finally, Figure 3e shows the decoded signal when the sensing matrix used for decoding is different from that used during the encoding process. Figure 3f plots the original signal. Figure 3g plots the decoded signal when the encoding and decoding sensing matrices are the same and the original signal was compressed to 352 kb/s. Finally, Figure 3h plots the decoded signal obtained when different sensing matrices are used during the encoding and decoding processes; in this case, the transmission rate was also 352 kb/s. Figure 3a-h show that, when the same sensing matrix is used for encoding and decoding, the decoded signal closely resembles the original one, independently of the audio genre and compression rate used. On the other hand, when the sensing matrix used for decoding is different from that used for encoding, the decoded signal is quite different from the original one, even though, for some genres, the envelope retains some similarity.
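The bit rates quoted in the experiments can be related to the frame sizes with a short calculation. The sketch below assumes that each compressed measurement is stored with the same 16 bits as an input sample, so that the bit rate scales by $m/n$; the values of $m$ listed are inferred from the quoted rates and are not stated in the text.

```python
def bit_rate_kbps(sample_rate_hz, bits_per_sample, n, m):
    """Bit rate in kb/s when n input samples are reduced to m measurements."""
    return sample_rate_hz * bits_per_sample * m / n / 1000.0


# 44 kHz, 16 bits/sample, frames of n = 1024 samples.
rates = {m: bit_rate_kbps(44_000, 16, 1024, m) for m in (1024, 512, 256, 128)}
# rates maps m to kb/s: 1024 -> 704.0, 512 -> 352.0, 256 -> 176.0, 128 -> 88.0
```

Under this assumption, the 352, 176, and 88 kb/s cases correspond to keeping one half, one quarter, and one eighth of the samples per frame.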

Spectrogram
Another important evaluation method consists of comparing the spectral characteristics of the original, encrypted, and decrypted signals using different compression rates. Figure 4a-f show the spectrograms of violin music from a Bach concert, encrypted using compressive sensing with different compression rates. Figure 4a shows the spectrogram of the original signal. Figure 4b shows the spectrogram of the encrypted signal without compression. Figure 4c shows the spectrogram of the decrypted and decompressed signal when the signal, encoded with a bit rate of 176 kb/s, is decoded using the same sensing matrix used during the encoding process. Figure 4d shows the spectrogram of the decoded signal when a sensing matrix different from that used during the encoding process is employed; here, the original signal was also encoded with a bit rate of 176 kb/s. Figure 4e shows the spectrogram of the decoded signal obtained when the original signal is encoded with a bit rate of 88 kb/s and decoded using the same matrix used during the encoding process. Finally, Figure 4f shows the spectrogram of the signal decoded using a sensing matrix different from that used during the encryption and compression processes. These figures show that the spectrum obtained when the sensing matrices used for encoding and decoding are different is almost flat, and that it is very difficult to infer the signal shown in Figure 4a from knowledge of the signal in Figure 4b.
Thus, when the decoded signal is obtained using the same sensing matrix in both the encoding and decoding stages, its spectrogram clearly resembles that of the original signal, while the spectrogram of the signal decoded using a different sensing matrix is clearly different.

Pearson Correlation Analysis
Another important parameter used for evaluating the similarity between the original signal and the decoded one is the Pearson correlation coefficient, which is given as follows:

$r = \frac{\sum_{n=1}^{N} (x_o(n) - \bar{x}_o)(x_d(n) - \bar{x}_d)}{\sqrt{\sum_{n=1}^{N} (x_o(n) - \bar{x}_o)^2 \sum_{n=1}^{N} (x_d(n) - \bar{x}_d)^2}}$,

where $x_o(n)$ and $x_d(n)$ are the original and decoded signals, and $\bar{x}_o$ and $\bar{x}_d$ are their mean values. Figure 5a-h show the comparison of the Pearson correlation coefficient obtained when the received signal is decoded using the same sensing matrix used for encoding ($A_s$), together with the Pearson correlation coefficient obtained when the received signal is decoded using a sensing matrix different from that used during the encoding process ($A_d$). Figure 5a shows the Pearson correlation coefficients when the original signal is popular music with a bit rate of 704 kb/s. Figure 5b shows the Pearson correlation coefficients when the original signal is encoded using 352 kb/s. Figure 5i shows that, when the same matrix is used for encoding and decoding, the dispersion diagram is close to a straight line, which means that the input signal can be obtained from the decoded audio signal. Figure 5j shows that, when the sensing matrices used for encoding and decoding are different, the dispersion diagram is widely spread, such that the original signal cannot be inferred from the decoded one. The evaluation results show that the Pearson correlation coefficient of each frame, between the original and decoded signals, is close to zero, around $10^{-2}$, when the sensing matrices used in the encoding and decoding stages are different. In this case, the dependence between the original and decoded signals is too weak for the original signal to be estimated from the decoded one. Thus, it would be difficult for an intruder to recover the audio signal during transmission. On the other hand, when the sensing matrices used for encoding and decoding are the same, the correlation coefficients for each frame are close to one, even when the bit rate used is relatively low.
This can be observed in the dispersion diagram of Figure 5i, where the samples of the decoded signal group around a straight line when plotted against those of the original signal. This means that the original signal can be accurately inferred from the decoded one. Thus, the proposed system allows secure and high-quality audio signal transmission.
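The frame-by-frame correlation analysis can be sketched as follows. This uses the standard definition of the Pearson coefficient; the frame length default and the function names are illustrative assumptions, and returning 0.0 for a constant frame is a convention chosen here to avoid a division by zero.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # convention for constant frames
    return cov / math.sqrt(var_x * var_y)


def framewise_pearson(original, decoded, frame_len=1024):
    """Correlation of each complete frame of the original and decoded signals."""
    count = min(len(original), len(decoded)) // frame_len
    return [pearson(original[i * frame_len:(i + 1) * frame_len],
                    decoded[i * frame_len:(i + 1) * frame_len])
            for i in range(count)]
```

Values near one per frame indicate a faithful reconstruction, while values near zero indicate that the decoded frame carries little linear information about the original.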

Normalized Mean Square Error Analysis
Another important parameter used to evaluate the quality of the proposed system is the normalized mean square error between the original and decoded signals, when the sensing matrices used for decoding are the same as or different from those used for encoding. The normalized mean square error (MSE) is given by

$\mathrm{MSE} = \frac{\sum_{n} (x_o(n) - x_d(n))^2}{\sum_{n} x_o^2(n)}$, (76)

where $x_o(n)$ and $x_d(n)$ are the original and recovered signals, respectively. For evaluating the performance of the proposed system, several audio signals were used, such as popular Mexican and Caribbean music, pop music, classic music, and rock music signals sampled at 44 kHz. Each signal is encoded using 16 bits/sample, that is, a bit rate of 704 kb/s. For encoding, as described in Section 2.1, each signal is divided into frames of 1024 samples before computing the DCT, whose resulting vector is multiplied by the sensing matrix. Figure 6a-h show the MSE obtained when the input signals are decoded using either the same sensing matrix used for encoding or a different one, that is, the correct or an incorrect private secret key. These figures show that the MSE obtained when the decoding sensing matrix is different from that used during the encoding process, that is, with an incorrect private decoding key, is larger, whereas when the correct sensing matrix is used, the MSE is close to zero. This fact can also be observed from Figure 5i,f, which show that, when the same matrix is used for encoding and decoding, the decoded signal closely approaches the original one, which results in an approximation error close to zero. If the MSE given by (76) is regarded as the inverse of the signal-to-noise ratio (SNR), that is, $\mathrm{MSE}^{-1} = \mathrm{SNR}$, the evaluation results show that, when the same matrix is used for encoding and decoding, a decoded signal with high SNR, that is, a high-quality signal, is obtained.
Meanwhile, when the sensing matrices used for encoding and decoding are different, a rather noisy decoded signal with an SNR smaller than 0 dB is obtained, which results in an unintelligible decoded signal. Thus, it can be expected that the proposed system allows the secure transmission of high-quality signals. As can be expected, when the compression rate increases, the quality of the decoded signal becomes lower.
Figure 6. (a) Mean square error (MSE) estimated when classic music, with a bit rate of 704 kb/s, is decoded using the same, As, and different, Ad, sensing matrix. (b) MSE obtained when classic music, with a bit rate of 352 kb/s, is decoded using the same, As, and different, Ad, sensing matrix. (c) MSE obtained when classic music, with a bit rate of 176 kb/s, is decoded using the same, As, and different, Ad, sensing matrix. (d) MSE obtained when classic music, with a bit rate of 88 kb/s, is decoded using the same, As, and different, Ad, sensing matrix. (e) MSE using the same, As, and different, Ad, sensing matrix with a bit rate of 704 kb/s. (f) MSE using the same, As, and different, Ad, sensing matrix with a bit rate of 352 kb/s. (g) MSE using the same, As, and different, Ad, sensing matrix with a bit rate of 176 kb/s. (h) MSE using the same, As, and different, Ad, sensing matrix with a bit rate of 88 kb/s.
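The relation between the normalized error and the SNR can be checked numerically. The sketch below assumes the error is normalized by the energy of the original signal, so that its inverse is the signal-to-noise power ratio; the function names are illustrative.

```python
import math


def nmse(original, decoded):
    """Squared error normalized by the energy of the original signal."""
    error = sum((o - d) ** 2 for o, d in zip(original, decoded))
    energy = sum(o * o for o in original)
    return error / energy


def snr_db(original, decoded):
    """SNR in decibels, taken as the inverse of the normalized error."""
    value = nmse(original, decoded)
    if value == 0.0:
        return math.inf  # perfect reconstruction
    return -10.0 * math.log10(value)


x = [1.0, 2.0, 3.0]
exact = nmse(x, x)                        # no error at all
silent = nmse(x, [0.0, 0.0, 0.0])         # error energy equals signal energy
inverted = snr_db(x, [-v for v in x])     # error energy is four times larger
```

A normalized error above one, as reported for decoding with a wrong matrix, therefore corresponds to a negative SNR in decibels.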

Spectral Similarity Analysis
Another metric that can be used for evaluating the security and reconstruction quality of the proposed system is the spectral similarity (SMSE), which is computed from $X_o(1024(m - 1) + k)$ and $X_d(1024(m - 1) + k)$, the k-th components of the m-th frame of the original and decoded signals, respectively. Figure 7a-d show the spectral similarity obtained when the sensing matrix used for decoding is the same as or different from that used in the encoding stage. Figure 7a,b show the spectral similarity obtained for classical music signals with bit rates of 704 kb/s and 352 kb/s, respectively. Figure 7c,d show the spectral similarity obtained for classical music signals with bit rates of 176 kb/s and 88 kb/s, respectively.
The evaluation results show that the MSE obtained when the signal is transmitted without compression and with a compression rate of 50% is close to zero, providing secure communications with high-quality decoded signals. Meanwhile, when the compression rate increases, the quality of the decoded signal becomes lower. Table 1 shows the MSE, SMSE, and correlation coefficient of the proposed algorithm when it is used for encoding popular Mexican music. Table 2 shows the performance of the proposed scheme when it is used for compressing and encrypting classic music. Table 3 shows the NSCR and UACI parameters obtained when the incoming signal is classic music, popular music, and pop music with different bit rates.
Table 1. Similarity, spectral similarity, and Pearson correlation obtained using the proposed system when popular music is encoded using a different number of samples/frame. MSE, mean square error; SMSE, spectral similarity.

NSCR and UACI Parameters
Other important parameters included in the NIST recommendations to determine the quality of speech encryption are the NSCR and UACI, which measure the rate of changed samples and the average change in intensity of the encrypted speech, respectively. The Number of Sample Change Rate (NSCR) and the Unified Average Changing Intensity (UACI) are computed from $c(i)$ and $c'(i)$, the i-th samples of two ciphered audio signals whose original versions differ in only one sample, where $N$ denotes the length of the audio frame. Table 3 provides the NSCR and UACI obtained when the proposed algorithm is used to compress and encrypt several genres of audio signals. Table 3 shows that the values of UACI and NSCR provided by the proposed scheme are close to the optimum ones reported in the literature [23].
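The two measures can be sketched using their usual definitions. The code below assumes integer-valued ciphered samples and normalizes the UACI by the full scale of a `bits`-bit sample, by analogy with the 255 used for 8-bit images; this normalization and the function names are assumptions, since the exact expressions are not reproduced here.

```python
def nscr(cipher_a, cipher_b):
    """Percentage of positions at which two ciphered signals differ."""
    changed = sum(1 for a, b in zip(cipher_a, cipher_b) if a != b)
    return 100.0 * changed / len(cipher_a)


def uaci(cipher_a, cipher_b, bits=16):
    """Average absolute difference as a percentage of the full sample scale."""
    full_scale = 2 ** bits - 1  # assumption: unsigned samples of `bits` bits
    total = sum(abs(a - b) for a, b in zip(cipher_a, cipher_b))
    return 100.0 * total / (full_scale * len(cipher_a))


# Two short hypothetical ciphered frames that differ in two of four samples.
frame_a = [0, 1, 2, 3]
frame_b = [0, 9, 2, 7]
change_rate = nscr(frame_a, frame_b)
```

For two independent, uniformly distributed ciphered signals, the expected UACI is about 33.3% and the expected NSCR is close to 100%, which is why these values are taken as references.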

Comparison with Other Reported Schemes
An important evaluation of the proposed scheme is the comparison of its performance with that of other previously proposed schemes. Tables 4 and 5 show a comparison of the Pearson correlation coefficient and the MSE provided by the proposed scheme and by the systems proposed by G. Sudhish et al. [3], Sathiyamurthi [23,24], and Kordov [25], when the input signals are classic and popular music with a bit rate of 704 kb/s and the sensing matrix used for decoding is either the same as or different from that used for encoding. Table 6 shows a comparison of the correlation coefficient and MSE provided by the proposed scheme and the system proposed by G. Sudhish et al. [3], when the input signals are classic and popular music, with bit rates of 352 kb/s and 176 kb/s, respectively. Finally, Table 7 shows a comparison of the NSCR and UACI parameters provided by the proposed scheme and by the systems proposed by Sathiyamurthi [23] and Kordov [25], when the input signals are classic and popular music with a bit rate of 704 kb/s.
Table 4. Pearson correlation coefficient obtained using the proposed system and those proposed by G. Sudhish et al. [3], Kordov [25], and Sathiyamurthi [24], with a bit rate of 704 kb/s.
Table 6. Pearson correlation coefficient and mean square error obtained using the proposed system and the system proposed by G. Sudhish et al. [3], with bit rates of 352 kb/s and 176 kb/s.
Table 7. NSCR and UACI parameters obtained using the proposed system and the systems proposed by Kordov [25] and Sathiyamurthi [24], with a bit rate of 704 kb/s.
Tables 4-7 show that the proposed scheme provides results that are quite competitive compared with other previously proposed schemes. It provides the same correlation coefficients and a smaller MSE than other previously proposed schemes when the audio signals are transmitted without compression [3,21,24]. On the other hand, when the audio signals are simultaneously compressed and encrypted, the proposed scheme is quite competitive with other previously proposed schemes [3].

Conclusions
This paper presents a CS-based encoding system for jointly encrypting and compressing audio signals. In the proposed scheme, the audio signals are first segmented into frames of 1024 samples, which are then transformed using the DCT to generate a sparse frame. Each frame is then multiplied, for compression and encryption, by a different sensing matrix, which is constructed using a Gaussian random number generator and a chaotic mixing scheme. This ensures that the sensing matrices used in the proposed system are different in each frame, so that the system satisfies the EWS criterion.
The evaluation results show that the proposed algorithm provides a rather secure transmission system with a very good quality of the decoded signal: when the same matrix is used for encoding and decoding, the correlation coefficient is close to one, while the MSE and SMSE are close to zero. Meanwhile, when the sensing matrices used for encoding and decoding are different, the correlation coefficients for each frame are close to zero, and the MSE and SMSE become larger than one, even when the bit rates used are relatively low. In addition, the NSCR and UACI obtained are close to 100% and 33%, respectively. Thus, the proposed scheme allows the secure transmission of high-quality audio signals.
Finally, the evaluation results show that the proposed scheme provides results that are quite competitive compared with other previously proposed schemes. It provides the same correlation coefficients and a smaller MSE than other previously proposed schemes when the audio signals are transmitted without compression, and it remains quite competitive when the audio signals are simultaneously compressed and encrypted.