Simultaneous Audio Encryption and Compression Using Parallel Compressive Sensing and Modified Toeplitz Measurement Matrix

Abstract: With the explosive growth of voice communication, there is an urgent need for secure and efficient compressed transmission. In this paper, compressive sensing is used to compress and encrypt speech signals simultaneously. First, we propose generating the measurement matrix with a linear feedback shift register (LFSR) combined with inner product selection. Second, we adopt a parallel compressive sensing technique to improve processing efficiency. Furthermore, the two communicating parties use a public key cryptosystem to share the key securely and select a different measurement matrix for each frame of the voice signal to ensure security. The scheme greatly reduces the difficulty of generating the measurement matrix in hardware and improves processing efficiency. Compared with the existing scheme by Moreno-Alvarado et al., our scheme reduces the execution time by approximately 8% and the mean square error (MSE) by approximately 5%.


Introduction
With the rapid development of the Internet and digital technology, a large amount of information is presented in the form of pictures and voice. This information is stored on various types of hard disks, which raises information security issues. Generally, in a sending and receiving system, compression and encryption are performed at the sending end, and decryption and decompression are performed at the receiving end. The purpose of compression is to reduce the size of the data so that more data can be transmitted and less storage space is needed, and the purpose of encryption is to ensure the confidentiality of the data and prevent information leakage [1]. In the conventional approach, the compressed information is stored before it is encrypted, so it can be accessed before the encryption task is performed, and its security may be compromised [2]. Compressed sensing (CS) avoids this problem because it realizes encryption and compression at the same time [3].
As a new signal sampling technology, CS has attracted widespread attention in many fields such as image processing and speech processing, and many CS-based compression and encryption schemes have been developed. Zhang et al. combined a chaotic system with the two-dimensional fractional Fourier transform for image encryption, which resulted in good compression performance, reconstruction robustness and high security [4]. In [5], Gong et al. first applied the Arnold transformation to the original image to reduce the blocking effect in the compression process, and then scrambled the compressively sensed data to improve the reliability of the encryption algorithm. Zh et al. designed a picture embedding scheme based on an adaptive threshold sparse algorithm and parallel compressed sensing to improve the quality of the reconstructed pictures and the processing speed of compressive sensing [6]. Combining compressive sensing and least significant bit (LSB) embedding [7], Chai proposed an effective visually meaningful image compression and encryption scheme. Chai et al. also used zigzag confusion to scramble the picture, compressed it into a cipher image, and then embedded it in a carrier image using compressed sensing to obtain a visually secure cipher image; the algorithm is highly sensitive to the plaintext image and has good visual security and data security [8]. Ye et al. embedded the singular values of the secret image into the singular values of the carrier image with a certain embedding strength to obtain the final visually meaningful encrypted image [9]. In [10], two logistic maps are used to scramble the DCT measurement matrix and the DWT sparse basis matrix, which enlarges the key space and increases security.
CS can also be used for the encryption and recovery of speech signals. The work in [11] used the contourlet transform to increase the sparsity of the signal, as required by CS, and used the randomness of a chaotic system and its high sensitivity to initial conditions to design an effective speech encryption system. In [12], Haneche et al. designed a speech enhancement system based on compressive sensing, and the experimental results show the superiority of the method. In [2], Moreno-Alvarado et al. proposed a chaotic mixing method to generate the measurement matrix, which provides the required encryption strength and reduces the size of the shared key. As a traditional technique, the linear feedback shift register (LFSR) is widely used in information security owing to its high efficiency in hardware. In [13], a random bit stream generator based on an LFSR is used to generate the initial state matrix of a cellular automaton; the results show that the encryption system has good reconstruction performance, robustness to noise, and security against multiple forms of attack. In [14], the authors developed a CS-based energy-saving encryption scheme based on linear feedback shift registers, and the simulation results show that the system can resist various attacks.
The known audio signal compression and encryption methods have several problems. First, in the traditional method, compression and encryption are performed sequentially. If compression comes first, the compressed data are inevitably exposed before the encryption task is performed, which causes data security issues. If encryption comes first, a lossless compression scheme must be used to avoid the leakage of encrypted information. Second, most CS-based compression schemes use the entire measurement matrix as the key, which makes the key consumption and storage space too large. Furthermore, even if a key is used to generate the measurement matrix, generating a large number of measurement matrices takes a lot of time. Third, common measurement matrices, such as random Bernoulli matrices and random Gaussian matrices, satisfy the conditions of compressive sensing with high probability but are very difficult to implement in practical applications.
Based on the above analysis, this paper uses CS to encrypt and decrypt the audio signal, which improves the security of the signal transmission process. At the same time, this paper proposes a method to generate the measurement matrix using an LFSR, which is conducive to hardware realization and reduces the size of the shared key [15]. In addition, in order to improve the performance of the measurement matrix in compressive sensing, the inner product selection method [16] is applied after the Toeplitz matrix has been generated. Furthermore, in order to reduce the time needed to generate the measurement matrices, this paper uses a parallel CS method [6].
The rest of this paper is organized as follows. The second section introduces compressed sensing, the Toeplitz matrix, the logistic map and the evaluation parameters; the third section introduces the proposed compressive sensing scheme; and the fourth section analyzes the simulation results. Finally, the conclusions are presented.

Preliminary Work

Compressive Sensing
Compressive sensing can be expressed mathematically as follows [17]:

$$y = \Phi x = \Phi \Psi s, \tag{1}$$

where $\Phi$ and $\Psi$ are the measurement matrix and the sparse transform matrix, respectively, $x$ is the original signal, $y$ is the encrypted version of $x$, and $s$ is the sparse representation of $x$ in the transform domain. Figure 1 illustrates the framework of CS, in which $\hat{x}$ is the recovered signal and $\hat{s}$ is the recovered sparse representation of $x$.
The recovery algorithm of compressive sensing reconstructs the sparse signal $s$ from a small number of linear observations $y$. For this to be possible, the restricted isometry property (RIP) must be satisfied:

$$(1 - \delta_k)\|z\|_2^2 \le \|\Phi z\|_2^2 \le (1 + \delta_k)\|z\|_2^2 \tag{2}$$

for all $k$-sparse signals $z$, where $\delta_k \in (0, 1)$. The original signal can then be recovered by solving the optimization problem [6]:

$$\hat{s} = \arg\min_{s} \|s\|_0 \quad \text{subject to} \quad y = \Phi \Psi s, \tag{3}$$

where $\|s\|_0$ is the $l_0$ norm of the vector $s$. This is a non-deterministic polynomial (NP) hard problem, which can be relaxed into a convex optimization problem:

$$\hat{s} = \arg\min_{s} \|s\|_1 \quad \text{subject to} \quad y = \Phi \Psi s, \tag{4}$$

where $\|s\|_1$ is the $l_1$ norm of the vector $s$.
The performance of compressive sensing mainly depends on the sparse basis matrix, the measurement matrix and the recovery algorithm. This paper uses the discrete cosine transform (DCT) as the sparse transform and a modified Toeplitz matrix as the measurement matrix. In addition, we adopt a convex optimization method to recover the original signal.
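To make the convex recovery step more concrete, the $l_1$ minimization can be rewritten as a linear program, which is one common way of solving it. The derivation below is a standard reformulation given here for illustration; the paper does not state which solver was used, and the auxiliary symbols $A$, $u$ and $v$ are introduced only for this sketch.

```latex
% Auxiliary symbols A, u, v are assumptions introduced for illustration only.
\begin{aligned}
&\text{Let } A = \Phi\Psi \text{ and split } s = u - v \text{ with } u \ge 0,\ v \ge 0. \\
&\min_{u,\,v}\ \mathbf{1}^{T}(u + v)
  \quad \text{subject to} \quad A(u - v) = y,\quad u \ge 0,\ v \ge 0. \\
&\text{At an optimum } u_i v_i = 0 \text{ for every } i,
  \text{ so } \mathbf{1}^{T}(u + v) = \|u - v\|_1 = \|s\|_1 .
\end{aligned}
```

Any general-purpose linear programming solver can then be applied to the pair $(u, v)$, and the recovered sparse vector is $\hat{s} = u - v$.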

Toeplitz Matrix
As a compressive sensing measurement matrix, the Toeplitz matrix has the advantage that few elements need to be stored or transmitted. In particular,

$$T = \begin{bmatrix} t_{0,0} & t_{0,1} & \cdots & t_{0,n-1} \\ t_{1,0} & t_{0,0} & \cdots & t_{0,n-2} \\ \vdots & \vdots & \ddots & \vdots \\ t_{m-1,0} & t_{m-2,0} & \cdots & t_{m-1,n-1} \end{bmatrix} \tag{5}$$

is a Toeplitz matrix: for all $i, j, \theta \in \mathbb{N}$ with $0 \le i + \theta \le m - 1$ and $0 \le j + \theta \le n - 1$, we have $t_{i+\theta,j+\theta} = t_{i,j}$. It can easily be seen from (5) that a Toeplitz matrix is completely determined by the elements in its first row and first column. That is, only $m + n - 1$ elements need to be transmitted for an $m \times n$ Toeplitz measurement matrix, because the elements along any diagonal parallel to the main diagonal (including the main diagonal itself) are the same [18].
An LFSR is a mechanism that generates a binary bit sequence. Its working principle can be summarized as follows: a linear function of the current state is fed back as the next input bit, and this cycle is repeated. The XOR function is commonly used as the single-bit linear function [19].
To construct a Toeplitz matrix based on an LFSR, the following conditions must be met. The number of bits in the register is equal to the number of rows in the Toeplitz matrix, and the current value of the register represents the LFSR state. In the Toeplitz matrix $T_{m \times n}$, each column is one of a sequence of consecutive LFSR states of length $m$, and the number of columns $n$ is the total number of LFSR states. The LFSR-based Toeplitz matrix is therefore constructed as follows: (1) first, initialize the first column of the matrix, that is, determine the initial state of the LFSR; (2) then, shift the current column down by one unit to obtain the lower $m - 1$ elements of the next column, i.e., the LFSR shifts one unit to the right; (3) next, set the first element of the new column to the modulo-2 sum (XOR) of the elements of the previous column that are selected by the coefficients of the feedback polynomial (when the feedback polynomial of the LFSR is a primitive polynomial, its output is an m-sequence); (4) finally, repeat (2) and (3) until all elements in the last column of the matrix are determined.
Each element of the previous column is multiplied by the corresponding coefficient of the primitive polynomial of degree $m$, and the modulo-2 sum (XOR) of these products becomes the top element of the current column; this applies to all columns except the first. Suppose that the coefficients of a primitive polynomial of degree $m$, excluding the constant coefficient, are $P = (P_0, P_1, \ldots, P_{m-1})$, and that the current state of the LFSR is $S_0 = (S_{0,0}, S_{1,0}, \ldots, S_{m-1,0})$. The top element of the $j$-th LFSR state is

$$S_{0,j} = \bigoplus_{i=0}^{m-1} P_i S_{i,j-1} = S_{j-1} \cdot P^{T} \bmod 2, \quad j \ge 1.$$

Once the top elements of all columns are determined, the entire Toeplitz matrix is determined [7]. Figure 2 shows the generation process of the Toeplitz matrix based on the LFSR. The LFSR-based Toeplitz matrix only needs to store the $m$ elements of the first column, rather than the $m + n - 1$ elements of the first row and the first column. Therefore, this method saves hardware memory resources. The random generation of the first column of the matrix and the random selection of the primitive polynomial also make the Toeplitz matrix construction random.
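The construction described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation; the function name `lfsr_toeplitz` and its argument names are hypothetical. The example values are the initial state $S_0 = (1, 0, 0, 0, 1)$ and the coefficient vector $P = (0, 1, 0, 0, 1)$ of $x^5 + x^2 + 1$ from the paper's worked example.

```python
def lfsr_toeplitz(initial_state, coefficients, n_cols):
    """Build an m x n_cols binary Toeplitz matrix whose columns are
    consecutive LFSR states (hypothetical helper, for illustration)."""
    m = len(initial_state)
    if len(coefficients) != m:
        raise ValueError("state and coefficient vector must have equal length")

    state = list(initial_state)
    columns = [state]
    for _ in range(n_cols - 1):
        # Top element: inner product of the previous state with P, modulo 2.
        top = sum(s & p for s, p in zip(state, coefficients)) % 2
        # Shift the previous state down by one and place the new top element.
        state = [top] + state[:-1]
        columns.append(state)

    # The states are the columns of the matrix, so transpose to get rows.
    return [[column[row] for column in columns] for row in range(m)]


# Values taken from the worked example in the text.
S0 = (1, 0, 0, 0, 1)
P = (0, 1, 0, 0, 1)
T = lfsr_toeplitz(S0, P, n_cols=6)
for row in T:
    print(row)
```

Only the $m$-bit initial state and the polynomial are needed to regenerate the whole matrix, which is the storage saving the text refers to.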
The following example illustrates the Toeplitz matrix generation process. The initial value of the shift register is $S_0 = (1, 0, 0, 0, 1)$. The primitive polynomial is $P(x) = x^5 + x^2 + 1$, and its corresponding coefficient vector is $P = (0, 1, 0, 0, 1)$. The number of columns in the Toeplitz matrix is 6. By shifting down, we obtain 1000 as the second to fifth elements of the second LFSR state. The first element of the second LFSR state is $S_0 \cdot P^T = 1$, so the second LFSR state is $S_1 = (1, 1, 0, 0, 0)$. Similarly, by shifting down, we obtain 1100 as the second to fifth elements of the third LFSR state. The first element of the third LFSR state is $S_1 \cdot P^T = 1$, so the third LFSR state is $S_2 = (1, 1, 1, 0, 0)$. Repeating these steps until the Toeplitz matrix has 6 columns gives $S_3 = (1, 1, 1, 1, 0)$, $S_4 = (1, 1, 1, 1, 1)$ and $S_5 = (0, 1, 1, 1, 1)$, so the generated Toeplitz matrix is:

$$T_{5 \times 6} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 & 1 & 1 \end{bmatrix}$$

After generating the Toeplitz matrix, we use the algorithm proposed in [8] to perform inner product processing and produce a measurement matrix pool (MMP) of size $m \times 2n$, where m/n is the compression rate of compressive sensing. Algorithm 1 below describes this process.
Step 1 Take the first column of the Toeplitz matrix as the i-th (i = 1) column of the measurement matrix pool.
Step 2 Starting from the second column of the Toeplitz matrix, compute the inner product of each column with the first column. If the inner product is 0, put the column into the measurement matrix pool as the (i + 1)-th column and increase i by 1; otherwise, discard it.
Step 3 Repeat Step 2 until a measurement matrix pool with 2n columns has been generated.
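The three steps above can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name `build_pool` is hypothetical, and the text does not say how the binary LFSR output is mapped to matrix entries before the inner product is taken, so the sketch simply uses the ordinary inner product of whatever numeric entries it is given. The toy input is a small hand-made matrix with entries of plus or minus one, chosen only so that the selection is easy to follow; it is not a Toeplitz matrix.

```python
def build_pool(matrix, pool_cols):
    """Select columns whose inner product with the first column is 0
    until the pool has pool_cols columns (hypothetical helper)."""
    n_rows = len(matrix)
    columns = list(zip(*matrix))          # work column by column
    first = columns[0]
    pool = [first]                        # Step 1: first column enters the pool

    for column in columns[1:]:            # Step 2: scan the remaining columns
        if len(pool) == pool_cols:
            break
        inner = sum(a * b for a, b in zip(first, column))
        if inner == 0:
            pool.append(column)           # keep orthogonal columns, drop the rest

    if len(pool) < pool_cols:             # Step 3 needs enough candidate columns
        raise ValueError("not enough orthogonal columns to fill the pool")

    # Return the pool as a list of rows.
    return [[column[r] for column in pool] for r in range(n_rows)]


# Toy input (assumption, not from the paper): 4 rows, 5 candidate columns.
toy = [
    [1,  1,  1,  1,  1],
    [1, -1,  1,  1, -1],
    [1,  1,  1, -1, -1],
    [1, -1, -1, -1,  1],
]
pool = build_pool(toy, pool_cols=3)
```

In this toy input the third column has inner product 2 with the first column and is discarded, so the pool consists of the first, second and fourth columns.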

Logistic Map
In this paper, we also use the logistic map to select the measurement matrices from the measurement matrix pool. The logistic map is defined as [20]:

$$x_{n+1} = \mu x_n (1 - x_n),$$

where $\mu \in (0, 4]$ is the control parameter and $x_n \in (0, 1)$. Algorithm 2 is the selection method proposed in [4], named parallel compressive sensing (PCS).
Step 1 Use the logistic map to generate a sequence of 2cn points, $S = \{s_1, s_2, \ldots, s_{2cn}\}$, and reshape $S$ into a $c \times 2n$ index matrix, where $c$ is the total number of measurement matrices and $2n$ is the number of columns in the measurement matrix pool.
Step 2 Let $I_i$ be the i-th row of the index matrix. Sort $I_i$ in descending order to form Sort($I_i$). Record the original positions in $I_i$ of the first n elements of Sort($I_i$) as $W_i$.
Step 3 Take the columns indexed by $W_i$ from the measurement matrix pool to form the i-th measurement matrix.
Step 4 Repeat Steps 2 and 3 until all c measurement matrices are generated.
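The selection procedure can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names `logistic_sequence`, `select_columns` and `extract_matrix` are hypothetical, the sequence is assumed to start from the first iterate after the initial value, and ties in the sort are broken by Python's stable sort.

```python
def logistic_sequence(x0, mu, length):
    """Iterate x <- mu * x * (1 - x) and return the first `length` iterates
    (the initial value itself is not included; this is an assumption)."""
    values = []
    x = x0
    for _ in range(length):
        x = mu * x * (1.0 - x)
        values.append(x)
    return values


def select_columns(x0, mu, num_matrices, pool_cols):
    """Return, for each of the c matrices, the n column indices W_i
    chosen from a pool with 2n columns."""
    n = pool_cols // 2
    sequence = logistic_sequence(x0, mu, num_matrices * pool_cols)
    selections = []
    for i in range(num_matrices):
        # Row i of the c x 2n index matrix.
        row = sequence[i * pool_cols:(i + 1) * pool_cols]
        # Positions of the row entries in descending order of value.
        order = sorted(range(pool_cols), key=lambda k: row[k], reverse=True)
        selections.append(order[:n])
    return selections


def extract_matrix(pool, indices):
    """Form a measurement matrix from the pool columns listed in `indices`."""
    return [[row[k] for k in indices] for row in pool]


# Hypothetical key values and sizes, chosen only for the demonstration.
W = select_columns(x0=0.3, mu=3.99, num_matrices=3, pool_cols=8)
```

Because the logistic sequence is fully determined by its parameter and initial value, both parties obtain the same index lists from the shared key without exchanging any matrix.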

NSCR, UACI, MSE and PCCs
The following introduces the indicators for evaluating the strength of audio encryption and the quality of recovery.
The number of samples change rate (NSCR) measures the percentage of samples that differ between two encrypted signals, and the unified average changing intensity (UACI) measures the average intensity of the differences between them. Both are computed from $x_i$ and $x'_i$, the i-th samples of two ciphered audio signals whose original versions differ in only one sample, over an audio frame of length $N$. The mean square error (MSE) measures the quality of speech recovery; it is the normalized mean square error between the original signal $x_o(i)$ and the recovered signal $x_d(i)$. The Pearson correlation coefficient (PCCs) evaluates the similarity between the original signal and the recovered signal.
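A minimal sketch of the four indicators follows, using common textbook forms. The normalizations are assumptions, since the exact formulas are not reproduced here: the UACI is divided by a caller-supplied full-scale range `full_scale`, the MSE is normalized by the energy of the original signal, and the PCC uses the usual covariance over the product of standard deviations. All function names are hypothetical.

```python
import math


def nscr(cipher_a, cipher_b):
    """Percentage of positions at which the two ciphered signals differ."""
    changed = sum(1 for a, b in zip(cipher_a, cipher_b) if a != b)
    return 100.0 * changed / len(cipher_a)


def uaci(cipher_a, cipher_b, full_scale):
    """Average absolute difference as a percentage of the full-scale range.
    `full_scale` is an assumption: the sample range is not given in the text."""
    total = sum(abs(a - b) for a, b in zip(cipher_a, cipher_b))
    return 100.0 * total / (full_scale * len(cipher_a))


def normalized_mse(original, recovered):
    """Squared error divided by the energy of the original (assumed form)."""
    error = sum((o - d) ** 2 for o, d in zip(original, recovered))
    energy = sum(o * o for o in original)
    return error / energy


def pcc(original, recovered):
    """Pearson correlation coefficient between the two signals."""
    n = len(original)
    mean_o = sum(original) / n
    mean_d = sum(recovered) / n
    cov = sum((o - mean_o) * (d - mean_d) for o, d in zip(original, recovered))
    var_o = sum((o - mean_o) ** 2 for o in original)
    var_d = sum((d - mean_d) ** 2 for d in recovered)
    return cov / math.sqrt(var_o * var_d)
```

Under this assumed normalization, a perfect recovery gives an MSE of 0 and a PCC of 1, while an all-zero recovery gives an MSE of 1.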

Proposed Algorithm
Figures 3 and 4 show the overall block diagrams of the scheme proposed in this article: Figure 3 is the encryption block diagram and Figure 4 is the recovery block diagram. Table 1 lists the parameters used in Figures 3 and 4. As shown in Figure 3, the sender's compression and encryption process follows Algorithm 3.
The sender and the receiver share the keys $k_1$ and $k_2$ through a public key cryptosystem such as RSA or ECC. In this scheme, different frames use different CS measurement matrices for higher security. This increases the data processing time, but the use of PCS and the simplicity of the LFSR-based measurement matrix compensate for it. The security of the above encryption framework has been proven in [2]: as long as the key is used only once, the security of the entire encryption system can be guaranteed.
In the reverse of the compression and encryption process, the receiver's decryption and decompression are shown in Figure 4 and Algorithm 4.
Step 1 The receiver obtains the keys $k_1$ and $k_2$ by decrypting the message protected by the public key cryptosystem.
Step 2 $k_1$ is used to generate the same Toeplitz matrix.
Step 3 The MMP algorithm is used to generate the corresponding measurement matrix pool.
Step 4 With $k_2$ as the initial value of the logistic map, the PCS algorithm is used to generate the measurement matrix $\Phi_i$ corresponding to each audio frame.
Step 5 $y_i$ is the encrypted voice frame sent by the sender. The convex optimization method is used to recover $\hat{s}_i$, the inverse DCT (IDCT) is then used to generate $\hat{x}_i$, and the original audio signal $\hat{x}$ is finally recovered.

Results
In this scheme, we use an audio signal sampled at 20 kHz as the experimental object, with a frame length of 20 ms. Each frame therefore contains 400 samples, and a total of 100 frames are selected. The experiments were run in MATLAB R2018a (64 bit) on an 11th Gen Intel(R) Core(TM) i5-1135G7 CPU at 2.40 GHz with 8.00 GB of memory, under Microsoft Windows 10.
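The frame size follows directly from the sampling rate and the frame duration. The short calculation below reproduces it and, as an illustration, also lists the number of measurements per frame for the compression ratios 0.5, 0.7, 0.8 and 0.9 evaluated in the experiments. The variable names are hypothetical, and interpreting "20 kHz" as the sampling rate is an assumption consistent with the stated 400 samples per frame.

```python
SAMPLE_RATE_HZ = 20_000      # assumed to be the sampling rate
FRAME_MS = 20                # frame duration in milliseconds
NUM_FRAMES = 100
RATIOS_PERCENT = (50, 70, 80, 90)   # m/n expressed in percent

# Integer arithmetic avoids floating-point rounding in the sizes.
frame_length = SAMPLE_RATE_HZ * FRAME_MS // 1000
total_samples = frame_length * NUM_FRAMES
measurements = {p: frame_length * p // 100 for p in RATIOS_PERCENT}

print(frame_length)     # samples per frame (n)
print(total_samples)    # samples in the whole experiment
print(measurements)     # measurements per frame (m) for each ratio
```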

MSE and MSEError
Using the proposed scheme, we calculate the MSE for m/n equal to 0.5, 0.7, 0.8 and 0.9. The results are shown in Figure 5a-d, where MSE denotes the error of the data recovered with the correct measurement matrix and MSEError denotes the error of the data recovered with a wrong measurement matrix. It can be clearly seen that MSEError fluctuates between 0.75 and 1.5 when m/n = 0.5, between 0.5 and 1.5 when m/n = 0.7, and between 0.75 and 1.75 when m/n = 0.8 and 0.9. In contrast, the MSE stays close to 0 with only slight fluctuations under all compression conditions.

PCCs and PCCsError
PCCs and PCCsError, computed from the data recovered with the correct and the wrong measurement matrix, respectively, are shown in Figure 6a-d for m/n equal to 0.5, 0.7, 0.8 and 0.9. It can be clearly seen that PCCsError fluctuates around 0, while PCCs stays close to 1 with only slight fluctuations for all four ratios. At the same time, the MSEError fluctuates between 0.75 and 1.8.
Figure 8a,b compares the original signal and the recovered signal for a single frame. Figure 8a shows the case m/n = 0.5, where the original signal and the recovered signal visibly fail to overlap at some positions in both the time domain and the frequency domain. In the case m/n = 0.8, shown in Figure 8b, this mismatch is significantly reduced.
The NSCR, UACI, MSE and PCCs obtained with our scheme and with the scheme of Moreno-Alvarado [2] for m/n = 0.5, 0.7, 0.8, 0.9 and 1 are shown in Table 2. In the work by Moreno-Alvarado [2], the measurement matrix is constructed using a Gaussian random number generator and a chaotic mixing scheme. The scheme proposed in this paper performs well on both UACI and NSCR. The comparison of the MSE at different compression ratios (CR) shows that our scheme improves the quality of the recovered voice signal by about 5%. The comparison of the PCCs shows that our scheme also performs better on this indicator.

Running Time Analysis
Figure 9 compares the running time of the entire encryption and decryption process for the method proposed in this article and the method proposed by Moreno-Alvarado [2]. It can be seen that our algorithm saves about 8% of the running time, which is due to the use of the parallel compressive sensing algorithm.

Key Space Analyses
In terms of the size of the shared key, our scheme requires both parties to share the keys $k_1$ and $k_2$. Here $k_1$ is the initial value used to generate the Toeplitz matrix, and its length is $m$; $k_2$ is the initial value of the logistic map, namely $\mu$ and $x_n$. Generating the measurement matrix from these keys, instead of sharing the entire measurement matrix, greatly reduces the key size. In addition, the scheme in [2] must share not only the initial value key1 used to generate the measurement matrix but also the permutation parameter key2, and each frame further needs a random permutation number key3 to ensure that the measurement matrix is generated randomly. The shared key size in our scheme is therefore more advantageous.
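As a rough illustration of the saving, the following sketch compares the number of bits needed to share every measurement matrix directly with the number of bits needed to share only the seeds. The figures rest on assumptions that are not stated in the text: one bit per matrix entry, 64 bits for each of the two logistic map values, and the sizes m = 200, n = 400 and 100 frames that correspond to m/n = 0.5 in the experiments. The function names are hypothetical.

```python
def full_matrix_key_bits(m, n, num_frames, bits_per_entry=1):
    """Bits needed if every m x n measurement matrix were shared directly.
    One bit per entry is an assumption (binary matrix)."""
    return m * n * num_frames * bits_per_entry


def seed_key_bits(m, float_bits=64):
    """Bits needed for k1 (m-bit LFSR state) plus k2 (two real values).
    64 bits per real value is an assumption."""
    return m + 2 * float_bits


m, n, frames = 200, 400, 100          # m/n = 0.5 with 400-sample frames
full_bits = full_matrix_key_bits(m, n, frames)
seed_bits = seed_key_bits(m)
print(full_bits, seed_bits)
```

Under these assumptions the seed-based key is several orders of magnitude smaller, and its size does not grow with the number of frames.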

Robustness Analysis

Occlusion Attack
Robustness against occlusion is an important indicator for an encryption system, because the loss of encrypted voice samples degrades the recovery of the voice signal. Figure 10 shows the time domain and frequency domain recovery when the encrypted samples are intact and when 1%, 5%, and 10% of the encrypted samples are lost. The corresponding signal-to-noise ratios (SNR) are 28.3123 dB, 25.2313 dB and 15.2321 dB, respectively (the SNR is 32.8097 dB when no encrypted samples are lost). Clearly, as the number of missing encrypted voice samples increases, the recovered voice signal becomes worse. However, the recovered voice signal is still similar to the original signal, which shows that the voice encryption system has a certain degree of robustness against occlusion.
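The occlusion test can be illustrated with the sketch below, which simulates the loss of a fraction of samples and computes the SNR of a recovered signal. It is not the authors' test code: the function names are hypothetical, lost samples are assumed to be replaced by zeros at randomly chosen positions, and the SNR is assumed to be the usual ratio of signal energy to error energy in decibels.

```python
import math
import random


def snr_db(original, recovered):
    """SNR in dB: signal energy over error energy (assumed definition)."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, recovered))
    if noise == 0:
        return math.inf          # identical signals
    return 10.0 * math.log10(signal / noise)


def occlude(samples, fraction, seed=0):
    """Zero out a given fraction of samples at random positions.
    Replacing lost samples by zeros is an assumption."""
    rng = random.Random(seed)    # seeded so the experiment is repeatable
    count = round(fraction * len(samples))
    lost = set(rng.sample(range(len(samples)), count))
    return [0.0 if i in lost else v for i, v in enumerate(samples)]


# Toy data for demonstration only.
encrypted = [1.0] * 200
damaged = occlude(encrypted, fraction=0.05)
print(damaged.count(0.0))
print(snr_db([1, 1, 1, 1], [1, 1, 1, 0]))
```

In the full experiment the damaged frame would be passed through the recovery step before the SNR is measured against the original audio.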

Noise Attack
The encrypted signal is transmitted over a channel and is often affected by various kinds of noise, such as Gaussian noise (GN) and salt-and-pepper noise (SPN). When the Gaussian noise intensity is 0.001%, 0.003%, 0.005% and 0.009%, the corresponding SNR is 32.8083 dB, 32.7103 dB, 32.5832 dB and 32.5353 dB, respectively, so the SNR changes only slightly. Figure 11 shows the recovered speech signal in the time domain and frequency domain for these four noise intensities. Figure 11 and the SNR data show that GN has little effect on speech recovery, indicating that our scheme is resistant to Gaussian noise.

Discussion
The solution in this paper mainly exploits the ability of compressive sensing to encrypt and compress at the same time in order to improve the security of the system, and exploits the structure of the Toeplitz matrix to reduce the size of the shared key.
In the scheme, we use a public key cryptosystem such as RSA or ECC to share the secret keys $k_1$ and $k_2$, which can be done before the audio signal is sent. Therefore, RSA or ECC has no impact on the performance of the proposed scheme.
In the scheme, we use a one-dimensional logistic map to select the measurement matrix, which may be vulnerable to attacks based on phase space reconstruction. However, the proposed scheme is a general framework, and a more complex high-dimensional chaotic system can be used to obtain better security.
In addition, the design of sparse basis matrix is also a method to improve system security, which will be studied in the future.

Conclusions
This article uses compressive sensing to realize the compression and encryption of audio signals. The key is shared over a confidential channel. At the sending end, the signal is divided into frames, transformed by the DCT, and measured with the measurement matrix generated by the LFSR-based Toeplitz matrix and the inner product selection. Since the measurement matrix required for each frame is different, this paper designs a parallel compressive sensing algorithm. At the decryption side, the shared key is used to generate the required

Figure 1. The frame of CS.
In the definition of PCCs, $x_{o,d}$ denotes either $x_o$ or $x_d$, and $x^2_{o,d}$ denotes either $x^2_o$ or $x^2_d$.

Algorithm 3: ECS.
Step 1 The secret key $k_1$, chosen by the sender, is used as the initial value of the LFSR, from which the corresponding LFSR-based Toeplitz matrix is generated.
Step 2 The MMP algorithm is used to generate the measurement matrix pool of size $m \times 2n$ from the Toeplitz matrix.
Step 3 The PCS algorithm is used to generate a different measurement matrix $\Phi_i$ of size $m \times n$ for each frame from the measurement matrix pool. The initial condition of the logistic map is obtained from the secret key $k_2$, which is the hash function output of the original signal.
Step 4 The original speech signal $x$ is divided into frames $x_i$ with a frame length of 400, and each $x_i$ is transformed by the DCT to produce $s_i$.
Step 5 $y_i$ is generated by compressed sensing from $\Phi_i$ and $s_i$ and is then sent to the receiver.
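Steps 4 and 5 of the sender's procedure can be sketched as follows. This is an illustration under stated assumptions rather than the authors' MATLAB code: the function names are hypothetical, the last frame is zero-padded if the signal length is not a multiple of the frame length, the DCT is the orthonormal type-II transform, and the measurement is taken literally from Step 5 as the product of $\Phi_i$ with the DCT coefficients $s_i$. The measurement matrices are assumed to have been generated beforehand.

```python
import math


def split_frames(signal, frame_length):
    """Cut the signal into frames; the last frame is zero-padded (assumption)."""
    frames = []
    for start in range(0, len(signal), frame_length):
        frame = list(signal[start:start + frame_length])
        frame += [0.0] * (frame_length - len(frame))
        frames.append(frame)
    return frames


def dct_ortho(frame):
    """Orthonormal DCT-II computed directly from its definition (O(N^2))."""
    n = len(frame)
    coeffs = []
    for k in range(n):
        total = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, x in enumerate(frame))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


def measure(phi, coeffs):
    """Matrix-vector product y = phi * s."""
    return [sum(a * b for a, b in zip(row, coeffs)) for row in phi]


def encrypt(signal, matrices, frame_length):
    """Apply a different measurement matrix to the DCT of each frame."""
    frames = split_frames(signal, frame_length)
    if len(matrices) < len(frames):
        raise ValueError("one measurement matrix is needed per frame")
    return [measure(phi, dct_ortho(frame))
            for phi, frame in zip(matrices, frames)]


# Toy example: one 4-sample frame and a 2 x 4 matrix (demonstration only).
toy_phi = [[1, 0, 0, 0], [0, 1, 0, 0]]
toy_cipher = encrypt([1.0, 1.0, 1.0, 1.0], [toy_phi], frame_length=4)
```

With the parameters of the paper, `frame_length` would be 400 and each matrix would have m rows and 400 columns; a fast DCT routine would replace the direct summation in practice.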
Figure 7a,b shows the MSE and PCCs in the uncompressed state (m/n = 1). PCCsError fluctuates around 0, while PCCs stays close to 1 with only slight fluctuations. At the same time, the MSE stays close to 0 with slight fluctuations, while MSEError fluctuates between 0.75 and 1.75.

Figure 8. (a) Comparison of the original signal and the recovered signal for one frame with m/n = 0.5; the top panel is the time domain and the bottom panel is the frequency domain. (b) Comparison of the original signal and the recovered signal for one frame with m/n = 0.8; the top panel is the time domain and the bottom panel is the frequency domain.

Figure 9. Percentage of running time saved by our proposed algorithm compared with the Moreno-Alvarado [2] scheme under different compression ratios.

Figure 10. Time domain and frequency domain recovery of the signal for m/n = 0.7: (a) no encrypted samples lost; (b) 1% of the encrypted samples lost; (c) 3% of the encrypted samples lost; (d) 5% of the encrypted samples lost.

Table 1. Parameters and corresponding explanations.
Parameter      Explanation
$\hat{s}_i$    DCT domain signal of the i-th frame recovered by the receiver
$\hat{x}_i$    Time domain signal of the i-th frame recovered by the receiver
$\hat{x}$      Original signal recovered by the receiver