A Multi-Image Encryption Based on Sinusoidal Coding Frequency Multiplexing and Deep Learning

Multi-image encryption technology is a vital branch of optical encryption technology. Traditional encryption methods can only encrypt a small number of images, which greatly restricts their application in practice. In this paper, a new multi-image encryption method based on sinusoidal stripe coding frequency multiplexing and deep learning is proposed to encrypt a greater number of images. In the process of encryption, the images are divided into groups, and each image in each group is first encoded with a random matrix and then modulated with a specific sinusoidal stripe; therefore, the dominant frequency of each group of images can be separated in the Fourier frequency domain. The groups are superimposed and scrambled to generate the final ciphertext. In the process of decryption, deep learning is used to improve the quality of the decrypted images and the decryption speed. Specifically, the obtained ciphertext is sent into the trained neural network and the plaintext images are reconstructed directly. Experimental analysis shows that when 32 images are encrypted, the correlation coefficient (CC) of the decrypted result can exceed 0.99. The effectiveness of the proposed encryption method is demonstrated by histogram analysis, adjacent-pixel correlation analysis, anti-noise attack analysis and occlusion attack analysis. The encryption method has the advantages of large information capacity, good robustness and fast decryption speed.


Introduction
With the development of the Internet, networks and information systems are playing an increasingly important role in people's life, work and study. However, as reliance on informatization has deepened, one subject can no longer be ignored: the security of information. The resolution of these security issues depends on the progress of information security technology. Research on information security technology therefore has not only academic value but also an important role in promoting the development of society as a whole. Data encryption technology based on optical theory and methods is a new generation of information security technology that has begun to develop internationally in recent years. Compared with traditional information security technology, optical information security technology has the following advantages. Firstly, optical cryptography systems are parallel [1]. Secondly, optical cryptography systems usually have a large key space. Thirdly, optical cryptography systems are multi-dimensional [2]: inherent parameters of the system, such as amplitude, phase, wavelength and optical element parameters, can be used as key parameters of the optical cryptosystem to achieve multi-dimensional encryption. Therefore, optical information security technology offers large capacity, fast storage speed and multi-dimensional parallel processing, and has unique advantages in data

Assuming that there are n × m plaintext images, the encryption process is described as follows:

1. All the n × m plaintext images are divided into n groups; the plaintext images in each group and the sinusoidal code corresponding to that group are successively sent to a spatial light modulator (SLM1) for display.
2. The L2 and L3 lenses form the 4F system, and the hole P, located on the spectral plane of the 4F system, is used to extract the zero-order frequency after SLM1 and to reduce the influence of errors caused by other orders. The random matrix corresponding to each plaintext image is uploaded to SLM2 for coding.
3. The superimposed light intensity recorded by the CCD is the time integral of the light field reaching the target surface within a certain period of time. Therefore, we make the exposure time of the CCD equal to the sum of the encoding times of all images in the SLM. In addition, their starting times should be synchronized, and the encoding time of each image should be the same. The recorded intensity can be expressed as follows:

$$I(\vec{p}) = \sum_{j=1}^{n} \sum_{i=1}^{m} s_j(\vec{p})\, r_i(\vec{p})\, I_{ij}(\vec{p})$$

Here $I(\vec{p})$ represents the ciphertext, $s_j(\vec{p})$ represents the j-th specially designed sinusoidal stripe, $r_i(\vec{p})$ represents the i-th random matrix, $I_{ij}(\vec{p})$ represents the plaintext image, and $\sum(\cdot)$ represents the sum of the elements.
4. According to the number of pixels of the superimposed light intensity, an integer random sequence without repeating elements is generated. The light intensity value on each pixel of the superimposed light intensity is then relocated according to the values of the random sequence, which realizes the scrambling operation Scramble(S)[·].
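The encryption steps above can be sketched numerically as follows. This is a minimal simulation under stated assumptions, not the authors' optical implementation: the stripe form 0.5(1 + cos(·)), the frequency units (cycles per image), and the names `sinusoidal_stripe`, `encrypt` and `unscramble` are all illustrative choices that do not come from the paper.

```python
import numpy as np

def sinusoidal_stripe(shape, freq, phase=0.0):
    """Non-negative stripe 0.5 * (1 + cos(...)); this exact form is an assumption."""
    h, w = shape
    y, x = np.mgrid[0:h, 0:w]
    fy, fx = freq  # carrier frequency in cycles per image (assumed units)
    return 0.5 * (1.0 + np.cos(2 * np.pi * (fy * y / h + fx * x / w) + phase))

def encrypt(images, masks, freqs, rng):
    """images: (n, m, H, W) plaintexts; masks: (m, H, W) random matrices;
    freqs: one (fy, fx) carrier per group. Returns (ciphertext, scrambling key)."""
    n, m, h, w = images.shape
    superimposed = np.zeros((h, w))
    for j in range(n):
        stripe = sinusoidal_stripe((h, w), freqs[j])
        for i in range(m):
            # each image is coded by its random matrix and its group's stripe
            superimposed += stripe * masks[i] * images[j, i]
    # integer random sequence without repeated elements, one entry per pixel
    perm = rng.permutation(h * w)
    cipher = superimposed.ravel()[perm].reshape(h, w)
    return cipher, perm

def unscramble(cipher, perm):
    """Invert the pixel scrambling with the correct index key."""
    flat = np.empty(cipher.size)
    flat[perm] = cipher.ravel()
    return flat.reshape(cipher.shape)
```

With `rng = np.random.default_rng(0)`, calling `encrypt` on an array of shape (8, 4, 256, 256) with four masks and eight carriers mirrors the n = 8, m = 4 configuration used in the text; `unscramble(cipher, perm)` returns the superimposed intensity.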

Downsampling in Fourier Frequency Domain
If we convert the obtained ciphertext image to the Fourier domain after scrambling recovery, its spectrum can be expressed in terms of $\mathcal{F}[\cdot]$, the Fourier transform, and $\vec{\omega}$ and $\varphi$, the frequency and phase of the sinusoidal stripe, respectively. As shown in Figure 2, here we take m = 4 and n = 8 as an example. Each group is coded using four random matrices, followed by eight sinusoidal codes that move the frequency components of each group by different offsets; therefore, their dominant frequencies are staggered in the Fourier domain. For each group, the shifted frequencies have two symmetrically conjugated positions (denoted by solid and dashed circles of the same color).

Since most of the energy of the plaintext images and the random matrices is concentrated at low frequencies, the component $I_j$ of the j-th group can be extracted from $I(\vec{p})$ via an operation ε. The operation ε includes two steps: first extracting the Fourier modulus from $I(\vec{p})$, and then padding its surroundings with zeros to keep the original pixel resolution. Then, $I_j$ is taken as the input to the network, and the output is m plaintext images. The detailed process of deep neural network decryption is described in the next section. So far, we can extract and decrypt m plaintext images from each group $I_j$.
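One way to picture the extraction step is the sketch below. It is only an illustration under assumptions: the window is taken as a square around the group's carrier, it is re-centered at zero frequency before zero padding, and the magnitude of the inverse transform is returned. The function name `extract_group` and the parameters `carrier` and `half_width` are hypothetical and are not taken from the paper.

```python
import numpy as np

def extract_group(superimposed, carrier, half_width):
    """Crop the spectrum around one carrier, zero-pad to full size, invert.

    carrier: (ky, kx) integer offset of the group's dominant frequency
    half_width: half size of the square window kept around the carrier
    """
    h, w = superimposed.shape
    spectrum = np.fft.fftshift(np.fft.fft2(superimposed))  # zero frequency at (h//2, w//2)
    cy, cx = h // 2 + carrier[0], w // 2 + carrier[1]
    window = spectrum[cy - half_width:cy + half_width + 1,
                      cx - half_width:cx + half_width + 1]
    # pad the surroundings with zeros to keep the original pixel resolution
    padded = np.zeros_like(spectrum)
    padded[h // 2 - half_width:h // 2 + half_width + 1,
           w // 2 - half_width:w // 2 + half_width + 1] = window
    return np.abs(np.fft.ifft2(np.fft.ifftshift(padded)))
```

The window size trades resolution against crosstalk: a wider window keeps more detail of the group but starts to overlap the neighbouring carriers, which is consistent with the text's remark that fewer groups allow a larger sampling rate per group.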

The Network Structure
Deep learning (DL) is a powerful tool in many areas. For the network structure, this paper adopts the classic U-Net [35], which is applied to plaintext reconstruction after its output layer and loss function are modified. The input of the network is a single-channel two-dimensional image, and the final output is an m-channel two-dimensional image obtained after five down-sampling convolutional layers and five up-sampling deconvolution layers. Each channel represents a plaintext image. The network structure and the specific parameters of each layer are shown in Figure 3.

In the process of neural network training, the dataset plays an important role. A high-quality dataset can often improve the quality of model training, speed up training and improve the final output. Considering the particularity of the image encryption method proposed in this paper, we build our own dataset, in which each data pair consists of m plaintext images and the corresponding ciphertext. In this paper, 15,000 images are selected from the MNIST handwritten dataset as the plaintext images and then encrypted according to the encryption method described above, so as to obtain the ciphertext-plaintext data pairs.
In order to better restore the plaintext images, the mean-square error (MSE) is utilized and defined as follows:

$$\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{MN} (x_i - y_i)^2$$

where M and N represent the width and height of the image, respectively, and $x_i$ and $y_i$ represent the output value of the last layer of the network and the true value of the original image, respectively. Since the last layer of the modified U-Net has m channels, the first loss term is this MSE evaluated over all m output channels. We further optimize the loss function to improve the training ability of the network. Specifically, the m plaintext images output from the network are encrypted again according to the proposed encryption method, and the MSE is then calculated between the real ciphertext $I(\vec{p})$ and the reconstructed ciphertext $I'(\vec{p})$ that is re-encrypted from the plaintext images output by the network. The total loss function combines these two terms.

In the training process, the learning rate is set to 0.001 and the Adam optimizer [36] is used to update the parameters of the network. The number of training epochs is 50. All programs run in Python 3.7 with an NVIDIA GeForce GTX 3060 GPU for acceleration.
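The two-term loss described above can be sketched in NumPy as follows. This is a hedged illustration rather than the training code: the paper does not state in this text how the two terms are weighted, so the plain sum with a hypothetical `weight` parameter is an assumption, and `reencrypt` stands for any callable that applies the encryption method to the network output.

```python
import numpy as np

def mse(x, y):
    """Mean-square error over all elements of two equally shaped arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((x - y) ** 2))

def total_loss(pred, truth, cipher, reencrypt, weight=1.0):
    """pred, truth: (m, H, W) plaintext stacks; cipher: (H, W) real ciphertext.

    reencrypt: callable mapping an (m, H, W) stack to an (H, W) ciphertext.
    weight: relative weight of the ciphertext term (assumed, not from the paper).
    """
    plaintext_term = mse(pred, truth)               # MSE over the m output channels
    cipher_term = mse(reencrypt(pred), cipher)      # MSE against the real ciphertext
    return plaintext_term + weight * cipher_term
```

The ciphertext term penalizes outputs that look plausible channel by channel but do not reproduce the measured ciphertext when pushed back through the encryption model.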

Experiment Results
In order to demonstrate the feasibility of our method for encryption and its advantages in decryption, we conduct numerical simulation experiments. In the process of encryption, 32 images are selected from the MNIST handwritten digit dataset and divided into eight groups, with four images in each group, at a resolution of 256 × 256. Figure 4 shows the encryption process for multiple images.
Sensors 2021, 21, x FOR PEER REVIEW

The first row (a-d) of Figure 4 shows the plaintext images in the first group, Figure 4e shows one of the corresponding four random matrices, and Figure 4f-h show, respectively, the images of the first group encoded by random codes and sinusoidal stripes, the ciphertext image obtained by superimposing the 8 groups, and the scrambled ciphertext image. Intuitively, it is almost impossible to detect any information about the original images from the ciphertext.

As shown in Figure 5, the detailed decryption process is described as follows:

1. The pixel locations of the ciphertext are rearranged with the correct index keys to obtain the superimposed image;
2. The Fourier transform is applied to the superimposed ciphertext image, and appropriate down-sampling is carried out according to the specific spectrum distribution of each group;
3. Its surroundings are padded with zeros to keep the original pixel resolution;
4. The inverse Fourier transform is carried out, and the result is fed into the trained U-Net network.

The correlation coefficient (CC) is used to measure the similarity between the original plaintext image and the decrypted image, and is defined as follows:

$$\mathrm{CC} = \frac{E\{[I_t - E\{I_t\}][I - E\{I\}]\}}{\sigma_{I_t}\,\sigma_{I}}$$

where $E\{\cdot\}$ denotes the expected value operator, and σ is the standard deviation of the corresponding image. $I_t$ and $I$ are the original plaintext image and the decrypted image, respectively. The closer the CC value is to 1.0, the better the quality of the reconstructed image.

The decryption results are shown in Figure 6, with CC values of 0.9939, 0.9903, 0.9951, and 0.9966, respectively. These decrypted images are of high quality and very similar to the corresponding plaintext images. At the same time, because a deep neural network is used for decryption, decryption is fast: on an Intel(R) Core(TM) i7-9700K CPU without GPU acceleration, the entire decryption process is completed in only 3.15 s. The decryption results for all groups are shown in Figure 7.
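The CC metric defined above can be computed with a few lines of NumPy. This is a minimal sketch of the standard Pearson form of the definition; the function name `correlation_coefficient` is illustrative.

```python
import numpy as np

def correlation_coefficient(original, decrypted):
    """CC = E{(It - E{It})(I - E{I})} / (sigma_It * sigma_I) over all pixels."""
    a = np.asarray(original, dtype=float).ravel()
    b = np.asarray(decrypted, dtype=float).ravel()
    covariance = np.mean((a - a.mean()) * (b - b.mean()))
    return float(covariance / (a.std() * b.std()))
```

Because the metric is normalized by both standard deviations, it is insensitive to a global brightness offset or a positive contrast scaling of the decrypted image; it is undefined for a constant image, whose standard deviation is zero.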
In addition, to test the generalization of the proposed encryption model, we reconstruct some plaintext images that do not belong to the training set. For convenience, handwritten English letters are used for testing, and the reconstructed plaintext images are shown in Figure 8. Although the network has been trained on the MNIST handwritten dataset, it can perform high-quality reconstruction of handwritten English letters, which indicates that the U-Net can learn the correspondence between ciphertext images and original plaintext images very well.
In order to further verify the feasibility of the encryption method, we use the FEI Face Database [37] for training, in which the images are grayscale images with more complex content and richer details. In the process of encryption, four face images are used and divided into two groups, following the encryption steps described above. By reducing the number of encrypted images, each group of images obtains a larger sampling rate in the Fourier frequency domain, which ensures high-quality plaintext reconstruction. The decryption result is shown in Figure 9.

Algorithm Analysis
The ciphertext and the key are liable to be attacked or altered during transmission, for example through data loss or noise. A good information security system must not only guarantee the confidentiality of information but also ensure the integrity of the decrypted information, so an information encryption system is required to have good security and robustness.

Key Security Analysis
The proposed encryption method has high security. Even if an attacker knows the network structure and the corresponding encryption method and tries to attack the encryption system through training, the random matrices used in the encryption process remain unknown, and a network trained on the wrong random matrices fails to decrypt. The top row of Figure 10 shows the decryption results under the correct random matrices, and the bottom row shows four images that failed to decrypt after training with incorrect random matrices. It can be seen that if the wrong random matrix is used for training, no useful information can be seen in the decryption results. This shows that the proposed encryption algorithm has good security.

Anti-Noise Attack Analysis
It is also necessary to evaluate the robustness of the encryption algorithm when the ciphertext is attacked by noise. To give the encryption method good anti-noise ability, speckle noise and Gaussian noise are added during network training. The speckle noise and Gaussian noise added in training both have zero mean, and the variance is randomly selected from {0.05, 0.06, 0.07, 0.08, 0.1}. For testing, four different types of noise are added to the ciphertext: Gaussian noise, speckle noise, Rayleigh noise and salt-and-pepper noise. The CC curves of the decrypted images under noise attack are shown in Figure 11.
For Gaussian noise and speckle noise, the abscissa represents the variance parameter. It can be seen from Figure 11 that the proposed encryption model resists Gaussian noise very well: even when the variance is 0.5, the CC is still higher than 0.9. When the ciphertext is attacked by the other kinds of noise, the decryption quality decreases as the noise increases, but the images can still be decrypted clearly.
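The four noise attacks named above can be simulated on a ciphertext array as in the following sketch. The parameterizations are assumptions chosen to match common image-processing conventions (zero-mean additive Gaussian noise, multiplicative speckle noise, additive Rayleigh noise, and salt-and-pepper noise given by a pixel density); the paper does not spell them out in this text, and the function names are illustrative.

```python
import numpy as np

def add_gaussian(img, var, rng):
    """Additive zero-mean Gaussian noise with the given variance."""
    return img + rng.normal(0.0, np.sqrt(var), img.shape)

def add_speckle(img, var, rng):
    """Multiplicative noise: img + img * n, with n zero-mean Gaussian."""
    return img + img * rng.normal(0.0, np.sqrt(var), img.shape)

def add_rayleigh(img, scale, rng):
    """Additive Rayleigh-distributed noise with the given scale parameter."""
    return img + rng.rayleigh(scale, img.shape)

def add_salt_pepper(img, density, rng):
    """Set about `density` of the pixels to the image minimum or maximum."""
    out = np.array(img, dtype=float, copy=True)
    u = rng.random(img.shape)
    out[u < density / 2] = img.min()        # pepper
    out[u > 1 - density / 2] = img.max()    # salt
    return out
```

A robustness curve like the one described for Figure 11 would then be obtained by sweeping the noise parameter, decrypting each noisy ciphertext, and recording the CC against the original plaintext.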

Resistance to Occlusion Attacks
Next, we analyze the influence of the occlusion attack on the decryption results. Three different occlusion styles and their corresponding decrypted images are shown in Figure 12. The corresponding plaintext images are shown on the right side of Figure 12, from which the primary information of the original plaintext images can be recognized visually.
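An occlusion attack of this kind can be simulated by blanking part of the ciphertext before decryption. The sketch below assumes a rectangular occlusion filled with zeros; the actual occlusion styles of Figure 12 are not specified in this text, and the name `occlude` and its parameters are illustrative.

```python
import numpy as np

def occlude(cipher, top, left, height, width):
    """Return a copy of the ciphertext with a rectangular block set to zero."""
    out = np.array(cipher, dtype=float, copy=True)
    out[top:top + height, left:left + width] = 0.0
    return out

def occluded_fraction(shape, height, width):
    """Fraction of ciphertext pixels lost to a height x width block."""
    return (height * width) / (shape[0] * shape[1])
```

For a 256 × 256 ciphertext, `occlude(cipher, 0, 0, 128, 128)` removes a quarter of the pixels; the occluded ciphertext is then passed through the same decryption pipeline as an intact one.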

Correlation Analysis
The correlation of adjacent pixels reflects the degree of correlation between pixel values at adjacent positions of the image. A secure encryption algorithm should reduce this correlation. The correlation of an image includes horizontal, vertical and diagonal correlation, and the correlation coefficient is calculated for each of these three directions. As shown in Table 1, the correlation coefficients between adjacent pixels of the plaintext images are all greater than 0.9, indicating a high correlation. In the ciphertext image, the average correlation coefficient of adjacent pixels in the three directions is close to 0. This means that the pixel distribution of the ciphertext image is very chaotic and shows no statistical correlation.

At the same time, in order to show the correlation between adjacent pixels intuitively, one of the four plaintext images is selected and its correlation analysis diagrams in the three directions are drawn. The correlation analysis results are shown in Figure 13. The experimental results show that the adjacent pixels of the ciphertext image have low correlation in the horizontal, vertical and diagonal directions, which reduces the statistical characteristics of pixel correlation and indicates that the proposed image encryption method can resist statistical attacks based on pixel correlation.
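The adjacent-pixel correlation in the three directions can be estimated as in the sketch below. It uses the standard Pearson coefficient over all adjacent pairs, which is an assumption: the paper's own formula is not reproduced in this text, and some works sample a random subset of pixel pairs instead. The function name `adjacent_correlation` is illustrative.

```python
import numpy as np

def adjacent_correlation(img, direction):
    """Pearson correlation between each pixel and its neighbour in one direction."""
    img = np.asarray(img, dtype=float)
    if direction == "horizontal":
        a, b = img[:, :-1], img[:, 1:]
    elif direction == "vertical":
        a, b = img[:-1, :], img[1:, :]
    elif direction == "diagonal":
        a, b = img[:-1, :-1], img[1:, 1:]
    else:
        raise ValueError("direction must be horizontal, vertical or diagonal")
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])
```

A smooth natural image gives values near 1 in all three directions, while a well-scrambled ciphertext should give values near 0, which is the behaviour reported in Table 1.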

Histogram Analysis
When image features leak, it is vital that the encryption scheme can resist statistical analysis. The histogram of an image shows the distribution of its pixel values, so it is a key indicator of the robustness of an image encryption scheme [38]. The histograms of the four plaintext images, the ciphertext images and the decrypted images are shown in Figure 14. The experimental results show that the distribution of pixel values in the ciphertext image is significantly different from that of the plaintext images. Comparison with the histograms of the four plaintext images shows that the encryption model has changed the distribution of pixel values and removed their statistical characteristics, so it can effectively resist attacks based on statistical analysis.
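The histogram comparison above can be quantified in several ways. The sketch below is one possibility, not the measure used in the experiments: it assumes 8-bit grayscale images and scores the overlap of two normalized histograms by their intersection (1 for identical distributions, smaller for dissimilar ones). The names `gray_histogram` and `histogram_intersection` and the synthetic images are hypothetical.

```python
import numpy as np

def gray_histogram(img, bins=256):
    """Normalized histogram of an 8-bit grayscale image."""
    counts, _ = np.histogram(np.asarray(img).ravel(), bins=bins, range=(0, 256))
    return counts / counts.sum()

def histogram_intersection(p, q):
    """Overlap of two normalized histograms, between 0 and 1."""
    return float(np.minimum(p, q).sum())

# Hypothetical stand-ins: the "plaintext" uses only dark gray levels,
# the "ciphertext" is spread over the full range.
rng = np.random.default_rng(1)
plain = rng.integers(0, 64, size=(64, 64))
cipher = rng.integers(0, 256, size=(64, 64))

self_overlap = histogram_intersection(gray_histogram(plain), gray_histogram(plain))
cross_overlap = histogram_intersection(gray_histogram(plain), gray_histogram(cipher))
```

A low overlap between the plaintext and ciphertext histograms indicates that the gray-level statistics of the plaintext are not preserved in the ciphertext.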

Analysis of the Number of Encrypted Images
The number of images that can be encrypted directly affects how useful a multi-image encryption algorithm is in practice. Next, we analyze the relationship between the number of encrypted images and the quality of the decrypted images. As shown in Figure 15, with four images per group, the quality of the decrypted result decreases as the number of encrypted images increases. However, when 64 images are encrypted, the image content can still be clearly distinguished, which traditional multi-image encryption algorithms cannot achieve [21][22][23][24][25]. When the number of encrypted images reaches 128, the limited image size makes the sampling ratio in the frequency domain too small to achieve good decryption.
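The effect of a shrinking frequency-domain window can be illustrated with a toy example. The sketch below is a simplification under stated assumptions: a single band-limited test image is modulated by one sinusoidal stripe, a square window around the carrier is kept and recentered, and the result is compared with the original by the correlation coefficient (CC). The random-matrix coding, scrambling, grouping and network-based decryption of the proposed method are not modelled, and the names `band_limited_image`, `demodulate` and `cc`, the image size, the carrier `fx` and the window sizes are all hypothetical.

```python
import numpy as np

def band_limited_image(n, radius, seed=0):
    """Synthetic test image whose spectrum lies within `radius` bins of DC."""
    rng = np.random.default_rng(seed)
    spec = np.fft.fftshift(np.fft.fft2(rng.random((n, n))))
    k = np.arange(n) - n // 2
    mask = (k[:, None] ** 2 + k[None, :] ** 2) <= radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec * mask)))

def demodulate(cipher, fx, half_width):
    """Keep a square window around the carrier (0, fx), recenter it, invert."""
    n = cipher.shape[0]
    c, w = n // 2, half_width
    spec = np.fft.fftshift(np.fft.fft2(cipher))
    kept = np.zeros_like(spec)
    kept[c - w:c + w, c - w:c + w] = spec[c - w:c + w, c + fx - w:c + fx + w]
    return np.real(np.fft.ifft2(np.fft.ifftshift(kept)))

def cc(a, b):
    """Correlation coefficient between two images."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

n, fx = 256, 64
image = band_limited_image(n, radius=20)
stripe = 0.5 * (1.0 + np.cos(2 * np.pi * fx * np.arange(n) / n))
cipher = image * stripe[None, :]          # stripe varies along the columns

# The fraction of the spectrum kept is (2 * half_width / n) ** 2.
cc_wide = cc(image, demodulate(cipher, fx, half_width=32))
cc_narrow = cc(image, demodulate(cipher, fx, half_width=4))
```

In this toy setting the wide window covers the whole image band and the recovered image correlates almost perfectly with the original, whereas the narrow window discards most of the band and the correlation drops. Packing more groups into a spectrum of fixed size forces each group toward the narrow-window case.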

Conclusions
In this paper, we propose a multi-image encryption method based on deep learning and sinusoidal stripe coding frequency multiplexing. After grouping, random matrix coding and sinusoidal stripe modulation, the superposed image is detected by a CCD camera. A deep neural network is then trained to learn the correspondence between the plaintext and the ciphertext. After training, the ciphertext image undergoes scrambling recovery, Fourier transform, downsampling and inverse Fourier transform, and is then fed to the trained network for decryption. Compared with previous multi-image encryption methods, the proposed method can encrypt more images and decrypts faster, which broadens its range of application. Moreover, theoretical analysis, numerical simulation results and robustness tests all verify the feasibility and security of the proposed method. In future work, we will further optimize the encryption method and the deep neural network structure so that more general grayscale images can be encrypted. Furthermore, a Bayer matrix can be used to preprocess color images into grayscale images, and a demosaicing algorithm is expected to restore the original color of the decrypted image, thereby realizing color image encryption.
