Improving Imaging Quality of Real-time Fourier Single-pixel Imaging via Deep Learning

Fourier single pixel imaging (FSPI) is well known for reconstructing high quality images but only at the cost of long imaging time. For real-time applications, FSPI relies on under-sampled reconstructions, failing to provide high quality images. In order to improve imaging quality of real-time FSPI, a fast image reconstruction framework based on deep learning (DL) is proposed. More specifically, a deep convolutional autoencoder network with symmetric skip connection architecture for real time 96 × 96 imaging at very low sampling rates (5–8%) is employed. The network is trained on a large image set and is able to reconstruct diverse images unseen during training. The promising experimental results show that the proposed FSPI coupled with DL (termed DL-FSPI) outperforms conventional FSPI in terms of image quality at very low sampling rates.


Introduction
Single pixel imaging (SPI) [1] illuminates the target scene with structured patterns (random or basis) and records data over time (using a photodetector) to reconstruct spatial information about a target scene. Fourier single pixel imaging (FSPI) is a type of SPI which employs Fourier basis patterns to acquire the Fourier spectrum of a target scene [2]. SPI approaches like differential imaging [3], normalized SPI [4] and frequency-locked SPI [5,6] all aim at increasing measurement signal-to-noise ratio (SNR). However, FSPI achieves better measurement SNR [2] to produce high-quality images. Compared to a basis scan strategy like Hadamard single pixel imaging (HSI), FSPI is known to be more efficient and performs well on under-sampled image reconstruction [7]. In its simplest form, FSPI uses a digital micromirror device (DMD) to project phase-shifted sinusoidal illumination patterns onto a target scene and collects back-scattered light using an ordinary photodiode. By using inverse Fourier transform (IFT), a high-quality target image can be reconstructed. FSPI has gained popularity due to its low-cost design, imaging under background noise, and ability to operate over a long spectral range. Owing to these benefits, FSPI is transitioning from laboratory towards practical applications [8].
To reconstruct high-quality images, FSPI requires a large number of measurements (equal to the number of pixels in the target image) to acquire sufficient spatial information, which increases its imaging time. The imaging time of FSPI comprises data acquisition time and image reconstruction time. Since image reconstruction in FSPI is merely an inverse transform, its image reconstruction time is very low and does not pose a problem as in conventional SPI [9]. The data acquisition time for FSPI primarily depends on the modulation rate of the spatial light modulator (SLM). At present, commercially available DMDs (the most commonly used SLMs) operate at a maximum of ~22 kHz (fast FSPI [10]). Therefore, the imaging speed of FSPI is limited by the modulation rate of the light modulator. In order to increase the imaging speed of FSPI, the only viable solution is to reduce its data acquisition time by capturing under-sampled images. For example, in practical applications, FSPI has been used for dynamic imaging [10,11] at ~10 fps with a 22 kHz modulation rate. To achieve this frame rate, the images were reconstructed at a 2% sampling rate, which deteriorated the image quality. This confirms that image acquisition time limits the true potential of FSPI for real-time imaging by compromising image quality.
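As a rough illustration of how the modulation rate bounds the frame rate, the following sketch estimates frames per second when acquisition time dominates. It assumes that every sampled Fourier coefficient costs a fixed number of projected patterns (four here) and that neither spectrum symmetry nor any other overhead is taken into account; the function name `fspi_frame_rate` and its parameters are hypothetical and are not taken from the cited systems.

```python
def fspi_frame_rate(width, height, sampling_percent, modulation_hz,
                    patterns_per_coeff=4):
    """Estimate frames per second for an acquisition-limited FSPI system.

    Assumption: each sampled coefficient needs `patterns_per_coeff`
    projected patterns, and one pattern is shown per modulator cycle.
    """
    n_coeffs = round(width * height * sampling_percent / 100.0)
    n_patterns = n_coeffs * patterns_per_coeff
    return modulation_hz / n_patterns


if __name__ == "__main__":
    # 96 x 96 images at a 22 kHz modulation rate, for several sampling rates.
    for s in (5, 8, 25, 100):
        fps = fspi_frame_rate(96, 96, s, 22_000)
        print(f"S = {s:>3}%: {fps:.2f} fps")
```

Under these assumptions the frame rate scales inversely with the sampling rate, which is why under-sampling is the main lever for real-time operation.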
To reduce the imaging time in SPI, compressive sensing (CS) methods have been applied [12][13][14]. CS techniques have proved to be quite efficient in recovering an image from fewer (compressive) measurements [15]. FSPI is similar to a CS method as it can reduce the number of measurements by selecting only a portion of the Fourier spectrum where most natural images are sparse. By acquiring under-sampled images using FSPI, there is a need for a reconstruction algorithm that can improve image quality from under-sampled measurements, supporting real-time FSPI. One promising option is to consider deep learning (DL) for image reconstruction in under-sampled FSPI.
Recent years have seen a surge of interest in employing DL for computational imaging. DL approaches can extract distinctive features from a large dataset and have been successfully employed for unsupervised learning in many applications. In particular, DL has been applied in image dehazing [16], object classification through scattering media [17], hidden human identification [18], phase imaging [19], and single-pixel video [20]. DL also has the potential to significantly enhance the performance of FSPI for real-time applications. For FSPI, the most relevant deep neural network model is the denoising autoencoder [21]. It has been observed that an under-sampled image reconstructed using FSPI contains blurring artifacts. To remove these artifacts and reconstruct a high-quality image, a deep convolutional autoencoder network (DCAN) with symmetric skip connections is employed to learn an end-to-end mapping between under-sampled images and ground truth. In this way, the model is trained to remove the different types of noise and blurring artifacts inherent in FSPI reconstruction while retaining fine image details.
This study demonstrates an imaging system that leverages the power of DL to reconstruct real-time high-quality 96 × 96 images. The proposed DCAN uses pairs of encoding and decoding layers connected by skip connections for improved image recovery and fast network convergence. The idea is to reduce acquisition time of FSPI by first acquiring under-sampled images at a 5-8% sampling rate, and then using our novel algorithm to reconstruct high-quality images with little computational cost to achieve higher frame rates. The proposed method can replace the conventional FSPI method for many real-time applications where a high-quality image is required at higher frame rates. Although work on increasing the frame rate of SPI has been done recently [22], FSPI still needs to make strides in this domain. Therefore, this work can provide guidelines for future application of DL in FSPI in this regard.

Fourier Single-Pixel Imaging
The imaging method of FSPI takes the Fourier transform as its basis. The idea is to capture the Fourier spectrum of a target scene by scanning the scene with phase-shifted sinusoidal patterns and collecting the back-scattered light using an ordinary photodiode. In this scheme, four-step phase-shifting sinusoids are used to acquire the target image spectrum. This type of approach has proven to be robust against noise. The pattern for frequency pair $F = (f_x, f_y)$ across the image plane is generated using the expression [2]:
$$P_\phi(x, y; f_x, f_y) = a + b \cos(2\pi f_x x + 2\pi f_y y + \phi),$$
where $a$ is the image intensity, $b$ is the contrast, and $\phi$ is the initial phase. The intensity back-scattered from the target scene, integrated over the target, is given by:
$$E_\phi(f_x, f_y) = \iint r(x, y)\, P_\phi(x, y; f_x, f_y)\, dx\, dy,$$
where $r(x, y)$ is the reflectivity distribution across the target plane. Considering the environment noise and random reflections near the scene, the total response recorded by the detector is written as [2]:
$$R_\phi(f_x, f_y) = R_n + k\, E_\phi(f_x, f_y),$$
where $k$ is associated with the size of the detector [2], and $R_n$ is related to random light fluctuations around the detector. The following phase sequence is generated at each frequency to acquire the corresponding coefficient:
$$\phi \in \{0,\ \pi/2,\ \pi,\ 3\pi/2\}.$$
The phase shift between adjacent patterns is constant. By acquiring the response $R_\phi$ for the different phase values, a differential mechanism can be applied to cancel out noise, given by [2]:
$$[R_0 - R_\pi] + j\,[R_{\pi/2} - R_{3\pi/2}] = 2bk\, \mathcal{F}\{r(x, y)\}.$$
Further applying the inverse Fourier transform (IFT), the image reconstruction is given by:
$$\tilde{r}(x, y) = \frac{1}{2bk}\, \mathcal{F}^{-1}\{[R_0 - R_\pi] + j\,[R_{\pi/2} - R_{3\pi/2}]\},$$
where $\tilde{r}(x, y)$ is the reconstructed image, which is subsequently fed to the DCAN model. Through FSPI, the images are reconstructed at a very low sampling rate of 5-8%, allowing the DCAN to apply its learned model to improve image resolution and remove the artifacts present in the under-sampled images.
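The four-step phase-shifting acquisition described above can be checked numerically with a small simulation. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the scene is discretized on an n × n grid, frequencies are integer multiples of 1/n, the background term is modeled as a constant offset, and all function names (`pattern`, `measure`, `fspi_acquire`) are hypothetical.

```python
import numpy as np


def pattern(n, fx, fy, phi, a=0.5, b=0.5):
    """Sinusoid a + b*cos(2*pi*(fx*x + fy*y)/n + phi) on an n x n grid."""
    y, x = np.mgrid[0:n, 0:n]
    return a + b * np.cos(2 * np.pi * (fx * x + fy * y) / n + phi)


def measure(scene, pat, k=1.0, noise=0.0):
    """Single-pixel response: offset plus k times the total reflected light."""
    return noise + k * np.sum(scene * pat)


def fspi_acquire(scene, b=0.5, k=1.0, noise=0.0):
    """Acquire the full Fourier spectrum with four phase steps per frequency."""
    n = scene.shape[0]
    phases = (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)
    spectrum = np.zeros((n, n), dtype=complex)
    for fy in range(n):
        for fx in range(n):
            r = [measure(scene, pattern(n, fx, fy, phi, b=b), k, noise)
                 for phi in phases]
            # Differential measurement: the constant offset cancels out.
            spectrum[fy, fx] = (r[0] - r[2]) + 1j * (r[1] - r[3])
    return spectrum / (2 * b * k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scene = rng.random((8, 8))
    spectrum = fspi_acquire(scene, noise=3.0)
    recon = np.fft.ifft2(spectrum).real
    print("max reconstruction error:", np.max(np.abs(recon - scene)))
```

With full sampling and a constant offset, the differential combination reproduces the discrete Fourier transform of the scene, so the inverse transform recovers it up to floating-point error. Random fluctuations that differ between the four measurements would not cancel exactly.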

Deep Learning Based FSPI
The proposed DCAN with symmetric encoding-decoding stages is shown in Figure 1. The network employs convolutional layers (Conv2D) to extract features and remove corruptions using a set of trainable filters with a small receptive field. The encoding stages use 32 filters (5 × 5 × 1) and 64 filters (3 × 3 × 32). At the end of the encoding stages, there is a single Conv2D layer with 128 filters (3 × 3 × 64). The decoding stages use 64 filters (3 × 3 × 128) and 32 filters (3 × 3 × 64). The output is reconstructed using a single Conv2D filter (1 × 1 × 1). The network weights are initialized using Xavier initialization [23]. To accelerate the training process, every Conv2D layer is followed by a batch normalization (BN) layer [24]. The rectified linear unit (ReLU) nonlinear activation is used at every stage to avoid the vanishing gradient problem. The max-pooling layers are used to reduce dimensions and to provide translational invariance. Conversely, the up-sampling layers restore the image resolution during decoding. To mitigate over-fitting, l2-regularization (with the same weight for all layers) is used. When the image data passes down the network pipeline, many smaller details are lost due to the pooling and convolutional operations. To better reconstruct images, skip connections pass feature information between the encoding and decoding stages, which recovers important details and helps propagate gradients to deeper layers. The network architecture is carefully designed and fine-tuned to improve image quality with low computational time (for image reconstruction). After under-sampled images are reconstructed via FSPI, they are sent down the network pipeline for quality improvement.
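The filter notation above (height × width × input channels) fixes the number of trainable convolution weights per layer, which gives a feel for the model size. The sketch below is only an estimate under stated assumptions: one Conv2D layer per listed filter bank, one bias per filter, batch-normalization parameters ignored, and the output layer left out because its input depth is not unambiguous from the text. The helper `conv2d_params` is hypothetical.

```python
def conv2d_params(n_filters, kernel_h, kernel_w, in_channels, bias=True):
    """Trainable parameters of one 2D convolution layer."""
    per_filter = kernel_h * kernel_w * in_channels + (1 if bias else 0)
    return n_filters * per_filter


# (name, filters, kernel height, kernel width, input channels) as listed above.
layers = [
    ("encoder 1", 32, 5, 5, 1),
    ("encoder 2", 64, 3, 3, 32),
    ("bottleneck", 128, 3, 3, 64),
    ("decoder 1", 64, 3, 3, 128),
    ("decoder 2", 32, 3, 3, 64),
]

total_params = 0
for name, n_filters, kh, kw, cin in layers:
    count = conv2d_params(n_filters, kh, kw, cin)
    total_params += count
    print(f"{name:<10} {count:>7}")
print(f"{'total':<10} {total_params:>7}")
```

Under these assumptions the convolutional part stays below two hundred thousand weights, which is consistent with the stated aim of low reconstruction time on a CPU.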
If $r(x, y)$ is the target image, then the image captured by FSPI using under-sampled measurements is a corrupted version of the target image, given by:
$$\tilde{r}(x, y) = H(r(x, y)) + n,$$
where $r(x, y)$ is the clean image, $\tilde{r}(x, y)$ is the under-sampled image, $H$ represents the degradation function, and $n$ is the noise. Here, DL is chosen for solving the ill-posed inverse problem of estimating the original image from an under-sampled image. To achieve this, the network is trained to learn an end-to-end mapping from $\tilde{r}(x, y)$ to $r(x, y)$. For the reconstructed target $\hat{r}(x, y)$, the loss function measures its deviation from the clean image $r(x, y)$. The network is fed with an under-sampled image reconstructed by FSPI, as explained in the previous section. The reconstruction from under-sampled inputs through the DCAN is depicted in Figure 1. To update the network parameters and minimize the loss, Adam optimization [25] was used with standard backpropagation. The base learning rate (lr = 10^-4) was set to be the same for all layers.
The network was trained on the STL-10 [26] dataset, which contains 96 × 96 images. All images were converted to grayscale and normalized before training. The training was performed on 10,000 unlabeled images. A test set (of 1000 images) was used to verify network performance during training, and a validation set (2000 images) was used to test the performance of the final model. Keras with TensorFlow was used to implement our model on an Intel i7 CPU (Integration Lenovo, Beijing, China) with 16 GB RAM.

Simulations
To observe how image quality deteriorates in under-sampled FSPI reconstruction, FSPI reconstruction was simulated for two test images, Lena and cameraman, at different sampling ratios. The sampling rate (or ratio) S, in percent, is the ratio of the number of measurements to the image size in pixels, multiplied by 100. The reconstruction results are shown in Figure 2. It can be seen that the FSPI reconstruction remains clear even for sampling ratios S ≤ 50%. However, for real-time applications, FSPI reconstruction is usually based on S < 10% [10,11]. To observe image quality within the 1-10% range, the cameraman image was simulated for S = 1% to 10%, as shown in Figure 3.
From Figure 3, it can be seen that the reconstructions for S between 1% and 10% contain blurring artifacts. By qualitative comparison, it can be inferred that a clear target reconstruction is achieved at S = 25%. Therefore, for performance comparison, FSPI reconstruction at a 25% sampling rate is set as the quality benchmark. This 25% benchmark is also more suitable for practical imaging, as the dynamics of reconstruction change in practice. Since real-time FSPI uses lower sampling rates (S < 10%), it is necessary to develop an imaging framework that can produce high-quality images from the under-sampled images generated by FSPI.
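The effect of the sampling rate S on reconstruction quality can be reproduced with a short simulation that keeps only part of the Fourier spectrum before inverting it. The sketch below is a minimal illustration assuming a square image and a circular low-frequency sampling region; the text does not specify the sampling path, so this choice and the function names `lowpass_mask` and `undersampled_fspi` are assumptions.

```python
import numpy as np


def lowpass_mask(n, sampling_percent):
    """Boolean mask over a centered n x n spectrum keeping the lowest frequencies.

    Coefficients at the same radius are kept together, so the mask may
    retain slightly more than the requested percentage.
    """
    fy, fx = np.mgrid[0:n, 0:n] - n // 2
    radius = np.hypot(fx, fy)
    n_keep = max(1, round(n * n * sampling_percent / 100.0))
    threshold = np.sort(radius, axis=None)[n_keep - 1]
    return radius <= threshold


def undersampled_fspi(image, sampling_percent):
    """Return the reconstruction from a partial spectrum and the kept percentage."""
    n = image.shape[0]
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    mask = lowpass_mask(n, sampling_percent)
    # Discard unsampled coefficients, then invert; keep the real part.
    recon = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    return recon, mask.mean() * 100.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((96, 96))  # stand-in for a 96 x 96 test image
    for s in (5, 8, 10, 25, 100):
        recon, kept = undersampled_fspi(image, s)
        mse = np.mean((recon - image) ** 2)
        print(f"S = {s:>3}% (kept {kept:.2f}%): MSE = {mse:.5f}")
```

Because larger sampling regions contain the smaller ones, the reconstruction error cannot increase as S grows; natural images, whose energy is concentrated at low frequencies, degrade more gracefully than the random stand-in used here.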
The proposed DL-FSPI framework was optimized by exhaustively testing it through numerical simulations. For training and testing, the STL-10 dataset was used, which comprises ten classes: monkey, cat, dog, deer, car, truck, airplane, bird, horse, and ship. The DL-FSPI network was trained on training images reconstructed using conventional FSPI at 5%, 6%, 8%, and 10% sampling rates. For performance validation, 2000 images were kept aside as the validation dataset, which the model does not see during training. First, the performance of the proposed model was compared with conventional FSPI for different sampling rates using the validation dataset. For a qualitative and quantitative comparison between FSPI and DL-FSPI, the image reconstruction results along with the corresponding structural similarity (SSIM) [27] values are shown in Figure 4. It can be observed from Figure 4 that the proposed DL-FSPI produces sharper, better-quality images than the corresponding FSPI method. The proposed DL-FSPI, after rigorous training on different types of under-sampled images and inherent FSPI artifacts, learns to reconstruct high-quality images from under-sampled inputs. Figure 5 shows the image reconstructions for different sampling rates by the DL-FSPI method. It can be observed from the figure that there exists a trade-off between sampling rate and image quality. For DL-FSPI-5 (imaging at S = 5%), which reconstructs images from 5% FSPI input, the reconstructed images have low quality. In this case, the model captures only coarse details of the target scene due to the blurring effects present in the under-sampled FSPI images. Therefore, in an attempt to achieve more compression, image quality is lost.
For sampling rates of 6%, 8%, and 10%, better image reconstruction quality can be observed. For the DL-FSPI-10 (imaging at S = 10%) model, the reconstruction results are the best among all models, which is understandable given the higher sampling rate. It can also be observed from Figure 5 that the image quality of the DL-FSPI-6 (imaging at S = 6%) and DL-FSPI-8 (imaging at S = 8%) models is comparable to that of DL-FSPI-10. Therefore, it can be concluded that up to 94% compression (using DL-FSPI-6) can be achieved without losing fine details in the image. However, for background sharpness and details, this study resorts to using DL-FSPI-8.
Figure 5. DL-FSPI reconstruction on validation dataset for 5%, 6%, 8%, and 10% sampling rates.
Figure 6 shows target images reconstructed by the proposed DL-FSPI model at different sampling rates, with zoomed portions to inspect background or low-level details in the image. It can clearly be seen that both the DL-FSPI-6 and DL-FSPI-8 models are able to reconstruct low-level features efficiently. These fine details are further enhanced in the reconstruction by DL-FSPI-10. Conversely, DL-FSPI-5 is only able to recover coarse details in the image, with fine details appearing blurred in the zoomed portions of Figure 6.
Furthermore, the image quality of the proposed method was compared with that of the conventional FSPI method (at 25%). Figure 7 compares the image reconstruction quality of conventional FSPI (25%) with DL-FSPI (8% and 10%). It can be seen from this qualitative comparison that both DL-FSPI-8 and DL-FSPI-10 reconstruct high-quality images, and their performance in most cases is better than that of conventional FSPI. The images reconstructed by DL-FSPI-10 are slightly brighter than those of DL-FSPI-8, but both models reconstruct fine details of the target clearly and accurately. Overall, the reconstruction by the DL-FSPI methods is smooth with no artifacts.
Sensors 2019, 19, x FOR PEER REVIEW 7 of 12
For quantitative comparison, the performance of conventional FSPI (at 25%) was compared with the proposed DL-FSPI model using the validation dataset. The reconstruction results are quantified using the SSIM metric. Images from the validation dataset (2000 images) were reconstructed using conventional FSPI (S = 25%) and DL-FSPI (S = 6%, 8%, and 10%). The SSIM values of the reconstructions are plotted as histograms in Figure 8. The distributions indicate that FSPI (25%) has slightly better reconstruction compared to DL-FSPI. However, the DL-FSPI method also outperforms FSPI (25%) for some images in the dataset. To quantify this performance comparison, the mean SSIM over the validation dataset for the different methods is also presented in Figure 8. Both DL-FSPI-10 and DL-FSPI-8 compete well with 25% FSPI. Although the reconstruction quality of the DL-FSPI method is similar to that of conventional FSPI (at 25%), the proposed method outperforms FSPI in terms of image reconstruction time.
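To make the SSIM comparison concrete, the sketch below evaluates the SSIM formula once over whole-image statistics. This is a simplification and an assumption on our part: the index in [27] is normally computed over local windows and then averaged, so values from this global variant will not match those reported in the figures. The function name `global_ssim` is hypothetical; the constants use the customary K1 = 0.01 and K2 = 0.03.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """SSIM formula evaluated once on global image statistics (simplified)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.random((96, 96))
    degraded = np.clip(reference + 0.1 * rng.standard_normal((96, 96)), 0, 1)
    print("identical images:", global_ssim(reference, reference))
    print("degraded image:  ", global_ssim(reference, degraded))
```

Averaging such a score over all validation images gives a mean SSIM per method, analogous to the summary values described for Figure 8.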
To quantify reconstruction time, different values of imaging time (physical experiment-based values) are presented for conventional FSPI and the proposed DL-FSPI in Table 1. The reconstruction time for conventional FSPI is the time taken by the IFT, whereas for the proposed method it is the time taken by IFT pre-processing and the DL algorithm. From Table 1, it can be seen that the image acquisition time of FSPI is very long, whereas the proposed DL-FSPI method reconstructs images of similar quality in a short time. This in turn affects the frame rate, which is critical for real-time applications. Therefore, the proposed method can generate more frames per second than conventional FSPI and can be used for real-time high-quality image reconstruction.
The average PSNR and SSIM values were also computed for the reconstructed images in the validation dataset (2000 images) using the different methods for quantitative comparison. Figure 9 shows the results for the DL-FSPI-5, DL-FSPI-6, DL-FSPI-8, and DL-FSPI-10 methods. In selecting a particular method, there exists a trade-off between image quality and maximum achievable frame rate (fps). The trend in the graphs shows that as the image quality increases, the frame rate decreases. For rudimentary reconstruction, DL-FSPI-5 can be used to achieve higher frame rates, whereas for higher-quality reconstruction, DL-FSPI-6, DL-FSPI-8, and DL-FSPI-10 (which still have higher frame rates than conventional FSPI) can be used.
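For reference, PSNR can be computed from the mean squared error between a reconstruction and its ground truth. The sketch below assumes images normalized to the range [0, 1] (so the peak value is 1.0); the function name `psnr` and the `data_range` parameter are illustrative choices rather than details taken from the text.

```python
import numpy as np


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.random((96, 96))
    recon = np.clip(truth + 0.05 * rng.standard_normal((96, 96)), 0, 1)
    print(f"PSNR = {psnr(truth, recon):.2f} dB")
```

Averaging this value over the validation images for each sampling rate would yield the kind of quality-versus-frame-rate curve described for Figure 9.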

Physical Experiments
The experimental arrangement of DL-FSPI is shown in Figure 10. An integrated projection system was used to illuminate the scene with sinusoidal patterns. The projection system uses a light

Physical Experiments
The experimental arrangement of DL-FSPI is shown in Figure 10. An integrated projection system was used to illuminate the scene with sinusoidal patterns. The projection system uses a light emitting diode (LED) operating at 450 nm (30 W) to illuminate the digital micromirror device (DMD) (TI DLP6500, Texas Instruments, Dallas, TX, USA). The light from the DMD is modulated and further projected onto the target using a projection lens. The scene to be captured is printed on photographic paper for better quality reconstruction through FSPI, and is kept at a distance of 430 mm from the projector and photodetector. The light back-scattered from the scene is collimated onto the photodetector (18 mm² active area, Thorlabs, Newton, NJ, USA) using an imaging lens (Computar H0514-MP2, 5 mm, Torrance, CA, USA). The intensity measurements from the photodetector were digitized using a 16-bit data acquisition (DAQ) card (Gage CSEG8 sampling at 1.3 MS/s, Lockport, IL, USA). Customized software developed in LabVIEW was used to generate (and store) and project basis patterns as well as record intensity measurements from the photodetector. The software synchronously controls both the DMD and the photodetector. An Intel i7 CPU with 16 GB RAM was used for data processing. Two experiments were performed:
(1) Experiment 1: In the first experiment, the under-sampled images were acquired (through FSPI) from the imaging setup, and then the network was trained on those images for reconstruction.
(2) Experiment 2: In the second experiment, the DL-FSPI model (DCAN block in Figure 10) trained on the STL-10 dataset was applied directly to the data from the imaging setup (under-sampled FSPI-based images).
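To make the acquisition step concrete, the following sketch shows how phase-shifted sinusoidal basis patterns and single-pixel measurements can yield one Fourier coefficient of the scene. It assumes the common four-step phase-shifting scheme (phases 0, π/2, π, 3π/2), which this section does not confirm, and it simulates the photodetector as a noiseless sum over the illuminated scene. The function names and the pattern offset `a` and contrast `b` are hypothetical.

```python
import numpy as np


def fourier_patterns(shape, ky, kx, a=0.5, b=0.5):
    """Four phase-shifted sinusoidal patterns for spatial frequency (ky, kx).

    Assumption: four-step phase shifting; a and b are the pattern offset
    and contrast, chosen so that pattern values lie in [0, 1].
    """
    rows, cols = shape
    y, x = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    theta = 2.0 * np.pi * (ky * y / rows + kx * x / cols)
    phases = (0.0, 0.5 * np.pi, np.pi, 1.5 * np.pi)
    return [a + b * np.cos(theta + p) for p in phases]


def measure_coefficient(scene, ky, kx, a=0.5, b=0.5):
    """Simulate four single-pixel readings and combine them into F(ky, kx)."""
    patterns = fourier_patterns(scene.shape, ky, kx, a, b)
    # Idealized detector: total light reflected by the scene under each pattern.
    d0, d1, d2, d3 = [np.sum(p * scene) for p in patterns]
    # Differencing cancels the offset term a; dividing by 2b restores the scale.
    return ((d0 - d2) + 1j * (d1 - d3)) / (2.0 * b)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    scene = rng.random((96, 96))
    coeff = measure_coefficient(scene, ky=3, kx=5)
    print(np.isclose(coeff, np.fft.fft2(scene)[3, 5]))
```

With this sign convention the recovered value matches the corresponding entry of `np.fft.fft2`, so a set of measured coefficients can be placed in a spectrum array and inverted with `np.fft.ifft2`.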
The results from the first experiment are shown in Figure 11. In this experiment, the images were taken from random datasets (Peppers, Lena, Dog, etc.). The under-sampled FSPI reconstructions (5%, 6%, 8%, and 10%) were first acquired from the imaging setup and set aside as input images for training. The output label for training the network was set to the ground truth of the images under consideration. By learning an end-to-end mapping between the under-sampled FSPI images and their ground truth counterparts, the network learns to remove the noise that the imaging setup introduces into the image. Therefore, high-quality images can be reconstructed from the under-sampled inputs. The SSIM values corresponding to the images in Figure 11 indicate that all DL-FSPI methods can produce high-quality image reconstructions.
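The pairing of under-sampled inputs with ground-truth labels can be sketched in simulation as follows. This is only an illustration: in Experiment 1 the inputs were acquired from the physical setup, whereas here they are simulated by keeping the lowest-frequency Fourier coefficients. The low-frequency-first selection, the definition of the sampling ratio as a fraction of all coefficients, and the function name `undersample_fspi` are assumptions not taken from the text.

```python
import numpy as np


def undersample_fspi(image, sampling_ratio):
    """Simulate an under-sampled FSPI reconstruction of a 2D image.

    Assumption: the coefficients closest to zero frequency are sampled first,
    and sampling_ratio is the fraction of all Fourier coefficients kept.
    Returns the reconstruction and the boolean sampling mask.
    """
    image = np.asarray(image, dtype=np.float64)
    rows, cols = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Distance of every coefficient from the (shifted) zero-frequency bin.
    fy = np.arange(rows) - rows // 2
    fx = np.arange(cols) - cols // 2
    radius = np.hypot(fy[:, None], fx[None, :])

    # Keep the n_keep coefficients with the smallest radius.
    n_keep = max(1, int(round(sampling_ratio * rows * cols)))
    order = np.argsort(radius, axis=None, kind="stable")
    mask = np.zeros(rows * cols, dtype=bool)
    mask[order[:n_keep]] = True
    mask = mask.reshape(rows, cols)

    # Inverse Fourier transform of the masked spectrum; the real part is taken
    # because the mask is not forced to be conjugate-symmetric.
    recon = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    return recon, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ground_truth = rng.random((96, 96))  # stand-in for a 96 x 96 training image
    pairs = {}
    for ratio in (0.05, 0.06, 0.08, 0.10):
        network_input, _ = undersample_fspi(ground_truth, ratio)
        pairs[ratio] = (network_input, ground_truth)  # (input, label)
    print({r: p[0].shape for r, p in pairs.items()})
```

Each `(input, label)` pair corresponds to one training example for the end-to-end mapping described above; measured data would additionally contain detector and illumination noise that this simulation omits.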
The results from the second experiment are shown in Figures 11 and 12. In this experiment, the DCAN model trained on the STL-10 dataset was applied (through simulations) to reconstruct diverse target scenes. The difference between the simulation and experimental results is shown in Figure 12.
The experimental results of Figure 12 show that the proposed model trained on the STL-10 dataset has learned enough about the artifacts appearing in FSPI imaging to remove them, as seen in the DL-FSPI-10 image. It is important to note that even when the sampling rate for FSPI increases to 25%, some fine-grained noise/artifacts still appear in the image. In contrast, the DL-FSPI-10 image is sharp: the proposed algorithm removes all the artifacts and reconstructs a clear image. The SSIM values (compared with the ground truth pepper image) for Figure

Conclusions
This study focused on improving the efficiency of conventional FSPI, which fails to produce high-quality images in real time. To shorten the imaging time and produce high-quality images, FSPI requires an efficient image recovery framework. This study proposed a novel image reconstruction framework for FSPI that leverages DL to reconstruct high-quality images in real time from under-sampled, low-quality FSPI images. The proposed DL-FSPI method employs a deep convolutional autoencoder network which uses symmetric pairs of encoding-decoding layers connected by skip connections for fast high-quality image reconstruction. Simulations and experiments show that the model outperforms the conventional FSPI method. The proposed method can replace the conventional FSPI method in many real-time applications where a high-quality image is required at higher frame rates. This work also provides guidelines for future application of DL in FSPI. Future investigations will involve characterizing the algorithm at very low sampling rates (S = 1–3%).