Underwater Object Detection and Reconstruction Based on Active Single-Pixel Imaging and Super-Resolution Convolutional Neural Network

Due to medium scattering, absorption, and complex light interactions, capturing objects in the underwater environment has always been a difficult task. Single-pixel imaging (SPI) is an efficient imaging approach that can obtain spatial object information under low-light conditions. In this paper, we propose a single-pixel object inspection system for the underwater environment based on a compressive sensing super-resolution convolutional neural network (CS-SRCNN). With the CS-SRCNN algorithm, image reconstruction can be achieved with only 30% of the total pixels in the image. We also investigate the impact of the compression ratio on underwater object SPI reconstruction performance. In addition, we use the peak signal to noise ratio (PSNR) and the structural similarity index (SSIM) to assess the quality of the reconstructed images. We compare our approach with the plain SPI system and the SRCNN method to demonstrate its efficiency in capturing objects in an underwater environment. The PSNR and SSIM of the proposed method increased by 35.44% and 73.07%, respectively. This work provides new insight into SPI applications and offers a high-quality alternative for underwater optical object imaging.


Introduction
In the underwater environment, reconstructing objects is a challenging task due to light attenuation from water absorption and illumination volatility from the scattering medium [1,2]. Various approaches have been developed for imaging objects under low-light conditions and for improving the quality of underwater images. Polarimetric imaging systems [3,4] were used to reduce the backscattering effect and produced good-quality images. Although polarization correction can improve the restoration quality, it also blocks part of the light incident on the detector, thereby reducing the signal to noise ratio and complicating underwater imaging. Range-gated imaging was developed by Mariani et al. [5] in combination with the time-of-flight technique: using a pulsed laser as the light source, the authors measured the distance of the object from the light source and thereby obtained the depth of the object while eliminating the backscattering effect. However, relative to other cameras, range-gated systems are generally more expensive, more complicated to operate, and limited in resolution and frame rate. Geotagging and color-correction approaches were implemented to successfully generate 2D and 3D images of the underwater environment [6]: a digital camera with waterproof housing captured the underwater images, and the location of the camera was identified through geotagging. Though this approach could produce good-quality 2D and 3D maps of the underwater environment, requirements such as clear water and calm surface conditions limited its performance. To further improve underwater image quality, Lu et al. [7] used a multi-scale cycle generative adversarial network, whose discriminator could produce high-quality images under homogeneous lighting conditions; however, it failed under inhomogeneous illumination. The work was extended in Tang et al. [8] using the dark channel prior algorithm to further improve underwater image quality.
Empirical mode decomposition was developed in Çelebi et al. [9], where the acquired underwater images were decomposed into spectral components and then reconstructed by combining the intrinsic mode functions to enhance visual image quality. However, this three-channel component calculation increased the computational load. Most of these schemes use a silicon-based charge-coupled device (CCD) or complementary metal oxide semiconductor (CMOS) sensor to image underwater objects; the limited response of the detection medium to a specific bandwidth has always been a drawback of such sensor-based imaging systems under low-illumination conditions.
Recently, single-pixel imaging (SPI) has attracted widespread attention for imaging objects under low-light conditions and through scattering media. SPI uses pre-programmed modulated light patterns and knowledge of the scene under view to acquire spatial information about the target [10,11]. In projection-based SPI, the object is illuminated with two-dimensional spatially coded patterns, and the reflected or transmitted light signals from the object are collected with a single-pixel detector (SPD) to gather the fine details of the object [11]. Beyond cost-effectiveness, other advantages of SPI are low dark current and high light sensitivity, producing good-quality images. Meanwhile, the introduction of compressed sensing (CS) into SPI has enabled reconstruction with fewer measurements [12,13]. Owing to these improvements, SPI has been exploited in scattering media [14,15]. Although it has been proved that SPI can image underwater, the fluctuation in illumination caused by the absorbing and scattering water medium still seriously reduces the reconstruction quality. To suppress these effects, Chen et al. [16] proposed an SPI detection scheme in which an object can be recovered through turbid water from transmission signal data. In addition, these methods suffer from detector noise, which decreases the image quality [17]. Hence, for underwater SPI, a restoration algorithm that is robust to noise is needed. Moreover, the visual limitations of the above imaging systems can be overcome by combining deep learning with a basic SPI system.
Deep learning [18], as an emerging field of research, has proven its performance in image processing tasks such as image recognition [19,20], object detection [21], and human pose estimation [22]. When combining neural networks with SPI systems, there are two ways to obtain the final target image: one exploits a robust neural network to recover the object image directly, while the other trains the network to enhance images after reconstruction. For the former, Higham et al. [23] utilized MatConvNet to create a deep convolutional auto-encoder using the patterns as the encoding layer and Hadamard patterns as the optimization basis; for the subsequent decoding layers, three convolutional layers were used to recover real-time high-resolution video. For the latter scheme, Rizvi et al. [24] proposed a deep convolutional autoencoder network that uses reconstructed under-sampled training and testing images as input and high-quality predicted images as output. Compared with previous methods, deep learning algorithms have proved more reliable for restoration; therefore, the employment of deep learning in SPI systems has brought drastic improvements in image quality.
Though studies on the influence of turbulence [25] and of deep learning on SPI are discussed in Dutta et al. [26] and Jauregui-Sánchez et al. [27], few studies focus on SPI with deep learning for underwater image reconstruction. Therefore, in this paper, we leverage single-pixel imaging and deep learning to improve underwater image restoration. An experimental setup was established to verify the presented method, and the effectiveness of our technique is demonstrated through both experiments and simulation. We summarize our contributions as follows:
• An optical imaging system based on SPI is developed for imaging objects in the underwater environment. Our experimental results validated that the object recovered by the underwater SPI system is affected by scattering and absorption.
• A compressive sensing-based super-resolution convolutional neural network (CS-SRCNN) is implemented by combining the advantages of the SRCNN and the SPI system. The newly introduced CS-SRCNN takes reconstructed underwater SPI images to train the network and to predict high-resolution images.

• We also demonstrate the effectiveness of our technique through simulation. In the simulation, the proposed method can restore objects at a low sampling rate and produces more robust reconstructions.

• We experimentally demonstrated reconstruction with good results at a low sampling rate of only 30%.

Underwater Object Reconstruction
Figure 1 shows the schematic diagram of the SPI system. Assuming that X is the column vector reshaped from the N = p × q pixel image and that it is sparsely sampled by the measurement matrix Φ, the corresponding measurement data can be expressed as follows:

Y = ΦX + e,    (1)

where Y is the measurement data obtained by stacking the y_i, an M × 1 column vector of linear measurements; M is the number of measurements used for image acquisition; the measurement matrix Φ ∈ R^(M×N) contains M row vectors, each the stacking of a two-dimensional coded pattern φ_m; and e, of size M × 1, denotes the noise. Equation (1) can be written row by row as follows:

y_m = φ_m X + e_m,  m = 1, . . . , M.

Compressive Sensing
A measurement matrix Φ was constructed from random patterns and combined with the observed signals y_i to obtain a system of equations. However, as mentioned above, Φ has fewer rows (M) than columns (N = p × q), so solving for X from Y and Φ directly is impossible: the solution of the system of equations is not unique. To address this issue, we relied on the principle of compressed sensing (CS) [28], which can be applied effectively to sparse images. Sparsity means that, when a signal is represented in an appropriate basis, most of its coefficients are close to zero. In most instances, the gradient of a natural image is statistically sparse; hence the image can be compressed by the CS method. When X is k-sparse, it can be recovered from M = O(k log N) incoherent, nonadaptive linear measurements. Equation (1) can then be solved by exploiting this sparsity.

Let D_i x denote the discrete gradient vector of X at position (i, j), composed of the horizontal and vertical difference operators D_i^h x and D_i^v x:

D_i x = (D_i^h x, D_i^v x)^T.

The total variation (TV) regularized objective for X is the sum of the magnitudes of this discrete gradient at each point plus an error term:

min_X  Σ_i ||D_i x||_1 + (µ/2) ||ΦX − Y||_2^2,    (6)

where Σ_i ||D_i x||_1 is the discrete TV of X, µ is a constant scalar used to balance the two terms, and ||x||_1 = Σ_{i=1}^{N} |x_i| denotes the ℓ1 norm. Using TV regularization to recover the image is discussed in Li [29]. The first term in Equation (6) is small when D_i x is sparse; the second term is small when the optimal X is consistent with Equation (1) up to a small error [30].
The image reconstruction problem of Equation (1) can thus be expressed as follows:

min TV(X)  subject to  ΦX = Y.

Accurate recovery can be achieved by solving this convex optimization program, which is computationally tractable [31].
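To make the measurement model of Equation (1) and TV-regularized recovery concrete, the following is a minimal numpy sketch. It is not the solver used in the paper: the constrained program above is replaced by gradient descent on an unconstrained, smoothed-TV objective, and all sizes (8 × 8 image, M = 40) are toy values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth object: an 8x8 image with sparse gradients (a bright square),
# reshaped into a column vector X of length N, as in Equation (1).
p = q = 8
N = p * q
img = np.zeros((p, q))
img[2:6, 2:6] = 1.0
X = img.ravel()

# M < N random binary patterns stacked as the rows of Phi; Y = Phi @ X + e.
M = 40
Phi = rng.integers(0, 2, size=(M, N)).astype(float)
e = 0.01 * rng.standard_normal(M)
Y = Phi @ X + e

def tv_grad(x, p, q, eps=1e-3):
    """Gradient of a smoothed anisotropic total variation of x."""
    im = x.reshape(p, q)
    dh = np.diff(im, axis=1)          # horizontal differences D_h
    dv = np.diff(im, axis=0)          # vertical differences D_v
    sh = dh / np.sqrt(dh**2 + eps)    # smoothed sign of D_h
    sv = dv / np.sqrt(dv**2 + eps)    # smoothed sign of D_v
    g = np.zeros_like(im)
    g[:, :-1] -= sh; g[:, 1:] += sh   # adjoint of D_h
    g[:-1, :] -= sv; g[1:, :] += sv   # adjoint of D_v
    return g.ravel()

# Gradient descent on  ||Phi x - Y||^2 / 2 + mu * TV(x).
mu = 0.1
step = 1.0 / np.linalg.norm(Phi, 2) ** 2
x = np.zeros(N)
for _ in range(3000):
    x -= step * (Phi.T @ (Phi @ x - Y) + mu * tv_grad(x, p, q))

print("relative error:", np.linalg.norm(x - X) / np.linalg.norm(X))
```

The descent step is set from the spectral norm of Φ so the data-fit term converges; µ trades data fidelity against gradient sparsity, exactly as in Equation (6).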

The Super-Resolution Convolutional Neural Network
The convolutional neural network (CNN), a typical deep learning algorithm, has been widely used in image processing due to its powerful feature learning in computer vision research. SRCNN [32,33] is the first end-to-end super-resolution algorithm using a CNN structure. An SRCNN architecture was adopted as the benchmark, which learns a mapping from low- to high-resolution images. The low-resolution (LR) components X_SPI were upscaled by bicubic interpolation to obtain the interpolated components X_II. The patch extraction layer extracts patches from the bicubic-interpolated components, and feature extraction by convolution can be denoted as follows:

F_1(X_II) = max(0, W_1 * X_II + B_1),

where F, X_II, W_1, and B_1 represent the mapping function, the interpolated image, the filters, and the biases (an n_1-dimensional vector), respectively, and * denotes the convolution operation. The size of W_1 is n_1 × c × f_1 × f_1, where f is the filter size, c is the number of channels of the input image, and n is the number of convolution kernels, with subscript 1 indicating the first layer. A rectified linear unit (ReLU) was used as the activation function of the network:

ReLU(z) = max(0, z).

ReLU can be computed faster than traditional activation functions such as the sigmoid and makes the neural network easier to optimize. The nonlinear mapping layer maps from LR space to HR space:

F_2(X_II) = max(0, W_2 * F_1(X_II) + B_2).

The size of W_2 is n_2 × n_1 × f_2 × f_2, and B_2 is an n_2-dimensional vector. This layer outputs an n_2-dimensional feature map as input to the third layer.
The reconstruction of HR images in the third layer can be expressed as follows:

F(X_II) = W_3 * F_2(X_II) + B_3,

where the size of W_3 is c × n_2 × f_3 × f_3. Combining these three operations constitutes a CNN in which all filter weights and biases are optimized. During the training phase, the mapping function F requires estimating the network parameters Θ = {W_1, W_2, W_3, B_1, B_2, B_3}. The reconstructed image is denoted F(X_II; Θ), and the ground-truth HR image is X_SR; the error between F(X_II; Θ) and the real image X_SR was minimized.
To achieve this goal, the mean square error (MSE) was chosen as the loss function of the network model in this paper. The loss function L is expressed as follows:

L(Θ) = (1/o) Σ_{i=1}^{o} ||F(X_II_i; Θ) − X_SR_i||^2,

where o is the number of training images and X_SR_i is the i-th HR image. The loss function was minimized using gradient descent with the standard backpropagation algorithm.
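The three-layer forward pass and MSE loss described above can be sketched in numpy as follows. This is an illustrative toy, not the trained network: the weights are random, and the channel counts n_1 and n_2 are shrunk from the paper's settings to keep the demo fast; only the 9-1-5 kernel structure is taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, w, b):
    """'Same'-padded 2D convolution.
    x: (H, W, c_in), w: (f, f, c_in, c_out), b: (c_out,)."""
    f = w.shape[0]
    pad = f // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H, W = x.shape[:2]
    out = np.zeros((H, W, w.shape[3]))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + f, j:j + f, :]
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out + b

relu = lambda z: np.maximum(z, 0)

# Kernel sizes f1=9, f2=1, f3=5 from the text; n1, n2 reduced for speed.
c, n1, n2 = 1, 8, 4
W1 = 0.01 * rng.standard_normal((9, 9, c, n1)); B1 = np.zeros(n1)
W2 = 0.01 * rng.standard_normal((1, 1, n1, n2)); B2 = np.zeros(n2)
W3 = 0.01 * rng.standard_normal((5, 5, n2, c)); B3 = np.zeros(c)

def srcnn(x_ii):
    F1 = relu(conv2d(x_ii, W1, B1))   # patch extraction (Eq. for F_1)
    F2 = relu(conv2d(F1, W2, B2))     # nonlinear mapping (Eq. for F_2)
    return conv2d(F2, W3, B3)         # reconstruction (Eq. for F)

x_ii = rng.random((32, 32, 1))        # bicubic-interpolated LR input
x_sr = rng.random((32, 32, 1))        # HR reference image
out = srcnn(x_ii)
mse_loss = np.mean((out - x_sr) ** 2) # the MSE training loss L
print(out.shape, mse_loss)
```

Training would minimize `mse_loss` over Θ = {W1, W2, W3, B1, B2, B3} by backpropagation, which is omitted here.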

The Reconstruction of Underwater Single-Pixel Imaging based on Compressive Sensing and Super-Resolution Convolutional Neural Network
Following these theories, we regard CS image regeneration as an inverse problem and tackle it based on SRCNN. In this study, we propose an underwater object reconstruction method, CS-SRCNN, which combines an improved SRCNN with the SPI system. The specific implementation architecture of the proposed method is shown in Figure 2. This architecture was adopted as the benchmark, which learns a mapping from low-resolution to high-resolution images; that is, high-frequency components are restored in the low-resolution (LR) components. In the mapping process, the network takes measurement data as input and produces high-resolution (HR) images as output [34,35].
The architecture combines deep learning with the traditional sparse coding technique (Figure 3). The CS-SRCNN consists of a CS layer followed by a three-layer CNN whose layers perform patch extraction, nonlinear mapping, and final reconstruction, respectively. The input to the entire neural network is the M-dimensional compressed raw data from the SPI system. The raw data are processed iteratively by TV regularization to obtain the LR components. The network then enlarges the LR components by bicubic interpolation, a preprocessing step for the CNN; compared to other methods, bicubic interpolation is fast and does not introduce much spurious information. The first convolutional layer (patch extraction) extracts 9 × 9 LR patches, and its output is passed through the rectified linear unit (ReLU) activation function. The second part (nonlinear mapping) uses a convolution layer with a kernel size of 1 × 1 to output 64 feature maps, which are then concatenated into a matrix; its output is also passed through ReLU. In the third part (reconstruction), the matrix from the preceding layer is passed through a convolution layer with a kernel size of 5 × 5; HR feature vectors are reconstructed, and all HR patches are combined to form the final HR components. Subsequently, we apply postprocessing (e.g., filtering to remove noise) to the reconstructed images. In our study, the dataset was divided into a training set and a test set. We used 400 images of handwritten digits downloaded from the MNIST database, with the corresponding SPI signals, to train the SRCNN; the other 100 images formed the test set. The MNIST handwritten digits (28 × 28 resolution) were resized to the working resolution (e.g., 32 × 32). For each training image, the same set of M different random patterns was applied in both the simulation and the experiment (more details in Section 3).
After the set of M patterns was run with the object, we attained a light intensity signal y_i of length M. We then paired up the resulting signal y_i and the corresponding 2D image. The object was replaced, and the above simulation process was rerun to obtain another pair of labeled data. For training, no additional noise was added to the simulated light intensity signal. This process was repeated across all 400 images in the training set, each paired up with its y_i.
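The training-pair generation described above can be sketched as follows. This is a hedged toy: a random array stands in for an MNIST digit, and zero-padding is assumed as the 28 × 28 → 32 × 32 resize, since the paper does not specify the resize method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one MNIST digit (28x28); in the real pipeline these come
# from the 400-image MNIST training subset.
digit = rng.random((28, 28))

# Resize 28x28 -> 32x32. Zero-padding by 2 pixels on each side is one
# simple assumption; the paper only states the digits were resized.
obj = np.pad(digit, 2)

# The same set of M random binary patterns is reused for every object.
M = 300
patterns = rng.integers(0, 2, size=(M, 32 * 32)).astype(float)

# Simulated noise-free light-intensity signal y_i of length M (Y = Phi X),
# matching the text: no additional noise is added for training.
y = patterns @ obj.ravel()

# One labeled training pair: (measurement signal, ground-truth 2D image).
pair = (y, obj)
print(y.shape, obj.shape)
```

Repeating this loop over all 400 digits yields the full set of (y_i, image) training pairs.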

Simulation Results and Analysis
In the underwater SPI system, to investigate the effect of the number of measurements and of water turbidity, random patterns were employed to sample the object at 32 × 32 resolution. The simulation used Gaussian blur [36] (with Gaussian noise at 10, 15, and 25 dB added to emulate different underwater turbidity conditions) as the disturbance factor to simulate scattering by the water medium, and it qualitatively analyzed the robustness of SPI at different sampling rates in turbid water. The simulation results for different sampling rates are shown in Figure 4. We can see that there are blurry artifacts in the reconstructed images at sampling rates between 4.8% and 30%.
Furthermore, to inspect how the image quality changes in the underwater environment, we simulated CS-SRCNN reconstruction for an image of the letter G with a sampling rate of 0.29 for different underwater turbidities. We then verified the influence of turbidity on the reconstruction quality by applying the conventional SRCNN and our method. According to our setup, we compared images from SPI, SRCNN, and our method by applying upscaling factors of 3 on the underwater image. Figure 5 shows the visual effects at different turbidities (0, 20, 40, and 60 nephelometric turbidity units (NTU)). As the turbidity increased from 0 NTU to 60 NTU, we observed a significant degradation of the reconstructed image, comparing Figure 5b with Figure 5e-g. However, CS-SRCNN can reconstruct clear images, even if the turbidity is as high as 60 NTU, and the reconstructed image can still be satisfactory with a sampling ratio of about 29%.

Figure 6 compares the peak signal to noise ratio (PSNR) and structural similarity index (SSIM) of images taken at different turbidities and reconstructed with TV regularization, SRCNN, and CS-SRCNN. We kept the number of measurements constant at 300. The figure shows that both PSNR and SSIM decreased with turbidity for all three methods of reconstruction. In the low turbidity region from 0 NTU to 40 NTU, the PSNR and SSIM of the three methods declined at a similar rate.
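A rough numpy sketch of the turbidity-simulation idea follows: Gaussian blur stands in for forward scattering by the water medium, and additive Gaussian noise at a chosen SNR models the detector-side fluctuation. The kernel size, sigma, and SNR value here are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(size=5, sigma=1.0):
    """1D Gaussian kernel, normalized to sum to 1."""
    r = np.arange(size) - size // 2
    k = np.exp(-r**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur via row- then column-wise 1D convolution."""
    k = gaussian_kernel(sigma=sigma)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="same"), 0, tmp)

def simulate_turbidity(img, sigma, snr_db):
    """Blur models scattering; additive Gaussian noise at snr_db (dB)
    models the measurement fluctuation, as in the 10/15/25 dB settings."""
    out = blur(img, sigma)
    sig_power = np.mean(out**2)
    noise_power = sig_power / 10 ** (snr_db / 10)
    return out + rng.standard_normal(img.shape) * np.sqrt(noise_power)

img = np.zeros((32, 32)); img[8:24, 8:24] = 1.0   # toy letter-like object
degraded = simulate_turbidity(img, sigma=1.5, snr_db=15)
print(degraded.shape)
```

Higher assumed turbidity would correspond to a larger blur sigma and a lower SNR in this sketch.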
On the contrary, at turbidities higher than 40 NTU, the imaging quality of our method remained robust (the slope tends to flatten), while the other two approaches continued the earlier downward trend. Since these evaluation metrics are robust to the influence of scattering by the water medium, CS-SRCNN is the more reliable choice, and the results of underwater imaging can be further enhanced.
We also compared the image recovered by our method with the images reconstructed by the end-to-end learning method and the image processing method, namely the scattering-media image recovery method based on a polarimetric ghost imaging system formulated by Li et al. [37]. The PSNR and SSIM were used to evaluate and compare the results, which are shown in Table 1. Compared with Li et al.'s method, the reconstructed-image PSNRs and SSIMs of our method were significantly increased, by 17.39%/83.78% and 16.11%/90.32%, respectively, which means that our method performs better than the other methods.

Experimental Setup
The experimental setup for the underwater SPI system is illustrated in Figure 7; it consists of a laser source and a DMD to project binary patterns. A fast-response SPD with a collection lens was used to measure the transmitted intensity resulting from each pattern. The corresponding intensity values were captured by a data acquisition (DAQ) device, and the object image was recovered by correlating the values from the DAQ with the pattern matrix.

The experimental setup is shown in Figure 8. The continuum laser is monochromatic, highly intense, and power-efficient and produces a coherent wavelength of 520 nm (the laser power was 20 mW); it is, therefore, a viable light source for illuminating the DMD. The DMD allows a configurable input (MATLAB programming and light source)/output (random pattern) trigger for convenient synchronization with the continuum laser and SPD peripherals. The DLP (DLP LightCrafter 6500, Texas Instruments) provides a true HD resolution of 1920 × 1080, with more than 2 million programmable micromirrors on the mounted DMD chip. The pre-programmed random patterns produced by the DMD take the form of a 32 × 32 matrix. Each light pattern that passed through the underwater object (a transparent "G"; the length, width, and height of the water tank were 20 cm, 20 cm, and 25 cm, respectively) was directed to a collecting lens (THORLABS LMR75/M). The collecting lens focused the light onto the SPD (PDA36A2), which acted as a bucket detector with a bandwidth of 12 MHz and an active area of 13 × 13 mm². The total light intensity detected was converted to an electrical signal, digitized by the DAQ (National Instruments USB-6001), and transmitted to a laptop via USB for image reconstruction.
The synchronization between the DMD and SPD was achieved by sending a trigger signal to the DMD to refresh the patterns; simultaneously, the trigger signal was sent to the DAQ to record the SPD voltage. Data management and sampling were controlled and synchronized by MATLAB, and the automation scripts for data collection were executed in MATLAB on a laptop. The reconstruction algorithm then recovered the image by minimizing the total variation.

TV Regularization (Compressive Sensing) Reconstruction
The experimental results of underwater SPI reconstruction using TV regularization are shown in Figure 9. The reconstruction was affected by the number of measurements, the imaging speed, and the compressibility. Random patterns were employed to reconstruct the object at 32 × 32 resolution. To investigate the effect of the number of measurements on the underwater image, different numbers of measurements were used to capture and reconstruct the images. Figure 9 presents the reconstruction results for three different numbers of random patterns: the images in Figure 9b-d and f-h correspond to 100, 200, and 300 random patterns, respectively. It can be seen that the simple object ("+") was restored better than the complex object ("G"). Furthermore, by using CS, the object could be reconstructed with only 29% of the total number of pixels in the original image.
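The paper does not specify the TV solver beyond total-variation minimization. As a hedged illustration of the idea, the sketch below reconstructs a simulated "+" object from binary-pattern measurements by plain gradient descent on a smoothed-TV objective; the solver choice, step size, regularization weight, and iteration count are assumptions for demonstration, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
n_px, n_meas = 32, 300

# Piecewise-constant test scene (a "+" shape, as in Figure 9).
scene = np.zeros((n_px, n_px))
scene[12:20, 4:28] = 1.0
scene[4:28, 12:20] = 1.0

# Simulated single-pixel measurements with binary random patterns.
A = rng.integers(0, 2, size=(n_meas, n_px * n_px)).astype(float)
y = A @ scene.ravel()

def tv_grad(img, eps=1e-8):
    """Gradient of the smoothed isotropic total variation of img."""
    dx = np.zeros_like(img); dx[:-1, :] = img[1:, :] - img[:-1, :]
    dy = np.zeros_like(img); dy[:, :-1] = img[:, 1:] - img[:, :-1]
    mag = np.sqrt(dx**2 + dy**2 + eps)
    px, py = dx / mag, dy / mag
    # Negative discrete divergence of the normalized gradient field.
    div = np.zeros_like(img)
    div[:-1, :] += px[:-1, :]; div[1:, :] -= px[:-1, :]
    div[:, :-1] += py[:, :-1]; div[:, 1:] -= py[:, :-1]
    return -div

# Plain gradient descent on 0.5*||Ax - y||^2 + lam*TV(x).
lam = 0.5
step = 1.0 / np.linalg.norm(A, 2) ** 2
x = np.zeros((n_px, n_px))
for _ in range(2000):
    r = A @ x.ravel() - y
    g = (A.T @ r).reshape(n_px, n_px) + lam * tv_grad(x)
    x -= step * g

res = np.linalg.norm(A @ x.ravel() - y) / np.linalg.norm(y)
print(f"relative residual: {res:.3f}")
```

With roughly 29% of the pixel count in measurements, the TV prior is what lets the piecewise-constant object be recovered from an underdetermined system.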
In order to quantify the accuracy of the image reconstruction results, image quality assessment metrics such as the mean square error (MSE), PSNR, SSIM, and visibility (V) were employed. The MSE compares the reconstructed image with the original object image and is defined as follows:

MSE = \frac{1}{pq} \sum_{i=1}^{p} \sum_{j=1}^{q} \left[ x(i,j) - X_R(i,j) \right]^2

where x is the original image with p × q resolution, and X_R is the reconstructed image. Suppose x_max = 2^K − 1, where K represents the number of bits used for a pixel; for K = 8, x_max = 255. The PSNR describes the ratio of the peak pixel value to the reconstruction error and is defined as follows:

PSNR = 10 \log_{10} \left( \frac{x_{max}^2}{MSE} \right)

SSIM is a full-reference metric that describes the statistical similarity between two images. The indicator was first proposed by the University of Texas at Austin's Laboratory for Image and Video Engineering [38]. It is given by the following:

SSIM(x, X_R) = \frac{(2\mu_x \mu_{X_R} + c_1)(2\sigma_{xX_R} + c_2)}{(\mu_x^2 + \mu_{X_R}^2 + c_1)(\sigma_x^2 + \sigma_{X_R}^2 + c_2)}

where μ_x and μ_{X_R} are the means of x and X_R, respectively; σ_x and σ_{X_R} are the standard deviations of x and X_R, respectively; σ_{xX_R} is the covariance of x and X_R; and c_1 = (K_1 L)^2 and c_2 = (K_2 L)^2 are constants that stabilize the division when the denominator is weak. Generally, K_1 = 0.01, K_2 = 0.03, and L = 255 (L is the dynamic range of the pixel values). The visibility (V) is defined as follows:

V = \frac{\langle S_{in} \rangle - \langle S_{out} \rangle}{\langle S_{in} \rangle + \langle S_{out} \rangle}

where ⟨S_in⟩ and ⟨S_out⟩ are the average SPI values of the region of interest and the background region, respectively. The PSNR, SSIM, and V were calculated using MATLAB functions.
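All four metrics follow directly from their definitions. The numpy sketch below is a minimal illustration: the SSIM here is computed in a single window over the whole image, exactly as the formula is written, whereas library implementations usually average it over local windows; the function names are our own.

```python
import numpy as np

def mse(x, xr):
    """Mean square error between original x and reconstruction xr."""
    return np.mean((x.astype(float) - xr.astype(float)) ** 2)

def psnr(x, xr, x_max=255.0):
    """Peak signal-to-noise ratio in dB (x_max = 2**K - 1, K = 8)."""
    return 10.0 * np.log10(x_max ** 2 / mse(x, xr))

def ssim_global(x, xr, L=255.0, K1=0.01, K2=0.03):
    """Single-window SSIM, as in the formula above."""
    c1, c2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_x, mu_r = x.mean(), xr.mean()
    var_x, var_r = x.var(), xr.var()
    cov = ((x - mu_x) * (xr - mu_r)).mean()
    return ((2 * mu_x * mu_r + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_r ** 2 + c1) * (var_x + var_r + c2))

def visibility(img, mask):
    """V from the mean of the region of interest vs. the background."""
    s_in = img[mask].mean()
    s_out = img[~mask].mean()
    return (s_in - s_out) / (s_in + s_out)
```

For example, a reconstruction offset from the original by one gray level everywhere has MSE = 1 and therefore PSNR = 10 log10(255²) ≈ 48.13 dB.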

As can be seen from Tables 2 and 3, the PSNR and SSIM are directly proportional to the number of measurements; that is, the PSNR and SSIM are higher when images are restored from more measurements. For example, in Table 2, PSNR = 11.59 dB, SSIM = 0.26, and V = 0.29 for 300 measurements, whereas PSNR = 9.02 dB, SSIM = 0.10, and V = 0.24 for 100 measurements. Compared with the third column, the PSNR, SSIM, and V of the reconstructed images in the first column increased significantly, by 28.49%, 160%, and 20.83%, respectively. Therefore, the greater the number of measurements, the higher the fidelity of the reconstructed images.
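As a sanity check, the quoted relative gains follow directly from the tabulated values for 300 versus 100 measurements:

```python
# Values quoted from Table 2 (clear water, letter "+").
psnr_300, psnr_100 = 11.59, 9.02
ssim_300, ssim_100 = 0.26, 0.10
v_300, v_100 = 0.29, 0.24

def rel_gain(a, b):
    """Percentage increase of a over b."""
    return 100.0 * (a - b) / b

print(rel_gain(psnr_300, psnr_100))  # ≈ 28.49 %
print(rel_gain(ssim_300, ssim_100))  # ≈ 160 %
print(rel_gain(v_300, v_100))        # ≈ 20.83 %
```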

Reconstruction Based on the Compressive Sensing Super-Resolution Convolutional Neural Network (CS-SRCNN)
In this scheme, the data set with 300 measurements from the underwater SPI system was fed into the TV regularization algorithm. The resulting output was enlarged to the desired matrix size by bicubic interpolation, and the interpolated components were set as the input to the following layers. The output of each convolution layer is shown in Figure 10. For the first layer, the convolution kernel size was 9 × 9 (f_1 × f_1), the number of convolution kernels was n_1 = 64, and the output was 64 feature maps; the output was optimized with the ReLU nonlinear activation function. For the second layer, the convolution kernel size was 1 × 1 (f_2 × f_2), the number of convolution kernels was n_2 = 32, and the output was 32 feature maps. Similarly, for the third layer, the convolution kernel size was 5 × 5 (f_3 × f_3), the number of convolution kernels was n_3 = 1, and the output was a single feature map, which constitutes the reconstructed HR components. Furthermore, according to our setup, a comparison was made among the SPI, SRCNN, and our method's images by applying an upscaling factor of 3 to the underwater image; Figure 11 shows the visual effects of the different methods. To investigate the relationship between the image quality of the super-resolution images reconstructed by the CS-SRCNN scheme and that of the SPI image, two indicators, PSNR and SSIM, are commonly used. The analysis of the results focused on the more complex letter "G". The higher the PSNR and SSIM, the closer the pixel values are to the reference.
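The three-layer structure described above (9 × 9 with 64 kernels, 1 × 1 with 32 kernels, 5 × 5 with 1 kernel, ReLU between layers) can be sketched shape-for-shape in numpy. The weights below are untrained random values and nearest-neighbour upscaling stands in for bicubic interpolation, so this illustrates only the data flow and feature-map shapes, not the trained network.

```python
import numpy as np

def conv2d_same(x, kernels):
    """'Same' zero-padded 2D convolution.
    x: (C, H, W) feature maps; kernels: (N, C, k, k) -> (N, H, W)."""
    n, c, k, _ = kernels.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    win = np.lib.stride_tricks.sliding_window_view(xp, (k, k), axis=(1, 2))
    return np.tensordot(kernels, win, axes=([1, 2, 3], [0, 3, 4]))

rng = np.random.default_rng(0)

# Stand-in for the 32 x 32 TV-regularized reconstruction, upscaled
# by a factor of 3 (nearest-neighbour in place of bicubic).
lr = rng.random((32, 32))
hr0 = np.kron(lr, np.ones((3, 3)))  # 96 x 96

# Layer 1: 64 kernels of 9 x 9 (f1 x f1), followed by ReLU.
w1 = rng.normal(0, 0.01, (64, 1, 9, 9))
f1 = np.maximum(conv2d_same(hr0[None], w1), 0)
# Layer 2: 32 kernels of 1 x 1 (f2 x f2), followed by ReLU.
w2 = rng.normal(0, 0.01, (32, 64, 1, 1))
f2 = np.maximum(conv2d_same(f1, w2), 0)
# Layer 3: 1 kernel of 5 x 5 (f3 x f3) -> the HR output map.
w3 = rng.normal(0, 0.01, (1, 32, 5, 5))
out = conv2d_same(f2, w3)
print(f1.shape, f2.shape, out.shape)  # (64, 96, 96) (32, 96, 96) (1, 96, 96)
```

The 1 × 1 middle layer is what makes SRCNN-style networks cheap: it maps each 64-dimensional feature vector to 32 dimensions pixel-by-pixel, before the final 5 × 5 layer aggregates the neighbourhood into the HR output.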
With the idea of controlling the variates, the index values for SRCNN and our method at an upscaling factor of 3 are presented in Tables 4 and 5, keeping other conditions, such as the compression ratio of the reconstructed image and the reconstruction algorithm, constant. Moreover, the respective PSNR and SSIM for each recovered image in Figure 9d,h are shown in Tables 4 and 5. Compared with the third column, the PSNR, SSIM, and V of the reconstructed images in the second column increased significantly, by 29.83%/40.62%/10.00% and 15.97%/44.00%/6.45%, respectively. PSNR, SSIM, and V are thus all significantly improved, consistent with imaging theory, and the results validate our analysis of neural networks in underwater SPI. The above analysis, however, only evaluates image quality in terms of PSNR and SSIM.
To comprehensively evaluate the performance of the algorithm, a comparative analysis was performed using the original image shown in Figure 12a at different compression ratios for TV regularization and our algorithm. Images with compression ratios of 0.70, 0.13, and 0.20 were reconstructed. Figure 13a-c shows the reconstruction results of TV regularization for the different compression ratios. As the figure shows, the reconstructed signal is closer to the original image signal as the compression ratio increases and convergence occurs. Figure 13d-f shows the recovery results of our reconstruction algorithm. As the compression ratio increases, the reconstruction accuracy of the image becomes higher and higher; however, the results show peaks that differ markedly from the original signal distribution curve. The main reason for this is that our reconstruction algorithm performs block processing on the input image, reconstructing each small block independently and then combining the blocks into the whole picture. This causes an image blocking artifact problem, which is not obvious at a high compression ratio but is more evident at a low compression ratio. Filtering and smoothing can be applied at the end of reconstruction to alleviate the blocking artifacts in the resultant HR image.
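As a simple illustration of the filtering and smoothing suggested above, a small mean (box) filter softens an artificial block boundary; a Gaussian filter would behave similarly. The kernel size and toy image are assumptions for demonstration, not the paper's post-processing.

```python
import numpy as np

def box_filter(img, k=3):
    """k x k mean filter with edge-replicated borders."""
    p = k // 2
    xp = np.pad(img, p, mode="edge")
    win = np.lib.stride_tricks.sliding_window_view(xp, (k, k))
    return win.mean(axis=(2, 3))

# Toy image with an abrupt block boundary, mimicking a blocking artifact.
img = np.zeros((8, 8))
img[:, 4:] = 1.0

smoothed = box_filter(img)
# The hard 0 -> 1 jump at the block boundary is softened into a ramp.
print(img[0, 3:5], smoothed[0, 3:5])
```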

Conclusions
In the current study, an improved underwater SPI system and an image super-resolution method are proposed, which are cost-effective for underwater imaging. This system can overcome the limitations of obtaining good-quality images in low-light conditions. The proposed SPI system employed a combination of the CS algorithm and a neural network to enhance image quality while reducing the sampling time and data post-processing. Referring to the network structure of the CS-SRCNN, the employed network algorithm successfully deduced the HR images after being trained with multiple measurement data sets. The resultant 2D underwater image produced by the proposed single-pixel imaging setup and network algorithm is of better quality than that of the conventional SRCNN method. The experimental results show that, while the PSNR and SSIM of the SPI images reach 11.57 dB and 0.26, respectively, in clear water, the PSNR and SSIM of our method's images reach 15.67 dB and 0.45. Therefore, the application of network algorithms in our work improves the accuracy of the overall SPI system. Although our method can improve image quality to some extent, it has limitations, mainly blur in the resultant image, which will be addressed in the future. We will also investigate approaches suitable for turbid water and expand the structure of the CS-SRCNN to further improve image quality.