GPU-Accelerated Monte Carlo Simulation for a Single-Photon Underwater Lidar

: The Monte Carlo (MC) simulation, due to its ability to accurately simulate the backscattered signal of lidar, plays a crucial role in the design, optimization, and interpretation of the backscattered signal in lidar systems. Despite the development of several MC models for lidars, a suitable MC simulation model for underwater single-photon lidar, which is a vital ocean remote sensing technique utilized in underwater scientiﬁc investigations, obstacle avoidance for underwater platforms, and deep-sea environmental exploration, is still lacking. There are two main challenges in underwater lidar simulation. Firstly, the simulation results are signiﬁcantly affected by near-ﬁeld abnormal signals. Secondly, the simulation process is time-consuming due to the requirement of a high number of random processes to obtain reliable results. To address these issues, an algorithm is proposed to minimize the impacts of abnormal simulation signals. Additionally, a graphics processing unit (GPU)- accelerated semi-analytic MC simulation with a compute uniﬁed device architecture is proposed. The performance of the GPU-based program was validated using 10 9 photons and compared to a central processing unit (CPU)-based program. The GPU-based program achieved up to 68 times higher efﬁciency and a maximum relative deviation of less than 1.5%. Subsequently, the MC model was employed to simulate the backscattered signal in inhomogeneous water using the Henyey– Greenstein phase functions. By utilizing the look-up table method, simulations of backscattered signals were achieved using different scattering phase functions. Finally, a comparison between the simulation results and measurements derived from an underwater single-photon lidar demonstrated the reliability and robustness of our GPU-based MC simulation model.


Introduction
Oceanic lidar technology is an important complement to passive ocean color remote sensing [1,2] and has been applied in various fields, including underwater target detection [3], bathymetric measurement [4], phytoplankton [5], bubbles [6], and the inherent optical parameters (IOPs) of water [7].Due to the ability of lidar to penetrate the air-water interface, it has the flexibility to be deployed on various platforms, such as ships [7], unmanned aerial vehicles (UAVs) [8], aircraft and even satellites [4].However, despite the significant importance of the underwater deployment of lidar for underwater scientific investigations, obstacle avoidance for underwater platforms, and deep-sea environmental exploration, the development of underwater lidar is very limited [9].To improve the signalto-noise ratio (SNR) in lidar technology, an effective approach is to utilize high-energy pulses, typically in the range of tens of millijoules, along with large apertures, typically in the range of tens of cm.However, this results in high power consumption and large system volume, making it difficult to achieve the underwater deployment of lidar.
To achieve underwater deployment of lidar, the miniaturization of lidar is required.Although some compact lidar systems have been proposed and used in UAV-based oceanic lidar, such as RIEGL VQ-840-G [10], ASTRALiTe edge™ [11], and Fugro RAMMS [12], these systems are usually limited to detecting water depths below 10 m.Fortunately, the emergence of single-photon detection technology has driven the development of oceanic lidar towards miniaturization and low power consumption.By improving the detection sensitivity to the single-photon level, single-photon lidar can achieve long-range and highprecision parameter detection while being compact and low power [6].As a result, this system has been widely applied in atmospheric remote sensing [13][14][15], underwater target imaging [9], ship-mounted lidar [16], and, more recently, underwater lidar systems [6,17].However, despite the development of single-photon underwater lidar systems, there is a lack of simulation techniques for accurately simulating their backscattered signals.These simulation techniques are crucial for designing, optimizing instrument parameters, and interpreting the backscattered signals in underwater single-photon lidar systems.
There are two main methods for simulating the backscattered signals of lidar.One is the lidar equation [18], and the other is Monte Carlo (MC) simulation [19].Due to the presence of a significant amount of multiple scattering information in the backscattered signals of ocean lidar, the simulation using the lidar equation has certain limitations.Fortunately, MC simulation based on statistical theory can accurately simulate lidar signals.
In oceanic lidar, various MC simulation models have been developed [26], including simulations of the elastic [26] and inelastic [27] scattering of lidar, as well as polarization [28], and the consistency between simulated and measured results has been experimentally verified [20].At the same time, MC simulation techniques have played an important role in establishing the relationship between two parameters of lidar direct inversion, namely volume scattering function at 180 • (β, in sr −1 m −1 ) and the lidar attenuation coefficient (K lidar , in m −1 ), and the inherent optical properties (IOPs) of water, promoting the application of lidar in the ocean.Furthermore, K lidar represents the rate at which the lidar signal is attenuated while propagating through water.For example, MC simulation has been used to establish the relationship between K lidar , the beam attenuation coefficient (c, in m −1 ), and the diffuse attenuation coefficient (K d , in m −1 ) [24,29,30], as well as the relationship between K lidar of spaceborne lidar and IOPs of seawater [25].K d quantifies how quickly light is absorbed or scattered as it travels through water, while c describes the rate at which a laser beam undergoes attenuation as it passes through water.However, MC simulation requires many random processes and is computationally intensive.Therefore, to improve the computational efficiency, a white MC method, along with a semi-analytical model, has been developed [31,32].White MC is a variant of the MC method used in computational simulations, where "white" signifies that the random samples are generated independently and identically distributed.Furthermore, in the semi-analytical model, the calculation of the expected received signal is based on the probability distribution of the scattering phase function.This methodology optimizes photon utilization, resulting in reduced time requirements for achieving an equivalent SNR compared to standard MC simulations conducted under the same conditions.However, the computational efficiency of these methods remains relatively low.Fortunately, since Alerstam et al. first applied graphics processing unit (GPU) acceleration to MC simulation in 2009 [31], GPU technology based on massively threaded parallel computing has greatly improved the efficiency of MC simulation and been applied in various fields.Ren et al. used GPU-accelerated MC simulation to parallelly compute photon propagation in complex tissue media [33].Li et al. proposed an untruncated framework to replace the truncated framework, further improving the computational speed of the MC simulation of polarized photon scattering in anisotropic media [34].In addition, Yang constructed a model of the wind-generated bubble layer in the upper ocean and used GPU parallel MC simulation to simulate reflection and transmission spectra [35,36].
In summary, when using standard MC simulations to obtain high SNR backscattered signals for underwater lidar, the simulation time can be quite extensive.Typically, it ranges from several tens of minutes to even longer periods, depending on the computational performance of the processor and the parameter settings.Therefore, there is an urgent need to develop a MC simulation method based on GPU computing to improve the simulation speed of underwater single-photon lidar.This paper is structured as follows: Firstly, the MC simulation model and the framework for GPU parallel computing will be proposed.Then, the accuracy of GPU-based MC will be validated by comparing it to the CPU-MC results, and the simulation results will be presented for different scattering phase functions and non-uniform water.Finally, the comparison between the GPU-based MC simulation results and the measurement results from a self-developed single-photon underwater lidar system will be carried out to validate the effectiveness of the model.

Principle and Implement of the MC Methods
The MC method is used to simulate the random trajectories of photon propagation in a specific medium.Both the step and direction are determined based on the scattering and absorption properties of the medium.The step refers to the distance or interval traveled during each random sampling iteration, while the direction denotes the path taken by the photon.The MC method ignores the photon's wave properties, and the propagation of laser signal in the water is defined as the combination of many photon trajectories.The attenuation of laser energy is determined based on three factors: the absorption of the medium, the scattering probability, and the probability distribution of the steps.Thus, the MC method is widely utilized to simulate the photon propagation trajectories and monitor the energy changes.In this work, the MC algorithm is applied to simulate the backscattered signal of an underwater lidar.The framework of lidar transmission and reception are presented in Figure 1.The MC simulation could be divided into two categories, namely a standard model and a semi-analytic model.The standard MC model assumes that the motion of photon from emission to the receiver is completely random, and it can, therefore, theoretically reconstruct the real situation.However, due to the low backscattering probability and the limitations of small-aperture receiving, only a small percentage of photons can meet the reception conditions.To obtain high SNR backscattered signals, a large number of photons is required.As for semi-analytic MC simulation, whenever each photon moves to a new position, as long as it falls within the receiver's field of view (FOV), the expected energy in the direction of the receiver due to photon scattering is calculated based on the probability distribution of the scattering phase function [32].
Remote Sens. 2023, 15, x FOR PEER REVIEW 3 of 14 bubble layer in the upper ocean and used GPU parallel MC simulation to simulate reflection and transmission spectra [35,36].In summary, when using standard MC simulations to obtain high SNR backsca ered signals for underwater lidar, the simulation time can be quite extensive.Typically, it ranges from several tens of minutes to even longer periods, depending on the computational performance of the processor and the parameter se ings.Therefore, there is an urgent need to develop a MC simulation method based on GPU computing to improve the simulation speed of underwater single-photon lidar.This paper is structured as follows: Firstly, the MC simulation model and the framework for GPU parallel computing will be proposed.Then, the accuracy of GPU-based MC will be validated by comparing it to the CPU-MC results, and the simulation results will be presented for different sca ering phase functions and non-uniform water.Finally, the comparison between the GPU-based MC simulation results and the measurement results from a self-developed single-photon underwater lidar system will be carried out to validate the effectiveness of the model.

Principle and Implement of the MC Methods
The MC method is used to simulate the random trajectories of photon propagation in a specific medium.Both the step and direction are determined based on the sca ering and absorption properties of the medium.The step refers to the distance or interval traveled during each random sampling iteration, while the direction denotes the path taken by the photon.The MC method ignores the photon's wave properties, and the propagation of laser signal in the water is defined as the combination of many photon trajectories.The a enuation of laser energy is determined based on three factors: the absorption of the medium, the sca ering probability, and the probability distribution of the steps.Thus, the MC method is widely utilized to simulate the photon propagation trajectories and monitor the energy changes.In this work, the MC algorithm is applied to simulate the backscattered signal of an underwater lidar.The framework of lidar transmission and reception are presented in Figure 1.The MC simulation could be divided into two categories, namely a standard model and a semi-analytic model.The standard MC model assumes that the motion of photon from emission to the receiver is completely random, and it can, therefore, theoretically reconstruct the real situation.However, due to the low backscattering probability and the limitations of small-aperture receiving, only a small percentage of photons can meet the reception conditions.To obtain high SNR backsca ered signals, a large number of photons is required.As for semi-analytic MC simulation, whenever each photon moves to a new position, as long as it falls within the receiver's field of view (FOV), the expected energy in the direction of the receiver due to photon sca ering is calculated based on the probability distribution of the sca ering phase function [32].Figure 2 shows the framework of the semi-analytic MC model.The entire program can be broadly divided into three parts: the initialization of photon emission state, the propagation of photons in water, and the reception of photon energy using the telescope.Before each photon is emitted, its position and energy are initialized.Once emitted, the photon moves to a new position in random steps and can be subject to absorption and scattering events during this process.Subsequently, the photon's position is assessed, and if it falls within the FOV and satisfies certain noise reduction threshold conditions, its received energy is calculated and recorded.The photon then continues to move until it is extinguished, and the simulation concludes when all emitted photons have ceased to exist.For a more comprehensive understanding of the simulation process, it is advisable to refer to a recent article [26].The MC model is coded and tested using MATLAB software (MATLAB 2021b).The main simulation input parameters are listed in Table 1.If not otherwise specified, the input parameters of GPU program are the same as those shown in Table 1.
(MATLAB 2021b).The main simulation input parameters are listed in Table 1.If not otherwise specified, the input parameters of GPU program are the same as those shown in Table 1.In the semi-analytic MC model, every photon in the telescope FOV will return its expected value of energy and record its position.The expected value W b is calculated as follows: where K is related to the photon energy and the aperture of the receiver, P f is the probability of the cumulative distribution phase function between the light propagation direction and the axis from the photon to the telescope, z is the photon position on the z axis, H is the distance from the lidar to the water, and d is the position parameter and can be represented as follows: In the semi-analytic MC model, every photon in the telescope FOV will return its expected value of energy and record its position.The expected value Wb is calculated as follows: where K is related to the photon energy and the aperture of the receiver, Pf is the probability of the cumulative distribution phase function between the light propagation direction and the axis from the photon to the telescope, z is the photon position on the z axis, H is the distance from the lidar to the water, and d is the position parameter and can be represented as follows: where x, y, and z represent the position of a photon in Cartesian coordinates.

GPU-Accelerated MC Simulation
The Compute Unified Device Architecture (CUDA) was developed and proposed by NVIDIA in 2006, and it is recognized as a powerful platform and programming model for performing parallel computations.Parallel computation is the key kernel of the device, derived by the threads in GPU.During the GPU acceleration, the computing tasks of the GPU are defined and dispatched under CPU's command, including parallel and sequential computation.
The flowchart using GPU parallel acceleration computation is displayed in Figure 4, where the parameter settings of threads and optical parameters of water are completed in CPU and later transferred to GPU.In the GPU-based MC simulation, including the photon status initialization, the random motion of photons and the photon reception via the telescope are completed in each thread of GPU.Furthermore, the photon idle of MC simulation is independently determined in each thread, and they are collected together to make sure each thread has completed its MC simulation.Finally, the data recorded in the GPU are transferred to the CPU and saved in a certain format.
tion and the receiver telescope to be less than 0.99 of the threshold value of energy exp tation via a statistical analysis of the photon state with the abnormal signals, as show the red rhombic box in Figure 2. Subsequently, most abnormal signals were elimina as shown in Figure 3b.

GPU-Accelerated MC Simulation
The Compute Unified Device Architecture (CUDA) was developed and proposed NVIDIA in 2006, and it is recognized as a powerful platform and programming model performing parallel computations.Parallel computation is the key kernel of the dev derived by the threads in GPU.During the GPU acceleration, the computing tasks of GPU are defined and dispatched under CPU's command, including parallel and sequ tial computation.
The flowchart using GPU parallel acceleration computation is displayed in Figur where the parameter se ings of threads and optical parameters of water are complete CPU and later transferred to GPU.In the GPU-based MC simulation, including the pho status initialization, the random motion of photons and the photon reception via the t scope are completed in each thread of GPU.Furthermore, the photon idle of MC sim tion is independently determined in each thread, and they are collected together to m sure each thread has completed its MC simulation.Finally, the data recorded in the G are transferred to the CPU and saved in a certain format.

Verification of GPU Simulation
The results of the GPU-based MC simulation and the CPU-based simulation are c pared to 10 9 photons, as shown in Figure 5.The verification of parallel computation p gram is tested on a laptop equipped with Intel i7-8750H (using single thread) CPU NVIDIA Geforce ® GTX 1050Ti GPU.The GPU is equipped with 768 CUDA cores runn at a clock frequency of 1.12 GHz, while the CPU has 6 physical cores with a base cl

Verification of GPU Simulation
The results of the GPU-based MC simulation and the CPU-based simulation are compared to 10 9 photons, as shown in Figure 5.The verification of parallel computation program is tested on a laptop equipped with Intel i7-8750H (using single thread) CPU and NVIDIA Geforce ® GTX 1050Ti GPU.The GPU is equipped with 768 CUDA cores running at a clock frequency of 1.12 GHz, while the CPU has 6 physical cores with a base clock frequency of 2.2 GHz and the capability to reach a maximum turbo frequency of 4.1 GHz.In terms of performance evaluation, the CPU achieved a single-core benchmark score of 1059 in Cinebench R23.The relative deviation is calculated as follows: where GPU is result of the GPU-based model, and the CPU is result of the CPU-based model.
where GPU is result of the GPU-based model, and the CPU is result of the CPU-base model.
In Figure 5a, it can be seen that the results of the GPU-based simulation are consiste with those of the CPU-based simulation.Moreover, Figure 5b demonstrates that within depth range of 100 m, the relative deviation is within 1.5%.Due to the inherent random ness of the simulation, even with a sufficient number of photons, errors are unavoidab Additionally, as the depth increases, the simulated lidar backsca ered signals becom weaker, leading to a poorer SNR and, consequently, an increase in deviation with dept

Estimation of Speedup Rate
Both the number of photons and the number of threads have a significant effect o the GPU speedup rate.Thus, the effect of these two parameters is independently eval ated: under 10 8 photons, the GPU and CPU computations are executed 10 times in diffe ent threads, and the runtime of the program is averaged to calculate its speedup rate, shown in Figure 6a.Similarly, under 500 threads, the GPU and CPU computations a executed 10 times with different photon numbers, as shown in Figure 6b.It should noted that due to variations in the time required for each thread to complete a singl photon calculation, the coordination of output and initialization is necessary.After threads have completed the calculation for a single photon, the next photon calculation initiated.As the number of threads increases, the waiting time for other threads to com plete their calculations also increases, resulting in a deviation from the linear relationsh between the number of threads and the calculation speed, as shown in Figure 6a.A simil pa ern can be observed when considering the number of photons used in the calculatio as shown in Figure 6b.The testing involved a specific type of simulation scenario: a h mogeneous water medium with the Henyey-Greenstein (HG) sca ering phase functio The HG phase function has been widely applied in several studies, which only requir In Figure 5a, it can be seen that the results of the GPU-based simulation are consistent with those of the CPU-based simulation.Moreover, Figure 5b demonstrates that within a depth range of 100 m, the relative deviation is within 1.5%.Due to the inherent randomness of the simulation, even with a sufficient number of photons, errors are unavoidable.Additionally, as the depth increases, the simulated lidar backscattered signals become weaker, leading to a poorer SNR and, consequently, an increase in deviation with depth.

Estimation of Speedup Rate
Both the number of photons and the number of threads have a significant effect on the GPU speedup rate.Thus, the effect of these two parameters is independently evaluated: under 10 8 photons, the GPU and CPU computations are executed 10 times in different threads, and the runtime of the program is averaged to calculate its speedup rate, as shown in Figure 6a.Similarly, under 500 threads, the GPU and CPU computations are executed 10 times with different photon numbers, as shown in Figure 6b.It should be noted that due to variations in the time required for each thread to complete a single-photon calculation, the coordination of output and initialization is necessary.After all threads have completed the calculation for a single photon, the next photon calculation is initiated.As the number of threads increases, the waiting time for other threads to complete their calculations also increases, resulting in a deviation from the linear relationship between the number of threads and the calculation speed, as shown in Figure 6a.A similar pattern can be observed when considering the number of photons used in the calculation, as shown in Figure 6b.The testing involved a specific type of simulation scenario: a homogeneous water medium with the Henyey-Greenstein (HG) scattering phase function.The HG phase function has been widely applied in several studies, which only requires an anisotropic constant and is easily programmed and computed.As the number of photons and threads increased, the speedup gradually improved.When utilizing 500 GPU threads to simulate 10 8 photons, the CPU simulation took 2192 s, while the GPU simulation was completed in just 32 s with an average memory consumption of 224 MB, resulting in a speedup of 68.4.
Remote Sens. 2023, 15, x FOR PEER REVIEW 8 an anisotropic constant and is easily programmed and computed.As the number of p tons and threads increased, the speedup gradually improved.When utilizing 500 G threads to simulate 10 8 photons, the CPU simulation took 2192 s, while the GPU sim tion was completed in just 32 s with an average memory consumption of 224 MB, resu in a speedup of 68.4.

Simulation in Inhomogeneous Sea Water
To validate the GPU-based MC simulation, the model is developed to study pr gation of laser in inhomogeneous seawater.One of the hierarchical Gaussian mix models for inhomogeneous water, proposed by Kameda et al., as shown in Figure 7a been selected for our MC simulation [37].The corresponding absorption coefficient sca ering coefficient can been calculated as follows [38]: , where a (in m −1 ) is the total absorption coefficient of water, aw (in m −1 ) is the absorp coefficient of pure seawater, A is the normalized spectral absorption values of phytopl ton pigments, C (in mg/m 3 ) is the chlorophyll concentration, ay (in m −1 ) is the absorp coefficient of yellow substance, λ is the wavelength of light, and bp (in m −1 ) is the sca e coefficient of particles.The profiles of the chlorophyll concentration, the correspond absorption, and sca ering coefficient are shown in Figure 7.
The HG phase function, which only requires an anisotropic constant and is ea programmed and computed, has been adopted to simulate the backsca ered signal i homogeneous sea water [34].During the simulation process, the anisotropy factor v

Simulation in Inhomogeneous Sea Water
To validate the GPU-based MC simulation, the model is developed to study propagation of laser in inhomogeneous seawater.One of the hierarchical Gaussian mixture models for inhomogeneous water, proposed by Kameda et al., as shown in Figure 7a, has been selected for our MC simulation [37].The corresponding absorption coefficient and scattering coefficient can been calculated as follows [38]: a y (440) = 0.2 a w (440) + 0.06A(440)C 0.65 , ( 6) where a (in m −1 ) is the total absorption coefficient of water, a w (in m −1 ) is the absorption coefficient of pure seawater, A is the normalized spectral absorption values of phytoplankton pigments, C (in mg/m 3 ) is the chlorophyll concentration, a y (in m −1 ) is the absorption coefficient of yellow substance, λ is the wavelength of light, and b p (in m −1 ) is the scattering coefficient of particles.The profiles of the chlorophyll concentration, the corresponding absorption, and scattering coefficient are shown in Figure 7.The HG phase function, which only requires an anisotropic constant and is easily programmed and computed, has been adopted to simulate the backscattered signal in inhomogeneous sea water [34].During the simulation process, the anisotropy factor value was set to 0.9185, and the backscattering ratio (the ratio of backscattering coefficient to scattering coefficient) was set to 0.0183, which is consistent with the backscattering ratio of the Petzold phase function [39].Figure 7c presents the MC simulation result in stratified water layers using 10 9 photons.The vertical distribution of the absorption and the scattering coefficients are shown in Figure 7a,b, respectively.Since the c is the sum of the absorption and scattering coefficients, the vertical distribution of c can be obtained from Figure 7a,b.
was set to 0.9185, and the backsca ering ratio (the ratio of backsca ering coefficien sca ering coefficient) was set to 0.0183, which is consistent with the backsca ering ra of the Pe old phase function [39].Figure 7c presents the MC simulation result in stratifi water layers using 10 9 photons.The vertical distribution of the absorption and the sca ing coefficients are shown in Figure 7a,b, respectively.Since the c is the sum of the abso tion and sca ering coefficients, the vertical distribution of c can be obtained from Fig 7a,b.

Simulation with Different Sca ering Phase Functions
The HG phase function has been widely applied in several studies, as it only requ an anisotropic constant and is easily programmed and computed [40].However, the phase function shows a relatively poor performance with small and large angles.The fore, in the subsequent simulations, both the Fourier-Forand phase function and Pe old phase function will be utilized.The Fourier-Forand phase function, based on F rier series expansion, accurately represents sca ering phenomena at both small and la angles [39,41].On the other hand, the Pe old phase function is derived from actual me urement data, providing a highly accurate description of the sca ering distribution in w ter [40].We conducted the GPU parallel computation MC simulation using selected ph functions based on the LUT methods from previous researches [42].Four phase functio namely Founier-Forand, Pe old, small particle phase function, and large particle ph function, have been chosen to test the GPU acceleration performance of the MC simu tion.Figure 8 presents the corresponding MC simulation results with four phase fu tions.The simulation result is good at presenting the characteristics of the probability d tribution of those sca ering phase functions at the forward angles.Among them, the M simulation of the small particle phase function, which has the minimum forward sca ing, shows a much smaller signal a enuation in the water body of the first 10 m than other three sca ering phase functions.With the other sca ering phase functions,

Simulation with Different Scattering Phase Functions
The HG phase function has been widely applied in several studies, as it only requires an anisotropic constant and is easily programmed and computed [40].However, the HG phase function shows a relatively poor performance with small and large angles.Therefore, in the subsequent simulations, both the Fourier-Forand phase function and the Petzold phase function will be utilized.The Fourier-Forand phase function, based on Fourier series expansion, accurately represents scattering phenomena at both small and large angles [39,41].On the other hand, the Petzold phase function is derived from actual measurement data, providing a highly accurate description of the scattering distribution in water [40].We conducted the GPU parallel computation MC simulation using selected phase functions based on the LUT methods from previous researches [42].Four phase functions, namely Founier-Forand, Petzold, small particle phase function, and large particle phase function, have been chosen to test the GPU acceleration performance of the MC simulation.Figure 8 presents the corresponding MC simulation results with four phase functions.The simulation result is good at presenting the characteristics of the probability distribution of those scattering phase functions at the forward angles.Among them, the MC simulation of the small particle phase function, which has the minimum forward scattering, shows a much smaller signal attenuation in the water body of the first 10 m than the other three scattering phase functions.With the other scattering phase functions, the stronger the forward scattering of the scattering phase function, the greater the signal attenuation of the MC simulation results.
emote Sens. 2023, 15, x FOR PEER REVIEW stronger the forward sca ering of the sca ering phase function, the greater the tenuation of the MC simulation results.

Validation
To verify the simulation ability of MC model, a field experiment was condu the single-photon underwater lidar in a swimming pool with dimensions of 5 (length×width×depth).The single-photon underwater lidar is composed of a gr laser, an optical receiver, a single-photon avalanche diode (SPAD), a time-to-d verter (TDC), and a function generator (FG).The 532 nm laser emits a pulse wi duration of 501 ps and a repetition rate of 1 MHz.The laser system employs oscillator power amplifier (MOPA) structure, which incorporates a single-mo seed laser operating at 1064 nm.The seed laser undergoes two stages of amplific through a single-mode y erbium-doped fiber amplifier (SM-YDFA) and then high-power y erbium-doped fiber amplifier (HP-YDFA).The amplified laser ou passes through a lithium borate crystal (LBO), which converts the 1064 nm fun frequency light to 532 nm green light using second harmonic generation.Finally W of average power is achieved.The telescope diameter is ~20 mm, and the F mrad.
To achieve a miniaturized and robust structure for the single-photon und

Validation
To verify the simulation ability of MC model, a field experiment was conducted using the single-photon underwater lidar in a swimming pool with dimensions of 50 × 25 × 2 m 3 (length × width × depth).The single-photon underwater lidar is composed of a green pulse laser, an optical receiver, a single-photon avalanche diode (SPAD), a time-to-digital converter (TDC), and a function generator (FG).The 532 nm laser emits a pulse with a pulse duration of 501 ps and a repetition rate of 1 MHz.The laser system employs a master oscillator power amplifier (MOPA) structure, which incorporates a single-mode pulsed seed laser operating at 1064 nm.The seed laser undergoes two stages of amplification, first through a single-mode ytterbium-doped fiber amplifier (SM-YDFA) and then through a high-power ytterbium-doped fiber amplifier (HP-YDFA).The amplified laser output then passes through a lithium borate crystal (LBO), which converts the 1064 nm fundamental frequency light to 532 nm green light using second harmonic generation.Finally, up to 1.0 W of average power is achieved.The telescope diameter is ~20 mm, and the FOV is ~10 mrad.
To achieve a miniaturized and robust structure for the single-photon underwater lidar, a fiber-connected configuration was adopted.The backscattered signal from the water is coupled in a multimode fiber (MMF) via a collimator.This narrow receiving FOV in our lidar system offered significant backscattering noise suppression.The distance between the transmitted laser and the received collimator is ~15 mm.A 1-inch filter is placed in front of the collimator to filter out background radiation noise, with a central wavelength of 532 nm and a bandwidth of 0.08 nm.The backscattered elastic signal is then detected via a corresponding SPAD.The output of the SPAD is then connected to a time-to-digital converter (TDC).After the histogram statistic, the TDC sends the collected photon signals via the serial port to a personal computer (PC) for display.The photo of the single-photon underwater lidar is shown in Figure 9b.The lidar chamber is made of titanium alloy with high-pressure resistance characteristics, meaning that the lidar can operate underwater at depths of up to 1 km.The optical window of the lidar is made of a sapphire lens, which can maintain >96% transmission under high pressure.The cylindrical lidar has a diameter of 20 cm and a length of 40 cm.The average power consumption of the lidar is ~80 W, and it weighs 15 kg.It should be noted that the direction of the lidar is parallel to the axial direction of the lidar chamber.The direction of the lidar can be adjusted by modifying the arrangement of the lidar chamber.The key parameters of the single-photon underwater lidar are present in Table 2.
emote Sens. 2023, 15, x FOR PEER REVIEW converter (TDC).After the histogram statistic, the TDC sends the collecte via the serial port to a personal computer (PC) for display.The photo of t underwater lidar is shown in Figure 9b.The lidar chamber is made of tit high-pressure resistance characteristics, meaning that the lidar can opera depths of up to 1 km.The optical window of the lidar is made of a sapp can maintain >96% transmission under high pressure.The cylindrical lid of 20 cm and a length of 40 cm.The average power consumption of the li it weighs 15 kg.It should be noted that the direction of the lidar is pa direction of the lidar chamber.The direction of the lidar can be adjusted arrangement of the lidar chamber.The key parameters of the single-ph lidar are present in Table 2.  To extend the detection distance, a 300 × 300 mm 2 reflective mirror with high reflection (≥98%) at 532 nm was placed on the wall of the pool to reflect the laser.The simulation signals are obtained via the MC simulation model derived using the GPU parallel computa-tions with a photon number of 10 9 .The raw profile with 2 s temporal resolution and 0.5 ns sample ratio is plotted in Figure 10a.Using an absorption and attenuation meter AC-9 (WET Labs, Inc., Philomath, OR, USA), the absorption coefficient (a, in m −1 ) and the beam attenuation coefficient (c, in m −1 ) at 532 nm were simultaneously measured.The Petzold was adopted as the scattering phase function in the simulation.The simulation signals were compared to the measurement signals after being normalized, as illustrated in Figure 10a.It is found that the simulation signals are consistent with the measurement signals, and the measurement signals show a peak value in 55 m, which is due to the reflection of the total reflector placed on the wall of the swimming pool.Figure 10b

Conclusions
In this study, a semi-analytic MC simulation algorithm based on GPU parallel computation was proposed to simulate lidar signals.The implementation of the semi-analytic simulation program on a laptop demonstrated a speedup ratio of up to 68.The LUT method was utilized in the simulation model with different sca ering phase functions.An algorithm was introduced to effectively minimize abnormal simulation signals.A comparison between the simulated and experimental signals showed a relatively high level of  Despite achieving a nearly 68-fold speed improvement, there remains potential for the further optimization and acceleration of GPU computation and programming.Firstly, modifying the program architecture by adopting a non-stop scheme instead of the conventional cut-off approach could better utilize CUDA threads [43].Additionally, optimizing the codes to minimize command divisions would significantly reduce computation time.Moreover, optimizing the data storage structure can reduce data transfer time and volume during simulation.In this study, only the HG scattering phase function was used for simulation in inhomogeneous water.In future studies, other scattering phase functions, including the double HG phase function, Petzold phase function, large particle phase function, and small particle phase function will also be adopted.In addition, the MC simulation simulated not only the elastic unpolarized scattering signals of water bodies but also the inelastic or polarized scattering signals.In future research, this model will be applied not only in pristine water but also in coastal and oceanic environments to further validate its performance.Efforts will be made to further enhance the computation speed of the GPU-based semi-analytic MC simulation and use this model to simulate polarization and fluorescence signals.

Figure 2 .
Figure 2. Flow chart of the MC model.

Figure 2 .
Figure 2. Flow chart of the MC model.

Table 1 .
Main parameter settings of MC simulation.
) where x, y, and z represent the position of a photon in Cartesian coordinates.The simulated signals are shown in Figure 3.During the simulation process, the overall profile of the lidar backscattered signals and the vertical distribution of the single scattering, as well as multiple scattering signal components, including second-, third-, and higher-order scattering signals, were statistically analyzed.Based on the figure, it could be observed that in the near-field backscattered signals, the proportion of multiple scattering signals was relatively low compared to the total signal.However, as the depth increased, the proportion of multiple scattering signals gradually increased.Moreover, abnormal high values, characterized by random peak values, were obvious in the simulated signals, particularly in the multiple scattering signals.In the underwater lidar setup, the small value of H, representing the distance from the lidar to the water surface (approximately 5 cm in this study), increased the likelihood of encountering a small denominator (d), as shown in Equation (1).Additionally, the scattering phase function of water particles exhibited strong forward scattering, leading to the potential for large values in the term P f in Equation (1).Both of these factors contributed to the occurrence of anomalous high values when calculating the expected value W b using Equation (1).To minimize this abnormal signal, we set the cosine value of the angle between the photon propagation direction and the receiver telescope to be less than 0.99 of the threshold value of energy expectation via a statistical analysis of the photon state with the abnormal signals, as shown in the red rhombic box in Figure 2. Subsequently, most abnormal signals were eliminated, as shown in Figure 3b.Remote Sens. 2023, 15, x FOR PEER REVIEW 5 of 14

Figure 3 .
Figure 3.The simulation results (a) before and (b) after removing the abnormal signals with 109 photons.The number n represents the order of sca ering, and the "total" represents the total backsca ered signal.The simulated signals are shown in Figure 3.During the simulation process, the overall profile of the lidar backsca ered signals and the vertical distribution of the single scattering, as well as multiple sca ering signal components, including second-, third-, and higher-order sca ering signals, were statistically analyzed.Based on the figure, it could be observed that in the near-field backsca ered signals, the proportion of multiple scattering signals was relatively low compared to the total signal.However, as the depth in-

Figure 3 .
Figure 3.The simulation results (a) before and (b) after removing the abnormal signals with 109 photons.The number n represents the order of scattering, and the "total" represents the total backscattered signal.

Figure 4 .
Figure 4. Framework of the GPU program.

Figure 4 .
Figure 4. Framework of the GPU program.

Figure 6 .
Figure 6.(a) Acceleration effect of GPU under a different number of threads.(b) Acceleration e of GPU under a different number of photons.

Figure 6 .
Figure 6.(a) Acceleration effect of GPU under a different number of threads.(b) Acceleration effect of GPU under a different number of photons.

Figure 7 .
Figure 7. (a) The chloroplast Gaussian stratification.(b) The corresponding absorption coeffic stratification and sca ering coefficient stratification.(c) The simulation results obtained using phase function.The value of n the order of sca ering, and the "total" represents the multiple sca ering signal.

Figure 7 .
Figure 7. (a) The chloroplast Gaussian stratification.(b) The corresponding absorption coefficient stratification and scattering coefficient stratification.(c) The simulation results obtained using HG phase function.The value of n represents the order of scattering, and the "total" represents the full multiple scattering signal.

Figure 8 .
Figure 8.(a) Simulation results obtained using the Peztold phase function (a) and the Forand (FF) phase function (b).The large particle phase function (c) and the small par function (d).The n represents the order of sca ering, and the "total" represents the fu sca ering signal.

Figure 8 .
Figure 8.(a) Simulation results obtained using the Peztold phase function (a) and the Fournier-Forand (FF) phase function (b).The large particle phase function (c) and the small particle phase function (d).The n represents the order of scattering, and the "total" represents the full multiple scattering signal.
shows the regression diagram of the simulated signals and experimental signals.The simulation signals show an excellent consistency with experimental signals with a high R-square of 0.98, as shown in Figure 10b.The results demonstrated that the GPU-accelerated MC simulation program can efficiently simulate the experimental signals in clean water.

Table 2 .
Key parameters of single-photon underwater lidar.Parameter Value Pulse duration of the laser 501 ps Pulse energy of the laser 1 µJ Pulse repetition rate of the laser 1 MHz Diameter of the telescope 20 mm Detection efficiency at 532 nm 52% Dark count 100 cps Size of the lidar Φ20 cm × 40 cm Power consumption of the lidar ≈80 W PEER REVIEW 12 of 14 ns sample ratio is plo ed in Figure 10a.Using an absorption and a enuation meter AC-9 (WET Labs, Inc., Philomath, OR, USA), the absorption coefficient (a, in m −1 ) and the beam a enuation coefficient (c, in m −1 ) at 532 nm were simultaneously measured.The Pe old was adopted as the sca ering phase function in the simulation.The simulation signals were compared to the measurement signals after being normalized, as illustrated in Figure 10a.It is found that the simulation signals are consistent with the measurement signals, and the measurement signals show a peak value in 55 m, which is due to the reflection of the total reflector placed on the wall of the swimming pool.Figure 10b shows the regression diagram of the simulated signals and experimental signals.The simulation signals show an excellent consistency with experimental signals with a high R-square of 0.98, as shown in Figure 10b.The results demonstrated that the GPU-accelerated MC simulation program can efficiently simulate the experimental signals in clean water.

Figure 10 .
Figure 10.(a) Comparison between experimental and simulated data.(b) Linear regression of experimental and simulated data.

Figure 10 .
Figure 10.(a) Comparison between experimental and simulated data.(b) Linear regression of experimental and simulated data.
In this study, a semi-analytic MC simulation algorithm based on GPU parallel computation was proposed to simulate lidar signals.The implementation of the semi-analytic simulation program on a laptop demonstrated a speedup ratio of up to 68.The LUT method was utilized in the simulation model with different scattering phase functions.An algorithm was introduced to effectively minimize abnormal simulation signals.A comparison between the simulated and experimental signals showed a relatively high level of agreement.

Table 1 .
Main parameter se ings of MC simulation.