Efficient and Noise Robust Photon-Counting Imaging with First Signal Photon Unit Method

Efficient photon-counting imaging in low signal photon level is challenging, especially when noise is intensive. In this paper, we report a first signal photon unit (FSPU) method to rapidly reconstruct depth image from sparse signal photon counts with strong noise robustness. The method consists of acquisition strategy and reconstruction strategy. Different statistic properties of signal and noise are exploited to quickly distinguish signal unit during acquisition. Three steps, including maximum likelihood estimation (MLE), anomaly censorship and total variation (TV) regularization, are implemented to recover high quality images. Simulations demonstrate that the method performs much better than traditional photon-counting methods such as peak and cross-correlation methods, and it also has better performance than the state-of-the-art unmixing method. In addition, it could reconstruct much clearer images than the first photon imaging (FPI) method when noise is severe. An experiment with our photon-counting LIDAR system was conducted, which indicates that our method has advantages in sparse photon-counting imaging application, especially when signal to noise ratio (SNR) is low. Without the knowledge of noise distribution, our method reconstructed the clearest depth image which has the least mean square error (MSE) as 0.011, even when SNR is as low as −10.85 dB.


Introduction
Photon-counting imaging with time-correlated single-photon counting technique (TCSPC) has obtained much research interest due to its high time resolution and sensitivity [1][2][3][4][5][6]. Among other advantages, photon-counting imaging allows to operate in extremely low light level environment [7,8], which is of significant help for improving image quality when signal intensity is restricted, such as in remote imaging or low-reflectivity target imaging. For photon-counting LIDAR applied in such weak echo scenario, it generally requires a long acquisition time to not only suppress false alarms caused by noise counts but also to obtain sufficient signal photons to reconstruct clear images. However, such requirement would obviously limit imaging efficiency.
Methods to promote imaging efficiency (less dwell time) at low light level have been previously investigated by some researchers, and their methods can generally be divided into two categories. The first category is to set acquisition time for each pixel fixed and to use a variety of prior information such as correlation of adjacent pixels or the number of depth clusters to recover depth image [9][10][11][12][13]. Methods of this category are essentially not optimal for high efficiency imaging because signal photon levels are actually different from pixel to pixel, and the fixed acquisition time is either redundant for bright (high reflectivity) pixels or inadequate for dark (low reflectivity) pixels. The other category is to scan each pixel with a photon level-dependent time. A well-known method of this category is first photon imaging (FPI) [14][15][16] which utilizes only the first detected photon to reconstruct images. However, according to its supplementary material [15], in the conducted experiment, average signal photon and noise photon per pulse were 0.09 and 0.1, respectively. This signal to noise ratio (SNR) is almost 0 dB and is too high to be achieved for many applications such as daylight imaging or underwater imaging where solar noise or backscattered noise is high. As would be demonstrated later, FPI is not suited to low SNR environments where most of its first photon counts would be due to uninformative noise.
For photon-counting LIDARs used in extremely low signal photon environments with significant noise, an efficient imaging method with noise robustness is needed. In this article, we reported a first signal photon unit (FSPU) method which utilizes the different statistics of noise counts and signal counts to rapidly distinguish signal photon clusters. In our method, more pulses are transmitted to illuminate dark pixels, while less pulses are transmitted to bright pixels to maintain both efficiency and image quality. The method is able to reduce acquisition time sharply compared to a traditional photon-counting algorithm such as the peak and cross-correlation method, and it also performed better than FPI, especially in low SNR scenarios. The method does not require any of the background noise calibrations that need to be conducted in the state-of-the-art photon-efficient methods by D. Shin et al. [9] and J. Rapp et al. [12]. Both simulation and experiment were conducted to verify the feasibility of the method.

Methods
The overall photon-counting process is the Poisson point process [17], and it could be viewed as the sum of two independent Poisson processes, that is, signal inhomogeneous Poisson process and noise homogenous Poisson process. The rate of signal Poisson process is actually concentrated within pulse width, whereas the rate of noise Poisson process is flat. Considering the low light level environment, pile-up effect [18] or dead time effect could be ignored, and, accordingly, the time tags of signal counts and noise counts would obey Gaussian distribution (we suppose laser pulse waveform is Gaussian) and uniform distribution, respectively, as Figure 1 indicates. scan each pixel with a photon level-dependent time. A well-known method of this category is first photon imaging (FPI) [14][15][16] which utilizes only the first detected photon to reconstruct images. However, according to its supplementary material [15], in the conducted experiment, average signal photon and noise photon per pulse were 0.09 and 0.1, respectively. This signal to noise ratio (SNR) is almost 0 dB and is too high to be achieved for many applications such as daylight imaging or underwater imaging where solar noise or backscattered noise is high. As would be demonstrated later, FPI is not suited to low SNR environments where most of its first photon counts would be due to uninformative noise. For photon-counting LIDARs used in extremely low signal photon environments with significant noise, an efficient imaging method with noise robustness is needed. In this article, we reported a first signal photon unit (FSPU) method which utilizes the different statistics of noise counts and signal counts to rapidly distinguish signal photon clusters. In our method, more pulses are transmitted to illuminate dark pixels, while less pulses are transmitted to bright pixels to maintain both efficiency and image quality. The method is able to reduce acquisition time sharply compared to a traditional photon-counting algorithm such as the peak and cross-correlation method, and it also performed better than FPI, especially in low SNR scenarios. The method does not require any of the background noise calibrations that need to be conducted in the state-of-the-art photon-efficient methods by D. Shin et al. [9] and J. Rapp et al. [12]. Both simulation and experiment were conducted to verify the feasibility of the method.

Methods
The overall photon-counting process is the Poisson point process [17], and it could be viewed as the sum of two independent Poisson processes, that is, signal inhomogeneous Poisson process and noise homogenous Poisson process. The rate of signal Poisson process is actually concentrated within pulse width, whereas the rate of noise Poisson process is flat. Considering the low light level environment, pile-up effect [18] or dead time effect could be ignored, and, accordingly, the time tags of signal counts and noise counts would obey Gaussian distribution (we suppose laser pulse waveform is Gaussian) and uniform distribution, respectively, as Figure 1 indicates. According to Gaussian theory, the variance of signal time tags would be equal to where denotes root mean square (RMS) pulse width. Variance of noise time tags, from uniform distribution theory, would be 1 /12 , where is the pulse repetition period. For ≫ , the most evident difference is that the variance in signal counts is rather tiny, while variance in noise counts is large, which means signal counts tend to cluster together and noise photons would occupy the whole temporal duration. Exploiting this characteristic, we developed our FSPU method with two aspects; acquisition strategy and reconstruction strategy. The essence of acquisition strategy is straightforward and it can be briefly interpreted that we will not transfer to the next pixel until the first signal photon unit of current pixel has been distinguished. More specifically, we define two acquisition parameters, unit size and unit range . For each laser pulse, Signal and noise Poisson model, λ s (t), λ n (t) and λ(t) are signal rate, noise rate and total rate respectively, Z is depth and c is velocity of light.
According to Gaussian theory, the variance of signal time tags would be equal to T 2 p where T p denotes root mean square (RMS) pulse width. Variance of noise time tags, from uniform distribution theory, would be 1/12 T 2 r , where T r is the pulse repetition period. For T r T p , the most evident difference is that the variance in signal counts is rather tiny, while variance in noise counts is large, which means signal counts tend to cluster together and noise photons would occupy the whole temporal duration.
Exploiting this characteristic, we developed our FSPU method with two aspects; acquisition strategy and reconstruction strategy. The essence of acquisition strategy is straightforward and it can be briefly interpreted that we will not transfer to the next pixel until the first signal photon unit of current pixel has been distinguished. More specifically, we define two acquisition parameters, unit size µ and unit range . For each laser pulse, time tags of photon counts, if any, will be registered to system, and when the registered photon counts are found to be clustered, that is, µ photon counts within range in our method, we believe these µ counts are all due to signal photons and they are defined as FSPU, and then we transfer to next pixel. Figure 2 demonstrates our method. tonics 2021, 8,229 time tags of photon counts, if any, will be registered to system, and when the registered photon counts are found to be clustered, that is, photon counts within range in our method, we believe these counts are all due to signal photons and they are defined as FSPU, and then we transfer to next pixel. Figure 2 demonstrates our method. Reconstruction strategy consists of three steps. The first step is to estimate range time through maximum likelihood estimation (MLE). Since we believe distinguished FSPU is caused by signal photons, the MLE of expectation of Gaussian is actually the average of FSPU as Equation (1) shows.
Subscript ( , ) represents the -th row -th column pixel and , means the time tag of -th count of FSPU. The second step is anomaly censorship. Although noise counts generally would not cluster, there is a chance that noise counts would meet the -criteria earlier than signal counts, and thusly they are falsely deemed as FSPU. In this step, for pixel ( , ), we will calculate the median value , of its 3 × 3 neighborhood and compare the pixel value , with this median. If the pixel value is 2 away from median, it is believed to be noisy. The underlying reason is that 95% signal counts would be within ±2 around expectation for Gaussian distribution. The noisy pixels would be replaced with their medians as Equation (2) The third step is total variation (TV) regularization [19,20]. We add a TV penalty term to our estimated time image, as: where is an ( , ) size matrix whose -th row -th column element is , and ( , ) is the size of target image. TV is defined as: Equation (3) is actually a minimum optimization problem where is a coefficient to make a trade-off between MLE data and correlations of adjacent pixels. In experiment, we used = 0.5 and its optimal value is the focus of our future research. After acquisition of reconstructed time image , range image is obtained by = /2. Reconstruction strategy consists of three steps. The first step is to estimate range time through maximum likelihood estimation (MLE). Since we believe distinguished FSPU is caused by signal photons, the MLE of expectation of Gaussian is actually the average of FSPU as Equation (1) shows.
Subscript (i, j) represents the i-th row j-th column pixel and t k i,j means the time tag of k-th count of FSPU. The second step is anomaly censorship. Although noise counts generally would not cluster, there is a chance that noise counts would meet the µ-criteria earlier than signal counts, and thusly they are falsely deemed as FSPU. In this step, for pixel (i, j), we will calculate the median value T med i,j of its 3 × 3 neighborhood and compare the pixel value T i,j with this median. If the pixel value is 2T p away from median, it is believed to be noisy. The underlying reason is that 95% signal counts would be within ±2T p around expectation for Gaussian distribution. The noisy pixels would be replaced with their medians as Equation (2) demonstrates.
The third step is total variation (TV) regularization [19,20]. We add a TV penalty term to our estimated time image, as: where T is an (M, N) size matrix whose i-th row j-th column element is T i,j and (M, N) is the size of target image. TV is defined as: Equation (3) is actually a minimum optimization problem where α is a coefficient to make a trade-off between MLE data and correlations of adjacent pixels. In experiment, we used a = 0.5 and its optimal value is the focus of our future research. After acquisition of reconstructed time imageT, range imageẐ is obtained byẐ = cT/2.

Simulations
To quantify our method performance, we first used the Middlebury dataset [21] to conduct simulations. Figure 3a describes the distribution of signal counts numbers per pulse I i,j and Figure 3b is the truth range image Z i,j . Signal counts were generated from an inhomogeneous Poisson process with as its rate where RMS width T p was set as 0.6 ns to maintain consistency with the laser used in our later experiment. Noise counts were generated from a homogeneous Poisson process with noise intensity Φ as its rate. Image size (M, N) and T r were set to be (139, 111) and 200 ns, respectively, in simulations. Φ were chosen as 0.01 MHz, 0.1 MHz, 0.25 MHz, 0.5 MHz, 0.75 MHz and 1 MHz, consecutively, to generate several different noise levels. The mean SNR is evaluated as:

Simulations
To quantify our method performance, we first used the Middlebury dataset [21] to conduct simulations. ) as its rate where RMS width was set as 0.6 ns to maintain consistency with the laser used in our later experiment. Noise counts were generated from a homogeneous Poisson process with noise intensity Φ as its rate. Image size ( , ) and were set to be (139,111) and 200 ns, respectively, in simulations. Φ were chosen as 0.01 MHz, 0.1 MHz, 0.25 MHz, 0.5 MHz, 0.75 MHz and 1 MHz, consecutively, to generate several different noise levels. The mean SNR is evaluated as: In the simulation, we chose and to be 5 and 1.2 ns, respectively, which meant our FSPU was defined as five counts within 1.2 ns temporal width. The reconstruction processes with Φ = 0.25 MHz are demonstrated in Figure 4.  We also reconstructed depth images using other methods such as peak, cross-correlation, unmixing [12] and FPI [13] to compare their performances with our FSPU method. Mean square error (MSE), defined as: is utilized as a measure of reconstruction performance where , is true depth of pixel ( , ). Efficiency is measured through the number of emitted pulses per pixel (ppp). Less pulses means less acquisition time and, thus, higher efficiency. Although numbers of pulses for each pixel vary for our FSPU and FPI, we can calculate the average number of pulses per pixel to evaluate efficiency, that is, total emitted pulses number divided by pixel size. In the simulation, we chose µ and to be 5 and 1.2 ns, respectively, which meant our FSPU was defined as five counts within 1.2 ns temporal width. The reconstruction processes with Φ = 0.25 MHz are demonstrated in Figure 4.

Simulations
To quantify our method performance, we first used the Middlebury dataset [21] to conduct simulations. ) as its rate where RMS width was set as 0.6 ns to maintain consistency with the laser used in our later experiment. Noise counts were generated from a homogeneous Poisson process with noise intensity Φ as its rate. Image size ( , ) and were set to be (139,111) and 200 ns, respectively, in simulations. Φ were chosen as 0.01 MHz, 0.1 MHz, 0.25 MHz, 0.5 MHz, 0.75 MHz and 1 MHz, consecutively, to generate several different noise levels. The mean SNR is evaluated as: In the simulation, we chose and to be 5 and 1.2 ns, respectively, which meant our FSPU was defined as five counts within 1.2 ns temporal width. The reconstruction processes with Φ = 0.25 MHz are demonstrated in Figure 4.  We also reconstructed depth images using other methods such as peak, cross-correlation, unmixing [12] and FPI [13] to compare their performances with our FSPU method. Mean square error (MSE), defined as: is utilized as a measure of reconstruction performance where , is true depth of pixel ( , ). Efficiency is measured through the number of emitted pulses per pixel (ppp). Less pulses means less acquisition time and, thus, higher efficiency. Although numbers of pulses for each pixel vary for our FSPU and FPI, we can calculate the average number of pulses per pixel to evaluate efficiency, that is, total emitted pulses number divided by pixel size. We also reconstructed depth images using other methods such as peak, cross-correlation, unmixing [12] and FPI [13] to compare their performances with our FSPU method. Mean square error (MSE), defined as: is utilized as a measure of reconstruction performance where Z i,j is true depth of pixel (i, j). Efficiency is measured through the number of emitted pulses per pixel (ppp). Less pulses means less acquisition time and, thus, higher efficiency. Although numbers of pulses for each pixel vary for our FSPU and FPI, we can calculate the average number of pulses per pixel to evaluate efficiency, that is, total emitted pulses number divided by pixel size. First, we ran our FSPU method, and the number of emitted pulses per pixel was obtained through above-mentioned calculation. To compare performances of these methods with a common baseline, we restricted the number of emitted pulses for each pixel of peak, cross-correlation and unmixing to be identical to our FSPU, which means we restricted the efficiency of these three methods to be the same as our method. The result is displayed in Figure 5. First, we ran our FSPU method, and the number of emitted pulses per pixel was obtained through above-mentioned calculation. To compare performances of these methods with a common baseline, we restricted the number of emitted pulses for each pixel of peak, cross-correlation and unmixing to be identical to our FSPU, which means we restricted the efficiency of these three methods to be the same as our method. The result is displayed in Figure 5. Although FPI is the most efficient method (FPI only needs 1 cpp to reconstruct images and therefore the number of emitted pulses is the lowest), it has poor performance, especially when SNR is low. For example, when SNR is lower than 0 dB, MSEs of FPI are large and the reconstructed images are almost nonsense because most of its first photon counts are noisy and they convey no range information. Emitted pulses number for the first four methods seems much larger than FPI; however, as signal is weak, the signal detections are essentially as few as several counts. The peak method is a traditional photon-counting imaging method which chooses the position of histogram peak as rang time. The cross-correlation method is to make a cross correlation between system instrument response function (IRF) and histogram, and then it finds the position of maximum value.
From Figure 5, we could see that MSEs of these three fixed acquisition time methods are larger than our method, which indicates the reconstruction performance of our method is much better. The result is intuitive because the pulses number is identical for Although FPI is the most efficient method (FPI only needs 1 cpp to reconstruct images and therefore the number of emitted pulses is the lowest), it has poor performance, especially when SNR is low. For example, when SNR is lower than 0 dB, MSEs of FPI are large and the reconstructed images are almost nonsense because most of its first photon counts are noisy and they convey no range information. Emitted pulses number for the first four methods seems much larger than FPI; however, as signal is weak, the signal detections are essentially as few as several counts. The peak method is a traditional photon-counting imaging method which chooses the position of histogram peak as rang time. The crosscorrelation method is to make a cross correlation between system instrument response function (IRF) and histogram, and then it finds the position of maximum value.
From Figure 5, we could see that MSEs of these three fixed acquisition time methods are larger than our method, which indicates the reconstruction performance of our method is much better. The result is intuitive because the pulses number is identical for each pixel for these three methods, which does not take pixel differences into consideration.
Actually, when SNR decreases, the image quality of our method worsens, as Figure 5 shows. For low SNR conditions, we can improve the quality by increasing µ or decreasing to further avoid false clusters, although this may result in lower efficiency. Monte Carlo simulations were conducted to demonstrate how µ and influence efficiency and robustness, as Figure 6 indicates. each pixel for these three methods, which does not take pixel differences into consideration. Actually, when SNR decreases, the image quality of our method worsens, as Figure  5 shows. For low SNR conditions, we can improve the quality by increasing or decreasing to further avoid false clusters, although this may result in lower efficiency. Monte Carlo simulations were conducted to demonstrate how and influence efficiency and robustness, as Figure 6 indicates. Here we set noise intensity as 0.25 MHz, signal photon number as 0.01 per pulse and RMS laser width as 0.6 ns. If counts of a FSPU are all due to noise, it is deemed as a false alarm. From Figure 6, we could notice that larger and smaller would promote robustness, but reduce efficiency. In fact, the method allows us to make a trade-off between efficiency and noise robustness through and . The method is actually an extension of FPI. When = 1, it would be simplified to be FPI, which only focuses on imaging efficiency. In addition, background noise calibration is not required in our method. We only need to know an approximate level of noise to provide information to choose and , instead of measuring accurate noise distribution of each pixel which is generally a timeconsuming task. Experience shows = 5 and = 2 would be reliable choices for a variety of scenarios.

Experiments
To explore the performance of the method in a real imaging environment, we carried out experiments in a laboratory. The photon-counting LIDAR system is designed as Figure 7 demonstrates. Here we set noise intensity as 0.25 MHz, signal photon number as 0.01 per pulse and RMS laser width as 0.6 ns. If counts of a FSPU are all due to noise, it is deemed as a false alarm. From Figure 6, we could notice that larger µ and smaller would promote robustness, but reduce efficiency. In fact, the method allows us to make a trade-off between efficiency and noise robustness through µ and . The method is actually an extension of FPI. When µ = 1, it would be simplified to be FPI, which only focuses on imaging efficiency. In addition, background noise calibration is not required in our method. We only need to know an approximate level of noise to provide information to choose µ and , instead of measuring accurate noise distribution of each pixel which is generally a time-consuming task. Experience shows µ = 5 and = 2T p would be reliable choices for a variety of scenarios.

Experiments
To explore the performance of the method in a real imaging environment, we carried out experiments in a laboratory. The photon-counting LIDAR system is designed as Figure 7 demonstrates. Laser (CryLaS FDSS532) emits pulses at the fixed repetition rate of 10 KHz. Mirror 1 deflects the laser beam at 90 • into a linear polarizer. As the laser beam is linearly polarized from the laser, rotating the polarizer can adjust pulse energy afterwards. Two lenses, L1 and L2, are jointly used as beam expander to lower down laser divergence. A beam splitter deflects 1% beam into a PIN photodiode, the signal from which serves as start baseline. The remaining 99% beam continues its travel through a perforated mirror and then emits out to scenes of interest by two galvo scanning mirrors (Thorlabs GVS012). The back reflected echo photons enter the system in the same way and as the reflected beam is diffusely scattered, most of it would be deflected by the PERM into the receiving part. NBF moves away noise photons which are out of emitting wavelength. Eventually, the echo photons are coupled into a multimode fiber that connects to the Gm-APD (Micro Photon Devices PDM series $PD-050-CTC-FC whose photon detection efficiency η is about 42% for 532 nm wavelength and dead time is 77 ns) at the other end. A gate of 200 ns was utilized to block system back-reflection light. Laser (CryLaS FDSS532) emits pulses at the fixed repetition rate of 10 KHz. Mirror 1 deflects the laser beam at 90° into a linear polarizer. As the laser beam is linearly polarized from the laser, rotating the polarizer can adjust pulse energy afterwards. Two lenses, L1 and L2, are jointly used as beam expander to lower down laser divergence. A beam splitter deflects 1% beam into a PIN photodiode, the signal from which serves as start baseline. The remaining 99% beam continues its travel through a perforated mirror and then emits out to scenes of interest by two galvo scanning mirrors (Thorlabs GVS012). The back reflected echo photons enter the system in the same way and as the reflected beam is diffusely scattered, most of it would be deflected by the PERM into the receiving part. NBF moves away noise photons which are out of emitting wavelength. Eventually, the echo photons are coupled into a multimode fiber that connects to the Gm-APD (Micro Photon Devices PDM series $PD-050-CTC-FC whose photon detection efficiency is about 42% for 532 nm wavelength and dead time is 77 ns) at the other end. A gate of 200 ns was utilized to block system back-reflection light.
A football, a man-made mannequin, a box and the background wall were about 4.0 m, 5.5 m, 5.8 m and 6.1 m away from the front of our photon-counting LIDAR, respectively. The setup was placed in dark laboratory environment and an incandescent lamp with adjustable illumination was utilized as the source of noise. Details of the experiment are included in Appendix A. During the experiment, the echo level was extremely low. The signal counts level, on average, was about 0.0074 per pulse which meant that every 135 pulses would only generate 1 signal count. Three noise levels, the corresponding SNR of which were −5.14 dB, −8.35 dB and −10.85 dB, were achieved through the adjustment of the lamp. The result of the experiment is demonstrated in Figure 8. A football, a man-made mannequin, a box and the background wall were about 4.0 m, 5.5 m, 5.8 m and 6.1 m away from the front of our photon-counting LIDAR, respectively. The setup was placed in dark laboratory environment and an incandescent lamp with adjustable illumination was utilized as the source of noise. Details of the experiment are included in Appendix A. During the experiment, the echo level was extremely low. The signal counts level, on average, was about 0.0074 per pulse which meant that every 135 pulses would only generate 1 signal count. Three noise levels, the corresponding SNR of which were −5.14 dB, −8.35 dB and −10.85 dB, were achieved through the adjustment of the lamp. The result of the experiment is demonstrated in Figure 8. Acquisition of the ground truth depth image and signal-noise evaluation processes are available in Appendix A. From the result, our method shows better MSEs compared to others. FPI, without any doubt, is the most efficient method, but it suffers much larger error to be almost noninformative. Results of the experiment have the same tendency as simulation. For the same total acquisition time (which means the same efficiency), our method could recover higher quality depth image, which, on the other hand, reveals that to obtain the same quality image, our method would require much less time. The experiment data and code are provided as we show in the Supplementary Material.
It is worth noting that MSEs of the unmixing method are also tiny, almost the same Acquisition of the ground truth depth image and signal-noise evaluation processes are available in Appendix A. From the result, our method shows better MSEs compared to others. FPI, without any doubt, is the most efficient method, but it suffers much larger error to be almost noninformative. Results of the experiment have the same tendency as simulation. For the same total acquisition time (which means the same efficiency), our method could recover higher quality depth image, which, on the other hand, reveals that to obtain the same quality image, our method would require much less time. The experiment data and code are provided as we show in the Supplementary Material.
It is worth noting that MSEs of the unmixing method are also tiny, almost the same as our FSPU method. However, the unmixing method needs accurate noise levels to reconstruct clear images, while our method does not require such information. In this experiment, noise levels were evaluated beforehand and if noise levels are unknown, the unmixing method would have much worse reconstruction quality.

Discussion and Conclusions
In conclusion, we reported an efficient and noise robust photon-counting imaging method which is specifically suitable for low signal and high noise environments such as daylight imaging or underwater imaging. The method exploits the statistical difference that signal would cluster while noise would decentralize, in order to distinguish signal counts rapidly. The acquisition time of each pixel is a photon level-dependent variable and background noise calibration is not needed any more. The method is an extension of FPI and it outperforms the mainstreaming photon-counting methods in low SNR scenarios. By introducing two acquisition parameters µ and , the method allows users to make a trade-off between efficiency and robustness. For example, when SNR is low, larger µ and smaller would be helpful to avoid noise, although acquisition time might be longer. More importantly, the method does not need noise calibrations. Although it is not mentioned in the paper, the method has the ability to recover reflectivity images with the elapsed pulse number of each pixel. However, the elapsed pulse number is not only related to reflectivity, but also to noise level, and there is no such a mature theory or analytical solution to settle this problem, which is the priority of our future work.
to be due to signal, and using this dataset we reconstructed depth image as ground truth image, which is represented in the next section. Thereafter, we fixed the polarizer to keep the echo level unchanged. Then, we aimed to scan the scene with different noise levels.
Stage 2 (low): We adjusted the illumination of the lamp to low intensity, as Figure A1b shows. We raster scanned the scenes with the same 2 s acquisition time for each pixel and the same scanning resolution, and registered every photon count. A photon count dataset for low noise intensity was therefore acquired, which is named photon_counts_data_low in the Supplementary Material. Stage 3 (middle): Similarly, we adjusted the illumination of the lamp to middle intensity, as Figure A1c shows. We also raster scanned the scenes with the same 2 s acquisition time for each pixel and the same scanning resolution, and registered every photon count. A photon count dataset for middle noise intensity was therefore acquired, which is named photon_counts_data_mid in the Supplementary Material. Stage 4 (high): In the same way, we adjusted the illumination of the lamp to high intensity, as Figure A1d shows. We also raster scanned the scenes with the same 2 s acquisition time for each pixel and the same scanning resolution, and registered every photon count. A photon count dataset for high noise intensity was therefore acquired, which is named photon_counts_data_high in the Supplementary Material.

Appendix A.2. Evaluation of Signal and Noise Level
Before our reconstruction, we needed to quantify the levels of signal and noise. The signal level can be evaluated by finding the number of photon_counts_data_none minus the number of detector dark counts. The detector's dark count is 100 Hz according to its specification sheet which is 4 K for the whole dataset with consideration of 200 ns gate time, 20 K accumulation pulses and (100, 100) pixel size. Consequently, the overall signal counts number is estimated as 1.49 M. On average, for each pixel, the signal count level is 0.00744 per pulse, which means that more than 134 pulses would generate only one signal count.
Since we fixed a linear polarizer to keep the echo level unchanged, the noise count

Appendix A.2 Evaluation of Signal and Noise Level
Before our reconstruction, we needed to quantify the levels of signal and noise. The signal level can be evaluated by finding the number of photon_counts_data_none minus the number of detector dark counts. The detector's dark count is 100 Hz according to its specification sheet which is 4 K for the whole dataset with consideration of 200 ns gate time, 20 K accumulation pulses and (100, 100) pixel size. Consequently, the overall signal counts number is estimated as 1.49 M. On average, for each pixel, the signal count level is 0.00744 per pulse, which means that more than 134 pulses would generate only one signal count.
Since we fixed a linear polarizer to keep the echo level unchanged, the noise count number of photon_counts_data_low can be simply evaluated through the counts of pho-ton_counts_data_low minus previously obtained signal counts. Similar evaluations have also been applied to photon_counts_data_mid and photon_counts_data_high. The result indicated that the noise levels (with the consideration of photon detection efficiency) were 0.29 M, 0.61 M and 1.08 M, respectively, and their corresponding SNRs were −5.14 dB, −8.35 dB and −10.85 dB.

Appendix A.3 Ground Truth Depth Image
The dataset of photon_counts_data_none was utilized to recover the ground truth image because it is assumed to include only a few of dark noise counts. First, we filtered out dark noise counts using a straightforward median-like pixel-wise method. For each pixel, we calculated the median of its counts and compared every count with this median. If count time is 2T p away from the median (T p is RMS pulse width of IRF), this count is believed to be a dark noise count, and therefore would be abandoned. After dark noise filter, we recovered the ground truth depth image, simply using the mean of unfiltered counts. Figure A2 demonstrates depth images reconstructed from photon_counts_data_none with and without dark counts filter processing. filter, we recovered the ground truth depth image, simply using the mean of unfiltered counts. Figure A2 demonstrates depth images reconstructed from pho-ton_counts_data_none with and without dark counts filter processing.

Appendix A.4. Reconstruction Process
In this experiment, we used the same data set for our FSPU method and other methods to evaluate their performances. For our reconstruction process, we added photon counts from the dataset one-by-one, and when FSPU was discovered, we transferred to next pixel data. FPI operated in similar way to only use the first photon count of each pixel. After our FSPU method, average emitted pulses per pixel ℵ could be obtained and for each pixel, we only included photon counts generated by first ℵ pulses to reconstruct depth images with peak, cross-correlation and unmixing methods to maintain the same efficiency (same baseline) for performance comparison.

. Dataset Format
There are four experiment datasets and one SNR dataset provided in our Supplementary Material. They are .m files which are readable with Matlab. Each experiment dataset has two (100, 100) cell arrays which represent time and synchronization information under different noise levels for each pixel. SNR contains information of evaluated signal level and noise levels.

Appendix A.4 Reconstruction Process
In this experiment, we used the same data set for our FSPU method and other methods to evaluate their performances. For our reconstruction process, we added photon counts from the dataset one-by-one, and when FSPU was discovered, we transferred to next pixel data. FPI operated in similar way to only use the first photon count of each pixel. After our FSPU method, average emitted pulses per pixel ℵ could be obtained and for each pixel, we only included photon counts generated by first ℵ pulses to reconstruct depth images with peak, cross-correlation and unmixing methods to maintain the same efficiency (same baseline) for performance comparison.

Appendix A.5 Dataset Format
There are four experiment datasets and one SNR dataset provided in our Supplementary Material. They are .m files which are readable with Matlab. Each experiment dataset has two (100, 100) cell arrays which represent time and synchronization information under different noise levels for each pixel. SNR contains information of evaluated signal level and noise levels.