Signal Photon Extraction Method for Weak Beam Data of ICESat-2 Using Information Provided by Strong Beam Data in Mountainous Areas

: The Ice, Cloud, and land Elevation Satellite-2 (ICESat-2) can measure the elevations of the Earth’s surface using a sampling strategy with unprecedented spatial detail. In the daytime of mountainous areas where the signal–noise ratio (SNR) of weak beam data is very low, current algorithms do not always perform well on extracting signal photons from weak beam data (i.e., many signal photons were missed). This paper proposes an effective algorithm to extract signal photons from the weak beam data of ICESat-2 in mountainous areas. First, a theoretical equation of SNR for ICESat-2 measured photons in mountainous areas was derived to prove that the available information provided by strong beam data can be used to assist the signal extraction of weak beam data (that may have very low SNR in mountainous areas). Then, the relationship between the along-track slope and the noise level was used as the bridge to connect the strong and weak beam data. To be speciﬁc, the along-track slope of the weak beam was inversed by the slope–noise relationship obtained from strong beam data, and then was used to rotate the direction of the searching neighborhood in the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. With the help of this process, the number of signal photons included in the searching neighborhood will signiﬁcantly increase in mountainous areas and will be easily detected from the measured noisy photons. The proposed algorithm was tested in the Tibetan Plateau, the Altun Mountains, and the Tianshan Mountains in different seasons, and the extraction results were compared with the results from the ATL03 datasets, the ATL08 datasets, and the classical DBSCAN algorithm. Based on the ground-truth signal photons obtained by visual inspection, the parameters of the classiﬁcation precision, recall, and F -score of our algorithm and three other algorithms were calculated. The modiﬁed DBSCAN could achieve a good balance between the classiﬁcation precision (93.49% averaged) and recall (89.34% averaged), and its F -score (more than 0.91) was higher than that of the other three methods, which successfully obtained a continuous surface proﬁle from weak beam data with very low SNRs. In the future, the detected signal photons from weak beam data are promising to assess the elevation accuracy achieved by ICESat-2, estimate the along-track and cross-track slope, and further obtain the ground control points (GCPs) for stereo-mapping satellites in mountainous areas. when the same environmental parameters and system parameters were used, the SNR values of the strong beam were approximately 4~6 dB larger than that of the weak beam because the energy of the strong beam was approximately four times larger than that of the weak beam, while the noise level was nearly identical. To be speciﬁc, the SNR value of the strong beam would be larger than the 4.7 dB threshold in most cases. However, when the surface slope exceeds 25 ◦ , the SNR value of the weak beam would be lower than the 4.7 dB threshold, which means that it might be difﬁcult to extract signal photons.


Introduction
The Ice, Cloud, and land Elevation Satellite (ICESat) launched in 2003 has achieved great success in monitoring the changes in the Earth's polar region and the datasets have also been widely used in many other scopes such as retrieving forest canopy heights, assessing biomass changes, and monitoring sea levels [1][2][3][4][5][6]. The laser transmitter of ICESat operates at a repetition rate of 40 Hz with a single beam at 532/1064 nm [7], and the detector and receiver were designed to have a linear response for echo pulses (nonlinear response and pulse distortion were found in on-orbit performance due to the saturation effect) [8]. In 9 September 2018 (https://icesat-2.gsfc.nasa.gov/articles/nasa%E2%80%99sicesat-2-arrives-launch-site), the National Aeronautics and Space Administration (NASA) launched its successor (i.e., the Ice, Cloud, and land Elevation Satellite-2 (ICESat-2)), which was on board with a photon-counting laser altimeter. The ICESat-2 can measure the elevations of the Earth's surface using a sampling strategy with unprecedented spatial detail [9]. Benefiting from the low energy requirement of photon-counting detectors, the laser transmitter is able to operate at a frequency of 10 kHz and the along-track spacing intervals reduce to only 0.7 m, which can obtain the along-track terrain profiles with much higher spatial resolution compared to that from the ICESat laser altimeter. A laser source in the satellite is separated into a six beam configuration with three beam pairs by the diffractive optical elements (DOE) with constant relative beam positioning [10]. The slight yaw of the satellite separates the components (including a strong and a weak beam) in a beam pair by 90 m [11].
The photomultiplier tube (PMT) detectors used in ICESat-2 are very sensitive to signal photons (i.e., the detector can respond to a single photon). However, the PMT detectors are simultaneously sensitive to noise photons at the same wavelength. As a result, the measured geolocated photons are normally very noisy, especially in the daytime when the solar radiance is strong. Many previous studies have proposed algorithms to distinguish signal photons from noise photons for airborne LiDARs [12][13][14][15][16]. For airborne scanning photon-counting LiDARs, Tang et al. [12] presented an alternate voxel-based spatial filtering algorithm to extract signal photons from the measured geolocated photons for the airborne Sigma Space's Single Photon LiDAR (SPL). Wang et al. [13,14] proposed an adaptive ellipsoid searching method to filter noise photons from the Sigma Space high-resolution quantum LiDAR system (HRQLS). Additionally, other inspiring works have been proposed and used for the photons captured by scanning LiDARs or imaging systems [15,16].
As the measured photons from profiling photon-counting LiDARs are quite different, NASA's research team has proposed many specific surface-finding algorithms, which performed well on, planned initially to some specific surface types (e.g., sea ice, ice sheets, land/vegetation, ocean, and inland water) [17][18][19][20][21][22][23][24][25][26][27][28][29][30]. The terrain characteristics significantly vary in different land-cover types (e.g., with different reflectivity, roughness, and slopes) and make the distribution of signal photons quite different. The measured geolocated photons include signal and noise and were labeled as signal or noise through signal finding techniques. For land/vegetation areas, many methods have also been proposed to identify signal photons from noisy measured photons from profiling photon-counting LiDARs. Magruder et al. [17] proposed three methods (i.e., the canny edge detection (CDE) algorithm, the probability distribution function (PDF) based signal extraction, and the localized statistical analysis method) to discard noise photons for the Multiple Altimeter Beam Experimental LiDAR (MABEL), and the qualitative evaluation indicated that the CDE algorithm had better performance in forest areas. Herzfeld et al. [19] proposed an algorithm based on spatial statistical and discrete mathematical concepts to detect the ground under a dense canopy for the simulated ICESat-2 data. Their further study utilized the radial basis function to determine an adaptive threshold and applied it in separating signal photons from the noise for data measured by an ICESat-2 simulator instrument (i.e., the Slope Imaging Multi-polarization Photon-counting LiDAR (SIMPL)) [22]. Popescu et al. [23] developed a multi-level noise filtering approach to minimize noise photons and subsequently classified remaining photons into ground and top of canopy using an overlapping moving window and cubic spline interpolation. This method performed well on Multiple Altimeter Beam Experimental LiDAR (MABEL) and simulated ATLAS data, and is one of the custom algorithms for processing ICESat-2 data. Neuenschwander and Pitt [24] proposed an inspiring and effective algorithm to detect signal photons on the ground and top of canopy surfaces and generate along-track elevation profile of the terrain and canopy heights. The Differential, Regressive and Gaussian Adaptive Nearest Neighbor (DRAGANN) algorithm has been proven to have a good performance on ICESat-2 ATLAS data in land-vegetation areas [24]. DRAGANN algorithms have been used to process ATL03 data and generate the mission's land-vegetation along-track products (ATL08) [27]. Malambo et al. [30] developed an inter-disciplinary platform, named PhotonLabeler, for visual interpretation and labeling for the ICESat-2 data product. This platform can be used to provide training and validation data in the development of new algorithms for the ICESat-2 mission.
In addition to the above methods, Wang et al. [31] calculated the probability of the k-th nearest neighbor (KNN) and used the Bayesian decision theory to successfully classify measured photons into noise and signal photons for the airborne MABEL. Zhang et al. [32,33] introduced the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm (that was originally studied and applied in image processing) into detecting signal photons from geolocated photons measured by photon-counting laser altimeters. This innovation performed very well and was further modified by many other studies. Based on Zhang's achievement, an adaptive threshold in the DBSCAN algorithm was determined and used for MABEL datasets [34]. Nie et al. [35] further modified the DBSCAN algorithm by rotating the elliptical searching neighborhood in all directions, which can be applied to extract the tree canopy as well as the terrain profile for MABEL datasets. Zhu et al. [36,37] applied Nie's algorithm in datasets measured by ICESat-2 and retrieved the vegetation height. In these studies, the volume and direction of the ellipsoid searching area were determined by a principle component analysis method. Huang et al. [38] proposed a modified DBSCAN algorithm for the simulated MATLAS data [39], but the study area was relatively flat with an elevation change of less than 20 m. In fact, as the energy of the weak beam of ICESat-2 was only a quarter to that of the strong beam and the background noise remained nearly identical, the signal-noise ratio (SNR) of weak beam data was much lower than that of strong beam data. When it comes to regions with rugged terrain and large surface slopes such as the Tibetan Plateau, the SNR of weak beam data would worsen and extracting the signal photons from weak beam data would be even more challenging.
The arrangement of pairs and beams allows for measuring the surface slopes in both along-and across-track directions with a single pass, enabling determination of height change from any two passes over the same site [11,40]. The extraction of precise signal photons from weak beam data makes it possible to calculate the local cross-track slope and interpolate the elevation to the reference ground track (RGT). In addition, if the signal photons in weak beam data were not precisely detected, half of the six along-track profiles were abandoned for ICESat-2. On this point, signal photons from weak beam data are of the same importance as those from strong beam data. The motivation of this study was to detect signal photons from very low SNR data of ICESat-2 s weak beams, and these data were mainly captured in mountainous areas with large slopes and at midday with high background noise level. First, a theoretical model was derived to estimate the SNR of geolocated photons measured by both strong and weak beams. Next, based on the available information provided by the strong beam in the same pair, the relationship between the along-track terrain slope and the background noise was fitted. Then, the key parameters of the modified DBSCAN algorithm were determined according to the system parameters of ICESat-2 and the fitted slope-noise relationship for weak beam data. Finally, in four typical mountainous areas (i.e., the Tibetan Plateau, the Altun Mountains, and Tianshan Mountains), the modified DBSCAN algorithm was tested using geolocated photons of ICESat-2 s weak beams and compared with the results from previous methods.

Overview of the Ice, Cloud, and land Elevation Satellite-2 (ICESat-2) Photon-Counting LiDAR
ICESat-2 was launched in September 2018 and carried a new generation laser altimeter (i.e., the advanced topographic laser altimeter system (ATLAS)), which works in the photoncounting mode. The laser pulse generated by a 532-nm laser transmitter is split into six Remote Sens. 2021, 13, 863 4 of 31 beams (i.e., three strong beams and three weak beams) to effectively measure the Earth's surface. The footprint geometry of six beams is illustrated in Figure 1. The diameter of each laser footprint was less than 17 m at the nominal altitude of 500 km [10]. The six beams were divided into three pairs and each pair consisted of one strong beam and one weak beam. The energy of strong beams could be adjusted from 48 to 172 µJ, so that the mean number of signal photons reflected from the snow surface would be maintained to approximately 12 photoelectrons under a clear sky. The energy of the weak beams was fixed to a quarter to that of the strong beam [10,11].

Overview of the Ice, Cloud, and land Elevation Satellite-2 (ICESat-2) Photon-Counting LiDAR
ICESat-2 was launched in September 2018 and carried a new generation laser al eter (i.e., the advanced topographic laser altimeter system (ATLAS)), which works in photon-counting mode. The laser pulse generated by a 532-nm laser transmitter is s into six beams (i.e., three strong beams and three weak beams) to effectively measure Earth's surface. The footprint geometry of six beams is illustrated in Figure 1. The dia ter of each laser footprint was less than 17 m at the nominal altitude of 500 km [10]. six beams were divided into three pairs and each pair consisted of one strong beam one weak beam. The energy of strong beams could be adjusted from 48 to 172 μJ, so the mean number of signal photons reflected from the snow surface would be maintai to approximately 12 photoelectrons under a clear sky. The energy of the weak beams fixed to a quarter to that of the strong beam [10,11]. Figure 1. Schematic diagram of laser footprints of the Ice, Cloud, and land Elevation Satellite-2 (ICESat-2) on the ground in the forward orientation. The laser beam pattern is a 3 × 2 array in which one strong beam and one weak beam are grouped into a pair. When advanced topographic laser altimeter system (ATLAS) is oriented in the forward orientation, the weak beams are on the left side of the beam pair and are associated with ground tracks 1L, 2L, and 3L. When ATLAS is oriented in the backward orientation, the relative positions of weak and strong beams change, and the strong beams are on the left side of the ground track pairs and lead the weak beams [40]. The distance between each pair was approximately 3.3 km. In the same pair, the interval between the strong beam and the weak beam was 2.5 km. The cross-track distance between the two ground tracks in the same pair was 90 m.
The photon-counting receiver of each laser beam consisted of a 4 × 4 multi-an PMT. The difference between the strong and weak beams is that each PMT channel strong beam has an independent timing chain, while every four PMT channels sha timing chain for a weak beam [41]. The equipped PMT detectors are highly sensitiv signal photons as well as the noise photons at the same wavelength. The noise phot are generally composed of three parts (i.e., the sunlight that is scattered and reflected the atmosphere and the Earth's surface (normally called the background noise); the l pulse scattered by the atmosphere; and the dark counts triggered by the dark curren the detector itself). The dark count of each 16-channel receiver module is approxima 400 counts per second (CPS). In the daytime, the noise is dominated by the backgro Figure 1. Schematic diagram of laser footprints of the Ice, Cloud, and land Elevation Satellite-2 (ICESat-2) on the ground in the forward orientation. The laser beam pattern is a 3 × 2 array in which one strong beam and one weak beam are grouped into a pair. When advanced topographic laser altimeter system (ATLAS) is oriented in the forward orientation, the weak beams are on the left side of the beam pair and are associated with ground tracks 1L, 2L, and 3L. When ATLAS is oriented in the backward orientation, the relative positions of weak and strong beams change, and the strong beams are on the left side of the ground track pairs and lead the weak beams [40]. The distance between each pair was approximately 3.3 km. In the same pair, the interval between the strong beam and the weak beam was 2.5 km. The cross-track distance between the two ground tracks in the same pair was 90 m.
The photon-counting receiver of each laser beam consisted of a 4 × 4 multi-anode PMT. The difference between the strong and weak beams is that each PMT channel of a strong beam has an independent timing chain, while every four PMT channels share a timing chain for a weak beam [41]. The equipped PMT detectors are highly sensitive to signal photons as well as the noise photons at the same wavelength. The noise photons are generally composed of three parts (i.e., the sunlight that is scattered and reflected by the atmosphere and the Earth's surface (normally called the background noise); the laser pulse scattered by the atmosphere; and the dark counts triggered by the dark current of the detector itself). The dark count of each 16-channel receiver module is approximately 400 counts per second (CPS). In the daytime, the noise is dominated by the background noise, which is normally larger than 10 6 CPS [41]. Compared to the signal photons that Remote Sens. 2021, 13, 863 5 of 31 mainly concentrate upon the target, the noise photons are loosely distributed in the whole vertical range gate.

Study Areas and Datasets
Four mountainous areas were chosen in this study to verify the performance of the modified method, which was also compared with the classical method and the standard results from the ATL03 and ATL08 datasets. The first study area was near Xigaze City in the Tibetan Plateau and ICESat-2 flew over this area in December 2018. The second study area was in the Altun Mountains in the Xinjiang Uygur Autonomous Region, China, and ICESat-2 flew over this area in September 2019. The third and fourth study areas were in the eastern part and western part of the Tianshan Mountains, China, and ICESat-2 flew over these areas in February 2019 and in June 2020, respectively. The geographical locations of the four study sites are shown in Figure 2. According to the land-cover types from the GLOBELAND30 product in 2020 (http://www.globallandcover.com/home.html?type= data, accessed on 9 September 2018), in the four study areas, the bare-land and grassland were the dominant land-cover types. In addition, surface water (e.g., rivers in valleys) and snow (normally on the ubac slope in winter) were also observed from the Sentinel-2 image that was captured along the ICESat-2 tracks. The acquisition time of the Sentinel-2 image was close to that of the ICESat-2 data in each study area. Note that Sentinel-2 images were only used to show the along-track surface types or environmental information. The environmental information of the four study areas and the descriptions of used data are listed in Table 1.
Remote Sens. 2021, 13, x FOR PEER REVIEW noisy mainly due to the solar-induced background noise. In ATL03 datas ter named 'confidence level' is provided to identify every photon as sign The value of the confidence ranges from 0 to 4, and the photon is more like if the confidence value is higher. The algorithm that was used to calculat value assumes the noise photons satisfy a Poisson distribution and extract that are outliers to the Poisson distribution [44]. However, confidence resu are not always credible, especially in some complex surface shapes (e.g and mountainous areas). Therefore, many methods have been proposed t nal photons in complex surfaces, and the DBSCAN method that was used one of these methods. In addition, the DRAGANN algorithm [24] and noise filtering approach [23] can effectively detect signal photons on the of canopy surfaces and generate along-track elevation profiles of terra heights. As the ATL08 product covers mountainous areas, which were areas in this study, the detected signal photons in the ATL08 product wer comparison with the results from our method [45].

Signal-Noise Ratios (SNRs) of Different Laser Beams in Mountainous Area
The signal-to-noise ratio (SNR) is one of the key parameters for the rec a photon-counting LiDAR. The SNR can be used to assess if the signal detected from measured noisy photons. For photon-counting detectors, t did not show a linear correlation with the energy of the return pulse, s probability per shot was alternatively used [46,47]. In addition, multiple d ally equipped for photon-counting LiDARs to reduce the dead-time im  The ICESat-2 Level-2 ATL03 geolocated photons were used in this study and can be easily searched from https://search.earthdata.nasa.gov/search (accessed on 9 September 2018) and freely downloaded from the National Snow and Ice Data Center (NSIDC) of NASA. The L2 ATL03 datasets record the geolocated photons that include both signal and noise photons in six separate ground tracks (corresponding to three strong beams and three weak beams) [42]. Each photon event (PE) in the ATL03 datasets is recorded by its time tag, latitude, longitude, and elevation, which is based on the WGS84 ellipsoidal height [43]. In the daytime, measured geolocated photons in ATL03 datasets are very noisy mainly due to the solar-induced background noise. In ATL03 datasets, the parameter named 'confidence level' is provided to identify every photon as signal or noise [43]. The value of the confidence ranges from 0 to 4, and the photon is more likely to be a signal if the confidence value is higher. The algorithm that was used to calculate the confidence value assumes the noise photons satisfy a Poisson distribution and extracts signal photons that are outliers to the Poisson distribution [44]. However, confidence results in the ATL03 are not always credible, especially in some complex surface shapes (e.g., in forest areas and mountainous areas). Therefore, many methods have been proposed to detect the signal photons in complex surfaces, and the DBSCAN method that was used in this study is one of these methods. In addition, the DRAGANN algorithm [24] and the multi-level noise filtering approach [23] can effectively detect signal photons on the ground and top of canopy surfaces and generate along-track elevation profiles of terrain and canopy heights. As the ATL08 product covers mountainous areas, which were the main study areas in this study, the detected signal photons in the ATL08 product were also used as a comparison with the results from our method [45].

Signal-Noise Ratios (SNRs) of Different Laser Beams in Mountainous Areas
The signal-to-noise ratio (SNR) is one of the key parameters for the recorded PEs from a photon-counting LiDAR. The SNR can be used to assess if the signal photons can be detected from measured noisy photons. For photon-counting detectors, the recorded PEs did not show a linear correlation with the energy of the return pulse, so the detection probability per shot was alternatively used [46,47]. In addition, multiple detectors are usually equipped for photon-counting LiDARs to reduce the dead-time impact. There-fore, considering these effects, the theoretical SNR for a photon-counting LiDAR can be expressed as: p N s /n channel + f n /n channel · 4σ p · n channel p f n /n channel · 4σ p · n channel ≈ 10lg p(N s /n channel ) · n channel + f n · 4σ p f n · 4σ p , where N s is the theoretical number of signal photons received per shot; n channel is the channel number of the receiver module; p(N s /n channel ) is the detection probability of a photon-counting LiDAR considering the dead-time effect; σ p is the (root-mean square) RMS width of the distribution of return signal photons; and f n is the total noise rate. When illuminating on an inclined plane, the return laser pulse of a laser altimeter generally satisfies a Gaussian distribution [48,49], and the time duration of 4σ p (or ±2σ p ) covers over 95% of the total energy of a laser pulse. Within the time bins that may contain signal photons (i.e., ±2σ p ), the theoretical number of total noise photons N n is equal to f n ·4σ p . Based on the LiDAR equation, the theoretical number of signal photons N s for a nearly nadir incidence LiDAR can be expressed as where E 0 is the transmitted laser energy per shot; η t is the optical efficiency of the transmitter; η r is the optical efficiency of the receiver; η QE is the quantum efficiency of detectors; A r is the area of the receiver aperture; z is the LiDAR flight height; T a is the atmospheric transmittance (one way); h is the Planck constant; v is the frequency of the laser source; β L is the reflection coefficient of a target; and σ L is the surface slope of the target. Considering the dead-time effect and multiple channel effect, the detection probability p(N s /n channel ) can be expressed as [50,51] where t d is the dead-time of a single photon-counting detector. The average signal PE recorded by the system in one shot can be expressed as p(N s /n channel )·n channel . The total noise rate f n can be expressed as [46] f where f L and f A are related to the solar-induced background noise and represent the noise reflected by the Earth's surface and backscattered by the atmosphere, respectively; f D is the dark count of the receiver module; N λ 0 is the spectral solar irradiance measured outside the atmosphere; ∆λ is the band-pass of the optical filter; θ r is the half of field of view (FOV); θ s is the solar zenith angle; and ψ s is the angle between the normal direction of the target surface and the incident sunlight and can be calculated by [52] cos ψ s = cosσ L cosθ s − sin σ L sin θ s cos ϕ s , where ϕ s is the solar azimuth angle. The RMS width σ p would be mainly broadened by the surface slope and roughness, and it can be expressed as [48] where σ h is the pulse width stretching induced by the electric circuit; Var(ξ) is the square of the surface roughness; c is the light velocity in atmosphere; and the other system parameters of the ICESat-2 ATLAS are listed in Table 2 [10]. Substituting these parameters into Equation (1), the theoretical SNRs for the ICESat-2 strong and weak beams can be calculated under different environmental parameters.  Figure 3a illustrates the theoretical SNR contours of received signals from the ICESat-2 strong beam under different surface slopes (the abscissa axis) and solar zenith angles (the vertical axis), and Figure 3b corresponds to the weak beam. From Figure 3, the SNR will sharply decrease with the increase in the surface slope. When the solar zenith angle increases, the solar-induced background noise will decrease, and the SNR will improve correspondingly. Generally, if the SNR is less than 4.7 dB (or the number of signal photons is two times larger than that of noise photons), some signal photons may probably be missed when detecting signal photons from measured noisy photons. Given that the ground track of the strong beam is very close to that of the weak beam (i.e., 90 m in the cross-track direction), the environmental parameters were assumed to be identical for both beams. In Figure 3, when the same environmental parameters and system parameters were used, the SNR values of the strong beam were approximately 4~6 dB larger than that of the weak beam because the energy of the strong beam was approximately four times larger than that of the weak beam, while the noise level was nearly identical. To be specific, the SNR value of the strong beam would be larger than the 4.7 dB threshold in most cases. However, when the surface slope exceeds 25 • , the SNR value of the weak beam would be lower than the 4.7 dB threshold, which means that it might be difficult to extract signal photons.  Figure 3a illustrates the theoretical SNR contours of received signals from the ICESa 2 strong beam under different surface slopes (the abscissa axis) and solar zenith angl (the vertical axis), and Figure 3b corresponds to the weak beam. From Figure 3, the SN will sharply decrease with the increase in the surface slope. When the solar zenith ang increases, the solar-induced background noise will decrease, and the SNR will improv correspondingly. Generally, if the SNR is less than 4.7 dB (or the number of signal photon is two times larger than that of noise photons), some signal photons may probably b missed when detecting signal photons from measured noisy photons. Given that th ground track of the strong beam is very close to that of the weak beam (i.e., 90 m in th cross-track direction), the environmental parameters were assumed to be identical fo both beams. In Figure 3, when the same environmental parameters and system param ters were used, the SNR values of the strong beam were approximately 4~6 dB larger tha that of the weak beam because the energy of the strong beam was approximately fou times larger than that of the weak beam, while the noise level was nearly identical. To b specific, the SNR value of the strong beam would be larger than the 4.7 dB threshold most cases. However, when the surface slope exceeds 25°, the SNR value of the weak bea would be lower than the 4.7 dB threshold, which means that it might be difficult to extra signal photons.

Current Density-Based Spatial Clustering of Applications with Noise (DBSCAN) Algorithm and Its Modification
The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) metho

Current Density-Based Spatial Clustering of Applications with Noise (DBSCAN) Algorithm and Its Modification
The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method was proposed by Ester et al. [33] and was modified by Zhang et al. [32] to detect the signal photons from measured noisy photons of photon-counting LiDARs. The basic criteria of the DBSCAN algorithm are as follows. For each point p in a cluster, the point density in its neighborhood (normally within an ellipse or a circle) is calculated, and if the point density exceeds the threshold parameter Minpts, the point p will be identified as a "signal point". The neighborhood is defined as the Euclidean distance for given points p and q (i.e., dist(p, q)) and it can be set as a circle or an ellipse, as illustrated in Figure 4. The threshold MinPts normally represents the minimum number of points within the neighborhood range. In a previous study, the MinPts were derived from the constraint of the probability of detecting signals and the probability of false alarm. The detailed derivation process was shown in Equations (15)-(19) in Degnan's study [46].
Remote Sens. 2021, 13, x FOR PEER REVIEW probability of detecting signals and the probability of false alarm. The det process was shown in Equations (15)-(19) in Degnan's study [46]. The DBSCAN method processes the noisy points cloud based on the tion of photons and performed well when the SNR was better than the However, when the SNR became worse, some noise photons would rem signal photons would be missed. Taking the ATL03 datasets used in the as an example, Figure 5a-c illustrates sampled along-track geolocated points) and its corresponding extraction results (red points) in mountaino different Minpts and strong/weak beams were used. When the same M (Minpts = 6), the extracted surface profile of the strong beam illustrated in much better than the result of the weak beam illustrated in Figure 5b. If Minpts (in Figure 5c), although more signal photons can be detected from w many noise photons would be mistakenly identified as signal photons. U ured signal and noise photons within the green boxes in Figure 5a,b, two m of the strong and weak beams were calculated and illustrated in Figure  marks. Then, based on Equation (1), the theoretical SNRs of the strong be solid curve) and the weak beam (using red dashed curve) were also calcu trated in Figure 5d. It should be noted that the environmental paramete calculate the theoretical SNR were obtained from the corresponding ICES taset and the system parameters are listed in Table 2. In Figure 5d, the tw ured SNR values agreed well with the theoretical results. From Figure 5, The DBSCAN method processes the noisy points cloud based on the spatial distribution of photons and performed well when the SNR was better than the 4 dB threshold. However, when the SNR became worse, some noise photons would remain and some signal photons would be missed. Taking the ATL03 datasets used in the first study area as an example, Figure 5a-c illustrates sampled along-track geolocated photons (blue points) and its corresponding extraction results (red points) in mountainous areas, when different Minpts and strong/weak beams were used. When the same Minpts was used (Minpts = 6), the extracted surface profile of the strong beam illustrated in Figure 5a was much better than the result of the weak beam illustrated in Figure 5b. If using a smaller Minpts (in Figure 5c), although more signal photons can be detected from weak beam data, many noise photons would be mistakenly identified as signal photons. Using the measured signal and noise photons within the green boxes in Figure 5a,b, two measured SNRs of the strong and weak beams were calculated and illustrated in Figure 5d using green marks. Then, based on Equation (1), the theoretical SNRs of the strong beam (using blue solid curve) and the weak beam (using red dashed curve) were also calculated and illustrated in Figure 5d. It should be noted that the environmental parameters that used to calculate the theoretical SNR were obtained from the corresponding ICESat-2 ATL03 dataset and the system parameters are listed in Table 2. In Figure 5d, the two typical measured SNR values agreed well with the theoretical results. From Figure 5, in the daytime of mountainous areas, the classical DBSCAN algorithm performed well for strong beams that had higher SNR values, but it showed a certain deficiency in extracting the signal photons from weak beam data. calculate the theoretical SNR were obtained from the corresponding ICESat-2 ATL03 taset and the system parameters are listed in Table 2. In Figure 5d, the two typical m ured SNR values agreed well with the theoretical results. From Figure 5, in the day of mountainous areas, the classical DBSCAN algorithm performed well for strong be that had higher SNR values, but it showed a certain deficiency in extracting the si photons from weak beam data.  In mountainous areas with steep along-track slope, if the major axis of the elliptical neighborhood aligns with the along-track slope σ L (as illustrated in Figure 4b), more signal photons will be included in the elliptical searching neighborhood (i.e., the photons along the slope direction have higher weights, so that they are more likely to be classified as signal photons). The photons within an elliptical neighborhood with the major axis a, the minor axis b, and rotation angle θ satisfy where x p and x q represent the along-track distances for the photons p and q in the ICESat-2 ATL03 dataset; and h p and h q represent the elevations for the photons p and q, respectively.

Parameters Determination for Modified DBSCAN
In this study, four parameters were determined and used in the modified DBSCAN method, and these are the major axis and minor axis for the elliptical neighborhood (i.e., a and b in Figure 4b), the rotation angle of the elliptical neighborhood (i.e., θ in Figure 4b), and the classification threshold (i.e., Minpts). As illustrated in Figure 4b, the rotation angle of the elliptical neighborhood θ can be represented by the along-track surface slope σ L , which can be further estimated by the background noise rate as follows. In a previous study, it has been proven that the model shown in Equation (4) can well express the relationship between the solar-induced background noise rate and the along-track slope σ L [52]. However, in mountainous areas, the slope direction also has a significant effect on the background noise rate (i.e., an adret slope generally reflects more solar energy than that of an ubac slop).
The sampled along-track of 2-km for the strong beam in Figure 5a was used to demonstrate this phenomenon in Figure 6. As shown in Figure 6a, the measured noisy geolocated photons of the strong beam (using blue points) were filtered by the classical DBSCAN method to extract the signal photons (using red points) [53]. The measured geolocated photons in every 20 m along-track distance were grouped into a segment, which was further used to estimate the along-track slope σ L and noise rate f nc . The along-track slope in each segment was calculated using the detected signal photons on the terrain profile while the noise rate f nc was calculated using the noise photons marked with yellow in Figure 6a. These noise photons were selected according to their elevation within the vertical window. Over land areas, the size of the ICESat-2 s vertical window varied from 500 m to 5000 m in different terrain types and elevation variations [11,24]. The used ATL03 data corresponded to the vertical windows from approximately 800 m to 1000 m in different study areas. The estimated along-track slope and the noise rate of each segment are illustrated in Figure 6b. It should be noted that in Figure 6b, the along-track terrain slope on the ubac side was defined as negative.
Sens. 2021, 13, x FOR PEER REVIEW As shown in Figure 6a, the spatial density of noise photons o 30.5 to 31 km) was much denser than that on the ubac slope (fro results in Figure 6b indicate that the background noise rate on the a imately two times larger than that on the ubac slope in steep area larger than 10°. When the along-track slope was smaller than 10° rates was relatively small. Based on the phenomenon in Figure 6, between the surface slope and the noise rate have to be separately e As shown in Figure 6a, the spatial density of noise photons on the adret slope (from 30.5 to 31 km) was much denser than that on the ubac slope (from 31 to 31.5 km). The results in Figure 6b indicate that the background noise rate on the adret slope was approximately two times larger than that on the ubac slope in steep areas, where the slope was larger than 10 • . When the along-track slope was smaller than 10 • , the difference in noise rates was relatively small. Based on the phenomenon in Figure 6, the empirical equations between the surface slope and the noise rate have to be separately established for the adret and ubac sides. In this study, a cubic polynomial was used to represent the empirical slope-noise relationship, and two groups of fitted parameters (i.e., A, B, C, and D in Equation (8)) were used for the adret and ubac sides, respectively.
As above-mentioned, the signal photons of the strong beam can be well detected and used to calculate the along-track slope σ L and the same environmental parameters were used for both beams in the same pair. Therefore, the slope-noise relationship can be fitted by the strong beam of a pair and then be used to inverse the along-track slope of the weak beam according to its noise rate. Using a full track of strong beam data (i.e., the ATL03_20181207065658_10640102_002_01, GT1R), Figure 7a,b illustrates the fitting result for the adret and ubac sides, respectively. In Figure 7, the R-square values of the fitting results on both the adret and ubac slo exceeded 0.97, which indicates that the used empirical equations can generally repre the relationship between the surface slope σL and the noise rate fnc. Although the un tainty of the slope-noise relationship and residuals exist in the fitting process, the fi empirical equation is sufficient to estimate the along-track slope in steep areas where signal photons are often missed and provide a referring rotation angle for the ellip neighborhood (i.e., θ in Figure 4) in this study.
The major axis and minor axis for the elliptical neighborhood were related to system parameters in this study. The major axis was set as the footprint diameter of I Sat-2 (i.e., 2a = 2z•θT) because the detectors may receive the photons reflected from a where within a laser footprint. The minor axis is related to the receiving pulse widt expressed in Equation (6) (i.e., 2b = c•4σp), and the time range of 4σp will contain mos the return energy (95%). The threshold Minpts was first set as three times of the calcula noise rate fnc, which can be estimated according to the noise photons distributed in the and bottom of the vertical window (as illustrated in Figure 6). Meanwhile, the ceiling floor of Minpts were determined by the expected number of both signal and noise pho within the time range of 4σp. The upper limit is determined by the noise rate and the S (i.e., Minptsmax = fnc·SNR) and the lower limit is expressed as  In Figure 7, the R-square values of the fitting results on both the adret and ubac slopes exceeded 0.97, which indicates that the used empirical equations can generally represent the relationship between the surface slope σ L and the noise rate f nc . Although the uncertainty of the slope-noise relationship and residuals exist in the fitting process, the fitted empirical equation is sufficient to estimate the along-track slope in steep areas where the signal photons are often missed and provide a referring rotation angle for the elliptical neighborhood (i.e., θ in Figure 4) in this study.
The major axis and minor axis for the elliptical neighborhood were related to the system parameters in this study. The major axis was set as the footprint diameter of ICESat-2 (i.e., 2a = 2z·θ T ) because the detectors may receive the photons reflected from anywhere within a laser footprint. The minor axis is related to the receiving pulse width as expressed in Equation (6) (i.e., 2b = c·4σ p) , and the time range of 4σ p will contain most of the return energy (95%). The threshold Minpts was first set as three times of the calculated noise rate f nc , which can be estimated according to the noise photons distributed in the top and bottom of the vertical window (as illustrated in Figure 6). Meanwhile, the ceiling and floor of Minpts were determined by the expected number of both signal and noise photons within the time range of 4σ p . The upper limit is determined by the noise rate and the SNR (i.e., Minpts max = f nc ·SNR) and the lower limit is expressed as where N s is the theoretical number of signal photons and is expressed in Equation (2); vs. is the flight speed of the satellite; f laser is the laser repetition frequency; and the RMS pulse width σ p can be calculated by substituting the system parameters and the estimated alongtrack slope into Equation (6). When the Minpts is higher than the upper limit Minpts max or lower than the lower limit Minpts min , its value will be replaced by the corresponding limit. The ceiling and floor of the threshold Minpts are related to the theoretical total number of the signal and noise photons within the elliptical neighborhood and its value will vary at different sites according to its signal energy, terrain slope, and noise level. Apparently, in rugged areas, the dynamic searching neighborhood and threshold have better flexibility than that with fixed parameters.

Modified DBSCAN Algorithm
Based on the analysis in the last section, the empirical slope-noise equation between the along-track slope and the solar-induced background noise provides a feasible way to directly estimate the rotation angle θ using Equation (8). For the DBSCAN method, the other three parameters (the major axis and minor axis of the elliptical neighborhood and the Minpts) can be calculated based on the system parameters and the measured data. In this study, a modified DBSCAN method was proposed to detect the signal photons from weak beam data measured in mountainous areas referring to the available information from the strong beam in the same pair. The flow chart of the extracting procedure is illustrated in Figure 8. Supplement Materials are available in this paper, i.e., a demo of the algorithm code based on MATLAB and its future update are available online at https://github.com/ Futurewhu/ATLAS_WEAKBEAM_PROCESS, accessed on 9 September 2018. It should be noted that no in situ measurements were need when using the propo method. Normally, to evaluate the performance of signal processing algorithms, the cessed results (i.e., detected signal points from weak beam data) should be compared w in situ measurements (e.g., airborne LiDAR data or DSM). However, the method of

(1) Signal extraction and statistical parameter estimation from strong beam data
The measured noisy geolocated photons from the strong beam in a pair were filtered by the classical DBSCAN method to extract signal photons. The noise photons were selected by the corresponding vertical window. In the along-track direction, both the noise photons and detected signal photons (or the terrain profile) were divided into segments with a fixed step length. In each along-track segment, the along-track slope σ L was determined by the linear fitting of the terrain profile, whereas the noise rate f nc was estimated according to the number of noise photons within the vertical window, the width of the vertical window, and the number of measurements included in this segment.

(2) Noise-slope relationship fitting
Along-track segments were divided into the adret slope or the ubac slope based on the value of the along-track slope being positive or negative. For each slope type, all along-track slopes with similar noise rates (±0.1 MHz) were collected as a dataset, and in each dataset, the mean and standard deviation of along-track slopes were calculated. The slope-noise empirical equations on the adret and ubac slope were separately fitted using the function of Equation (8).

(3) Calculating parameters of the searching area and threshold in each cluster
The noise rate of geolocated photons measured by the weak beam was estimated similarly as Step 1. Since the slope direction cannot be directly determined for the weak beam, the adret and ubac slopes were estimated through their corresponding empirical equations, respectively. The length of the major axis was set as the size of the laser footprint on the ground (i.e., 2a = 2z·θ T ) for ICESat-2 and the length of minor axis was calculated by 2b = 4c·σ p , where σ p is calculated by Equation (6). The threshold Minpts and its limitation range were determined by the theoretical number of the signal photons as well as the estimated noise rate f nc within the elliptical neighborhood, as described in Section 2.5.

(4) Searching signal photons using the DBSCAN from weak beam data
For the weak beam, the measured geolocated photons were filtered by the modified DBSCAN algorithm using the elliptical neighborhood. For each elliptical neighborhood, both the adret slope and the ubac slope are used to rotate the searching neighborhood and calculate their corresponding spatial density. If the calculation result of either slope exceeds the threshold Minpts, the point p (i.e., the center of current elliptical neighborhood) is classified as a rough signal photon.

(5) Running a 3σ confidence filter to remove outliers
The extracted rough signal photons were divided into segments with a fixed length and the mean value h aver and standard deviation σ h of the photons' elevation in each segment were calculated. If the elevation of a rough signal photon h satisfies |h − h aver | > 3σ h , it is identified as an outlier and removed from the result.
It should be noted that no in situ measurements were need when using the proposed method. Normally, to evaluate the performance of signal processing algorithms, the processed results (i.e., detected signal points from weak beam data) should be compared with in situ measurements (e.g., airborne LiDAR data or DSM). However, the method of this study focused on remote mountainous areas with large slopes, where few in situ measurements are available. As a result, similar to many previous studies [30,35,38], the visual inspection is used as an evaluation standard for photon-counting cloud filtering when the in situ ground-truth is unavailable.

Results
The proposed terrain profile extraction method was tested using ICESat-2 s data measured by the weak beam in four mountainous areas, where the classical DBSCAN algorithm cannot successfully detect the signal photons from weak beam data. Both the classical DBSCAN algorithm and the modified DBSCAN algorithm in this study were separately applied to extract terrain profile, and the results were compared with the truth-values, which were obtained by labeling the signal photons manually [30,35,38]. As above-mentioned, in the classification result provided by the ATL03 dataset, geolocated photons were marked with different confidence levels (i.e., 0 = noise; 1 = added to allow for buffer but algorithm classifies as background; 2 = low; 3 = medium; 4 = high) [42], and normally, photons with a confidence level larger than 1 can be considered as signal photons. The ATL08 product was based on the DRAGANN method, which discarded most of the noise photons [24]. The DRAGANN method also identified many more signal photons in land and vegetation areas than that identified in the ATL03 data product. It should be noted that the results extracted by the classical DBSCAN algorithm were also filtered by the procedure (5) described in the last section to further discard the outliers.
In the first study area, Figure 9a (i.e., a Sentinel-2 image) shows ICESat-2 along-track surface types and Figure 9b illustrates the detected signal photons from the ATL03 geolocated photons by different methods. The four columns in Figure 9b correspond to the extraction results from the ATL03 confidence level, the ATL08 product, the classical DBSCAN algorithm, and the modified DBSCAN algorithm, respectively. The strong beam (GT1R) was first processed to obtain the empirical slope-noise equation, and then the data of the weak beam in the same pair (GT1L) were processed by our method. Two sampled along-track segments of the result (at the along-track distances from 29 km to 32 km and from 46 km to 48 km) marked with the yellow and green boxes were further enlarged and illustrated to demonstrate more details in Figure 10a, (GT1R) was first processed to obtain the empirical slope-noise equation, and then the data of the weak beam in the same pair (GT1L) were processed by our method. Two sampled along-track segments of the result (at the along-track distances from 29 km to 32 km and from 46 km to 48 km) marked with the yellow and green boxes were further enlarged and illustrated to demonstrate more details in Figure 10a,b, respectively. As shown in Figure 10, the blue points correspond to measured geolocated photons from the ATL03 dataset and the points marked with red are the signal photons extracted by different methods. The along-track slope exceeded 30° in many sub segments (e.g., at the distance from 29.5 km to 30.7 km in Figure 10a as well as from 46.2 km to 46.6 km in Figure 10b. At steep slopes, the extraction results showed significant differences. The extraction result based on the confidence level from the ATL03 dataset mistakenly classified many noise photons as signal photons. Moreover, several breakpoints existed (i.e., many signal photons are missed), for example, near 31 km, which corresponded to the largest figures of the other three study areas are shown in the Appendix A. Specifically, the back ground noise of the fourth study area was the strongest because the ICESat-2 data wer measured in summer. In the second study area, the terrain slopes in some along-trac segments were close to 40°, which makes extracting signal photons more challengin Some snow was found in the third study area in late winter. In the comparisons of th result figures in the four study areas, the modified DBSCAN produced smoother and con tinuous terrain profiles. Figure 10. Detailed comparison of results from different signal photon extraction methods in the first study area. The rows from top to bottom correspond to signal photon extraction results from the ATL03 dataset, the ATL08 dataset, the classical DBSCAN algorithm, and the modified DBSCAN algorithm, respectively. The measured geolocated photons are drawn with blue points while the extracted signal photons are marked with red points. (a) Enlarged along-track segment corresponding to the yellow box in Figure 9a at the along-track distance from 29 km to 32 km. (b) Enlarged along-track segment corresponding to the green box in Figure 9a at the along-track distance from 46 km to 48 km.
As above-mentioned, no in situ measurements (e.g., airborne LiDAR data or DSM were available to verify the performance of the proposed method. As a result, the ground truth signal photons were obtained by the visual inspection, and then, the metrics in term of signal photons (i.e., the parameters of the Precision, Recall, and F-score in this study Figure 10. Detailed comparison of results from different signal photon extraction methods in the first study area. The rows from top to bottom correspond to signal photon extraction results from the ATL03 dataset, the ATL08 dataset, the classical DBSCAN algorithm, and the modified DBSCAN algorithm, respectively. The measured geolocated photons are drawn with blue points while the extracted signal photons are marked with red points. (a) Enlarged along-track segment corresponding to the yellow box in Figure 9a at the along-track distance from 29 km to 32 km. (b) Enlarged alongtrack segment corresponding to the green box in Figure 9a at the along-track distance from 46 km to 48 km.
As shown in Figure 10, the blue points correspond to measured geolocated photons from the ATL03 dataset and the points marked with red are the signal photons extracted by different methods. The along-track slope exceeded 30 • in many sub segments (e.g., at the distance from 29.5 km to 30.7 km in Figure 10a as well as from 46.2 km to 46.6 km in Figure 10b. At steep slopes, the extraction results showed significant differences. The extraction result based on the confidence level from the ATL03 dataset mistakenly classified many noise photons as signal photons. Moreover, several breakpoints existed (i.e., many signal photons are missed), for example, near 31 km, which corresponded to the largest slope as in Figure 10a. For the classical DBSCAN method, it had to increase the threshold value (i.e., Minpts) to refrain from a massive amount of noise photons that were wrongly classified as signal photons, but it simultaneously made the extraction result inconsecutive in steeper areas, which was similar to the results from the ATL08 product. In the other three study areas, comparisons similar to those in Figures 9 and 10 were conducted. The figures of the other three study areas are shown in the Appendix A. Specifically, the background noise of the fourth study area was the strongest because the ICESat-2 data were measured in summer. In the second study area, the terrain slopes in some along-track segments were close to 40 • , which makes extracting signal photons more challenging. Some snow was found in the third study area in late winter. In the comparisons of the result figures in the four study areas, the modified DBSCAN produced smoother and continuous terrain profiles.
As above-mentioned, no in situ measurements (e.g., airborne LiDAR data or DSM) were available to verify the performance of the proposed method. As a result, the groundtruth signal photons were obtained by the visual inspection, and then, the metrics in terms of signal photons (i.e., the parameters of the Precision, Recall, and F-score in this study) were used to compare the performances of our algorithm and three other algorithms. The performance and the comparison results are listed in Table 3 to quantitatively compare the extraction results of the four methods. The truth values were manually labeled based on the land type signal confidence of the ATL03 dataset. From the enlarged PEs, the along-track signal photons were very clear by visual inspection and these signal photons construct the along-track surface profile (because very little vegetation exists in four study areas). The TP in Table 3 represents the number of signal photons that are correctly classified as a signal, whereas the FN represents the number of signal photons that are wrongly classified as noise. The TN and FP are similarly defined (i.e., the noise photons that are correctly and wrongly classified). The parameter of the precision P pre is calculated by P pre = TP/(TP + FP), which denotes that the probability of an algorithm makes a type I error while the parameter of the recall P rec is calculated by P rec = TP/(TP + FN), which denotes that the probability of an algorithm makes a type II error. The extraction of signal photons requires that both the precision and recall parameters are very high to produce a reliable terrain profile. Therefore, the F-score, defined by F = 2·P pre P rec /(P pre + P rec ), was introduced to quantitatively evaluate the performance of three methods [54]. The quantitative evaluation results are listed in Table 3.
Generally, the results in Table 3 indicate that the signal photons extracted from the ATL03 confidence level tended to have a high recall (i.e., more than half of the total signal photons were detected), but low precision (a lot of noise photons were identified as signal). The ATL08 algorithm and classical DBSCAB algorithm were just the opposite. Specifically, the ATL08 algorithm had the best averaged precision (approximately 94%) in the four study areas, but more than half of the total signal photons were missed (averaged recall in four study areas). Our modified DBSCAN extracted more signal photons than the other three methods, especially in the terrain with slopes larger than 25 • , but it would inevitably increase the number of photon events that were wrongly classified as photons. At the cost of much higher averaged classification recall (over 50% than that of the ATL08 results), our method had an approximately 1% lower classification precision than that of the ATL08 results. The statistics in Table 3 agreed with the results in Figure 10, Figure A1, Figure A3, and Figure A5 (i.e., the modified DBSCAN method obtained continuous along-track profiles in the fourth rows), whereas the other three methods failed. The classical DBSCAN algorithm had a low recall because at steep slopes, the pulse width will be expanded to several times that on the flat ground, so a fixed threshold is difficult in balancing the precision and recall parameters. Benefiting from the rotating elliptical neighborhood referring to the slope direction (which makes more signal photons are included in a cluster) and the dynamic searching neighborhood and threshold, the modified DBSCAN still performed well. The modified DBSCAN achieved a good balance between the classification precision (93.49% averaged) and recall (89.34% averaged) and its F-score (more than 0.91) was much higher than that of the other three methods. In addition, in the fourth study area, due to a higher solar elevation angle in summer, the higher background noise level made all of the used methods have comparatively worse performance. In summary, the experiments in different locations and seasons prove that the modified DBSCAN algorithm can effectively extract signal photons from ICESat-2 s weak beam data and generate a continuous terrain profile in mountainous areas.

Precondition of Using the Algorithm
The proposed algorithm in this paper is based on an assumption that the environmental parameters of the adjacent laser beams (e.g., the atmospheric transmittance, the ground surface reflectivity, and the solar radiance) are nearly identical. This assumption was verified by the background noise rate ('bckgrd_atlas/bckgrd_rate') provided in the ATL03 dataset. If the background noise rates of the strong beam and weak beam data in the same pair are nearly identical, the environmental condition can be assumed as the same because the background noise rate is associated with the atmosphere condition the surface reflectivity as well as the terrain slope. In the ATL03 dataset, the background count rate was estimated by the data of 50-shot sum. Specifically, the background count was calculated by subtracting the number of likely signal photons from the number of total photons in the current 50-shot segment. For a given 50-shot segment, the number of signal photons is determined by summing the number of photons with signal_conf_ph = 2, 3, or 4, which used the maximum signal_conf_ph value for any surface type within current 50-shot segment [42].
For the four study areas, the ATL03 background noise rates of the strong beam and weak beam in the same pair are illustrated in Figure 11a-d, respectively. It should be noted that the background noise rates of the strong beam and weak beam were adjusted to the same start latitude in Figure 11. The coordinate interval in Figure 11a-d was evenly divided into small grids with an interval of 0.05 MHz. The background noise rate of each 50-shot segment was placed into each grid, and the color in each grid indicates the actual number of the background noise rates that fell within the current grid. The statistical results in Figure 11 demonstrate that the RMSE of background noise rates from the adjacent strong and weak beams was lower than 0.9 MHz, and the correlation coefficient of the background noise rates from these two beams was higher than 0.9. As result, the environmental conditions of the laser beams in the same pair (i.e., the atmospheric transmittance, the surface reflectivity, and the terrain slope) can be assumed to be nearly identical. Actually, the embedded data processing in the ATL03 product also use similar assumptions (i.e., environmental conditions calculated from the neighboring strong beam data were used in the data processing for weak beam data) [43].
ote Sens. 2021, 13, x FOR PEER REVIEW 19 o Figure 11. (a-d) Heat maps of background noise rates from the strong beam and weak beam in the same pair in four study areas. The abscissa and ordinate correspond to the noise rates from the weak beam and the strong beam data. The number of data at certain coordinates is marked with different colors and the color bar gives the data number.

Why the Classical DBSCAN Failed
Taking the captured photons in the first study area as an example, Figure 12 il trates the normalized spatial density of the used data in Figure 10a and indicates why Figure 11. (a-d) Heat maps of background noise rates from the strong beam and weak beam in the same pair in four study areas. The abscissa and ordinate correspond to the noise rates from the weak beam and the strong beam data. The number of data at certain coordinates is marked with different colors and the color bar gives the data number.

Why the Classical DBSCAN Failed
Taking the captured photons in the first study area as an example, Figure 12 illustrates the normalized spatial density of the used data in Figure 10a and indicates why the classical DBSCAN method and its modified self-adapting version failed to extract signal photons from the weak beam data in mountainous areas. First, the noise rates sharply fluctuated in different along-track segments (e.g., at the ridge near the segment of 31 km). The spatial density of noise photons at the adret slope (in the segment of less than 31 km) was very close to that of signal photons at the ubac slope (in the segment of larger than 31 km). As a result, the fixed threshold of spatial density cannot detect the signal photons at the ubac slope and discards the noise photons at the adret slope at the same time. Second, the spatial density at vertical direction is higher than that along the terrain profile (e.g., the locations marked with red circles in Figure 12). A method that rotates the searching neighborhood to the direction of the maximum spatial density would also fail to distinguish signal photons from the noise photons. As a result, it is necessary to import the available information to determine the direction of the searching neighborhood (e.g., the noise-slope relationship in this paper).
. 2021, 13, x FOR PEER REVIEW Figure 12. Normalized spatial density of the using data in Fi PE is calculated by the number of PEs within a circular neigh mum spatial density in the whole along-track segment was s with red circles demonstrate that the direction of the maximu with the direction of the terrain profile due to the very low S In addition, in many previous studies, the param cially the threshold, were usually determined by the u the used algorithms may rely on the user's experience DBSCAN algorithm were self-adaptive and the calcula tors of the terrain, the noise condition, and the pulse w were related to the system hardware or could be direct Figure 12. Normalized spatial density of the using data in Figure 10a. The spatial density of each PE is calculated by the number of PEs within a circular neighborhood of a 10-m radius. The maximum spatial density in the whole along-track segment was set as 1. Several sub-segments marked with red circles demonstrate that the direction of the maximum spatial density was not consistent with the direction of the terrain profile due to the very low SNR.
In addition, in many previous studies, the parameters used in the algorithm, especially the threshold, were usually determined by the user. Therefore, the performance of the used algorithms may rely on the user's experience. The parameters of our modified DBSCAN algorithm were self-adaptive and the calculation procedure considered the factors of the terrain, the noise condition, and the pulse width of the signal. The parameters were related to the system hardware or could be directly obtained from the ATL03 standard dataset.

Potential Implications of the Algorithm
The experimental results indicate that the proposed algorithm can effectively extract signal photons of the weak beam data measured in mountainous areas. The extraction results can be applied to the following potential implications. (1) Elevation accuracy assessment of ICESat-2 weak beam data in mountainous areas. The elevation accuracy of ICESat-2 signal photons has been validated over forest [9,55], terrain [56], ice sheet [57], and sea ice areas [58] in many previous studies. The signal extraction results of our method can provide suitable input for the elevation accuracy assessment in mountainous areas. (2) Slope estimation in mountainous areas. The ingenious design of ICESat-2 makes it possible to calculate both the along-track slope and cross-track slope. A continuous terrain profile from weak beam data is essential for the slope estimation and our algorithm can provide qualified terrain profiles from weak beam data in mountainous areas. (3) Systematic parameter design for future space-borne LiDARs. In this study, the SNRs of the strong and weak beam were quantified and analyzed, and the reason why the Classical DBSCAN missed signal photons in mountainous areas is explained. The transmitted power and lifetime of the laser source are normally contradictory and need balancing. The weak beam design with a lower transmitted energy provides an excellent research material for studying how to optimize the transmitted laser energy. The precise signal photon extraction results can be applied to analyze the key parameters of the weak beam channel in mountainous areas such as the signal photon rates, the receiving pulse width, and SNR. These valuable parameters can be further used to assist the optimization of the future instrument. (4) The ground control points (GCPs) are essential for stereo-mapping satellites to generate high-accuracy maps [59], especially in mountainous areas with large slopes or sharp changes in topography. The ICESat-2 terrain points have a very good accuracy of a few tens of centimeters in vertical and several meters in horizontal [9]. Since this method performed well in mountainous areas (where are normally lacking in situ measurements), the detected signal points from the strong and weak beams can be further used to obtain GCPs for the stereo-mapping satellite in the future.

Conclusions
In this study, a theoretical equation was derived to analyze the SNR of the strong and weak beam data of ICESat-2 in mountainous areas. The result indicated that, in the daytime of mountainous areas, the SNR of strong beam data was normally 4~6 dB higher than that of weak beam data. The SNR dropped quickly as the along-track slope rose, and when the along-track slope was higher than 25 • , the SNR of the weak beam data fell to around 4 dB, which is often taken as the threshold to judge whether signal photons can be extracted from measured noisy photons. Meanwhile, even when the along-track slope reached 45 • , the SNR of the strong beam data remained over 4 dB, which is why most of the current signal extraction algorithms perform well for strong beam data, but fail to extract signal photons from weak beam data in mountainous areas. This phenomenon also inspired us to utilize the available information provided by strong beam data to assist in the extraction of signal photons from weak beam data (i.e., using the fitted slope-noise relationship to estimate the along-track slope for weak beam data).
Then, the circular searching neighborhood in classical DBSCAN algorithm was modified as an elliptical neighborhood and the rotation angle between the major axis of the elliptical neighborhood and the horizontal line was determined by the estimated alongtrack slope. The other three key parameters (i.e., the major axis and minor axis of the elliptical neighborhood and the classification threshold) were set according to the system parameters of ICESat-2. The modified DBSCAN method was tested in the daytime of the Tibetan Plateau (in early winter), the Altun Mountains (in autumn), and the Tianshan Mountains (in late winter and summer). The extraction results were compared with that of the classical DBSCAN method, the ATL03 confidence level, and the ATL08 DRAGANN filtering technique. The ATL08 algorithm had the best averaged precision (approximately 94%) in the four study areas (i.e., most of the detected signal photons were corrected), but more than half of the total signal photons were missed (averaged recall in four study areas). Our method can be used to supplement and detect the signal photons of weak beams, especially in terrain with slopes larger than 25 • and in the midday. Generally, the modified DBSCAN could achieve a good balance between the classification precision (93.49% averaged) and recall (89.34% averaged), and its F-score (more than 0.91) was much higher than that of the other three methods. As a result, our method can obtain a more continuous surface profile from a very low signal-to-noise ratio (SNR) data in mountainous areas and in the daytime.
In summary, in the daytime of mountainous areas, which normally corresponds to a very low SNR of weak beam data, this new method successfully obtained continuous terrain profiles from the weak beam data of ICESat-2. Furthermore, the determination of the algorithm parameters is now calculated by theoretical equations rather than relying on the users' experience, which improves the usability of the algorithm. However, our algorithm relies on the available information provided by the nearby strong beam, so the surface reflectivity and land-cover type of the area should be stable, which limits the application of this algorithm. Future study is suggested to validate the performance of this method with ground measurements, terrestrial, and airborne LiDAR such as the elevation accuracy of detected signal photons.
tons by different methods. The background noise of the second study area was stro than that in the first study area because the data were measured in September. Moreo the terrain slopes in some along-track segments were close to 40°, which made extra signal photons more challenging in this area. The extraction results from the mod DBSCAN in the right column were the best among the four methods. Figure A1. (a) ICESat-2 ATLAS ground track (from north to south or a descending track) and satellite imagery in the Altun Mountains. The Altun Mountains are located between the Taklimakan Desert and the Gurbantunggut Desert, which makes this area very dry. The surface is covered by bare-land and very sparse grasslands. (b) The signal photons extracted by different methods. The diameter and threshold of the classical DBSCAN were set as 4 m and 4, respectively. Two sampled segments within the yellow and green boxes were enlarged and illustrated in Figure A2a,b, respectively. Figure A1. (a) ICESat-2 ATLAS ground track (from north to south or a descending track) and satellite imagery in the Altun Mountains. The Altun Mountains are located between the Taklimakan Desert and the Gurbantunggut Desert, which makes this area very dry. The surface is covered by bare-land and very sparse grasslands. (b) The signal photons extracted by different methods. The diameter and threshold of the classical DBSCAN were set as 4 m and 4, respectively. Two sampled segments within the yellow and green boxes were enlarged and illustrated in Figure A2a,b, respectively. Two 2-km along-track segments within the yellow and green boxes were enlarged in Figure A2a,b to illustrate the differences among the four methods in detail. The measured geolocated photons and extracted signal photons of two typical segments from 0 km to 2 km and from 6 km to 8 km are illustrated in Figure A1a,b, respectively. The spatial density of measured geolocated photons in the second study area was higher than that of the first area, which indicates that the background noise was higher. As a result, the signal photons extracted by the confidence level were broken into several discrete segments, which made it impossible to use as a terrain profile. The ATL08 algorithm and classical DBSCAN performed better, but still failed to extract signal photons in many along-track sub-segments and caused many breakpoints with a length over 100 m. The extraction result of the modified DBSCAN was most satisfying with the least break points and outliers.
area, which indicates that the background noise was higher. As a result, the signal photons extracted by the confidence level were broken into several discrete segments, which made it impossible to use as a terrain profile. The ATL08 algorithm and classical DBSCAN performed better, but still failed to extract signal photons in many along-track sub-segments and caused many breakpoints with a length over 100 m. The extraction result of the modified DBSCAN was most satisfying with the least break points and outliers. Figure A2. Detailed comparison of the results from different signal photon extraction methods in the second study area. (a) Enlarged along-track segment corresponding to the yellow box in Figure  A1a at the along-track distance from 0 km to 2 km. (b) Enlarged along-track segment corresponding to the green box in Figure A1a at the along-track distance from 6 km to 8 km.  Figure A1a at the along-track distance from 0 km to 2 km. (b) Enlarged along-track segment corresponding to the green box in Figure A1a at the along-track distance from 6 km to 8 km.
The third and fourth study areas were located in the Tian Shan Mountains, which are the world's most distant mountains from the sea. The satellite image of the third study area and the laser ground tracks are illustrated in Figure A3a and the extraction results of the signal photons are illustrated in Figure A3b. Two 2-km along-track segments within the yellow and green boxes were enlarged in Figure A4a,b to illustrate the differences among the four methods in detail. In Figure A5 of the fourth study area, as the data were measured in summer (with a higher solar elevation angle), the background noise was higher than that in the third test area. As a result, the signal extraction results of the four methods were comparatively worse than those in Figure A3, which corresponds to late winter. Two 1-km along-track segments within the yellow and green boxes were enlarged in Figure A6a,b to illustrate the differences among the four methods in detail. In the third and fourth test areas, the proposed method in this study generally achieved the best performances. among the four methods in detail. In Figure A5 of the fourth study area, as the data wer measured in summer (with a higher solar elevation angle), the background noise wa higher than that in the third test area. As a result, the signal extraction results of the fou methods were comparatively worse than those in Figure A3, which corresponds to lat winter. Two 1-km along-track segments within the yellow and green boxes were enlarged in Figure A6a,b to illustrate the differences among the four methods in detail. In the third and fourth test areas, the proposed method in this study generally achieved the best per formances.
(a) (b) igure A3. (a) ICESat-2 ATLAS ground track (a descending track) and satellite imagery in the Tian Shan Mountains. The ian Shan Mountains are located between the Tarim Basin and the Junggar Basin, which are the farthest mountains to the ea around the world. (b) The signal photons extracted by different methods. The diameter and threshold of the classical BSCAN were set as 5.5 m and 5, respectively. Two sampled segments within the yellow and green boxes were enlarged nd illustrated in Figure A4a,b to show the details, respectively.  Figure A3a at the along-track distance from 7 km to 9 km. The signal photons extracted by different methods. The diameter and threshold of the classical DBSCAN were set as 6.5 m and 7, respectively. Two sampled segments within the yellow and green boxes were enlarged and illustrated in Figure  A6a,b to show the details, respectively.  Figure A5a at the along-track distance from 1 km to 2 km. (b) Enlarged along-track segment corresponding to the green box in Figure A5a at the along-track distance from 4 km to 5 km.