2.2.1. DBSCAN
DBSCAN, a classic algorithm for photon denoising, demonstrates outstanding performance in noise removal. Its core advantages are as follows: it distinguishes noise directly through density definitions, categorizing data into core points, border points, and noise points without requiring a pre-specified noise ratio and automatically eliminating low-density outliers; it is robust to noise because it does not rely on a spherical-cluster assumption and can recognize clusters of arbitrary shape, preventing noise from interfering with the clustering structure; it requires only two parameters, the neighborhood radius ε and the minimum number of points MinPts, which can be tuned using k-distance plots; and its algorithmic principle is simple and widely applicable across various scenarios.
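The k-distance heuristic mentioned above can be sketched as follows. This is a minimal, illustrative example only: the input is assumed to be a two-column array of along-track distance and elevation, and the choice of k is a placeholder rather than a value used in this study.

```python
# Sketch of the k-distance heuristic for choosing DBSCAN's eps (illustrative only).
# X is assumed to be an (n_photons, 2) array of [along_track, elevation] in metres.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def k_distance_curve(X, k=3):
    """Return the sorted distance of each photon to its k-th nearest neighbor."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1 because each point is its own nearest neighbor
    dists, _ = nn.kneighbors(X)
    return np.sort(dists[:, -1])

# The "elbow" of this sorted curve (where the k-distances rise sharply) is a common
# choice for the neighborhood radius eps; MinPts is then typically set to k or k + 1.
```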
The clustering principle of the DBSCAN algorithm is illustrated in Figure 3. Given the neighborhood radius ε and the minimum point threshold MinPts (set to 3 in this example), the algorithm begins by randomly selecting a starting point (the green point in Figure 3) and defines its ε-radius neighborhood. If at least three points fall within this neighborhood, the point is classified as a core point. When the center of the search region is the purple point (P) or the black point (B), even though their circular search regions contain fewer than three photon points, they still belong to the cluster of the core point because they lie within the neighborhood of that core point; such points are labeled border points. In contrast, a red photon point whose search region contains fewer than three photons and that is not reachable from any core point is classified as noise. By analyzing density, the DBSCAN algorithm identifies core points, border points, and noise points; both core and border photons are treated as valid water surface photons, while noise photons are removed.
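As a concrete illustration of this procedure, the following minimal sketch applies scikit-learn's standard DBSCAN implementation to photon coordinates. The array names and parameter values (eps, min_pts) are placeholders, not the settings used in this study, and the two axes are assumed to be in comparable units (metres).

```python
# Minimal DBSCAN denoising sketch for ICESat-2 photon data (illustrative only).
# `along_track` (m) and `elevation` (m) are assumed 1-D NumPy arrays of photon coordinates.
import numpy as np
from sklearn.cluster import DBSCAN

def dbscan_denoise(along_track, elevation, eps=5.0, min_pts=3):
    """Label each photon as signal (member of some cluster) or noise (label -1)."""
    X = np.column_stack([along_track, elevation])
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
    signal_mask = labels != -1  # core + border photons are kept as valid surface photons
    return signal_mask, labels

# Example usage with synthetic photons:
# rng = np.random.default_rng(0)
# along_track = rng.uniform(0, 1000, 5000)
# elevation = np.where(rng.random(5000) < 0.8,
#                      rng.normal(10.0, 0.2, 5000),   # dense water-surface photons
#                      rng.uniform(-50, 50, 5000))    # sparse background noise
# mask, _ = dbscan_denoise(along_track, elevation)
```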
2.2.2. OPTICS
The core principle of denoising in OPTICS lies in computing two key metrics for each point: the core distance and the reachability distance. The core distance is the minimum radius required for a point's neighborhood to contain at least MinPts points, while the reachability distance measures how densely a point is associated with a neighboring core point. These metrics establish a hierarchical density structure of the dataset and generate a reachability plot that visualizes the distribution of regions with different densities. By analyzing the plot's prominent peaks or abrupt jumps, high-density clusters and noise points are distinguished: points with low reachability distances belong to the same high-density cluster, whereas isolated points with sharply increased reachability distances typically represent noise. Compared with DBSCAN, OPTICS adapts to varying density distributions by dynamically adjusting the neighborhood range, enabling more accurate noise identification. However, the final noise determination generally requires either manual inspection of the reachability plot or threshold-based filtering.
Compared with DBSCAN, OPTICS is far less sensitive to the distance threshold because it does not output clustering results directly. Instead, it generates an ordered sequence of samples under a given MinPts, along with the core distance and reachability distance of each sample. During post-processing of this sequence, clusters are extracted by comparing the reachability distance with a threshold ε′, and clusters that are overly sparse under ε′ and MinPts are labeled as noise. This makes the algorithm relatively insensitive to the distance threshold. In essence, OPTICS acts as a dynamically adjusted version of DBSCAN, enabling multi-density clustering. Since the output includes reachability-distance information, it also facilitates the selection of an appropriate ε′.
In summary, with a fixed MinPts, OPTICS can derive new clustering results for any given ε′ through straightforward computation. In the reachability plot, if a horizontal line is drawn at a reachability distance ε′ (i.e., y = ε′, where y is the vertical axis of the reachability plot), the number of valleys this line intersects directly corresponds to the number of clusters obtained, with each valley representing a distinct cluster (a high-density region). Based on this mechanism, an appropriate reachability distance can be selected as the initial parameter for other distance-based clustering algorithms; OPTICS can thus also be viewed as a method for identifying the optimal threshold distance ε′.
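The mechanism described above can be illustrated with the following sketch, which builds the OPTICS ordering once and then extracts a DBSCAN-equivalent labeling for any cut distance ε′. The parameter values and array names are assumptions, not the configuration used in this study.

```python
# Minimal OPTICS sketch (illustrative only): compute the reachability ordering once,
# then cut the reachability plot at y = eps_prime to obtain a clustering.
# `along_track` and `elevation` are assumed 1-D NumPy arrays of photon coordinates.
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_dbscan

def optics_denoise(along_track, elevation, min_pts=5, eps_prime=3.0):
    X = np.column_stack([along_track, elevation])
    optics = OPTICS(min_samples=min_pts).fit(X)

    # Reachability values in processing order (valleys ~ dense clusters, peaks ~ noise).
    reachability = optics.reachability_[optics.ordering_]

    # DBSCAN-equivalent labels for the chosen cut distance eps_prime.
    labels = cluster_optics_dbscan(
        reachability=optics.reachability_,
        core_distances=optics.core_distances_,
        ordering=optics.ordering_,
        eps=eps_prime,
    )
    signal_mask = labels != -1  # photons inside some valley of the reachability plot
    return signal_mask, reachability
```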
2.2.3. An Iterative Denoising Method Based on an Elevation Difference Exponential Decay Function
This paper proposes an iterative denoising method based on an exponential decay function of elevation differences to address noise in ICESat-2 photon-counting data. The method first employs Gaussian fitting to estimate an approximate water level and performs an initial coarse denoising of photons based on this estimate. Subsequently, the histogram of photon elevation differences is statistically analyzed, and an exponential function is fitted to the distribution. The denoising threshold is determined from the asymptote of this exponential function to complete the first denoising pass. The process is further refined through multiple iterative denoising steps to enhance the denoising performance, with the detailed workflow illustrated in Figure 4.
- (1) Gaussian Fitting-Based Approximate Water Level Estimation and Coarse Photon Denoising
When ICESat-2 photons are reflected from the water surface, random errors arise in individual range measurements owing to water wave disturbances, photon detection noise, and variations in scattering paths. According to the central limit theorem, the accumulation of numerous independent random errors leads to an approximately normal distribution of photon elevation values in the water surface region. While individual measurements fluctuate, the average elevation of still water remains stable over short time scales. The inherent randomness of photon detection noise, including detector dark current and atmospheric scattering, further reinforces this Gaussian characteristic. To some extent, the mean elevation of the laser photons provides an approximate water surface elevation, while the standard deviation (σ) reflects the noise intensity. Gaussian fitting allows a robust estimate of the water surface elevation to be extracted from the noise, mitigating the influence of individual outliers. The fitting process employs Equation (1), which represents a standard Gaussian distribution:

f(h) = A·exp(−(h − μ)² / (2σ²))  (1)
Here, A denotes the amplitude, μ represents the mean, and σ stands for the standard deviation. The unknown coefficients of the Gaussian model are estimated by minimizing the residuals between the fitted curve and the histogram using nonlinear least squares (the Levenberg–Marquardt algorithm). Based on the fitted Gaussian curve, a preliminary denoising step is performed within a neighborhood centered at μ whose radius is determined by the fitted standard deviation σ.
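A minimal sketch of this coarse denoising step is given below, assuming a placeholder histogram bin width and a placeholder neighborhood multiple k (i.e., retaining photons within kσ of μ); Equation (1) is fitted with SciPy's Levenberg–Marquardt-based least squares.

```python
# Sketch of the Gaussian-fitting coarse denoising step (illustrative only).
# The bin width and the neighborhood multiple k are assumptions, not the paper's values.
import numpy as np
from scipy.optimize import curve_fit

def gaussian(h, A, mu, sigma):
    # Equation (1): standard Gaussian model of the elevation histogram.
    return A * np.exp(-((h - mu) ** 2) / (2.0 * sigma ** 2))

def coarse_denoise(elevation, bin_width=0.1, k=3.0):
    """Fit Equation (1) to the elevation histogram and keep photons near the peak."""
    bins = np.arange(elevation.min(), elevation.max() + bin_width, bin_width)
    counts, edges = np.histogram(elevation, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])

    # Initial guesses: peak count, peak location, spread of the data.
    p0 = [counts.max(), centers[np.argmax(counts)], elevation.std()]
    (A, mu, sigma), _ = curve_fit(gaussian, centers, counts, p0=p0, method="lm")

    # Coarse denoising: keep photons within k*sigma of the fitted water level mu.
    mask = np.abs(elevation - mu) <= k * abs(sigma)
    return mask, mu, sigma
```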
- (2) Construction of the Exponential Attenuation Function and Iterative Refinement Denoising
The iterative denoising method based on the elevation-difference probability attenuation model analyzes the distribution characteristics of elevation differences to construct an attenuation function model and enhances denoising performance through multiple iterations. Accounting for the limited penetration depth of photons and refraction effects, the algorithm exploits the dense Gaussian distribution formed by water surface photons: by fitting the elevation histogram, it separates noise from genuine water surface photon signals. Building on the assumption of a continuous and smooth water surface, the algorithm establishes an exponential attenuation function model. By calculating the elevation differences between adjacent photons and analyzing their frequency distribution, it is found that noise photons typically exhibit significantly larger elevation variations than signal photons while being considerably fewer in number, producing a "long-tail" pattern in the difference histogram. The exponential decay function is fitted to the probability distribution of elevation differences using the Levenberg–Marquardt (LM) algorithm, and the denoising threshold is determined from the asymptote of the fitted function. After the initial denoising pass, the algorithm progressively eliminates abnormally large-difference noise by iteratively optimizing the threshold. The detailed procedure is as follows:
(1) First, calculate the elevation difference between two adjacent photons. The formula is as follows:

Δh_i = |h_(i+1) − h_i|,  i = 1, 2, …, N − 1

Here, Δh_i represents the elevation difference between two adjacent photons, h_i denotes the elevation of the i-th photon, and N is the total number of photons.
(2) Frequency calculation: The elevation differences are divided into segments (bins), and the frequency within each segment is counted. The frequency calculation formula is as follows:

f_j = n_j / Σ_j n_j,  p_j = f_j / w

Here, p_j represents the probability density, n_j denotes the count of elevation differences within the j-th segment, w is the width of the elevation difference segment, and f_j is the frequency.
(3) Attenuation model fitting: Assuming that the elevation difference distribution follows an exponential attenuation model, the formula is as follows:

f(Δh) = A·e^(−b·Δh) + C

Here, A represents the amplitude, b is the decay rate, C is the constant term, Δh denotes the elevation difference, and f(Δh) represents the frequency.
The exponential decay function is fitted using the Levenberg–Marquardt (LM) algorithm, and the elevation difference corresponding to the largest frequency that still satisfies the asymptote-based condition is selected as the denoising threshold Δh_t. After denoising, the elevation differences of the remaining photons are recalculated, and the denoised data are iteratively reprocessed according to steps (1), (2), and (3). According to the test results, when the threshold determined in this way is used for denoising, three iterations are generally employed; if the photon distribution is good, only one or two iterations are needed. Therefore, to balance denoising effectiveness and efficiency, the stopping criterion is set to three iterations.
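The full iterative procedure can be sketched as follows. The decay model matches step (3), but because the exact asymptote-based threshold rule is not reproduced here, the sketch uses an illustrative criterion (the elevation difference at which the exponential term has decayed to a small fraction of its amplitude, so the curve is close to the constant term C); the bin width, tolerance, and iteration cap are placeholder values.

```python
# Sketch of the iterative elevation-difference denoising (illustrative only).
# The threshold rule and all parameter values below are assumptions, not the paper's settings.
import numpy as np
from scipy.optimize import curve_fit

def decay_model(dh, A, b, C):
    # Step (3): exponential attenuation model f(dh) = A * exp(-b * dh) + C
    return A * np.exp(-b * dh) + C

def iterative_decay_denoise(elevation, bin_width=0.05, tol=0.05, max_iter=3):
    """`elevation` is assumed ordered along-track (acquisition order)."""
    elev = np.asarray(elevation, dtype=float)
    keep = np.ones(elev.size, dtype=bool)

    for _ in range(max_iter):
        h = elev[keep]
        dh = np.abs(np.diff(h))                        # step (1): adjacent elevation differences
        bins = np.arange(0.0, dh.max() + 2 * bin_width, bin_width)
        counts, edges = np.histogram(dh, bins=bins)    # step (2): counts per bin
        centers = 0.5 * (edges[:-1] + edges[1:])
        freq = counts / counts.sum()                   # step (2): frequency per bin

        # Step (3): LM fit of the decay model to the frequency distribution.
        p0 = [freq.max(), 1.0 / max(dh.mean(), 1e-6), freq.min()]
        (A, b, C), _ = curve_fit(decay_model, centers, freq, p0=p0, method="lm", maxfev=10000)

        # Illustrative threshold: where the exponential term has decayed to tol * A,
        # i.e., the fitted curve is close to its asymptote C.
        dh_t = -np.log(tol) / b if b > 0 else dh.max()

        # Remove the latter photon of each adjacent pair whose difference exceeds the threshold.
        bad = np.where(dh > dh_t)[0] + 1
        if bad.size == 0:
            break
        kept_idx = np.flatnonzero(keep)
        keep[kept_idx[bad]] = False

    return keep  # boolean mask over the input photons
```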
By fitting the exponential decay function using the Levenberg–Marquardt (LM) algorithm, the elevation difference corresponding to the maximum frequency that satisfies the condition is selected as the denoising threshold. After denoising, the elevation differences are recalculated for the remaining photons, and then the denoised data are iteratively reprocessed according to steps (1), (2), and (3). According to the test results, when is used as the denoising threshold, three iterations are generally employed. If the photon distribution is good, only one to two iterations are needed. Therefore, to balance the denoising effectiveness and efficiency, the stopping criterion is set to three iterations.