Improved Filtering of ICESat-2 LiDAR Data for Nearshore Bathymetry Estimation Using Sentinel-2 Imagery

The accurate estimation of nearshore bathymetry is necessary for multiple aspects of coastal research and practices. The traditional shipborne single-beam/multi-beam echo sounders and Airborne LiDAR bathymetry (ALB) have a high cost, are inefficient, and have sparse coverage. The Satellite-derived bathymetry (SDB) method has been proven to be a promising tool in obtaining bathymetric data in shallow water. However, current empirical SDB methods for multispectral imagery data usually rely on in situ depths as control points, severely limiting their spatial application. This study proposed a satellite-derived bathymetry method without requiring a priori in situ data by merging active and passive remote sensing (SDB-AP). It realizes rapid bathymetric mapping with only satellite remotely sensed data, which greatly extends the spatial coverage and temporal scale. First, seafloor photons were detected from the ICESat-2 raw photons based on an improved adaptive Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, which could calculate the optimal detection parameters for seafloor photons by adaptive iteration. Then, the bathymetry of the detected seafloor photons was corrected because of the refraction that occurs at the air–water interface. Afterward, the outlier photons were removed by an outlier-removal algorithm to improve the retrieval accuracy. Subsequently, the high spatial resolution (0.7 m) ICESat-2 derived bathymetry data were gridded to match the Sentinel-2 data with a lower spatial resolution (10 m). All of the ICESate-2 gridded data were randomly separated into two parts: 80% were employed to train the empirical bathymetric model, and the remaining 20% were used to quantify the inversion accuracy. Finally, after merging the ICESat-2 data and Sentinel-2 multispectral images, the bathymetric maps over St. Thomas of the United States Virgin Islands, Acklins Island in the Bahamas, and Huaguang Reef in the South China Sea were produced. The ICESat-2-derived results were compared against in situ data over the St. Thomas area. The results showed that the estimated bathymetry reached excellent inversion accuracy and the corresponding RMSE was 0.68 m. In addition, the RMSEs between the SDB-AP estimated depths and the ICESat-2 bathymetry results of St. Thomas, Acklins Island, and Huaguang Reef were 0.96 m, 0.91 m, and 0.94 m, respectively. Overall, the above results indicate that the SDB-AP method is effective and feasible for different shallow water regions. It has great potential for large-scale and long-term nearshore bathymetry in the future.


Introduction
Acquiring detailed bathymetric and topographic information in coastal areas is one of the challenges for both hydrology-related studies and water resource management. High-resolution underwater topography data are a basic reference for a wide range of coastal applications, such as hydrological investigation, nearshore protection, and marine mineral resources exploitation. Conventionally, shipborne single-beam/multi-beam echo sounders and Airborne LiDAR bathymetry (ALB) are mainstream technologies used to collect shallow coastal water data and deliver near-constantly underwater topographies [1][2][3][4][5][6]. However, the disadvantages of these two methods are obvious. They have a high cost, are inefficient, and have sparse coverage [7][8][9][10][11].
Satellite-derived bathymetry (SDB) is a crucial alternative measurement used to map coastal water bodies of the world. Traditional SDB methods in shallow water can be divided into two methodological categories: empirical and physics-based [12]. SDB empirical methods tend to model the radiative transformation in water to process optical images, which are mainly discussed in this paper. SDB empirical methods were first associated with multispectral imagery technology in 1978 [13]. Then, spaceborne multispectral sensors (e.g., Sentinel-2/Multispectral Instrument (MSI), LandSat-1/Multispectral Scanner (MSS), and LandSat-8/Operational Land Imager (OLI)), which are typical passive remote sensing techniques, have been widely used for bathymetry [14][15][16][17][18][19][20]. They can realize bathymetry in optimal conditions (<30 m depth) [21]. However, empirical SDB methods using multispectral imagery technology generally rely on in situ depths as control points, or SDB is independent of control data through complicated physics-based approaches to derive bathymetric data [17,22]. Therefore, it is not feasible to obtain ground, shipborne, or airborne-based measurement data in remote locations of the world. With the development of photon-counting sensors, spaceborne photon-counting LiDARs show many advantages of bathymetry mapping. In conjunction with passive multispectral measurements, spacebased LiDAR can provide complementary, vertical profiling for superior depth penetration and vertical accuracy. The new Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2) has great potential to fill in this gap since it carries the Advanced Topographic Laser Altimeter System (ATLAS)-the first Earth-orbiting laser that is capable of penetrating water at a high resolution in the along-track direction [23].
The National Aeronautics and Space Administration (NASA) launched ICESat-2 on 15 September 2018. ATLAS operates at 532 nm and emits three pairs of beams spaced 3.3 km apart with a 90 m distance within pairs [24]. Each pair of beams consists of a strong beam, and a weak beam based on a 4:1 transmit energy ratio, and each beam samples with an interval of~70 cm along the orbit and only has a~14 m diameter footprint [25,26]. ICESat-2 has the advantage of global coverage, including areas where high-resolution or up-to-date bathymetry is not available [27]. In addition, ATLAS can collect along-track bathymetric points up to 40 m in depth in very clear water [23]. Therefore, the bathymetric photon points from ICESat-2 can provide complementary datasets which work as in situ depth control points on SDB empirical models.
ICESat-2 ATLAS penetrated the water as an active LiDAR and has been applied to the expression of the Earth's surface topography. ATLAS determined the surface slope on ice sheets and tracked the evolution of a sea ice pack and freeboard in winter [28,29]. ICESat-2 makes considerable improvements, particularly the dense along-track sampling of the surface, which helps obtain highly detailed bathymetry information in shallow water [30]. A method was proposed to estimate the temporal change in water levels and volumes with Multiple Altimeter Beam Experimental LiDAR (MABEL) data [31]. MABEL served as a technology demonstrator of ICESat-2 ATLAS. Based on the actual ICESat-2 data, an adaptive variable ellipse filtering bathymetric method was proposed to precisely identify and separate the photons in the above-water, water surface, and water-column regions [32]. Some studies have combined ICESat-2 ATLAS with passive remote sensing techniques into bathymetry retrieval in recent years [33,34]. The ATLAS dataset could work as a complementary dataset, which offers flexibility on bathymetry. According to the signal characteristics of ATLAS, the clustering density method has been proven to be effective for photon signal processing [35,36]. These studies have proved that this fusion method is now feasible between satellite-based multispectral images and LiDAR data.
However, there are still some barriers to be solved. Due to the optical characteristics of ATLAS, the raw photon data include a great deal of solar background noise photons, which require the development of specialized onboard receiver algorithms that can distinguish Remote Sens. 2021, 13, 4303 3 of 25 signals containing surface returns from those consisting of pure noise [26]. Current methods for valid photons detection partly rely on preview empirical parameters [35] and even classify points manually [36], which cannot be used for batch data processing. In addition, these methods could not overcome the limitations of large scales and different underwater terrains. To address these problems, the adaptive methods for automated processing of the ATLAS photon data are needed and need to be improved to achieve spatial coverage and depth measurements.
In this study, a satellite-derived bathymetry method merging active and passive remote sensing (SDB-AP) in shallow waters and coastal areas is proposed. It merges both active (i.e., ICESat-2) and passive (i.e., Sentinel-2 satellite) satellite datasets and applies an adaptive Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, which greatly extended spatial coverage and bathymetric inversion accuracy. First, we detected seafloor return photons from the ICESat-2 raw photons based on an improved adaptive DBSCAN algorithm. Then, we corrected the detected seafloor photons due to the refraction at the air-water interface. After, we applied a suitable outlier-removal algorithm to remove outlier photons to improve the retrieval accuracy. We divided all of the ICESate-2 seafloor signal photons randomly into two parts: 80% of gridded data as control points were used to train the empirical bathymetric model, and the remaining 20% as well as the in situ data collected by the NOAA's National Centers for Environmental Information (NCEI) were used to quantify our inversion scheme's accuracy. We produced several bathymetric maps over St. Thomas, Acklins Island, and Huaguang Reef using the empirical bathymetric model trained by merging the ICESat-2 data and the Sentinel-2 multispectral images. Finally, we evaluated and analyzed the performance of our method.

Study Areas
Three study regions with different topographic distributions were chosen to obtain the satellite bathymetric map. The first region, named Saint Thomas (St. Thomas), as Figure 1a shows, is the second-largest island of the US Virgin Islands in the eastern Caribbean Sea. The district has a land area of about 83 km 2 [37]. The geographical location of this area is between 18.26 •~1 8.43 • N and 64.80 •~6 5.0 • W. The Continuously Updated Digital Elevation Model (CUDEM) obtained in December 2014 for the US coast developed by NCEI, was used as in situ data to verify the performance of our bathymetric method in St. Thomas [38]. The second study region, as Figure 1b shows, was the surrounding waters around Acklins Island and Long Cay in Southeastern Bahamas. The geographical location of this area is between 22.10 •~2 2.60 • N and 73.90 •~7 4.40 • W. There is a lagoon between Acklins Island and Long Cay. The bottom is mainly reef, sand, and stone. The third study region, as Figure 1c shows, was the Huaguang Reef. Huaguang Reef is one of the main reefs of Xisha Islands in the northwestern South China Sea, about 330 km away from Hainan Island. The geographical location of this area is between 16.13 •~1 6.28 • N and 111.52 •~1 11.86 • E. There are several coral reefs in the region. In all three sites, the in situ bathymetric data are not easy to access to evaluate the empirical SDB model; therefore, we applied ICESat-2 bathymetric photons to evaluate bathymetric maps based on SDB-AP. A total of 80% of the ICESat-2 bathymetric points were used to train the empirical model, and the remaining 20% were used to verify the SDB-AP bathymetric result. In addition, we discuss Chen's method [32] along with our method; thus, we chose two sites (Shanhu Island and Nan Island) and the corresponding ICESat-2 tracks (see Table 1). According to Chen's definition, the photon density of ICESat-2 track data on 2 February 2019, 24 May 2019, and 19 August 2019 are low, medium, and high, respectively. Three types of ICESat-2 data with different density distributions of photons can help to validate the accuracy of these DBSCAN algorithms.
ICESat-2 data with different density distributions of photons can help to validate the accuracy of these DBSCAN algorithms. December 2020, respectively. A total of 80% of the ICESat-2 bathymetric points were randomly used to train the empirical model, and the remaining 20% as well as in situ data were used to verify the SDB-AP Figure 1. Overviews of three study areas: (a) Saint Thomas, US Virgin Islands-the background satellite image belongs to Sentinel-2 acquired on 15 January 2019. All red lines correspond to the laser trajectories of ICESat-2 on 22 November 2018, 10 February 2019, and 13 December 2020, respectively. A total of 80% of the ICESat-2 bathymetric points were randomly used to train the empirical model, and the remaining 20% as well as in situ data were used to verify the SDB-AP bathymetric result.  19 April 2020, 19 July 2020, and 20 August 2020. 80% of the ICESat-2 bathymetric points were used to train the empirical model, while the remaining 20% were used to verify the SDB-AP bathymetric result.

ICESat-2 LiDAR Datasets
The first spaceborne photon-counting LiDAR ATLAS carried by ICESat-2 can provide supplementary, vertical profiling for superior depth penetration and vertical accuracy. ICESat-2 has been collecting data since 14 October 2018. Each captured photon has a unique delta-time, latitude, longitude, and elevation based on the World Geodetic System 1984 (WGS84) ellipsoid datum. ICESat-2 has a set of data products, including the Global Geolocated Photon Level 2A data product ATL03. ATL03 is a large dataset comprising latitude, longitude, and ellipsoidal height for every detected photon event. ICESat-2 photon data we used in this study can be freely downloaded from the National Snow & Ice Data Center website [39]. ATL03 data includes the height above the WGS84 datum (ITRF2014 reference frame) and in the universal transverse Mercator (UTM) [40].
In ATL03 datasets, the "signal_conf_ph" parameter in the /gtx/heights group for each beam refers to the confidence parameter provided to classify photons that are likely surface-reflected signal photons and those that are likely background photons [41]. The "signal_conf_ph" array has five values for each photon, corresponding to five surface types (land, ocean, sea ice, land ice, and inland water). The specific values range from 4 to −2 [42]. The photon classification algorithm is histogram-based which may classify a background photon incorrectly as a signal photon because of the false threshold. The higher the confidence (from 0 to 4) means the higher possibility of the photons being a signal. It is noticeable that the high confidence (4) photons do not include all signal photons, especially seafloor signal photons. We also noticed that the confidence four photons were mostly the signal from the objectives of high reflectivity, such as sea surface and land. To improve the usage of this parameter, an adaptive DBSCAN algorithm was proposed to detect the signal photons without detection radius and a threshold priori, and the confidence of the four photons was used to help determine the positions of the sea surface in this algorithm.

Sentinel-2 Satellite Imagery
Sentinel-2 Level-1C (L1C) imagery products can be freely downloaded in SENTINEL-SAFE format from the US Geological Survey website (USUG) [43]. All of the downloaded L1C images are in the UTM/WGS84 projection, and they are orthophoto images after geometric correction [44]. To reduce random errors caused by the atmosphere, we tried to choose a total of three Sentinel-2 L1C images with less than 20% cloud coverage. Detailed information on all study areas and the ICESat-2 and Sentinel-2 datasets are shown in Table 1.

Adaptive DBSCAN for ICESat-2 Signal Photon Detection
The DBSCAN algorithm has been proven to be a classic and efficient way to detect ground and seafloor photons from photon-counting LiDAR signal returns [19,28]. In the DBSCAN algorithm, when the density of adjacent points in radius ε exceeds the threshold (minpts), the points in the cluster are classified as signals; thus minpts and ε are the key parameters [45]. Standard DBSCAN algorithms rely on preview empirical parameters of ε and minpts [35]. The determination of adaptive threshold minpts and ε could be helpful for automated batch processing and could improve the detection algorithm accuracy. An improved adaptive DBSCAN algorithm for ICESat-2 signal photon detection was proposed as follows, and detailed processing steps are provided in Appendix A.
First, we intercepted the underwater photons and rescaled the current along the track axis to avoid calculation errors from rounding minpts. Then, we divided the dataset into several segments. For each segment, we determined the instant sea surface height (SSH) and calculated the average counts of seafloor signal and noise photons. After that, we counted the noise-signal-dominated frames and noise-dominated frames. We calculated the candidate ε dataset. According to the candidate ε dataset, the average counts of seafloor signal and noise photons and the average counts of noise photons could be computed, and we obtained the candidate minpts dataset. Finally, we selected the optimal ε k o and minpts k o by iteration. Our adaptive DBSCAB codes were completed on the Python 3.9 platform. The flowchart for the adaptive DBSCAN algorithm is shown in Figure 2.

Bathymetric Correction for Detected Seafloor Photons
ICESat-2 products do not consider the refraction and the corresponding change in light speed at the air-water interface. This will produce horizontal and vertical errors in the geolocation recorded in ATL03, resulting in the position being deeper than the real measurement value and farther from the lowest point; thus, ICESat-2 ATLAS needs appropriate refraction corrections.
The refraction correction method proposed by Parrish was applied [23]. The final underwater return photon depth H could be expressed as follows: where Dep is the uncorrected depth of the underwater photons detected by the adaptive DBSCAN method, S is the slant range to the uncorrected bottom return photon location, R is the corrected slant range, θ 1 is the angle of incidence, and θ 2 is the angle of refraction. In Equation (1), β could be expressed as follows:

Outlier Removal for Corrected Seafloor Photons
The purpose of outlier detection is to filter abnormal data in each dataset. In other words, the main goal of outlier detection is to find outliers or noises markedly different from or inconsistent with the normal instances. The corrected seafloor photon points still contained a few noise photons [46]; therefore, we proposed an outlier-removal method for seafloor photons as follows: Wavelet filtering A hard threshold was optimally set according to the noise level estimation of each layer of the wavelet decomposition. II. K-medoids classification The data were divided into three categories by the K-medoids algorithm [47]. In the K-medoids algorithm, such a point would be selected from the current cluster-the minimum sum of the distances from it to all other points (in the current cluster)-as the center point, which allows the cluster size not to vary greatly. III. Outlier removal along the geographic axis The data A were sorted along the ICESat-2 along-track first. For each category, the outliers were detected and eliminated using scaled mean absolute difference MAD, and these outliers were eliminated with a window size of 50. The outliers were defined as the elements that differ from the median by more than three scaled MAD from the median in the window. MAD could be expressed as follows: IV. Outlier removal along the depth axis The remaining data were then processed. The outliers were defined as the elements more than three scaled MAD in data with the window size of 100, and the recognized outliers were removed.
Here are three examples that present the outlier-removal results from the ICESat-2-derived signal photons in Figure 3a-c. The light blue points correspond to the noise photons, and the red points are seafloor photons after noise-outlier removal. It can be noted that our algorithm cleared out most outlier points and reserved seafloor signal photons efficiently.

Bathymetric Correction for Detected Seafloor Photons
ICESat-2 products do not consider the refraction and the corresponding change in light speed at the air-water interface. This will produce horizontal and vertical errors in the geolocation recorded in ATL03, resulting in the position being deeper than the real measurement value and farther from the lowest point; thus, ICESat-2 ATLAS needs ap- Here are three examples that present the outlier-removal results from the ICES derived signal photons in Figure 3a-c. The light blue points correspond to the noise tons, and the red points are seafloor photons after noise-outlier removal. It can be n that our algorithm cleared out most outlier points and reserved seafloor signal pho efficiently.

Matching ICESat-2 Data to Sentinel-2 Images with Different Spatial Resolution
The spatial resolution of ICESat-2 and in situ depth data are higher than that of tinel-2 image data, which means the depth value of each pixel of the Senti

Matching ICESat-2 Data to Sentinel-2 Images with Different Spatial Resolution
The spatial resolution of ICESat-2 and in situ depth data are higher than that of Sentinel-2 image data, which means the depth value of each pixel of the Sentinel-2 bathymetric map may correspond to a great deal of high spatial resolution control points or in situ points in the process of depth inversion. To solve this problem, it is necessary to reduce the high-resolution data to the Sentinel-2 spatial resolution (10 m).
The ICESat-2 data with high spatial resolution were matched with the Sentinel-2 pixel in the WGS-84 coordinate reference system, according to the same latitude and longitude range, and the average value was obtained from the data corresponding to each pixel value of Sentinel-2 data. During the averaging process, the Pauta criterion of the mean plus or minus three standard deviations (SD) was utilized based on the characteristics of a normal distribution for which 99.87% of the data appear within this range [48]. Therefore, each mean depth corresponded to a Sentinel pixel, and the down-sampled dataset was used to build the empirical model.

Atmosphere Correction
Before the application of the SDB empirical model, all the raw Sentinel-2 L1C products in study areas needed to be atmospherically corrected and conveyed to Level-2A (L2A) products containing bottom-of-atmosphere (BOA) corrected reflectance [49]. The Sentinel-2 Data Correction (Sen2Cor) version 2.9 was chosen for atmospheric correction [50]. Sen2Cor is a semi-empirical algorithm based on the Atmospheric and Topographic Correction (ATCOR) and can be applied over water surfaces. European Space Agency (ESA) provides an offline version of the Sen2Cor to produce L2A data. The first step is cloud detection and scene classification, followed by retrieving the Aerosol Optical Thickness (AOT) and the Water Vapor (WV). Finally, Top-Of-Atmosphere (TOA) is converted to Bottom-Of-Atmosphere (BOA) [51]. Considering the spatial resolution for each visible band (B2-497 nm, B3-560 nm, and B4-665 nm), a spatial resolution of 10 m was used in Sen2Cor processing. Thus far, the raw Sentinel-2 L1C image was converted to the processed L2A image.

Spatial Operation
The L2A images needed a spatial operation to simplify our analysis and focus on the region of interest (ROI). First, we resampled all bands to equal resolution. In the resampling, we selected band B2 as the reference band. Then, we could create subsets of ROI in each resampled L2A image.

Clouds, Whitecaps, and Land Pixels Mask
Masks were generated to remove clouds, whitecaps, and land pixels. The Near Infrared (NIR) band is used to mask clouds, whitecaps, and most land pixels; since the NIR Remote Sens. 2021, 13, 4303 9 of 25 does not penetrate the water, the water seems darker, which helps to distinguish the water from the things that look brighter. We used a user-set threshold value (0.05) of the NIR band and created the mask with this threshold; by applying this mask to each band, we created a masked image that only contains sea with original values and invalid value NaN for the ocean and the others, respectively. For the land pixels mask, we downloaded a course Space Shuttle Radar Topography Mission (SRTM) 5-min Digital Elevation Model [43], which can determine whether the pixel is on land. Then, we created a new image so that all ocean pixels would be preserved, and all land pixels would be set to NaN, and all Sentinel-2 images were projected to the same WGS-84 coordinate reference system before empirical bathymetric inversion.

Empirical SDB Retrieval
Finally, the ratio model was conducted to calculate seafloor depths using the depth control points from ICESat-2-derived bathymetry described in Section 3.3. The ratio model is expressed as follows: where Z is the estimated depth, m 1 is a tunable constant to scale the ratio to depth, n is a fixed constant for all areas (usually 1000), R w is the reflectance of water for bands i or j, and m 0 is the offset for a depth of 0 m (Z = 0). m 0 and m 1 were calculated by regression using the depth control points from ICESat-2 derived bathymetry. The flowchart of the SDB-AP method is shown in Figure 4. The whole SDB-AP codes and partly ICESat-2 bathymetry data are available online in Supplementary Materials. where is the estimated depth, is a tunable constant to scale the ratio to depth, a fixed constant for all areas (usually 1000), is the reflectance of water for bands , and is the offset for a depth of 0 m ( 0). and were calculated by reg sion using the depth control points from ICESat-2 derived bathymetry.
The flowchart of the SDB-AP method is shown in Figure 4. The whole SDB-AP co and partly ICESat-2 bathymetry data are available online in Supplementary Materials

Bathymetric Retrieval by ICESat-2 Data
Based on our proposed algorithm in Sections 3.1 and 3.2, the ICESat-2 signal phot were detected. Figure 5 provides ICESat-2 ground trajectory images, and correspond

Bathymetric Retrieval by ICESat-2 Data
Based on our proposed algorithm in Sections 3.1 and 3.2, the ICESat-2 signal photons were detected. Figure 5 provides ICESat-2 ground trajectory images, and corresponding detected elevation profiles over the St. Thomas site, Acklins Island site, and Huaguang reef site. Figure 5a,c,e demonstrates the enlarged satellite image and the sampled areas corresponding to the green box. Figure 5b,d,f illustrates the detected ICESat-2 underwater photons with and without refraction correction.
Figure 5b, significant seafloor signal photons were detected by ICESat-2 in accordance with underwater topography. For each laser shot, the refraction effect in the water column was corrected by the equations in Section 3.2. The maximum profile of the refraction-corrected bottom return photons was about ~70 m, while ICESat-2 detected seafloor signal photons nearly reached ~80 m. Therefore, without the refraction correction in vertical elevation, the same water level errors influence the bathymetry results. The maximum bathymetric depth over St. Thomas was larger than that over the other two sites. Figure 5c shows that ICESat-2 flew over this area at the local time of 09:17:25 on 13 June 2019, from south to north. The trajectory was over two land margins of Long Cay. In Figure 5d, it is noticeable that our DBSCAN results matched well with underwater topography, and even the seafloor details were scanned accurately. The maximum refraction-corrected return photons were up to the water depth of ~20 m. Figure 5e shows that ICESat-2 flew over this area at the local time of 13:51:59 on 22 February 2019, from north to south. The satellite passed by a part of a reef and then went to a deep-water area. In Figure 5f, our DBSCAN results were consistent with underwater topography, and the maximum refraction-corrected return photon depth was ~8 m. The maximum bathymetric depth over Huaguang Reef was the smallest.    Figure 5b, significant seafloor signal photons were detected by ICESat-2 in accordance with underwater topography. For each laser shot, the refraction effect in the water column was corrected by the equations in Section 3.2. The maximum profile of the refraction-corrected bottom return photons was about~70 m, while ICESat-2 detected seafloor signal photons nearly reached~80 m. Therefore, without the refraction correction in vertical elevation, the same water level errors influence the bathymetry results. The maximum bathymetric depth over St. Thomas was larger than that over the other two sites. Figure 5c shows that ICESat-2 flew over this area at the local time of 09:17:25 on 13 June 2019, from south to north. The trajectory was over two land margins of Long Cay. In Figure 5d, it is noticeable that our DBSCAN results matched well with underwater topography, and even the seafloor details were scanned accurately. The maximum refraction-corrected return photons were up to the water depth of~20 m. Figure 5e shows that ICESat-2 flew over this area at the local time of 13:51:59 on 22 February 2019, from north to south. The satellite passed by a part of a reef and then went to a deep-water area. In Figure 5f, our DBSCAN results were consistent with underwater topography, and the maximum refraction-corrected return photon depth was~8 m. The maximum bathymetric depth over Huaguang Reef was the smallest.  Figure 6a, respectively. The water depth in the north was shallower than that in the south, and the coastal water depth was shallow. The average depth was 18.83 m in total. Over the Acklins Island site, we present a bay within the depth of 5 m, where the average bathymetric depth was 2.83 m. From west to east, the bathymetric depths presented a fluctuating state, and the margin of the area was towards the deep water in Figure 6b. The maximum bathymetric depth was 9.87 m. Over the Huaguang Reef site in Figure 6c, which is a submerged reef in the South China Ocean, the maximum, minimum, and average bathymetric depths were 13.88 m, 0.37 m, and 3.08 m, respectively.   Figure 6a, respectively. The water depth in the north was shallower than that in the south, and the coastal water depth was shallow. The average depth was 18.83 m in total. Over the Acklins Island site, we present a bay within the depth of 5 m, where the average bathymetric depth was 2.83 m. From west to east, the bathymetric depths presented a fluctuating state, and the margin of the area was towards the deep water in Figure 6b. The maximum bathymetric depth was 9.87 m. Over the Huaguang Reef site in Figure 6c, which is a submerged reef in the South China Ocean, the maximum, minimum, and average bathymetric depths were 13.88 m, 0.37 m, and 3.08 m, respectively.

Validation
For all study sites, 80% of the detected ICESat-2 derived bathymetric photon points were used as model-training points, and the remaining 20% were used to validate the method's validation accuracy. The in situ data over the St. Thomas site were also employed to quantify the performance of SDB-AP. In the following error scatter plots, the red line is the 1:1 line, and the blue line represents the regression line.
is the number of

Validation
For all study sites, 80% of the detected ICESat-2 derived bathymetric photon points were used as model-training points, and the remaining 20% were used to validate the method's validation accuracy. The in situ data over the St. Thomas site were also employed to quantify the performance of SDB-AP. In the following error scatter plots, the red line is the 1:1 line, and the blue line represents the regression line. N is the number of the training gridded bathymetric points from ICESat-2, and the regression equation, R 2 , and RMSE details are also shown in the figures.
In Figure 7, the ICESat-2-derived bathymetric depths matched well with the in situ data over St. Thomas, with a good agreement of R 2 = 0.9951 and a low standard Root Mean Squared Error (RMSE) at 0.68 m, which proves that our adaptive DBSCAN and outlierremoval algorithm are effective. For SDB-AP-estimated depths and the in situ data over St. Thomas, R 2 was 0.93, and the RMSE was 1.91 m, as shown in Figure 8. Compared with the ICESat-2-derived bathymetric depths, the bathymetric accuracy of SDB-AP estimated depth decreases due to the error accumulation effects, including the ICESat-2 inversion error, the spatial matching error between ICESat-2 and Sentinel data, the Sentinel image process error, the empirical SDB model error, etc. Among them, the empirical SDB model error is the main error source; generally, the error for the empirical SDB model is about 2 m [15,17,18,20].  Table 2 corresponds to the information for Figure 9a-c, and Table 3 corresponds to the information for Figure 9d-f. It appears that the N, RMSEs and Mean Absolute Error (MAEs) rose with the increase in water depth, while the R 2 fell, which revealed that there were large errors in deep water (>20 m) and that the SDB is more feasible for shallow water within a depth of 10 m. In Figure 7, the ICESat-2-derived bathymetric depths matched well with the in situ data over St. Thomas, with a good agreement of R 2 = 0.9951 and a low standard Root Mean Squared Error (RMSE) at 0.68 m, which proves that our adaptive DBSCAN and outlierremoval algorithm are effective. For SDB-AP-estimated depths and the in situ data over St. Thomas, R 2 was 0.93, and the RMSE was 1.91 m, as shown in Figure 8. Compared with the ICESat-2-derived bathymetric depths, the bathymetric accuracy of SDB-AP estimated depth decreases due to the error accumulation effects, including the ICESat-2 inversion error, the spatial matching error between ICESat-2 and Sentinel data, the Sentinel image process error, the empirical SDB model error, etc. Among them, the empirical SDB model error is the main error source; generally, the error for the empirical SDB model is about 2 m [15,17,18,20]. Figure 6d-f shows the performance of SDB-AP estimated depths when the ICESat-2-derived bathymetric points were used as testing data in the three study areas. The R 2 in St. Thomas site reached the top (R 2 = 0.96), followed by that in the Huaguang Reef site and Acklins Island site, 0.94 and 0.91, respectively. All RMSEs in the St. Thomas site, Acklins Island site, and Huaguang Reef site were less than 10% of the maximum depths, and the smallest RMSE was in the Acklins Island site with the value of 0.27 m.      Table 2 corresponds to the information for Figure 9a-c, and Table 3 corresponds to the information for Figure 9d-f. It appears that the , RMSEs and Mean Absolute Error (MAEs) rose with the increase in water depth, while the R 2 fell, which revealed that there were large errors in deep water (>20 m) and that the SDB is more feasible for shallow water within a depth of 10 m.   Figure 10a,b shows the results of the detected ICESat-2 signal photons based on our algorithm and the standard DBSCAN algorithm. Retrieval by both methods followed the   Figure 10a,b shows the results of the detected ICESat-2 signal photons based on our algorithm and the standard DBSCAN algorithm. Retrieval by both methods followed the underwater terrain. However, the result of the standard DBSCAN with fixed ε and Minpts reserved plenty of noisy photons near the sea surface compared with our adaptive DBSCAN algorithm. This may be because the photon density in deep water is much higher than that in shallow water; hence, the DBSCAN with fixed ε and Minpts is suitable for deep water while retaining many noise photons in the high-density sea surface. Figure 11a,b shows the outlier-removal results over the St. Thomas site based on the two methods. The seafloor return photons are light blue, and the outlier-removal signal photons are red. The results showed that our outlier removal algorithm removed most noisy photons near the sea surface, while there were still some outliers by the standard method in Figure 11b, which would reduce water depth estimation accuracy. In Figure 11a, the most outlier photons were clearly eliminated. underwater terrain. However, the result of the standard DBSCAN with fixed and reserved plenty of noisy photons near the sea surface compared with our adaptive DBSCAN algorithm. This may be because the photon density in deep water is much higher than that in shallow water; hence, the DBSCAN with fixed and is suitable for deep water while retaining many noise photons in the high-density sea surface. Figure 11a,b shows the outlier-removal results over the St. Thomas site based on the two methods. The seafloor return photons are light blue, and the outlier-removal signal photons are red. The results showed that our outlier removal algorithm removed most noisy photons near the sea surface, while there were still some outliers by the standard method in Figure 11b, which would reduce water depth estimation accuracy. In Figure 11a, the most outlier photons were clearly eliminated.  After outlier removal, we used the gridded ICESat-2 bathymetric photon points as model-training data to train the band ration model. Figure A1a,b in Appendix A corresponds to the bathymetric maps derived from the data by our adaptive DBSCAN algorithm and the standard DBSCAN algorithm, respectively. Figure A1c shows the bathymetric map with the in situ data. The dark grey line means the trajectory of ICESat-2 on 13 December 2020 in Figure A1. The trend of the water depth in Figure A1a is consistent with that in Figure A1c. The average water depth found using our method in Figure A1a was 12.68 m, which is close to the in situ with an average depth of 13.98 m in Figure A1c. underwater terrain. However, the result of the standard DBSCAN with fixed and reserved plenty of noisy photons near the sea surface compared with our adaptive DBSCAN algorithm. This may be because the photon density in deep water is much higher than that in shallow water; hence, the DBSCAN with fixed and is suitable for deep water while retaining many noise photons in the high-density sea surface. Figure 11a,b shows the outlier-removal results over the St. Thomas site based on the two methods. The seafloor return photons are light blue, and the outlier-removal signal photons are red. The results showed that our outlier removal algorithm removed most noisy photons near the sea surface, while there were still some outliers by the standard method in Figure 11b, which would reduce water depth estimation accuracy. In Figure 11a, the most outlier photons were clearly eliminated.  After outlier removal, we used the gridded ICESat-2 bathymetric photon points as model-training data to train the band ration model. Figure A1a,b in Appendix A corresponds to the bathymetric maps derived from the data by our adaptive DBSCAN algorithm and the standard DBSCAN algorithm, respectively. Figure A1c shows the bathymetric map with the in situ data. The dark grey line means the trajectory of ICESat-2 on 13 December 2020 in Figure A1. The trend of the water depth in Figure A1a is consistent with that in Figure A1c. The average water depth found using our method in Figure A1a  After outlier removal, we used the gridded ICESat-2 bathymetric photon points as model-training data to train the band ration model. Figure A1a,b in Appendix A corresponds to the bathymetric maps derived from the data by our adaptive DBSCAN algorithm and the standard DBSCAN algorithm, respectively. Figure A1c shows the bathymetric map with the in situ data. The dark grey line means the trajectory of ICESat-2 on 13 December 2020 in Figure A1. The trend of the water depth in Figure A1a is consistent with that in Figure A1c. The average water depth found using our method in Figure A1a was 12.68 m, which is close to the in situ with an average depth of 13.98 m in Figure A1c. The estimated water depths in Figure A1b (with an average water depth of 10.22 m) were shallower than the in situ water depths. The derived bathymetric map created using our adaptive DBSCAN algorithm was generally well derived. Moreover, the maximum bathymetric depth in Figure A1b was only 13.25 m, which is far shallower than in situ with 18 m. Figure A2a,b in Appendix A shows the error scatter plots using the two DBSCAN methods over the St. Thomas site. The results derived by our adaptive DBSCAN algorithm showed that the R 2 was 0.96 and that the RMSE was 1.14 m, indicating that our algorithm is consistent with the in situ data. However, for the results derived by the standard DBSCAN algorithm, the R 2 was 0.94, and the RMSE was 1.94 m.

Impact of Outlier Removal on Bathymetry Accuracy
As shown above, although the adaptive DBSCAN algorithm could track the underwater terrain well, there were still some noise photons remaining near the sea surface and seafloor signals. To verify the influence of these noise outliers for overall bathymetry accuracy, we generated the bathymetric maps at the St. Thomas site without outlier removal in Figure A3a in Appendix A. To compare the results, bathymetric maps for the same location obtained from the SDB-AP method and the in situ data were also generated ( Figure A3b,c). The estimated water depth in Figure A3a was much shallower than in the other two maps. In addition, it could be noticed that depths in shallow water (2-5 m) were close to the in situ data. The large differences are likely that the outlier photons near sea surface introduce errors for the bathymetry inversion of deep water. Figure A4a-c in Appendix A shows the error scatter plots based on different methods without using the outlier-removal process. They showed that the estimation result stayed away from in situ values without using the outlier removal method, and the RMSEs without outlier removal were 13.90 m, 11.87 m, and 8.73 m, which revealed that the outlier-removal method is significant for SDB.

Stability of SDB-AP
To verify the stability of our method, we downloaded six Sentinel-2 images on 1 March 2016, 21 November 2016, 21 March 2019, 12 September 2019, 4 April 2020, and 3 May 2021 and retrieved the bathymetric maps over St. Thomas Island using the SDB-AP method. Figure A5a-f in Appendix A shows the six error scatter plots. Over these six dates, the result on 12 September 2019 illustrated the best fitness with in situ data with an R 2 of 0.94 and an RMSE of 1.88 m, while the result on 1 March 2016 was relatively the worst, with an R 2 of 0.91 and an RMSE of 2.06 m. It should be noted that the deviation between different dates may be caused by many reasons, such as satellite remote-sensing reflection difference or the tide [52][53][54]. For each date, the R 2 and RMSEs were also calculated and shown in the top-left corners. The mean R 2 was 0.93, and the mean RMSE was 2.00 m, which is less than 10% of the maximum depth. These key regression equation parameters mean great temporal consistency of SDB-AP over different dates.

Comparison with an Adaptive Variable Ellipse Filtering Bathymetric Method
As we mentioned above, several great DBSCAN algorithms for ICESat-2 photon detection were proposed. For comparison, we selected a similar method named Adaptive Variable Ellipse-Filtering Bathymetric Method (AVEBM) proposed by Chen [32]. AVEBM can precisely identify and separate photons in the above-water, water surface, and water-column regions, and it is based on the characteristics and changes in the density distribution of water-column photons with increasing water depth. Figure A6a-i in Appendix A shows the ICESat-2 photon detection results based on our adaptive DBSCAN algorithm, AVEBM, and the standard DBSCAN algorithm on 2 February 2019, 24 May 2019, and 19 August 2019. The detected seafloor signal photons are red, the detected sea surface is dark blue, and the noise photons are light blue.
As for the three photons in different regions, the detection results indicated that our adaptive DBSCAN algorithm and the AVEBM could extract seafloor photons accurately. Compared with the other two methods, the standard DBSCAN had the worst detection performance: it ignored minor photons at the end of the underwater topography for low-density data and counted too many background noisy photons for medium-and high-density data.
One of the reasons why our adaptive DBSCAN algorithm could obtain similar results to AVEBM is that we rescaled the current along-track axis according to Equation (2) before density clustering, which increases the detection accuracy of the photon signal using DBSCAN. However, for medium-and high-density photon data, there were still some noise photons remaining along the seafloor signal and sea surface signal in the results from our adaptive DBSCAN. They could be deleted in outlier-removal steps. It was also noticed that there was some small discontinuity in the detected sea-surface signal photons in the results using our adaptive DBSCAN and the AVEBM due to spatial photon distribution change.

Comparison with Different Methods Deriving Bathymetry from Sentinel-2
In recent research, several approaches have been applied to estimate bathymetry from Sentinel-2, such as extraction wave propagation information or deep learning [55,56]. Deep learning could identify features in satellite images. A deep learning method named Deep Single-Point Estimation of Bathymetry (DSPEB) was proposed [56] and used a convolutional neural network to estimate the average depth of individual local areas. Since this method processes a single small sub-tile at a time, it has demonstrated impressive capabilities for its applicability to many coastal areas. Here, a simple comparison was needed to develop and quantify the performance of classic models and the deep learning method deriving bathymetry from Sentinel-2.
Therefore, we utilized the Neural Clustering tool in MATLAB 2020B to compare the performance of bathymetry estimation of classic models and deep learning. We first defined a 10-layer network. The ICESat-2 corrected seafloor detection results and corresponding above-water surface remote sensing reflectance at the blue and green band from Sentinel-2 images which were used to train the network. The trained network was applied to derive a bathymetric map over the St. Thomas site. Another classic linear model [13] was also applied to generate the bathymetric map over St. Thomas. Figure A7a,b in Appendix A shows two bathymetric maps derived from Sentinel-2 using the linear model and deep learning method over the St. Thomas site. The maximum bathymetric depths were 31.02 m and 33.23 m in Figure A7a,b. The trend of the linear model results in Figure A7a was similar to that in Figure 6a using the band ratio model, and its average bathymetric depth was 19.23 m. It is clear that the map using the deep learning method presents a deeper bathymetric result in the edges of the selected field with an average depth of 22.88 m.
To compare the performance of the linear band model and deep learning with the band ratio model over St. Thomas,Tables 4 and A1 in Appendix A list the detailed error analysis in different depth ranges using two methods. Comparing Tables 2 and A1 in Appendix A, the linear model and the band ratio model had approximate accuracy in the St. Thomas site. In addition, both mathematical models had poor inversion accuracy in deep water. In Table 4, it should be noted the result using the deep learning method had the best bathymetric accuracy in terms of R 2 . The reason for this may be that the location of points for the error analysis corresponds to the location of points used to train the network.
In terms of the RMSE, the deep-learning method had a worse accuracy at 0-10 m, and the reason may be that there were insufficient training points. However, it had better performance in the water depth range of 10 to 30 m. In future research, the deep-learning method is superior for deep-water bathymetry.

Conclusions
In this study, we proposed a satellite-derived bathymetry method by merging active and passive remote sensing in shallow waters and coastal areas. The results showed that the ICESat-2 bathymetric photons reached excellent accuracy with in situ data, and the corresponding RMSE was only 0.99 m. Moreover, the accuracy between the SDB-AP estimated depths and in situ data was also good, and the RMSE was 0.93 m. The RMSEs between the SDB-AP estimated depths and the ICESat-2 bathymetry results over St. Thomas, Acklins Island, and Huaguang Reef were 0.96 m, 0.91 m, and 0.94 m, respectively. This reveals that the SDB-AP method is feasible and efficient. In addition, an adaptive DBSCAN algorithm for raw ICESat-2 photon detection was proposed to adjust the optimal parameters in different underwater topographies. Compared with the standard DBSCAN method, the applied algorithm improved the ICESat-2 bathymetric accuracy and has extended its scope of application. The algorithm showed that the estimation result stays away from in situ values without using the outlier-removal method, revealing that the outlier-removal method is significant for SDB. A suitable outlier removal algorithm was proposed to remove the outlier photons points for the detected ICESat-2 seafloor photons here, which could improve clustering accuracy. The accuracy of SDB-AP was mainly limited to the empirical SDB model error. Further investigation is needed to validate the SDB-AP method in various regions, and the machine learning method is worthy of more investigation for seafloor photon detection and bathymetry retrieval in the future.
As mentioned above, ATLAS can collect along-track bathymetric points up to 40 m in depth in very clear water. Before the following processes, the raw ICESat-2 ATL03 data should be intercepted roughly in the vertical direction, including the seafloor, sea surface, and land returns. Its vertical distance range is with [y min , y max ], where y min and y max are the chosen minimum elevation and the maximum elevation, respectively (y max − y min = y win < 40).
S2. Rescaling the current along-track axis The along-track spanning distance in this data segment is x win , and x win could be described as follows: where x min and x max are the minimum value and the maximum value of the current along-track axis, respectively. As the order of magnitude x win and y win differed greatly, to avoid the calculation error of rounding minpts, the along-track axis is divided t i times S3. Dividing the dataset The ATL03 raw data photon datasets are divided into several segments for processing. Every continuous N (usually 5000) points along the along-track direction are a dataset D.
S4. Determining the instant sea surface height (SSH) The "signal_conf_ph" parameters in the ICESat-2 ALT03 data are also introduced. The photons with confidence level of 4 are divided into different bins with a resolution of 0.1 m in the vertical direction. The largest bin would be considered the bin containing the sea surface. We took the median value of this bin as the elevation of sea surface height S su f in the WGS84 ellipsoidal height.
For the sea surface photon accumulation, S up and S down are the upper and lower layers of the sea surface photon, respectively. S up and S down could be set as follows: S5. Calculating the average counts of seafloor signal and noise photons The sea surface photons above S down are eliminated. The average number pho of seafloor signal and noise photons is calculated as follows: where N t is the number of remaining total photons after eliminating the photons above S down . S6. Counting noise-signal-dominated frames For each dataset, the original photons are divided into M frames in the vertical direction, and the height of each frame in the vertical elevation direction is h (usually 5 or 10 m). If the number of photons in the photon number in the current frame are bigger than pho, the current frame is considered as controlling by both noise and signal photons. The photons numbers of these frames are N 1 in total, and the number of these vertical frames is M 1 .
S7. Counting noise-dominated frames If the number of photons in the photon number in the current frame is smaller than pho, the current vertical frame is considered controlled by noise photons. The photons numbers of these frames are N 2 in total. The number of these vertical frames is as follows: The candidate dataset ε of dataset D were obtained as follows: 1.
Compute the Euclidean distance matrix Dist N×N from i to j for all points in dataset D.
Sort each row element in the distance matrix Dist n×n in ascending order. The first column of matrix Dist N×N represents the distance from the object to itself and the elements in column k constitute the K-nearest neighbor distance vector D K of all points. 3.
Calculate the vector D k mean value D k . Calculate all k (k = 1, 2, . . . , N) to obtain candidate radius dataset D ε , which is expressed as follows: The values of the candidate ε dataset less than 0.4 would be eliminated.
S9. Calculating the average counts of seafloor signal and noise photons For each radius ε k , the average number of seafloor signal and noise photon N sn in the circular region is expressed as follows: where S is the area of this circular region with radius ε k . ρ 1 is the number of seafloor signal and noise photons per unit area, and x win is the reduced along-track spanning distance in this data segment. S10. Calculating the average counts of noise photons The average number N n of noise photons in the circular region with radius ε k was expressed as follows: where ρ 2 is the number of noise photons per unit area. S11. Calculating the candidate minpts dataset As the definition of optimum noise threshold [57], the minpts k corresponding to ε k could be expressed as: where N s is signal counts, N b is the noise counts per frame; and N a = N s + N b is the total mean photoelectron count, equal to the sum of the mean signal N s and noise counts N b . The function round( ) rounds values to the nearest integer. S12. Selecting the optimal cluster number Looping through k values: the ε k and minpts k with different k values are input into the DBSCAN algorithm to cluster the dataset D, and the numbers of clusters corresponding to k are generated and recorded under different k values. When the number of generated clusters are the same three times in succession, the clustering results are judged to be stable, and the current cluster number N clu is the optimal number. S13. Selecting the optimal k value S12 is repeated until the number of generated clusters is no longer N clu , the maximum k value k o is selected as the optimal value and ε k o and minpts k o are the optimal parameters for DBSCAN. Then, input the optimal parameters into the DBSCAN calculator to calculate the current segment results. S14. Processing the next data segments Repeat S4-13 until all data segments are processed.