Assessment of Empirical Algorithms for Shallow Water Bathymetry Using Multi-Spectral Imagery of Pearl River Delta Coast, China

: Pearl River Delta (PRD), as one of the most densely populated regions in the world, is facing both natural changes (e.g., sea level rise) and human-induced changes (e.g., dredging for navigation and land reclamation). Bathymetric information is thus important for the protection and management of the estuarine environment, but little effort has been made to comprehensively evaluate the performance of different methods and datasets. In this study, two linear regression models—the linear band model and the log-transformed band ratio model, and two non-linear regression models—the support vector regression model and the random forest regression model—were applied to Landsat 8 (L8) and Sentinel-2 (S2) imagery for bathymetry mapping in 2019 and 2020. Results suggested that a priori area clustering based on spectral features using the K-means algorithm improved estimation accuracy. The random forest regression model performed best, and the three-band combinations outperformed two-band combinations in all models. When the non-linear models were applied with three-band combination (red, green, blue) to L8 and S2 imagery, the Root Mean Square Error (Mean Absolute Error) decreased by 23.10% (35.53%), and the coefﬁcient of determination (Kling-Gupta efﬁciency) increased by 0.08 (0.09) on average, compared to those using the linear regression models. Despite the differences in spatial resolution and band wavelength, L8 and S2 performed similarly in bathymetry estimation. This study quantiﬁed the relative performance of different models and may shed light on the potential combination of multiple data sources for more timely and accurate bathymetry mapping.


Introduction
70% of the Earth's surface is covered by ocean, and the total length of global coastlines is more than 1.16 million kilometers [1]. Nearshore bathymetry is critical to monitoring nearshore activities, such as port facility management, fishing, movement of deposited sediments and pipeline laying [2,3]. Timely and accurate bathymetric information is also necessary to support effective resource policy making and management for coastal environment, as well as to assure human safety and welfare [4].
Bathymetry is traditionally mapped via in situ measurements, such as vessel-based acoustic echo sounding, to generate accurate point measurements or depth profiles along transects [5]. However, the echo sounder survey is constrained by its time-consuming nature and the high operational cost [6]. In addition, due to the swath angle and the distance between the survey vessel and shore, the survey system will result in a blind area on the shore terrain outside the swath coverage [7]. As a result, the echo sounder survey cannot be applied in the shallow waters with complex underwater terrain. Remote sensing techniques are able to derive nearshore bathymetry over varying spatial and temporal scales efficiently and repeatedly over large areas. Satellite-derived bathymetry (SDB) are generally divided into two broad categories: the microwave imagery-based methods and the multi-spectral imagery-based methods [2]. The microwave imagery-based methods use radar backscatters to capture variations of sea surface velocity and estimate water depth through the interactions between tidal flow and sea bottom topography. The accuracy of this method is heavily influenced by wind speed, and is vulnerable to the speckle-infested radar imagery [8].
Unlike microwave remote sensing, the principle of optical bathymetry mapping is to determine water depth using the total amount of radiative energy reflected by the water column [9]. The spectral bands of optical satellite imagery (e.g., blue and green bands) normally feature strong penetration capacity through clear water. With higher concentration of suspended sediments in the water body the reflection peak of the water body shifts to longer wavelength [10,11]. As the incident solar radiation propagates through water, it is partly scattered and absorbed by water and in-water constituents, and partly backscattered and recorded by multi-spectral sensors. Multi-spectral imagery thus enables the production of finer resolution bathymetric information than the microwave-based methods, though the range of detectable depth is reduced [12,13].
The commonly used optical bathymetry mapping algorithms mainly consist of analytical models and empirical models [2,14]. The analytical models adopt the flow radiative transfer models that take the sunlight and atmospheric variations into consideration. Since such models are constrained by single band information, they are not applicable to areas with variant optical properties and bottom reflectance [15][16][17]. In the empirical approaches, multi-spectral bands are combined to establish linear/non-linear regression models between remotely sensed radiance of the water body and the observed water depth [18,19]. Compared with analytical models, empirical models are able to account for the water conditions (e.g., water clarity, bottom reflectance) as well as the atmospheric conditions (e.g., atmospheric absorption and scattering effect) in a more efficient way [20,21].
However, it is worth noting that the accuracy of optical bathymetry is subject to influence from the spatial, spectral and radiometric resolution. Medium resolution satellite data (e.g., Landsat TM, Landsat 8) and high resolution satellite data (e.g., SPOT, RapidEye) have been applied in previous bathymetry studies, and have achieved different accuracies [22][23][24]. Finer spatial resolution does not guarantee higher accuracy in bathymetry retrieval. Instead, the radiometric resolution of selected spectral bands as well as the water turbidity are at least as important as the spatial resolution in affecting the accuracy of bathymetry retrieval [25,26]. In particular, water tides, waves, sea floor materials and water clarity exert impacts on the reflected radiation. It is thus imperative to integrate different levels of water depth samples with different spectral bands to explore the influence of water conditions on bathymetry retrievals [27][28][29].
Sentinel-2 (S2) imagery, with a 5-day revisit, 10 m multi-spectral resolution, offers a new source of freely available dataset in bathymetry retrieval. The imaging swath of S2 data is 290 km wide, much wider than that of Landsat 8 (L8) and WorldView-2 [30]. It thus has the advantage of a wider range of view angles and facilities a larger simultaneously mapped area [31]. Preliminary results have suggested that S2 satellite data are potentially useful for mapping inland water quality, coral reefs, as well as bottom depth with varied degrees of success [30,32,33]. While the mixed resolution of optical spectral bands (e.g., 60 m for the equivalent of the deep blue band from L8) in S2 satellite still calls for further evaluation of its capability in bathymetry mapping [34,35].
With the advantages in spatial coverage, revisit frequency and availability, L8 and S2 have been individually explored in SDB in many previous studies. However, their spectral discrepancy as well as their capacity in retrieving bathymetry over large areas of shallow water at different time periods have not been explicitly evaluated. In this study, we incorporated multi-spectral information from L8 and S2 data to derive bathymetry in a large coastal region of the Pearl River Delta (PRD), China from 2019 to 2020 with Google Earth Engine (GEE). Four widely used empirical models, namely the linear band model [36], the log-transformed band ratio model [23], the support vector regression model [37] and the random forest regression model [38] were adopted. To evaluate model performance with areal spectral features, all models were applied both with and without area segmentation. It is the first study for satellite-derived bathymetry near the PRD coast (over 3000 km 2 ) validated with a large volume of validation data (over 600 ground samples). The novelty of this study lies in the comprehensive comparison of the bathymetry estimation based on two satellite products, in which different pre-processing steps, multispectral band combinations and spatial regression techniques were investigated. A thorough understanding of the strengths and weaknesses of the L8 and S2 data in bathymetry mapping, as well as the relative performance of different methods will facilitate bathymetry estimation under different environmental conditions and may serve as a reference for the potential combined use of both data sets for higher resolution, more timely bathymetry mapping.

Study Area
The Pearl River, as one of the 25 largest rivers in the world and the third-largest river in China, extends from 21 • N to 24 • N, 112 • E to 115 • E. The study area ( Figure 1) covers the Lingding Bay and the Huangmao Bay, which are the two biggest sub-estuaries of the PRD. The region features a subtropical climate, and the annual precipitation varies between 1500 and 2000 mm. The Pearl River receives discharge and sediment from the West River, the North River and the East River. In recent years, intensive human activities, such as aquaculture and human-induced land use and land cover change, have dramatically changed the morphology and altered the hydrodynamics [39]. For example, the average water depth increased from 4.2 to 4.6 m between 1980 to 2010 due to decreasing sediment input from dam construction in the Lingding Bay, and the mean tidal range is amplified from 1.1 m near the mouth to 3.7 m near the top of the bay [38].
tral discrepancy as well as their capacity in retrieving bathymetry over large areas of shallow water at different time periods have not been explicitly evaluated. In this study, we incorporated multi-spectral information from L8 and S2 data to derive bathymetry in a large coastal region of the Pearl River Delta (PRD), China from 2019 to 2020 with Google Earth Engine (GEE). Four widely used empirical models, namely the linear band model [36], the log-transformed band ratio model [23], the support vector regression model [37] and the random forest regression model [38] were adopted. To evaluate model performance with areal spectral features, all models were applied both with and without area segmentation. It is the first study for satellite-derived bathymetry near the PRD coast (over 3000 km 2 ) validated with a large volume of validation data (over 600 ground samples). The novelty of this study lies in the comprehensive comparison of the bathymetry estimation based on two satellite products, in which different pre-processing steps, multispectral band combinations and spatial regression techniques were investigated. A thorough understanding of the strengths and weaknesses of the L8 and S2 data in bathymetry mapping, as well as the relative performance of different methods will facilitate bathymetry estimation under different environmental conditions and may serve as a reference for the potential combined use of both data sets for higher resolution, more timely bathymetry mapping.

Study Area
The Pearl River, as one of the 25 largest rivers in the world and the third-largest river in China, extends from 21°N to 24°N, 112°E to 115°E. The study area ( Figure 1) covers the Lingding Bay and the Huangmao Bay, which are the two biggest sub-estuaries of the PRD. The region features a subtropical climate, and the annual precipitation varies between 1500 and 2000 mm. The Pearl River receives discharge and sediment from the West River, the North River and the East River. In recent years, intensive human activities, such as aquaculture and human-induced land use and land cover change, have dramatically changed the morphology and altered the hydrodynamics [39]. For example, the average water depth increased from 4.2 to 4.6 m between 1980 to 2010 due to decreasing sediment input from dam construction in the Lingding Bay, and the mean tidal range is amplified from 1.1 m near the mouth to 3.7 m near the top of the bay [38].

In Situ Bathymetry Data
In-situ bathymetry data were obtained from the China Navy Hydrographic Office (CNHO), which is China's official hydrographic surveying and mapping organization and the only publishing house for nautical publications authorized by the government. The CNHO performs periodic surveys using echo sounding equipment in the coastal areas for the purpose of maritime safety, marine route monitoring and management for the coastal spatial planning and environmental protection. A total of 667 in situ survey samples were acquired between 2017 and 2020 using a 5-m inflatable boat equipped with a Lawrence High-Definition System (HDS).

Satellite Data and Pre-Processing
The cloud cover and water column conditions hampered the amount of suitable satellite images along the Pearl River coast. Therefore, only the surface reflectance data of L8 and S2 with cloud coverage lower than 20% from September to November in 2019 and 2020 were used in the analysis. As GEE is a cloud-based geospatial computing platform that offers a petabyte-scale archive of freely available S2 and L8 imagery, we averaged the three months' images in each year to generate a new image collection for subsequent analysis. All these surface reflectance data of L8 and S2 were masked for clouds and land areas in GEE. The details of the datasets are summarized in Table 1.

Methodology
The methodology is illustrated in Figure 2. Four models based on light attenuation in the optical bands were used, namely the Linear Band Model, the Log-transformed Band Ratio Model, the Support Vector Regression Model and the Random Forest Regression Model. Considering the data availability and cloud contamination of the optical data, the mean composite images acquired in autumn (September to November) in 2019 and 2020 were extracted.
To maximize the water depth that can be derived by satellite measurements, spectral bands with the lowest level of radiation absorption (i.e., Red, Green and Blue bands) were combined to derive water depth [23,39,40]. Before applying the models, the K-means clustering algorithm was applied to segment the study area into multiple clusters based on the spectral features. We assumed that clustering enables better interpretation of the relationship between water depth and multi-spectral characteristics and reduces the bias as well as uncertainty of bathymetry estimation over a wide variety of bottom types. All models were also applied without a priori K-means clustering for comparison. The estimation uncertainties were quantified by comparing the satellite-derived and surveyed bathymetry across different clusters separately. Step Sep.2020-Dec.2020 To maximize the water depth that can be derived by satellite measurements, spectral bands with the lowest level of radiation absorption (i.e., Red, Green and Blue bands) were combined to derive water depth [23,39,40]. Before applying the models, the K-means clustering algorithm was applied to segment the study area into multiple clusters based on the spectral features. We assumed that clustering enables better interpretation of the relationship between water depth and multi-spectral characteristics and reduces the bias as well as uncertainty of bathymetry estimation over a wide variety of bottom types. All models were also applied without a priori K-means clustering for comparison. The estimation uncertainties were quantified by comparing the satellite-derived and surveyed bathymetry across different clusters separately.

K-Means Clustering
The K-means clustering algorithm makes use of the spectral similarity of the image to perform dimensionality reduction. It segments the image into k clusters. Each pixel has a degree of membership in cluster i, and the degree of membership in each pixel is achieved by computing the relative distance between the pixels and cluster centroids. In this study, we used K-means to reduce the bias and errors that may occur in the bathymetry estimation.
The number of clusters needs to be given to the K-means algorithm as an input. After trial and error, we segmented the images into three clusters in this study, as the image would be too fragmented when the choice of k was larger than three. The average water depth in each cluster ranged from 3.  Table S1 in the supplementary materials).

Linear Band Model (LBM)
The linear band model was proposed by Lyzenga et al. [36] for calculating water depth from satellite imagery based on the Beer-Lambert law of absorption (Equation (1)). This model assumes a linear relationship between the log-transformed band reflectance and water depth via multiple linear regression: where h j is the water depth derived from band j, R j is the surface reflectance in band j, and R wj is the average reflectance of deep water. Nevertheless, depth-independent variability in reflectance between bands and factors such as water quality heterogeneity all contribute to errors in the depth estimated using Equation (1). Lyzenga et al. [12] used a set of depthindependent variables in Equation (2) to account for volume scattering in the water column and external water reflection, including sun glint effects and atmospheric scattering: where A i,j is the i th variable in the rotational matrix for band j. h 0 and h 1 to h j are determined through multiple linear regression between a set of known depths and the log-transformed reflectance values at those depths. The reflectance values were log-transformed to calibrate a linear relationship between input reflectance and depth. In this study, to be consistent in our comparison with the existing SDB literature, we exploited two band combinations (i.e., Blue + Green and Blue + Green + Red) in the estimation due to their water penetration capacity.

Log-Transformed Band Ratio Model (BRM)
The LBM assumes that the water mass is homogenous across the study area [23]. However, in turbid waters, especially under conditions with suspended sediments, the assumption is often violated. Consequently the depth estimates under certain benthic environments may be biased [39,41]. The log-transformed band ratio model was proposed by Stumpf et al. [23]. This model assumes that the reflectance in the low-absorption bands decreases more slowly with water depth compared to the high-absorption bands. As a result, the ratio of a high-absorption band to a low absorption band would display a linear decrease with depth when both are log-transformed. As shown in Equation (3), the log-transform reflectance is used to fit a linear relationship between water depth and spectral information to account for water turbidity: where h is the satellite derived bathymetry depth, h 0 and h 1 are fitted parameters controlling the intercept and the slope of the relationship, and b1 and b2 indicate high-absorption and low-absorption bands (Blue, Green or Red in this study), respectively. The BRM algorithm was implemented using three different band combinations (Blue/Green, Blue/Red and Blue/Green + Blue/Red) to explore the uncertainty of band combination in bathymetry estimation under different water turbidity conditions.

Non-Linear Regression Model
Machine learning approaches, such as the support vector regression (SVR) model and the random forest regression (RFR) model, serve as alternatives to the standard linearregression and band-ratio methods to characterize the nonlinear relationship between the spectral information and water depth [42]. Similar to the LBM model, three band combinations (i.e., Blue/Green, Blue/Red, and Blue/Green + Blue/Red) of S2 and L8 were used as predictors for SVR and RFR training, respectively. For the SVR model, the three different spectral band combinations were used to create a high-dimensional feature space, then a radial basis function kernel was implemented to construct a regression hyper plane in this high-dimension space with cross validation [9,18]. For the RFR model, the band combinations were used as the predictors and the water depth was used as the dependent variable, and then the ensemble learning method that combined predictions from multiple random trees were used to generate a more accurate prediction of bathymetry [24,42].

Accuracy Assessment
To avoid overfitting, we divided the ground samples (667 depth sounding points in Figure 1b) into two sub-datasets (70% for model training, 30% for validation). The coefficient of determination (R 2 ), the Root Mean Square Error (RMSE), and the Mean Absolute Error (MAE) were used to evaluate the bathymetry estimation. The metrics are calculated by where D SDB is the depth derived by satellite data, n is the number the validation points, D BF is the field points depth and D BF is the mean value. In addition, the Kling-Gupta efficiency (KGE), which measures the mean and distribution of estimates relative to the observations, was also adopted. KGE ranges from negative infinity to unity. The closer KGE is to unity, the better the model performs.
where µ E and µ o are the mean values of the simulations and observations, σ E and σ o are the respective standard deviations.

Bathymetry Estimates with/without Clustering
The SDB estimates with and without clustering from L8 and S2 satellite imagery in 2019 and 2020 using the four approaches considering different band combinations (e.g., B/G, B/R, B + G + R or B/G + B/R) are shown in Tables 2 and 3. The clustering-based SDB estimation outperformed the no-clustering ones in both 2019 and 2020. In details, using the L8 data, the RMSE and MAE decreased by 20.80% and 24.27% on average, and the R 2 and KGE increased by 0.11 and 0.15 on average with clustering compared with the no-clustering results. Similarly, compared with the no-clustering results using S2 data, the RMSE and MAE decreased by 10.41% and 11.21% on average, and the R 2 and KGE increased by 0.14 and 0.03 on average when clustering was performed. The estimation using three bands (i.e., B + G + R or B/G + B/R) outperformed all the two-band combinations in SDB in both 2019 and 2020, with or without a priori clustering. With a priori clustering, the RMSE using three-band combinations ranged from 2.27 m to 3.21 m for L8 and from 2.35 m to 3.64 m for S2 in 2019, and the MAE ranged from 1.45 m to 2.39 m for L8 and from 1.54 m to 2.63 m for S2. The correlations with observations were significant across all SDB models, with R 2 ranging from 0.92 to 0.85 for L8 and from 0.91 to 0.78 for S2. The KGE metric was also high, which ranged from 0.66 to 0.95 for L8 and from 0.83 to 0.91 for S2. The statistical metrics in 2020 were similar to those in 2019.

Bathymetry Estimates in Different Clusters
To evaluate the model performance for water with different spectral features, the SDB estimation results were further compared among different clusters (Figure 3). Similar to the results without clustering (Tables 2 and 3), the three-band combinations (B + G + R or B/G + B/R) also outperformed all the two-band combinations in the three clusters (included in Tables S2 and S3 in the supplementary materials). Meanwhile, with a priori clustering for L8 images, the non-linear SVR and RVR models performed better than the LBM and BRM models with an average decrease of 11.37% (45.74%) in RMSE and 23.26% (37.33%) in MAE, as well as an increase of 0.02 (0.12) in R 2 and 0.11 (0.14) in KGE in 2019 (2020), compared to the LBM and BRM models. Similarly, with a priori clustering for S2 images, the non-linear SVR and RVR models also performed better than the LBM and BRM models with an average decrease of 19.59% In addition, the largest estimation errors tended to occur in the same areas for all models using S2 and L8 data in 2019 (Figures 4 and 5) and 2020 (included in Figures S1  and S2 in the supplementary material). Specifically, most of the large retrieval differences (>±3 m) were found in areas with depth up to 20 m in the northeast and southwest of the PRD coast. In contrast, to the southeast of the PRD coast where the water is mostly deeper than 25 m, most of the estimation difference is small (<±1 m). Meanwhile, the SDB estimates were closer to the surveyed water depth when clustering was performed a priori. For example, many areas exhibited a large estimation difference with surveyed data (>±3 m) in Figure 4d1-d4 and Figure 5d1-d4 without area clustering, which were reduced to smaller areas with area clustering in Figure 4 f1-f4 and Figure 5 f1-f4, respectively. In addition, the largest estimation errors tended to occur in the same areas for all models using S2 and L8 data in 2019 (Figures 4 and 5) and 2020 (included in Figures S1  and S2 in the supplementary material). Specifically, most of the large retrieval differences (>±3 m) were found in areas with depth up to 20 m in the northeast and southwest of the PRD coast. In contrast, to the southeast of the PRD coast where the water is mostly deeper than 25 m, most of the estimation difference is small (<±1 m). Meanwhile, the SDB estimates were closer to the surveyed water depth when clustering was performed a priori. For example, many areas exhibited a large estimation difference with surveyed data (>±3 m) in Figure     The bathymetry differences between models and surveyed data without clustering; (e1)-(e4) S2 derived bathymetry based on the four models with clustering; (f1)-(f4) bathymetry differences between models and surveyed data with clustering.

Performance of Different Estimation Strategies
For a four-band multi-spectral satellite image (e.g., S2 with 10 m resolution, L8 with 30 m resolution), there are more than 20 possible combinations of optical band ratios or bands [33,43], the determination of the most effective band combination is thus crucial for SDB estimation [30]. In this study, multi-spectral bands which can effectively penetrate the water column were combined (e.g., B/G, B/R, B + G + R or B/G + B/R) and utilized. Their capabilities in retrieving water and underwater reflectance were evaluated based on different empirical retrieval models. The near infrared bands, which cannot effectively penetrate the water column, were not considered. Our results show that the three-band combinations (B + G + R or B/G + B/R) always outperform all the two-band combinations despite of the data sets used. With the four regression models, L8 and S2 demonstrated similar capacities in bathymetry mapping. For example, the linear regression models (i.e., LBM, BRM) using S2 images showed smaller and more consistent prediction difference (lower RMSE and MAE, higher R 2 and KGE) compared to models using L8 images. In contrast, non-linear regression models (i.e., SVR, RFR) tended to work better for L8 images than for S2 images, with the R 2 and KGE increased by 0.06 and 0.11 respectively on average and the RMSE and MAE decreased by 10.91% and 3.61% respectively on average.
Meanwhile, when we applied the LBM model to S2 images, the spectral parameters of deep water were obtained from L8 images due to the data availability of S2 images. These parameters performed well for the S2 data, suggesting that the combined use of S2 and L8 information does not introduce large estimation errors.
In addition, it should be noted that the swath of S2 (290 km) is wider than that of L8 (185 km) with some variation in the view angle [30,44]. This offers S2 the capacity to observe a larger area simultaneously compared to L8, which can be particularly advantageous over large coastal regions. The 10 m resolution further enables S2 to better depict benthic features, such as sand, rock, or the coral structure at finer scales than L8. Finally, the short revisit interval (5-day revisit) provided by the S2A and S2B twin satellites provides potential for the evaluation of short-term bathymetry variations.

Influential Factors on the Spectral Information
In order to evaluate the influence of water clarity on the model performance, bathymetrical data were extracted from six transects to compare the performance of the four models under different bottom conditions ( Figure 6). The transect lengths range from 10 km to 28 km, with the surveyed depth up to 15 m. The transects span across typical coastal land use types such as aquaculture areas, farmland, tourism areas and sand mining areas, and the estimated water depth was compared with the interpolated bathymetry from the in-situ samplings.
As shown in Figure 7, the estimated water depth along the transects using S2 (Figure 7a-f) and L8 (Figure 7g-l) data showed similar trends. The estimation differences with observations were generally small nearshore, and large errors were mostly seen in the deep-water regions (>10 m). Different coastal land use types affected the estimation accuracy in the nearshore areas. For example, the SDB estimation difference of nearshore mining areas (B' in Figure 7b,h) was larger than that of the other land use type areas, ranging from 3 m to 5 m for both L8 and S2 images. In contract, for bathymetry estimates close to nearshore aquaculture areas (B in Figure 7b,h), marine tourism (C in Figure 7c,i) and farmland (D in Figure 7d,j and E in Figure 7e,k) areas, the estimation differences were less than 1 m using both S2 and L8 data. These results suggested that the coastal water quality, which is related to land use types may have an influence on the bathymetry estimation.
It is worth noting that the estimation difference of nearshore aquaculture areas (E' in Figure 7k) from S2 images was higher than that of aquaculture areas from L8 images, and that the SDB estimation differences based on LBM and BRM were more than 5 m while the estimation differences based on SVR and RFR were less than 3 m. The spectral mixture problem of L8 and S2 may also contribute to the SDB retrieval uncertainty. The pixel-based spectrum observed by L8 or S2 was the result of incoming light interacting with all the objects within the pixel, and the multitude of interactions may affect the observed spectrum [45]. The bathymetry samples collected in this study may also be a mixture with different physical and optical characteristics. To solve this mixture problem, nonlinear regression models (e.g., SVR and RFR) combined with the spectral band weighting tech-niques were used to measure the spectral covariance or correlation matrix in the regression models. Therefore, the two nonlinear regression models (SVR and RFR) both outperformed the linear regression models (LBM and BRM) in all the cross-shore transects. As shown in Figure 7, the estimated water depth along the transects using S2 (Figure  7a-7f) and L8 (Figure 7g-7l) data showed similar trends. The estimation differences with observations were generally small nearshore, and large errors were mostly seen in the deep-water regions (>10 m). Different coastal land use types affected the estimation accuracy in the nearshore areas. For example, the SDB estimation difference of nearshore mining areas (B' in Figure 7b,h) was larger than that of the other land use type areas, ranging from 3 m to 5 m for both L8 and S2 images. In contract, for bathymetry estimates close to nearshore aquaculture areas (B in Figure 7b,h), marine tourism (C in Figure 7c,i) and farmland (D in Figure 7d,j and E in Figure 7e,k) areas, the estimation differences were less than It is worth noting that the estimation difference of nearshore aquaculture areas (E' in Figure 7k) from S2 images was higher than that of aquaculture areas from L8 images, and that the SDB estimation differences based on LBM and BRM were more than 5 m while the estimation differences based on SVR and RFR were less than 3 m. The spectral mixture problem of L8 and S2 may also contribute to the SDB retrieval uncertainty. The pixelbased spectrum observed by L8 or S2 was the result of incoming light interacting with all the objects within the pixel, and the multitude of interactions may affect the observed spectrum [45]. The bathymetry samples collected in this study may also be a mixture with different physical and optical characteristics. To solve this mixture problem, nonlinear regression models (e.g., SVR and RFR) combined with the spectral band weighting techniques were used to measure the spectral covariance or correlation matrix in the regression models. Therefore, the two nonlinear regression models (SVR and RFR) both outperformed the linear regression models (LBM and BRM) in all the cross-shore transects.

Other Sources of Uncertainty
In shallow water (<6 m) regions, the retrieval results using L8 and S2 images both demonstrated a similar under-prediction pattern (surveyed water depth > retrieved water depth). This may relate to the influence of the observed seafloor reflectance [22], as the pixel value may be effected by seafloor types, variations in slope or aspect, among others. Such influence on reflectance is difficult to quantify, as has been reported by previous studies using the empirical regression models [44,46]. As the water depth increases (6 m~30 m), the retrieval results gradually shift to an over-prediction pattern (survey water depth < retrieval water depth). This trend has also been reported in other studies carried out near the Irish coast [46], and the influence of the seafloor may gradually decrease while

Other Sources of Uncertainty
In shallow water (<6 m) regions, the retrieval results using L8 and S2 images both demonstrated a similar under-prediction pattern (surveyed water depth > retrieved water depth). This may relate to the influence of the observed seafloor reflectance [22], as the pixel value may be effected by seafloor types, variations in slope or aspect, among others. Such influence on reflectance is difficult to quantify, as has been reported by previous studies using the empirical regression models [44,46]. As the water depth increases (6 m~30 m), the retrieval results gradually shift to an over-prediction pattern (survey water depth < retrieval water depth). This trend has also been reported in other studies carried out near the Irish coast [46], and the influence of the seafloor may gradually decrease while the influence of other factors relevant to water properties (e.g., suspended matter or chlorophyll) may increase. Meanwhile, many previous studies have also tested the capability of multispectral imagery for bathymetry estimation in regions up to 20 m deep [47,48]. In this study, to the southern part of study area where the average water depth is more than 20 m, the empirical models worked well (SDB differences < 3 m), but their performance was relatively poor (SDB differences > 3 m) in the southeastern part where the average water depth is more than 25 m. These differences may be attributed to the non-linear relationship between seafloor reflectance and water depth. The influence of the seafloor as well as the maximum light penetration depth on deep water depth retrieval should be explored further. SDB retrieval accuracy is also affected by the atmospheric correction. Here the surface reflectance products that have undergone atmospheric correction in GEE were adopted. The K-means algorithm was also used to account for the influence of spectral variability on SDB mapping. However, spectral libraries fully representing variability in depth, water quality, and bottom types are still lacking for accurate satellite-based bathymetry estimation. In addition, the 200 soundings samples used for validation were taken up to 2 years before the images were acquired. High levels of anthropogenic disturbance, especially reclamation efforts, may lead to uncertainties in the result evaluation. More timely sampling data covering a range of bottom types and water quality conditions may improve prediction accuracy as well as offer a more comprehensive model performance evaluation [46,49].
In addition, the fluctuations of tidal waves also lead to variations in the water depth. Many previous studies have established models to predict the height of tide from multiple harmonic constituents based on long records of tidal measurements. In this study, the in-situ bathymetry data were collected during periodic surveys, which were spatially and temporally sparse to estimate tidal waves. Consequently, the tidal waves cannot be specifically accounted for in this study. A long historical record from tidal gauging stations would enable the prediction of tidal height [50,51], which would further reduce the uncertainties in water depth retrieval.

Conclusions
In this study, satellite-based bathymetry algorithms were applied to Landsat 8 (L8) and Sentinel-2 (S2) imagery for bathymetry mapping of the nearshore waters in the Pearl River Delta. Four models based on linear, band ratio, and non-linear optimization were applied using the multi-spectral properties of the water column. To evaluate the influence of spectral bands used and areal spectral variability, different band combinations were tested with and without a priori area clustering. The model performance was evaluated using echo sounding data in autumn 2019 and 2020.
The results suggested that a priori area clustering based on the spectral features using the K-means algorithm improved bathymetry estimation accuracy compared to the noclustering case. Among the different models and band combinations tested, the three-band combinations (red, green, blue) performed better than all the two-band combinations, and the non-linear models (SVR, RFR) outperformed the linear (LBM) and the band-ratio (BRM) models. Results using the non-linear models with three-band combination for L8 and S2 images yielded an average decrease of 23.10% (35.53%) in the RMSE (MAE), and an average increase of 0.08 (0.09) in the R 2 (KGE), compared to those from the linear regression models. Overall, S2 and L8 data showed comparable performance in bathymetry mapping, despite the differences in spatial resolution and band wavelength.
This study presented a comprehensive evaluation of the utility of different methods and datasets for bathymetry mapping, which can serve as a reference for future research design in aspects such as data source selection, band combination and data pre-processing. It proved Google Earth Engine to be an efficient, flexible and easily accessible tool for bathymetry mapping, which can be used over global coastal areas and inland water bodies. The results also implied that S2 and L8 contained complementary information and may be used together, which provided potential for the combined use of multi-source remote sensing data for bathymetry mapping.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.339 0/rs13163123/s1, Table S1: The average water depth across different clustering zones, Table S2: Statistical analysis of bathymetry mapping for 2019, Table S3: Statistical analysis of bathymetry mapping for 2020, Figure S1: Bathymetric maps of the study area for S2 in 2020, Figure S2: Bathymetric maps of the study area for L8 in 2020.
Author Contributions: Conceptualization and writing, C.W. and Y.L.; methodology and validation, Q.Z.; data analysis, D.F. All authors have participated in the manuscript revision. All authors have read and agreed to the published version of the manuscript.