remote sensing Coastal Bathymetry Estimation from Sentinel-2 Satellite Imagery: Comparing Deep Learning and Physics-Based Approaches

Abstract: The ability to monitor the evolution of the coastal zone over time is an important factor in coastal knowledge, development, planning, risk mitigation, and overall coastal zone management. While traditional bathymetry surveys using echo-sounding techniques are expensive and time consuming, remote sensing tools have recently emerged as reliable and inexpensive data sources that can be used to estimate bathymetry using depth inversion models. Deep learning is a growing field of artificial intelligence that allows for the automatic construction of models from data and has been successfully used for various Earth observation and model inversion applications. In this work, we make use of publicly available Sentinel-2 satellite imagery and multiple bathymetry surveys to train a deep learning-based bathymetry estimation model. We explore for the first time two complementary approaches, based on color information but also wave kinematics, as inputs to the deep learning model. This offers the possibility to derive bathymetry not only in clear waters as previously done with deep learning models but also at common turbid coastal zones. We show competitive results with a state-of-the-art physical inversion method for satellite-derived bathymetry, Satellite to Shores (S2Shores), demonstrating a promising direction for worldwide applicability of deep learning models to invert bathymetry from satellite imagery and a novel use of deep learning models in Earth observation.


Introduction
Coastal areas are under a constant multitude of pressures resulting from different natural forces. The ability to reliably track and measure the nearshore bathymetry over time is critical for a wide array of applications including coastal development and management, coastal risk monitoring and mitigation, coastal science studies, among others [1,2]. Traditional in situ bathymetric measurements using echo-sounding or Light Detection and Ranging (LiDAR) are time-consuming and expensive [3] and are preconditioned on a number of environmental factors such as the navigability of the site to be surveyed [4], in addition to a multitude of logistical constraints [5,6].
Remote sensing tools have recently become an important means of collecting different types of data that allow the monitoring of coastal areas [7,8]. These tools differ in their temporal frequency and spatial coverage. Shore-based or drone-mounted video cameras frequently provide high-resolution imagery, but with a spatially limited coverage [9][10][11]. On the other hand, satellite constellations such as the European Space Agency's (ESA) Sentinel-2 satellite constellation provide high resolution (10 m) imagery with global coverage at a relatively high revisit frequency (every 5 days with Sentinel-2) [12,13]. These remotely sensed satellite products have been shown to be a valuable resource in a wide variety of coastal science studies and applications. For example, a large body of work exists on the use of ocean color data to quantify water quality parameters [14][15][16]. Methods making use of satellite imagery to estimate water depth can be divided into two categories based on the target phenomena studied: the effect of bathymetry on the propagation and dispersion of surface waves (wave kinematics), and the relation between water depth and light penetration and reflectance in water (water color). Methods based on the radiative transfer of light in water as a function of depth and wavelength (i.e., color-based methods) can be used to estimate depth in optically shallow waters [17][18][19][20][21][22]. Such methods are sensitive to the optical properties of seawater and are generally limited to clear and non-turbid waters [1,23]. Other methods based on wave kinematics extract wave features from satellite imagery such as the wave phase shift and wave number to estimate depth using the linear dispersion relation [24] (described in more detail in Section 2). Both approaches offer different advantages.
Methods based on the radiative transfer of light in water are more accurate in shallow waters (up to 15 m depth) and are able to detect smaller-scale bathymetric features, with an absolute error on the order of 10-20% of the target value and an average RMSE of 1.5 m [25][26][27]. On the other hand, wave kinematics-based approaches are preconditioned on the observability of wave patterns in the input imagery. Their detectable depth range is significantly larger than the typical range of color-based methods [20], but they are less accurate when applied globally (RMSE between 6 and 9 m [23]). The task of constructing a depth estimation function applicable to satellite data is non-trivial and remains a topic of ongoing research due to the great potential it offers to inexpensively monitor coastal morphodynamics at a large scale.
Machine learning has been applied to satellite-derived bathymetry to automatically learn an estimation function, bringing great expectations to solve satellite-based bathymetry issues in areas of complex physics and environmental parameters. Early works made use of multi-layered perceptrons to estimate water depth as a function of spectral radiance in input satellite imagery [28]. Other works that make use of more traditional machine learning algorithms include [29], where support vector machines are used to estimate depth based on a transformed ratio between the blue and green bands of the National Aeronautics and Space Administration's (NASA) EO-1 satellite imagery. In [22], random forests are used to analyze several Landsat 8 surface reflectance products over a specific site in order to create a map of the bathymetry in shallow waters (0 to 20 m).
Recently, numerous Earth observation and remote sensing applications have adopted deep learning (DL) methods using convolutional neural networks (CNN), due to their image processing and feature analysis abilities [30][31][32][33]. The use of DL for bathymetry estimation is a recent and growing application; Ref. [34] estimates river bed topography from depth-averaged flow velocity observations. Both [35,36] use aerial imagery to estimate water depth, on the surf zone of Duck, North Carolina (NC) and the floodplain of the Lech river, respectively. The use of DL on satellite products for bathymetry estimation is relatively unexplored but presents an opportunity for global, low-cost bathymetry estimation. In [37], DL is used to estimate seabed depth based on the radiative transfer of light in water in multispectral images from the Orbview-3 satellite. In [38], a CNN is used to estimate depths of the Devils Lake Area (ND, USA), casting estimation as a classification problem with classes at each foot of depth. The most convincing application of deep learning to coastal satellite-derived bathymetry (SDB) currently appears to be from [39], which uses reflectance values from Sentinel-2 Level 2A images to estimate coastal water depth with high precision in clear waters (1.48 m RMSE). While machine learning and deep learning applications for satellite-derived bathymetry have, until now, mainly been applied to color-based approaches, great expectations come from the combination of different methods, in particular those based on wave information [40,41].
To our knowledge, this work's contribution of DL for satellite-derived coastal bathymetry based on wave kinematics from real satellite products is novel.
In this work, we apply the previously developed Deep Single-Point Estimation of Bathymetry (DSPEB) method [42] and showcase its ability to reconstruct bathymetry using real-world data. We create a supervised dataset for bathymetry inversion using publicly available Sentinel-2 imagery [12] and a number of bathymetry surveys obtained from the French Naval Hydrographic and Oceanographic Service (SHOM). We present two different satellite image pre-processing techniques to augment wave kinematics and color information as inputs to two DSPEB models. We train our models in two different sites and compare the performance of color-based and wave kinematics-based DSPEB to one of the current state-of-the-art bathymetry inversion models based on wave kinematics, Satellite to Shores (S2Shores) [24,43]. The layout of this article is as follows. In Section 2, we present the DSPEB and S2Shores approaches for SDB. We then describe our dataset creation methodology and the datasets used in our experiments. In Section 3, we present our results and compare our DSPEB models to the physical method's performances in two application sites. Section 4 concludes the work with a discussion of the results presented, highlighting possible further research paths. In the appendices, we include additional supporting results from the different models presented in the article, our first steps towards a hybrid approach to SDB using deep learning that makes use of the physical characteristics of surface waves in addition to water color to estimate depth, in addition to a full list of the Sentinel-2 images used in order to facilitate reproducibility.

Data and Methods
In this section, we first present the DSPEB approach for bathymetry estimation using deep learning, as well as the physics-based method S2Shores, which we use as a reference for comparison. We then describe our data setup and input data pre-processing for both wave kinematics-based and color-based DSPEB. We give an overview of the sites used in this study and the final supervised datasets used to train our DSPEB models. Finally, we present a summarized description of the functioning of DSPEB as a complete workflow.

Deep Single-Point Estimation of Bathymetry
Deep neural networks are a family of algorithms that are inspired by and modeled after the human brain. These networks are trained to approximate a mapping between inputs and outputs that minimizes an objective function. Training such networks is done through Stochastic Gradient Descent (SGD); a network's prediction error is calculated according to the objective (loss) function over a batch of training samples and is then propagated backward through the different layers of the network using backpropagation, where an optimizer (SGD) is responsible for updating the different parameters of the network. This process is repeated over multiple iterations of the available data and is stopped according to varying criteria. As part of the optimization process, a learning rate is employed to control the scale of weight updates that are done at each step.
The Deep Single-Point Estimation of Bathymetry method [42] is a deep learning-based bathymetry inversion method that operates on 40 × 40 × 4 px multi-spectral input subtiles (corresponding to the blue (B2), green (B3), red (B4), and near-infrared (B8) bands of the Sentinel-2 satellite constellation [12] at 10 m resolution) to estimate the water depth corresponding to the center of each input subtile. The neural network takes a 40 × 40 px image with 4 input channels, conforming to our dataset of 40 × 40 × 4 satellite subtiles, and has a single output neuron with a Rectified Linear Unit (ReLU) activation, corresponding to the average depth beneath the imaged area. Figure 1 presents the different steps of the DSPEB method. The deep learning parameters for DSPEB, including the choice of the model architecture and learning hyperparameters, were studied in a previous work [42]. We found that while networks of varying depth can perform bathymetry estimation, small convolutional neural networks (CNN) are sufficient. The chosen architecture for this work is ResNet20 [44], a small version of a residual architecture that achieves state-of-the-art performance on many computer vision tasks [45]. The network is trained using Adam [46], a standard CNN optimization method based on stochastic gradient descent (SGD). In this work, the hyperparameters of Adam (the learning rate lr, the gradient estimate decay factors β₁ and β₂, and ε) are optimized based on a grid search and findings from our previous work [42].

Satellite to Shores
We compare our deep learning-based approach (DSPEB) to a wave kinematics-based depth inversion model named Satellite to Shores (S2Shores) [24,47]. S2Shores employs a Fourier slicing method (FS), consisting of a combined Radon transform (RT) and a discrete Fourier transform (DFT). The FS technique is used to detect spectral wave characteristics such as the spectral wave phase shift and the wave number to invert water depth using the linear dispersion relation for free surface waves (2). The depth estimation procedure is repeated for each sub-window around a point where one wants to know the depth (h). Each sub-window has a user-defined size on the order of hundreds of meters, such that it contains at least 1-2 wavelengths (λ). The Radon transform is applied to the sub-sampled image to produce a sinogram of integrated pixel intensities per direction. The angle corresponding to the maximum variance in the RT-sinogram corresponds to the wave direction (see [24,47] for more details). A 1D DFT procedure per direction over the sinogram makes it possible to pass from the spatial domain to a complex spectral domain in polar space. From the resulting polar spectrum, the wave phase and amplitude can be determined per wave number, per direction. The difference in phase (∆Φ) can be found between (several) pairs of detector bands. Presuming that the wavenumber (k) is constant or near-constant over the sub-window, ∆Φ can be seen as representative of the angular frequency ω, and given that the timing between the different detector bands (∆t) is constant, the wave celerity (c) can be determined as:

$$c = \frac{\omega}{k} = \frac{\Delta\Phi}{k\,\Delta t}, \qquad (1)$$

where k = 2π/λ is taken as the angular wavenumber. Celerity and wavenumber are related to depth through the linear dispersion relation:

$$c^{2} = \frac{g}{k}\tanh(kh), \qquad (2)$$

where g is the gravitational acceleration. For each wavenumber-celerity pair, (2) can be solved for depth.
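The final inversion step can be illustrated with a minimal sketch. It assumes that k is the angular wavenumber (rad/m), that the phase shift ∆Φ is given in radians, and that depth follows from the standard linear dispersion relation c² = (g/k) tanh(kh). The function names are hypothetical and are not part of the S2Shores code base.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def celerity_from_phase_shift(delta_phi, k, delta_t):
    """Wave celerity c = omega / k, with omega = delta_phi / delta_t.

    delta_phi is in radians, k in rad/m, and delta_t in seconds.
    """
    return delta_phi / (k * delta_t)


def depth_from_dispersion(c, k, g=G):
    """Invert c^2 = (g / k) * tanh(k * h) for the depth h in meters."""
    ratio = c * c * k / g
    if not 0.0 < ratio < 1.0:
        # A ratio of 1 or more corresponds to the deep-water limit, where the
        # wave is insensitive to the bottom and no finite depth is recoverable.
        return float("nan")
    return math.atanh(ratio) / k


# Round trip: build a celerity from a known depth, then invert it again.
k = 2.0 * math.pi / 100.0            # 100 m wavelength
true_depth = 10.0
c = math.sqrt(G / k * math.tanh(k * true_depth))
recovered_depth = depth_from_dispersion(c, k)
```

The guard on the ratio reflects a general property of the relation rather than a documented S2Shores behavior: as the ratio approaches 1, tanh saturates and small errors in c translate into large errors in h.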
Estimates of water depth, wave celerity, wavenumber (wavelength), and direction are output by the S2Shores algorithm at each point on an output grid with a resolution of 500 m. To evaluate and compare S2Shores to the survey data and the DSPEB results, we use linear interpolation of the raw sparse output grid.

Sentinel-2 Data Pre-Processing
For this work, we apply our DSPEB method to two different types of inputs, corresponding to the wave kinematics-based DSPEB approach (W-DSPEB) and color-based DSPEB (C-DSPEB). Throughout the article, we make a distinction between two types of information included in raw satellite imagery over coastal areas. Signals and information corresponding to ocean waves are referred to as "wave kinematics information". These signals are pre-processed (described further in this section) and used as inputs to the W-DSPEB model. The remaining signals, termed "color information", are presumed to represent ocean water color as affected by the optical properties of water, water constituents, seabed reflectance and depth. These signals are filtered and used as inputs to the C-DSPEB model.
The inputs to both neural networks are 40 × 40 × 4 px satellite images. We noted that model training was sensitive to different dates with varying meteorological conditions. To construct a training dataset, we select dates based on high estimation correlation from the S2Shores method, which depends on visible waves. We provide a list of all images used in this work in Appendix A. While the focus of this work is to build on our previous work on DSPEB, a less expensive and more general date selection method is a topic of ongoing work.
The first step in our pre-processing workflow is cloud detection. We make use of a simple cloud detector based on the percentage of blue pixels in each subtile by looking at the RGB Sentinel-2 bands. We discard all subtiles where the percentage of blue pixels is less than 80%, allowing for a margin of noise in the input data.
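A simple version of this check can be sketched as follows. The text does not specify how a pixel is judged to be blue, so the sketch assumes a pixel counts as blue when its blue-band value exceeds both its red and green values. That criterion and the function names are illustrative assumptions only.

```python
def blue_pixel_fraction(red, green, blue):
    """Fraction of pixels whose blue value is strictly the largest of R, G, B.

    Each argument is a 2D list (rows of pixel values) for one band.
    """
    total = 0
    blue_count = 0
    for r_row, g_row, b_row in zip(red, green, blue):
        for r, g, b in zip(r_row, g_row, b_row):
            total += 1
            if b > r and b > g:  # assumed definition of a "blue" pixel
                blue_count += 1
    return blue_count / total if total else 0.0


def keep_subtile(red, green, blue, min_blue_fraction=0.8):
    """Keep a subtile only if at least 80% of its pixels are blue."""
    return blue_pixel_fraction(red, green, blue) >= min_blue_fraction


# Toy 2 x 5 subtile: nine water-like pixels and one bright, cloud-like pixel.
red = [[10, 10, 10, 10, 90], [10, 10, 10, 10, 10]]
green = [[20, 20, 20, 20, 90], [20, 20, 20, 20, 20]]
blue = [[40, 40, 40, 40, 85], [40, 40, 40, 40, 40]]
mostly_clear = keep_subtile(red, green, blue)
```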
We apply a pass-band filter to our input subtiles in the range of ocean-specific wavelengths (periods T min = 5 s to T max = 25 s). First, we create a frequency filter based on T min and T max . Then, a discrete FFT is applied in two dimensions to the signals of each Sentinel-2 subtile band. The original filter is then used to filter the resulting frequencies, discarding all wave signals with periods outside of the specified range. For W-DSPEB, we further process our filtered subtiles by calculating the two-dimensional normalized cross-correlation (NORMXCORR) of each band, in order to extract the most consistent and recurring wave signals, which we presume correspond to the crests of actual ocean waves. For C-DSPEB, we subtract the filtered signals from the raw input image in order to retain the background color rather than the ocean wave signals. Figure 2 demonstrates our pre-processing workflow on an example 400 × 400 m Sentinel-2 subtile. For C-DSPEB, we scale all input images such that the minimum and maximum pixel values over the full dataset are equal to −0.9 and 0.9, respectively.
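The last two C-DSPEB steps, subtracting the band-passed wave signal and rescaling over the full dataset, can be sketched as below. The band-pass filter itself is taken as given. The function names are hypothetical, and the linear min-max mapping is an assumption about how the scaling to −0.9 and 0.9 is carried out.

```python
def color_residual(raw, filtered):
    """Subtract the band-passed wave signal from the raw band, pixel by pixel."""
    return [
        [p - q for p, q in zip(raw_row, filtered_row)]
        for raw_row, filtered_row in zip(raw, filtered)
    ]


def scale_dataset(images, low=-0.9, high=0.9):
    """Linearly rescale so the dataset-wide minimum and maximum map to low and high.

    images is a list of 2D lists; the extrema are computed over all of them,
    not per image.
    """
    flat = [p for image in images for row in image for p in row]
    vmin, vmax = min(flat), max(flat)
    if vmax == vmin:
        raise ValueError("a constant dataset cannot be rescaled")
    scale = (high - low) / (vmax - vmin)
    return [
        [[low + (p - vmin) * scale for p in row] for row in image]
        for image in images
    ]


# Two toy single-band "images" of shape 1 x 2.
residual = color_residual([[3.0, 8.0]], [[3.0, 3.0]])   # wave signal removed
scaled = scale_dataset([residual, [[10.0, 2.5]]])
```

Computing the extrema over the whole dataset, rather than per subtile, keeps relative brightness differences between subtiles, which is presumably the information a color-based model relies on.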

Study Sites
In this work, we apply and test our methods in French Guiana and in the Gironde area in France. For each site, we select a number of Sentinel-2 images according to cloud coverage and the visibility of wave patterns. A full list of images used and the wave conditions on those dates are provided in Appendix A. Figure 3 shows the measurement area and depth distribution of the surveys used in French Guiana and Gironde. To create a supervised dataset for each site, the raw bathymetry survey is coupled with a set of Sentinel-2 images with varying wave conditions. The resulting subtiles and their corresponding depths are grouped into training and validation sets. For the final datasets, we make use of points with depth values ranging between 2 and 40 m only. The distributions of depths used in the training and validation sets are shown in Figure 4. By including Sentinel-2 images from different dates in the training and validation sets, we expose the DSPEB models to a wide variety of wave conditions during training, reducing the models' ability to overfit to any specific conditions. The distributions of wave period, wavelength, and the direction of propagation on the dates used for this study are documented in Figure 5. To test our models, we create a test set for each site. In French Guiana, we use the south-eastern section of the raw bathymetry survey as our target area, and we collect Sentinel-2 images from six different dates in 2018 to create the inputs to the models during application. For Gironde, the whole area is reconstructed over four different 2018 dates.

Application Workflow
This section presents a summarized overview of the functioning of DSPEB as a complete workflow, going from raw Sentinel-2 imagery and raw bathymetry measurements to model application and the creation of composite estimates. Figure 6a shows the first steps of DSPEB, where the exact dates for model training are selected. The first step in our workflow is date preselection, where a set of Sentinel-2 images is filtered and only images in which wave propagation and activity can be observed are kept. As mentioned in Section 4, our current date selection criterion is based on S2Shores. Each of the initial images is used with S2Shores and the resulting bathymetry estimate is compared to the target bathymetry survey. The images are only kept in our pipeline if the correlation between S2Shores' estimate and the target survey is greater than or equal to 0.5. Next, a subtile is extracted from each of these images for each depth measurement point, such that the depth point is situated in the center of the extracted subtile. These subtiles are then passed through our pre-processing chain, described in Section 2.3, and the resulting subtiles are recorded on disk to be used for model training. A test dataset is created following the same methodology using a different set of initial Sentinel-2 images with different dates.
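The two dataset-construction steps described above, date preselection by correlation and subtile extraction around each depth point, can be sketched as follows. The function names are hypothetical. The sketch works on a single band stored as a 2D list, and skipping points too close to the image border is an assumption, not a documented choice.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return sxy / math.sqrt(sxx * syy)


def select_dates(estimates_by_date, survey_depths, min_r=0.5):
    """Keep dates whose reference estimate correlates with the survey (r >= 0.5)."""
    return [
        date
        for date, estimate in estimates_by_date.items()
        if pearson_r(estimate, survey_depths) >= min_r
    ]


def extract_subtile(image, row, col, size=40):
    """Cut a size x size window around (row, col), or None near the border.

    With an even size there is no exact center pixel; here the depth point
    sits at index size // 2 along each axis of the window.
    """
    half = size // 2
    top, left = row - half, col - half
    if top < 0 or left < 0 or top + size > len(image) or left + size > len(image[0]):
        return None
    return [image_row[left:left + size] for image_row in image[top:top + size]]


survey = [2.0, 5.0, 9.0, 14.0]
kept = select_dates({"date_a": [3.0, 4.0, 10.0, 12.0], "date_b": [12.0, 9.0, 5.0, 1.0]}, survey)
image = [[r * 100 + c for c in range(50)] for r in range(50)]
subtile = extract_subtile(image, 25, 25)
```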
The DSPEB model is then trained on the prepared training dataset following the method further described in Section 3.1. After model training, the DSPEB model is used to estimate bathymetry from a Sentinel-2 image using a sliding-window technique (Figure 6b), where each subtile is treated according to the previously described pre-processing scheme (Section 2), resulting in an estimate profile from a single date. Finally, a composite estimate can be created by grouping multiple single-date estimated profiles using a simple pointwise mean (Figure 6c).
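The application stage can be sketched as a sliding window followed by a point-wise mean over dates. The `model` argument below stands in for a trained DSPEB network and is a hypothetical callable that maps one pre-processed window to one depth value. The stride parameter is also an assumption, since the text does not state the step between windows.

```python
def sliding_window_estimate(image, model, size=40, stride=1):
    """Apply `model` to every full size x size window of a 2D image.

    Returns a 2D grid with one depth estimate per window position.
    """
    n_rows, n_cols = len(image), len(image[0])
    grid = []
    for top in range(0, n_rows - size + 1, stride):
        out_row = []
        for left in range(0, n_cols - size + 1, stride):
            window = [row[left:left + size] for row in image[top:top + size]]
            out_row.append(model(window))
        grid.append(out_row)
    return grid


def composite_mean(grids):
    """Point-wise mean over several single-date estimate grids of equal shape."""
    n = len(grids)
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    return [
        [sum(grid[i][j] for grid in grids) / n for j in range(n_cols)]
        for i in range(n_rows)
    ]


def toy_model(window):
    """Stand-in for a trained network: returns the mean pixel value."""
    values = [p for row in window for p in row]
    return sum(values) / len(values)


date_1 = [[1.0, 1.0, 3.0, 3.0], [1.0, 1.0, 3.0, 3.0]]
date_2 = [[3.0, 3.0, 5.0, 5.0], [3.0, 3.0, 5.0, 5.0]]
single_date = [sliding_window_estimate(img, toy_model, size=2, stride=2) for img in (date_1, date_2)]
composite = composite_mean(single_date)
```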

Research and Results
This section evaluates and compares the performances of wave kinematics-based DSPEB (W-DSPEB), color-based DSPEB (C-DSPEB), and S2Shores. In the following, we analyze our results based on two different criteria. First, Section 3.2 compares the performance of the models in reconstructing bathymetry using a single Sentinel-2 image (single date) as input. Section 3.3 then compares the different models based on aggregate (composite) estimates, which are created by calculating the point-wise mean over all test dates (six for French Guiana and four for Gironde).
The metrics used to evaluate the predictions of each of the models are the root mean squared error (RMSE), the Pearson correlation coefficient (r), the concordance correlation coefficient (CCC) [48], and the slope of the predictions compared to the target depths.
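For reference, the four metrics can be computed as in the sketch below. It assumes the concordance correlation coefficient follows Lin's definition with population (1/n) moments, and that the slope is the ordinary least-squares slope of predictions regressed on target depths. The exact slope definition is not stated in the text, so that part is an assumption.

```python
import math


def evaluation_metrics(pred, target):
    """Return RMSE, Pearson r, concordance correlation (CCC), and OLS slope."""
    n = len(pred)
    mean_p, mean_t = sum(pred) / n, sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target)) / n
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / n)
    return {
        "rmse": rmse,
        "r": cov / math.sqrt(var_p * var_t),
        # CCC penalizes both scatter and systematic offset from the 1:1 line.
        "ccc": 2.0 * cov / (var_p + var_t + (mean_p - mean_t) ** 2),
        "slope": cov / var_t,
    }


target_depths = [2.0, 10.0, 20.0, 40.0]
biased_pred = [t + 1.0 for t in target_depths]  # perfect shape, 1 m offset
metrics = evaluation_metrics(biased_pred, target_depths)
```

In this toy case the Pearson correlation and the slope are both 1 despite the constant 1 m bias, while the RMSE and the CCC register it, which illustrates why correlation alone is not sufficient to judge a depth estimate.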

Model Training
We train the DSPEB models using the Adam optimizer [46], with mean squared error (MSE) loss and a batch size of 256. We optimize the learning process of W-DSPEB and C-DSPEB separately, using a simple grid-search procedure over a predefined set of values for the learning rate (lr) and Adam's hyperparameters (ε, β₁, and β₂) as described in our previous work [42]. The best-performing configurations were used to train the models used in this work. The lr, ε, β₁, and β₂ were respectively set to 1 × 10⁻⁴, 1 × 10⁻⁸, 0.99, and 0.999 for W-DSPEB, and to 1 × 10⁻³, 1 × 10⁻⁶, 0.5, and 0.9 for C-DSPEB. To stop the training, we make use of an early stopping mechanism that stops training if no improvement in performance on the validation set is achieved for 10 consecutive epochs, known as a patience value of 10. The learning curves of the trained models are presented in Figure 7, which shows the MSE training and validation losses of W-DSPEB and C-DSPEB on both study sites at each training step and demonstrates the difference in convergence speed between the two models due to the higher learning rate used for C-DSPEB. We note that W-DSPEB was unable to converge using larger learning rates. The training of W-DSPEB stops at 53 and 29 epochs in French Guiana and Gironde and achieves 4.2 and 5.9 m RMSE on the test set of each site, respectively. The training of the C-DSPEB model halts at 30 and 13 epochs and achieves 5.8 and 7.8 m RMSE on the test sets in French Guiana and Gironde, respectively.
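The early stopping rule can be illustrated in isolation with the sketch below, which replays a sequence of validation losses and reports the epoch at which training would halt. The function name is hypothetical, and treating only a strict decrease as an improvement is an assumption.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return the 1-based epoch at which training stops, or None if it never does.

    Training stops once `patience` consecutive epochs pass without the
    validation loss dropping below its best value so far.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Two improving epochs followed by a plateau of ten non-improving epochs.
stop_epoch = early_stopping_epoch([5.0, 4.0] + [4.5] * 10)
still_improving = early_stopping_epoch([3.0, 2.0, 1.0])
```

In practice the weights from the best validation epoch are often restored after stopping. Whether that is done here is not stated in the text.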

Single Date Estimation Comparison
In this subsection, we compare the performances of the wave kinematics-based DSPEB model (W-DSPEB), color-based DSPEB (C-DSPEB), and S2Shores based on their single-date estimates. All test images date to 2018. Six images are collected for the tests in French Guiana, and four images for Gironde. Figure 8 shows an example single-date reconstruction over each of the test sites.
As seen in Figure 8, we note that W-DSPEB outperforms S2Shores on RMSE and correlation in the French Guiana site. While S2Shores predicts shallow depths with high accuracy, W-DSPEB maintains a higher correlation in deeper waters. On the selected date in the Gironde site, we observe a scattered estimate from the W-DSPEB method which has an overall high correlation but is outperformed by S2Shores in terms of RMSE.
The performance of C-DSPEB varied greatly over different dates due to its sensitivity to background color, which we further discuss in the next section. The highest single-date RMSE of C-DSPEB over the selected dates is 15.26 m, whereas the lowest single-date RMSE of C-DSPEB in French Guiana is 4.33 m. We note that this lower RMSE is similar to the performance of other deep learning color-based methods, notably [39], which reports an RMSE of 3.03 m in San Juan for a single date. Ref. [39] notes that the performance of the color-based deep learning model highly depends on water turbidity, which we observe here in the difference of prediction between different dates.
We note that the training of both deep learning models C-DSPEB and W-DSPEB uses gradient estimates from mini-batches which can contain multiple dates together. The same point may therefore have multiple estimates attributed to it from different satellite images, and the gradient directions from these estimates will be averaged for the network update. We hypothesize that a training method focused on accuracy for a single date could reduce the variability of estimates for the same date.

Composite Estimation Comparison
We observed that all three models performed inconsistently over Sentinel-2 images from different dates, which motivated the use of a composite estimate from multiple dates. For each model, a composite profile is created by calculating a point-wise average over selected dates (six for French Guiana and four for Gironde). In this section, we compare the composite estimates of W-DSPEB, C-DSPEB, and S2Shores. Further detail is provided in Appendix B, including the point-wise absolute errors of the final estimates as well as the point-wise standard deviation of each method's single-date estimates. Figure 9 presents the composite estimate of each of the models at French Guiana (top) and Gironde (bottom). Compared to S2Shores and C-DSPEB, W-DSPEB achieves the lowest RMSE score over the entire bathymetry profiles in both test sites when a composite profile is considered. W-DSPEB also achieves a similar correlation to the S2Shores model on the French Guiana site and a much higher correlation on the Gironde site. We note that the use of a composite estimate appears to reduce outliers with high error in the two deep learning methods, but not in S2Shores. We assume that this is due to the batched training of the deep learning methods, which tends to improve the average estimate of the models over multiple dates, as previously mentioned. All three methods, including S2Shores, benefit from a composite estimate over multiple dates rather than a single-date estimate. The composite results of the three methods are analyzed further in Table 1, which shows that when using composite estimates, W-DSPEB outperforms S2Shores on almost all metrics for the two sites (correlation, RMSE, and standard deviation over individual estimates).
On two metrics, it is slightly worse but competitive with S2Shores: W-DSPEB has a correlation of 0.91 in French Guiana compared to 0.93 for S2Shores, and it has a slightly higher standard deviation over the 4 estimates from different dates in Gironde (4.01 versus 3.89). We note that W-DSPEB has an overall low temporal STD, which is averaged over the entire area, even in Gironde where single date estimates had high error.
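The area-averaged temporal standard deviation discussed here can be sketched as follows. The sketch assumes a population (1/n) standard deviation computed at each grid point over the single-date estimates and then averaged over all points. Whether a population or sample estimator is used is not stated in the text, and the function name is hypothetical.

```python
import math


def mean_temporal_std(grids):
    """Point-wise standard deviation over dates, averaged over the whole grid.

    grids is a list of 2D lists of equal shape, one per date.
    """
    n_dates = len(grids)
    point_stds = []
    for i in range(len(grids[0])):
        for j in range(len(grids[0][0])):
            values = [grid[i][j] for grid in grids]
            mean = sum(values) / n_dates
            variance = sum((v - mean) ** 2 for v in values) / n_dates
            point_stds.append(math.sqrt(variance))
    return sum(point_stds) / len(point_stds)


# Two dates over a 1 x 2 grid: the first point varies, the second is stable.
temporal_std = mean_temporal_std([[[1.0, 2.0]], [[3.0, 2.0]]])
```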
Compared to the wave kinematics-based methods, we observe a higher temporal variance from the color-based method C-DSPEB. In Figure 10, the sensitivity of C-DSPEB to background color change is evident as the prediction is highly influenced by turbidity. This variance due to turbidity has been noted in other deep learning approaches [39]. While other color-based methods may account for this variance, we consider it a strong argument for W-DSPEB over C-DSPEB. When using the DSPEB methods to estimate bathymetry over a large area, as seen in Figure 11, we note the stability of the W-DSPEB method. This correctly predicts water depth around the coastline and up to 35 m of depth. We note the limit at 40 m, as these models were trained with samples only up to 40 m, and therefore do not predict greater values even in deeper waters.
Figure 11. Reconstruction of a full Sentinel-2 tile in Gironde using composite estimates from (a) C-DSPEB and (b) W-DSPEB. The bathymetry survey data used for Gironde is presented in Figure 3.

Discussion
In this work, we have shown that deep learning can be used for SDB using wave kinematics information as well as color. We have evaluated the performance of a deep learning SDB approach (DSPEB) on real data using Sentinel-2 satellite imagery. We propose two different variants of DSPEB based on wave kinematics (W-DSPEB) and color (C-DSPEB; Section 2.3) and we compare them to a state-of-the-art physics-based SDB method, S2Shores [24,43], on two different sites. We show in Section 3.3 that the use of composite estimates over multiple dates compared to single-date estimates improves the performance of all methods tested. We show that the performance of the deep learning-based model (W-DSPEB) exceeds that of the physical method (S2Shores) and the deep learning color-based method (C-DSPEB) in correlation, RMSE, and temporal standard deviation on two test sites. W-DSPEB achieves an RMSE of 3.26 m in French Guiana and of 5.12 m in Gironde. While this does not yet meet international standards for bathymetry surveys, it demonstrates that deep learning can be applied to both spectral and wave kinematic information for coastal SDB, rivaling existing physics-based models.

Research Implications
We believe that this work has implications both for satellite-derived coastal bathymetry estimation and deep learning for Earth observation. We highlight the possibility to integrate wave and color information for SDB and the application of deep learning to physical modeling.
Color information is often used to estimate coastal bathymetry [20][21][22][25][26][27], but these methods can be sensitive to site and/or season-specific features such as turbidity and bottom reflectance. In this work, we observed a high sensitivity of our color-based model C-DSPEB to the background color, which led to high uncertainty in the estimated profiles. We propose that a wave kinematics-based method would have the potential for global application, which would be difficult with color-based estimation.
This work demonstrates that physical information, i.e., wave kinematics, can be used by deep neural networks for estimation. This follows a recent trend in machine learning for Earth system science where machine learning models use information from existing physical models [49]. W-DSPEB is a deep learning regression model based on physical information, which is a relatively unexplored model type as deep learning is more often used in classification tasks [50].

Limitations
While the results we present show that DSPEB is capable of reconstructing bathymetry, there are limitations to the current method. Specifically, we highlight the limitations of data pre-selection based on dates and the requirement to train on application sites.
The accuracy of W-DSPEB was found to be sensitive to the dates selected, as was S2Shores, indicating that the necessary wave kinematics information was lacking for certain dates. Currently, S2Shores is used as a date-selection method to dictate which images can be used for training W-DSPEB at a certain site, requiring large amounts of computing time before dataset construction. A possible solution to this limitation could be the use of a CNN as a binary classifier to dictate whether a Sentinel-2 image contains the necessary information for W-DSPEB. Such a model would greatly minimize the amount of computing power required for date selection, in addition to providing insight into this issue. Breaking this limitation is important in order to achieve the operational requirements of the International Hydrographic Organization (IHO).
Another limitation of DSPEB is that it currently requires local training before application, limiting the model to sites with existing survey data. A future direction of this work is to use trained models from individual sites and fine-tune them to unseen sites. Applying a network to a site for which no survey data is available is also a goal but requires further study in developing models capable of zero-shot learning [51].

Future Research
Beyond addressing the limitations of the current approach, DSPEB opens many directions that could be explored further. We believe that single-date estimation and a global model which combines wave and color information are important directions for future work.
The results presented in this work show that accuracy improves when estimations from multiple dates are integrated into a composite estimate, compared to using the original single-date estimates. However, a model capable of obtaining single-date estimates with accuracy similar to the composite estimates would be preferable. As mentioned in Section 3.2, an interesting path for future work would be to design the training scheme to maximize single-date accuracy specifically (through, e.g., training batch management). We also believe that studying the estimation variability between dates for wave kinematics-based methods such as W-DSPEB and S2Shores can lead to improvements for single-date SDB estimation.
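The compositing idea can be sketched as follows. The aggregation rule is an assumption: a per-pixel median over dates is used here for illustration, and this section does not state which statistic the composites actually use. All function names and the toy values are hypothetical.

```python
import math
from statistics import median

def composite(estimates_by_date):
    """Per-pixel median over dates; None marks a missing estimate (e.g. cloud)."""
    n_pixels = len(estimates_by_date[0])
    out = []
    for i in range(n_pixels):
        values = [est[i] for est in estimates_by_date if est[i] is not None]
        out.append(median(values) if values else None)
    return out

def rmse(estimate, survey):
    """Root-mean-square error over pixels where both values are available."""
    pairs = [(e, s) for e, s in zip(estimate, survey)
             if e is not None and s is not None]
    return math.sqrt(sum((e - s) ** 2 for e, s in pairs) / len(pairs))

# Toy depths (m) at three pixels, estimated on three dates.
survey = [5.0, 10.0, 20.0]
dates = [
    [6.0, 14.0, 18.0],
    [3.0, 8.0, None],   # third pixel missing on this date
    [5.5, 11.0, 25.0],
]

comp = composite(dates)
single_date_rmse = [rmse(d, survey) for d in dates]
composite_rmse = rmse(comp, survey)
```

In this toy example the composite error is lower than each single-date error, but that is a property of the made-up numbers and not evidence for the method.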
A potential direction for SDB is to combine wave and color information to achieve local estimates with high accuracy and global applicability. In Appendix E, we explore a hybrid model which combines estimates from W-DSPEB and C-DSPEB. While the results from this experiment were inconclusive, with the hybrid model performing worse than W-DSPEB in some cases, we strongly believe that a combined model incorporating both types of methods (color-based and wave kinematics-based methods) is the way forward to unlock and extend the applicability of SDB to a global scale covering all types of coastal waters and coastal depths. While we present H-DSPEB as our first step in this direction by engineering a hybrid model, other possibilities exist, including traditional and/or deep learning-based data assimilation techniques [52].
A fundamental direction for future research in deep learning-based SDB is the development of a single model which can be applied directly to sites without training. Such a model would require the inclusion of multiple sites in a single training dataset (mixed-site training), which should reduce the model's ability to overfit to any site-specific features, and consequently increase the model's ability to generalize to previously unseen sites. We also expect that it would need to include both wave kinematic information and color information, as proposed above. In this work, we demonstrate first steps in this direction with local site training of deep learning models using wave kinematics and color information, showing that deep learning can outperform existing physical methods for coastal bathymetry estimation.
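One possible way to organize mixed-site training is sketched below. The sampling scheme is an assumption chosen for illustration, not a procedure used in this work: the site is drawn first and a sample second, so that sites with many survey points do not dominate the batches. The function name and site labels are hypothetical.

```python
import random

def mixed_site_batches(samples_by_site, batch_size, n_batches, seed=0):
    """Yield batches in which each sample comes from a uniformly chosen site.

    samples_by_site maps a site label to its list of training samples.
    """
    rng = random.Random(seed)
    sites = sorted(samples_by_site)
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            site = rng.choice(sites)
            batch.append((site, rng.choice(samples_by_site[site])))
        yield batch

# Placeholder integers stand in for (image patch, depth) training pairs.
toy = {"site_a": [1, 2, 3, 4, 5, 6], "site_b": [7]}
batches = list(mixed_site_batches(toy, batch_size=8, n_batches=5))
```

Whether uniform site sampling is preferable to pooling all samples is an open design choice that would need to be evaluated on held-out sites.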

Conclusions
This work showcases the performance of the W-DSPEB approach to satellite-derived bathymetry based on wave kinematics using deep convolutional neural networks and Sentinel-2 satellite imagery. In a direct comparison, W-DSPEB is shown to be competitive with a state-of-the-art physics-based SDB method, S2Shores, achieving RMSE values of 3–5 m over areas reaching depths of 40 m.
The wider applicability of wave kinematics-based approaches for SDB is demonstrated through a comparison with C-DSPEB, a color-based variant of DSPEB, showing a promising direction towards a more global application of wave kinematics-based SDB.
The use of composite bathymetry profiles over estimates from single dates is discussed and is shown to improve the test RMSE performance of all methods included in this study by 50%.
Finally, considering the impressive capabilities deep learning has recently demonstrated in image processing and model inversion applications, we believe that the H-DSPEB model architecture presented in Appendix E provides strong motivation for further exploration of deep learning-based methods for satellite-derived bathymetry.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Table A1. A log of all Sentinel-2 images used in this work. Columns: Site; Sentinel-2 Image ID.

Appendix C. Single Date Results

Appendix D. French Guiana Composite over the Full Water Body
Figure A3. Reconstruction of a full Sentinel-2 tile in French Guiana using composite estimates from C-DSPEB (a) and W-DSPEB (b). The bathymetry survey data used for French Guiana is presented in Figure 3. The white spots in the image correspond to clouds over land areas.

Appendix E. Hybrid-DSPEB Model
Previous work in deep learning for computer vision has proposed multi-input convolutional neural networks to improve performance on tasks where different views of the same input are useful for approximating a single output. This can be done by grouping multiple neural networks, or duplicating a single network architecture, through an MLP-like architecture near the output of the merged network [53–55]. In this experiment, we follow a similar methodology to create a hybrid model (H-DSPEB). We merge the output layers and the last fully connected layers of each of the two pretrained C-DSPEB and W-DSPEB models, forming the final output head of H-DSPEB. The architecture of H-DSPEB can be seen in Figure A4 (right). The MLP head which we append to the end of the two pre-trained models is composed of two new fully connected layers which connect to the last hidden layer of each sub-model, in addition to the output layer of each of the sub-models. The single output of the MLP head corresponds to the final output of H-DSPEB. We tested various architectures for the MLP head but noted little difference in performance. During training, we freeze all previously trained weights in C-DSPEB and W-DSPEB, as indicated by the dotted lines in Figure A4.
The aim of this architecture is to evaluate whether color and celerity information can be automatically combined for enhanced estimation, given the different conditions in which these two model types function. By including higher-level features from the final layer, we aim for the hybrid model to learn to estimate depth using both color and celerity information. Because the two approaches are complementary, covering clear and turbid waters respectively, their combination could, unlike previous deep learning bathymetry inversion applications, unlock the potential of inverting bathymetry from satellite imagery at any coast worldwide. While the principal contribution of this work is the W-DSPEB method, we find the H-DSPEB idea to be motivating and deserving of further study.
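The merging step described above can be sketched, framework-free, as a forward pass. This is one reading of the description, with assumptions stated plainly: the layer widths, the ReLU activation, the random initialization, and all names are hypothetical, and the sub-model activations are passed in as fixed numbers to mimic frozen pretrained weights.

```python
import random

def dense(x, weights, bias):
    """Fully connected layer: y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for a new (trainable) layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def hybrid_head(color_hidden, color_depth, wave_hidden, wave_depth, layers):
    """Merge frozen sub-model activations and map them to a single depth.

    The inputs stand in for the last hidden layer and the scalar output of the
    pretrained C-DSPEB and W-DSPEB sub-models, which are treated as fixed.
    """
    merged = color_hidden + [color_depth] + wave_hidden + [wave_depth]
    (w1, b1), (w2, b2) = layers
    hidden = relu(dense(merged, w1, b1))
    return dense(hidden, w2, b2)[0]

rng = random.Random(0)
HIDDEN = 4      # assumed size of each sub-model's last hidden layer
HEAD_UNITS = 8  # assumed width of the first new layer
layers = [init_layer(2 * (HIDDEN + 1), HEAD_UNITS, rng),
          init_layer(HEAD_UNITS, 1, rng)]

# Made-up activations and sub-model depth outputs for a single sample.
depth = hybrid_head([0.2, 0.0, 1.3, 0.7], 12.5,
                    [0.9, 0.1, 0.0, 0.4], 14.0, layers)
```

In such a setup only the parameters in `layers` would be updated during training, which mirrors the freezing of the C-DSPEB and W-DSPEB weights described above.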

Hybrid H-DSPEB Preliminary Results
In this section, we present our preliminary results using H-DSPEB and evaluate its performance in comparison with the results of the DSPEB submodels and S2Shores presented in previous sections. Figure A5 presents two example results obtained using H-DSPEB at the French Guiana site. The pattern produced by H-DSPEB in both single-date and composite estimates suggests that the model learns to rely on the C-DSPEB submodel more than on W-DSPEB. We presume this is due to the higher accuracy of C-DSPEB on the training and validation sets compared to W-DSPEB in French Guiana, as can be seen in Figure 7, which could be leading the H-DSPEB model towards the same minima as C-DSPEB. However, a comparison to the DSPEB submodels' results presented in Figure 8 shows that the hybrid model H-DSPEB does improve the estimation accuracy over the individual DSPEB submodels in cases where C-DSPEB is more accurate than W-DSPEB, suggesting that H-DSPEB does make use of the W-DSPEB submodel to perform the final (merged) approximation.