High-Coverage Satellite-Based Coastal Bathymetry through a Fusion of Physical and Learning Methods

An up-to-date knowledge of water depth is essential for a wide range of coastal activities, such as navigation, fishing, the study of coastal erosion, or the observation of the rise of water levels due to climate change. This paper presents a coastal bathymetry estimation method that takes a single satellite acquisition as input, aimed at scenarios where in situ data are not available or would be too costly to obtain. The method uses free multispectral images that are easy to obtain for any region of the globe from sources such as the Sentinel-2 or Landsat-8 satellites. In order to address the shortcomings of existing image-only approaches (low resolution, scarce spatial coverage especially in the shallow water zones, dependence on specific physical conditions), we derive a new bathymetry estimation approach that combines a physical wave model with a statistical method based on Gaussian Process Regression learned in an unsupervised way. The resulting system is able to provide a nearly complete coverage of the 2–12-m-depth zone at a resolution of 80 m. Evaluated on three sites around the Hawaiian Islands, our method obtained estimates with a correlation coefficient in the range of 0.7–0.9. Furthermore, the trained models provide equally good results in nearby zones that lack exploitable waves, extending the scope of applicability of the method.


Introduction
Sea coasts are extremely dynamic and fragile regions that are constantly exposed to diverse natural and anthropogenic phenomena, in particular in the context of climate change. Thus, a precise monitoring of coastal dynamics (such as erosion, waves, currents, moving seabeds, or rising sea levels), and specifically of coastal bathymetry, is crucial for activities such as urban planning, fishing, navigation, or the protection of the environment.
Coastal bathymetry traditionally relies on in situ campaigns using sonar or airborne LIDAR. Being expensive and time-consuming, these methods are infrequently used and cannot be applied on a regular basis. Remote sensing techniques using satellite images offer a cost-effective alternative: one sensor covers a large geographical area with a revisit interval of typically 5-10 days. So far, two main approaches have been put forth for the purpose of coastal bathymetry:
• the water colour inversion model, which relies on the extinction of light with depth;
• the linear wave model, which relies on a physical relationship between the water depth and the wavelength of waves approaching the coast.
The water colour inversion model has been in use since the 1970s [1-6]. This method provides coastal bathymetry with a high spatial resolution (equal to the resolution of the images), but the results greatly depend on water quality, seabed reflection, and atmospheric effects. The mean difference between measured and estimated depths is around 20% between 0 and 6 m [4]. The principle is widely used to produce bathymetry products down to 35 m of depth in clear waters [7]. One of its major limitations is its need for in situ measurements for calibration. The linear wave model, in contrast, does not need in situ data for calibration and is robust with respect to water or seabed quality (turbid or clear waters, sandy or rocky seabed, etc.). So far, two main types of approaches based on the linear wave model have been proposed: the methods of [8,9] require the availability of two images within a very short time interval in order to compute wave properties, while those of [10-13] rely on a single acquisition. While the methods of the first kind are generally able to provide good coverage and resolution, the two-image constraint limits input data sources to a small number of instruments with non-free availability (e.g., Pléiades, WorldView).
In [12], the authors implemented a single-image solution using a wave tracing technique, evaluated to have a standard deviation of less than 15% of the observed depth from the surf zone down to as far as 20 m, for sites with a direct exposure to the swell and an absence of clouds (the latter being an obvious limitation to the exploitation of optical imagery). However, this method, like all others based on a single image, also has important limitations in terms of coverage that may be dissuasive for end users:
• it depends on the presence of swell and, consequently, cannot cover acquisitions of calm zones;
• the estimations provide an incomplete spatial coverage: the propagation of waves is often irregular and does not always reach the shore;
• the resolution is low (about 500 m at best).
In this paper, our main objective is to eliminate or reduce these limitations in order to be able to offer an exploitable product. Our key contribution is the combination of the linear wave model (that we will call physical model as it is based on wave physics) with a statistical model using Gaussian Process Regression (GPR) in an unsupervised setup. First known in the geostatistics community as kriging, GPR is a machine learning approach introduced by [14] that has been successfully used in recent Earth Observation applications [15,16].
Our goal is to obtain a water depth product on a regular grid with a resolution of less than 100 m that better corresponds to the needs of real-world applications. We achieve this result by the unsupervised training of a GPR model with input features extracted from the multispectral bands of a single acquisition. This approach is an alternative way to exploit image reflectance and, as such, can be considered as a special case of the water colour inversion model. We are thus able to combine the approach based on wave physics with the one based on water colour, while maintaining independence from in situ measurements and relying on very little to no human supervision. Section 2 of our paper gives an overview of the combined physical-statistical method. Sections 3-5 describe the three main components of the method, namely physical estimation, statistical model optimisation, and statistical estimation. Section 6 evaluates the method over acquisitions around the Hawaiian island of Oahu. With the goal of extending the scope of applicability of our method to images without exploitable waves, Section 7 evaluates and discusses the transfer of our trained statistical GPR models, both in space (to surrounding zones at the same time of acquisition) and in time (to earlier or later acquisitions of the same zone).

Method Overview
Our method combines two approaches: a physical estimation based on linear wave theory and a statistical estimation based on water colour inversion. The fundamental hypothesis is that, if input conditions are met (as detailed below), an initial application of the physical model produces an output that, while incomplete in spatial coverage, is reliable enough to train and subsequently apply a statistical regression method in an unsupervised manner. In order to ensure a sufficient accuracy of the unsupervised training data, only results with a high enough reliability are used from the output of the physical model. Accordingly, the method is divided into three main steps, as shown in Figure 1:
1. physical bathymetry estimation: the physical model is applied to the input satellite image, producing physical bathymetry estimates as output;
2. statistical model optimisation: from a set of candidate statistical GPR models, an optimal one is trained using the input image as well as the physical bathymetry estimates;
3. statistical bathymetry estimation: the optimal statistical model is applied to the multispectral image to obtain high-resolution, high-coverage estimates; the same model can subsequently be reapplied to obtain bathymetry estimates for new images that are spatially or temporally related (i.e., close) to the original image.
The input of our method consists of a single optical acquisition. For our research we used images taken by the MSI sensor onboard Sentinel-2 (A and B), but there is no theoretical constraint on using other sources with different characteristics (for example, in [17], the authors showed that estimating bathymetry from satellite imagery is optimal at a resolution between 10 and 20 m, with at least three spectral bands: one between 485 and 550 nm, one between 550 and 580 nm, and one in the NIR domain). Of the 13 channels provided by Sentinel-2, we used the four with a 10-m resolution: blue (wavelength between 457.5 and 522.5 nm), green (542.5-577.5 nm), red (650-680 nm), and near-infrared (784.5-899.5 nm). Nevertheless, the method remains applicable if the set of channels provided by the sensor is different. We do not apply atmospheric corrections and use the images as provided by ESA in the level-1C product.
In terms of natural conditions, a low cloud coverage is obviously necessary for the method to be applicable. Furthermore, the presence of swell with a period of at least 8 s is required by the physical model (step 1 in Figure 1), but not by the statistical one (step 3 in Figure 1).

Physical Bathymetry Estimation
Based on the linear wave theory described in [18], this method has been explored since as early as the Second World War. The theory describes how wavelengths decrease as waves approach the coast. Knowing the wavelength λ and the wave period T, the water depth h can be estimated by inverting the linear dispersion relation:

h = (λ / 2π) artanh(2πλ / (g T²)),

where g is the standard acceleration due to gravity. In [12], the authors presented an original method to estimate both the wavelength and the wave period. Using a wave tracing approach (Figure 2), the water depth can be deduced from the image itself, without relying on external information providing the wave period. We apply the model only to the near-infrared band of the multispectral image, as this band is the least sensitive to changes of reflectance due to water turbidity and the type of seabed, phenomena that can potentially interfere with the wave tracing method of the physical model. As input parameters we use those developed in [12].
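To make the depth inversion concrete, here is a minimal sketch of inverting the dispersion relation for a single wavelength/period pair. The helper `depth_from_wave` is illustrative only and is not the wave-tracing implementation of [12], which estimates λ and T from the image itself:

```python
import math

G = 9.81  # standard acceleration due to gravity (m/s^2)

def depth_from_wave(wavelength_m, period_s):
    """Invert the linear dispersion relation to get water depth h:
    h = (lambda / 2*pi) * artanh(2*pi*lambda / (g*T^2)).

    Returns None in the deep-water regime, where tanh has saturated and
    the observed wavelength no longer constrains the depth.
    """
    x = 2.0 * math.pi * wavelength_m / (G * period_s ** 2)
    if x >= 1.0:
        return None  # deep water: no finite depth satisfies the relation
    return wavelength_m / (2.0 * math.pi) * math.atanh(x)
```

For example, an 80-m swell wavelength observed with a 10-s period corresponds to a depth of roughly 7 m, while a 200-m wavelength with an 8-s period is already in the deep-water regime.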
The model provides bathymetry estimates within the range of about 3-25 m of depth. Below 3 m, the linearity of the physical model no longer holds. Beyond 25 m, the shoaling effect may be confounded with other interactions. Furthermore, the spatial coverage of the output is irregular: firstly, the resolution is proportional to the wavelength of the swell which, in turn, depends on the water depth. In practice, on Sentinel-2 images the highest reachable resolution is around 500 m. Secondly, the wave tracing method produces its output along the wave rays, which converge and diverge in space following the refraction and diffraction of waves due to coastal geometry.
While suboptimal for direct use in real-world applications in terms of coverage and resolution, the physical estimates are remarkably precise. In [12], we found that, for sites with a direct exposure to the swell and an absence of clouds, the standard deviation of the bathymetry produced by the physical model was less than 15% of the observed depth between the surf zone and where the depth reaches about 25 m. The precision of these results makes them suitable for further statistical processing.
For a more detailed presentation of the theory and implementation of the physical model, we refer the reader to the paper [12].

Statistical Model Optimisation
We define an unsupervised machine learning setup using Gaussian Process Regression, where the training data consist of the physical bathymetry estimates and the reflectance values provided by the image at the same locations. Estimation then consists of feeding multispectral reflectance data taken from the entire image into the GPR estimator. This can be understood as a learning-based modelling of the well-known water colour inversion method. Contrary to the physical model, we designed the output of the statistical model to provide results that are exploitable in practical applications, which involves a high enough resolution and a regular and complete coverage of the shallow water zone. We set the output resolution to 80 m on a regular grid as a compromise: while it is lower than the pixel-level resolution of the input image (e.g., 10 m for Sentinel-2), this patch size allows us to smooth the effect of waves on reflectance and improve the overall output quality.
While the water colour inversion method conventionally needs in situ calibration to work, i.e., gathering of ground truth and ancillary data such as water turbidity or weather conditions, our approach uses the output of the physical model in lieu of ground truth and considers ancillary data as latent variables of the machine learning model. This lack of proper ground truth data, however, needs to be compensated by a careful optimisation of the training parameters and input, which is the subject of the current section.
The optimisation phase consists of four substeps, as shown in Figure 3:
1. co-location of reflectance values with the physical bathymetry estimates;
2. generation of several candidate training sets used as input for the subsequent training of candidate models;
3. training of candidate Gaussian Process Regression models on each candidate training set;
4. computation of statistical GPR estimates produced by each candidate model, followed by the selection of the best-performing model.
All of these steps are run in a fully automated manner, allowing the computation of an optimised estimator without any human intervention.

Co-Location
Co-location consists of identifying the reflectance values (i.e., the pixels of the input image in each of the input bands) that correspond to each physical estimate. This is a non-trivial operation as physical estimates are provided along irregularly shaped wave traces with varying patch sizes of >500 m (as shown in Figure 2), while reflectance is computed over much smaller patch sizes of 80 m. Our method assumes that a physical estimate is the most characteristic at the centre of its patch and, accordingly, computes the mean reflectance over an 80-m patch placed in the middle of each larger physical patch, separately for each of the bands, as illustrated in Figure 4. The result consists of as many reflectance values as there are input bands (four in our case), co-located with each physical bathymetry estimate. As the water colour inversion method is not applicable beyond 12 m of depth [4], we co-locate only the physical estimates that are shallower than this threshold.
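The co-location step can be sketched as follows. The array shapes and the helper `colocate_reflectance` are illustrative assumptions; in our setup the window is 8 × 8 pixels, i.e., 80 m at Sentinel-2's 10-m resolution:

```python
import numpy as np

def colocate_reflectance(bands, centers, window_px=8):
    """Mean reflectance per band over a window_px x window_px patch
    centred on each physical estimate.

    bands   : float array (n_bands, H, W), e.g. blue, green, red, NIR at 10 m
    centers : iterable of (row, col) pixel coordinates of the patch centres
    returns : float array (n_points, n_bands)
    """
    half = window_px // 2
    features = []
    for r, c in centers:
        patch = bands[:, r - half:r + half, c - half:c + half]
        features.append(patch.mean(axis=(1, 2)))  # one mean per band
    return np.stack(features)
```

The result is one feature vector (four mean reflectances in our case) per physical bathymetry estimate.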

Candidate Training Set Generation
Our experiments have shown that training with the maximum amount of data available did not necessarily guarantee the best results, and that the statistical estimates could be sensitive to the way training values were selected. We therefore need a way to determine (1) how many co-located points to use for training; and (2) how to select these points.
Setting the size of the training set to a constant value is not an ideal solution, as the optimal size depends on a number of variables, such as:
• the uncertainty of the training data (0-15% of the ground truth for physical estimations between 0 and 20 m);
• the presence of clouds, which can compromise the training;
• the amount of estimates provided by the physical model in the shallow water zone, which provides an upper bound on the number of training points that can be generated.
Rather than fixing the training set size a priori, we generate several candidate training sets of different sizes. We start with a size of 50 points that we increment by 50 up to about 1000, which is close to the maximum amount of training points we are typically able to generate for a single site using the physical model.
An entirely random sampling of training points may result in a significantly non-uniform depth distribution in the input, which in turn could generate a biased model. Our solution is rather to train models with reasonably uniform inputs that are achieved by randomly selecting the same amount of training points for ten depth intervals between 3 m and 12 m (the zone of applicability of our reflectance-based method).
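The depth-stratified sampling described above can be sketched as follows. The helper `stratified_sample` and its defaults are illustrative, not our exact implementation:

```python
import numpy as np

def stratified_sample(depths, n_total, n_bins=10, lo=3.0, hi=12.0, seed=0):
    """Select ~n_total training indices with the same number of points
    drawn at random from each of n_bins equal-width depth intervals
    between lo and hi metres."""
    rng = np.random.default_rng(seed)
    edges = np.linspace(lo, hi, n_bins + 1)
    per_bin = n_total // n_bins
    picked = []
    for a, b in zip(edges[:-1], edges[1:]):
        in_bin = np.where((depths >= a) & (depths < b))[0]
        take = min(per_bin, in_bin.size)  # a sparse bin contributes what it has
        picked.extend(rng.choice(in_bin, size=take, replace=False))
    return np.asarray(picked, dtype=int)
```

Repeating this for target sizes of 50, 100, ..., up to the available maximum yields the candidate training sets.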

Training of Candidate Models
For each candidate training set, we train a candidate machine learning model using Gaussian Process Regression, a Bayesian learning tool. Ref. [14] defines Gaussian processes as a collection of random variables, any finite number of which have a joint Gaussian distribution. The statistical properties deduced from this process define a model that is able to provide an estimate as well as its variance with respect to new inputs [19]. This variance output, on which we rely for the optimisation of the trained model, was the main reason for our choice of GPR over other machine learning methods.
As in many other machine learning techniques, GPR uses kernel methods to characterise nonlinear relationships between inputs and outputs [16]. This implies defining a kernel that describes the covariance over pairs of data points. Our chosen covariance function is the squared exponential, commonly used in the literature for similar applications.
Formally, the output target y and its variance σ² are predicted after fitting the hyper-parameters of the considered kernel function on the training data (inputs and targets). The training inputs are symbolised by x_n and the training targets by y_n. The dimension of x_n depends on the number of features considered. In our case, the water depth is the output target y, the training targets y_n are the estimates from the physical model, and the training inputs x_n are the mean reflectances of each band co-located with the training targets. The computational complexity of the fitting operation increases with the amount of training data. For space reasons, we only state the predictive equations needed for applying the approach to a new sample x_*; for a detailed explanation of GPs, we advise the reading of [14]:

ŷ_* = k_*^T (K(x_n, x_n) + σ_n² I)^(-1) y_n,
σ_*² = k(x_*, x_*) - k_*^T (K(x_n, x_n) + σ_n² I)^(-1) k_*,

where K(x_n, x_n) is the covariance matrix of the training samples x_n, k_* is the vector of covariances between the training samples x_n and the new sample x_*, k(x_*, x_*) is the covariance of the new sample x_* with itself, σ_n² is the variance of the training data, and I is the identity matrix.

Selection of the Optimal Model
In order to select the best model, we run an estimation with each candidate model trained in the previous step. Besides bathymetry estimates, GPR also outputs estimates of variance. In order to determine the optimal model, we use the following criteria, in order of decreasing priority:
1. the number of high-variance estimates (the lower the better);
2. the spatial variability of the bathymetry estimates (the lower the better);
3. the size of the training data (the lower the better).
Firstly, high-variance estimates are those for which the variance output by the GPR exceeds a certain threshold (we use 2 m as the threshold on the standard deviation). The lower the number of such points, the more confident we are in the overall output. We thus generally prefer the model that produces the smallest number of high-variance estimates.
Secondly, the notion of spatial variability is motivated by the consideration that a 'smoother' output (in terms of adjacent bathymetry estimates) is more coherent with our training data than an output that shows rapid changes in water depth. While the GPR output is set up to produce results on an 80 × 80 m grid, the physical estimates that were used for training only have a resolution of 500 m at best. We thus consider a very high spatial variability in the output to be a warning sign regarding the reliability of the model. We define the spatial variability z as the mean of the absolute differences between an estimated value ŷ and its eight neighbouring values y_n in the grid.
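A sketch of the spatial variability computation follows. For brevity it wraps around at the grid edges via `np.roll`; a real implementation would mask the borders and any missing estimates:

```python
import numpy as np

def spatial_variability(grid):
    """Mean absolute difference between each estimate and its eight
    neighbours on the estimation grid."""
    diffs = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            shifted = np.roll(np.roll(grid, dr, axis=0), dc, axis=1)
            diffs.append(np.abs(grid - shifted))
    return float(np.mean(diffs))
```

A perfectly flat grid has zero variability, while a rough, oscillating output scores much higher than a smooth gradient of the same range.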
Thirdly, in case of similar results in terms of variance and spatial variability, we choose to minimise the size of data used for training. This is to minimise the time necessary for subsequent estimations using the same GPR model, as in the case of applying it to related images (see Section 7).
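Under the simplifying assumption that the three criteria can be applied as a strict lexicographic ordering (the text above only invokes lower-priority criteria in case of similar results), the selection rule can be sketched as follows; `candidates` and its keys are illustrative names:

```python
def select_optimal_model(candidates):
    """Pick the candidate with the fewest high-variance estimates, breaking
    ties by lower spatial variability, then by smaller training-set size.

    candidates: list of dicts with keys 'n_high_var', 'variability', 'n_train'
    """
    return min(candidates,
               key=lambda c: (c['n_high_var'], c['variability'], c['n_train']))
```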

Statistical Bathymetry Estimation
The last step of the overall process in Figure 1 is the estimation step that uses the statistical model trained and selected as the best in the previous step. The input of the model is a multispectral image with the same bands as those used for training.
While the primary goal of the statistical model is to produce high-resolution and high-coverage estimates for the very same image on which it was generated, we also present a different but very important use case of applying the model to other acquisitions. In particular, the statistical model is not bound by the input constraints of the physical model and can therefore be applied to images without any exploitable swell. This allows us to extend the applicability of our method considerably.
However, applying the trained model to different images implies that the physical conditions latent in the reflectance data used for training (such as atmospheric conditions, seabed properties, and water quality) stay similar to a large extent across the images. This is a strong constraint that limits the scope of application of the trained model to what we called related images in Figure 1: images of the same zone taken at different times, or images of nearby zones taken within a short time interval. An example of the former case is the monitoring of the changing bathymetry (e.g., moving seabeds) of a given site, even in the absence of swell. The latter case corresponds to reusing the trained model in nearby sheltered areas that are rarely or never exposed to the swell.

Experimental Validation
In this section we evaluate each of the major steps of our method: the physical model, the optimisation of the statistical model, and finally the output of the statistical model as the result of the entire setup.

Evaluation Sites and Data
The geographical region for our experiments was chosen to be Hawaii due to the availability of ground truth data for several sites. Furthermore, Hawaii regularly benefits both from a long-period swell (more than 8 s), necessary for the physical model, and from clear waters, which are beneficial for the statistical model.
Three sites were used along the north-east coast of the island of Oahu: Waimanalo, Kaneohe, and Kailua (see Figure 5). Each covers an area of about 8 × 8 km in a nearly contiguous way along a 30-km-long portion of the coast. The ground truth bathymetry was measured in 2000 by a SHOALS airborne LIDAR campaign (http://www.soest.hawaii.edu/coasts/data/oahu/shoals.html) [20]. Its horizontal accuracy is ±3 m and its vertical accuracy is about ±0.15 m. Because of the volcanic nature of the Hawaiian Islands, the time interval between the LIDAR campaign and the satellite acquisitions is not considered a bias factor.
Four Sentinel-2 images of size 100 × 100 km were selected for the evaluation, each covering all three sites. As can be seen from Table 1, of the 12 site-specific acquisitions we only used nine, as the remaining three, acquired on 2 and 12 August 2016, suffered from a cloud coverage too dense to be exploited. Table 1 presents for each site the error measures with respect to the ground truth. Error measures are presented in two forms between 0 and 20 m of depth: (1) along wave rays; and (2) interpolated over the entire site. We interpolated the results of the physical model to a regular grid of 80 m in order to be able to perform meaningful comparisons between the physical and the GPR results later on.

Results of the Physical Model
The table also shows the number of physical estimations realised along the ray traces. These values are significant given that a minimal number of estimates (more than 50) is required to train a GPR model efficiently; in the cases of 3 July at Waimanalo and 2 August at Kaneohe, the number of training points falls below this requirement. For these two days, the presence of clouds explains why the method does not work properly: the clouds interrupt the wave traces and, consequently, the approach breaks down. This also explains the low correlation coefficients.
Finally, Table 1 also includes an estimation of the tidal heights at acquisition time. These values were provided by the SHOM (Service Hydrographique et Océanographique de la Marine; their online tide forecast can be found at http://maree.shom.fr/) [21]. Depending on the tide, the bathymetry obtained may present a bias from one acquisition to another. In the case of Hawaii, the tidal shift reaches 75 cm. In other regions, however, it may exceed 5 m, as in the Bay of Mont Saint-Michel in France. In our evaluations we did not take tidal heights into account.

Statistical Model Optimisation
As explained in Section 4.2, statistical estimates can be improved by computing the optimal training set size for each acquisition. In our evaluations, for each of our seven acquisitions (two out of nine having been excluded for an insufficient number of training points), we trained 18 candidate models with sizes of 50, 100, 150, ..., 900 points. For each trained candidate model, we computed the spatial variability of the estimation results (as defined in Section 4.4), as well as the correlation coefficient with respect to the ground truth in order to validate the optimisation approach.
We obtained the following results:
• the number of estimations with a standard deviation under 2 m does not increase with the number of training points;
• beyond 150 training points, the correlation coefficients with respect to the ground truth do not change significantly (around 0.87 ± 3% in our evaluations, see Figure 6);
• estimations show an increasing spatial variability with the number of training points (see Figure 7a,b).
Overall, it turned out that all models between 150 and 900 training points present robust and roughly equivalent solutions in terms of correlation coefficients and RMSE. Following the principles put forth in Section 4.4, we opted for smaller models in the range of 150-300 points, depending on the site, for lower spatial variability and faster computing times.

Results of the Statistical Model
As can be seen in Table 2, the GPR model results have a significantly and systematically higher correlation coefficient and a smaller root mean square error with respect to the ground truth compared to the physical model (interpolated over the same 80 × 80 m grid). Note that the interpolated results from the physical model are slightly different from those of Table 1, since only the estimates between 2 and 12 m of depth are considered here.
Furthermore, the GPR model provides a significantly higher resolution and a nearly complete coverage of the shallow water zone, fulfilling our main objectives. These results are visualised in Figure 8. They clearly show how a statistical approach can significantly improve on the estimates provided by the physical model. These results, however, largely depend on the accuracy of the latter. In our experience, as long as the physical model itself proposes coastal estimates that are accurate enough (correlation coefficient with the ground truth of at least 0.7), the precision of the GPR results remains largely acceptable, as shown by graph (c). In the contrary case, the statistical model is less likely to produce a reliable solution, as can be seen for the acquisition of 2 August.

Spatial and Temporal Transfer of the Statistical Model
The results obtained in the previous section underline the fact that the accuracy of the statistical model depends on that of the physical model it was trained on. In this section we evaluate and discuss the possibility of applying a high-quality trained model to other acquisitions, offering a way to avoid having to use the physical model where physical conditions are less favourable (e.g., due to a presence of clouds or an absence of swell).
Two kinds of experiments were conducted:
• spatial transfer consists of applying the statistical model to a site nearby yet different from where it was trained, on the same day of acquisition;
• temporal transfer consists of applying the statistical model to acquisitions on different dates, on the same site.
The goal of the experiments was to explore the robustness of the trained model with respect to physical conditions changing in space and in time.

Spatial Transfer
We expect a GPR model to be spatially transposable if atmospheric conditions, seabed properties, and water quality are very similar between the site where it was trained and the site where it is applied. As the three sites of our evaluations (Waimanalo, Kailua, and Kaneohe) are close to each other (less than 20 km apart, see Figure 5), these conditions have a good chance of being met within one acquisition. We therefore expect good results for spatial transfer from one site to another on the same day.
For our experiment we chose three of the four acquisitions, the one of 2 August not having been considered because of its poor results (presence of clouds). For each acquisition we chose the best statistical model from the three possible sites, based on the results from Table 2. The model was then applied to the two other sites for bathymetry estimation.
From Table 3, it can be seen that the results remain equivalent to, or are even better than, the non-transferred case (see Table 2 for comparison). Even more interestingly, by transposing the model trained on the 12 August acquisition at Kaneohe to the site of Kailua, we managed to obtain good-quality results (correlation coefficient 0.78, RMSE 2.14 m), which had been impossible using the physical model on the same site due to the configuration of clouds.
Our results clearly point in the direction that a statistical model can be successfully transposed across sites that present similar conditions with respect to the atmosphere, seabed, and water quality.

Temporal Transfer
Reusing a statistical model on the same site but over several acquisitions across time could potentially provide an efficient solution for coastal monitoring, addressing the issue of the non-applicability of the physical model due to no swell or the presence of clouds. As in the case of spatial transfer, the underlying assumption is that the physical conditions at the times of acquisition remain similar. While we expect seabed properties to be largely (even if not completely) invariant on the same site, this may not be the case with respect to atmospheric conditions or water quality.
Among the 17 test cases presented in Table 4, only three improved the results with respect to models trained on the same acquisition. Most of the results are weak when compared to the ground truth. These results hint at the conclusion that, in the absence of evidence for the invariance of physical conditions, the reusability of the statistical model is generally limited to a very short time interval (acquisitions on the same day).
However, in certain cases the condition of temporal locality can be relaxed: when an image is very difficult or impossible to process through the physical model, and a less-than-optimal result is acceptable. As can be seen from Tables 1 and 2, this is the case for the acquisitions of 12 August at Kailua (no result from the physical model) and of 2 August at Waimanalo and Kaneohe (correlation coefficients 0.48-0.53). In these cases, models temporally transposed from 3 July provide correlation coefficients of 0.63-0.68 and RMSE of 1.85-2.10 m. The accuracy remains relatively weak, yet it is significantly improved with respect to that obtained through the physical model. A direction for future work is to use our method with multiple sensors simultaneously, with the goal of increasing the temporal coverage for coastal monitoring.

Funding: This research was funded by the European Space Agency for the ECOBAW project in the framework of the Living Planet Fellowship.