Development of a Fully Convolutional Neural Network to Derive Surf-Zone Bathymetry from Close-Range Imagery of Waves in Duck, NC

Timely observations of nearshore water depths are important for a variety of coastal research and management topics, yet this information is expensive to collect using in situ survey methods. Remote methods to estimate bathymetry from imagery include using either ratios of multi-spectral reflectance bands or inversions from wave processes. Multi-spectral methods work best in waters with low turbidity, and wave-speed-based methods work best when wave breaking is minimal. In this work, we build on the wave-based inversion approaches by exploring the use of a fully convolutional neural network (FCNN) to infer nearshore bathymetry from imagery of the sea surface and local wave statistics. We apply transfer learning to adapt an FCNN originally trained on synthetic imagery generated from a Boussinesq numerical wave model to utilize tower-based imagery collected in Duck, North Carolina, at the U.S. Army Engineer Research and Development Center’s Field Research Facility. We train the model on sea-surface imagery, wave conditions, and associated surveyed bathymetry using three years of observations, including times with significant wave breaking in the surf zone. This is the first time, to the authors’ knowledge, an FCNN has been successfully applied to infer bathymetry from surf-zone sea-surface imagery. Model results from a separate one-year test period generally show good agreement with survey-derived bathymetry (0.37 m root-mean-squared error, with a maximum depth of 6.7 m) under diverse wave conditions with wave heights up to 3.5 m. Bathymetry results quantify nearshore bathymetric evolution, including bar migration and transitions between single- and double-barred morphologies. We observe that bathymetry estimates are most accurate when time-averaged input images feature visible wave breaking and/or individual images display wave crests.
An investigation of activation maps, which show neuron activity on a layer-by-layer basis, suggests that the model is responsive to visible coherent wave structures in the input images.


Introduction
Accurate characterization of surf-zone bathymetry is vitally important for modeling the coastal environment. Bathymetry provides a critical boundary condition for nearshore wave, circulation, and morphology models. The accuracy of input bathymetry may be as important as model parameterization [1,2]. These models, in turn, provide necessary information for coastal management, forecasting, and emergency response decisions.
The surf-zone coastal environment presents unique challenges to accurate bathymetry estimation. This region may experience significant morphologic changes in response to hydrodynamic (wave and current) forces on daily [3] and even hourly [4] timescales. The dynamic nature of the surf zone both produces adverse conditions for bathymetry measurement [5–7] and also motivates frequent data collection.
In situ measurement, often performed with vessel-mounted acoustic sensors and global positioning system (GPS) devices, may provide highly accurate measurements and offer locally dense spatial coverage. However, the frequency of surveys conducted with this approach is typically limited due to expense. Wave conditions present additional constraints to data collection. For example, the U.S. Army Corps of Engineers Coastal Research Amphibious Buggy (CRAB) has approximately 0.03 m vertical accuracy, but its operation is limited to conditions with wave heights of less than 2.0 m [5].
Remote sensing approaches have been embraced by the coastal research community as a natural alternative to traditional survey methods. Platforms such as satellites, unmanned aerial vehicles (UAVs), and camera towers support methods for retrieving water depths at much higher spatial and temporal resolution than traditional in situ surveys typically allow [8–13]. However, these methods are not immune to the challenges that the surf-zone environment presents. In particular, features such as wave breaking, persistent foam, and high turbidity can impede methods that rely on clear water for either direct measurement or inversion.
Spectral inversion-based methods (hyperspectral, multispectral) relate image intensities to depth by applying light attenuation equations [8,14–24]. These methods often require the use of empirical coefficients fit to site-specific data such as bottom substrate and vegetation. Machine learning (ML) algorithms have helped reduce the reliance on site-specific data in applying the empirical relationships [22,25,26] but still require areas of clear water and high bottom reflectance. Because of the reliance on clear water or in situ measurements, most instantaneous spectral inversion approaches are inappropriate for generating the high spatial (1 m) and temporal (hourly to daily) resolution bathymetries in the turbid waters that are characteristic of the surf zone. Multi-temporal methods assume that the bathymetry remains approximately constant across multiple satellite collections and extract bathymetry only at times of highest water clarity [27–31], an assumption that is clearly violated in the surf zone [32].
Optical wave-based inversion methods that use cameras to capture sea-surface imagery generally are not subject to the same water clarity constraints that limit the applicability of spectral inversion methods to the nearshore environment. In the case that cameras are mounted on a fixed platform, they are able to capture imagery at an almost arbitrary frequency. This is in contrast to publicly available satellite-collected imagery, which may only be available on a daily or weekly basis [33,34]. Two optical wave-based inversion methods, in particular, have been investigated extensively by the coastal research community. The first approach relates the time-averaged location of wave breaking in video imagery to a dissipation proxy and then updates depth according to a comparison between the dissipation proxy and modeled dissipation [35–41]. The second approach exploits the contrast between the front and back faces of propagating waves to measure their celerity and estimate water depth using linear wave theory [11,12,42–45].
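As a concrete illustration of the second approach, once a wave's celerity and period are measured from imagery, the linear dispersion relation can be inverted for depth in closed form. The following sketch is our own illustration of that standard relationship, not code from the cited studies:

```python
import numpy as np

def depth_from_celerity(c, T):
    """Invert the linear dispersion relation for water depth h [m],
    given an image-derived wave celerity c [m/s] and period T [s].

    Linear wave theory: omega**2 = g*k*tanh(k*h), with omega = 2*pi/T
    and k = omega/c, so tanh(k*h) = c*omega/g and h = atanh(c*omega/g)/k.
    """
    g = 9.81
    omega = 2.0 * np.pi / T
    k = omega / c
    arg = c * omega / g            # equals tanh(k*h); approaches 1 in deep water
    if arg >= 1.0:
        return float("inf")        # deep water: celerity no longer depth-sensitive
    return float(np.arctanh(arg) / k)
```

In the shallow-water limit this reduces to h ≈ c²/g, which is why celerity measurement errors near wave breaking translate directly into depth errors.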
The two approaches described in the previous paragraph are not mutually exclusive. While linear wave theory relationships were originally (and continue to be) used to relate celerity to depth, investigators have also explored model-based inversion approaches that combine surf-zone model results with observations in a statistically optimal sense. Van Dongeren et al. integrated approaches based on image-derived dissipation and celerity in the Beach Wizard data assimilation framework [41]. Wilson et al. conducted an extensive investigation of an ensemble Kalman filter (EnKF)-based data assimilation framework using shoreline, current, and celerity information derived from optical imagery [46]. That study also included observations derived from infrared and radar modalities.
The accuracy of celerity-based optical methods has been observed to degrade during storms when waves are large and bathymetry evolves rapidly [45,47]. Methods that rely on linear wave theory to estimate depth may incur errors due to wave non-linearities in surf-zone wave breaking. Optical wave celerity methods are also constrained by their ability to accurately measure speeds as waves transition to and from breaking, which can result in onshore biases in the position of the sandbar [47]. The cBathy depth inversion algorithm [44] implements a Kalman filter to improve estimates of bathymetry by averaging present and prior inversion results. The Kalman-filtered result may still be prone to error in cases where internal uncertainty estimates do not reflect the true error and/or depth estimates have systematic biases [44,47]. Despite these shortcomings, cBathy has been shown to be useful in quantifying surf-zone morphology evolution over multiple months [47,48] and for updating the bottom boundary condition for operational wave modeling [49].
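The Kalman-filtering idea referenced above can be sketched as a per-cell scalar update; this is a generic illustration of the concept (variable names and structure are ours, not cBathy's internals):

```python
def kalman_depth_update(h_prior, p_prior, h_obs, r_obs, q_process):
    """Blend a prior depth estimate (variance p_prior) with a new
    inversion result (variance r_obs), inflating the prior by a
    process-noise term q_process to account for morphologic change."""
    p_pred = p_prior + q_process        # prior variance grows between updates
    gain = p_pred / (p_pred + r_obs)    # Kalman gain in [0, 1]
    h_post = h_prior + gain * (h_obs - h_prior)
    p_post = (1.0 - gain) * p_pred
    return h_post, p_post
```

When the observation variance is too optimistic or the observation is systematically biased, the gain overweights bad observations, which is the failure mode noted in [44,47].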
ML has been successful in replicating some multispectral band-ratio approaches. Many recent studies demonstrate the ability of convolutional neural networks to produce accurate results in clear, shallow water [50–56]. ML has been applied to other coastal inference problems including identification of rip currents [57], estimation of wave heights [58], classification of land cover [59], identification of beach states [60], and providing solutions to the wave kinematics depth inversion problem using synthetic imagery [61–64]. In this study, we employ the fully convolutional neural network (FCNN) model developed in [63] that was trained to infer bathymetry from synthetic sea-surface imagery. We adapt the model to use rectified, merged Argus time-exposure (Timex) and snapshot (single video frame) imagery collected at the U.S. Army Corps of Engineers Field Research Facility (FRF) in Duck, North Carolina (Figure 1), to explore the ability of this method to infer nearshore bathymetry at high spatial (1 m) and temporal (1 h) resolutions. This approach combines the sophisticated image-processing capabilities of neural networks with physical constraints from the synthetic training data generated using the Boussinesq wave equations [62,63,65]. The high-resolution (monthly) and extensive (40-year) survey data set available at the FRF lends itself to initial exploration of these techniques, and the hourly estimates from cBathy that are calculated in near real-time provide useful baseline comparisons for these new approaches. Successful bathymetric inversion at the FRF using the synthetic FCNN with remotely sensed imagery from the Argus tower [9] is an important step toward a fully generalizable deep-learning-based depth inversion algorithm for the surf zone. This is the first time that FCNNs have been used to produce bathymetric measurements in the aerated and turbid waters typically found in the surf zone.
Additionally, the inferences are at high resolution (1 m in both the along-shore and cross-shore) and are produced hourly (during daylight). This high-resolution output is necessary to capture the high variability of the surf-zone bottom boundary and to be useful for a wide range of tasks. In particular, accurate bathymetries with greater temporal frequency would significantly improve bottom boundary conditions for wave and circulation modeling that utilizes FRF observations [49].

Figure 1. Argus camera image of the FRF property.
We begin the paper by providing a brief overview of data sources in Section 2, and in Section 3, we detail our methodology and training data. Section 4 presents results, including overall model performance and comparison to eight survey-derived bathymetries. Section 5 discusses the results through example cases and provides a comparison to the instantaneous cBathy (phase 2) output for these time periods. Section 6 discusses the implications of the FCNN architecture, and Section 7 presents conclusions.

Data Sources
The primary data sources used in this study are (1) time-exposure (Timex) and snapshot (single video frame) image products, (2) bathymetric survey data, and (3) bulk wave statistics including wave height, period, and direction. In addition, we compare results to the cBathy linear depth inversion algorithm [44]. All measurements were collected at the U.S. Army Corps of Engineers Field Research Facility (FRF) in Duck, North Carolina.
The image products were collected from an Argus [9] station, which consists of six cameras mounted on a 43 m observation tower. Images collected by the six cameras are merged and rectified to produce a combined 3 km (shore-parallel) by 0.5 km (shore-normal) field of view. The Timex images consist of an average of 700 video frames collected over a 10-min period. Both Timex and snapshot image products are available on a half-hourly basis.
Surveys are conducted at the FRF at approximately monthly intervals using the Coastal Research Amphibious Buggy (CRAB) or Lighter Amphibious Resupply Cargo (LARC). The data products collected with both vehicles have centimeter-scale vertical accuracy. Typically, surveys cover an approximately 1 km² region that extends to approximately 15 m depth. Along-shore survey transect spacing is nominally 43 m. Survey transect data are interpolated (using the interpolation method described in [43]) to create a high-resolution, two-dimensional gridded bathymetry product.
The FRF 8 m array [66] consists of 15 bottom-mounted pressure sensors. The array is located approximately 900 m from the shoreline. Data from the array are processed to produce directional wave spectra from which estimates of mean wave height, period, and direction are derived. Typically, these data products are available hourly.
cBathy is a linear depth inversion algorithm that infers bathymetry from nearshore video imagery [44]. The cBathy algorithm operates in three phases. In phase 1, cBathy calculates wavenumber-frequency pairs by identifying coherent patterns in cross-spectral analysis of image intensity time series. In phase 2, cBathy determines the depth that produces the best fit between phase 1 estimates of wavenumber and frequency and those modeled with the linear dispersion relation. Phase 2 additionally provides a self-diagnostic error metric. In phase 3, a Kalman filter can be applied in time to provide more robust estimates of depth if hourly cBathy estimates are available. The internal error metric determined in phase 2, along with a site-specific process error, is used to determine the Kalman gain.
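The phase 2 fit can be illustrated as a one-dimensional search over candidate depths. This toy version (ours, not the cBathy implementation) compares observed wavenumber-frequency pairs against the linear dispersion relation and picks the best-fitting depth:

```python
import numpy as np

def phase2_depth(k_obs, omega_obs, h_grid=None):
    """Toy sketch of the phase-2 idea: choose the depth h whose
    linear-dispersion frequencies sqrt(g*k*tanh(k*h)) best match the
    observed wavenumber-frequency pairs in a least-squares sense."""
    g = 9.81
    if h_grid is None:
        h_grid = np.linspace(0.25, 10.0, 400)   # candidate depths [m]
    k = np.asarray(k_obs)[None, :]              # shape (1, n_pairs)
    h = h_grid[:, None]                         # shape (n_depths, 1)
    omega_pred = np.sqrt(g * k * np.tanh(k * h))
    misfit = np.sum((omega_pred - np.asarray(omega_obs)[None, :]) ** 2, axis=1)
    return float(h_grid[np.argmin(misfit)])
```

The real algorithm additionally weights the fit and reports a skill/error metric per cell; this sketch only conveys the depth-search structure.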
Argus Timex and snapshot images and wave data are used as input to the ML network, and survey data are used to provide ground-truth data for training and testing. Details on the neural network and training/testing workflow are described in the following section. cBathy phase 2 estimates are used as baseline comparison data in the Discussion. Instantaneous cBathy phase 2 results are displayed, instead of the phase 3 Kalman-filtered product, in order to compare bathymetry estimates that depend only on data from the current time period for both methods.

Neural Network Model
We use transfer learning for domain adaptation with the FCNN developed in [63] to develop the FCNN used in this effort. The model in [63] was trained on synthetic Timex and snapshot imagery generated with the two-dimensional Boussinesq wave model Celeris [65] over a variety of synthetic bathymetries and wave conditions. The original FCNN from [63] (Figure 2) ingested a three-dimensional array of size (512, 512, 3). The first channel consisted of the red, green, blue (RGB) mean of a synthetic Timex image; the second channel used the RGB mean of a synthetic snapshot image; and the third channel included wave statistic information (i.e., wave height, wave period, wave direction). Values in the third channel were set to zero in cases when wave information was unavailable. The output of the model was a prediction of seabed elevation at the same resolution as the input Timex/snap pair.
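A minimal sketch of assembling the (512, 512, 3) input described above is given below. The grayscale conversion of each image and the zero-fill when wave data are missing follow the text; how the three wave statistics are encoded within the third channel is not specified in the source, so the banded layout here is purely an assumption for illustration:

```python
import numpy as np

def build_input(timex_rgb, snap_rgb, wave_stats=None):
    """Assemble a (512, 512, 3) model input.
    Channel 0: RGB mean of the Timex image.
    Channel 1: RGB mean of the snapshot image.
    Channel 2: wave statistics, zeros when unavailable (the banded
    encoding below is a hypothetical choice, not the paper's)."""
    x = np.zeros((512, 512, 3), dtype=np.float32)
    x[..., 0] = timex_rgb.mean(axis=-1)   # collapse RGB to one channel
    x[..., 1] = snap_rgb.mean(axis=-1)
    if wave_stats is not None:
        hs, tp, wdir = wave_stats
        x[:171, :, 2] = hs                # wave height band (assumed layout)
        x[171:342, :, 2] = tp             # wave period band
        x[342:, :, 2] = wdir              # wave direction band
    return x
```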

Experiment Workflow
The model developed in [63] was retrained to infer bathymetry at the FRF. The model was initialized with the weights from [63], and Timex and snapshot imagery from the Argus tower were input in the first and second channels, respectively, in place of the synthetic imagery. Bulk wave statistics (significant wave height, dominant wave period, and mean wave direction) were used as additional input features to the model, when available, but are not required for the model to produce an output. Output (labels) associated with example input for the training and testing sessions were derived from FRF survey data.
The model was trained on data from 2015-2018 inclusive and tested on image-survey pairs from 2019. We considered images coincident with a survey if taken within three calendar days. The training set consisted of 41 survey-derived bathymetries and 3036 associated images, and the test set contained 11 survey-derived bathymetries and 306 associated images. While we used the full data input in the cross-shore, the model was only trained on the center one kilometer of the imagery (roughly ±500 m from the tower) because this portion contains the clearest imagery with the fewest projection artifacts. To create a final bathymetric estimate over the one-kilometer along-shore domain, we used a sliding window method, where the prediction window (500 m along-shore, 500 m cross-shore) starts at the northern end of the property and is shifted southward by 50 m to create another prediction, and so forth, until the entire along-shore property range has been predicted at least once. The mean of any overlapping predicted values is taken to produce a final inferred water depth value at each cell (between 50 and 500 values for each cell). The total time to read input, preprocess the data, and deliver a bathymetry estimate is approximately 5 min on a Windows laptop with an NVIDIA Quadro RTX 4000 GPU. The required time scales linearly with additional inferences.
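The sliding-window averaging step can be sketched as follows, assuming a 1 m grid so along-shore distance equals array index; `predict` stands in for the trained FCNN applied to one window:

```python
import numpy as np

def sliding_window_mean(predict, image, win=500, step=50):
    """Slide a `win`-cell window along-shore in `step` increments,
    accumulate the per-window depth predictions, and average the
    overlapping values at each cell.

    `predict` maps a (win, n_cross) window to a (win, n_cross) patch."""
    n_along, n_cross = image.shape[:2]
    total = np.zeros((n_along, n_cross))
    count = np.zeros((n_along, n_cross))
    for start in range(0, n_along - win + 1, step):
        patch = predict(image[start:start + win])
        total[start:start + win] += patch   # accumulate overlapping predictions
        count[start:start + win] += 1
    return total / np.maximum(count, 1)     # per-cell mean
```

With a 1000 m along-shore domain, a 500 m window, and a 50 m shift, each cell is covered by up to ten windows before the per-window ensembling described below multiplies the number of values averaged at each cell.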
Images of questionable quality due to weather or time-of-day effects (e.g., fog, rain, and glare) and those recorded while cameras were not functioning properly were removed from both the training and testing sets. Additionally, to help overcome smaller lighting and weather effects that are present in even the highest-quality data, a random pixel intensity change is applied at each sliding window step and predicted in a batch of size 100 to produce an ensemble of bathymetries that is then averaged to produce a final estimate. Mini-batch normalization was used with a batch size of 8. The sigmoid activation function and mean absolute error loss returned the best results during preliminary hyperparameter investigation.

Results

Figure 3 quantifies the overall results of testing the ML model. Overall pixel-wise RMSE is 0.37 m (Table 1) for the entire test set. RMSE was spatially variable, with the lowest errors offshore of 300 m and slightly higher errors closer to shore, between the sandbar and the shoreline (Figure 3b). There was also a region of higher error in Figure 3 that corresponds to the location of the FRF pier at ~500 m along-shore (see Figure 1). When normalized by water depth, the highest percent errors occurred near the shoreline (near 40%), whereas the rest of the domain had errors on the order of 10 to 20% (Figure 3c). Over all test cases, model bias was low (0.06 m; Table 1). However, Figure 3d indicates clear spatial trends in bias: the model tended to overestimate depth closer to shore, between the shoreline and sandbar, and underestimate depths in the offshore region, particularly near the trough associated with the FRF pier (Figure 3d). The pixel-wise 1:1 map shows generally good agreement between predicted and surveyed depths over the full range of depths (Figure 3e). Histograms of image-wise RMSE and bias are displayed in Figure 3f,g, respectively. In most cases, image-wise RMSE is less than 0.60 m. Mean image-wise depth bias is close to zero, with a slight shift towards overprediction of depths.

The test cases are listed in Table 2 along with the time between the survey and image. The displayed transect corresponds to the FRF cross-shore array [66] of wave gauges and was chosen because of its distance from the anomalous effects observed near the pier. The model produces realistic bathymetry estimates for all the test cases and correctly captures a range of surf-zone morphologies. These results include differing beach states, with the morphology transitioning between single- and double-barred systems over the course of 2019. In most cases, the model locates the sandbar position, but it sometimes fails to accurately estimate the sandbar amplitude. For example, the model correctly identifies the formation and offshore movement of an inner sandbar between February and April (Figure 4a,f,k,p). During this same time period, the model correctly places an outer sandbar near X = 400 m but overestimates its amplitude. In addition, the model correctly identifies the transition from a double-barred profile to a single-barred profile between November and December (Figure 5f,k,p) but again overestimates offshore sandbar amplitude in June and September (Figure 5a,f). The model fails to capture the deep trench under the FRF pier in all of these test cases, particularly south of the pier, which could possibly be explained by the pier obscuring wave signatures in this region. The variability in model performance over the test set is explored in the following section.

The overall RMSE (0.37 m; Figure 3) compares favorably relative to the scale of errors that typically occur from optical wave-based inversion methods within the surf zone. For example, prior work quantified the performance of cBathy phase 2 over a range of wave conditions, finding an RMSE between 0.51 and 0.89 m when wave heights were less than 1.2 m and between 1.75 and 2.43 m when wave heights were over 1.2 m [47].
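The pixel-wise error metrics reported here can be computed with the standard definitions (which we assume the study uses): RMSE as the root of the mean squared depth difference, and bias as the mean signed difference.

```python
import numpy as np

def error_metrics(pred, truth):
    """Pixel-wise RMSE and bias between predicted and surveyed depths.
    Positive bias means the model overestimates depth on average."""
    diff = np.asarray(pred) - np.asarray(truth)
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    bias = float(np.mean(diff))
    return rmse, bias
```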
In Figures 4 and 5, we explore model performance in different conditions and compare the output of the ML network to both ground-truth and cBathy phase 2 estimates for the same time. cBathy also provides an error prediction at each cell, and cells with a predicted error of greater than 2 m are removed for this comparison. A Kalman filter could be applied to either product to potentially improve overall skill.

Survey Time series (Part A)
The first example is from transect four in Figure 5 on 10 December 2019. This example case is a single-barred system, with a break in the bar near the pier. The ML model performs well along both the transects at 60 and 940 m along-shore (Figure 6f) and captures the overall shape of the full two-dimensional survey product when comparing Figure 6b,c. The model predicts a break in the sandbar that is farther north than the survey product indicates. This is likely due to the corresponding gap in wave breaking in the Timex imagery at that location, between 400 and 500 m along-shore. We observe good agreement at deeper depths between the model and the survey, and clear wave crests are visible throughout the snap imagery in this region (Figure 6e). cBathy provides relatively complete depth estimates at this time, though it has higher error throughout, translating the sandbar onshore through much of the northern portion of the domain (Figure 6i).

Figure 7 examines an example of some of the highest-error cases in the test set, when the network is biased deep throughout the nearshore area. In this case, the ML bathymetry estimate has higher error close to shore than the cBathy results for the same time period. In this example, there is minimal wave breaking in the snap and Timex input, and the peak period is only 4.11 s. Correspondingly, the wave field in the snap image has low coherence. It is likely that the lack of visible wave breaking in the image, along with the scarcity of prominent wave crests in the snapshot image, provides insufficient information to accurately estimate depths or the position of the sandbar. In this case, the cBathy product is more accurate than the ML model close to shore, as wave parameters may be more readily distinguishable from noise in spectral analysis of the time-series data.
In addition, cBathy can take advantage of multi-frequency waves in its results, whereas the ML inversion designed here can only interpret information from the visibly dominant wave in a single frame, which in this case has a very high frequency and is thus less sensitive to depth.

Figure 8 presents results from an image collected three days after the previous case (Figure 7). Both image results are evaluated against the same surveyed bathymetry. Unlike the previous case, longer-period swell wave crests are clearly distinguishable in the snap image (Figure 8e), the dissipation signal is easily visible in the Timex image (Figure 8a), and bathymetry estimates from the ML model are more accurate in the nearshore (Figure 8c). In contrast, cBathy (Figure 8d,h,i) displays a more pronounced deep bias in this region of the domain.

These results suggest that the model performs better when wave breaking is clearly visible in the Timex image and/or when wave crests are clearly visible in the snap images, illuminating the refraction patterns and changes in wavelengths.

Activation Maps
To explore the hypothesis that the visibility of wave crests in the snap image has a direct effect on the ability of the ML model to estimate depths, we examine activation maps within network layers. Activation maps display output values from a neural network layer as an image. Figure 9 shows the snapshot inputs for the two example cases from Figures 7 and 8, along with the prediction, ground truth, and example activation maps for the northern portion of the domain. Similar to [63], wave breaking and refraction patterns are visible in activation maps from some of the network layers, suggesting the model is using this information in its predictions. The snap image for 10 April has a less distinct wave field and breaking pattern, while the snap image for the 13 April case has more pronounced wave crests. These more pronounced wave crests produce a stronger signal in the same activation maps (Figure 9h) than the shorter and harder-to-distinguish wave patterns from the earlier time period (Figure 9d). Qualitatively, improved model performance occurs when these images appear clearer to expert observation and are more similar to the images produced for the synthetic model training set in [63]. The enhanced ability of the model when there are clear and distinguishable wave refraction patterns mirrors the success criteria from other synthetic approaches and state-of-the-art satellite methods [11,12,61,67]. We hypothesize that the ML model is more successful at estimating water depths under these conditions of greater wave pattern coherence than in cases where visual artifacts or complicated sea states hinder wave pattern identification. Future work will explore this hypothesis further.

Figure 9. Activation maps from a single ensemble window are shown for the northernmost section for the examples shown in transects three and four in Figure 4 and detailed in Figures 7 and 8. (a) Snap image for the labeled date; (b) survey contours; (c) instant prediction for that ensemble window; (d) three activation maps that emphasize the information the network responds to in making its prediction; (e) snap image for the labeled date; (f) survey contours; (g) instant prediction for that ensemble window; (h) three activation maps that emphasize the information that promotes a response in the network.

Wave Conditions
While wave-celerity-based inversions show degradation of results in high wave conditions [44,45,47], the ML model performs similarly across different wave heights, periods, and directions. Figure 10 shows this invariance by comparing the absolute errors of each image in the test set with the offshore wave conditions recorded at the 8 m array during the image collection. Samples across a wide range of wave periods (4-14 s) and directions (11-144°) show no correlation between wave conditions and mean error. For wave height, we see consistent mean error metrics (MAE = 0.31 m, RMSE = 0.37 m) up to wave heights of 3.5 m. However, instances with high wave heights (>1.5 m) are rarer, and so more testing is needed to determine if estimates are similarly consistent under these conditions. Errors may slightly increase during small, short-period waves, similar to Figure 7, when contrast between the front and back sides of the waves is low and little breaking occurs. Wave condition features were used as input during training/inference when available. Wave period was included, as it can affect wave speed in intermediate water depths and thus potentially the interpretation of visible refraction patterns in the imagery. In addition, for a given bathymetry, variations in wave height will influence the extent of wave breaking visible in the Timex imagery. Interestingly, while including the local measured wave conditions did reduce the overall RMSE (to 0.369 m from 0.376 m), the improvement was not statistically significant. More research is needed to determine whether this holds across different locations.

Transfer Learning
The network architecture utilized in [63] provided a framework to explore the potential of transfer learning on FRF imagery and was able to incorporate some non-image features (wave height, period, and direction). Similar to [63], activation maps for the real imagery used here highlight the wave breaking and refraction patterns in the imagery, suggesting the ML model is associating the visible signatures of shallow-water wave processes with changes in water depth. This study demonstrates that an FCNN can accurately infer bathymetry using real-world Timex and snapshot imagery at the FRF, even with a training data set of limited size (3036 images paired with 41 surveyed bathymetries). More research is needed to determine the value of transfer learning from a network pretrained on physically realistic synthetic data. The transfer learning approach may add value when applying the network to locations with limited or no data availability. The applicability of the synthetic data to a general real-world location could possibly be enhanced by dimensional reduction of the synthetic and remotely sensed image pairs into a latent space. This would be expected to increase similarity and reduce environmental noise while retaining the information necessary for the bathymetric inversion. The transfer learning component may be less important, however, for sites where large volumes of training data are available.

Conclusions
In this work, we explored the adaptation of an FCNN trained on synthetic surf-zone imagery [63] to infer bathymetry from real Timex and snapshot imagery. This was the first time, to the authors' knowledge, a neural network was successfully applied to infer bathymetry from remotely sensed sea-surface imagery in the surf zone. Not only did the approach provide accurate bathymetry estimates (RMSE = 0.37 m), but it was also effective with a relatively small training data set (3036 images/41 surveyed bathymetries). Additionally, all inferences over the test set predicted physically realistic bathymetric states. The model is robust to different wave conditions; however, estimates are most accurate when wave breaking and/or wave crests are clearly visible in imagery. Activation maps show that the model can learn the relationship between wave breaking signatures and underlying bathymetry, instead of merely overfitting to a mean bathymetric state. Rather, it accurately tracks patterns of morphologic change and infers transitional states between single-barred and double-barred profiles. Future work will be twofold: (a) developing a generalized form of the algorithm to work on wave fields from other locations, without requiring training data from those locations, and (b) examining the best approach to integrate this methodology into other bathymetric inversion methods and available data sources at the FRF, with the goal of developing the best bathymetry product possible on a daily time scale between survey dates.