Updates to and Performance of the cBathy Algorithm for Estimating Nearshore Bathymetry from Remote Sensing Imagery

This manuscript describes and tests a set of improvements to the cBathy algorithm, published in 2013 by Holman et al. [hereafter HPH13], for the estimation of bathymetry based on optical observations of propagating nearshore waves. Three versions are considered, the original HPH13 algorithm (now labeled V1.0), an intermediate version that has seen moderate use but limited testing (V1.2), and a substantially updated version (V2.0). Important improvements from V1.0 include a new deep-water weighting scheme, removal of a spurious variable in the nonlinear fitting, an adaptive scheme for determining the optimum tile size based on the approximate wavelength, and a much-improved search seed algorithm. While V1.2 was tested and results listed, the primary interest is in comparing V1.0, the original code, with the new version V2.0. The three versions were tested against an updated dataset of 39 ground-truth surveys collected from 2015 to 2019 at the Field Research Facility in Duck, NC. In all, 624 cBathy collections were processed spanning a four-day period up to and including each survey date. Both the unfiltered phase 2 and the Kalman-filtered phase 3 bathymetry estimates were tested. For the Kalman-filtered estimates, only the estimate from mid-afternoon on the survey date was used for statistical measures. Of those 39 Kalman products, the bias, rms error, and 95% exceedance for V1.0 were 0.15, 0.47, and 0.96 m, respectively, while for V2.0, they were 0.08, 0.38, and 0.78 m. The mean observed coverage, the percentage of successful estimate locations in the map, were 99.1% for V1.0 and 99.9% for V2.0. Phase 2 (unfiltered) bathymetry estimates were also compared to ground truth for the 624 available data runs. The mean bias, rms error, and 95% exceedance statistics for V1.0 were 0.19, 0.64, and 1.27 m, respectively, and for V2.0 were 0.16, 0.56, and 1.19 m, an improvement in all cases. The coverage also increased from 78.8% for V1.0 to 84.7% for V2.0, about a 27% reduction in the number of failed estimates. The largest errors were associated with both large waves and poor imaging conditions such as fog, rain, or darkness that greatly reduced the percentage of successful coverage. As a practical mitigation of large errors, data runs for which the significant wave height was greater than 1.2 m or the coverage was less than 50% were omitted from the analysis, reducing the number of runs from 624 to 563. For this reduced dataset, the bias, rms error, and 95% exceedance errors for V1.0 were 0.15, 0.58, and 1.16 m and for V2.0 were 0.09, 0.41, and 0.85 m, respectively. Successful coverage for V1.0 was 82.8%, while for V2.0, it was 90.0%, a roughly 42% reduction in the number of failed estimates. Performance for V2.0 individual (non-filtered) estimates is slightly better than the Kalman results in the original HPH13 paper, and it is recommended that version 2.0 becomes the new standard algorithm.


Introduction
Coastal bathymetry and its change over time are of paramount importance to understanding and modeling coastal morphology and the health of natural coastal systems. In Facility at Duck, NC, USA. Below, we will start by summarizing the elements of the original 1.0 algorithm plus the various significant updates. We, then, introduce the test bed dataset, which can be used to test both cBathy and any other algorithm in nearshore remote sensing. Finally, we will describe and execute a testing protocol that was used to document performance changes with each algorithm update.

cBathy Versions-Version 1.0
Version 1.0 is fully described in HPH13, but is summarized herein. For more detail, the reader is referred to the original manuscript. The algorithm is based on the linear dispersion relationship σ 2 = gk tan h(kh) (1) that relates the wavenumber, k, and frequency, σ, of ocean waves to the water depth, h (k is 2π divided by the wavelength, L, while σ is 2π divided by the wave period, T). The input data for cBathy are usually derived from cameras and consists of time series of image intensity, I (x p , y p , t), at a set of discrete pixel locations, x p , y p that span the domain of interest and adequately sample the typical ocean wave scales, while not oversampling them (across-shore and alongshore spacing is commonly 5 and 10 m, respectively, see blue dots in Figure 1). The analysis is carried out at a map array of model locations, x m , y m , (example red dot in Figure 1) and at each map location is based on the observed wave phases in a tile of observations within some user-specified cross-shore and longshore length scales, L x and L y , of the model location (green dots in Figure 1; see Figure 2 for an example phase map). At a set of dominant radial frequencies, σ, the two components of wavenumber, k x and k y , are derived from the phase ramp slopes in the x (cross-shore) and y (alongshore) directions and are combined to yield the magnitude of k.  1 4 of the pixels are shown (decimated by two in both x and y directions). The wavenumber for each analysis point, for example, x m = 250; y m = 750 shown above by the red asterisk, is found using phase map data from a surrounding region, shown by green dots above. Imagery is derived from six oblique-viewing cameras and merged into rectified images such as this [44,45].  Figure 1 for the dominant frequency, f = 0.0956 Hz. Observed phase is on the left, best-fit modeled (Equation (2)) phase on the right. The x and y components of wavenumber are derived from the components of slope of the phase ramps (colors from seaward to landward going from blue through green, yellow then red with 360 • phase jumps as blue-red transitions).
Thus, the goal is to estimate the dominant wave frequencies and wavenumbers at each model location and merge that information into a spatially smooth estimated bathymetry using the dispersion relationship. The algorithm is divided into three phases: (1) estimation of frequency-wavenumber pairs, (2) estimation of bathymetry,ĥ, from suites of those estimates, and (3) Kalman smoothing of separate hourly estimates to create a running average bathymetry product, h, that is robust to noise and error. In the following, we sometimes use the terms f-combined and f-averaged to refer to phase 2 and 3 results, respectively. The updates since 2013 have all addressed improvements to phases 1 and 2, so the algorithm review below will omit phase 3 Kalman filtering. Note that the performance tests in HPH13 only considered phase 3 results, since phase 2 output was noisy.
The algorithm must adapt to incident waves with periods from almost 20 s to about 4 s (shorter waves than this are insensitive to depth in other than very shallow water so are not helpful) without knowing a priori what conditions will be. Because the analysis is performed in frequency space, the first step is to Fourier transform each pixel time series and normalize the Fourier magnitudes, allowing focus on Fourier phase only. Cross-spectra are computed between all pixels in a tile for a suite of candidate frequencies that are usually spaced to allow about 40 degrees of freedom (for a typical run length of 1024 s, frequencies are spaced by 0.02 Hz). The (usually four) dominant frequencies are those with the largest total coherence in the resulting cross-spectral matrix and a wavenumber is, then, estimated for each of these frequencies.
Wavenumbers are found from maps of Fourier phase for each tile ( Figure 2). To guard against cases of waves from multiple directions within a single frequency band, as a first step, the cross-spectral matrix is decomposed into complex eigenvectors, v(x p , y p ), and only the one with the largest eigenvalue is retained. This normalized eigenvector is modelled as a single dominant plane wave form v = exp i kcos(α)x p + ksin(α)y p + ϕ (2) where α is the wave angle, and φ is a scalar phase angle. In version 1.0, the three parameters k, α, and φ were found using a weighted nonlinear least-squares fit between the observed and modeled eigenvector phase maps. Details of the weighting are included in the original paper [1]. The goal of phase 2 of the algorithm is to use a suite of σ-k phase 1 estimates to estimate the depth at each individual model location. Each model point can yield up to Remote Sens. 2021, 13, 3996 5 of 25 four frequencies and phase 2 incorporates estimates from adjacent x m , y m locations from within the tile, weighted by the inverse distance from the current estimation point. In version 1.0, the weighting was taken to depend on the normalized eigenvalue and skill of the model fit, both from phase 1, as well as the inverse distance from the current estimate location in phase 2. The final phase 2 result is the single depth that is the nonlinear best fit to predicted depth using the input suite of frequency-wavenumber pairs and the dispersion relationship. Error estimates, h err , are produced for each depth (see [1], error estimation is unchanged in recent upgrades).

Version 1.1 Update
Versions 1.1 and 1.2 were consecutive steps in cBathy improvement but were never fully tested or used separately. Thus, we will describe each, then combine them into a single upgrade to cBathy version 1.2 for testing. We now describe the algorithm improvements.
As waves are measured in increasingly deep water, they become decreasingly sensitive to depth. Thus, small errors in the measured wavenumber become associated with large errors in estimated depth. To guard against this excessive sensitivity in version 1.0, values deeper than a user-specified maximum depth were neglected. However, this maximum depth should instead be wave-frequency dependent, and biases are introduced in deeper water results by simply truncating estimated values. Version 1.1 corrected this problem by accounting better for the dispersion relation sensitivity in phase 2 weighting. Equation (1) can be rewritten where k 0 is the wavenumber in deep water. We can define k 0 /k = L/L 0 = Γ, where Γ is a non-dimensional wavelength (or wavenumber), going from small in shallow water to a maximum value of 1.0 in deep water. We can define sensitivity to wavenumber error by the equation where µ is the sensitivity, a function only of Γ, which can be found numerically and represents the fractional sensitivity of bathymetry estimates to fractional errors in estimated wavenumber. Figure 3 shows the sensitivity of wavenumber inversion as a function of nondimensional wavelength. In shallow water, the value is 2.0 so that fractional bathymetry errors are twice the magnitude of fractional wavenumber errors. This is the best case. Closer to deep water (Γ approaching 1.0), the sensitivity rises rapidly, approaching 10 when wavelengths are 0.9 of their deep-water value. This sensitivity provides a convenient weighting measure. If we denote the version 1.0 phase 2 weighting value as W for any σ-k pair, we can divide this weight by the sensitivity as a simple method of preferentially weighting σ-k pairs that are in shallower water and de-emphasizing those approaching the deep-water limit. Thus, the main change in the version 1.1 algorithm is this modified weighting in phase 2. A side benefit is that the user no longer needs to specify a maximum depth beyond which to no longer believe an estimate. Note that L 0 = gT 2 /2π is frequency dependent so this weighting puts a strong preference on long-period waves.

Version 1.2 Update
Version 1.2 dealt mostly with a problem of poor bathymetry estimates along the seams between cameras, an issue that only applies to image collection systems whose sampling is distributed over multiple cameras. Camera image data that are a common input to cBathy analysis are often collected from multiple cameras and merged into a map of data coverage, for example, as shown in Figure 1. When these data were analyzed as a single array, estimated bathymetries along camera seams were often anomalous.
Two causes were identified. The most obvious issue is variations in camera geometry used to map from image to world coordinates [45]. The sampling pixel array is usually designed as a regular world grid, mapped to pixels for each camera using originally accurate camera geometries. If camera geometries shift slightly over time, the projected world spacing of those pixels on either side of a camera seam will begin to differ from the original spacing. Thus, waves will appear to move too quickly across a shortened sample gap or too slowly for a stretched gap, leading to errors in estimated depth.
The second plausible cause of cross-seam anomalies can come from a loss of synchronization of the cameras. Argus cameras are usually synchronized by either an electronic trigger or by computer bus synchronization. However, there can be rare frame slips that allow time shifts between cameras that would be interpreted as wave-speed anomalies for tiles that span camera seams.
Initial attempts were made to mitigate both of these problems. However, it soon became clear that the simple solution was to never mix pixels from multiple cameras in any tile [46,47]. Thus, the pixels from the camera with the most pixels in any tile are used, and the others are simply neglected. This results in fewer degrees of freedom and increased predicted error, but otherwise had no significant negative impact. Version 1.2 also included a new method for finding the seed wavenumber and wave angle for the nonlinear search, but this change has been overtaken by a much better seed routine in version 2.0 (discussed below) and will not be discussed further here.

Version 2.0
The upgrade to version 2.0 involves three significant changes and a code reorganization, so is considered a major revision and is given a new leading version number. In order of importance, the changes are: (a) to change from tile sizes that are fixed by the user to those that are chosen automatically depending on expected wavelengths, (b) to change from solving for three variables, k, α, and φ 0 , at each map point to solving for only k and α, and (c) the introduction of a much better algorithm to find seed values for k and α before the nonlinear search for each tile. The cumulative consequence of these modifications is a major restructuring of the code. Each component will be described in turn.

Automatic Tile Sizes
In all earlier versions, cross-shore and alongshore tile sizes were set by the user using the parameters, L x and L y . Suggestions for best values were ad hoc with a belief that the search would work best if the tile was typically about one wavelength long, but an implicit faith that even mismatched tile sizes would solve well somewhere within the nonlinear fitting routine. Thus, the same tile size was used for, e.g., 4 s and 16 s waves despite at least a four-fold difference in their wavelengths. Although sub-optimal, this approach was still successful, since tiles that were poorly designed simply failed to converge and returned nan's rather than a poor depth.
Issues with this approach became especially clear for short-period incident waves, for example, 4-5 s waves. These could have 4-5 wavelengths within the tile sampling region, so convergence of the nonlinear search usually behaved very poorly, since there was not a simple global cost function minimum unless you started with a very accurate seed. The problem was made worse by a feature of the code that decimates the original number of pixels in a tile down to a user-defined maximum, maxNPix, to help reduce processing time. Commonly, a standard tile was reduced from 250 collected pixels down to 80 that would be analyzed, enough to map the phase of a single typical wavelength but way too few to map out five short wavelengths in a tile. The resulting sparse sampling per wavelength often led to aliasing of the true signal.
Version 2.0 fixes these problems by defining the phase 1 tile sizes to be k L times the expected wavelength, where k L is an empirical scalar taken to be approximately 1.0. However, the expected wavelength depends of the frequency and depth, neither of which is known a priori. Thus, there is strong motivation to develop a seed-finding algorithm (see below) that can provide good initial estimates of k under all wave conditions. Thus, given an initial tile based on generic user input, version 2.0 feeds all of the available pixels to the routine to find the seed k and α, then crops the original tile size to k L times this wavelength, a size that varies considerably. The number of pixels in this truncated tile is, then, reduced, if necessary, to maxNPix to speed up processing. Because all tiles are roughly one wavelength no matter what the frequency, it is likely that fewer pixels are needed for search convergence, again speeding up processing.

Reduction in the Number of Search Variables
In all earlier versions, the model for wave phase, Equation (2), had three parameters, each of which was found using a standard Levenburg-Marquardt solver. However, only two of the parameters, k and α, are expected to be well behaved in a nonlinear gradientdescent search. The third variable, φ 0 , is a scalar offset between the measured and modeled phase maps and can jump around a great deal, in ways that are inconsistent with a search for a global cost function minimum. Thus, this variable is not well estimated and likely just confuses the search.
In a very early version of this algorithm known as Beach Wizard [35], the solution was sought not in x-y space phase maps as here, but in cross-spectral lag space. This had the advantage that a phase offset was not required (phase differences just increased with lag from zero), but it meant that if you had N pixels in your tile you were searching in an N 2 space (each pixel compared to every other pixel). In addition, visualization of measured and modeled results (e.g., Figure 2) was not as clear in lag space. Thus, we wish to retain the simplicity of working in x-y space maps but using a method for estimating φ 0 for each search iteration that will allow a sensible search for k and α. The solution is to force the measured and modeled phase to be the same at the tile center (the pixel closest to x m , y m ). This is done by finding dφ, the difference in phase at the middle pixel, and multiplying all modeled complex values of the eigenvector, v, by e idφ . The nonlinear search is, then, reduced to two dimensions.

Improved Seed Algorithm
Both the success of adaptive tile sizes and of the nonlinear search depend on the accuracy of the initial seed search values for k and α. The search is complicated by the fact that Fourier phase has 2π jumps whose slight mispositioning adds to cost functions out of proportion to the error. It also rapidly became clear that the full resolution of the tile was needed during the seed search, i.e., decimation down to maxNPix could only occur after the search seeds were found and the full tile was sub-sampled to the adaptive tile size.
The solution for finding more robust seed values makes use of the Radon transform. First, the observed phases from the eigenvector were interpolated onto a regular grid, a Remote Sens. 2021, 13, 3996 8 of 25 somewhat noisy process due to the phase jumps. Next, the Radon transform was found to estimate the phase variance when projected along a suite of candidate wave directions (−45 to +45 • in 2 • increments; Figure 4). The angle that corresponded to the maximum variance was selected as the seed wave angle, α. Finally, the phase map was interpolated onto a new grid that is oriented in the direction of wave propagation, and the median of the phase gradient in that direction was used as the seed value of k. This search also returned the location of the pixel that lay closest to the center of the tile (x m , y m ), to be used in finding dφ.

Algorithm Organization Changes
The changes in version 2.0 have forced a major restructuring of the program logic flow. Figures 5 and 6 show flow charts for versions 1.2 (also representing version 1.0) and 2.0, respectively. The reorganization only impacts phase 1 of the algorithm, shown in red boxes, with the partitioning into tiles happening early in version 1.2 (in subBathyProcess) but later in version 2.0 (in csmInvertKAlpha).     Instead of reducing the number of pixels prior to any analysis steps, it was clear that full pixel resolution was required for finding the k-α seeds and for establishing the required size of the tile. Thus, the full tile of size L x , L y is initially passed into the main analysis routine, csmInvertKAlpha, where the routine prepareTiles is called to (a) find the dominant frequencies, (b) for each frequency, find the dominant eigenvector and the k-α seeds, then c) reduce the tile to an adaptive size and to a maximum number of pixels. These outputs are, then, passed back to csmInvertKAlpha to carry out the nonlinear search for best-fit values and their errors and, then, to build the results structures and find the depths from the σ-k results.
Several new fields have also been added to the bathy output MATLAB structure. These provide a set of traditional Argus image proxies, computed from the input time stack data over the region of interest at the array of desired x m , y m points. These include time exposure, brightest and darkest images, all at the coarse resolution of the analysis array but adequate to determine, for example, the locations of wave breaking. Within the phase 1 fDependent sub-field, there are now maps that are included for diagnostic and performance improvement purposes. These include maps of the seed values of k and α used for the nonlinear search as well as maps of the number of pixels used in each tile and of the number of model calls during the nonlinear fitting routine, a key to algorithm speed. In addition, there is a map of which camera is used for each tile. Within the fCombined sub-field, there is now a map of the effective mean frequency, fBar, used in the phase 2 bathymetry calculation, found as the weighted mean of each frequency that contributed to the phase 2 final bathymetry estimate. Finally, the elapsed CPU time for each analysis is saved.

Bathymetry Test Bed Datasets
HPH13 tested the cBathy version 1.0 against a collection of 16 bathymetry surveys collected at the Field Research Facility (FRF) between 2009 and 2011 as well as one survey at Agate Beach, Oregon, on the US West Coast. In 2015, the Argus station at the FRF was upgraded with improved cameras and computer hardware, so it was decided to base the updated tests in this paper on more recent surveys.
The new dataset includes 39 FRF surveys collected from 20 May 2015 to 17 April 2019. The full dataset is described in the document "The2019cBathyDataTestBed", which is located on the CIRN (Coastal Imaging Research Network) GitHub site (https://github.com/ Coastal-Imaging-Research-Network, accessed on 21 July 2021) in the cBathy toolbox and at the date of publication is located in the version 2.0 branch in a folder called cBathyTestBed (this branch should become the master branch in the near future). The MATLAB data structure, bathyTestDataSet, includes the full set of actual survey data, the list of related cBathy stack files, which can be loaded directly from the Coastal Imaging Lab, and the environmental data on waves and tides. These are accompanied by a testbed description document and instructions on how to download cBathy time stacks. To be consistent with HPH13, the cBathy files for each survey include all collections for the day of the survey plus for each of the three preceding days, a total of 96 time stacks for each survey.
In HPH13, cBathy results were computed for each time stack and, then, smoothed into the phase 3 Kalman-filtered estimate-only the Kalman results were compared to ground truth, since the individual phase 2 results were viewed as too noisy. In this paper, we will examine Kalman results as did HPH13, but we will also compare individual phase 2 results from a large subset of runs within each four-day cBathy sampling window. Analysis runs were selected at 1200, 1400, 1600, and 1800, local standard time, for each of the four days, avoiding morning sun glare problems. This yielded a total of 624 phase 2 bathymetry map estimates in addition to the 39 Kalman smoothed results. Survey data, while only collected on the fourth day, are assumed to be representative for the full four-day sample periods. Figure 7 shows a plot of the heights, periods, and directions of waves measured at both the 26 m depth waverider and the 8 m array seaward of the FRF property. Wave heights were typically fairly low, compatible with the safety needs of in situ surveying, although some of the surveys followed major storm events so include larger wave conditions in the collections from the three prior days. Tidal data are taken from the pier-end NOAA tide station and span a full range of almost 2 m.  was nearly equal to the median of all 624 examples (0.46 m) and because there was some breaking over the offshore sand bar (wave breaking has been found to affect performance). The 26 m waverider wave height was 0.72 m and the period was 6.0 s, the wave direction at the 8 m array was 78 • (shore normal is 72 • , so this is from 6 • south of normal), and the tide elevation was 0.05 m. Bathymetry estimates for each cBathy version (left three panels) are thresholded to only show those for which the predicted error is less than or equal to 0.5 m. For comparison, the right-hand panel shows the ground-truth survey, taken two days later. Ground-truth values are only used if the loess interpolation error in the gridded data is less than 0.15, omitting some larger interpolation errors where there were occasional gaps in survey coverage. Similarly, ground-truth values are only used if the true water depth (survey depth plus tide) was greater than zero, i.e., we only consider locations that are wet. Consistent with HPH13, performance statistics excluded the region of the FRF pier, i.e., the region between y = 400 and 600 m (not inclusive), which interferes with both the camera views and the waves. Previous papers have observed that cBathy estimates are worst in the surf zone and particularly at the onset of breaking [21]. The onset of breaking can be detected based on coarse-resolution time exposure image (one of the new fields in version 2.0), by defining the breaking onset as any region whose intensity gradient, averaged over 20 m in the cross-shore, is less than −2.0 intensity units per cross-shore meter. Neither V1.0 nor V1.2 had any successful estimates in the breaker region (dark red regions landward of roughly x = 200 m). For V2.0, bias and rms errors were 1.20 m and 1.24 m, respectively, reinforcing the poor cBathy performance statistics in this region (for this example run).

cBathy Phase 2 Version Performance Statistics
HPH13 also found that performance varied with true depth. cBathy estimates were partitioned by depth as 0-1 m, 1-2 m, . . . , 7-8 m and performance estimates found for each depth bin ( Figure 9). As expected, performance degrades in the shallowest two bins in the surf zone (note that for the shallowest bin, no estimation points succeeded for V1.0, and only 3 and 15 points were successfully estimated for V1.2 and V2.0, respectively). Both the bias and rmse are good for depths greater than 2 m (outside the surf zone) for versions 1.2 and 2.0 and are similar to those from Kalman-filtered estimates found by HPH13 for version 1.0. Version 2.0 performs best for all statistics, but particularly for the percentage of successful coverage. Several new fields in version 2.0 can be explored. Most interesting is fBar, the weighted mean frequency of waves that contributes to the phase 2 f-Combined depth estimate. Figure 10 shows these results for the example day, showing that map areas to the north (large y-values) are more dominated by shorter northeast waves (f = 0.21 Hz, α~30 • , north of normal), while views looking offshore and south from the cameras in the center region saw longer period waves from the southeast (f = 0.17 Hz, α~−15 to −20 • , south of normal). In general, fBar favors lower frequencies in a broad spectrum. This result is consistent with predictions by Walker [48] and observations by Holman et al. [32] that the strongest optical signals occur when viewing directly into oncoming waves.
Finally, we can compare the seed values of k and α versus final nonlinear fit values as a test of the effectiveness of the new seed algorithm (Figure 11). Eighty-four percent of wave angle fit values are within 10 • of the seed, while k seeds are less accurate with only 60% of best-fit values being with 50% of the k seeds (which seems to be good enough for successful nonlinear searches, given good seed angles).

Bulk Analysis of cBathy Version Statistics
In all, 624 cBathy analyses were considered including 39 surveys and 16 cBathy collections for each survey (a subset of the full set of collections). Because each of the three cBathy versions returned results over different regions (i.e., had different coverages), it is harder to directly compare statistics such as bias and rms error. Thus, we will first discuss the performances of the algorithms in terms of coverage, then we will discuss other error statistics considering both the full comparison as well as comparisons based on coverage regions that were in common between algorithms. Figure 12 shows histograms of the bulk coverage statistics for each cBathy version. It is apparent that V2.0 does much better than earlier versions; for example, the median coverage values are 86, 87, and 95%, for V1.0, V1.2, and V2.0, respectively. All algorithms provide decent coverage with only about 41 of the 624 runs having less the 50% success. There were several reasons for low successful coverage. Figure 13 shows the dependence of coverage on the significant wave height at the 8-meter array. Performance clearly declines with wave height with a rough rule of thumb that for Hs > 1.2 m, expected performance will become poor (consistent with Brodie et al. [21]). Snapshots from the 41 cases of coverage worse than 50% were examined manually and showed that poor performance was also caused by foggy or rainy days (although many rainy days were also successful) as well as low evening light.  We next wish to compare the basic performance statistics between the three versions. Figure 14 shows histograms of each statistic for each version. Surprisingly, the first three statistics, bias, rmse, and ∆h 95 , look almost indistinguishable between versions, despite what we had hoped were significant improvements in the algorithm as shown in the example run above. This result is a consequence of the fact that for each version, results with estimated errors, h err > 0.5 m were rejected from the analysis. Thus, the statistics are computed for different regions of coverage. To make a more direct comparison (apples versus apples), the map of successful coverage for V2.0 was saved for each collection and used as the basis for computing performance statistics for V1.0 and V1.2. This usually larger region, thus, includes poorer performing regions from the earlier algorithm versions. This is confirmed by Figure 15, which shows the same histograms of statistics but is now based on a common area of sampling defined by the V2.0 error estimates. For all three statistics, bias, rmse, and ∆h 95 , the performance of V2.0 is superior. The fact that statistics are roughly the same when h err is determined by each individual version is a testament to the robustness of that error estimate. In the following, all statistics will be based on the common area of coverage determined for V2.0.  The bias statistics are seen themselves to be slightly positively biased (estimates too deep), with mean biases for the three versions of 0.19, 0.14, and 0.16 m, respectively. There appear to be two causes for larger biases: performance issues with large waves and breaking and cases of low coverage, so poor imaging conditions or noisy statistics. Removing cases for which H s is greater than 1.2 m reduces the mean bias to 0.15, 0.11, and 0.12 m, respectively. Removal of estimates for which coverage was less than 50% reduced the original mean bias to 0.15, 0.1, and 0.1, respectively, while removal of both issues (consider only cases with Hs ≤ 1.2 m and coverage ≥ 50%, reducing the dataset from 624 to 563 runs) led to mean biases for the three algorithm versions of 0.15, 0.09, and 0.09 m, respectively.
Ninety-five percent of the bias statistics had magnitudes less than 0.61, 0.58, and 0.59 m for the three cBathy versions when the full dataset was considered. When large waves (>1.2 m) and low coverage (<50%) data were removed, the 95% exceedance of bias was reduced to 0.46, 0.43, and 0.34 m, respectively. As a further step, the most extreme cases of error were examined visually. Most cases were associated with fog or rain (low coverage), storms (large waves), or an apparent absence of longer period (>4 s) waves. Figure 15 shows that the rms and 95% exceedance errors also improved with later versions of the cBathy algorithm. Table 1 summarizes the mean values for the four basic performance statistics for each of the three cBathy versions. These statistics are shown for the full dataset (top of Table) as well as for the reduced dataset (bottom of Table). It is clear that version 2.0 is superior in all categories of performance measurement. Table 1. Means of the four basic performance statistics (rows) comparing across the three versions of the cBathy algorithm (columns). Statistics are computed for the full dataset of 624 runs (upper section) as well as for the reduced dataset of 563 runs for which successful coverage was greater than 50% (based on V2.0 coverage) and significant wave height was less than 1.2 m (lower section).  Figure 16 shows the four bulk statistics against significant wave height, plotted by color for each cBathy version, for all 624 data runs. Points are plotted in order of V1.0 (red), V1.2 (green), then V2.0 (blue), so later values hide earlier ones. That said, it is clear that V2.0 values are better than those of V1.0 and V1.2. This is particularly apparent for rmse statistics where there are fewer cases of large error for V2.0 (fewer blue dots at large values). For comparison, Figure 17 shows the same statistics for the reduced dataset of 563 points for which wave height is less than or equal to 1.2 m, and successful coverage is greater than 50%. The y-scales are the same as for Figure 16 to ease comparison, although the range of wave heights (x-axis) is reduced.  As was shown by the example run in the previous section, statistics vary with true depth. The full dataset was partitioned by depth (0-1 m, 1-2 m, . . . , 7-8 m) and the mean of each of the core statistics computed for each depth bin. The results are shown in Figure 18. It is apparent that the improvement in V2.0 performance is concentrated in the shallow (<2 m) and deep (>6 m) regions with much smaller differences in intermediate depths.
It is unclear why the performance differences are greater in deeper water. However, in shallow water, the worsening performance is likely linked to wave breaking in the surf zone. To test this hypothesis, we partitioned the dataset into breaking and non-breaking regions for each data run, based on the mean intensity brightness from the time exposure image (calculated from the time stack in version 2.0 but applicable to all versions). Time exposure intensities greater than 150 were considered to be surf zone, while values less than 150 were considered non-breaking. Figure 19 shows the bias and rms statistics for the breaking (dashed line) and non-breaking (solid line) partitions, reinforcing the increased errors when waves break, which always occurs preferentially in shallow depths. Note that while errors are largest in shallow depths for all versions of the algorithm (Figure 18), they are significantly better for version 2.0.  Finally, the quality of the new k-α seed algorithm was tested for version 2.0 (seeds are not saved for earlier versions). Over the 624 data runs, the standard deviation of the residual between the seed and the best fit value of wave angle, α, was typically between 5 and 10 degrees with a mean value of 6.2 degrees. The errors of the k seeds are relatively larger with a mean of 0.05 m −1 , but with improved wave angle seeds, this was sufficiently close to yield greater success in the nonlinear search.

Kalman-Filtered Results
In the first paper describing cBathy (HPH13), only Kalman-filtered results were presented because individual results from version 1.0 were felt to be too noisy or to have too many analysis gaps. Kalman filtering allowed these gappy areas to be merged with data from other runs based on the estimated error, h err , using a Kalman filter. Thus, regions with bad results due to, for example, wave breaking, can be filled with results from runs in which there was no breaking at that location. Kalman results accumulate over a series of prior data collections, with each new run improving the estimate until the results stabilize after roughly one or two days. In this case, we have included four cBathy bathymetry estimates per day for four days, the three days prior to each survey plus the actual survey day. The Kalman result from approximately 3:00 pm on the day of the survey is used to compare with the survey ground truth. Figure 20 shows the Kalman-filtered results for the example survey used in Figure 8  Continuing the comparison to the non-filtered results in the previous section, Figure 21 shows the four basic statistics plotted by depth. The advantages of Kalman filtering become apparent. While there are still some anomalies in the shallowest depth bin, they are largely absent for depths greater than 1 m. Moreover, coverage is now 100% for all versions. These results are consistent with the bulk statistics for the 39 surveys, shown in Figure 22 and plotted versus wave height for the survey date data run. Bias, rmse, and ∆h 95 are all very good with V2.0 results outperforming earlier versions. Note that the coverage (of submerged depths) is now always greater than 95% due to the Kalman filtering, with V2.0 performing the best (the lowest V2.0 coverage was 99.3%). Again, performance is best for lower waves, although the Kalman filtering averages over multiple prior runs. Mean statistics for the three algorithms averaged over the 39 surveys are shown in Table 2. Again, version 2.0 outperforms earlier versions.  Finally, the effectiveness of predicted errors from V2.0 was compared to errors that were measured in comparison to ground truth. For each selected Kalman dataset (3:00 pm on the survey date for each of the 39 surveys), the ratio (labelled R err ) of the measured absolute error to both the Kalman and non-Kalman (phase 2) predicted error was taken, averaged over the domain for which Kalman and non-Kalman predicted errors were less than or equal to 1.0 and 0.5 m, respectively (this removes the influence of extreme predicted errors). In HPH13, this value was only tested for Kalman results and was found to be approximately 7. For this more extensive dataset and the version 2.0 algorithm, the values are smaller, as shown in Figure 23. The mean of the Kalman-filtered results is 4.47, smaller than the value of 7.0 noted by HPH13. The mean of the non-Kalman-filtered results was 2.0. During this comparison, it was also noted that the process of Kalman filtering reduced the actual error of the Kalman estimates by an average factor of 4.6 compared to the non-Kalman single run estimates. The same analysis, carried out for version 1.0 results, showed that the mean of the predicted to measured errors for Kalman-and non-Kalman-filtered results were 5.22 and 2.29, with the Kalman ratio being a bit smaller than the factor of 7.0 found by HPH13 for their dataset. The reduction in the algorithm error for Kalman-and non-Kalman-filtered predictions was 5.3, slightly larger than with V2.0, likely due to the improvements in the non-Kalman estimates.

Discussion
Improvements to the cBathy algorithm contained in versions 1.2 and, then, 2.0 have led to significant improvements in cBathy performance. One of the main improvements is the larger region of successful coverage, defined as the fraction of the domain that yielded bathymetry estimated with a predicted error ≤0.5 m. The typical improvement in coverage of around 5-10% may seem small, but when viewed in terms of the fraction of locations for which estimates fail, the improvement is considerable, with a mean reduction in failed estimates of 28% for the phase 2 full dataset (42% for the reduced dataset) and 89% for the Kalman result (although failed estimates are rare with Kalman filtering).
Interestingly, the other performance statistics were not greatly changed by the improvement in the algorithm. However, this observation is only true when comparing dissimilar populations, i.e., the reduced dataset that is acceptable for V1.0 and V1.2 compared to the successful region for V2.0. When the comparison is made over similar regions (that which was acceptable to V2.0), the bias, rmse, and ∆h 95 all showed improvements with V2.0. In some ways, it is reassuring that the quality measurements (predicted h err ) are performing sensibly.
With all of the algorithms, it was found that performance degraded when the waves were large or when the data were of poor quality for one or more reasons such as fog, rain, or sun glitter. The latter issues can be hard to objectively identify but can be represented by a simple test that if coverage is less than 50%, there are likely to be problems with the data collection, and it should be ignored completely. To remove issues of large wave breaking in the surf zone, we follow the recommendations of Brodie et al. [21] that data runs for which the wave height was greater than 1.2 m should be omitted from subsequent analysis.
This yields a potential concept of operations that could apply to a beach that is similar to Duck, NC. Analysis can be automated for all data collections, but for those with either a wave height greater than 1.2 m or an estimated data coverage less than 50%, the results should be excluded from subsequent Kalman filtering or from further analysis consideration. For other beaches that have no wave height measurements, it may be possible to introduce a quality filter that uses the coarse time exposure image (estimated in V2.0) to detect wave breaking and exclude those regions. For the tests above, a threshold of image intensity >150 was used to detect breaking, but this was ad hoc, and image intensity is affected by automated gain control so is not a fundamental variable. Additionally, there are beaches such as on the northwest coast of the US where waves are always large and surf zones are usually wide. Waiting for small waves at such sites may not be practical and further investigation is needed on a useful concept of operations. For beaches with wide surf zones, it may be necessary to use either a more complete numerical model approach to address finite-amplitude increases in the dispersion relation, or to use an approximate compensation that might be based on detected breaking in the time exposure image. Either is beyond the scope of this paper.
The data used in this study are excellent for cBathy testing in that the cameras are fixed to a tower (image geometry is fixed), the data runs are all 1024 s, a standard length for incident wave analysis, and many consecutive runs are available for Kalman filtering. However, there is increasing interest in less structured sampling methods, particularly using small commercial drones (e.g., [32,33]). For these applications, Kalman filtering is likely not available, so the improvements in phase 1 and 2 components of the algorithm that are embodied in V2.0 are important.
Finally, the CPU time was found to be influenced by the algorithm changes. All analyses were done on a modest Linux desktop machine with an Intel i7 CPU using MATLAB R2016b running four parallel threads. The three versions took an average of 92, 91, and 118 s to compute for each data run. The improved seed algorithm for k and α is likely responsible for the increased CPU time in V2.0.
It is recommended that V2.0 should supersede previous versions for all practical uses.

Conclusions
In 2013, Holman et al. ( [1]; HPH13) published the new cBathy algorithm for the estimation of bathymetry from optical observations of wave celerity. The current manuscript describes a set of algorithm improvements that have been made since that time, labeling the original version as V1.0 and comparing it to an intermediate version, V1.2, that has seen considerable use but limited testing, and a current version, V2.0, that is significantly changed compared to the original. Tests of the performance improvements associated with each version were made using an updated dataset of 39 bathymetric surveys collected from 2015 to 2019 at the Field Research Facility in Duck, NC, each spanning a region of 500 and 1000 m in the cross-shore and alongshore, respectively. cBathy returns products of two types, a non-filtered, phase 2 bathymetry and a Kalman-filtered, phase 3 product that averages intelligently over a suite of phase 2 results that have variable quality. Since phase 2 results were viewed by HPH13 as too noisy or having too many gaps, they only presented Kalman results. For the 39 bathymetries of the new dataset, the performance of Kalman stage results improved between V1.0 and V2.0 with bias, rms error, and 95% exceedance error improving from 0.15, 0.47, and 0.96 m, respectively, for V1.0, to 0.08, 0.38, and 078 m for V2.0. The percentage of successful returns (predicted error ≤0.5 m) was 99.1 for V1.0 and 99.9% for V2.0.
With the increasing use of cBathy for isolated data collections, for example from drones or cameras of opportunity for which Kalman filtering is not possible, there has been increasing interest in the quality of individual phase 2 results. Phase 2 performance statistics were not published by HPH13 but have been computed for this new dataset including 624 phase 2 bathymetry estimate maps. For this full dataset, the bias, rms error, and 95% exceedance errors for V1.0 were 0.19, 0.64, and 1.27 m, respectively, and for V2.0, they were 0.16, 0.56, and 1.19 m. The average successful coverage was 78.8% for V1.0 and 84.7% for V2.0 (about a 28% reduction in the number of failed estimates). Thus, V2.0 performs almost as well as the Kalman predictions of V1.0 Larger errors were associated with both large waves (a wide surf zone) as well as a variety of poor imaging conditions such as fog, rain, or darkness. As a practical mitigation, it is recommended that at this site, conditions for which the significant wave height is greater than 1.2 m or for which successful coverage is less than 50% (a proxy for image quality problems) should be excluded from consideration. With this reduced dataset of 563 runs, the bias, rms error, and 95% exceedance errors for V1.0 were 0.15, 0.58, and 1.16 m and for V2.0 were 0.09, 0.41, and 0.85 m, respectively. Successful coverage for V1.0 was 82.8%, while for V2.0, it was 90.0%, an approximately 42% reduction in failed estimate locations. It is noted that this concept of operations for removing poor estimates is specific to Duck, NC. Finally, predicted errors for Kalman and non-Kalman results are found to be too small by a factor of 5.22 and 2.29, respectively, for V1.0 and 4.47 and 2.0 for V2.0.
It is recommended that V2.0 becomes the new standard for cBathy bathymetry estimation.
Data Availability Statement: The full dataset is described in the document "The2019cBathyData-TestBed", which is located on the CIRN (Coastal Imaging Research Network) GitHub site (https:// github.com/Coastal-Imaging-Research-Network) in the cBathy toolbox and at the date of publication is located in the version 2.0 branch in a folder called cBathyTestBed (this branch should become the master branch in the near future).