Unmanned Aerial Vehicle Depth Inversion to Monitor River-Mouth Bar Dynamics

Monitoring the morphological evolution of a river-mouth bar is of both practical and scientific importance. A large amount of sediment is transported from a river to surrounding littoral cells via a deltaic bar after an extreme weather event. However, it is often not feasible to capture drastic morphological changes in the short term with conventional bathymetric surveys. This paper presents a depth-inversion method based on unmanned aerial vehicle technology to estimate two-dimensional bathymetry from video-sensed swell propagation. The estimation algorithm is tested over four cases with varying wave and bathymetric conditions and is validated with transect survey data. The test results suggest that the method can estimate deltaic-bar topography in front of a river mouth with a root-mean-square error of <0.5 m. The applicable range is limited by wave breaking in the inner bar and up to a depth of ~8 m, where swell intensity signals become ambiguous. A comparison of the different cases shows that the method works better under calm weather conditions with dominant swells propagating from non-local sources. Significant morphological changes of a river-mouth bar due to a powerful typhoon are successfully detected by observations right before and after the event.


Introduction
The coastal zone is a valuable space for marine species and various human activities. Many coastal areas face erosion problems, which will worsen due to climate change [1,2]. Monitoring coastal bathymetry is essential for coastal erosion management. However, we cannot frequently conduct bathymetric surveys with costly vessel operations. From practical and scientific viewpoints, it is crucial to monitor dynamic bathymetry around a river mouth as a relay point of fluvial sediment supply to coastal zones. During an extreme event of fluvial sediment discharge, a large amount of sediment accumulates on the deltaic bar in front of a river mouth, some of which is later transported to surrounding coasts mainly by wave action [3,4]. The morphological change of a river-mouth bar is also linked to littoral drift across the river mouth [5]. Without understanding the morphodynamics of a river mouth during an extreme event, we cannot estimate the sediment budget in a coastal zone and conduct proper erosion management in the long term. As such, there is an urgent need for a low-cost method to monitor highly two-dimensional coastal bathymetry.
Water depth estimation from video-sensed waves has gained popularity as a low-cost alternative to obtain two-dimensional coastal bathymetry. Different algorithms have been proposed for the depth inversion, which commonly estimates water depth using the dispersion relation from linear wave theory. There are two main approaches to extract wave parameters from the video imagery: spectral and temporal methods. Both methods assume that the pixel intensity time series is closely related to water surface variation [6]. On the one hand, the spectral method estimates wavenumber and frequency pairs by decomposing intensity signals with empirical orthogonal function analysis [7][8][9]. The water depth is estimated from the dispersion relation using the multiple wave components. On the other hand, the temporal method determines wave celerity by a cross-correlation analysis of the intensity time series [10,11]. Video-based depth inversion has recently been applied to monitor beach profiles in a wide range of environmental conditions [12][13][14] and nearshore currents via data assimilation into numerical models [15]. The root-mean-square error (RMSE) of the estimated bathymetry in these studies is commonly around 0.5 m. However, monitoring a broad coastal area requires a camera fixed at a high elevation point. As such, it is often difficult to find a suitable place to install a camera to monitor the bathymetry of a river mouth.
Recent advancements in unmanned aerial vehicle (UAV) technology have expanded the possibility of video-based depth inversion. UAVs enable us to obtain aerial footage of a broad area from an arbitrary location and angle under preferable wave and weather conditions. Still, they are not without their limitations. One shortcoming is that UAV oscillations by wind generate noise in the sea-surface intensity signals. Recent studies overcome the issue by automatic rectification to realize the same accuracy as a fixed camera [16][17][18][19]. However, most previous studies conducted the depth inversion in coastal areas with nearly uniform bathymetry in the longshore direction. It is still uncertain whether the UAV depth inversion applies to highly two-dimensional bathymetry in front of a river mouth. Depth contours around a river mouth are convex seaward due to the presence of a river-mouth bar. Hence, incoming waves cause refraction over the convex topography and produce a non-uniform vector field of wave celerity. The bathymetry estimation requires a robust algorithm that can extract wave parameters from intensity signals generated by the non-uniform wave field.
This research aims to construct a UAV-based depth inversion method that works for highly two-dimensional bathymetry in front of a river mouth. Our strategy is to utilize video signals of swells that have a high dependency on coastal bathymetry. Based on the temporal method, we develop a robust and efficient algorithm to extract a wave-celerity vector field of swells from pixel intensity signals. The depth inversion capability is tested over four different cases with varying wave and bathymetry conditions. We describe the video acquisition and rectification processes in Section 2 and then introduce the depth inversion method in Section 3. Then, we show the test results in Section 4 and identify the factors affecting the accuracy and discuss the applicability of the proposed method in Section 5. Finally, we present the conclusions in Section 6.

Field Site: Suruga Coast (Japan)
We conducted field tests on Suruga Coast in Shizuoka Prefecture, Japan. The coast is located on the west side of Suruga Bay, which opens to the Pacific Ocean (Figure 1a,b). Swells from the Pacific Ocean propagate over a deep trough running along the bay axis and significantly influence the bay's inner coasts. The Suruga Coast consists of steep beaches made of coarse sand and gravel supplied from the Oi River. A long stretch of the coastline around the river mouth suffers from coastal erosion due to active sand mining and dam construction upstream in the past. Additionally, the coastline has changed significantly due to harbor construction on both sides of the river mouth, which has interfered with the longshore sediment transport. The highly altered coast is one of 12 sites in Japan under direct control of the national government for severe coastal issues.  Figure 1c shows the coastal area around the mouth of the Oi River. We set three test areas with different bathymetric features, A1, A2 and A3, as indicated by the red rectangles. A1 has nearly uniform bathymetry alongshore, and wave propagation usually occurs in a one-dimensional way. A2 has a series of detached breakwaters for shore protection, and therefore, two-dimensional wave transformation due to dissipation, reflection, and diffraction by the structures creates non-uniform bathymetry. The last site, A3, is the main target of the present study, which covers the area around the mouth of the Oi River. Fluvial sediment usually forms a submarine delta out of the deflected river mouth, and incoming waves refract over the convex depth contours. Figure 2 shows annual changes of the significant wave height (SWH), the significant wave period (SWP), and the tidal water level (TWL) during our test period (from October 2018 to September 2019) at Omaezaki Station located at the Suruga Bay entrance (see Figure 1b). 
The superposition of swells propagating from the Pacific Ocean and local wind waves from the northeast direction characterise the incident wave conditions on this coast. From late summer to autumn, when typhoons take northward tracks in the north-western Pacific, long-period swells tend to dominate local wind waves. The tides are semidiurnal with mean spring and neap tidal ranges of approximately 0.5 m and 1.5 m. We obtained aerial footage of the sea surface in the three areas under significant swell conditions.

Image Acquisition and Rectification
This study employed a DJI Phantom 4, a commercially available small UAV, for the video observations. The UAV camera has a 4K resolution (3840 × 2160 pixels) and a video frame rate of 29.97 fps. We recorded oblique aerial videos of each test area by hovering the UAV in a stationary position at an elevation of 100-150 m above mean sea level. As shown by an example image in Figure 3a, the horizontal camera angle was set obliquely to the shoreline in a similar way to previous studies [18,19]. The recorded image contains ground control points (GCPs) on land and captures swell patterns in high image contrast.
Table 1 summarizes general information about the field tests. Cases 1 and 2 were conducted in our previous study [19]. We carried out Cases 3 and 4 at the river mouth (A3) in September 2019, before and after Typhoon Faxai. This powerful typhoon approached the area on 8 September, accompanied by strong winds, high waves, and heavy rain [20], and significantly impacted the coastal and river-mouth bathymetry. During the peak period of the typhoon, the significant wave height reached 6.0 m at Omaezaki Station (Figure 2). Therefore, we explored the morphological impact on the river-mouth bar with Cases 3 and 4. The significant wave height and period for each case, observed with the bottom pressure sensor at Omaezaki Station, are shown in Table 1. The table also includes the tidal water level at each observation time, which we used for the tide correction of the estimated bathymetry data.
Table 1 note: azimuth angles are measured clockwise from north, and elevation angles are positive upward from the horizontal plane.
Before depth inversion, we converted each recorded video into orthogonal video imagery in the same way as [19]. Even when operated in the hover mode, the UAV occasionally moved in horizontal and vertical directions due to strong winds. First, we estimated the camera position and angle by geospatial information analysis using the GCPs following [21]. Some ground objects, such as the corners of coastal structures, were utilized as GCPs (Figure 3b). Additionally, blue-coloured sheets were laid out on sandy beaches when distinctive objects were not available (Figure 3c). The geographical coordinates of the GCPs were measured by real-time kinematic (RTK) GPS during the field observations. The corresponding image coordinates were automatically detected in each video frame using an image template matching algorithm. Then, we created orthogonal video imagery by projecting the original video frames onto a horizontal target area. The rectified image covers a target area of 600 m in the cross-shore direction and 1000 m in the alongshore direction. The sampling rate was reduced from 29.97 to 6 fps (∆t = 0.167 s) to decrease the computational cost of the subsequent depth-inversion process.
Table 2 summarizes the video length and camera parameters estimated from the geospatial analysis. The hovering elevation of 100-150 m above mean sea level was set within the legal maximum altitude in Japan, 150 m above ground level. Wave propagation in each coastal area was filmed with a horizontal angle of 30-55° seaward from the shoreline and an elevation angle of 60-80° upwards of the horizontal plane. Figure 4 shows a snapshot of the original video and its orthogonal image of the target area (1000 × 600 m) for each case. The orthogonal images captured the swells, whose wavefronts were shadowed from the sunlight because the sun was located on the sea side during each observation (Table 1). Wave patterns appear differently over the four cases, reflecting different wave, lighting and camera conditions. We extracted the swell parameters by analysing the spatial and temporal variations of pixel intensity over the sea surface.

Depth Inversion
As with previous video-based inversion methods, we invert the estimated wave parameters into water depth based on the dispersion relation from linear wave theory. The water depth h can therefore be expressed as:
$$h = \frac{c}{2\pi f}\,\tanh^{-1}\!\left(\frac{2\pi f c}{g}\right) \tag{1}$$
where f is the wave frequency, c is the wave celerity, and g is the gravitational acceleration. This relationship allows us to estimate water depth from the wave celerity and frequency. The dependency of wave celerity on water depth is strong when the wavelength is large relative to water depth. Therefore, it is more advantageous to use long-period waves for depth inversion [22]. The relationship among the relative errors in the estimation can be derived from Equation (1) as:
$$\frac{dh}{h} = (1+G)\,\frac{dc}{c} + (G-1)\,\frac{df}{f}, \qquad G = \frac{\sinh 2kh}{2kh} \tag{2}$$
where k = 2πf/c is the wavenumber. Equation (2) describes the relative-error propagation from wave celerity and frequency to water depth. The variable G converges to unity in very shallow water, whereas it diverges to infinity in deep water. Figure 5 shows G as a function of h and f. In the range of h < 8 m, the relative error in wave celerity propagates to water depth by a factor of 2-3, whereas the propagation factor of the wave frequency is much smaller. In the deeper area of h > 8 m, the propagation factor rapidly increases in the range of high-frequency waves (f > 0.15 Hz). Hence, the depth inversion requires very accurate measurements of wave celerity and frequency. The key to successful depth inversion is to extract the properties of long-period swells in coastal water. However, long waves are difficult to capture in aerial footage. This difficulty can be mitigated by the flexibility of UAVs, which enables us to optimise the camera position and angle to intensify the swell signal.
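The inversion and its error sensitivity can be checked numerically. The following is a minimal Python sketch, assuming the linear dispersion relation c = (g / 2πf) tanh(2πfh / c) and writing the propagation variable G directly in terms of c and f. The function names and the sample swell (c = 6 m/s, f = 0.125 Hz) are illustrative assumptions, not values from the field tests.

```python
import math

G_ACCEL = 9.81  # gravitational acceleration g (m/s^2)


def depth_from_celerity(c, f, g=G_ACCEL):
    """Water depth h (m) from wave celerity c (m/s) and frequency f (Hz)."""
    x = 2.0 * math.pi * f * c / g  # equals tanh(kh) for linear waves
    if not 0.0 < x < 1.0:
        raise ValueError("c must be positive and below the deep-water celerity g / (2*pi*f)")
    return c / (2.0 * math.pi * f) * math.atanh(x)


def error_factor(c, f, g=G_ACCEL):
    """Propagation variable G = sinh(2kh) / (2kh), expressed through c and f."""
    x = 2.0 * math.pi * f * c / g
    return x / ((1.0 - x * x) * math.atanh(x))


if __name__ == "__main__":
    c, f = 6.0, 0.125  # hypothetical swell: 6 m/s celerity, 8 s period
    h = depth_from_celerity(c, f)
    G = error_factor(c, f)
    print(f"h = {h:.2f} m")
    print(f"celerity error factor 1 + G = {1.0 + G:.2f}")
    print(f"frequency error factor G - 1 = {G - 1.0:.2f}")
```

For this hypothetical swell the depth is about 4 m and the celerity error factor is a little above 2, in line with the factor of 2-3 quoted for h < 8 m.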

Estimation of Wave Celerity and Direction
Despite complex optical processes on the wavy sea surface, pixel intensity and water level signals show similar temporal characteristics, but the intensity signals are also affected by other factors [11]. Therefore, it is common to use a bandpass filter to remove low- and high-frequency components; the former involve light changes due to clouds, whereas the latter involve changes due to wind waves, the camera auto-iris control, and camera movement. Almar et al. (2009) [11] used a bandpass filter (0.05-0.5 Hz) to eliminate the noise signals. Tsukada et al. (2020) [19] further narrowed the band to 0.05-0.2 Hz to target swells. We follow the latter approach and remove signals outside the 0.05-0.2 Hz band.
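One simple way to realise such a band restriction is sketched below in Python. The text does not specify the filter design, so this sketch assumes an ideal (brick-wall) filter that zeroes every discrete Fourier component outside the 0.05-0.2 Hz band; the function name `bandpass` is hypothetical.

```python
import cmath
import math


def bandpass(signal, dt, f_lo=0.05, f_hi=0.2):
    """Keep only the DFT components of a real signal within [f_lo, f_hi] (Hz)."""
    n = len(signal)
    kept = []  # (bin index, DFT coefficient) for in-band positive frequencies
    # The zero-frequency and Nyquist bins are skipped; both lie outside the swell band.
    for k in range(1, (n + 1) // 2):
        freq = k / (n * dt)
        if f_lo <= freq <= f_hi:
            coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                        for t, s in enumerate(signal))
            kept.append((k, coeff))
    # Inverse transform of the retained bins. The factor 2 accounts for the
    # conjugate negative-frequency partner of each bin of a real signal.
    return [sum(2.0 * (coeff * cmath.exp(2j * math.pi * k * t / n)).real
                for k, coeff in kept) / n
            for t in range(n)]


if __name__ == "__main__":
    dt = 1.0 / 6.0  # 6 fps imagery
    times = [i * dt for i in range(240)]
    # Synthetic pixel intensity: offset + slow light change + swell + wind waves.
    raw = [2.0 + 0.5 * math.sin(2 * math.pi * 0.025 * t)
           + math.sin(2 * math.pi * 0.125 * t)
           + 0.3 * math.sin(2 * math.pi * 0.5 * t) for t in times]
    swell_only = bandpass(raw, dt)
    print(f"filtered amplitude = {max(swell_only):.2f}")
```

A smoother filter response would reduce ringing on real records; the brick-wall form is used here only because it keeps the example short.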
There are two popular methods to extract wave parameters from coastal video imagery: temporal and spectral methods. Each method has its advantages and disadvantages. Bergsma and Almar (2018) [23] compared these methods and confirmed that both, in general, estimate bathymetry with similar accuracy. They also suggested that the temporal method works better for cases with a limited number of wave components. Almar et al. (2009) [10] pointed out that the temporal method is robust for various waveforms; intensity signals are often far from sinusoidal in the nearshore area. Given its advantage for narrow-band waves, we employed the temporal method in the present study. The temporal method estimates the signal propagation time based on the cross-correlation between intensity signals at two points. The wave celerity is then simply the distance between the two points divided by the propagation time. This can be extended to a two-dimensional case by cross-correlation analysis in both the cross-shore and alongshore directions.
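The core of the temporal method, finding the time lag that maximises the cross-correlation between two intensity signals, can be sketched in Python as follows. The function names are hypothetical, and the three-point parabolic refinement of the peak is an added assumption that is not described in the text.

```python
import math


def correlation_at_lag(a, b, lag):
    """Correlation coefficient between a[t] and b[t + lag] over their overlap."""
    if lag >= 0:
        xa, xb = a[:len(a) - lag], b[lag:]
    else:
        xa, xb = a[-lag:], b[:len(b) + lag]
    n = len(xa)
    mean_a, mean_b = sum(xa) / n, sum(xb) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(xa, xb))
    var_a = sum((p - mean_a) ** 2 for p in xa)
    var_b = sum((q - mean_b) ** 2 for q in xb)
    return cov / math.sqrt(var_a * var_b)


def propagation_time(a, b, dt, max_lag):
    """Time lag (s) from a to b that maximises the cross-correlation, and that maximum."""
    lags = range(-max_lag, max_lag + 1)
    corr = [correlation_at_lag(a, b, lag) for lag in lags]
    i = max(range(len(corr)), key=corr.__getitem__)
    shift = 0.0
    if 0 < i < len(corr) - 1:
        # Sub-sample refinement: vertex of the parabola through the peak and
        # its two neighbours (an assumption, not taken from the paper).
        left, mid, right = corr[i - 1], corr[i], corr[i + 1]
        curvature = left - 2.0 * mid + right
        if curvature < 0.0:
            shift = 0.5 * (left - right) / curvature
    return (lags[i] + shift) * dt, corr[i]


if __name__ == "__main__":
    dt, distance = 1.0 / 6.0, 15.0  # 6 fps; hypothetical 15 m point spacing
    a = [math.sin(2 * math.pi * 0.125 * i * dt) for i in range(600)]
    b = [math.sin(2 * math.pi * 0.125 * (i * dt - 2.0)) for i in range(600)]
    lag, r_max = propagation_time(a, b, dt, max_lag=20)
    print(f"lag = {lag:.2f} s, celerity = {distance / lag:.2f} m/s, R = {r_max:.3f}")
```

The search window `max_lag` should stay shorter than half the swell period in samples, otherwise a neighbouring correlation peak can be selected.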
We take a different approach to realise robust estimation of a wave-celerity vector field over non-uniform coastal bathymetry. Letting A be an analysis point at which the wave celerity is estimated, we compare the wave signal at A with those at reference points B_n (n = 1 to N) aligned along a circumference of radius r centred on A. Figure 6 illustrates the circular alignment for the case of N = 8. B_1 is placed such that AB_1 is parallel to the horizontal axis of the orthogonal image, and B_2, ..., B_N are aligned in the counter-clockwise direction from B_1, with the angle θ_n being ∠B_nAB_1. The circular array is thus determined by the measurement distance r and the number of reference points N. We will later discuss how these two independent parameters affect the estimation accuracy. First, we evaluate the propagation time t_n from A to B_n as the time lag that maximises the cross-correlation of the bandpass-filtered signals at the two points. When a wavefront propagates with the wave angle θ, the distance it travels between passing A and passing B_n is equal to r cos(θ − θ_n). Therefore, the propagation time can be expressed as
$$t_n = \frac{r\cos(\theta - \theta_n)}{c} \tag{3}$$
Equation (3) suggests that t_n is a sinusoidal function of θ_n when a monochromatic or narrow-band wave propagates over the array. The cross-correlation analysis is affected by noise signals, so it may produce an abnormal time lag. To eliminate such lags, we fit a regression sine curve and exclude reference points where t_n deviates significantly from it. The deviation mainly occurs when the array overlaps with land area or surf zones.
Next, we estimate c and θ from the N equations using the non-linear least-squares method. Given initial estimates c_0 and θ_0, the true values can be expressed as c = c_0 + dc and θ = θ_0 + dθ, respectively. By letting F_n(c, θ) be the right-hand side of Equation (3), we can linearise the equation around the initial estimates as:
$$t_n + v_n = F_n(c_0, \theta_0) + \left(\frac{\partial F_n}{\partial c}\right)_0 dc + \left(\frac{\partial F_n}{\partial \theta}\right)_0 d\theta \tag{4}$$
where v_n is the residual, and the subscript 0 indicates the initial estimate. The N equations can be cast into a single matrix equation, Equation (5), for the correction vector dX, which comprises dc and dθ, and the residual vector V. Since the propagation-time variation along the circumference could be produced by noise signals, we introduce a weighting matrix P into Equation (5). The weighting matrix, Equation (6), is based on the cross-correlation coefficient R_n of the two signals at A and B_n at the time lag t_n and on R̄, the cross-correlation coefficient averaged over the circumference. Hence, we give a higher weight to the propagation time estimated with higher cross-correlation. Introducing P into Equation (5) yields the normal equation, Equation (7), that minimises VᵀPV. The values of c and θ are obtained by solving Equation (7) for dX and updating them successively until |dc| and |dθ| fall below 0.01 m/s and 0.01°, respectively.
Figure 7 illustrates an example of the estimation of a wave-celerity vector field for Case 3. The estimation was performed with r = 20 m and N = 8, and the initial estimates were given as c_0 = 6 m/s and θ_0 = 90°, which is perpendicular to the shoreline (Figure 7a). After the iterative solutions of Equation (7), the wave-celerity vector field converges to exhibit wave refraction over the river-mouth bar (Figure 7b). The estimation does not work in the vicinity of the river mouth due to wave breaking and the superposition of refracted waves. Moreover, the dispersion relation is significantly affected by the river and tidal flows in that area. Therefore, depth inversion from wave information is not possible there. Nevertheless, we can obtain a reasonable distribution of wave-celerity vectors of swells in a broad region apart from the river mouth.
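The iterative weighted least-squares estimation of c and θ can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the reference points are taken as equally spaced around the circumference, the model is t_n = r cos(θ − θ_n) / c, and the weights are assumed to be R_n divided by the mean of R_n, since the exact form of the weighting matrix is not recoverable from the text. All function names are hypothetical.

```python
import math


def correlation_weights(r_coeffs):
    """Assumed weights: each correlation coefficient divided by their mean."""
    mean_r = sum(r_coeffs) / len(r_coeffs)
    return [r_n / mean_r for r_n in r_coeffs]


def estimate_celerity_direction(t_lags, weights, r, c0=6.0,
                                theta0=math.radians(90.0), tol_c=0.01,
                                tol_theta=math.radians(0.01), max_iter=50):
    """Weighted Gauss-Newton fit of t_n = r * cos(theta - theta_n) / c."""
    n_pts = len(t_lags)
    # Assumption: B_1 lies on the image horizontal axis, the rest follow
    # counter-clockwise at equal angular spacing.
    angles = [2.0 * math.pi * n / n_pts for n in range(n_pts)]
    c, theta = c0, theta0
    for _ in range(max_iter):
        s11 = s12 = s22 = b1 = b2 = 0.0
        for t_n, w, th_n in zip(t_lags, weights, angles):
            cos_d = math.cos(theta - th_n)
            sin_d = math.sin(theta - th_n)
            jc = -r * cos_d / (c * c)    # dF_n/dc at the current estimate
            jt = -r * sin_d / c          # dF_n/dtheta at the current estimate
            misfit = t_n - r * cos_d / c  # observed minus modelled lag
            s11 += w * jc * jc
            s12 += w * jc * jt
            s22 += w * jt * jt
            b1 += w * jc * misfit
            b2 += w * jt * misfit
        det = s11 * s22 - s12 * s12
        dc = (s22 * b1 - s12 * b2) / det      # solve the 2 x 2 normal equation
        dtheta = (s11 * b2 - s12 * b1) / det
        c += dc
        theta += dtheta
        if abs(dc) < tol_c and abs(dtheta) < tol_theta:
            break
    return c, theta


if __name__ == "__main__":
    r, c_true, theta_true = 20.0, 7.5, math.radians(70.0)  # synthetic swell
    angles = [2.0 * math.pi * n / 8 for n in range(8)]
    lags = [r * math.cos(theta_true - a) / c_true for a in angles]
    weights = correlation_weights([0.7, 0.6, 0.65, 0.7, 0.6, 0.65, 0.7, 0.6])
    c_est, theta_est = estimate_celerity_direction(lags, weights, r)
    print(f"c = {c_est:.3f} m/s, theta = {math.degrees(theta_est):.2f} deg")
```

The sketch omits the exclusion of reference points whose lags deviate from the regression sine curve; in practice those points would be dropped from `t_lags` and `weights` before the fit.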

Estimation of Wave Frequency
The representative wave frequency f_r is also determined from coherent signals along the circumference. Since the propagation time results from the cross-correlation analysis, the wave period can be evaluated consistently from an average frequency weighted by the cross-spectrum of the intensity signals at A and B_n. As such, f_r is the average of the frequency f over the band from f_1 to f_2 weighted by C_AB,n(f), where C_AB,n(f) is the cross-spectrum of the intensity signals at A and B_n, and f_1 and f_2 are the lower and upper frequencies of the bandpass filter, respectively (f_1 = 0.05 Hz and f_2 = 0.2 Hz at present). This wave frequency represents that of the coherent intensity signals over the circumference.
Figure 8a shows the distribution of the representative period T_r (≡ 1/f_r) for Case 3, which corresponds to the wave-celerity vector field in Figure 7. The wave period varies slightly over the domain due to the spatial variation of signal coherence. The value tends to be relatively lower when the swell signal is weak. Figure 8b shows the distribution of the mean cross-correlation coefficient, which measures local signal coherence in the frequency band of 0.05-0.2 Hz. The value tends to be higher around the river-mouth bar, where the swell develops a steeper front due to wave shoaling and refraction in shallow water. We estimate the water depth from local values of c and T_r based on Equation (1). The estimated water depth is then converted to the seabed level using the mean sea level as a datum after the tide correction.
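A cross-spectrum-weighted mean frequency of this kind can be sketched in Python as follows. The sketch assumes that the weight is the modulus of the cross-spectrum and that the contributions of all reference points are pooled before averaging; neither detail is confirmed by the text, and the function names are hypothetical.

```python
import cmath
import math


def dft_bin(signal, k):
    """Discrete Fourier coefficient of `signal` at bin index k."""
    n = len(signal)
    return sum(s * cmath.exp(-2j * math.pi * k * t / n)
               for t, s in enumerate(signal))


def representative_frequency(sig_a, ref_signals, dt, f1=0.05, f2=0.2):
    """Mean frequency in [f1, f2] weighted by |cross-spectrum| of A with each B_n."""
    n = len(sig_a)
    weighted_sum = total_weight = 0.0
    for k in range(1, (n + 1) // 2):
        freq = k / (n * dt)
        if not f1 <= freq <= f2:
            continue
        a_k = dft_bin(sig_a, k)
        for sig_b in ref_signals:
            # Cross-spectrum at this bin: conj(A) * B; its modulus is the weight.
            weight = abs(a_k.conjugate() * dft_bin(sig_b, k))
            weighted_sum += freq * weight
            total_weight += weight
    return weighted_sum / total_weight


if __name__ == "__main__":
    dt, n = 1.0 / 6.0, 240

    def wave(phase):
        # Synthetic intensity: dominant 0.125 Hz swell plus a weaker 0.175 Hz wave.
        return [math.sin(2 * math.pi * 0.125 * i * dt - phase)
                + 0.5 * math.sin(2 * math.pi * 0.175 * i * dt - 1.4 * phase)
                for i in range(n)]

    f_r = representative_frequency(wave(0.0), [wave(0.3), wave(0.9)], dt)
    print(f"f_r = {f_r:.4f} Hz, T_r = {1.0 / f_r:.2f} s")
```

Because the weights scale with the product of the two amplitudes at each frequency, the result is pulled toward the component that is coherent and energetic at both points.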

Estimated Bathymetry
We estimated the coastal bathymetry for each case using the method described in Section 3. The record length of the video differs in the four cases listed in Table 2. A previous study suggested that the estimation accuracy would not change with signal length when the signal is sufficiently long (>5 min) [19]. Therefore, we analyzed the intensity signal of the whole video length for each case in Table 2; we will discuss how the video length affects the estimation accuracy in Section 5.2. The depth inversion was performed on a regular grid with 2-m grid spacing, and the wave parameters were computed at each grid point with eight reference points (N = 8) along a circumference with r = 20 m; we will later discuss the estimation sensitivity to the two parameters. Then, we estimated water depth from the dispersion relationship, except at grid points where the signal time lag did not vary sinusoidally along the circumference. The exception occurs in the land area, surf zones, and areas where the wave signal is weaker than noise signals, or where two wave components of similar magnitudes superpose in the signal. Finally, we created a contour map from the gridded water depth data for each case after applying a Gaussian filter of size 5 × 5.
Figure 9 shows the contour maps of estimated water depth for the four cases. We successfully obtained the continuous distribution of water depth in the entire domain, except inside the surf zone. The result in Case 1 shows alongshore-uniform bathymetry with the seabed slope increasing toward the shore. In contrast, the depth map of Case 2 reveals a milder beach slope due to the effect of detached breakwaters. The depth inversion did not work in the areas behind the breakwaters, where diffracted waves in opposite directions superposed to form combined wave fields. Cases 3 and 4 detected the deltaic bar in front of the river mouth in different forms. However, wave breaking in front of the river mouth limited the depth inversion in the vicinity of the river mouth. Comparing the two results suggests that high waves during the typhoon passage caused erosion in nearshore areas along the sand spit, whereas a significant amount of sediment accumulated in the deeper area. Since Case 4 was observed more than 10 days after the event, a crescentic bar was forming by wave-induced sediment transport in front of the river mouth. These results confirm that the present method works in various coastal situations and can grasp the morphological evolution of a river-mouth bar.

Validation
We compared the estimated water depth with transect survey data to validate the proposed method. The survey data was obtained by a remote control (RC) boat equipped with sonar and RTK GPS. Although there are time gaps between the UAV observations and the RC-boat surveys (Table 3), we treat the survey data as true values in the following discussion. The bathymetric change during the time gaps can be assumed to be small in general. However, it should be noted that the river-mouth bar was in a rapid erosional phase just before the typhoon passage in Case 3. Therefore, the transects around the river mouth may not represent actual values due to the several-days gap. To compare the capability of the present method with an existing method, we also estimated water depth along the transects using the spectral method, cBathy [8]; the estimation parameters of cBathy are given in Appendix A. Since the computational cost of cBathy was relatively high, we set the analysis point spacing as 6 m in the cross-shore direction and 10 m in the alongshore direction for cBathy and the present method. The smoothing length scales of cBathy were chosen at 18 m in the cross-shore direction and 30 m in the alongshore direction. Therefore, cBathy uses signals within a rectangular area of 36 × 60 m around each analysis point. In contrast, the present method uses signals along the circumference of r = 20 m, and the resulting water depth was subsequently smoothed by the 5 × 5 Gaussian filter (30 × 50 m). It should be noted that we cannot compare the two methods in a strict sense, as they involve different filtering processes. Figure 10 compares the seabed profiles from the present method, cBathy and the survey data. In Case 1, the two inversion methods produced similar profiles in good agreement with the survey data. In Case 2 with breakwaters along the shore, both methods underestimated the water depth in the deep area (<−5 m). 
The underestimation might have occurred because the swell signals were weak and the cross-correlation analysis was significantly affected by non-swell signals. Moreover, a projection error may also have affected the results, since the GCPs were available only along the shore for this case. In Cases 1 and 2, cBathy tended to produce larger fluctuations. In Cases 3 and 4, the two methods generated similar results, except in the far field from the camera (No. 56), where cBathy generated significant fluctuations. The discrepancy from the survey data in front of the river mouth (No. 53) in Case 3 might be partially due to the survey time gap, as mentioned above.
Table 3 summarizes the overall estimation errors and relevant domain-averaged variables. The present method has a higher level of accuracy than cBathy for the tested cases. The domain-averaged wave periods range from 7.4 to 8.6 s, and thus, according to Equation (2), the error is presumably due primarily to the estimation error of wave celerity. The domain-averaged cross-correlation coefficient ranges between 0.53 and 0.68. The present method tends to achieve higher accuracy when swells produce coherent signals. These results suggest that the present method works well under the condition that narrow-band swells are dominant, which is consistent with previous comparative studies of the temporal and spectral methods [19,23].

Optimal Estimation Parameters
There are two estimation parameters in the present method that potentially affect the estimation accuracy: the number of reference points and the radius of the circumference. The two parameters change the measurement array for the wave celerity and influence the estimated water depth in different ways. Here, we discuss how the results vary with the choice of these parameters.
The estimated wave celerity tends to converge with an increase in the number of reference points. Hence, the number should be determined by balancing the computational cost and accuracy. We computed the wave celerity by varying the number of reference points at 50 random grid points in each case to investigate the general convergence behavior. We evaluated the wave-celerity discrepancy ∆c from the one obtained with a large number of reference points (an averaged value of N = 80-100) at each location. We then defined the standard discrepancy as the RMS value of ∆c at the 50 points and plotted its variation with varying N in Figure 11. The standard discrepancy originates from non-swell signals and is relatively large in Case 2 with low signal coherence. The discrepancy of 0.1 m/s minimally affects the estimated water depth in the present range of water depth according to Equation (2). Considering the balance of the computational cost and accuracy, we chose N = 8 in the present study, beyond which the wave-celerity discrepancy ∆c will not decrease rapidly.
The radius r affects the estimated water depth in several ways. To explore its influence, we implemented the depth inversion with various r values along the four transects of each case (Figure 12). When r is small, the seabed profile exhibits bias and fluctuations. This is primarily because a small error in propagation time significantly affects the wave-celerity estimation when r is not sufficiently large relative to cΔt. Additionally, a small radius tends to overestimate the water depth because the propagation times over the circumference are underestimated. The cross-correlation analysis of intensity signals over a small distance is affected by locally synchronous noise signals, especially when swell signals are relatively weak. The bias and fluctuations become small when the radius increases to 20 m. Errors also appear for a large radius of 40 m, which exceeds the wavelength in relatively shallow areas. A large radius leads to a higher chance of signal mismatching due to low cross-correlation between intensity signals. For successful estimation of wave celerity and water depth, the radius must therefore be greater than cΔt and smaller than the wavelength in the present method. A radius of 20 m worked well over the four cases under the different wave and bathymetric conditions.
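The two bounds on the radius can be evaluated for a given depth and wave period. The sketch below is a minimal illustration using linear wave theory. It assumes that Δt is the sampling interval of the intensity time series; the 0.5 s interval, the 8 s period and the function names are hypothetical choices made for the example.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def wavenumber(h, period, iterations=50):
    """Solve omega^2 = g k tanh(k h) for k (rad/m) by Newton's method."""
    omega = 2.0 * math.pi / period
    k = omega**2 / G / math.sqrt(math.tanh(omega**2 * h / G))  # first guess
    for _ in range(iterations):
        t = math.tanh(k * h)
        f = G * k * t - omega**2
        df = G * t + G * k * h * (1.0 - t * t)
        k -= f / df
    return k


def radius_bounds(h, period, dt):
    """Return (c * dt, wavelength) in metres for depth h and wave period."""
    k = wavenumber(h, period)
    celerity = (2.0 * math.pi / period) / k
    wavelength = 2.0 * math.pi / k
    return celerity * dt, wavelength


def radius_is_admissible(r, h, period, dt):
    """True when c * dt < r < wavelength at the given depth."""
    lower, upper = radius_bounds(h, period, dt)
    return lower < r < upper


if __name__ == "__main__":
    for h in (1.0, 2.0, 4.0, 8.0):
        lower, upper = radius_bounds(h, 8.0, 0.5)
        print(f"h = {h:.0f} m: {lower:.1f} m < r < {upper:.1f} m")
```

The lower bound is a necessary condition only. As noted above, the radius should be large relative to cΔt, not merely greater than it, so that a timing error of one sample has a small effect on the celerity.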

Required Video Length
The previous results were obtained by analysing aerial videos of different lengths over the four cases (see Table 2). To discuss how the estimation accuracy varies with the video length, we implemented the depth inversion along the survey transects with different video sub-intervals. Figure 13a,b compare the seabed profiles obtained with different time intervals from the start of the record for Cases 1 and 4, respectively. Significant errors arise from noise signals when the video is short (<3 min), while the estimation does not significantly improve when the video length increases beyond 5 min. This tendency is consistent with the previous study based on the temporal method [19]. Figure 13c,d compare seabed profiles for Case 1 estimated with mutually exclusive 1-min and 3-min video sub-intervals, respectively. Since the errors are not systematic, they can be reduced by averaging over different video sub-intervals. The figures show that the averaged seabed profile agrees better with the survey data. Therefore, we can improve the estimation accuracy by multiple observations when the video length is shorter than 5 min.
Figure 14 shows the relationship between the cross-correlation coefficient and the RMSE for all the transects. In general, the error decreases with an increasing cross-correlation coefficient. The correlation is high when intensity signals on the circumference are coherent. Among the four cases, the errors are relatively small in Cases 1 and 4. These two cases were observed during calm weather, while significant swells were propagating to the coast. Case 3 was observed on a windy day just before the typhoon passage, and locally generated wind waves were significant. In Case 2, which is the worst case in terms of accuracy, the cross-correlation is low, especially in the deeper area, because the video captured the swell patterns poorly. A possible reason is that wave steepness was low due to the relatively long incident wave. The present method takes advantage of wave conditions in which swell-originated signals are dominant in the target domain. Therefore, the UAV observation should be performed on a calm day when narrow-band swells propagate from a far-field source. However, the cross-correlation varies significantly among transects even within the same case, owing to different factors. Further studies will be needed to identify the influential factors and optimise the observation setup to extract swell signals systematically.

Applicability to Bathymetry Monitoring
The proposed method showed its applicability to monitoring river-mouth bar dynamics. It can estimate the outer-bar bathymetry with a depth error of <0.5 m under favorable conditions. We also confirmed that depth inversion is possible up to a water depth of ~8 m under a swell period of 7-8 s, beyond which the error will increase rapidly due to the high sensitivity of the estimate to errors in the wave celerity and period. Depth estimation in deeper areas requires high swells with longer wave periods. The inner-bar bathymetry cannot be estimated well due to wave shoaling, breaking and the superposition of refracted waves. The present method does not apply any special treatment to the shallow-water wave transformations. Since wave behaviors and intensity signals are different from those in deeper water [24,25], the algorithm could be further improved by accounting for signal characteristics in shallow water.
The errors along different transects do not show a tendency to increase with the distance from the camera. Therefore, the depth inversion can cover a longshore distance of 1 km with the present camera settings. The coverage could be further extended with a higher altitude of the UAV, although it requires special permission in Japan. The method enables us to grasp the bar geometry with a UAV observation of several minutes, which is especially advantageous in tracking drastic short-term changes due to an extreme event. The flexible and low-cost method based on UAV provides a new approach for monitoring river-mouth morphodynamics in a wave-dominated environment.

Conclusions
We developed a simple method to estimate two-dimensional coastal bathymetry from UAV-derived video imagery. The estimation procedure can be divided into two phases. In the first phase, the original video was converted into sequential orthogonal images using GCPs. The effect of wind-induced UAV motions was removed by automatically detecting the GCPs. In the second phase, a wave-celerity vector field was obtained by the cross-correlation analysis of pixel intensity signals, whereas the corresponding wave period was determined via cross-spectral analysis. Finally, the estimated wave parameters were inverted to water depth based on the linear dispersion relation.
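The cross-correlation step in the second phase can be illustrated with a two-point simplification. The sketch below is not the circular reference-point procedure of the present method. It only estimates the time lag between two synthetic intensity signals and converts it to a celerity over an assumed spacing. The function name, the synthetic signals, the 20 m spacing and the lag window are hypothetical.

```python
import numpy as np


def lag_by_cross_correlation(downstream, upstream, dt, max_lag):
    """Return the delay (s) of `downstream` relative to `upstream`.

    The search is limited to |lag| <= max_lag to avoid the periodic
    ambiguity of narrow-band signals.
    """
    a = np.asarray(downstream, dtype=float)
    v = np.asarray(upstream, dtype=float)
    a = a - a.mean()
    v = v - v.mean()
    # corr[i] = sum_n a[n + k] * v[n], with k running from -(len(v) - 1)
    corr = np.correlate(a, v, mode="full")
    lags = np.arange(-(len(v) - 1), len(a)) * dt
    window = np.abs(lags) <= max_lag
    return lags[window][np.argmax(corr[window])]


if __name__ == "__main__":
    dt, period, spacing, true_lag = 0.5, 8.0, 20.0, 2.5  # assumed values
    t = np.arange(600) * dt  # 5 min of samples
    rng = np.random.default_rng(0)
    upstream = np.sin(2 * np.pi * t / period) + 0.05 * rng.standard_normal(t.size)
    downstream = (np.sin(2 * np.pi * (t - true_lag) / period)
                  + 0.05 * rng.standard_normal(t.size))
    lag = lag_by_cross_correlation(downstream, upstream, dt, max_lag=period / 2)
    print(f"lag = {lag:.2f} s, celerity = {spacing / lag:.2f} m/s")
```

The lag window plays a role similar to the upper limit on the radius: when two points are separated by more than one wavelength, several correlation peaks become equally plausible and the signals can be mismatched.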
The method was tested for the four cases with varying wave and bathymetry conditions. The coastal videos were taken from a UAV hovering 100-150 m above mean sea level, which typically covered a horizontal area of 600 × 1000 m. The method worked robustly under different bathymetry conditions and yielded two-dimensional bathymetry with an RMSE of less than 0.5 m. The applicable range was limited mainly by wave breaking on the shore and extended offshore up to a water depth of ~8 m, depending on the dominant wave period. We compared the estimation results with those obtained by cBathy and confirmed a better performance for the tested cases under dominant swells. Furthermore, with a compact array and a small number of reference points, the present method is expected to reveal finer topography at a low computational cost.
We can apply the method to monitor the evolution of highly two-dimensional bathymetry, such as a deltaic bar in front of a river mouth. Although it does not work in the vicinity of a river mouth, the method reveals the outer bar topography with effortless UAV-based observation. The accuracy level is insufficient for it to be a promising alternative to conventional bathymetric surveys. However, the low operation cost is advantageous in monitoring drastic topographic changes induced by an extreme event. A drawback of the method may be that its accuracy and applicable range vary with the incident wave condition. The method was made simple and efficient on the assumption that intensity signals are generated dominantly by a single wave component. The UAV's observational flexibility can mitigate this shortcoming, as we can adjust the camera settings to strengthen the dominant swell signals. Further tests will be needed to clarify how the method's performance changes with the observation settings and environmental conditions.