Depth from Satellite Images: Depth Retrieval Using a Stereo and Radiative Transfer-Based Hybrid Method

Abstract: Satellite imagery is increasingly being used to provide estimates of bathymetry in near-coastal (shallow) areas of the planet, as a more cost-effective alternative to traditional methods. In this paper, the relative accuracy of radiative-transfer and photogrammetric stereo methods applied to Worldview 2 imagery is examined, using LiDAR bathymetry and towed video as ground truth, and it is demonstrated, with a case study, that these methods are complementary; where one method has limited accuracy, the other often has improved accuracy. The depths of uniform, highly-reflective (sand) sea bed are better estimated with a radiative transfer-based method, while areas of high visual contrast in the scene, as identified using a local standard deviation measure, are better estimated using the photogrammetric technique. It is shown that a hybrid method can give a potential improvement in accuracy of more than 50% (from 2.84 m to 1.38 m RMSE in the ideal case) compared to either of the two methods alone. Metrics are developed that characterize the regions of the scene where each technique is superior, realizing an improved overall depth accuracy over either method alone of between 16.9% and 19.7% (a realized RMSE of 2.36 m).


Introduction
In the study of the world's oceans, bathymetry is a key variable that offers researchers immediate information regarding habitats [1] and ocean currents [2]. Used in combination with ocean color modelling in near-coastal and estuarine regions, it provides important measures of the effect of land-based human activities on the health of these environments [3]. Satellite observations of oceans are now recognized [4] as a practical source of information for water quality, bathymetry, and benthic habitat mapping when air- or ship-borne surveys are either too costly or impractical to implement.
While radiative-transfer derived bathymetry from single images has been practiced and researched for many years [5,6], the production by the German company EOMAP of a Landsat-derived bathymetric map of the Great Barrier Reef has attracted media attention [7] to the availability of these techniques, and efforts are being made to quantify their accuracy for hydrographic purposes [8]. The new wave of satellites, including Worldview 2 [9], Landsat 8 [10], and the European Sentinel satellites [11], provide increased opportunities for low-cost, broad-scale bathymetry with increasing accuracy.

Satellite Images
For either of the methods, it is desirable to have clear, glint-free satellite images with a flat sea state. For the radiative transfer approach, a nearly perpendicular look angle provides superior results [22]. For the stereo approach, the second image should be clear and as glint-free as possible, preferably acquired simultaneously or close to simultaneously with the first [23]. The nearly perpendicular pointing angle also helps to simplify the geometry of the refraction correction and the epipolar search region [24,25].
Both methods are tested with Worldview 2 (WV2, [9]) data, which is supplied with a resolution of 2 m in 8 bands, with red, green, blue, and the "coastal band" (400 nm) the most relevant bands for the current study. Figure 2 shows WV2 scenes. Compared to Landsat TM5, more of the underwater landscape is visible in the WV2 imagery due to WV2's superior dynamic range and spatial resolution. While Landsat 8 has far better dynamic range compared to earlier Landsats, it was not considered here as there were no images contemporary to the WV2 stereo pair.
For the attenuation-derived bathymetry, WV2 data were acquired in June 2012. This is a mid-winter image, and thus potentially suffers from low solar angles and a rough sea-state, but was found to be free of these effects and thus suitable for deriving physics-based depths.
For the photogrammetric approach, the stereo pair of WV2 images was acquired in December 2009, approximately two minutes apart, from the satellite's downward- and backward-pointing cameras. While the 2009 images could also have been used for the Semi-Analytical Model for Unmixing and Concentration Assessment (SAMBUCA) method, the 2012 image was deemed superior for the purpose (and thus for this study) according to criteria derived in [22].
Both sets of imagery were de-glinted [26], and the 2012 imagery for the radiative transfer model was atmospherically corrected, using a correction that also accounted for the effect of the air-water interface [3]. The photogrammetric method was implemented on WV2 Band 2 (blue) rather than the panchromatic or coastal bands. The coastal band was not used because it was quite noisy, did not show the features as clearly as the blue band, and is documented to penetrate deepest only in pure water [27]. The panchromatic band was avoided so that the results would have a 2 m resolution, and thus be directly comparable to the attenuation-derived bathymetry; it was also observed to show fewer of the underwater features, as its spectral response tapers off in the green region of the spectrum. Figure 2 shows the deglinted blue-band stereo pair and the deglinted multispectral image used in this paper.

LiDAR Bathymetry
LiDAR depth data [28] were collected by Fugro LADS (Kidman Park, SA) with their Mark II instrument, in a survey conducted for WA Transport in April 2009 off the coast of Western Australia from Two Rocks to Cape Naturaliste (see Table 1). The LiDAR waveform data were processed by Fugro Pty Ltd. to create depth maps, by measuring the time difference between LiDAR pulses returned from the surface of the water and the seafloor. The LiDAR data are broadly accepted within the hydrography community as being ground truth, since they are of known, high accuracy [29]. The LiDAR data were collected at approximately 4.5 m point spacing and are super-sampled here to 2 m for comparison with the satellite-derived depths. The LiDAR ground truth covered the full extent of the area being considered here.
LiDAR reflectivity, calculated using CSIRO-developed techniques, is used as an additional input to segment the image into different cover types (Section 2.4), so that different classes can be assessed for accuracy. The bathymetry maps were co-registered to the WV2 using manually selected control points and a cubic warp. The LiDAR bathymetry is shown in Section 3.


Towed Video
Towed video is from a CSIRO survey in March 2014 [1]. The video is time and location stamped in frame, as shown in Figure 3. The videos allow the sand and non-sand masks created in Section 2.4 to be validated, ensuring their accuracy.


Creating a Sand/Non-Sand Mask
In order to examine the accuracy of radiative-transfer-based and stereo bathymetry over different cover types, we created a map of sand and non-sand pixels. Using the covariates of depth, WV2 bands 1-4, and LiDAR reflectivity [1], we employed the following procedure to classify sand and non-sand pixels: A selection of 34 homogeneous training sites is manually chosen at various depths and spatial locations throughout the scene. Assuming that the training sites all belong to different seafloor cover types, a canonical variate analysis [30] is performed on these groups, and their scores are plotted. The training sites are assigned to 10 super-classes according to where they cluster on the CV plots; these are "natural" clusters in the training sites and are not given labels. Each pixel in the image is then assigned to the class to which it has the maximum likelihood of belonging. Using ground-truth video and visual inspection of the imagery, each class is finally labelled as sand-only, non-sand-only, or mixed (containing both sand and non-sand).
The method closely follows procedures typically applied to terrestrial land-cover classification problems, such as [31]. The intention is not to create a perfectly classified map, but rather to measure the accuracy of the bathymetry in pixels that are fairly certain to be sand compared to those that are fairly certain not to be sand. For this reason, mixed classes were acceptable, provided that good populations with a high probability of being sand or non-sand were identified.
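The classification procedure above (canonical variates on labelled training groups, followed by maximum-likelihood assignment of each pixel) can be sketched as follows. This is an illustrative reconstruction on synthetic data, not the CSIRO implementation; the covariate values, group means, and class count are invented for demonstration, and canonical variate analysis is realized here via its standard equivalence with linear discriminant analysis.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Synthetic stand-ins for the covariates: depth, WV2 bands 1-4, LiDAR reflectivity.
# Three hypothetical training groups with different mean signatures.
means = np.array([[5.0, 0.20, 0.30, 0.40, 0.30, 0.8],   # bright sand
                  [8.0, 0.10, 0.10, 0.20, 0.10, 0.2],   # dark cover (e.g., wrack)
                  [6.0, 0.15, 0.20, 0.30, 0.20, 0.5]])  # mixed
X = np.vstack([m + 0.02 * rng.standard_normal((50, 6)) for m in means])
y = np.repeat([0, 1, 2], 50)

# Canonical variate analysis: project training sites onto the axes that
# maximize between-group relative to within-group variation.
cva = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
scores = cva.transform(X)   # CV scores; clusters on these define super-classes

# Assign image pixels to the class of maximum likelihood under the fitted model.
pixels = means[[0, 1]] + 0.02 * rng.standard_normal((2, 6))
labels = cva.predict(pixels)
print(labels)
```

In the paper's workflow, the resulting super-classes would then be labelled sand-only, non-sand-only, or mixed using the towed video and visual inspection.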

SAMBUCA Method
Deriving bathymetry from ocean color is a relatively established technique [5] that exploits the exponential attenuation of light through the water column in order to estimate the depth of optically shallow water. Noting that optically shallow observations are composed of mixtures of water column and seafloor colours, these techniques approximate the water column contribution from observations of deep water areas [32,33]. Since then, many similar empirical techniques (where algorithms are derived from the regression of imagery on ground-truth depth measurements), have been, and continue to be, employed [18,34]. In recent years there has been a trend in the literature towards more physics-based or semi empirical models [35], to better deal with the water column optical complexity.
The Semi-Analytical Model for Unmixing and Concentration Assessment (SAMBUCA; [36,37]) is a semi-empirical model that is designed to allow the simultaneous retrieval of depth and inherent optical properties from hyperspectral images of optically shallow water bodies. It extends the work of Lee et al. [35], by including a mixing parameter for bottom substrates (also proposed by Hedley and Mumby, [38]) and allows mixtures of water constituents to be estimated for each pixel. SAMBUCA uses a forward-modelling approach and candidate spectral libraries to predict bottom type and depth simultaneously with the inherent optical properties (IOPs) of the water column.
For a full explanation of the details of its implementation, the reader is referred to the references above. Briefly, SAMBUCA employs an inversion/optimization method by Lee et al. [35,39], where the analytical expression for r_rs was previously derived for an optically shallow water body by Maritorena et al. [40]:

r_rs = r_rs^dp [1 − exp(−(κ_d + κ_C) H)] + (1/π) [q ρ_1 + (1 − q) ρ_2] exp(−(κ_d + κ_b) H),    (1)

with: r_rs the subsurface remote-sensing reflectance (just below the waterline); r_rs^dp the subsurface remote-sensing reflectance of the infinitely deep water column; κ_d the vertical attenuation coefficient for diffuse down-welling light; κ_b the vertical attenuation coefficient for diffuse up-welling light originating from the bottom; κ_C the vertical attenuation coefficient for diffuse up-welling light originating from each layer in the water column; ρ_j for j = 1, 2 the bottom reflectance for two different substrates; q the proportion of substrate 1 (so 1 − q is the proportion of substrate 2); and H the length of the water column through which the light is passing.
In the SAMBUCA algorithm, Equation (1) is modified using semi-analytical relationships derived by Lee et al. [35,39], relating the absorption coefficients and deep water reflectance to five independent variables associated with various IOPs. IOP variables related to chlorophyll and colored dissolved organic matter were fixed in this analysis to simplify the search space, reducing the number of unknown variables from 10 to 5.
Candidate values for the coefficients are used as initial estimates in the semi-analytic model, and this is subsequently compared with remotely sensed measurements and the parameters are adjusted, via least squares non-linear optimization, until they best match the observations. The method is originally designed for hyperspectral data, so in general for multispectral data, as employed here, a variety of assumptions must be made about the IOPs and substrate reflectance in order to achieve a robust result [27]. The errors attributable to the reduced spectral bandwidth of the WV2 instrument are explored in some detail with simulated sand spectra in [41]. Of particular relevance is that nearly no light penetrates for wavelengths larger than near infrared, so these bands are of limited use, which reduces the number of variables that can be simultaneously solved for when using a pixel-wise approach.
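The forward-model-plus-optimization loop can be illustrated with a stripped-down version of Equation (1). The sketch below inverts only the depth H and substrate proportion q for a single pixel, with all attenuation coefficients held fixed; the band set, coefficient values, and substrate spectra are invented for illustration and are not SAMBUCA's.

```python
import numpy as np
from scipy.optimize import least_squares

# Illustrative 4-band setup (coastal, blue, green, red); all values invented.
kd   = np.array([0.08, 0.06, 0.10, 0.40])    # diffuse downwelling attenuation
kb   = np.array([0.09, 0.07, 0.11, 0.45])    # upwelling, bottom-originating
kc   = np.array([0.10, 0.08, 0.12, 0.50])    # upwelling, column-originating
rdp  = np.array([0.010, 0.012, 0.008, 0.001])  # deep-water reflectance
rho1 = np.array([0.05, 0.06, 0.08, 0.07])      # dark substrate (detritus)
rho2 = np.array([0.30, 0.35, 0.40, 0.30])      # bright substrate (sand)

def forward(H, q):
    """Shallow-water reflectance: water-column term plus attenuated bottom term."""
    col = rdp * (1.0 - np.exp(-(kd + kc) * H))
    bot = (q * rho1 + (1 - q) * rho2) / np.pi * np.exp(-(kd + kb) * H)
    return col + bot

# Simulate an observation at H = 7 m over 30% dark substrate, then invert it
# by least-squares adjustment of the candidate parameters, as in the text.
obs = forward(7.0, 0.3)
fit = least_squares(lambda p: forward(p[0], p[1]) - obs,
                    x0=[3.0, 0.5], bounds=([0.1, 0.0], [30.0, 1.0]))
H_hat, q_hat = fit.x
print(H_hat, q_hat)
```

With noise-free synthetic data the optimizer recovers the simulated depth and proportion; in practice the multispectral band set forces several IOPs to be fixed, exactly as discussed above.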
The application of hyperspectral inversion methods to multispectral satellite imagery has several limitations [42], such as the increased width of spectral bands in multispectral satellite sensors (required to ensure sufficient signal to noise). Lee [42] tested the impact of applying the model to multispectral data on IOP retrievals and found that, for optically deep waters, overestimation of the inverted absorption coefficient can occur when narrow-band hyperspectral models (like SAMBUCA) are applied to broad-band multispectral data. For optically deep, clear waters, the implication can be an overestimation of the absorption coefficient at 440 nm (a440nm) of at least 20%. For turbid or complex waters, where a440nm is greater than approximately 0.3 to 1 m−1, the band-width uncertainties are relatively small (<5%), and therefore Lee believes their use can be justified in these cases [42].
For the results presented here, the SAMBUCA algorithm was employed with limited ground truth and additional information provided for processing. This simulates the real-life situation of an isolated location being mapped. It also accentuates the regions where its performance is lacking, so helping to emphasize the point that the two technologies are complementary.

Stereo Method
Here we present an overview of the method used for deriving depths from the stereo pair of WV2 images.

Rational Polynomial Coefficients
The use of rational polynomial models in lieu of full orbital camera models is now widespread, and we do not propose to contribute to methods already well described in the literature (see, e.g., [43]). Work by Di et al. [44] describes in detail how to use the rational polynomial model supplied by DigitalGlobe with their imagery to calculate 3D ground positions from matched pixels, by linearizing using Taylor's theorem. We provide a brief overview below.
The WV2 metadata provides the coefficients of a rational polynomial, R, that maps 3D coordinates on the ground to the row and column of the image, i.e., (r, c) = R(x, y, h), where: x is the latitude, y is the longitude, h is the height of the scene point relative to the WGS84 geoid, r is the row number in the image, and c is the column number in the image.
For a given row, column, and height of the image, (r, c, h), it is possible to solve for the longitude and latitude, (x, y), of that point on the ground. If we do this for a fixed (r, c) and two arbitrary heights, h_0 < h_1, we find two points (x_0, y_0, h_0) and (x_1, y_1, h_1) such that the ray between them must pass through the camera centre on the satellite at the time of acquisition. Thus the rational polynomials in the metadata allow every pixel in each image to have a calculable unit vector n(r, c) that points to the camera centre. In practice, because the satellite is 770 km above the surface of the earth, we have n(r, c) ≈ n for every (r, c) of a given image. For our stereo pair of images, n_1 = (−0.18, −0.60, 0.78) and n_2 = (−0.04, −0.12, 0.99), which allows us to calculate that the camera centre rays are separated by approximately 44.7 degrees and that the plane containing these two normals is inclined approximately 3.6 degrees from vertical.
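The two-height construction above can be sketched as follows. A simple affine stand-in replaces the real rational polynomial (the actual RPCs supplied in the WV2 metadata are ratios of cubic polynomials), and the inversion for (x, y) at fixed (r, c, h) is done with a generic root finder rather than the Taylor-linearization of Di et al.

```python
import numpy as np
from scipy.optimize import fsolve

# Hypothetical stand-in for the supplied rational polynomial model R:
# maps ground coordinates (x, y, h) to image coordinates (r, c).
def R(x, y, h):
    r = 1000.0 * x + 0.6 * h
    c = 1000.0 * y + 0.2 * h
    return np.array([r, c])

def ground_point(r, c, h):
    """Invert R for (x, y) at a fixed height h."""
    sol = fsolve(lambda xy: R(xy[0], xy[1], h) - np.array([r, c]), x0=[0.0, 0.0])
    return np.array([sol[0], sol[1], h])

# Fixed pixel, two arbitrary heights: the chord between the two ground points
# points along the ray toward the camera centre.
p0 = ground_point(500.0, 300.0, 0.0)
p1 = ground_point(500.0, 300.0, 100.0)
n = (p1 - p0) / np.linalg.norm(p1 - p0)
print(n)
```

Repeating this for a handful of pixels would confirm the near-constancy of n(r, c) across a scene, which is what justifies using a single direction vector per image.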

Accounting for Refraction
It is not within the scope of this paper to provide a detailed analysis of the effect of the air-water refraction interface on the results of the stereo problem; we refer the reader to [24,25], among others, who have observed that in general there is no closed-form solution to the stereo photogrammetry problem in the presence of an air-water interface. However, when the sensor is very high compared to the depth, and in particular when, as here, the incidence angles from the scene are very similar, an approximation is effective [13]. Our case is similar to Figure 3 from [24].
In Figure 4, we present a simplified version of the refraction diagrams available in the literature above and elsewhere. Note that the incidence and refracted angles are related according to Snell's Law, sin θ_i / sin φ_i = n, where we have assumed a value of n = 1.33 for seawater. For our particular case, the epipolar plane, Λ, is nearly vertical, which means we can ignore the effect of refraction bending this plane along the water line. For the present case, the error caused by this assumption is less than 1%. We therefore assume a single epipolar direction between the two images of the stereo pair, and correct the apparent height, ĥ, to the correct height, h, which for these near-nadir angles reduces to scaling by the refractive index (h ≈ n ĥ). Since ĥ is the height below the geoid, and not the actual water depth, this result contains an offset proportional to the difference between these two values (assumed constant over a scene). This is removed by subtracting an offset as explained at the end of the section below.
We also note that the refraction induces a horizontal displacement in the observed position of the scene point. In the example here, this change in disparity is less than half a Worldview pixel over the range of heights considered, so it does not have a significant effect on the positions of the calculated heights.
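The magnitude of both effects can be checked with a few lines based only on Snell's law. The numbers here are illustrative, assuming n = 1.33 (as in the text), a 10 m true depth, and a near-nadir in-air incidence angle of 5 degrees.

```python
import numpy as np

n = 1.33        # refractive index of seawater (assumed, as in the text)
depth = 10.0    # true depth in metres (illustrative)

def apparent_depth(theta_i_deg):
    """Apparent depth along a ray with in-air incidence angle theta_i.

    The refracted ray reaches the bottom at horizontal offset depth*tan(phi);
    back-projecting along the unrefracted ray places the bottom at
    depth * tan(phi) / tan(theta_i), which tends to depth / n near nadir.
    """
    theta = np.radians(theta_i_deg)
    phi = np.arcsin(np.sin(theta) / n)   # Snell's law
    return depth * np.tan(phi) / np.tan(theta)

theta = np.radians(5.0)
phi = np.arcsin(np.sin(theta) / n)

print(apparent_depth(5.0))                         # close to 10 / 1.33
print(depth * (np.tan(theta) - np.tan(phi)))       # horizontal shift, ~0.2 m
```

The apparent depth is close to depth/n, so the correction back to true depth is approximately multiplication by n, and the horizontal shift at these angles is roughly a tenth of a WV2 pixel, consistent with the claim above.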

Determining Corresponding Pixels
The literature on formulating and solving the stereo problem is extensive [45] and current [46]. It is beyond the scope of this paper to review and analyze the relative merits of these many algorithms; we take a pragmatic approach to the correspondence problem as described below.
To determine which pixels correspond in the two images, we look along epipolar lines, with bilinear interpolation used where the observations do not fall onto a whole pixel value. Normalized correlation matching over a rectangular window that is longer over the direction of the epipolar line is used, with disparities chosen that have the maximum normalized correlation score over the window. Fractional disparities (for sub-pixel accuracy) are calculated by fitting a local quadratic in the neighborhood of the maximum.
The size of the matching window is important: a large window is more likely to give a positive match for a particular pixel, but a small window is necessary for more accurate localization of features. The accuracy with which the position of the maximum of the correlation function can be determined depends on the features in the matching window. Areas of the seafloor with relatively uniform cover have fewer features to match, and a larger window is needed than in regions that are more heterogeneous.
The disparity maps shown in this paper are produced using C code with an adaptively-sized matching window. A short segment of epipolar line is interpolated from the image around each pixel of interest. The matching is then performed locally on this short segment. The correlation function is evaluated over a range of window sizes, and the window size that gives the most pronounced peak is chosen.
For the results presented here, each pixel was matched with a window that was 11 pixels across (transverse to epipolar lines) and between 5 and 20 pixels long. The cost function is evaluated over every window size in this range, and the result that has the most sharply defined maximum is used. The idea of having an adaptive window size according to a stereo quality metric is discussed in [47]. The sharpness of the maximum is given by the curvature of the cost function evaluated at the maximum, as estimated by a locally fitted quadratic, which also provides sub-pixel accuracy on the optimization. The curvature of the correlation matching function provides a measure of confidence in the predicted disparity [47,48]. A higher curvature measure implies a better-defined peak on this curve, and thus a better-defined maximum.
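The matching step can be illustrated in one dimension: normalized correlation along an epipolar segment, a quadratic fit around the best integer disparity for sub-pixel refinement, and the fitted curvature as the confidence score. This is a simplified stand-alone sketch, not the adaptive-window C implementation described above; the signal and window size are invented.

```python
import numpy as np

def ncc(a, b):
    """Normalized correlation of two equal-length windows."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def match(ref, seg, win=9):
    """Sub-pixel position of window `ref` within epipolar segment `seg`."""
    scores = np.array([ncc(ref, seg[d:d + win])
                       for d in range(len(seg) - win + 1)])
    d0 = int(np.argmax(scores))
    # Fit a quadratic through the peak and its neighbours: its vertex gives
    # the sub-pixel offset, and its (negated) second difference gives the
    # curvature-based confidence measure discussed in the text.
    if 0 < d0 < len(scores) - 1:
        y0, y1, y2 = scores[d0 - 1], scores[d0], scores[d0 + 1]
        offset = 0.5 * (y0 - y2) / (y0 - 2 * y1 + y2)
        curvature = -(y0 - 2 * y1 + y2)
        return d0 + offset, curvature
    return float(d0), 0.0

# A smooth "seafloor" signal shifted by 3 pixels between the two images.
x = np.linspace(0, 6, 40)
signal = np.exp(-(x - 3.0) ** 2)
ref = signal[10:19]                       # 9-pixel window from image 1
disp, conf = match(ref, np.roll(signal, 3))
print(disp, conf)
```

In the full method this is evaluated over a range of window lengths, and the window whose correlation peak has the highest curvature is retained.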
For the purpose of this study, we have made the mean of the estimated stereo depths equal to the mean of the LiDAR depths by adding an appropriate offset. This corresponds to the real life situation where several ground truth depths are known and used to calibrate the stereo bathymetry, which is a realistic scenario envisaged by the authors [18,33].
It would be reasonable for the purpose of comparison to also apply this step to the SAMBUCA depths, but we do not do this, because although that improves the average accuracy, it negatively affects the sandy areas where its performance is best. The reason for this is that, for this particular implementation, the algorithm tends to systematically over-estimate the depth of shallow, dark targets by mischaracterizing them as deep, bright targets, while being quite accurate with the sand pixels (the errors for sand are not systematically biased). Therefore, to normalize the SAMBUCA bathymetry a positive depth would have to be added, which would introduce a systematic error to the sand class, so this was not done.
It would also be reasonable to leave the raw depth estimates unchanged in order to perform the experiments documented in this paper. However, this would be inconsistent with the normalization that was performed on the stereo data. It is well documented ( [40,49]) that areas of low albedo (deep/dark targets) pose a problem for the accuracy of SAMBUCA and other radiative-transfer methods. We have therefore chosen to normalize the output to the sand class only (as described in Section 2.3), so that the mean depth of the sand sites is equal for the LiDAR bathymetry and the SAMBUCA bathymetry. It is reasonable to assume that a selection of sand sites would be known a priori. This somewhat exaggerates the results and conclusions in the paper. However, we have performed the experiments with and without the normalization, and the fundamental conclusions do not change.

Results
In this section, we examine the behavior of the errors of the stereo-derived bathymetry and the SAMBUCA-derived bathymetry. First we show the results of the classification, see Figure 5a.
The two substrata with bottom reflectances ρ_1 and ρ_2, employed by the model to produce these results, are detritus (seagrass wrack) [50] for the dark reflectance and sand [51] for the bright reflectance. The classified map of Figure 5a can be compared to the proportions, q, from Equation (1) of dark substrate with reflectance ρ_1; see Figure 5b. The maps have similar characteristics, which is to be expected.
For the results of the depth estimation, Figure 6a shows the ground truth LiDAR depths, Figure 6b shows the SAMBUCA depths, and Figure 6c shows the stereo depth estimates. Note that in the deep water to the west of the image, there are relatively few points for the stereo algorithm to match, resulting in patches of low accuracy.
To compare the distributions of accurately-estimated depth pixels throughout the image, Figure 7 shows the pixels that have been estimated to within 1 m for each of the two methods. The accurate stereo method pixels are coloured green, while the accurate SAMBUCA pixels are coloured red. The areas where both methods are accurate appear in yellow (where the red and green pixels coincide).
Note that the majority of the accurate pixels are either green or red, and there are only small areas where both methods are accurate. The total percentages (%) of accurate pixels for each algorithm, and for both, are shown in Table 2. This shows that the methods have the potential to provide improved depth estimates in different areas of the image. The whole-of-scene errors are summarized in the final column of Table 3.
To further demonstrate that the two methods are complementary, in the following sections, the errors for each method are compared according to three criteria: Depth; image texture (measured by local standard deviation); and seafloor cover type.

Effect of Depth
In this section, we examine the accuracy of the two methods with respect to the depth of water that they are estimating. Figure 8 shows density plots of the absolute accuracy of each pixel against the depth of water in which it lies. The density plots are produced using R's scatterplot function, which sums 2D Gaussians centred at points according to the absolute error and depth of each pixel. High concentrations of pixels are indicated by red hues, while blue hues indicate low densities of points. These can be thought of as 3D histograms, shaded according to point density. As an alternative to these plots, we also provide equivalent plots where the estimated depths are binned according to ground truth depths (per [52], Figures 4-7). The binned versions are the standard method of presenting such data in bathymetric analysis, although in our view the density plots, which are standard throughout the rest of the literature, provide just as much insight into the error structure and have the advantage of being built in to many data analysis software packages. Figure 8a shows the results for the stereo method. The accuracy of the majority of the pixels is more-or-less flat with depth, although there is a broad trend of outliers that increases with depth, indicated by the expanding yellow cloud as the depth increases. The plot of SAMBUCA absolute accuracy against depth in Figure 8b shows that there is a relationship between the depth and absolute accuracy, which is discussed in Section 4.
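The Gaussian summation behind these density plots can be sketched as follows (an illustrative NumPy re-implementation, not the R code used in this study; the function name, grid size, and bandwidth are our choices):

```python
import numpy as np

def error_depth_density(depth, abs_error, grid=64, bandwidth=0.5):
    """Density surface for an (error, depth) scatter: sum a 2D Gaussian
    centred at each pixel's (depth, |error|) pair over a regular grid."""
    d = np.asarray(depth, dtype=float)
    e = np.asarray(abs_error, dtype=float)
    dx = np.linspace(d.min(), d.max(), grid)   # depth axis
    ex = np.linspace(e.min(), e.max(), grid)   # absolute-error axis
    D, E = np.meshgrid(dx, ex)
    density = np.zeros_like(D)
    for di, ei in zip(d, e):
        density += np.exp(-((D - di) ** 2 + (E - ei) ** 2) / (2 * bandwidth ** 2))
    return dx, ex, density
```

Shading the returned `density` grid from blue (sparse) to red (dense) reproduces the appearance of Figure 8: red hues mark grid cells where many Gaussians overlap.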
Effect of Image Texture
Figure 9 shows plots of the local standard deviation versus absolute accuracy for the stereo bathymetry method and for the SAMBUCA method. We calculated the standard deviation of the pixels in the adaptive stereo matching window described in Section 2.4. The stereo result indicates that if we place a threshold, Υ, on the local standard deviation, taking only pixels above it, we would capture many pixels with low absolute error and would have much more confidence in the accuracy of the predicted depths. The SAMBUCA algorithm, on the other hand, does not possess this property, and we can see in Figure 9b that a threshold on SD would exclude many accurate pixels.
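A minimal sketch of this texture measure is given below. Note the simplifications: we use a fixed square window rather than the adaptive matching window of Section 2.4, and the window size and threshold values are illustrative only.

```python
import numpy as np

def local_sd(img, win=5):
    """Standard deviation of a win x win neighbourhood around each pixel
    (edges handled by replicating border values)."""
    img = np.asarray(img, dtype=float)
    pad = win // 2
    p = np.pad(img, pad, mode='edge')
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = p[i:i + win, j:j + win].std()
    return out

def confident_stereo_mask(img, threshold, win=5):
    """Pixels whose local texture exceeds the threshold, i.e. where the
    stereo depths are expected to be reliable."""
    return local_sd(img, win) > threshold
```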

The Effect of Seabed Substrate
The final comparison performed here is the accuracy of the predicted depths over the sand and non-sand pixels. To characterize the errors in different substrates, we calculate the mean absolute error, ∆ = (1/N) ∑_i |d_i − d̂_i|, with d_i the actual LiDAR depth, d̂_i the estimated depth, and N the total number of pixels in the scene. Table 3 shows a summary of these results for the three classes from the previous section. In Figure 10a,b, histograms of the frequencies of absolute errors in depth pixels are presented for the sand and non-sand classes. Figure 10c shows that, for this case, there are few deep pixels in the non-sand class. The mean absolute errors presented in Table 3 also show this effect; the mean absolute errors for the SAMBUCA-predicted depths over sand are around half those of the non-sand pixels.
The result for all pixels falls between the results for the sand and non-sand pixels. The result for the mixed pixels reflects the composition of these mixed pixels, which are more sand than non-sand.
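The per-class error summary can be sketched as follows (NumPy assumed; the class labels are illustrative placeholders for the substrate classification):

```python
import numpy as np

def mean_absolute_error(lidar, estimate, mask=None):
    """Delta = (1/N) * sum_i |d_i - dhat_i| over the pixels selected by mask."""
    d = np.asarray(lidar, dtype=float)
    dhat = np.asarray(estimate, dtype=float)
    if mask is None:
        mask = np.ones(d.shape, dtype=bool)
    return np.abs(d[mask] - dhat[mask]).mean()

def per_class_mae(lidar, estimate, classes):
    """Mean absolute error for each substrate class in the class map."""
    classes = np.asarray(classes)
    return {c: mean_absolute_error(lidar, estimate, classes == c)
            for c in np.unique(classes)}
```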

Hybrid of Satellite-Based Bathymetry Approaches
To explore the potential improvements in error rates by employing a hybrid of the two methods, we present results based on a pixel-level decision criterion using the above metrics.

Potential Best Achievable Results Using the Most Accurate Pixels
In order to provide a baseline for improvements in error rates using a hybrid technique, we first present the combined depth image where the most accurate pixel is chosen from the two bathymetry models. This can only be done in the presence of complete ground truth, so does not represent a realistic scenario, but it is useful for the purpose of this benchmark comparison.
Let D be the ground truth depth, D_S the stereo depth estimate, and D_R the radiative transfer-based estimate, and define the best hybrid depth estimate, D_BH, to be:

D_BH = D_S if |D − D_S| < |D − D_R|; D_R otherwise.

Figure 11a shows the resulting combined image. The mean absolute error for the combined image is ∆ = 1.38 m, which can be compared to the overall individual errors given in Table 3.
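This per-pixel oracle is straightforward to express (a sketch assuming co-registered NumPy depth arrays):

```python
import numpy as np

def best_hybrid(d_true, d_stereo, d_rt):
    """Oracle combination D_BH: at each pixel, keep whichever of the stereo
    (D_S) and radiative transfer (D_R) estimates is closer to ground truth."""
    d_true = np.asarray(d_true, dtype=float)
    d_stereo = np.asarray(d_stereo, dtype=float)
    d_rt = np.asarray(d_rt, dtype=float)
    use_stereo = np.abs(d_true - d_stereo) < np.abs(d_true - d_rt)
    return np.where(use_stereo, d_stereo, d_rt)
```

By construction, the mean absolute error of this combination can be no worse than that of either input estimate.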

Decision Based on Local Texture
Next, we explore the idea of combining the depth estimates based on the local standard deviation of the stereo pair in the matching window. Let σ be the local standard deviation in the matching window from above. For a given Υ ≥ 0, define the hybrid depth estimate, D_H^Υ, to be:

D_H^Υ = D_S if σ > Υ; D_R otherwise.

In the presence of ground truth, a plot of the mean absolute error of the depths as a function of Υ can be made, as shown by the black curve in Figure 12a. For small values of Υ, the stereo estimates are nearly all used, which means that the superior performance of SAMBUCA in the low-textured areas is not exploited. As Υ increases, more and more of the low-textured areas are excluded from the combined estimate, which improves the overall accuracy until Υ ≈ 3.1, where the mean absolute error is ∆ = 2.36 m, at which point the SAMBUCA estimate starts to be used in areas that have higher texture, decreasing the overall accuracy of the combined result.
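The texture-based rule is a one-line per-pixel selection (sketch; σ is the local standard deviation image and Υ the chosen threshold):

```python
import numpy as np

def texture_hybrid(d_stereo, d_rt, sigma, upsilon):
    """D_H^Upsilon: stereo estimate where the local SD exceeds the threshold
    Upsilon, radiative transfer (SAMBUCA) estimate elsewhere."""
    return np.where(np.asarray(sigma) > upsilon, d_stereo, d_rt)
```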
In practice, the optimal value of Υ can be estimated from a few ground truth locations, as described below. We note that there is a broad part of the black curve in Figure 12a where significant improvements in the depth estimates are achieved without knowing Υ particularly accurately. The combined image for the optimal value of Υ is shown in Figure 11b. It is evident in Figure 11b that the threshold on SD has corrected the obvious inaccuracies due to the stereo method's poor matching in the western half of the image.
To simulate having only limited ground truth knowledge, we calculated Υ using a random sample of 20 ground truth points. An example plot is shown in red in Figure 12a. In practice, these should be targeted at a range of different image textures, which could be identified from the image by eye. However, to recreate the worst-case scenario, we chose 20 samples from arbitrary areas of the image. We repeated this 1500 times, thus creating the full range of potential Υ estimates. The histogram of these Υ estimates is shown in Figure 12b. The histogram shows that the mean estimate is Υ = 3.45. To achieve a 10% improvement in the accuracy of the mean absolute error (from 2.89 m to 2.59 m), we would need to estimate 2.2 < Υ < 6.9, and this occurs in 80% of the samples that we took. This shows that even with a random sample of ground truth depths, a significant improvement can be made by estimating Υ in this way. Further experiments are required to show whether a nominal value of Υ ≈ 3.1 would be adequate for use on other images, potentially without requiring any ground truth.
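The sampling experiment can be sketched as follows (an illustrative implementation; the candidate grid and default sample size are our choices, not the study's exact protocol):

```python
import numpy as np

def estimate_threshold(d_true, d_stereo, d_rt, sigma, n_samples=20,
                       candidates=None, seed=None):
    """Estimate Upsilon from a small random sample of ground truth pixels:
    pick the candidate threshold that minimises the sample mean absolute
    error of the texture-based hybrid."""
    if candidates is None:
        candidates = np.linspace(0.0, 10.0, 101)
    rng = np.random.default_rng(seed)
    idx = rng.choice(d_true.size, size=n_samples, replace=False)
    best_t, best_err = None, np.inf
    for t in candidates:
        hybrid = np.where(sigma[idx] > t, d_stereo[idx], d_rt[idx])
        err = np.abs(d_true[idx] - hybrid).mean()
        if err < best_err:
            best_t, best_err = t, err
    return best_t
```

Repeating this over many random samples reproduces the histogram experiment summarised in Figure 12b.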

Decision Based on Cover Type
In Figure 10, it is evident that the errors in the stereo depths are not significantly different for sand and non-sand pixels, although Table 3 does indicate some small differences. On the other hand, the SAMBUCA estimates are significantly better in the bright, uniform sandy areas of the image. This indicates that a pixel-level decision based on cover type is worth considering. For a cover-type based hybrid method, we tested two approaches: estimating all sand pixels with the SAMBUCA algorithm, with the non-sand and mixed classes estimated by the stereo method; and, alternatively, estimating only the non-sand pixels with the stereo method, with the sand and mixed pixels estimated by SAMBUCA. The best mean absolute error, ∆ = 2.58 m, was given by using the stereo estimates for all except the sand class. The combined image is shown in Figure 11c.
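The cover-type rule can be sketched as follows (the class labels are illustrative placeholders for the substrate classification):

```python
import numpy as np

def cover_type_hybrid(d_stereo, d_rt, classes, rt_classes=('sand',)):
    """Use the radiative transfer (SAMBUCA) depths for the listed cover
    classes and the stereo depths everywhere else; the best-performing
    variant in the text sends only the sand class to SAMBUCA."""
    use_rt = np.isin(np.asarray(classes), rt_classes)
    return np.where(use_rt, d_rt, d_stereo)
```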

Discussion
The results and analysis presented in Section 3 reveal the improved accuracy potential for a hybrid of two previously known satellite-derived bathymetry methods. The improved accuracy is due to the fact that the areas of the image where the two methods are most effective are broadly disjoint sets, as revealed in Figure 7.

On the Overall Accuracy of the Methods
Our aim was to characterize the errors of the two methods, so that if a hybrid approach is applied to a given location there is a protocol for deciding which method should be employed for each part of the scene. There is a body of literature on the sensitivity and accuracy of (mainly land-based) stereo photogrammetry [29,48,53] and physics-based bathymetry [54][55][56], and these should also inform decisions regarding which to employ in particular regions. The SAMBUCA algorithm has various quality assessment metrics associated with the convergence of the optimization algorithm and its goodness-of-fit [37], which were not employed here but could also be used to aid the pixel-level decision process.
The whole-of-scene errors are summarized in the final column of Table 3, and are comparable to the results of similar methods implemented in the literature [24,27], although we must emphasize here that we are aiming to extract broad conclusions about the behavior of each method, not extract the best results that are possible.
Regarding the accuracy of the SAMBUCA method (see Figure 8b), the concentration of shallow pixels with high error rates (centered around −8 m with absolute errors ≈ 8 m) suggests that these pixels have been incorrectly identified as deep, bright targets, the intensity of which has been attenuated by the water column, when they are actually shallow, dark targets. The concentration of deep and relatively accurate pixels in the −20 m to −18 m depth range is explained by the lack of non-sand pixels in the deeper parts of the image (see Figure 10c). Some of the errors may also be due to dark bottom types that are naturally darker than the infinitely deep water column [27,49] (and thus appear brighter as the depth increases and the water column begins to dominate the signal).
With regard to the accuracy of the stereo method, we observed in Figure 8a that there is a broad trend of outliers that increases with depth. This is because the contrast of the bottom features decreases with increasing depth as the water column signal begins to dominate that of the substrate, making it harder for the matching algorithms to accurately localize their positions. In general, we would not expect accuracy to depend on distance for terrestrial stereo [20], because the variations in depth are usually small compared to the parameters of the stereo system (baseline, overall distance from the scene), so it is only the blurring due to the water column that causes this.
The extensive deep and sandy areas in the south west of the scene are not well matched by the stereo algorithm, and this is apparent in Figure 6c.

Combining Satellite-Based Bathymetry Approaches
Section 3.1 shows that the potential benefits of using a hybrid satellite bathymetry approach are considerable. Carl and Miller [21] have applied a combined method over the Caspian Sea, using an unspecified satellite sensor with a resolution of 4 m; they claim <2 m vertical accuracy, which is consistent with the results presented here, although their technique is not yet published in the literature and, to our knowledge, they have not presented an analysis of the errors.
We demonstrated that the best improvement that can be achieved with any hybrid approach for these data is a reduction of mean absolute error from 2.88 m (stereo) or 3.21 m (SAMBUCA) to 1.38 m, less than 50% of either, though this is not a realistic approach given its dependence on ground data. The key to approaching, in practice, the accuracies achievable when the most accurate pixel is chosen is to predict the errors without knowing the ground truth. We have attempted to characterize the errors as well as possible for each of the two methods.
For stereo methods, the presence of features, which can be seen as a measure of image texture [14], allows successful matching to be performed, which is one of the key factors in predicting the accuracy of the stereo solution [20]. In recent work, Jalobeanu and Goncalves [57] explain that it is not possible to predict confidence intervals on disparity maps a priori (i.e., given just the image data and metadata, without ground truth), chiefly because of matching and modelling errors in the stereo recovery process. Nonetheless, several predictors of accuracy have been discussed [20,58], with [47] also investigating several measures of confidence for predicting errors in stereo vision. We explored the idea of using the standard deviation of the matching window as a measure of confidence in the depth predicted from photogrammetric methods.
The standard deviation is a measure of the local texture of the imagery, and hence is an indicator of how well the stereo algorithm can match features between the two images of the stereo pair. Thus, in the density plot of Figure 9a, we can see that as the standard deviation increases, the absolute error decreases for the stereo method.
For radiative transfer methods, there are many studies of the effect of seafloor cover on the retrieval accuracies. For example, Miecznik and Grabowska [27] experiment with simulated and real data and find that seagrass results have larger retrieval errors compared to sand and that the accuracy for sand substrates at 20 m depth are similar to those for darker substrates at 10 m depths.
The absolute errors for SAMBUCA pixels in the sand and non-sand classes have distinctly different distributions, with the sand pixels mostly falling below 3 m of absolute error. This is consistent with the work of Burns et al. [49], who used the numerical simulator HYDROLIGHT to investigate the sensitivity of the ocean color equations and found that the largest dependence is on the size of the difference in albedo between the water column and the substrate, implying that brighter substrates will have increased sensitivity, and thus accuracy, of the retrieved depth.
While we can use depth to separate the scene into components where the stereo and SAMBUCA estimates have relatively more accuracy, there are no systematic trends apparent in this example which allow general statements to be made. In addition to this, we would require the depth in the first place, which would imply that some sort of bootstrapping algorithm would be required.

Conclusions
This paper presents the results of analyses that demonstrate the complementary properties of the radiative-transfer-based bathymetry technique SAMBUCA and classical photogrammetric techniques adapted and applied to bathymetric problems.
There are substantial theoretical improvements possible by combining these two techniques. Using the most accurate pixels from each of the two methods provides an improvement of 51% for the SAMBUCA method and 53% for the stereo method.
Each of the methods performs better for different sea bed types observed in a satellite image. The SAMBUCA algorithm is best suited to highly reflective, uniform areas where the bottom substrate is bright compared to the water column. Sand is an example of such a substrate, and has the advantage of being extremely common in the marine environment. For the scene examined in this paper, there were many deep (>15 m) sandy areas and SAMBUCA performed better in these deeper areas. Radiative transfer algorithms will often fail when the amount of light from the bottom falls below a certain level and the signal from the water column begins to dominate. Thus, deep targets can cause problems, but shallow dark targets can also be mistaken for deep dark ones when there are limited spectral bands to distinguish them.
Stereo algorithms are at their most effective when there are highly contrasted and distinct features in the image, which allows correlation matching to uniquely match features between the two images in the stereo pair. In uniform, or repeating, areas of the image, there can be no features to find, or the features can be falsely matched to other similar features. Thus, the accuracy of the stereo technique over a sandy substrate will often be limited.
We showed that by doing a classification of the imagery used to provide the depth estimates, we could choose to use SAMBUCA for the sand areas, and the stereo algorithm for the non-sand and mixed areas, giving an improvement in mean absolute error of 12% over stereo alone, or 9% for the SAMBUCA algorithm alone.
When the local-neighborhood standard deviation is used to decide whether to use stereo or SAMBUCA, we observed an improvement in mean absolute error of 20% for the stereo algorithm alone and 17% over just using the SAMBUCA algorithm. By improving these methods and identifying other ways of predicting the errors, we would hope to approach the theoretical "best" method in future research. An alternative strategy, also worth considering, is to use the stereo methods to constrain the SAMBUCA algorithm on dark targets.