1. Introduction
Bathymetric data are a fundamental geophysical parameter, and the accurate measurement of shallow-water bathymetry is crucial for the management of island coasts and the construction of marine projects. As a strategic resource, shallow-water geospatial data provide an important basis for marine science, engineering, and military operations. Traditional bathymetric methods, such as shipborne sonar and airborne light detection and ranging (LiDAR) surveying, offer high precision. However, these methods are costly, time-consuming, and labor-intensive, and they are limited to areas that can be reached by aircraft or ships, making large-scale continuous surveying difficult [1,2,3]. The development of remote sensing technology has introduced new approaches to bathymetry. Unlike active methods, optical remote sensing bathymetry retrieval is a passive technique based on the radiative transfer model. Owing to its cost-effectiveness, global coverage, and timely data availability, remote sensing imagery has been widely used in coastal bathymetry. To obtain accurate and efficient bathymetric data for coastal and open shallow-water areas, a satellite-derived bathymetry (SDB) method using ICESat-2 data combined with multispectral imagery has been proposed [4]. Studies have shown that, with appropriate processing, spaceborne LiDAR can provide bathymetric points accurate enough to serve as SDB control points, compensating to some extent for the scarcity of field measurements, and it can overcome the problem of inaccessibility [5]. However, it is difficult to guarantee the accuracy of spaceborne LiDAR bathymetry, because the emitted pulse signals are subject to considerable interference along their long and complex propagation paths. Therefore, extracting accurate bathymetric information from ICESat-2 data is very important.
With the development of optical remote sensing-derived bathymetry, theoretical analytical models, empirical models, and semitheoretical and semiempirical models have been developed [6,7,8,9,10]. Building on the theoretical analytical model, a semitheoretical and semiempirical model uses the attenuation characteristics of radiant energy transmission in water together with a certain amount of actual bathymetric measurements as prior values [11,12,13]. This approach reduces the quantization of intermediate parameters and has been widely applied to bathymetry inversion from remote sensing images. For example, Hsu et al. measured the shallow-water bathymetry of six islands and reefs in the South China Sea (SCS) on the basis of ICESat-2 and Sentinel-2 data [4]. Zhang et al. trained four typical models using ICESat-2 bathymetry points and multispectral imagery and produced bathymetric maps of Coral Island, Ganquan Island, and Lingyang Reef in the Xisha Islands; the average RMSE of the generated SDB was 0.16 and the average R² was 0.90 [14]. Babbel et al. proposed regressing bathymetry by relating each pixel of the atmospherically corrected image data to the corresponding ICESat-2 data through a log-linear model [15]. To fuse ICESat-2 LiDAR data with Sentinel-2 optical images, an improved cloud mask was developed: in each region, the 20th percentile of the cloud-obscured Sentinel-2 reflectance data was used to derive a mosaic map. The data were correlated with relatively dark values over the entire reflectance range to ensure that relatively bright natural interference, which is common in satellite imagery of coastal areas, was reduced [16]. Chen et al. proposed a dual-band bathymetry method that requires no bathymetric control points [17]; however, this method is difficult to implement and suffers from low inversion accuracy. Ma et al. proposed a complex mathematical model for bathymetry inversion [18]. Nevertheless, most semitheoretical and semiempirical models do not consider the spatial correlation between a bathymetric point and its surrounding pixels, and the traditional linear models are too simple to be suitable for complex environments [19].
With the advancement of computer technology, neural networks have become a powerful means of accurate bathymetry inversion. Neural networks do not need to consider the physical mechanism of remote sensing-derived bathymetry; instead, they learn the statistical relationships between water depth and image pixel radiance values to build a model. Owing to their advantages in solving multivariable, nonlinear, and complex problems, neural networks have been introduced into statistical models for bathymetry inversion. For example, Ai et al. exploited the local connectivity of a convolutional neural network (CNN) and the local spatial correlation of image pixels to construct bathymetry inversion models from remote sensing images and airborne LiDAR data [20]. Sandidge et al. proposed using a backpropagation (BP) neural network model for bathymetry inversion [21]; the proposed model outperformed the traditional linear regression model. Zhou et al. combined multispectral remote sensing image data with ICESat-2 data using both a statistical model and a semiempirical, semianalytical model [22]. The results revealed that the accuracy of the statistical model was relatively high and that the combination of the extreme gradient boosting model and Sentinel-2 data was the optimal choice. To avoid the problems caused by atmospheric correction failure, the Rayleigh-corrected top-of-atmosphere reflectance (ρRC) was used as the input for bathymetry inversion via a multilayer perceptron (MLP) model [23]. A deep learning framework for nearshore bathymetry (DL-NB) was constructed on the basis of machine learning approaches; with the help of a 2D CNN, DL-NB can make full use of the initial multispectral information of Sentinel-2 data at each bathymetry point and its adjacent area during training [10]. However, the results showed that the universality of the DL-NB model was poor. Sentinel-2 image time series were synthesized and used for coastal bathymetry inversion via four empirical models [24], and the comparison confirmed that the neural network methods outperformed the traditional methods. Neural network-based bathymetric inversion methods usually rely on single-temporal images; because of the noise and reflectivity anomalies in such images, it is difficult to obtain good bathymetric inversion results from them.
To address the image data quality issue in remote sensing-derived bathymetry inversion from single-temporal images, this paper introduces an active and passive fusion method for shallow-water bathymetry using multi-temporal images. First, high-quality a priori bathymetric data are obtained from the ICESat-2 satellite. Next, Sentinel-2 images with cloud obstruction rates of less than 10% are selected, a new image is generated from the individual images via pixel median filtering and pixel-level fusion, and the BP neural network model is trained by fully utilizing the corresponding multispectral information of the Sentinel-2 data at each bathymetry point. Finally, a bathymetry map of the study area, comprising Ganquan Island, Dong Island, and Wuzhizhou Island, is generated. To evaluate the bathymetric accuracy of the proposed method, the performance of the models trained with different inversion factors is compared against field survey data.
The innovations of this paper are as follows:
- (1)
The density peak clustering (DPC) algorithm is applied for the first time to ICESat-2 seafloor photon signal extraction to provide accurate bathymetric control points for SDB.
- (2)
A bathymetric inversion method for active and passive satellite remote sensing data based on multi-temporal fusion is proposed to eliminate the effects of image noise and reflectivity anomalies on the bathymetric inversion results.
- (3)
Accurate shallow-water bathymetric inversion results can be obtained for different water depths and seafloor topographies.
The remainder of this paper is organized as follows. Section 2 describes the data used in this study, including the ICESat-2 ATL03 product and the Sentinel-2A spectral images. Section 3 presents the procedure and data processing methods used in this paper; the performance of water depth estimation in three selected study areas in the SCS is demonstrated to validate the feasibility of the proposed method. Section 4 evaluates the performance of the method in terms of topographic feature trends, accuracies at different water depths, and relative measurement errors. Section 5 discusses and analyzes the performance of the proposed method. Section 6 summarizes the experimental results and presents conclusions.
3. Methods
In this study, ICESat-2 laser altimetry data were fused with multi-temporal Sentinel-2 satellite data after median filtering, and the BP neural network method was used for bathymetry inversion. The flowchart of the proposed method is shown in Figure 2.
First, bathymetric control points were extracted from the ATL03 data of the ICESat-2 satellite [30]. For the processed Sentinel-2A images, empirical models with different inputs were then built sequentially: first, a single image was used as the input; second, median filtering (3 × 3) was applied to the pixel values of the single image; third, a new image was synthesized by stacking and fusing the median-filtered multi-temporal images, with the median filtering implemented via the "medfilt2" function in MATLAB. To eliminate the influence of adjacent land and water pixels, the normalized difference water index (NDWI) was used to mask land and extract the shallow-water areas before the median filtering was applied. Finally, the preprocessed spectral bands (band 2 (blue), band 3 (green), and band 4 (red)) were input into the BP neural network model for training. To validate the inversion accuracy and evaluate the performance of the three models trained with the different inputs, the trained models were compared and analyzed.
3.1. Extraction of Bathymetry Control Points from the ICESat-2 Satellite
In this study, bathymetric control points were extracted from the ATL03 data product of the ICESat-2 satellite through the following steps. For the point cloud data, the WGS84 ellipsoidal height must be converted to a height referenced to a mean sea level or geoid model; therefore, the ellipsoidal heights of the photons were converted to orthometric heights referenced to the EGM2008 geoid using the EGM2008 geopotential model. To address noisy data in the dataset, the improved DPC algorithm was used for signal photon extraction [31]. Because the standard DPC algorithm does not consider the distribution of the entire dataset, it cannot be applied to datasets with large differences in density. To improve its applicability, a method is proposed that calculates the data field potential energy and continuously optimizes the truncation threshold through information entropy. The potential energy formula is as follows:
\[
\varphi_i = \sum_{j=1}^{n} \exp\!\left[-\left(\frac{d_{ij}}{\sigma}\right)^{2}\right]
\]

where $d_{ij}$ represents the Euclidean distance between the sample points, $\sigma$ is the impact factor, which determines the optimal truncation distance, and $d_c$ represents the initial truncation distance from which the optimization starts. The potential energy values in dataset $X$ are $\{\varphi_1, \varphi_2, \ldots, \varphi_n\}$. The uncertainty of the dataset is described by the Gini index, and the formula is as follows:

\[
\mathrm{Gini} = 1 - \sum_{i=1}^{n} \left(\frac{\varphi_i}{Z}\right)^{2}
\]

where $Z = \sum_{i=1}^{n} \varphi_i$ is the sum of the potential energies of all the data in the dataset. As $\sigma$ varies, the Gini index initially decreases until it reaches a minimum, after which it increases rapidly and finally stabilizes. By incorporating two optimal local density threshold parameters, corresponding to the daytime and nighttime data with their different signal-to-noise ratios (SNRs), clustering is performed on the basis of the local density of each dataset.
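To make the truncation-threshold optimization concrete, the sketch below computes the data field potential energy of every photon and scans a grid of candidate impact factors, keeping the σ that minimizes the Gini index. It is a minimal Python illustration under our own function names and search grid, not the authors' implementation; the separate daytime and nighttime density thresholds are omitted.

```python
import numpy as np
from scipy.spatial.distance import cdist

def potential_energy(points, sigma):
    """Data field potential energy phi_i of each point (Gaussian kernel)."""
    d = cdist(points, points)                    # pairwise Euclidean distances d_ij
    return np.exp(-(d / sigma) ** 2).sum(axis=1)

def gini_index(phi):
    """Gini index of the potential-energy distribution (dataset uncertainty)."""
    p = phi / phi.sum()                          # phi_i / Z
    return 1.0 - np.sum(p ** 2)

def optimal_sigma(points, sigmas):
    """Return the impact factor at which the Gini index reaches its minimum."""
    ginis = [gini_index(potential_energy(points, s)) for s in sigmas]
    return sigmas[int(np.argmin(ginis))]

# Example: photons as (along-track distance, elevation) pairs
rng = np.random.default_rng(0)
photons = rng.normal(size=(500, 2))
sigma_opt = optimal_sigma(photons, np.linspace(0.05, 2.0, 40))
```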
Current ICESat-2 products do not consider refraction at the air–water interface or the corresponding change in the speed of light [29]; because photons are assumed to propagate only in air, the depth of the seabed is overestimated [32]. Therefore, refraction correction is necessary to obtain more realistic underwater terrain. The kernel density function is used to separate sea surface photons from seafloor photons on the basis of the following two rules: (1) the photon density is relatively high near the sea surface, and (2) the sea surface is higher than the seafloor. The first local minimum of the kernel density curve is used as the threshold for separating seafloor photons, and photons with elevations below the threshold are considered seafloor photons [33]. The median sea surface photon elevation is used as the mean sea level (MSL). The sea surface is treated as a horizontal surface, and the difference between the seafloor photon elevation and the MSL is calculated as the initial water depth. The refraction model is derived from the law of refraction and combined with the vertical angle information for each photon provided in the ATL03 data under /gtx/geolocation/ref_elev. Refraction correction is then performed on the seafloor photons to obtain more accurate bathymetric information [34].
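The following sketch illustrates both steps, the kernel density separation and a first-order Snell's law depth correction, under the flat-sea-surface and near-nadir assumptions stated above. The refractive index values, the function names, and the use of a Gaussian kernel density estimate are our assumptions; the paper's full refraction model follows [34].

```python
import numpy as np
from scipy.stats import gaussian_kde

N_AIR, N_WATER = 1.00029, 1.34116     # assumed refractive indices of air and seawater

def split_surface_seafloor(elev):
    """Separate sea-surface and seafloor photons via a kernel density curve."""
    grid = np.linspace(elev.min(), elev.max(), 512)
    dens = gaussian_kde(elev)(grid)
    peak = int(np.argmax(dens))                  # sea surface: highest photon density
    interior = np.arange(1, len(dens) - 1)
    is_min = (dens[interior] < dens[interior - 1]) & (dens[interior] < dens[interior + 1])
    below = interior[is_min & (interior < peak)]
    thr = grid[below[-1]]                        # first local minimum beneath the peak
    return elev[elev >= thr], elev[elev < thr]   # (surface photons, seafloor photons)

def refraction_correct(depth_raw, ref_elev):
    """First-order refraction correction of raw depths (MSL minus photon height).

    ref_elev: photon pointing-vector elevation angle in radians, as stored in
    the ATL03 group /gtx/geolocation/ref_elev.
    """
    theta_inc = np.pi / 2.0 - ref_elev           # incidence angle from the vertical
    theta_ref = np.arcsin(N_AIR / N_WATER * np.sin(theta_inc))
    return depth_raw * (N_AIR / N_WATER) * np.cos(theta_ref) / np.cos(theta_inc)
```

At near-nadir incidence, the correction reduces to multiplying the raw depth by roughly 0.75 (the ratio of the refractive indices), which is consistent with the overestimation described above.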
3.2. Sentinel-2 Data Processing
The proximity effect is one of the main satellite image distortions and is caused mainly by atmospheric scattering between the surface and the sensor; that is, a photon interacts with surface objects around the target pixel, is scattered by the atmosphere, and enters the sensor [35]. At the water–land boundary, owing to the optical complexity of seawater and disturbances from nearby land, the proximity effect increases significantly [36]; therefore, effectively removing adjacent land and water pixels is important for optical remote sensing-derived bathymetry [37]. Accordingly, the NDWI is calculated, land is masked, and deep and shallow water areas are separated by adjusting the threshold; the resulting shallow-water area is used in the subsequent studies. The spatial resolution of the red, green, and blue bands of the Sentinel-2A image is 10 m, whereas the along-track interval of the ICESat-2 data is 0.7 m; therefore, when matching the same point from the two data sources, one Sentinel-2A pixel value corresponds to multiple bathymetry points. Considering that the resolution of the remote sensing bands is 10 m, median filtering is performed on the image pixels with different window sizes. The experimental results show that with a window size of 3 × 3 pixels, the bathymetric inversion model based on the BP neural network achieves high accuracy; therefore, a window size of 3 × 3 pixels is chosen.
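Because a single 10 m Sentinel-2A pixel covers many ICESat-2 photons, the control depths have to be aggregated per pixel before model training. The sketch below averages the depths that fall into each pixel; the mean-depth aggregation rule and all names are our assumptions, since the paper does not state how the multiple points per pixel are combined.

```python
import numpy as np

def grid_depths(x, y, depth, x0, y0, pix=10.0):
    """Aggregate ~0.7 m spaced ICESat-2 depths into 10 m Sentinel-2 pixels.

    x, y: projected photon coordinates (m); (x0, y0): raster upper-left corner.
    Returns a dict mapping (row, col) to the mean depth of the photons inside.
    """
    col = ((x - x0) // pix).astype(int)
    row = ((y0 - y) // pix).astype(int)
    cells = {}
    for r, c, d in zip(row, col, depth):
        cells.setdefault((r, c), []).append(d)
    return {rc: float(np.mean(ds)) for rc, ds in cells.items()}
```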
Figure 3a shows a schematic diagram of the filtering principle.
The median filter is a nonlinear digital filtering method that is often used to remove noise from images or other signals. The basic principle of median filtering is to select a neighborhood window of a pixel in the image, sort the values of all the signal points in that window by size, and replace the value of the current pixel with the median, which depends on the size of the filter window. The goal of median filtering is to make the pixel value closer to the true value. The median filtering formula is as follows:
\[
g(x, y) = \operatorname*{median}_{(m, n) \in S_{xy}} \{ f(m, n) \}
\]

where $f(m,n)$ represents the original image, $g(x,y)$ represents the processed image, and $m$ and $n$ are the horizontal and vertical coordinates of the points in the neighborhood $S_{xy}$ of the point $(x, y)$.
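As a concrete counterpart to the masking and filtering steps, the hedged Python sketch below applies a McFeeters NDWI land mask (green and NIR bands) and then the 3 × 3 median filter to one band. SciPy's median_filter plays the role of MATLAB's medfilt2, and the NDWI threshold of 0 is an assumed starting value, since the paper adjusts the threshold per scene.

```python
import numpy as np
from scipy.ndimage import median_filter

def ndwi_mask(green, nir, threshold=0.0):
    """McFeeters NDWI water mask (Sentinel-2 bands 3 and 8); threshold assumed."""
    ndwi = (green - nir) / (green + nir + 1e-12)  # small term avoids divide-by-zero
    return ndwi > threshold                       # True for water pixels

def filter_band(band, water):
    """3 x 3 median filter on one reflectance band, keeping only water pixels."""
    filtered = median_filter(band, size=3)        # counterpart of MATLAB's medfilt2
    return np.where(water, filtered, np.nan)      # masked land left undefined
```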
The Sentinel-2 images used in this paper were all downloaded from the European Space Agency (ESA) and are Level-1C products. After geometric correction and radiometric correction to apparent top-of-atmosphere reflectance, each Level-1C product consists of a 100 km × 100 km orthorectified image (UTM/WGS84). The map coordinates of the image are corrected via a digital elevation model (DEM), and land, water, and cloud mask data are included. The Level-2A product is obtained by processing the Level-1C product with the Sen2Cor processor. Previous studies have shown that Sentinel-2A data with a longer time span may exhibit higher synthesis quality [38]. The ICESat-2 satellite was launched in 2018. To avoid the influence of terrain changes caused by a long time span between the image data and the ICESat-2 acquisition time, the three most recent Sentinel-2A images with a cloud pixel percentage of less than 10%, selected with the ICESat-2 data acquisition time as the reference, were stacked together. The median-filtered multi-temporal images were then median-fused to generate a new high-quality image for subsequent bathymetric inversion. As shown in Figure 3b, this method can effectively mask the errors caused by the atmosphere or water quality in each individual satellite image.
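A minimal sketch of the multi-temporal median fusion, assuming three co-registered, median-filtered scenes whose cloudy or masked pixels have already been set to NaN:

```python
import numpy as np

def median_fuse(images):
    """Pixel-wise median of the median-filtered scenes from the three dates."""
    stack = np.stack(images, axis=0)        # (T, H, W) temporal stack
    return np.nanmedian(stack, axis=0)      # median ignores NaN-masked pixels
```

Taking the temporal median per pixel suppresses values that are anomalous on only one date (thin cloud, sun glint, reflectivity artifacts) while preserving the stable water-leaving signal.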
3.3. Construction of the BP Neural Network Model
Machine learning methods can effectively learn and express the complex mapping between different variables. The BP neural network is the most studied and widely used feedforward neural network [38]. The classic BP neural network is composed of three parts, namely the input layer, the hidden layer, and the output layer; its structure is shown in Figure 4. In this paper, the input, hidden, and output layers contain 3 neurons, 10 neurons, and 1 neuron, respectively. Each neuron in the input layer receives information and transmits it to the intermediate neurons, whose function is information processing. Training is completed through two processes, forward propagation and backward feedback: by adjusting the connections of internal nodes, information processing is refined until the predicted result converges to the actual result. When the target output is known, the difference between the actual output of the neural network and the known true value is usually used as the loss function; the smaller this difference is, the better the performance of the neural network. The mean square error (MSE) is often used as the loss function, and its formula is as follows:
\[
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} \left( \hat{y}_{ij} - y_{ij} \right)^{2}
\]

where $N$ represents the number of training samples, $M$ is the number of outputs, $\hat{y}_{ij}$ is the predicted value of the $j$-th output of the $i$-th sample, and $y_{ij}$ represents the corresponding true value.
The surface reflectance values of the red, green, and blue bands from the joint training set, together with the corresponding water depths, were used to build the model: the ICESat-2 bathymetric data and the image pixel values at the same points were used for training, establishing a regression relationship between the bottom-of-atmosphere reflectance of the Sentinel-2A image and the water depth. Of the samples, 80% were selected to train the model, and the remaining 20% were used as the validation set to evaluate the model accuracy. The network converged after 1000 training epochs. The water depth of the corresponding sampling point was used as the output-layer variable to continuously optimize the model. A nonlinear transfer function was used in the hidden layer, and the purelin activation function was used in the output layer.
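Since the network itself is implemented in MATLAB, the following scikit-learn sketch is only an analogue of the 3-10-1 architecture described above: a tanh hidden layer stands in for the unspecified nonlinear MATLAB transfer function, and MLPRegressor's fixed identity output mirrors purelin. The input file names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# X: (n_points, 3) blue/green/red reflectance at the control points;
# y: (n_points,) refraction-corrected ICESat-2 depths. File names hypothetical.
X = np.load("reflectance_rgb.npy")
y = np.load("icesat2_depths.npy")

# 80/20 split between training and validation, as described in the text
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=42)

# 3-10-1 network: 3 inputs, one hidden layer of 10 neurons, 1 depth output
model = MLPRegressor(hidden_layer_sizes=(10,), activation="tanh",
                     max_iter=1000, random_state=42)
model.fit(X_tr, y_tr)
depth_pred = model.predict(X_va)
```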
3.4. Quantitative Evaluation Indicators
To quantify the bias or difference between the estimated and measured water depths, the coefficient of determination (R²) between the two groups of data is used to evaluate their correlation, and the mean absolute error (MAE) and the RMSE between the obtained and measured water depths are used to evaluate the obtained depths. When the MAE cannot be used to compare the reliability of different measured depths, the relative difference (RD) of the statistical data better reflects the reliability of the estimated water depth. The formula for each indicator is as follows:

\[
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| Z_i - Z_i^{*} \right|
\]

\[
\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( Z_i - Z_i^{*} \right)^{2} }
\]

\[
R^{2} = 1 - \frac{ \sum_{i=1}^{N} \left( Z_i - Z_i^{*} \right)^{2} }{ \sum_{i=1}^{N} \left( Z_i^{*} - \bar{Z}^{*} \right)^{2} }
\]

\[
\mathrm{RD} = \frac{1}{N} \sum_{i=1}^{N} \frac{ \left| Z_i - Z_i^{*} \right| }{ Z_i^{*} }
\]

where $N$ represents the total number of pairs included in the bathymetric accuracy assessment, $Z_i$ represents the obtained water depth, $Z_i^{*}$ represents the collected actual bathymetric data, and $\bar{Z}^{*}$ represents the mean of the true values.
The MAE visually reflects the absolute error between the obtained and measured water depths, and the RMSE reflects the fluctuation in the bathymetric measurement error; the smaller the MAE and RMSE values are, the greater the measurement accuracy. The R² reflects the correlation between the experimentally measured water depth and the actual water depth; the closer the R² value is to 1, the greater the goodness of fit. The RD reflects the ratio of the absolute error to the true value.
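The four indicators transcribe directly into Python (the function name is ours); the RD term assumes the measured depths are positive so that the denominator is well defined:

```python
import numpy as np

def accuracy_metrics(z_est, z_true):
    """MAE, RMSE, R2, and RD between estimated and measured water depths."""
    err = z_est - z_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((z_true - np.mean(z_true)) ** 2)
    rd = np.mean(np.abs(err) / z_true)   # assumes positive measured depths
    return mae, rmse, r2, rd
```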
5. Discussion
Figure 14a–c shows the original images of Ganquan Island, Dong Island, and Wuzhizhou Island, respectively. Within the red rectangular area of Figure 14a, there is thin cloud cover; in the green rectangular region of Figure 14b, there is an image reflectivity anomaly; and Figure 14c shows better image quality than Figure 14a,b.

Figure 15(a1–a3) shows the bathymetric inversion results for the red rectangular area in Figure 14a obtained with the three algorithms. Obvious anomalies appear in the bathymetry inversion results based on the single-temporal image because of the thin cloud cover. The median filter can effectively alleviate the influence of thin cloud cover on the bathymetric inversion, and multi-temporal image fusion can further eliminate it. Figure 15(b1–b3) shows the bathymetric inversion results for the green rectangular area in Figure 14b obtained with the three algorithms. Because of the image reflectivity anomaly, the bathymetric inversion results based on the single-temporal image contain obvious errors. The median filter can alleviate the effect of image reflectivity anomalies on the bathymetric inversion, and multi-temporal fusion can further reduce it.
As shown in Table 2, for Ganquan Island, the RMSE of median filtering and multi-temporal fusion is improved by 0.65 m and 1.17 m, respectively, compared with the single-temporal image. In Table 3, for Dong Island, the corresponding RMSE improvements are 0.4 m and 0.57 m, and in Table 4, for Wuzhizhou Island, they are 0.4 m and 0.36 m. This analysis shows that when the image contains thin cloud contamination or reflectivity anomalies, the bathymetric inversion results of multi-temporal fusion are significantly better than those of the other two methods. When the original image quality is high, multi-temporal fusion and median filtering achieve bathymetric inversion results of similar accuracy. Overall, among the three algorithms, multi-temporal fusion yields the optimal bathymetric inversion results for the different types of images.
6. Conclusions
In summary, to address the limitations of bathymetric inversion from single-temporal images, this paper uses Sentinel-2A image data and ICESat-2 satellite data from typical islands to construct a BP neural network model based on a machine learning algorithm. A method combining median filtering and image fusion is proposed, and a bathymetric inversion study is conducted on Ganquan Island, Dong Island, and Wuzhizhou Island. The different bathymetric inversion maps show that the proposed method yields richer details and better continuity of topographic change. To evaluate the performance of the different processing methods, the inversion results are validated against field survey data. The Sentinel-2A data used in this study consist mainly of bottom-of-atmosphere reflectance, so no additional atmospheric correction step is needed. The bathymetry points extracted from the ICESat-2 LiDAR serve as a priori control points, overcoming the difficulty of obtaining field data. The results show that the machine learning inversion method that combines the processed images with ICESat-2 data as prior depth information can achieve higher accuracy; the RMSEs of the three study areas are 1.31 m, 1.82 m, and 1.41 m.
Specifically, the results of this study show that traditional machine learning methods provide reliable water depth estimation and fit the nonlinear relationship well. The improvement in inversion accuracy varies across areas, indicating that differences in water quality and environmental complexity affect the universality of the method. A comparison across water depths reveals that the inversion accuracy for areas shallower than 10 m is better than that for areas deeper than 10 m. In the future, the processed image data could be divided into different parts according to water depth, with each part modeled independently. In this study, ICESat-2 data and Sentinel-2A remote sensing images are used to obtain relatively accurate large-scale shallow-water bathymetry and seafloor topography, providing a simple and reliable method for analyzing areas without field surveys or airborne LiDAR surveys. In the next step, the method introduced in this paper will be applied to different islands, reefs, and shallow waters for validation.