1. Introduction
Bathymetry mapping, the measurement and study of underwater depth, is critical for various applications, such as marine navigation, coastal management, the monitoring of environmental and aquatic resources, and hydrographic scanning [
1,
2]. Field water depth collection and further data processing have recently been carried out using the common ship-, air-, and space-borne approaches [
3]. Ship-borne surveys use single- and multi-beam echo sounding systems [
3,
4] to gather accurate and timely water depth datasets. Multi-beam-based equipment transmits multiple, simultaneous sonar beams to collect depth data across a wide scope and in different directions, while a single-beam sonar provides the measured depth at points along the scanning line [
5]. Despite the high accuracy of field measurements, these approaches are costly to operate and time-consuming in field collection [
3,
6], leading to a gap in bathymetry map data in several regions [
1,
7]. The remote estimation of biological and physical parameters using satellite images has become essential to a variety of research domains in recent decades [
8]. This approach is cost-effective compared to other survey techniques, is a well-developed sensing technology, is easy to integrate with artificial intelligence (AI) models, and is accurate in thematic mapping [
9]. More importantly, remote sensing-based mapping requires only a limited number of field data points to train and validate the retrieval models, which confers great advantages, such as long-term and wide geographical observation, very low cost, reliability, and flexibility in retrieval computation. Hence, remote sensing-based approaches are becoming more popular for bathymetry mapping, with a special focus on air-borne (e.g., UAV imagery [
10], air-borne LiDAR [
11]) and space-borne datasets, such as LiDAR data (e.g., ICESat-2 [
12]) and satellite images (e.g., Landsat, Sentinel, WorldView [
13,
14,
15,
16], Pléiades [
17], SPOT [
18], and Planet [
19]). Of the satellite sensors in operation, Landsat is one of the most commonly used data sources for bathymetry mapping, with varying levels of success and certainty. The Landsat program has been operating since 1972 and provides imagery at a spatial resolution of 30 m with an 8-day temporal coverage [
20,
21], which has increased the number of available Landsat images worldwide and has made it a valuable data source for long-term temporal mapping projects. Landsat 9 inherits the successful design of Landsat 8 with a significant improvement in the radiometric resolution of the OLI-2 (14 bits compared to the 12 bits of Landsat 8) and in stray light reduction, which enables the detection of a larger number of shades and more accurate atmospheric correction [
21]. Despite this, we have observed a very limited number of studies [
22] that leverage the state-of-the-art Landsat 9 for water depth estimation.
For other sensors in the Landsat family, retrieval models for images of shallow and clear coastal and oceanic regions have been developed using traditional linear band ratio approaches [
19,
23,
24], while other studies combined band ratios with machine learning (ML) models, a modern and advanced approach for non-linear data learning and over/underfit avoidance [
25], using Support Vector Machine (SVM) [
26], Neural Network (NN), Random Forest (RF), Extreme Gradient Boost (XGB), and deep learning Convolutional Neural Network (CNN) [
27]. Given the optical properties of clear coastal sites, the accuracy (R²) was observed to range between 0.85 and 0.95. Fewer studies were found for bathymetry mapping in turbid water (i.e., rivers and lagoons) using Landsat imagery. We found an optimal band ratio approach coupled with Landsat 9 [
22] and a fused model of Adaboost and XGB (Adaboost-XGB) applied to Landsat 8 [
28] to derive depth maps in turbid water; both improved model confidence, though to different degrees (R² = 0.86 and 0.97, respectively). Liang et al. 2024 [
28] implemented the fused Adaboost-XGB in a mixed area of clear and turbid water, while Niroumand-Jadidi et al. 2021 [
22] deployed the retrieval model in different turbidity conditions but with a large variation in the coefficient of determination (0.44–0.86), leaving uncertainty in estimated depth in shallow and turbid waters. This variation in accuracy may be attributed to the limited number of input features (i.e., only original bands and band ratios) used in the published models. Popular image feature extraction techniques [29], such as gray scale morphological operation (GSMO) and morphological multi-scale decomposition (MMSD), have not yet been used to improve accuracy in multi-dimensional estimation models.
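The band ratio approaches mentioned above can be illustrated with a minimal sketch of a log-ratio depth model in the style of the Stumpf model. The reflectance values, the calibration depths, the scaling constant `n`, and the function names are all hypothetical and chosen only for illustration; they are not taken from this study.

```python
import math

def log_ratio(blue, green, n=1000.0):
    """Ratio of log-transformed reflectances; n is an assumed constant that keeps both logs positive."""
    return math.log(n * blue) / math.log(n * green)

def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical calibration points: (blue reflectance, green reflectance, surveyed depth in m)
samples = [
    (0.050, 0.060, 0.8),
    (0.048, 0.052, 1.5),
    (0.045, 0.044, 2.3),
    (0.043, 0.038, 3.1),
]

ratios = [log_ratio(b, g) for b, g, _ in samples]
depths = [d for _, _, d in samples]
slope, intercept = fit_line(ratios, depths)

def predict_depth(blue, green):
    """Depth estimate from the fitted linear relation between log-ratio and depth."""
    return slope * log_ratio(blue, green) + intercept

for b, g, d in samples:
    print(f"observed {d:.2f} m, estimated {predict_depth(b, g):.2f} m")
```

A model of this kind has only two tunable coefficients, which is one reason it may struggle when turbidity rather than depth dominates the reflectance signal.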
Moreover, existing frameworks for bathymetry retrieval have assumed similar contributions from the input bands and have applied regression with little attention to feature selection, so important variables may be overlooked and potential gains in retrieval accuracy lost. Various techniques have been adapted for optimization problems in other domains (e.g., meta-heuristic optimization using nature-inspired algorithms) [
30,
31,
32]; however, we found no studies that applied these methods to optimize the input features for bathymetry mapping.
Given the gaps identified in the literature, this study aims to develop a general but advanced remote sensing-based method for bathymetry mapping in shallow and turbid water. We developed a novel approach that uses GSMO and MMSD feature extraction to create diverse input variables from Landsat 9 imagery, coupled with meta-heuristic-based feature selection to identify the most important features for the retrieval models. We compared the performance of a wide range of ML models, including Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boost (XGB), CatBoost (CB), Light Gradient Boosting Machine (LGBM), and the new candidate KTBoost (KTB), for water depth estimation in a turbid lagoon (Sam Chuon-Ha Trung lagoon, Vietnam) from 53 variables derived from original bands, band ratios, and GSMO and MMSD image extraction. The best model was then combined with state-of-the-art meta-heuristic algorithms (i.e., Dragonfly (DF), Particle Swarm Optimization (PSO), and Grey Wolf Optimization (GWO)) to improve the certainty of depth retrieval at the study site. The objectives of this study were to (i) validate the performance of Landsat 9 imagery; (ii) compare the estimation capability of different ML models; and (iii) examine the feature selection ability of meta-heuristic algorithms, in order to develop an efficient workflow for water depth estimation that is comprehensive, reliable, and scalable for bathymetry mapping globally.
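To give a sense of what gray scale morphological features look like, the following is a minimal sketch of erosion, dilation, opening, closing, and a white top-hat on a small single-band array. The exact GSMO and MMSD configuration used in the study is not given in this section, so the 3 x 3 window, the toy `band` array, and all function names are assumptions made for illustration.

```python
def neighbourhood(img, r, c, size):
    """Pixel values in the size x size window centred on (r, c), clipped at the image borders."""
    half = size // 2
    return [img[i][j]
            for i in range(max(0, r - half), min(len(img), r + half + 1))
            for j in range(max(0, c - half), min(len(img[0]), c + half + 1))]

def erode(img, size=3):
    """Gray scale erosion: each pixel becomes the minimum of its window."""
    return [[min(neighbourhood(img, r, c, size)) for c in range(len(img[0]))]
            for r in range(len(img))]

def dilate(img, size=3):
    """Gray scale dilation: each pixel becomes the maximum of its window."""
    return [[max(neighbourhood(img, r, c, size)) for c in range(len(img[0]))]
            for r in range(len(img))]

def opening(img, size=3):
    """Erosion followed by dilation; suppresses bright details smaller than the window."""
    return dilate(erode(img, size), size)

def closing(img, size=3):
    """Dilation followed by erosion; suppresses dark details smaller than the window."""
    return erode(dilate(img, size), size)

def white_top_hat(img, size=3):
    """Original minus opening; keeps only the small bright details."""
    opened = opening(img, size)
    return [[img[r][c] - opened[r][c] for c in range(len(img[0]))]
            for r in range(len(img))]

# Hypothetical 5 x 5 band with one isolated bright pixel
band = [
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 90, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
]

print(opening(band)[2][2])        # bright pixel removed by the opening
print(white_top_hat(band)[2][2])  # bright pixel isolated by the top-hat
```

Each operator output can be stacked as an additional input layer alongside the original bands and band ratios, and repeating the operators with larger windows would give features at several spatial scales.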
4. Discussion
This study is the first to integrate these diverse approaches for bathymetry mapping in coastal turbid water. Data extracted from the latest-generation Landsat 9 imagery were combined with state-of-the-art ML models and meta-heuristic algorithms to derive water depth with an R² of 0.908 and an RMSE of 0.31 m using a fused model of LGBM and PSO (LGBM-PSO). Of the selected ML models, the boosting algorithms achieved superior performance compared to bagging and the hyperplane-based SVM. LGBM achieved the highest confidence in depth estimation (R² = 0.88, RMSE = 0.35 m), followed by CB (R² = 0.86) and XGB (R² = 0.85). KTB, a new boosting ML model introduced in 2021, showed high potential for learning and predicting complex data, with similar performance to the CB model (R² = 0.86). Boosting has also been found to outperform the bagging and SVM groups in the retrieval of both classification and bio-physical parameters [
27,
50,
51,
61], owing to advances in algorithm structure and in the way the final model reaches its decision [
62,
63].
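For readers who want to reproduce the reported metrics, the sketch below computes RMSE and R² and checks the relative RMSE gain from 0.35 m to 0.31 m. It assumes R² is the coefficient of determination (one minus the ratio of residual to total sum of squares); the observed and predicted depths are hypothetical values, not data from this study.

```python
import math

def rmse(observed, predicted):
    """Root mean square error, in the same unit as the inputs."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def relative_improvement(baseline, improved):
    """Fractional reduction of an error metric relative to its baseline value."""
    return (baseline - improved) / baseline

# Hypothetical depths in metres
observed = [0.5, 1.0, 1.5, 2.0, 2.5]
predicted = [0.6, 0.9, 1.6, 1.9, 2.6]

print(f"RMSE = {rmse(observed, predicted):.3f} m")
print(f"R2   = {r_squared(observed, predicted):.3f}")

# RMSE values reported in the text: LGBM (0.35 m) and LGBM-PSO (0.31 m)
print(f"RMSE reduction = {relative_improvement(0.35, 0.31):.1%}")
```

The last line gives a reduction of roughly 11%, which matches the improvement attributed to PSO in the discussion.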
Following the proposed workflow, a significant improvement was gained by using both feature extraction and feature selection. Feature importance analysis (Figure 8) indicated that band ratios made up the largest share of contributing variables (46%), followed by the image extraction (44%) and original band (10%) groups, and the accumulated scores derived from LGBM-PSO were higher than those of the other variables. Of the 44% contribution from image extraction, 27% and 17% were derived from the GSMO and MMSD variables, respectively. Only three original bands were selected, at wavelengths of 561, 613, and 654 nm, while a larger contribution in the range of 443–654 nm was found for the band ratio and image extraction groups, implying the importance and necessity of feature extraction during the learning process of ML models. The working range of Landsat bands (443–654 nm) is also consistent with the model performances reported in other studies [
23,
24,
27,
64]. The successful retrieval by LGBM was further supported by feature selection using the DF, PSO, and GWO meta-heuristic algorithms. PSO produced the largest improvement in RMSE (~11%), while DF and GWO improved the RMSE by approximately 3% compared to the original LGBM. We do not claim that PSO outperforms DF and GWO in general, because algorithm performance varies with study site and dataset [
50,
65,
66]. Rather, we encourage the integration of feature selection using nature-inspired algorithms when building retrieval frameworks, not only for bathymetry mapping but also for other studies with great potential for accuracy improvement. However, the better convergence of PSO in our study may be attributed to the velocity adjustment of particle trajectories (i.e., the acceleration coefficients) based on personal and global best solutions, which enables an efficient exploration of the search space and convergence to optimal solutions. In the multi-dimensional search space, the inertia weight controls the particle velocities to balance exploration (searching for new potential regions) with exploitation (tuning the current search area). The bird flock structure of PSO updates the next potential position of each particle using not only its own experience but also the diverse experiences of other particles in the swarm, which increases the speed of convergence and the handling of noisy datasets [
67,
68].
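The velocity and position update described above can be sketched as a textbook PSO loop. The symbols `w` (inertia weight), `c1` and `c2` (acceleration coefficients), their values, the search bounds, and the sphere test function are conventional choices assumed here for illustration; they are not the settings used in this study.

```python
import random

def pso_minimize(objective, dim, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimise `objective` over a dim-dimensional box with a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # global best position and value

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia term + pull towards personal best + pull towards global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # move the particle and keep it inside the search bounds
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Simple convex test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_pos, best_val = pso_minimize(sphere, dim=3)
print(best_pos, best_val)
```

For feature selection, one common adaptation is to threshold each particle position into a binary mask over the candidate variables and to use a cross-validated error of the retrieval model as the objective; whether the study used this exact encoding is not stated in this section.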
Using the turbid water in Sam Chuon-Ha Trung lagoon as an example, our results derived from Landsat 9 imagery are more promising and have higher accuracy compared to similar studies in clear water and in rivers. Most of the published studies involving Landsat images use simple approaches of either original bands or the band ratio coupled with linear models (i.e., Stumpf model [
24], Generalized additive model (GAM) [
23], Lyzenga optical model [
19]) or common ML models [
27]. While this approach showed promising confidence in water depth estimation in clear coastal and ocean waters, there was a great variation in accuracy in inland turbid waters (e.g., river [
22], estuary [
28]).
In addition to the reliability and consistency of the proposed methods, we emphasize the use of open-source remotely sensed data (i.e., Landsat imagery), open-source algorithms (e.g., GSMO and MMSD feature extraction, ML, and meta-heuristic optimization), and the open Python programming environment, which enable our study to be replicated and the proposed framework to be extended to diverse environments at different scales. This is another valuable contribution of the current study to the research community worldwide for both bathymetry mapping and the estimation of other parameters.
Despite reliable and promising results, this study has some unavoidable limitations. Due to high cloud coverage, the number of available multi-spectral satellite images is reduced to a few scenes per year, which makes it challenging to match field survey data with satellite image acquisition. In addition, private fish traps along the lagoon partly prevented full observation of water depth during the field survey. We consulted many local fishermen to obtain water depth information for these private areas. Ongoing studies are expanding the proposed methods to other turbid water bodies in river, estuary, and coastal regions. Drone images will be validated for bathymetry mapping together with other multi-spectral sensors and will be coupled with a deep learning model. Additional feature extraction techniques will be integrated with diverse meta-heuristic feature selection models (e.g., Harris Hawks Optimization, Genetic Algorithm) to improve the confidence of water depth estimation in the future.
5. Conclusions
We present an innovative approach for bathymetry mapping in turbid water using Landsat 9 imagery, leveraging state-of-the-art feature extraction techniques, such as GSMO and MMSD, alongside feature selection through DF, PSO, and GWO-based meta-heuristic optimization, coupled with ML-based learning employing RF, SVM, XGB, CB, LGBM, and KTB models. Among these, LGBM demonstrated superior performance in estimating water depth using all derived features (R² = 0.88, RMSE = 0.35 m). This model was further enhanced by integrating PSO for feature selection in the second phase of prediction, resulting in the highest accuracy for bathymetry mapping (R² = 0.908, RMSE = 0.31 m).
Of the 30 selected variables within the spectral range of 443–654 nm, band ratios accounted for 46%, while image extraction features (GSMO, MMSD) accounted for 44% of the variables selected in the LGBM-PSO model. Feature importance analysis revealed that image extraction had the highest accumulated contribution score (10.68), followed by band ratios (8.96) and single-band groups (1.16).
The DF, PSO, and GWO algorithms exhibit significant potential in selecting the optimal combination of input variables for ML models, facilitating the derivation of accurate bathymetry maps across various turbidity conditions. This study underscores the superior learning capabilities of boosting compared to bagging and SVM techniques, with LGBM and KTB models showing promise for further deployment in water depth estimation and bio-physical parameter retrieval across different regions of the world.