An Optical Water Type-Based Deep Learning Framework for Enhanced Turbidity Estimation in Inland Waters from Sentinel-2 Imagery

Ma, Yue; Chen, Qiuyue; Song, Kaishan; Yang, Qian; Zheng, Qiang; Ma, Yongchao

doi:10.3390/s25206483


Yue Ma ¹,*, Qiuyue Chen ¹, Kaishan Song ², Qian Yang ², Qiang Zheng ¹ and Yongchao Ma ¹

¹ School of Geomatics and Prospecting Engineering, Jilin Jianzhu University, Changchun 130118, China

² State Key Laboratory of Black Soils Conservation and Utilization, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China

* Author to whom correspondence should be addressed.

Sensors 2025, 25(20), 6483; https://doi.org/10.3390/s25206483

Submission received: 5 September 2025 / Revised: 17 October 2025 / Accepted: 17 October 2025 / Published: 20 October 2025

(This article belongs to the Special Issue Remote Sensing Image Processing, Analysis and Application)



Highlights

What are the main findings?

A novel framework that integrates the fuzzy c-means method with a weighted blending CNN-RF deep learning model for accurate turbidity estimation based on optical water types was effectively implemented and validated using Sentinel-2 data.
The OWT-based deep learning model produces accurate, robust, and generalizable turbidity predictions, and its retrievals capture the continuous spatial distribution of turbidity in inland waters.

What is the implication of the main finding?

This study provides a practical and accurate method that facilitates the application of deep learning models based on the optical classification of inland waters to turbidity estimation.
The validated framework and methods enhance the operational capabilities of remote sensing for water quality monitoring and provide algorithmic support for a more comprehensive understanding of aquatic environmental conditions and ecosystem dynamics.

Abstract

Turbidity is a crucial and reliable indicator that is extensively used in remote-sensing-based water quality monitoring, so accurate and widely applicable turbidity estimation models are essential. While many existing studies rely on uniform models based on statistical regression or traditional machine learning, the application of deep learning models to turbidity estimation remains limited. This study proposes deep learning models for turbidity estimation based on the optical classification of inland waters using Sentinel-2 data. Specifically, the fuzzy c-means (FCM) clustering method was employed to classify optical water types (OWTs) based on their spectral reflectance characteristics. Turbidity was then predicted by OWT-specific convolutional neural network-random forest (CNN-RF) models, and the final estimate was the weighted sum of these predictions, with weights derived from the FCM membership degrees. The proposed method was used to map turbidity in four typical water bodies from Sentinel-2 images. The FCM method efficiently classified the waters into three OWTs. The OWT-based weighted CNN-RF model demonstrated strong robustness and generalization performance, achieving high prediction accuracy (R² = 0.900, RMSE = 11.698 NTU). The turbidity maps preserved the spatial continuity of the turbidity distribution and accurately reflected water quality conditions. These findings facilitate the application of optical-classification-based deep learning models to turbidity estimation and enhance the capabilities of remote sensing for water quality monitoring.

Keywords:

turbidity estimation; optical water types; deep learning model; Sentinel-2 imagery

1. Introduction

Lakes and reservoirs, as important inland water resources, significantly influence ecological balance, socioeconomic development, and human well-being [1,2,3]. They absorb runoff, regulate regional climates, and provide freshwater resources for drinking, agricultural irrigation, and industrial production [4]. Additionally, these water bodies are essential for hydropower generation, flood control, aquaculture, and recreational activities [5]. Lakes and reservoirs are critical for maintaining ecological and socioeconomic sustainability [6]. However, due to increasing anthropogenic activities and the impacts of climate change, these water bodies have experienced issues such as area reduction and water quality deterioration, which threaten aquatic ecosystems [7]. Regular water quality monitoring is essential for understanding environmental changes and supporting the formulation of effective strategies for aquatic ecosystem protection. Turbidity, which measures the degree to which suspended particles in water scatter light, has become a widely used indicator of water quality conditions. It offers a simple, economical, and practical alternative to direct measurements of suspended sediment [8,9,10]. Turbidity is closely related to optically active substances in water, such as phytoplankton, non-algal particles, and colored dissolved organic matter [11,12,13]. Elevated levels of suspended particulate matter or chlorophyll-a concentrations can restrict light penetration, increase turbidity, and signal a decline in water quality. Thus, effectively and accurately quantifying water turbidity is a necessary and fundamental task for enhancing the understanding of water quality in lakes and reservoirs.

Remote sensing technology has been regarded as an advanced tool for monitoring water turbidity since 1980, when Moore evaluated the feasibility of using satellite data for this purpose [14,15]. Numerous satellites have since been launched, and their data have been successfully applied to mapping turbidity in inland waters [2,16,17]. For instance, satellite instruments such as the Geostationary Ocean Color Imager (GOCI), Moderate Resolution Imaging Spectroradiometer (MODIS), Ocean and Land Colour Instrument (OLCI), and Medium Resolution Imaging Spectrometer (MERIS) acquire turbidity-related optical measurements and generate medium-resolution images with pixel sizes exceeding 100 m [6,18,19,20,21]. However, these datasets are limited in spatial resolution for monitoring smaller water bodies and in spectral response for estimating water turbidity. The Landsat series, deployed relatively early, provides data with improved spatial and spectral resolution and has enabled long-term monitoring of water turbidity since the 1970s. Data from the MultiSpectral Instrument (MSI) aboard the Sentinel-2 satellites offer higher spatial resolution (up to 10 m) and more sensitive spectral bands in the visible and near-infrared ranges, making them more suitable than other sensors for estimating turbidity across a wide range of water bodies, including smaller ones. Sentinel-2 data also provide high temporal resolution (revisits every 2–5 days with Sentinel-2A and 2B), which allows closer temporal alignment with in situ measurements [14]. Moreover, non-satellite remote sensing data obtained from near-surface platforms have also been used for turbidity estimation.
Unmanned aerial vehicles (UAVs) equipped with multispectral cameras or hyperspectral imagers are a convenient and effective source of more detailed image data for turbidity monitoring [22]. Portable spectrometers (e.g., the Analytical Spectral Devices (ASD) spectrometer) can be flexibly used on board to measure water-surface reflectance at the highest spectral resolution [23]. However, airborne and ground measurements are more costly and have limited capacity to provide continuous spatiotemporal maps of water turbidity across large areas [24]. Consequently, generating more accurate turbidity products from Sentinel-2 reflectance products is better suited to, and holds significant potential for, operational and routine water quality monitoring.

Developing accurate, robust, and practical models for turbidity estimation is a foundational task in remote sensing. Empirical models based on statistical regression are commonly developed to relate water turbidity to sensitive spectral reflectance derived from various satellite data [25]. Although empirical models are relatively simple to develop, they often lack robustness and generalization capability. Semi-analytical models, which are based on the inherent optical properties of water, require more complex computational procedures and are more challenging to implement. Compared with these traditional methods, data-driven machine learning has become a more promising approach for remotely estimating water turbidity. Machine learning methods offer significant advantages in data mining and perform well in modeling the nonlinear relationships between turbidity and spectral reflectance [19,26]. Several machine learning methods, including random forest regression (RFR), extreme gradient boosting (XGBoost), support vector regression (SVR), and back-propagation neural networks (BP), have been successfully applied to turbidity prediction in inland waters. The rapid advancement of deep learning has further enabled the development of advanced models for water quality assessment [27,28,29,30,31]. Compared with traditional machine learning models, deep learning algorithms build deep network architectures with multiple hidden layers that progressively extract complex data relationships through iterative training, thereby significantly enhancing prediction accuracy [32]. Several researchers have applied deep learning algorithms, such as convolutional neural networks (CNNs) and deep neural networks (DNNs), to accurately estimate water quality indicators for aquatic ecological assessment [33,34,35].

However, challenges persist in constructing deep learning models for water turbidity estimation, as predictive performance is often affected by uncertainties arising from limited model generalization and dataset quality. In particular, lakes and reservoirs are optically complex Case II waters, which makes it difficult to develop a universal learning model from samples collected in diverse water bodies. This difficulty stems primarily from the complex interplay of optical signals from chlorophyll, suspended particulate matter (SPM), and colored dissolved organic matter (CDOM) in these waters [36,37,38,39]. To address this optical heterogeneity, several studies have shown that waters can be classified into distinct optical water types (OWTs) based on observed spectra before model construction [40]. Because model performance is closely related to the optical properties of waters, classifying waters and developing a specific estimation model for each OWT can enhance accuracy and yield more reliable estimates [14,41]. In general, optical water classification methods include clustering methods based on the magnitude and shape of remote sensing reflectance, and bio-optical threshold methods based on the inherent optical properties (IOPs) of water (i.e., absorption and scattering) [11,42]. Among them, clustering methods based on automated unsupervised techniques (e.g., hierarchical clustering, k-means clustering, and fuzzy c-means clustering) are widely employed [43,44,45,46], as are empirical threshold partitioning methods applied to specific spectral features [47].

Although these optical water classification methods have proven efficient for estimating chlorophyll or total suspended matter (TSM), their applicability to capturing the characteristic Sentinel-2 spectra of different water bodies for turbidity estimation remains poorly understood. Moreover, most studies rely on uniform estimation models developed with statistical regression or traditional machine learning, and deep learning approaches to water turbidity estimation remain largely unexplored. Therefore, to improve turbidity estimation for water quality monitoring in inland waters, this study proposes deep learning models for turbidity estimation based on optical water types using Sentinel-2 data. The specific research objectives are as follows: (1) to classify water types for turbidity estimation based on the spectral reflectance characteristics of Sentinel-2 data; (2) to establish deep learning methods for estimating water turbidity for specific optical water types; and (3) to apply the established models to retrieve water turbidity from Sentinel-2 imagery across several representative lakes and reservoirs.

The rest of the paper is structured as follows: Section 2 outlines the geographical conditions of the study area, the data resources and processing procedures, and the basic theory of the algorithms used within this study. Specifically, the key algorithms include the optical water classification method, the proposed turbidity estimation method, and the evaluation criteria of model performance. The results are presented and discussed in Section 3 and Section 4. Finally, Section 5 summarizes the main conclusions.

2. Materials and Methods

2.1. Study Area

The study area is situated in Northeast China (Longitude: 118°53′ E–135°05′ E, Latitude: 38°43′ N–53°33′ N), encompassing Liaoning Province, Jilin Province, Heilongjiang Province, and the eastern part of Inner Mongolia Autonomous Region, with a total area of 1,240,897 km² (Figure 1). Surrounded by the Greater Khingan, Lesser Khingan, and Changbai Mountains, the region features a relatively elevated topography. Major river systems, including the Liaohe River, Songhua River, and Nenjiang River, form the low-lying Northeast China Plain, which comprises the Songnen Plain, Sanjiang Plain, and Liaohe Plain. The region contains hundreds of rivers, lakes, and reservoirs, which provide vital support for people’s livelihoods as well as industrial and agricultural production. The region has a temperate monsoon climate with four distinct seasons. The annual average temperature ranges from 2 °C to 6 °C, and the annual precipitation is 350–700 mm. Winters in Northeast China are long and severe, with a freezing period of up to 5–6 months. The prolonged freezing period significantly reduces the capacity of water bodies to degrade pollutants and undergo natural self-purification, resulting in a marked deterioration of water quality [48]. As the region is a crucial national industrial and grain production base, several of its shallow water bodies are mesotrophic or eutrophic because of intensive human activities. To conduct a comprehensive evaluation of model performance, four representative and important lakes and reservoirs, namely, Xingxingshao Reservoir (XXSR), Chagan Lake (CGL), Erlong Lake (ELL), and Jingpo Lake (JPL), were selected to retrieve water turbidity from Sentinel-2 image data for detailed analysis.

2.2. Data Acquisition and Processing

2.2.1. Field Survey and Laboratory Measurements

Field sampling was conducted annually from April to October between 2018 and 2024 to monitor water turbidity in selected lakes and reservoirs across Northeast China. During each sampling campaign, the geographic coordinates of the sampling sites were accurately recorded using a Trimble PXRS GPS receiver (Trimble Navigation, Inc., Sunnyvale, CA, USA). Water samples were collected at a depth of 0.5 m below the surface, stored in a portable refrigerator at 4 °C, and transported to the laboratory for subsequent analysis within 7 days. In the laboratory at a controlled temperature of 20 ± 2 °C, artificial turbid water of 400 NTU was prepared. Distilled water was filtered through a 0.2 μm glass fiber membrane to obtain water free of turbidity. Standard turbidity solutions were prepared by diluting the turbid water using the filtered distilled water. The turbidity of water samples was measured using a UV–VIS spectrophotometer (SHIMADZU UV-2600, SHIMADZU, Kyoto, Japan) at a wavelength of 680 nm with a 3 cm quartz cuvette. The measured turbidity of samples was determined using the calibration curve derived from the absorbance of standard solutions at 680 nm.
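The conversion from absorbance at 680 nm to turbidity through a calibration curve can be sketched as follows. This is a minimal illustration that assumes a linear relationship between absorbance and turbidity over the range of the standards; the text does not state the functional form of the curve, and the standard concentrations and absorbance values below are hypothetical placeholders rather than measurements from this study.

```python
import numpy as np

def fit_calibration(standard_ntu, absorbance_680):
    """Fit absorbance = slope * NTU + intercept to the standard solutions.

    Linearity over the standard range is an assumption of this sketch.
    """
    slope, intercept = np.polyfit(standard_ntu, absorbance_680, deg=1)
    return slope, intercept

def absorbance_to_ntu(absorbance, slope, intercept):
    """Invert the calibration curve to convert sample absorbance to turbidity (NTU)."""
    return (np.asarray(absorbance, dtype=float) - intercept) / slope

# Hypothetical standards diluted from a 400 NTU stock and their absorbance at 680 nm
standards = np.array([0.0, 25.0, 50.0, 100.0, 200.0, 400.0])
absorbance = 0.002 + 0.0011 * standards  # placeholder values, exactly linear for illustration

slope, intercept = fit_calibration(standards, absorbance)
sample_ntu = absorbance_to_ntu([0.057, 0.112], slope, intercept)
```

In practice the fit quality (for example, the R² of the standards) would be checked before the curve is applied to field samples.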

2.2.2. Sentinel-2 MSI Data Acquisition and Processing

The Sentinel-2 mission, part of the Copernicus Programme, comprises two identical satellites operating in tandem to support the monitoring of changes on the Earth’s surface. Sentinel-2 provides a wide swath of 290 km and a high revisit frequency of 2–3 days at mid-latitudes, enabling frequent monitoring of surface water conditions [49]. The optical Multi-Spectral Instrument (MSI) onboard Sentinel-2 measures reflected radiance in 13 spectral bands, ranging from the VNIR (Visible and Near-Infra-Red) to the SWIR (Short Wave Infra-Red). The MSI provides three spatial resolutions, each corresponding to specific spectral bands. Specifically, there are four classical VNIR bands with a 10 m spatial resolution: Blue (493 nm), Green (560 nm), Red (665 nm), and NIR (833 nm); four narrow bands in the VNIR vegetation red-edge domain (704 nm, 740 nm, 783 nm, and 865 nm) and two wider SWIR bands (1610 nm and 2190 nm) with a 20 m spatial resolution; and three bands with a 60 m spatial resolution dedicated mainly to atmospheric correction (443 nm for aerosols and 945 nm for water vapour) and cirrus detection (1374 nm).

In this study, Sentinel-2 surface reflectance images were obtained through the Google Earth Engine (GEE) platform. Cloud-free Sentinel-2 images acquired within ±7 days of the in situ measurements were selected [14]. For each sampling point, the reflectance was taken as the average surface reflectance within the 3 × 3 pixel window centred on the pixel containing the point, and this value was matched with the in situ turbidity measurement. A total of 668 in situ sampling points were successfully matched with Sentinel-2 imagery over the period from 2018 to 2024. These data were used for optical water classification and model construction. Statistics of the measured turbidity at the sampling points for each sampling year (2018–2024) are given in Table 1.
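The matchup rule (±7-day window, 3 × 3 mean around the sampling pixel) can be illustrated with a small NumPy sketch. It works on in-memory arrays rather than on GEE, and it assumes a hypothetical dictionary that maps acquisition dates to already cloud-free reflectance cubes of shape (bands, rows, cols). Choosing the temporally nearest image when several fall inside the window is an assumption of this sketch; the text only specifies the window.

```python
from datetime import date
import numpy as np

MAX_DAYS = 7   # ±7-day matchup window
HALF_WIN = 1   # 3 x 3 window = one pixel on each side of the centre pixel

def match_point(sample_date, row, col, images):
    """Mean reflectance per band in the 3 x 3 window around (row, col).

    `images` is a hypothetical {acquisition_date: array (bands, H, W)} dict of
    cloud-free scenes. Returns None when no scene lies within the window or the
    3 x 3 window would extend beyond the image edge.
    """
    candidates = [(abs((d - sample_date).days), d) for d in images
                  if abs((d - sample_date).days) <= MAX_DAYS]
    if not candidates:
        return None
    _, best = min(candidates)          # nearest acquisition date (assumption)
    cube = images[best]
    _, h, w = cube.shape
    if not (HALF_WIN <= row < h - HALF_WIN and HALF_WIN <= col < w - HALF_WIN):
        return None
    window = cube[:, row - HALF_WIN:row + HALF_WIN + 1, col - HALF_WIN:col + HALF_WIN + 1]
    return window.mean(axis=(1, 2))

# Usage with two synthetic 4-band scenes
rng = np.random.default_rng(0)
images = {date(2024, 7, 1): rng.random((4, 20, 20)),
          date(2024, 7, 20): rng.random((4, 20, 20))}
spectrum = match_point(date(2024, 7, 4), 10, 10, images)   # matched to the 1 July scene
```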

The point data were primarily used for optical water classification. They were also used to construct machine learning models, which served as a control group for evaluating the performance of the deep learning models. In addition, image patches with a specified side length, centred on the Sentinel-2 pixel corresponding to each field observation point, were extracted to construct the deep learning models. To retrieve water turbidity for the selected lakes and reservoirs, water bodies were extracted by manual visual interpretation of Sentinel-2 images acquired in 2024. The resulting water boundaries were vectorized and converted directly into water body maps.
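Patch extraction around each sampling pixel can be sketched as below. The text does not report the patch side length here, so `size` is a free parameter and the value 9 in the usage example is purely illustrative. The sketch also assumes that all bands have already been resampled to a common grid and stacked into one (bands, rows, cols) array, and it returns patches in the channels-last layout used by Keras `Conv2D` layers.

```python
import numpy as np

def extract_patch(cube, row, col, size):
    """Cut a size x size patch (all bands) centred on pixel (row, col).

    `cube` has shape (bands, H, W); `size` must be odd so that the sampling
    pixel sits exactly at the centre. Returns an array of shape
    (size, size, bands), or None if the patch would cross the image edge.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    _, h, w = cube.shape
    if row - half < 0 or col - half < 0 or row + half >= h or col + half >= w:
        return None
    patch = cube[:, row - half:row + half + 1, col - half:col + half + 1]
    return np.transpose(patch, (1, 2, 0))   # channels-last

# Usage on a synthetic 4-band scene
cube = np.random.default_rng(1).random((4, 50, 50))
patch = extract_patch(cube, row=25, col=30, size=9)   # 9 is an illustrative size only
```

Skipping points near the image edge (rather than padding) keeps every patch fully populated with observed reflectance; padding would be an alternative design choice.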

2.3. Methodology

2.3.1. Fuzzy C-Means Algorithms for Optical Water Classification

The fuzzy c-means (FCM) algorithm, a soft clustering method, was selected for optical water classification. It generates multiple clusters and allows each data point to belong to more than one cluster with varying degrees of membership. Membership grades quantify the extent to which each data point belongs to a particular cluster [50], so data points with lower membership grades are more weakly associated with a given cluster than those with higher grades. The method minimizes the membership-weighted distance between the data points and the cluster centroids [51]. The objective function in Equation (1) is minimized through iterative updates of the membership grades and cluster centroids: the membership grades are updated according to Equation (2), and each cluster centroid is computed as the average of all data points weighted by their membership grades raised to the power m [46]. The iterative process terminates when either the maximum number of iterations is reached or the change between iterations falls below a predefined threshold. Each data point can ultimately be assigned to the cluster with its maximum membership grade. The FCM algorithm returns a set of fuzzy clusters and a membership matrix containing the membership grade of each data point in every cluster [52].

J_{F C M} = J (U, v) = \sum_{i = 1}^{N} \sum_{j = 1}^{C} u_{i j}^{m} d_{i j}^{2}

(1)

u_{ij} = \left( \sum_{k=1}^{C} \left( \frac{d_{ij}}{d_{ik}} \right)^{2/(m-1)} \right)^{-1}

(2)

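where N is the number of samples; C denotes the number of clusters; u_ij is the membership degree of sample i in cluster j, and U = [u_ij] is the N × C membership matrix; v denotes the cluster centroids; d_ij is the Euclidean distance between sample i and the centroid of cluster j; and m is the fuzzification degree, set to m = 2.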
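The FCM iteration described above can be sketched in NumPy. This is a generic textbook implementation (random initial memberships, membership-weighted centroids, the membership update of Equation (2)), not the authors' code; the initialization scheme, tolerance, and iteration cap are assumptions, and the two-cluster data in the usage example are synthetic.

```python
import numpy as np

def memberships(X, V, m=2.0):
    """Equation (2): u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)   # (N, C) Euclidean distances
    d = np.fmax(d, np.finfo(float).eps)                         # guard against zero distance
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)                              # rows sum to 1

def fcm(X, n_clusters, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Fuzzy c-means on spectra X (N, n_bands). Returns centroids V and memberships U."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)             # random initial memberships
    for _ in range(max_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]    # centroids: membership^m-weighted means
        U_new = memberships(X, V, m)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:                             # stop when memberships stabilise
            break
    return V, U

# Usage: two synthetic, well-separated groups of 4-band spectra
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.02, 0.002, (30, 4)), rng.normal(0.08, 0.002, (30, 4))])
V, U = fcm(X, n_clusters=2)
owt = U.argmax(axis=1) + 1   # dominant OWT = cluster with the highest membership
```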

Cluster validity measures are essential for assessing clustering outcomes and determining the optimal number of clusters. In this study, the partition entropy (PE) index and the Xie-Beni (XB) index were used jointly, and the optimal number of clusters was taken as the value at which both indices reached their minimum [53,54]. The corresponding formulas are given in Equations (3) and (4). These two measures account for the fuzziness of the membership grades and for the compactness and separation of the clusters.

V_{P E} = - \frac{1}{N} \sum_{i = 1}^{N} \sum_{j = 1}^{K} u_{i j} \log_{a} (u_{i j})

(3)

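V_{XB} = \frac{\sum_{i=1}^{N} \sum_{j=1}^{K} u_{ij}^{m} d_{ij}^{2}}{N \min_{j \neq k} \lVert v_{j} - v_{k} \rVert^{2}}

(4)

where K denotes the number of clusters to be evaluated; v_j and v_k are cluster centroids, so the denominator is N times the minimum squared distance between any two centroids; other parameters are as defined in the aforementioned formulas.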
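Both validity indices can be computed directly from the spectra, the membership matrix, and the centroids. The sketch below uses the natural logarithm for the unspecified base a in Equation (3) (an assumption) and the standard Xie-Beni definition, in which the denominator is N times the minimum squared distance between cluster centroids. The toy configuration in the usage example is hypothetical and simply contrasts crisp and maximally fuzzy memberships.

```python
import numpy as np

def partition_entropy(U, base=np.e):
    """Equation (3): V_PE = -(1/N) * sum_i sum_j u_ij * log_a(u_ij)."""
    U = np.clip(U, 1e-12, 1.0)   # avoid log(0); a zero membership contributes ~0
    return -np.sum(U * np.log(U)) / (U.shape[0] * np.log(base))

def xie_beni(X, U, V, m=2.0):
    """Xie-Beni index: weighted compactness / (N * min squared centroid separation)."""
    d2 = np.sum((X[:, None, :] - V[None, :, :]) ** 2, axis=2)   # (N, C) squared distances
    compactness = np.sum((U ** m) * d2)
    cd2 = np.sum((V[:, None, :] - V[None, :, :]) ** 2, axis=2)  # centroid-to-centroid
    separation = np.min(cd2[~np.eye(len(V), dtype=bool)])       # exclude j == k
    return compactness / (X.shape[0] * separation)

# Usage: four points in two clusters, with crisp and maximally fuzzy memberships
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])
V = np.array([[0.0, 0.5], [4.0, 0.5]])
U_crisp = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
U_fuzzy = np.full((4, 2), 0.5)
```

To select the number of clusters, one would run FCM for each candidate K, evaluate both indices, and keep the K at which both are minimal, as described above.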

2.3.2. Base Regressor Algorithms for Turbidity Estimation

A convolutional neural network (CNN) is a widely used deep learning model whose parameters are learned through forward and backward propagation. The fundamental components of a CNN are multiple convolutional layers. Two-dimensional image patches of a specific window size are fed into the first layer, referred to as the input layer. The window size determines the size of the image patches used to train the model, thereby affecting its feature extraction capability, computational efficiency, and final performance. A convolutional layer computes dot products between local regions of the input feature map and convolution kernels that contain learnable weights and biases. An activation function is then applied to introduce a nonlinear transformation. As several different filters slide across the input, the convolutional layer generates multiple feature maps. Following the convolutional layers, a pooling layer reduces computational complexity and the number of parameters, thereby mitigating the risk of overfitting. The operations performed in a convolutional block can be expressed by Equation (5) [55,56].

O^{l} = \mathrm{pool}_{p}\left( \sigma\left( O^{l-1} * W^{l} + d^{l} \right) \right)

(5)

where O^{l−1} denotes the input feature map of the lth layer; W^l and d^l represent the weights and biases of the layer, respectively; * denotes the convolution of the input feature map with W^l; σ(·) is the nonlinear activation function applied to the convolution output; and pool_p(·) denotes the pooling operation.
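Equation (5) can be made concrete with a plain NumPy forward pass of one convolutional block. This sketch assumes ReLU for σ, "valid" convolution without padding, and non-overlapping 2 × 2 max pooling; none of these choices is stated in the equation itself. As in deep learning libraries, the "convolution" is implemented as cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv2d_valid(x, w, b):
    """O^{l-1} * W^l + d^l for x (H, W, C_in), w (k, k, C_in, C_out), b (C_out,)."""
    k = w.shape[0]
    H, W, _ = x.shape
    out = np.empty((H - k + 1, W - k + 1, w.shape[3]))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            # dot product between the local k x k x C_in window and every filter
            out[i, j] = np.tensordot(x[i:i + k, j:j + k, :], w, axes=([0, 1, 2], [0, 1, 2])) + b
    return out

def max_pool(x, p=2):
    """Non-overlapping p x p max pooling; rows/cols that do not fill a window are dropped."""
    H, W, C = x.shape
    H2, W2 = H // p, W // p
    return x[:H2 * p, :W2 * p].reshape(H2, p, W2, p, C).max(axis=(1, 3))

def conv_block(x, w, b, p=2):
    """Equation (5) with sigma = ReLU: pool_p(sigma(O^{l-1} * W^l + d^l))."""
    return max_pool(np.maximum(conv2d_valid(x, w, b), 0.0), p)

# Usage: a hypothetical 9 x 9 patch with 4 bands and 32 filters of size 3 x 3
rng = np.random.default_rng(0)
patch = rng.random((9, 9, 4))
w1 = rng.normal(0.0, 0.1, (3, 3, 4, 32))
out = conv_block(patch, w1, np.zeros(32))   # 9 -> 7 after convolution -> 3 after pooling
```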

Fully connected layers, which form the final stage of the CNN, receive the flattened one-dimensional vector derived from the last pooling layer. Complex CNN models are prone to overfitting, and regularization techniques such as dropout (i.e., randomly deactivating some neurons during training) and batch normalization (i.e., normalizing layer inputs over each mini-batch) are commonly employed to mitigate it [57]. Within a CNN architecture, these layers are systematically combined, with the number of each layer type typically determined by the requirements of the application. During forward propagation, the input data pass through successive convolutional blocks that extract increasingly abstract and informative features, and the fully connected layers finally produce the output. The model’s weights and biases are initialized with random values. During backward propagation, the CNN iteratively updates these learnable parameters by minimizing a loss function that quantifies the error between the model’s predictions and the actual target values [58]. Through this iterative optimization, the predictive accuracy of the CNN gradually improves.

In addition, applying a CNN to regression problems requires an appropriate output activation function. The widely used linear activation function maps the inputs of the output layer directly to the prediction without any nonlinear transformation. Although it is efficient for simpler regression tasks, it is limited in its ability to capture complex nonlinear relationships. In this study, the random forest (RF) regressor, a well-established regression method, is therefore integrated with the CNN model to enhance its predictive capability for water turbidity.

Random forest is a tree-based ensemble machine learning method [59] that combines many individual classification and regression trees (CARTs) through bootstrap aggregation (bagging). Each CART is constructed from a bootstrap sample rather than the entire original dataset and recursively partitions the feature space at each node to group similar targets, ultimately forming a binary tree [60]. In regression tasks, the mean squared error defined in Equation (6) is commonly used as the criterion minimized when splitting each node. The random forest regressor produces its final prediction by averaging the predictions of all individual trees.

H (Q_{m}) = \frac{1}{N_{m}} \sum_{y \in Q_{m}} (y - {\bar{y}}_{m})^{2}

(6)

where H(·) is the mean squared error loss, Q_m represents the data at node m, N_m is the number of samples at node m, and y and ȳ_m represent the observed turbidity values and their mean at node m, respectively.
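The node-splitting criterion of Equation (6) can be illustrated for a single feature. The sketch below evaluates every candidate threshold and keeps the one that minimizes the sample-weighted MSE of the two child nodes, which is the standard CART regression criterion; a random forest repeats this search over random feature subsets within each bootstrap sample. The red-band reflectances and turbidity values in the usage example are hypothetical.

```python
import numpy as np

def node_mse(y):
    """Equation (6): H(Q_m) = (1/N_m) * sum over the node of (y - ybar_m)^2."""
    return float(np.mean((y - y.mean()) ** 2)) if len(y) else 0.0

def best_split(x, y):
    """Best threshold on one feature by weighted child-node MSE.

    Returns (weighted child MSE, threshold); thresholds are midpoints between
    consecutive distinct sorted feature values.
    """
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = (np.inf, None)
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                          # no threshold between equal values
        left, right = y[:i], y[i:]
        cost = (len(left) * node_mse(left) + len(right) * node_mse(right)) / len(y)
        if cost < best[0]:
            best = (cost, (x[i - 1] + x[i]) / 2)
    return best

# Usage: hypothetical red-band reflectance vs. turbidity (NTU)
red = np.array([0.02, 0.03, 0.04, 0.10, 0.11, 0.12])
ntu = np.array([10.0, 12.0, 11.0, 60.0, 65.0, 62.0])
cost, threshold = best_split(red, ntu)   # splits the low- and high-turbidity groups
```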

To compare the accuracy of the proposed model with that of other models, several other popular, high-performance machine learning models were selected: support vector regression (SVR), the back-propagation neural network (BP), and extreme gradient boosting (XGBoost). SVR is a supervised learning method derived from statistical learning theory that can be applied to regression tasks; it uses a kernel function to map the training data into a higher-dimensional feature space [61]. BP is a flexible modeling algorithm based on a multi-layer perceptron consisting of input, hidden, and output layers, each comprising a set of neurons; it uses the back-propagation algorithm to update the learnable parameters according to the propagated errors [62]. XGBoost is a boosting-based ensemble learning method that uses regression decision trees as base regressors. Each new tree is fitted to the negative gradient of the loss function with respect to the predictions of the preceding trees, so that the loss is gradually reduced over iterations. The final prediction is the sum of the outputs of all base regressors [63].
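The additive principle behind boosting can be shown with plain squared-error gradient boosting built from scikit-learn regression trees. This is a simplified stand-in, not XGBoost itself: XGBoost additionally uses second-order gradient information and a regularized objective. The tree depth, learning rate, number of trees, and the synthetic data are all illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosting(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Squared-error gradient boosting.

    For the loss 0.5 * (y - f)^2, the negative gradient with respect to the
    current prediction f is the residual y - f, so each new tree fits residuals.
    """
    base = float(np.mean(y))                  # initial constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residual = y - pred                   # negative gradient of the loss
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, residual)
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    return {"base": base, "lr": learning_rate, "trees": trees}

def predict_boosting(model, X):
    """Ensemble output = initial value + SUM of the shrunken tree outputs."""
    out = np.full(X.shape[0], model["base"])
    for tree in model["trees"]:
        out += model["lr"] * tree.predict(X)
    return out

# Usage on synthetic data (not field measurements)
rng = np.random.default_rng(0)
X = rng.random((200, 3))                          # hypothetical band reflectances
y = 100 * X[:, 0] + 20 * np.sin(6 * X[:, 1])      # synthetic turbidity-like target
model = fit_boosting(X, y)
rmse_boost = np.sqrt(np.mean((predict_boosting(model, X) - y) ** 2))
rmse_const = np.std(y)                            # RMSE of always predicting the mean
```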

2.3.3. Research Framework for Turbidity Estimation Based on the Proposed Models

As illustrated in Figure 2, the primary workflow for turbidity estimation in this study comprised three key components: data acquisition, optical water classification, and deep learning model construction with spatial turbidity mapping. In this paper, the models were constructed using the Python 3.10 programming language within the PyCharm 2022.2.1 (JetBrains N.V., Amsterdam, The Netherlands) software. The environment configuration was based on Anaconda 3, where TensorFlow 2.15 was installed as the deep learning framework. Each process was implemented as follows:

(1): Two types of datasets were generated by matching Sentinel-2 images with in situ points, comprising the point data with spectral reflectance and the two-dimensional image patches for constructing deep learning models.
(2): The fuzzy c-means clustering method was applied to the point data with Sentinel-2 spectral reflectance for optical water classification. The FCM method automatically partitioned the samples based on the similarity of their spectral characteristics. After the clusters were identified, the centroid spectral reflectance of each cluster and a matrix containing each data point’s membership grade in every cluster were obtained. The dominant OWT for each point was determined as the cluster with the highest membership value. Subsequently, the two-dimensional image patches corresponding to the classified sample points were grouped according to their assigned OWTs and used to train distinct turbidity prediction models.
(3): We combined the CNN model and RF algorithm to form the CNN-RF model, which was employed to predict water turbidity for each OWT. To balance model complexity and generalization, the CNN incorporated two convolutional blocks with the same structure, each consisting of a convolutional layer, a batch normalization layer, a max-pooling layer, and a dropout layer. The convolutional layers in the first and second blocks used 32 and 64 filters, respectively, each with a 3 × 3 kernel, to generate feature maps. These two blocks were followed by a flattening layer and a fully connected layer with 128 neurons. The convolutional process extracted the spatial and spectral features of the input images. The output of the fully connected layer was then supplied as input to a random forest, which served as the regressor for predicting water turbidity.
(4): The FCM algorithm was applied to the samples to be predicted, assigning each sample a membership value for each of the specified number of OWTs. Water turbidity was predicted with the pre-trained CNN-RF model of each OWT, yielding one turbidity estimate per OWT. The Euclidean distance was used to assess the similarity between the sample’s reflectance and the spectral centroid of each OWT; a smaller distance indicates higher similarity and hence a higher membership value for that OWT. These membership values were then used as weights for the turbidity predictions of the respective CNN-RF models, and the weighted sum of all OWT-based predictions gave the final blended result.
(5): For the application of Sentinel-2 imagery in water type classification and turbidity mapping, the image spectra were used as input for the FCM and CNN-RF models. For water classification, each pixel was assigned to a specific water type based on the maximum membership grade, which was derived from membership functions that calculated the distance between the reflectance of individual pixels and the spectral centroids of the in situ measured data. For turbidity mapping, each classified pixel with membership grades was predicted by the OWT-based CNN-RF models, and the blending retrieval result was the weighted sum of all OWT-based turbidity predictions using membership values.
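The CNN-RF architecture in step (3) can be sketched with TensorFlow Keras and scikit-learn; in the OWT-based framework, one such model would be trained per OWT. The block structure, filter counts, 3 × 3 kernels, and the 128-neuron fully connected layer follow the text. Everything else is an assumption of this sketch: ReLU activations, "same" padding, a dropout rate of 0.25, 2 × 2 pooling, the forest size, and, in particular, training the convolutional part first with a temporary linear output unit before passing the 128 features to the random forest, because the text does not describe how the CNN weights were trained.

```python
import numpy as np
import tensorflow as tf
from sklearn.ensemble import RandomForestRegressor

def build_cnn(patch_size, n_bands, dropout=0.25):
    """Return (feature_extractor, trainer), two models sharing the same layers."""
    inputs = tf.keras.Input(shape=(patch_size, patch_size, n_bands))
    x = inputs
    for filters in (32, 64):   # two blocks: Conv2D -> BN -> MaxPool -> Dropout
        x = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.MaxPooling2D(pool_size=2)(x)
        x = tf.keras.layers.Dropout(dropout)(x)
    x = tf.keras.layers.Flatten()(x)
    features = tf.keras.layers.Dense(128, activation="relu")(x)   # FC layer, 128 neurons
    output = tf.keras.layers.Dense(1)(features)   # temporary linear head (assumption)
    return tf.keras.Model(inputs, features), tf.keras.Model(inputs, output)

def fit_cnn_rf(patches, turbidity, epochs=20, seed=0):
    """Train the CNN end to end, then fit a random forest on its 128-D features."""
    tf.keras.utils.set_random_seed(seed)
    extractor, trainer = build_cnn(patches.shape[1], patches.shape[3])
    trainer.compile(optimizer="adam", loss="mse")
    trainer.fit(patches, turbidity, epochs=epochs, batch_size=32, verbose=0)
    features = extractor.predict(patches, verbose=0)             # (n_samples, 128)
    rf = RandomForestRegressor(n_estimators=200, random_state=seed).fit(features, turbidity)
    return extractor, rf

def predict_cnn_rf(model, patches):
    """CNN features -> random forest regression of turbidity."""
    extractor, rf = model
    return rf.predict(extractor.predict(patches, verbose=0))

# Usage on synthetic data: 64 hypothetical 9 x 9 patches with 4 bands
rng = np.random.default_rng(0)
patches = rng.random((64, 9, 9, 4)).astype("float32")
turbidity = (rng.random(64) * 100).astype("float32")   # placeholder targets (NTU)
model = fit_cnn_rf(patches, turbidity, epochs=2)
pred = predict_cnn_rf(model, patches[:5])
```

Replacing the linear head by a random forest lets the final regression step capture nonlinear relationships in the learned features without further network training.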

Figure 2. Flowchart of turbidity estimation based on optical water type classification using Sentinel-2 MSI data. Conv2D, BN, Pool, Dropout, Flatten, and FC represent the convolutional layer, batch normalization layer, max-pooling layer, dropout layer, flatten layer, and fully connected layer, respectively.
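The membership-weighted blending used in steps (4) and (5) of the workflow can be sketched as follows. Memberships of new spectra are computed from the OWT centroids with the standard FCM membership formula (m = 2), and the final turbidity is the membership-weighted sum of the OWT-specific predictions. The centroid spectra and the per-OWT predictions in the usage example are hypothetical placeholders, not values from this study.

```python
import numpy as np

def fcm_memberships(spectra, centroids, m=2.0):
    """Membership of each spectrum in each OWT: u_ij = 1 / sum_k (d_ij/d_ik)^(2/(m-1))."""
    d = np.linalg.norm(spectra[:, None, :] - centroids[None, :, :], axis=2)
    d = np.fmax(d, np.finfo(float).eps)      # a spectrum on a centroid gets membership ~1
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)           # rows sum to 1

def blend(per_owt_predictions, memberships):
    """Weighted sum of OWT-specific predictions; both arrays are (n_samples, n_owts)."""
    return np.sum(per_owt_predictions * memberships, axis=1)

# Usage with hypothetical centroids (3 OWTs x 4 bands) and placeholder model outputs
centroids = np.array([[0.02, 0.03, 0.02, 0.01],
                      [0.04, 0.07, 0.05, 0.03],
                      [0.05, 0.09, 0.07, 0.05]])
spectra = np.array([[0.02, 0.03, 0.02, 0.01],       # identical to the OWT 1 centroid
                    [0.045, 0.08, 0.06, 0.04]])     # midway between OWT 2 and OWT 3
preds = np.array([[25.0, 40.0, 60.0],
                  [30.0, 45.0, 58.0]])              # outputs of three OWT-specific models
U = fcm_memberships(spectra, centroids)
turbidity = blend(preds, U)
owt = U.argmax(axis=1) + 1                          # dominant OWT for the classification map
```

Because the weights vary smoothly with spectral distance, pixels near OWT boundaries receive blended estimates instead of jumping between models, which is consistent with the spatial continuity reported for the turbidity maps.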

2.3.4. Model Accuracy Evaluation

The predictions of the different models were compared using the coefficient of determination (R²) and the root mean square error (RMSE) as evaluation metrics. R² represents the proportion of the total variance of the dependent variable that is explained by the model, and thus indicates how effectively a regression model predicts the outcome variable. RMSE measures the typical magnitude of the differences between the predicted and observed values; a smaller RMSE indicates a smaller discrepancy and therefore more accurate predictions. They are calculated as follows:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - y_{i}')^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}

(7)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - y_{i}')^{2}}

(8)

where n represents the number of samples, y_{i} and y_{i}' refer to the observed and predicted turbidity values of the i-th sample, respectively, and \bar{y} is the mean of the observed turbidity values.
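Equations (7) and (8) translate directly into NumPy. The toy values in the example are illustrative, not data from this study.

```python
import numpy as np


def r2_score(y_obs, y_pred):
    """Coefficient of determination, Eq. (7)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_obs - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)    # total sum of squares about the mean
    return float(1.0 - ss_res / ss_tot)


def rmse(y_obs, y_pred):
    """Root mean square error, Eq. (8), in the units of y (here NTU)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(r2_score(observed, predicted), rmse(observed, predicted))  # prints 0.8 0.5
```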

3. Results

3.1. Spectral Fuzzy Clustering Using in Situ Data

Figure 3a shows the PE and XB indices for different numbers of clusters. As the number of clusters increased, both indices showed monotonically increasing trends. Higher values of these two cluster validity measures indicate a higher degree of fuzziness and greater overlap among clusters. Thus, three clusters were selected as optimal because this number yielded the minimum PE and XB. With three clusters, the relationship between the data points and the cluster centers was optimal overall in terms of the fuzziness of membership grades and the compactness and separation of clusters.
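The two cluster validity indices can be computed as below. The sketch assumes the common definitions of partition entropy (PE) and the Xie–Beni (XB) index with fuzzifier m = 2; the exact variants used in this study are not restated in this section, and the toy spectra are invented. For both indices, smaller values indicate a crisper, better-separated partition.

```python
import numpy as np


def partition_entropy(u, eps=1e-12):
    """PE = -(1/N) sum_i sum_k u_ik log(u_ik); u has shape (c, N)."""
    u = np.asarray(u, dtype=float)
    n = u.shape[1]
    return float(-np.sum(u * np.log(np.clip(u, eps, 1.0))) / n)


def xie_beni(x, u, v, m=2.0):
    """XB = sum_i sum_k u_ik^m ||x_k - v_i||^2 / (N * min_{i != j} ||v_i - v_j||^2).

    x: (N, p) spectra, u: (c, N) memberships, v: (c, p) centroids.
    """
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    d2 = np.sum((x[None, :, :] - v[:, None, :]) ** 2, axis=2)   # (c, N) squared distances
    compactness = np.sum((u ** m) * d2)
    sep = np.sum((v[:, None, :] - v[None, :, :]) ** 2, axis=2)  # (c, c) centroid distances
    np.fill_diagonal(sep, np.inf)                                # ignore i == j
    return float(compactness / (x.shape[0] * sep.min()))


x = [[0.0], [2.0], [10.0], [12.0]]
crisp = [[1, 1, 0, 0], [0, 0, 1, 1]]
print(partition_entropy(crisp), xie_beni(x, crisp, [[1.0], [11.0]]))
```

A crisp partition gives PE = 0, and fully uniform memberships over two clusters give PE = ln 2, which is why rising PE with more clusters signals growing overlap.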

The mean spectral reflectance of the cluster centroids reflected the apparent optical properties (AOPs) of the different OWTs, which depend on the absorption and scattering properties of the water constituents. As shown in Figure 3b, the spectral curves of the in situ points exhibit the typical optical properties of Case II inland waters. OWT 1 exhibited lower mean spectral reflectance, and its spectral curve was flatter from 665 nm onward than those of the other OWTs. OWT 1 also had the lowest mean turbidity, chlorophyll a (Chl-a), and total suspended matter (TSM), at 27.975 NTU, 8.807 μg/L, and 25.271 mg/L, respectively (Table 2). The low particle concentration and the absorption by water contributed to these spectral features of OWT 1. OWT 2 and OWT 3 produced similar spectral curves, with two reflection peaks around 560 nm and 705 nm and an absorption valley near 665 nm. The reflection peaks were mainly caused by strong scattering from phytoplankton and sediment particles, and the valley around 665 nm can be attributed to Chl-a absorption. In addition, OWT 3 exhibited a slightly deeper absorption valley around 665 nm and higher peaks near 705 nm and 783 nm than OWT 2. These differences might stem from the higher Chl-a concentration in OWT 3. However, the differences between the two spectral curves were small, and their mean Chl-a concentrations were relatively close (12.360 μg/L for OWT 2 and 15.590 μg/L for OWT 3; Table 2). Moreover, OWT 3 had the highest mean spectral reflectance of all OWTs, as well as the highest mean turbidity (57.214 NTU) and TSM (50.607 mg/L; Table 2).

3.2. Turbidity Estimation Model Construction and Performance Evaluation

3.2.1. Base Regressor Construction

The CNN and RF models were combined to form the base regressor for the proposed blending algorithm. Figure 4 shows the accuracy of CNN models that used input image patches with window sizes of 3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11, 13 × 13, and 15 × 15. Because deep learning models require sufficient samples, we used all samples in a 3-fold cross-validation to assess the influence of window size on prediction accuracy. As the window size increased from 3 × 3 to 9 × 9, the cross-validation R² increased from 0.599 to 0.689, while the RMSE decreased from 18.555 to 16.852 NTU. The model achieved its best accuracy with a 9 × 9 window, and accuracy decreased slightly for larger windows. Thus, 9 × 9 was selected as the optimal window size for extracting training samples by sliding across the images during CNN model construction.
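Patch extraction and the 3-fold cross-validation used to compare window sizes can be sketched as follows. Assumptions: the image is an (H, W, B) reflectance array, each sample is the patch centred on the sample pixel, borders are handled by reflect padding, and folds are drawn at random with a fixed seed; none of these details is specified in the text, and the function names are hypothetical. The example uses 668 samples, the sum of the 467 calibration and 201 validation samples reported below.

```python
import numpy as np


def extract_patch(image, row, col, size=9):
    """Return the size x size x B patch centred on pixel (row, col) of an (H, W, B) image."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the patch has a centre pixel")
    half = size // 2
    # Reflect-pad the spatial axes so patches near the border keep their full size
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="reflect")
    # Pixel (row, col) sits at (row + half, col + half) in the padded array
    return padded[row:row + size, col:col + size, :]


def kfold_indices(n_samples, k=3, seed=0):
    """Yield (train, validation) index arrays for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


image = np.random.default_rng(1).random((20, 20, 4))  # toy 4-band image
patch = extract_patch(image, 0, 0, size=9)
print(patch.shape)
for train, val in kfold_indices(668, k=3):
    print(len(train), len(val))
```

Repeating the cross-validation for each odd `size` from 3 to 15 reproduces the kind of window-size comparison shown in Figure 4.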

The performance of the base regressors was evaluated using 467 samples for model construction and 201 independent samples for validation. Figure 5 shows the calibration and validation accuracies of the CNN-RF model compared with those of the SVR, BP, XGBoost, RF, and CNN models. The tree-based ensemble machine learning models (i.e., XGBoost and RF) and the deep learning models (i.e., CNN and CNN-RF) performed well on the calibration set, all with R² exceeding 0.94 and RMSE below 10 NTU. On the validation set, the RF model outperformed the XGBoost model, yielding a higher R² (0.720) and a lower RMSE (19.914 NTU). Of the two deep learning models, the CNN-RF model outperformed the CNN model, achieving a higher R² of 0.796 and a lower RMSE of 16.995 NTU. The SVR and BP models performed worse in turbidity estimation. Note, however, that although the deep learning algorithms achieved the highest accuracy, they were less computationally efficient than the machine learning algorithms. In the scatter plots, most validation samples of the CNN and CNN-RF models clustered around the 1:1 line, with similar slopes (0.754 and 0.742, respectively). However, a minority of points remained relatively scattered, indicating large prediction deviations for those samples.

3.2.2. Performance Evaluation of the Blending Estimation Model Based on OWTs

The in situ points were divided into three groups based on the FCM method and the spectral reflectance characteristics of the OWTs. The points in each group were split into calibration and validation sets for modeling and accuracy evaluation: 120 and 52 points for OWT 1, 226 and 97 points for OWT 2, and 121 and 52 points for OWT 3, respectively (Table 3 and Figure 6). The calibration accuracy of the CNN-RF model for each water type is presented in Table 3. The CNN-RF model achieved the highest accuracy for OWT 1 (R² = 0.955 and RMSE = 5.632 NTU), followed by OWT 3 (R² = 0.953 and RMSE = 10.180 NTU) and OWT 2 (R² = 0.942 and RMSE = 8.419 NTU). Thus, the CNN-RF model could predict turbidity for all water types, with calibration R² exceeding 0.94 in every case. The turbidity predictions of all OWT-based models were then blended using membership weights. As shown in Figure 6, the validation R² and RMSE are 0.859 and 9.963 NTU for OWT 1, 0.849 and 13.833 NTU for OWT 2, and 0.857 and 16.597 NTU for OWT 3, respectively. For OWT 1, the CNN-RF model exhibited large prediction deviations in the low-value region, with relatively scattered points on both sides of the 1:1 line. The models for all three types underestimated turbidity in the high-value region (turbidity > 85 NTU) and overestimated it in the low-value region (turbidity < 50 NTU). Furthermore, Figure 7 shows that when the predictions for the three OWTs were not blended by weighting, the overall validation R² and RMSE were 0.861 and 13.758 NTU, respectively. In contrast, the blending CNN-RF model weighted by membership grades markedly improved the validation accuracy, with a higher R² of 0.900 and a lower RMSE of 11.698 NTU. Its validation points were more closely aggregated around the 1:1 line, with a relatively high slope of 0.789, and the extent of underestimation and overestimation was reduced.
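The reported group sizes (172, 323, and 173 points for OWT 1 to OWT 3) are consistent with a roughly 70/30 calibration/validation split within each group. The sketch below reproduces those counts with a random per-group split; the random allocation, the 0.7 fraction, and the helper name `split_by_owt` are assumptions, since the text does not say how points were allocated to each set.

```python
import numpy as np


def split_by_owt(owt_labels, cal_fraction=0.7, seed=0):
    """Randomly split sample indices into calibration/validation sets within each OWT."""
    labels = np.asarray(owt_labels)
    rng = np.random.default_rng(seed)
    splits = {}
    for owt in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == owt))
        n_cal = int(cal_fraction * idx.size)          # floor of 70 % of the group
        splits[int(owt)] = (idx[:n_cal], idx[n_cal:])
    return splits


labels = [1] * 172 + [2] * 323 + [3] * 173           # group sizes implied by the counts above
for owt, (cal, val) in split_by_owt(labels).items():
    print(f"OWT {owt}: {cal.size} calibration, {val.size} validation")
```

Splitting within each group, rather than across the pooled data, keeps every OWT-specific model supplied with validation points of its own type.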

3.3. Sentinel-2 Image Application

3.3.1. Applicability Analysis of the OWT Classification

The OWTs were mapped from Sentinel-2 images using the membership functions of the FCM method, with the image spectra as input. Figure 8 presents the membership maps and water type classification maps derived from Sentinel-2 images for the four typical water bodies. The membership maps show the degree to which each pixel belongs to each OWT. According to Figure 8a, the OWT 3 membership map showed strong membership in a spatially continuous region extending over nearly the entire Chagan Lake, while the OWT 2 map showed strong membership in the lake’s northwest corner. Membership in OWT 1 was weak, with values below 0.1. Chagan Lake was therefore classified as OWT 2 and OWT 3. Figure 8b,c show that Erlong Lake and Jingpo Lake were both classified into OWT 1 and OWT 2, with similar spatial patterns: OWT 1 was distributed in the southern parts of these water bodies and OWT 2 in the northern parts. As illustrated in Figure 8d, OWT 1 dominates throughout Xingxingshao Reservoir, with membership values exceeding 0.9 for most pixels.

3.3.2. Applicability Analysis of the Turbidity Estimation

As illustrated in Figure 9, the weighted blending CNN-RF model and the unweighted models based on three OWTs were applied to Sentinel-2 images to produce water turbidity maps for the four typical water bodies. The two types of models produced similar spatial distribution patterns of water turbidity. Among the four water bodies, Chagan Lake had the highest turbidity, followed by Erlong Lake, Jingpo Lake, and Xingxingshao Reservoir. Nearly all of Chagan Lake had relatively high turbidity, with values above 45 NTU, whereas its northwest corner had lower turbidity. In Erlong Lake and Jingpo Lake, relatively high turbidity occurred mainly in the northern parts. Xingxingshao Reservoir exhibited the lowest turbidity, approximately 10 NTU. In addition, the turbidity maps derived from the weighted blending model avoided discontinuous boundaries, especially in regions with large turbidity variations, such as Chagan Lake (Figure 9e) and Erlong Lake (Figure 9f).

4. Discussion

4.1. Effectiveness of the FCM Method for OWT Classification

Several studies have demonstrated that a single universal algorithm often fails to deliver accurate predictions for water bodies with different optical characteristics. Therefore, we classified the water types before model construction. Because the optical signals of the optically active substances in water are intermixed, assigning unknown samples to discrete classes based on spectral characteristics is of limited effectiveness. This study therefore adopted a fuzzy classification approach, the FCM method, to assign observations efficiently and automatically to multiple OWTs with varying membership grades. Two cluster validity indices were used to determine an appropriate number of clusters. The FCM method successfully distinguished spectral differences among waters with different optically active substances and turbidity levels, and the grouped points provided better datasets for constructing the turbidity estimation models.
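The FCM clustering itself alternates between updating membership-weighted centroids and updating memberships from Euclidean distances. The following is a minimal sketch of the standard fuzzy c-means algorithm with fuzzifier m = 2 and random initialization; the study’s exact settings (fuzzifier, stopping tolerance, initialization) are not given in this section, and the toy spectra are invented.

```python
import numpy as np


def fuzzy_c_means(x, c=3, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means with Euclidean distance.

    x: (N, p) reflectance spectra. Returns centroids v (c, p) and memberships u (c, N).
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    u = rng.random((c, x.shape[0]))
    u /= u.sum(axis=0, keepdims=True)                 # memberships of each sample sum to 1
    for _ in range(max_iter):
        um = u ** m
        v = (um @ x) / um.sum(axis=1, keepdims=True)  # membership-weighted centroids
        d = np.linalg.norm(x[None, :, :] - v[:, None, :], axis=2)  # (c, N) distances
        d = np.fmax(d, 1e-12)                         # guard against zero distance
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=0, keepdims=True)  # closer centroid -> higher membership
        converged = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if converged:
            break
    return v, u


spectra = [[0.00, 0.00], [0.10, 0.00], [0.00, 0.10],
           [5.00, 5.00], [5.10, 5.00], [5.00, 5.10]]     # two toy spectral groups
v, u = fuzzy_c_means(spectra, c=2)
print(np.round(v, 3))
print(u.argmax(axis=0))       # hard labels by maximum membership
```

Because cluster numbering depends on the random initialization, the resulting clusters would still need to be ordered (for example, by mean reflectance) before being named OWT 1 to OWT 3.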

Nevertheless, some uncertainties remained in the clustering outcomes. The clustering results might have been affected by insufficient in situ data, which could not adequately characterize and distinguish all water conditions. In addition, the limited spectral resolution of the Sentinel-2 MSI sensor might omit detailed spectral characteristics that could help identify the absorption and scattering properties of the waters. Because this unsupervised clustering approach uses Euclidean distance as its criterion, it may misclassify samples with indistinct spectral features, and such ambiguous points, when used for modeling, might reduce prediction accuracy. When the FCM method was applied to Sentinel-2 images, the “hard” classification results presented reasonable and expected distribution patterns, with each pixel assigned to its dominant OWT. Although some pixels had relatively high membership grades for a secondary OWT, these grades remained smaller than that of the dominant OWT, so each pixel was labeled with the OWT of maximum membership across all classes.

4.2. Performance of Blending CNN-RF Model

The results showed that the hybrid CNN-RF model, used as the base regressor, achieved better turbidity estimation performance. Through its layer-by-layer convolutional structure, the CNN’s filters captured abundant local spatial and spectral features of the input data within the sliding window. The deeper layers of the CNN integrated these local features to model the relationship between water turbidity and spectral reflectance more effectively. The sliding window size is an important parameter in CNN-based feature extraction: it directly influences the feature representation and, in turn, model performance. This study determined 9 × 9 as the optimal window size for the input image patches, balancing model accuracy and computational efficiency, and the inversion maps showed that this window size preserved fine spatial details of the turbidity distribution. The RF method can effectively handle the high-dimensional features generated by the CNN and is better suited to nonlinear regression than the linear output layer that a CNN regressor uses for its final prediction. The hybrid CNN-RF model thus integrated CNN’s feature extraction capability with RF’s data handling and regression abilities, serving as a base regressor for precise turbidity estimation. Compared with the model without water classification and the unweighted OWT-based model, the proposed weighted blending CNN-RF model based on OWTs performed better in both model evaluation and remote sensing image applications, and it showed strong robustness and generalization ability. Furthermore, the proposed blending CNN-RF model achieved higher accuracy than the machine learning and deep learning models for water turbidity reported in previous studies (Table 4). Machine learning algorithms were more commonly applied in those studies. Li et al. (2023a) [14] and Li et al. (2023b) [64] estimated turbidity over a large geographical extent covering the entire country, while other studies focused primarily on specific water bodies.

Compared with the unweighted OWT-based models, the weighted blending model produced turbidity maps that better maintained the continuous spatial distribution of turbidity. The learnable parameters of each OWT-based CNN-RF model were fitted to its specific OWT. As a result, the unweighted predictions were similar within each water type but distinct across different OWTs, and the spatial distribution maps correspondingly showed little variation within each water type. Blending the predictions of the models for the various OWTs with membership-grade weights avoided rigidly classifying water bodies into distinct types and thereby reduced the uncertainty and deviation of the turbidity predictions. The turbidity maps also revealed that the typical water bodies with high turbidity were located near croplands and human settlements. Human activities frequently disturb soil, and domestic or agricultural wastewater might flow into these water bodies. These factors increased particulate matter concentrations or led to mesotrophic or eutrophic conditions, which, in turn, increased water turbidity.
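The continuity argument can be illustrated with a one-dimensional transect crossing a boundary between two OWTs. In this toy sketch, membership shifts linearly from OWT 1 to OWT 2, and each OWT model is assumed to return a constant value (20 and 60 NTU, purely hypothetical). Hard assignment produces a single 40 NTU jump at the boundary, whereas membership-weighted blending spreads the change along the transect.

```python
import numpy as np


def transect_demo(n=11, low=20.0, high=60.0):
    """Compare hard assignment with membership blending across an OWT boundary."""
    s = np.linspace(0.0, 1.0, n)          # position along a hypothetical transect
    u2 = s                                # membership in OWT 2 rises linearly (assumed)
    u1 = 1.0 - s                          # memberships sum to 1
    pred1 = np.full(n, low)               # hypothetical OWT 1 model output (NTU)
    pred2 = np.full(n, high)              # hypothetical OWT 2 model output (NTU)
    hard = np.where(u1 >= u2, pred1, pred2)   # dominant-OWT prediction only
    soft = u1 * pred1 + u2 * pred2            # membership-weighted blend
    return hard, soft


hard, soft = transect_demo()
print("largest step, hard:", np.abs(np.diff(hard)).max())
print("largest step, soft:", np.abs(np.diff(soft)).max())
```

Real OWT models do not return constants, but the same mechanism applies: the blended value can only change as fast as the memberships change, so no artificial edge appears where the dominant OWT switches.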

However, the proposed blending turbidity estimation model has limitations in predictive accuracy, generalization capability, and application scope. Specifically, considering the abundance and quality of the dataset, we selected Sentinel-2 images acquired within ±7 days of the in situ measurements. Although the water quality of lakes and reservoirs in Northeast China is relatively stable, this time difference might cause prediction inaccuracies, because the measured turbidity may not match the actual reflectance at the time of image acquisition. Moreover, the sample points used for model construction were mainly collected from water bodies in Northeast China. Although they capture the typical reflectance characteristics of inland waters with varying turbidity, these samples may not adequately characterize the specific optical properties of waters in other geographic regions, which could limit the generalization capability and application scope of the proposed model. Furthermore, because the deep learning models were developed after optical water classification, the number of samples available for each water type was considerably smaller than the total dataset. The risk of overfitting therefore had to be considered during model development, although overfitting was not observed in this study. Future studies could therefore collect in situ samples across several geographic regions with varying environmental conditions. A larger and more diverse sample would better support the construction of OWT-based deep learning models and could enhance the performance and practical applicability of the developed model.
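The ±7-day matching between in situ measurements and Sentinel-2 acquisitions can be sketched as a nearest-date search. Choosing the nearest acquisition within the window is an assumption for illustration; in practice, image selection also depends on factors such as cloud cover that this function ignores. The dates and the helper name `match_images` are hypothetical.

```python
from datetime import date


def match_images(sample_dates, image_dates, max_days=7):
    """For each sampling date, return the nearest image date within +/- max_days, else None."""
    matches = []
    for sampled in sample_dates:
        candidates = [img for img in image_dates if abs((img - sampled).days) <= max_days]
        # Nearest acquisition wins; ties keep the earlier entry in `image_dates` order
        matches.append(min(candidates, key=lambda img: abs((img - sampled).days))
                       if candidates else None)
    return matches


samples = [date(2021, 8, 10), date(2021, 9, 1)]                    # hypothetical field dates
images = [date(2021, 8, 5), date(2021, 8, 12), date(2021, 9, 20)]  # hypothetical acquisitions
print(match_images(samples, images))
```

Recording the day gap for each matched pair would also make it possible to check whether prediction errors grow with the time difference.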

5. Conclusions

This study combined spectral clustering of optical water types with the CNN-RF deep learning model for turbidity estimation using Sentinel-2 images. The FCM method was shown to be acceptable and feasible for classifying three optical water types based on spectral reflectance. The hybrid CNN-RF model combined CNN’s proficiency in feature extraction with RF’s strengths in high-dimensional data processing and regression prediction, enabling it to serve as a base regressor for accurate turbidity estimation. The final prediction was a weighted sum of the turbidity predictions of the OWT-based CNN-RF models, with the weight for each OWT being the corresponding membership grade from the FCM clustering model. This approach mitigates uncertainty and reduces prediction deviation by avoiding the rigid categorization of water bodies into discrete types. The proposed weighted blending CNN-RF model exhibited strong robustness and generalization ability, with the highest turbidity prediction accuracy among the tested models (R² = 0.900 and RMSE = 11.698 NTU). The turbidity maps for four typical water bodies derived from the proposed model better retained the continuous spatial distribution of turbidity and effectively revealed water quality conditions. The OWT-based weighted blending CNN-RF model may be broadly applicable to similar inland water bodies, and it could be extended to data from other sensors with spectral properties similar to those of the Sentinel-2 MSI sensor. These findings shed light on the application of deep learning models to water turbidity estimation and advance remote sensing technology for water quality monitoring.

Author Contributions

Conceptualization, Y.M. (Yue Ma); methodology, Y.M. (Yue Ma) and Q.C.; software, Q.C. and Q.Z.; formal analysis, Y.M. (Yue Ma) and Q.C.; investigation, Q.Z., Q.C. and Y.M. (Yongchao Ma); resources, K.S. and Q.Y.; data curation, Y.M. (Yue Ma) and Q.C.; writing—original draft preparation, Y.M. (Yue Ma); writing—review and editing, Q.C.; visualization, Q.Y., Q.Z., Q.C. and Y.M. (Yongchao Ma); supervision, Y.M. (Yue Ma); project administration, Y.M. (Yue Ma); funding acquisition, Y.M. (Yue Ma) and Q.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (42201433) and the Capital Construction Funds for Innovation Capacity of Jilin Province (2023C036-1).

Data Availability Statement

The Google Earth Engine dataset is available at https://developers.google.com/earth-engine/datasets (accessed on 16 October 2024).

Acknowledgments

The State Key Laboratory of Black Soils Conservation and Utilization, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, provided persistent assistance for field sampling and laboratory analysis.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Gleick, P.H. Global freshwater resources: Soft-path solutions for the 21st century. Science 2003, 302, 1524–1528.
Kuhn, C.; de Matos Valerio, A.; Ward, N.; Loken, L.; Sawakuchi, H.O.; Kampel, M.; Richey, J.; Stadler, P.; Crawford, J.; Striegl, R.; et al. Performance of Landsat-8 and Sentinel-2 surface reflectance products for river remote sensing retrievals of chlorophyll-a and turbidity. Remote Sens. Environ. 2019, 224, 104–118.
Deng, X.Y.; Song, C.Q.; Liu, K.; Ke, L.H.; Zhang, W.S.; Ma, R.H.; Zhu, J.Y.; Wu, Q.H. Remote sensing estimation of catchment-scale reservoir water impoundment in the upper yellow river and implications for river discharge alteration. J. Hydrol. 2020, 585, 124791.
Caballero, I.; Navarro, G.; Ruiz, J. Multi-platform assessment of turbidity plumes during dredging operations in a major estuarine system. Int. J. Appl. Earth Obs. 2018, 68, 31–41.
Carey, C.C.; Ibelings, B.W.; Hoffmann, E.P.; Hamilton, D.P.; Brookes, J.D. Eco-physiological adaptations that favour freshwater cyanobacteria in a changing climate. Water Res. 2012, 46, 1394–1407.
Pekel, J.F.; Cottam, A.; Gorelick, N.; Belward, A.S. High-resolution mapping of global surface water and its long-term changes. Nature 2016, 540, 418–422.
Akhtar, N.; Syakir Ishak, M.I.; Bhawani, S.A.; Umar, K. Various Natural and Anthropogenic Factors Responsible for Water Quality Degradation: A Review. Water 2021, 13, 2660.
Duan, W.; Takara, K.; He, B.; Luo, P.; Nover, D.; Yamashiki, Y. Spatial and temporal trends in estimates of nutrient and suspended sediment loads in the Ishikari River, Japan, 1985 to 2010. Sci. Total Environ. 2013, 461–462, 499–508.
Erena, M.; Domínguez, J.A.; Aguado-Giménez, F.; Soria, J.; García-Galiano, S. Monitoring Coastal Lagoon Water Quality through Remote Sensing: The Mar Menor as a Case Study. Water 2019, 11, 1468.
Gao, P.; Pasternack, G.B.; Bali, K.M.; Wallender, W.W. Estimating suspended sediment concentration using turbidity in an irrigation-dominated southeastern California watershed. J. Irrig. Drain. Eng. 2008, 134, 250–259.
Ayana, E. Determinants of Declining Water Quality; World Bank Group: Washington, DC, USA, 2019.
McCarthy, M.J.; Muller-Karger, F.E.; Otis, D.B.; Méndez-Lázaro, P. Impacts of 40 years of land cover change on water quality in Tampa Bay, Florida. Cogent Geosci. 2018, 4, 1422956.
Dogliotti, A.I.; Ruddick, K.G.; Nechad, B.; Doxaran, D.; Knaeps, E. A single algorithm to retrieve turbidity from remotely-sensed data in all coastal and estuarine waters. Remote Sens. Environ. 2015, 156, 157–168.
Li, S.J.; Kutser, T.; Song, K.S.; Liu, G.; Li, Y. Lake Turbidity Mapping Using an OWTs-bp Based Framework and Sentinel-2 Imagery. Remote Sens. 2023, 15, 2489.
Moore, G.K. Satellite remote sensing of water turbidity/Sonde de télémesure par satellite de la turbidité de l’eau. Hydrol. Sci. Bull. 2009, 25, 407–421.
Ouyang, T.P.; Zhu, Z.Y.; Kuang, Y.Q. Assessing impact of urbanization on river water quality in the Pearl River Delta Economic Zone, China. Environ. Monit. Assess. 2006, 120, 313–325.
Bustamante, J.; Pacios, F.; Diaz-Delgado, R.; Aragones, D. Predictive models of turbidity and water depth in the Doñana marshes using Landsat TM and ETM+ images. J. Environ. Manag. 2009, 90, 2219–2225.
Palmer, S.C.; Kutser, T.; Hunter, P.D. Remote sensing of inland waters: Challenges, progress and future directions. Remote Sens. Environ. 2015, 157, 1–8.
Peterson, K.T.; Sagan, V.; Sloan, J.J. Deep learning-based water quality estimation and anomaly detection using Landsat-8/Sentinel-2 virtual constellation and cloud computing. GISci. Remote Sens. 2020, 57, 510–525.
Petus, C.; Chust, G.; Gohin, F.; Doxaran, D.; Froidefond, J.M.; Sagarminaga, Y. Estimating turbidity and total suspended matter in the Adour River plume (South Bay of Biscay) using MODIS 250-m imagery. Cont. Shelf Res. 2010, 30, 379–392.
Sebastiá-Frasquet, M.-T.; Aguilar-Maldonado, J.A.; Santamaría-Del-Ángel, E.; Estornell, J. Sentinel 2 Analysis of Turbidity Patterns in a Coastal Lagoon. Remote Sens. 2019, 11, 2926.
Cui, M.Y.; Sun, Y.H.; Huang, C.; Li, M.J. Water Turbidity Retrieval Based on UAV Hyperspectral Remote Sensing. Water 2022, 14, 128.
Wu, J.-L.; Ho, C.-R.; Huang, C.-C.; Srivastav, A.; Tzeng, J.-H.; Lin, Y.-T. Hyperspectral Sensing for Turbid Water Quality Monitoring in Freshwater Rivers: Empirical Relationship between Reflectance and Turbidity and Total Solids. Sensors 2014, 14, 22670–22688.
Yang, H.B.; Kong, J.L.; Hu, H.H.; Du, Y.; Gao, M.Y.; Chen, F. A Review of Remote Sensing for Water Quality Retrieval: Progress and Challenges. Remote Sens. 2022, 14, 1170.
Shen, X.; Feng, Q. Statistical model and estimation of inland riverine turbidity with Landsat 8 OLI images: A case study. Environ. Eng. Sci. 2018, 35, 132–140.
Zhang, Y.Z.; Pulliainen, J.T.; Koponen, S.S.; Hallikainen, M.T. Water quality retrievals from combined Landsat TM data and ERS-2 SAR data in the Gulf of Finland. IEEE T. Geosci. Remote 2003, 41, 622–629.
Silveira Kupssinskü, L.; Thomassim Guimarães, T.; Menezes de Souza, E.; Zanotta, D.C.; Roberto Veronez, M.; Gonzaga, L., Jr.; Mauad, F.F. A Method for Chlorophyll-a and Suspended Solids Prediction through Remote Sensing and Machine Learning. Sensors 2020, 20, 2125.
Sipelgas, L.; Raudsepp, U.; Kõuts, T. Operational monitoring of suspended matter distribution using MODIS images and numerical modelling. Adv. Space Res. 2006, 38, 2182–2188.
Song, K.S.; Wang, Q.; Liu, G.; Jacinthe, P.A.; Li, S.J.; Tao, H.; Du, Y.X.; Wen, Z.D.; Wang, X.; Guo, W.W.; et al. A unified model for high resolution mapping of global lake (>1 ha) clarity using Landsat imagery data. Sci. Total Environ. 2022, 810, 151188.
Tang, W.; Zhao, C.; Lin, J.; Jiao, C.; Zheng, G.; Zhu, J.; Pan, X.; Han, X. Improved Spectral Water Index Combined with Otsu Algorithm to Extract Muddy Coastline Data. Water 2022, 14, 855.
Uriarte, M.; Yackulic, C.B.; Lim, Y.; Arce-Nazario, J.A. Influence of land use on water quality in a tropical landscape: A multi-scale analysis. Landscape Ecol. 2011, 26, 1151–1164.
Zhang, Y.S.; Wu, L.; Ren, H.Z.; Deng, L.C.; Zhang, P.C. Retrieval of Water Quality Parameters from Hyperspectral Images Using Hybrid Bayesian Probabilistic Neural Network. Remote Sens. 2020, 12, 1567.
El Bilali, A.; Lamane, H.; Taleb, A.; Nafii, A. A framework based on multivariate distribution-based virtual sample generation and DNN for predicting water quality with small data. J. Cleaner Prod. 2022, 368, 133227.
Yang, H.B.; Du, Y.; Zhao, H.L.; Chen, F. Water Quality Chl-a Inversion Based on Spatio-Temporal Fusion and Convolutional Neural Network. Remote Sens. 2022, 14, 1267.
Pyo, J.; Duan, H.; Baek, S.; Kim, M.S.; Jeon, T.; Kwon, Y.S.; Lee, H.; Cho, K.H. A convolutional neural network regression for quantifying cyanobacteria using hyperspectral imagery. Remote Sens. Environ. 2019, 233, 111350.
Du, C.G.; Wang, Q.; Li, Y.M.; Lyu, H.; Zhu, L.; Zheng, Z.B.; Wen, S.; Liu, G.; Guo, Y.L. Estimation of total phosphorus concentration using a water classification method in inland water. Int. J. Appl. Earth Obs. Geoinf. 2018, 71, 29–42.
Cui, T.W.; Zhang, J.; Wang, K.; Wei, J.W.; Mu, B.; Ma, Y.; Zhu, J.H.; Liu, R.J.; Chen, X.Y. Remote sensing of chlorophyll a concentration in turbid coastal waters based on a global optical water classification system. ISPRS J. Photogramm. Remote Sens. 2020, 163, 187–201.
Bhuyan, M.; Jayaram, C.; Menon, N.N.; Joseph, K.A. Satellite-Based Study of Seasonal Variability in Water Quality Parameters in a Tropical Estuary along the Southwest Coast of India. J. Indian Soc. Remote Sens. 2020, 48, 1265–1276.
Menken, K.D.; Brezonik, P.L.; Bauer, M.E. Influence of Chlorophyll and Colored Dissolved Organic Matter (CDOM) on Lake Reflectance Spectra: Implications for Measuring Lake Properties by Remote Sensing. Lake Reserv. Manag. 2006, 22, 179–190.
Zhang, F.F.; Li, J.S.; Shen, Q.; Zhang, B.; Tian, L.Q.; Ye, H.P.; Wang, S.L.; Lu, Z.Y. A soft-classification-based chlorophyll-a estimation method using MERIS data in the highly turbid and eutrophic Taihu Lake. Int. J. Appl. Earth Obs. Geoinf. 2019, 74, 138–149.
Spyrakos, E.; O’Donnell, R.; Hunter, P.D.; Miller, C.; Scott, M.; Simis, S.G.H.; Neil, C.; Barbosa, C.C.F.; Binding, C.E.; Bradt, S.; et al. Optical types of inland and coastal waters. Limnol. Oceanogr. 2017, 63, 846–870.
Davies-Colley, R.J.; Smith, D.G. Turbidity, suspended sediment, and water clarity: A review. J. Am. Water. Resour. As. 2001, 37, 1085–1101.
Bi, S.; Li, Y.M.; Xu, J.; Liu, G.; Song, K.S.; Mu, M.; Lyu, H.; Miao, S.; Xu, J.F. Optical classification of inland waters based on an improved Fuzzy C-Means method. Opt. Express 2019, 27, 34838–34856.
Sharif, S.M.; Kusin, F.M.; Asha’ari, Z.H.; Aris, A.Z. Characterization of Water Quality Conditions in the Klang River Basin, Malaysia Using Self Organizing Map and K-means Algorithm. Procedia Environ. Sci. 2015, 30, 73–78.
Moore, T.S.; Dowell, M.D.; Bradt, S.; Ruiz Verdu, A. An optical water type framework for selecting and blending retrievals from bio-optical algorithms in lakes and coastal waters. Remote Sens. Environ. 2014, 143, 97–111.
Ren, M.; Liu, P.Y.; Wang, Z.H.; Yi, J. A Self-Adaptive Fuzzy c-Means Algorithm for Determining the Optimal Number of Clusters. Comput. Intel. Neurosc. 2016, 2016, 2647389.
Garg, V.; Senthil Kumar, A.; Aggarwal, S.P.; Kumar, V.; Dhote, P.R.; Thakur, P.K.; Nikam, B.R.; Sambare, R.S.; Siddiqui, A.; Muduli, P.R.; et al. Spectral similarity approach for mapping turbidity of an inland waterbody. J. Hydrol. 2017, 550, 527–537.
Fang, C.; Song, C.C.; Wen, Z.D.; Liu, G.; Wang, X.D.; Li, S.J.; Shang, Y.X.; Tao, H.; Lyu, L.L.; Song, K.S. A novel chlorophyll-a retrieval model based on suspended particulate matter classification and different machine learning. Environ. Res. 2024, 240, 117430.
Ma, Y.; Song, K.S.; Wen, Z.D.; Liu, G.; Shang, Y.X.; Lyu, L.L.; Du, J.; Yang, Q.; Li, S.J.; Tao, H.; et al. Remote Sensing of Turbidity for Lakes in Northeast China Using Sentinel-2 Images with Machine Learning Algorithms. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 9132–9146.
Havens, T.C.; Bezdek, J.C.; Leckie, C.; Hall, L.O.; Palaniswami, M. Fuzzy c-Means Algorithms for Very Large Data. IEEE Trans. Fuzzy Syst. 2012, 20, 1130–1146.
Chuang, K.-S.; Tzeng, H.-L.; Chen, S.; Wu, J.; Chen, T.-J. Fuzzy c-means clustering with spatial information for image segmentation. Comput. Med. Imag. Grap. 2006, 30, 9–15.
Han, J.; Park, D.-C.; Woo, D.-M.; Min, S.-Y. Comparison of Distance Measures on Fuzzy c-means Algorithm for Image Classification Problem. AASRI Procedia 2013, 4, 50–56.
Zhang, J.; Shen, L. An Improved Fuzzy c-Means Clustering Algorithm Based on Shadowed Sets and PSO. Comput. Intel. Neurosc. 2014, 2014, 368628.
Verma, R.K.; Tiwari, R.; Thakur, P.S. Partition Coefficient and Partition Entropy in Fuzzy C Means Clustering. J. Sci. Res. Rep. 2023, 29, 1–6.
Zhang, C.; Sargent, I.; Pan, X.; Li, H.P.; Gardiner, A.; Hare, J.; Atkinson, P.M. An object-based convolutional neural network (OCNN) for urban land use classification. Remote Sens. Environ. 2018, 216, 57–70.
Romero, A.; Gatta, C.; Camps-Valls, G. Unsupervised Deep Feature Extraction for Remote Sensing Image Classification. IEEE Trans. Geosci. Remote Sens. 2016, 54, 1349–1362.
Syariz, M.A.; Lin, C.-H.; Nguyen, M.V.; Jaelani, L.M.; Blanco, A.C. WaterNet: A Convolutional Neural Network for Chlorophyll-a Concentration Retrieval. Remote Sens. 2020, 12, 1966.
Pu, F.L.; Ding, C.J.; Chao, Z.Y.; Yu, Y.; Xu, X. Water-Quality Classification of Inland Lakes Using Landsat8 Images by Convolutional Neural Networks. Remote Sens. 2019, 11, 1674.
Friedman, J.H.; Meulman, J.J. Multiple additive regression trees with application in epidemiology. Stat. Med. 2003, 22, 1365–1381.
Alnahit, A.O.; Mishra, A.K.; Khan, A.A. Stream water quality prediction using boosted regression tree and random forest models. Stoch. Env. Res. Risk A. 2022, 36, 2661–2680.
Smola, A.J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222.
Wang, J.J.; Shi, P.; Jiang, P.; Hu, J.W.; Qu, S.M.; Chen, X.Y.; Chen, Y.B.; Dai, Y.Q.; Xiao, Z.W. Application of BP Neural Network Algorithm in Traditional Hydrological Model for Flood Forecasting. Water 2017, 9, 48.
Xie, D.; Qiu, Y.; Chen, X.; Zhao, Y.; Feng, Y. Evaluating Water Turbidity in Small Lakes Within the Taihu Lake Basin, Eastern China, Using Consumer-Grade UAV RGB Cameras. Drones 2024, 8, 710.
Li, Y.; Li, S.J.; Song, K.S.; Liu, G.; Wen, Z.D.; Fang, C.; Shang, Y.X.; Lyu, L.L.; Zhang, L.L. Sentinel-3 OLCI observations of Chinese lake turbidity using machine learning algorithms. J. Hydrol. 2023, 622, 129668.
Wang, Y.W.; Chen, J.; Cai, H.; Yu, Q.; Zhou, Z. Predicting water turbidity in a macro-tidal coastal bay using machine learning approaches. Estuar. Coast. Shelf Sci. 2021, 252, 107276.
Leggesse, E.S.; Zimale, F.A.; Sultan, D.; Enku, T.; Srinivasan, R.; Tilahun, S.A. Predicting Optical Water Quality Indicators from Remote Sensing Using Machine Learning Algorithms in Tropical Highlands of Ethiopia. Hydrology 2023, 10, 110.
Yang, W.L.; Fu, B.L.; Li, S.Z.; Lao, Z.N.; Deng, T.F.; He, W.; He, H.C.; Chen, Z.K. Monitoring multi-water quality of internationally important karst wetland through deep learning, multi-sensor and multi-platform remote sensing images: A case study of Guilin, China. Ecol. Indic. 2023, 154, 110755.
Zhang, J.W.; Meng, F.; Fu, P.J.; Jing, T.T.; Xu, J.; Yang, X.Y. Tracking changes in chlorophyll-a concentration and turbidity in Nansi Lake using Sentinel-2 imagery: A novel machine learning approach. Ecol. Inform. 2024, 81, 102597.
Singh, K.A.; Ryu, D.; Arora, M.; Tiwari, M.K.; Sahoo, B. Improving the accuracy of remotely sensed TSS and turbidity using quality enhanced water reflectance by a statistical resampling technique. Int. J. Appl. Earth Obs. Geoinf. 2025, 142, 104681.
Kong, Y.; Jimenez, K.; Lee, C.M.; Winter, S.; Summers-Evans, J.; Cao, A.; Menczer, M.; Han, R.; Mills, C.; McCarthy, S.; et al. Monitoring Coastal Water Turbidity Using Sentinel2—A Case Study in Los Angeles. Remote Sens. 2025, 17, 201.

Figure 1. Locations of the four typical lakes and reservoirs in Northeast China and the spatial distribution of the in situ sampling points.

Figure 3. Selection of the number of clusters using cluster validity indices, and spectral reflectance of the cluster centroids for the different OWTs. (a) PE and XB index values computed for different numbers of clusters; (b) spectral reflectance of the cluster centroid of each OWT.
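
The PE and XB indices in Figure 3a are standard fuzzy cluster validity measures; for both, lower values indicate a crisper, better-separated partition. The sketch below assumes the textbook definitions: Bezdek's partition entropy, PE = -(1/N) Σ_i Σ_k u_ik ln u_ik, and the Xie-Beni index, XB = Σ_i Σ_k u_ik^m ‖x_k − v_i‖² / (N · min_{i≠j} ‖v_i − v_j‖²), with Euclidean distance and fuzzifier m = 2. These choices, the toy data, and the function names are illustrative assumptions and may differ from the exact variants used in this study.

```python
import math


def partition_entropy(u):
    """Bezdek's partition entropy; u[i][k] is the membership of sample k in cluster i."""
    n_samples = len(u[0])
    total = 0.0
    for row in u:
        for u_ik in row:
            if u_ik > 0:  # the term 0 * ln(0) is taken as 0
                total -= u_ik * math.log(u_ik)
    return total / n_samples


def squared_distance(a, b):
    """Squared Euclidean distance between two spectra."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def xie_beni(x, u, v, m=2.0):
    """Xie-Beni index: fuzzy compactness divided by N times the minimum
    squared separation between centroids v (needs at least two centroids)."""
    n_samples, n_clusters = len(x), len(v)
    compactness = sum(
        u[i][k] ** m * squared_distance(x[k], v[i])
        for i in range(n_clusters)
        for k in range(n_samples)
    )
    separation = min(
        squared_distance(v[i], v[j])
        for i in range(n_clusters)
        for j in range(n_clusters)
        if i != j
    )
    return compactness / (n_samples * separation)


# Toy example: two well-separated groups of two-band "spectra".
x = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
v = [[0.0, 0.5], [10.0, 0.5]]
u_crisp = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
u_fuzzy = [[0.5] * 4, [0.5] * 4]
print(partition_entropy(u_crisp))  # 0.0 for a crisp partition
print(partition_entropy(u_fuzzy))  # about 0.693 (ln 2, the maximum for two clusters)
print(xie_beni(x, u_crisp, v))     # 1.0 / (4 * 100) = 0.0025
```

In practice both indices would be computed for each candidate number of clusters and the minimum (or an elbow) used to choose the OWT count.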

Figure 4. Cross-validation accuracy assessment of CNN models with different window sizes.
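
Figure 4 compares CNN models fed with image windows of different sizes. A minimal sketch of how a square window centred on a sampling pixel might be cut from a multi-band image follows. The band-major nested-list layout, the odd window size, the function name, and the edge-replication rule at image borders are assumptions for illustration, not details taken from this study.

```python
def extract_window(image, row, col, size):
    """Return a bands x size x size patch centred on (row, col).

    image is a list of bands, each a 2-D list of reflectance values.
    Pixels outside the image are filled by replicating the nearest edge
    pixel (an assumed padding rule).
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("window size must be a positive odd integer")
    half = size // 2
    n_rows, n_cols = len(image[0]), len(image[0][0])

    def clamp(i, upper):
        # Keep an index inside [0, upper - 1].
        return min(max(i, 0), upper - 1)

    return [
        [
            [band[clamp(r, n_rows)][clamp(c, n_cols)]
             for c in range(col - half, col + half + 1)]
            for r in range(row - half, row + half + 1)
        ]
        for band in image
    ]


band = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(extract_window([band], 1, 1, 3))  # [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
print(extract_window([band], 0, 0, 3))  # [[[1, 1, 2], [1, 1, 2], [4, 4, 5]]]
```

Larger windows add spatial context around each sample but also mix in neighbouring pixels, which is the trade-off a window-size comparison such as Figure 4 evaluates.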

Figure 5. Comparison of accuracy among different base regressor models. (a) SVR, (b) BP, (c) XGBoost, (d) RF, (e) CNN, (f) CNN-RF.

Figure 6. Validation performance of CNN-RF models based on different OWTs. (a) CNN-RF model for OWT 1, (b) CNN-RF model for OWT 2, (c) CNN-RF model for OWT 3.

Figure 7. Validation performance of unweighted and weighted CNN-RF models based on OWTs.
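
Figure 7 contrasts unweighted and weighted blending of the OWT-specific CNN-RF models. The sketch below assumes one common formulation, following the membership-weighted blending idea of Moore et al. (2014): the weighted estimate is each OWT model's prediction multiplied by the pixel's FCM membership for that OWT, summed and normalised by the total membership. The unweighted variant is taken here, as an assumption, to mean using only the model of the OWT with the highest membership. The exact weighting used in this study may differ, and the function names are hypothetical.

```python
def weighted_blend(memberships, predictions):
    """Membership-weighted average of per-OWT turbidity predictions (NTU)."""
    if len(memberships) != len(predictions):
        raise ValueError("need one membership per OWT model")
    total = sum(memberships)
    if total <= 0:
        raise ValueError("memberships must sum to a positive value")
    return sum(u * p for u, p in zip(memberships, predictions)) / total


def hard_select(memberships, predictions):
    """Prediction of the single OWT model with the largest membership."""
    best = max(range(len(memberships)), key=lambda i: memberships[i])
    return predictions[best]


# Hypothetical pixel: memberships for OWT 1-3 and each OWT model's output.
u = [0.5, 0.25, 0.25]
preds = [20.0, 40.0, 80.0]
print(weighted_blend(u, preds))  # 40.0
print(hard_select(u, preds))     # 20.0
```

A consequence of the weighted form is that the estimate varies continuously as memberships change, whereas hard selection can jump abruptly where the dominant OWT switches.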

Figure 8. Membership maps and optical water type classification results derived from the FCM model for typical water bodies using Sentinel-2 images. The first column presents Sentinel-2 images, the second to fourth columns show the membership maps of OWT 1, OWT 2, and OWT 3, respectively, and the last column displays OWT classification results. (a) Chagan Lake, (b) Erlong Lake, (c) Jingpo Lake, (d) Xingxingshao Reservoir.
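
The membership maps in Figure 8 give each pixel a degree of membership in every OWT. Under the standard FCM formulation, the membership of a pixel spectrum x in cluster i, given centroids v_j and fuzzifier m, is u_i = 1 / Σ_j (‖x − v_i‖ / ‖x − v_j‖)^(2/(m−1)), and a hard OWT label is the cluster with the largest membership. The sketch below assumes Euclidean distance on the raw reflectance vector and m = 2; the FCM variant, distance measure, and any spectral normalisation used in this study may differ, and the centroids shown are hypothetical.

```python
import math


def fcm_memberships(pixel, centroids, m=2.0):
    """Standard FCM memberships of one spectrum in each cluster (they sum to 1)."""
    dists = [math.dist(pixel, c) for c in centroids]
    zero = [i for i, d in enumerate(dists) if d == 0.0]
    if zero:  # pixel coincides with one or more centroids
        return [1.0 / len(zero) if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((dists[i] / d_j) ** exponent for d_j in dists)
        for i in range(len(dists))
    ]


def hard_owt(memberships):
    """1-based OWT label with the largest membership."""
    return max(range(len(memberships)), key=lambda i: memberships[i]) + 1


# Hypothetical two-band centroids for two OWTs.
u = fcm_memberships([1.0, 0.0], [[0.0, 0.0], [3.0, 0.0]])
print(u)            # approximately [0.8, 0.2]
print(hard_owt(u))  # 1
```

Applying these two functions pixel by pixel yields one membership map per OWT plus a hard classification map, the same kinds of layers that Figure 8 displays.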

Figure 9. Turbidity spatial distribution maps derived from unweighted and weighted blending models for typical water bodies using Sentinel-2 images. The first and second rows show the turbidity maps generated using unweighted and weighted blending models, respectively. (a,e) Chagan Lake; (b,f) Erlong Lake; (c,g) Jingpo Lake; (d,h) Xingxingshao Reservoir.

Table 1. Statistical summary of turbidity at the sampling points from 2018 to 2024. N denotes the number of samples collected in each year; Mean, S.D., Min., and Max. are in NTU.

Sampling Year	N	Mean (NTU)	S.D. (NTU)	Min. (NTU)	Max. (NTU)
2018	188	36.051	31.686	0.532	136.597
2019	51	15.604	13.859	0.372	73.780
2021	111	83.737	39.281	4.303	192.486
2022	231	38.443	36.574	0.159	187.197
2023	19	57.756	19.409	25.838	102.843
2024	68	20.475	10.698	7.821	67.077
Total	668	42.271	38.026	0.159	192.486

Table 2. Statistical summary of in situ water quality parameters for each OWT. Min, Max, Mean, and STD denote the minimum, maximum, average, and standard deviation, respectively. Chla, chlorophyll-a concentration; TSM, total suspended matter.

OWTs	Statistics	Turbidity (NTU)	Chla (μg/L)	TSM (mg/L)
1	Min	0.159	0.474	0.727
	Max	102.843	47.383	54.500
	STD	26.630	9.465	18.944
	Mean	27.975	8.807	25.271
2	Min	0.372	0.085	0.571
	Max	187.197	107.996	205.333
	STD	35.173	18.979	40.713
	Mean	42.010	12.360	33.735
3	Min	1.055	0.061	0.857
	Max	192.486	58.908	390.000
	STD	46.250	16.469	47.117
	Mean	57.214	15.590	50.607

Table 3. Calibration performance of the CNN-RF models for each OWT. N denotes the number of samples in the calibration dataset.

OWTs	N	R²	RMSE (NTU)
1	120	0.955	5.632
2	226	0.942	8.419
3	121	0.953	10.180
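
Tables 3 and 4 report model accuracy as R² and RMSE. A minimal sketch of both metrics follows, assuming R² is the coefficient of determination, 1 − SS_res/SS_tot, computed on observed versus estimated turbidity. Some studies instead report the squared Pearson correlation, which can give a different value, so this definition is an assumption; the toy values are illustrative only.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the units of the data (NTU here)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy turbidity values in NTU (illustrative, not data from this study).
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 5.0]
print(rmse(obs, pred))       # 0.5
print(r_squared(obs, pred))  # approximately 0.8
```

Because RMSE is in NTU, it scales with the turbidity range of each dataset, which is one reason RMSE values in Table 4 are not directly comparable across studies with very different ranges.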

Table 4. Comparison of quantitative turbidity estimation models in existing studies. N denotes the number of samples used for model development.

Study	Ref.	N	Turbidity Range (NTU)	Best Model	R²	RMSE (NTU)
Ma et al. (2021)	[49]	187	0.83–112.26	GBDT	0.88	9.90
Wang et al. (2021)	[65]	94	21.30–140.80	ANN	0.87	10.83
Li et al. (2023a)	[14]	484	0.00–282.74	BP-TURB	0.88	4.42
Li et al. (2023b)	[64]	1081	0.15–262.57	RF	0.92	12.65
Leggesse et al. (2023)	[66]	286	0.27–344.00	RF	0.80	7.82
Yang et al. (2023)	[67]	263	60.00–100.00	GB	0.75	0.51
Zhang et al. (2024)	[68]	360	2.87–31.43	XGBoost	0.79	2.18
Singh et al. (2025)	[69]	220	3.22–576.00	RF	0.77	33.16
Kong et al. (2025)	[70]	168	1.20–8.10	RF	0.63	1.58
This study	-	668	0.16–192.49	CNN-RF	0.90	11.70

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Ma, Y.; Chen, Q.; Song, K.; Yang, Q.; Zheng, Q.; Ma, Y. An Optical Water Type-Based Deep Learning Framework for Enhanced Turbidity Estimation in Inland Waters from Sentinel-2 Imagery. Sensors 2025, 25, 6483. https://doi.org/10.3390/s25206483
