1. Introduction
Freshwater resources are essential for sustaining ecosystems and human societies, serving vital functions including drinking water supply, agricultural irrigation, industrial uses, habitat provision, flood regulation, and biodiversity maintenance. However, increasing human pressures such as urban expansion, intensified agriculture, and land use changes have degraded freshwater quality through nutrient enrichment, chemical contamination, and disrupted hydrological regimes. These challenges contribute to widespread freshwater scarcity and pollution, posing significant risks to public health and economic activities worldwide. Consequently, robust monitoring approaches capable of detecting spatial and temporal variations across numerous water bodies are critically needed [
1].
While traditional in situ water quality sampling provides detailed physical, chemical, and biological measurements, these methods are often labor-intensive, costly, and limited by accessibility and weather conditions. The resulting constraints on spatial and temporal coverage hinder timely monitoring at large scales. Although in situ methods remain the benchmark for accuracy, their high operational costs and logistical challenges motivate the development of prediction models for surface water quality parameters (SWQPs) that balance monitoring precision with efficiency [
2,
3,
4]. Advances in computational technologies have positioned remote sensing as a powerful complementary approach, enabling cost-effective, time-efficient, and extensive monitoring even in remote or inaccessible regions [
5,
6,
7].
The use of remotely sensed optical imagery to estimate SWQPs has been widely demonstrated across diverse aquatic environments. Conventional approaches primarily employed regression models to correlate spectral data with water quality concentrations. However, the complex interplay of multiple pollutants often introduces nonlinear relationships that weaken the accuracy of these models. To overcome these limitations, machine learning techniques have gained prominence by better capturing complex patterns and improving retrieval performance for SWQPs [
3,
4,
8].
Sentinel-2 satellite imagery, with its high spatial resolution and revisit frequency, provides a valuable data source for monitoring water quality parameters such as chlorophyll-a, turbidity, and suspended solids. Emerging methodologies combining Sentinel-2 data with advanced machine learning and deep learning models have demonstrated enhanced capabilities for detailed and timely water quality mapping, supporting sustainable water resource management [
8].
In recent years, Machine Learning (ML) and Deep Learning (DL) have emerged as leading technologies for classification and prediction tasks across diverse fields, owing to their high predictive performance [
9]. Specifically, in the context of water quality assessment using remotely sensed data, ML models have demonstrated considerable effectiveness in capturing the complex, nonlinear relationships between water quality indicators and spectral reflectance across multiple dimensions [
10]. DL models, characterized by their multiple computational layers, exhibit enhanced generalization capabilities and superior learning performance, enabling more robust feature extraction and representation [
9].
Despite these advantages, ML and DL approaches are often considered “black-box” models due to their limited interpretability and opaque internal decision-making processes [
11]. Nevertheless, their adaptability and high predictive accuracy have led to their widespread adoption in remote sensing-based water quality monitoring, where they support the prediction and mapping of surface water quality parameters for environmental management. Individual ML and DL approaches differ in the types of models used, the parameters targeted, and how spatial and spectral information is handled.
The following section reviews recent research employing ML and DL methodologies to estimate and monitor SWQPs. The review highlights the key advancements, methodologies, and findings that demonstrate the capabilities and limitations of these data-driven approaches in remote sensing-based water quality assessment, and it identifies the gaps that motivate the customized CNN framework developed in this work.
2. Related Work
Because this study sits at the intersection of remote sensing, machine/deep learning, and water quality, we include references spanning a range of sensors (e.g., Sentinel-2, Landsat, UAV hyperspectral) and water-body types. Inland Sentinel-2 applications provide the primary methodological context for our work, while other sensors are cited to illustrate general ML/DL concepts that are transferable to our Sentinel-2, province-scale setting.
2.1. Machine Learning (ML)-Based Studies
Numerous machine learning techniques have been employed to model complex and nonlinear relationships in water quality estimation and monitoring. Among these, Support Vector Machines (SVMs), Random Forests (RFs), and Artificial Neural Networks (ANNs) are the most widely used. SVM and its variants have gained substantial attention in recent years for modeling and classifying water quality parameters. Deng et al. (2021) [
12] utilized SVM to forecast algal bloom dynamics in marine waters, while Najafzadeh and Niazmardi (2021) [
13] applied a multiple-kernel support vector regression (SVR) model to estimate oxygen demand in the Karun River, achieving acceptable accuracy in terms of the coefficient of determination (R²) and root mean square error (RMSE). Sillberg et al. (2021) [
14] combined an attribute-realization algorithm with SVM to classify water quality in the Chao Phraya River, using parameters such as nitrate, coliform counts, biochemical oxygen demand, and dissolved oxygen, with the linear kernel showing superior performance. Arias-Rodriguez et al. (2021) [
15] used SVR with remote sensing data to estimate several parameters, including turbidity, and reported improved accuracy compared to simple regression models. Latin et al. (2022) [
16] compared five machine learning models for turbidity classification in the Paraopeba River using Sentinel-2 imagery and found that SVM achieved an outstanding accuracy of 96%. Further enhancements in SVM performance have been achieved by integrating bio-inspired techniques and fuzzy similarity analyses, as demonstrated by Xi et al. (2023) [
17], Dehkordi et al. (2024) [
18], and Jamshidzadeh et al. (2024) [
19]. Among these studies, Latin et al. (2022) [
16] stands out for achieving the highest turbidity classification accuracy using Sentinel-2 data with SVM.
Ensemble learning methods, particularly Random Forest (RF), have been increasingly utilized to improve prediction robustness by aggregating multiple decision trees. Liu et al. (2021) [
20] applied RF to estimate chlorophyll-a concentrations in Poyang Lake. Li et al. (2022) [
21] used RF with Sentinel-2 imagery to predict total nitrogen and phosphorus levels in Chaohu Lake. Similarly, Wang et al. (2022) [
22] used RF to model suspended matter and nutrient dynamics in estuarine and watershed environments. More recent applications include Ghasemi et al. (2023) [
23], who optimized RF for turbidity prediction in coastal waters, and Mishra et al. (2024) [
24], who achieved high accuracy in predicting chlorophyll-a and turbidity in the Ganga River using Sentinel-2 data. These results collectively underscore RF’s reliability in satellite-based water quality assessment.
Artificial Neural Networks (ANNs) are inspired by biological neural processes and excel at modeling nonlinear and multivariate environmental relationships. Since 2012, ANNs have been applied extensively to predict various surface water quality parameters, such as chlorophyll-a, turbidity, total suspended solids (TSS), biochemical oxygen demand (BOD), and nutrient concentrations. Early studies by Liu et al. (2012) [
25] and Antanasijević et al. (2013) [
26] demonstrated improved accuracy through expanded datasets and advanced network designs, such as recurrent and generalized regression networks. Hybrid optimization approaches incorporating Principal Component Analysis (PCA) and genetic algorithms further enhanced performance, as shown by Ding et al. (2014) [
27] and Qiao et al. (2016) [
28]. Subsequent research integrated ANN with remote sensing data for water quality prediction. Amanollahi et al. (2017) [
29] applied an ANN with Landsat-8 imagery to estimate multiple water parameters in the Zarivar Wetland, while Sharaf El Din et al. (2017) [
30] employed a back-propagation ANN for turbidity prediction in the Saint John River, achieving high accuracy. Zhang et al. (2020) [
31] developed a Self-Adapting Selection of Multiple Neural Networks (SSNNs) model for hyperspectral data, yielding high accuracy across six water quality parameters. Haribowo et al. (2020) [
32] and Lv et al. (2020) [
33] also reported strong multi-year and nitrate concentration forecasts using optimized ANN structures and MODIS data. More recent works by Arıman (2021) [
34] and Elsayed et al. (2021) [
35] incorporated spectral indices and satellite imagery to robustly estimate nutrients and pollutants, reinforcing ANN’s versatility and strength in remote sensing-based water quality assessment.
Integrating remote sensing data with machine learning models, such as SVM and ANN, is a common approach for estimating surface water quality parameters, yet these conventional or “shallow” algorithms encounter significant challenges due to data complexity and heterogeneity. The scarcity, inconsistency, and noise of ground-truth observations, affected by atmospheric conditions and environmental factors, limit the models’ generalization capabilities, especially across diverse water bodies with varying turbidity, depth, and bottom reflectance.
Additionally, the high dimensionality of multispectral and hyperspectral data, reliance on manually engineered spectral features, and issues related to mixed pixels further degrade prediction accuracy and sensor transferability. These models also require extensive hyperparameter tuning, are prone to overfitting with limited data, and often act as “black boxes,” providing minimal physical insight.
Moreover, they struggle to accurately predict non-optically active water quality parameters like biochemical oxygen demand (BOD) and chemical oxygen demand (COD), especially under conditions with cloud cover and temporal data gaps, which complicates consistent, large-scale monitoring efforts [
8,
36]. Furthermore, most of these models operate on pixel-wise spectral inputs, focus on single parameters, and do not explicitly exploit spatial context or provide interpretability regarding which bands drive the predictions.
Consequently, recent research has shifted toward deep learning, particularly CNNs, which can automatically extract hierarchical spectral–spatial features, capture nonlinear patterns, and integrate heterogeneous data sources with minimal manual intervention [
8,
36]. These limitations motivate our use of a spectral–spatial CNN that can estimate multiple parameters simultaneously and can be interpreted using SHAP band-importance analysis.
2.2. Deep Learning (DL)-Based Studies
Deep learning, particularly convolutional neural network (CNN)-based architectures, can overcome these limitations by learning multi-level spectral and spatial representations from multispectral and hyperspectral data, thereby effectively modeling intricate spectral–spatial dependencies. When augmented with temporal architectures such as Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM) networks, these models can capture time-dependent variations in water quality parameters, providing a comprehensive understanding of dynamic environmental changes. Techniques like transfer learning and data augmentation further bolster model performance under conditions of limited labeled data, allowing models to leverage knowledge from larger datasets and generalize across different regions and sensor platforms [
17,
18]. Despite their inherently higher computational demands and the ongoing challenge of interpretability, deep learning models currently stand as the most effective tools for large-scale, data-driven water quality prediction, offering superior accuracy, finer spatial resolution, and enhanced generalization relative to traditional shallow algorithms. Their ability to process vast, complex datasets makes them invaluable for real-time, large-scale environmental monitoring and management efforts.
Deep learning techniques, particularly CNNs, have gained significant traction in water quality monitoring and remote sensing applications. Pu et al. (2019) [
37] utilized an AlexNet-style CNN to classify inland lake water quality levels from Landsat-8 imagery, achieving superior accuracy compared to traditional SVM and random forest classifiers, especially under limited training data through transfer learning. Syariz et al. (2019) [
38] applied CNNs to estimate chlorophyll-a concentrations in Laguna Lake, Philippines, confirming CNNs’ capability to model Chl-a variability despite sensor calibration challenges. Annala et al. (2020) [
39] trained a one-dimensional CNN to invert a stochastic model of leaf optical properties (SLOP) for chlorophyll-a and b retrieval, achieving strong correlations (R² = 0.97 and 0.95, respectively) from hyperspectral data. Wang et al. (2020) [
40] compared DenseNet, ResNet, and VGG architectures for water versus non-water classification, reporting that DenseNet offered the highest robustness in distinguishing water from shadows and clouds. Ilteralp et al. (2021) [
41] proposed a multitask semi-supervised CNN for Chl-a estimation that integrated monthly classification as an auxiliary task, enhancing temporal prediction consistency.
Zhang et al. (2021) [
42] further optimized CNN architectures for high-resolution water body segmentation using enhanced feature extraction modules to better delineate mixed and vegetated pixels. Lopez-Betancur et al. (2022) [
43] developed a hybrid CNN–Multiple Linear Regression (MLR) model using smartphone imagery to estimate total suspended solids (TSS) and turbidity, achieving remarkably high accuracies (R² = 0.982 and 0.972, respectively). Peña et al. (2023) [
44] introduced DeepAqua, a self-supervised CNN using SAR imagery and knowledge distillation to map wetland water extent without manual labels, emphasizing scalability for large datasets. Hong et al. (2023) [
45] designed a CNN framework to generate a global, daily, gap-filled chlorophyll-a dataset (2001–2021) through multi-sensor fusion, achieving strong agreement with reference data and improved spatial–temporal continuity. Ivanda et al. (2023) [
46] applied a one-dimensional CNN to estimate Secchi disk depth in the northern Adriatic Sea, reporting superior predictive power over alternative models. Zhu et al. (2024) [
47] employed Sentinel-2 imagery to invert chlorophyll-a in optically complex coastal waters, achieving R² = 0.779, outperforming four other algorithms.
Moon et al. (2024) [
48] used CNNs with multispectral satellite inputs to map TSS across large river and lake systems, achieving higher explained variance and lower RMSE compared to empirical models. Liu et al. (2025) [
49] extended CNNs to hyperspectral imagery for multi-parameter water quality detection, including DO, with high accuracy. Wang et al. (2020) [
50] developed a multiscale CNN for urban surface water extraction via Google Earth Engine, confirming CNNs’ scalability for automated mapping. Fang et al. (2024) [
51] showed that CNN performance in Chl-a retrieval improved when data were stratified by suspended particulate matter (SPM) levels. Recently, Pang et al. (2025) [
52] optimized a multiscale CNN using Sentinel-3 imagery for chlorophyll-a and total suspended matter (TSM) retrieval, attaining R² > 0.90, while Jeong et al. (2025) [
53] integrated optical Sentinel-2 and SAR Sentinel-1 imagery to estimate Chl-a across Korean lakes, achieving R² = 0.7992, thereby highlighting the benefit of multi-sensor fusion.
To conclude, these CNN-based studies show that deep learning can improve water quality mapping by using spectral–spatial features and, in some cases, multi-sensor inputs. Nevertheless, the majority of existing work targets one or two optically active parameters (often chlorophyll-a or turbidity), relies on pre-trained architectures, and rarely addresses non-optically active indicators such as DO in parallel with optically related variables. In this context, our study extends the field by proposing a customized CNN architecture that uses spectral–spatial patches to jointly estimate DO and TOC over a province-scale domain and to analyze the physical basis of the retrieval through SHAP.
3. Research Purpose and Contribution
This study expands on recent developments in surface water quality mapping by proposing a customized CNN architecture developed entirely in MATLAB (R2024a) and trained on Sentinel-2 and in situ data from New Brunswick to estimate both an optically related parameter, total organic carbon (TOC), and a more indirectly related parameter, dissolved oxygen (DO). Unlike most previous approaches, which relied on pre-trained or transfer-learned models (e.g., AlexNet, ResNet, or hybrid CNN–LSTM frameworks), the proposed network is specifically designed and optimized to address both optical and non-optical water quality parameters simultaneously. This dual-focus framework enables a direct performance comparison between parameters governed primarily by spectral reflectance, such as TOC, and those influenced by complex biogeochemical dynamics, such as DO.
The primary objectives of this study are to: (1) develop a province-scale Sentinel-2 mosaic and water body mask that provide a consistent spatial basis for surface water quality estimation; (2) implement MATLAB routines that synchronize in situ SWQP measurements (TOC and DO) with the corresponding Sentinel-2 imagery, ensuring accurate alignment of each station with its spectral data; (3) develop a MATLAB match-up algorithm that constructs a four-dimensional (4D) spectral–spatial training dataset capturing local neighborhood information around sampling sites in a form that a CNN can use effectively; (4) design and train the customized CNN in MATLAB to predict TOC and DO from the 4D Sentinel-2 in situ dataset and evaluate its predictive performance; and (5) generate high-resolution spatial distribution maps of TOC and DO over the extracted water bodies and assess whether the predicted patterns are consistent with known hydrological and ecological conditions.
The proposed MATLAB-driven pipeline ensures precise integration and alignment of remote sensing, deep learning, and field data, enabling robust modeling and high-resolution spatial visualization of water quality across the study area. By constructing a purpose-specific CNN architecture, the study contributes a novel, scalable, and effective framework for comprehensive water quality mapping, with potential applicability in diverse environmental monitoring contexts.
5. Methodology
The methodological framework of this study, illustrated in Figure 4, integrates the sequential processes for estimating SWQPs by combining Sentinel-2 satellite imagery with field-based measurements. The workflow encompasses remote sensing data acquisition and preprocessing, laboratory analysis of collected water samples, match-up of the multi-source data, development and training of a customized CNN model, and subsequent spatial mapping of predicted water quality parameters across the province of New Brunswick.
5.1. Image Mosaicking
The Sentinel-2 imagery utilized in this study consisted of Level-2A products, which have been both geometrically and atmospherically corrected to provide surface reflectance data consistent with ground conditions. Each image’s twelve spectral bands were stacked into a single multi-band TIFF file, with all pixels resampled to a uniform spatial resolution of 10 m to ensure spatial consistency. The eighteen resulting multi-band images were then mosaicked using QGIS software (version 3.38.1), producing a seamless mosaic that comprehensively covers the entire study area.
5.2. Water Pixel Segmentation
A water mask is a binary raster layer that differentiates water bodies from other land covers in satellite imagery, with water pixels assigned a value of ‘1’ (white) and non-water pixels set to ‘0’ (black). Its primary function is to isolate water surfaces so that surrounding land cover, whose reflectance characteristics differ markedly from those of water, does not skew the analysis. Various techniques exist for water masking, including band ratio indices such as the Normalized Difference Water Index (NDWI), Modified NDWI (MNDWI), and Water Ratio Index (WRI), as well as unsupervised machine learning classifiers such as K-means clustering, which was employed in this study. Applying a water mask is critical for improving accuracy in mapping SWQPs, as it restricts modeling and analysis strictly to relevant water pixels, thereby enhancing data reliability and model performance [
58,
59,
60].
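As an illustration of how an index and an unsupervised classifier can be combined into a binary mask, the following minimal Python sketch computes NDWI per pixel as (Green − NIR) / (Green + NIR) and then splits the NDWI values into two clusters with a plain one-dimensional K-means. The feature set used for clustering in this study is not specified, so clustering on NDWI alone is an assumption, and the function names (`ndwi`, `kmeans_two_clusters`, `water_mask`) are hypothetical. For Sentinel-2, the green and near-infrared inputs would correspond to bands B3 and B8.

```python
def ndwi(green, nir):
    """NDWI for one pixel: (green - nir) / (green + nir)."""
    total = green + nir
    return 0.0 if total == 0 else (green - nir) / total


def kmeans_two_clusters(values, iterations=20):
    """Plain 1-D K-means with K = 2; returns (low_center, high_center)."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        low_members = [v for v in values if abs(v - low) <= abs(v - high)]
        high_members = [v for v in values if abs(v - low) > abs(v - high)]
        if low_members:
            low = sum(low_members) / len(low_members)
        if high_members:
            high = sum(high_members) / len(high_members)
    return low, high


def water_mask(green_band, nir_band):
    """Binary mask (1 = water, 0 = non-water) from two reflectance rasters.

    Assumption: clustering is done on NDWI only, which may differ from the
    features used in the original MATLAB workflow.
    """
    index = [[ndwi(g, n) for g, n in zip(g_row, n_row)]
             for g_row, n_row in zip(green_band, nir_band)]
    flat = [v for row in index for v in row]
    low, high = kmeans_two_clusters(flat)
    # Water reflects more green than NIR, so it falls in the high-NDWI cluster.
    return [[1 if abs(v - high) < abs(v - low) else 0 for v in row]
            for row in index]
```

Calling `water_mask(green, nir)` on two equally sized reflectance grids returns a grid of the same shape containing 1 for pixels assigned to the high-NDWI cluster and 0 otherwise.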
5.3. Laboratory Analysis and Estimation of SWQPs Concentrations
This section describes the water quality measurements for DO and TOC obtained from the Government of New Brunswick’s routine monitoring program, with sampling and laboratory analyses conducted by the provincial environmental laboratory according to APHA standard methods [
57]. The authors did not perform the laboratory analyses but used the quality-controlled results provided by the monitoring program for match-up with Sentinel-2 imagery.
5.3.1. Dissolved Oxygen (DO)
DO is one of the most critical indicators of water quality, reflecting the amount of oxygen available in water for aquatic organisms and chemical processes [
57]. It provides essential insight into the ecological health of aquatic systems and the degree of organic pollution. DO is commonly measured either by direct instrumental analysis with a probe (such as a membrane electrode or an optical sensor) or by the Winkler titration method. In the Winkler method, oxygen in the sample reacts with manganese sulfate and alkaline iodide-azide solutions to form a precipitate, which, upon acidification, releases iodine equivalent to the amount of dissolved oxygen. The released iodine is then titrated with sodium thiosulfate to determine DO concentration [
57].
According to the APHA standard procedure, the measurement steps are: (1) Sampling: collect the water sample without air bubbles to prevent oxygen exchange; (2) Fixation: add reagents (manganese sulfate and alkaline iodide-azide) to fix the oxygen content; (3) Acidification: add sulfuric acid to dissolve the precipitate and release iodine; and (4) Titration: titrate the released iodine with sodium thiosulfate to determine the DO concentration.
5.3.2. Total Organic Carbon (TOC)
TOC is an important analytical parameter that quantifies the amount of organic carbon present in a water sample. It serves as a key indicator of water quality and is widely applied in environmental monitoring, wastewater treatment, and ecological assessments. Essentially, TOC represents the total concentration of carbon bound within organic compounds in pure water and other aqueous systems [
61].
The measurement of TOC is typically performed using a carbonaceous analyzer, which oxidizes the organic carbon present in the sample to carbon dioxide (CO₂) through either wet chemical oxidation or catalytic combustion. The generated CO₂ is then quantified using an infrared detector, or alternatively, it may be converted to methane (CH₄) and measured by a flame ionization detector. The resulting CO₂ or CH₄ concentration is directly proportional to the amount of organic carbon in the sample [
61].
According to the analytical procedures outlined by [
57], the general steps for determining TOC are as follows: (1) Sampling: collect representative water samples for analysis; (2) Oxidation: employ a carbonaceous analyzer to convert organic carbon in the sample to CO₂ via catalytic combustion or wet chemical oxidation; (3) Detection: measure the generated CO₂ directly using an infrared detector or after conversion to CH₄ using a flame ionization detector; and (4) Calculation: determine the TOC concentration by subtracting the inorganic carbon content from the total measured carbon. The above procedure provides a general framework, and specific analytical details may vary depending on the instrument model, calibration standards, and the methodological requirements prescribed by the testing laboratory.
5.3.3. Justification for Dissolved Oxygen (DO) and Total Organic Carbon (TOC) Focus
Dissolved Oxygen (DO) is a fundamental indicator of aquatic ecosystem health, reflecting the balance between oxygen production (photosynthesis and reaeration) and consumption (respiration and oxidation of organic matter). Low DO levels directly affect the survival, growth, and distribution of fish and invertebrates and are widely used in regulatory frameworks to identify stressed or impaired water bodies. Total Organic Carbon (TOC) quantifies the concentration of organic carbon present in the water column, integrating inputs from natural sources (e.g., wetlands, soils, vegetation) and anthropogenic activities (e.g., wastewater, land-use change). TOC influences light attenuation, thermal structure, and oxygen demand, and is therefore a key driver of biogeochemical processes in rivers and lakes [
57].
In this study, DO and TOC were selected for three main reasons. First, both parameters are routinely monitored across New Brunswick through the provincial water-quality program, providing a sufficiently dense and spatially extensive dataset to support training and validation of the proposed CNN. Second, DO is commonly treated as a non-optically active parameter, whereas TOC is more strongly linked to optically active constituents and water color, making them a complementary pair for assessing whether a single spectral–spatial architecture can handle both optical and non-optical indicators. Third, DO and TOC are directly relevant for management and regulatory decision-making in the study area, as they jointly characterize habitat suitability and organic matter dynamics in the province’s rivers and lakes. Other parameters, such as chlorophyll-a, turbidity, water clarity, total nitrogen, and total phosphorus, are also important for water-quality assessment and represent natural targets for future extensions of this framework.
5.4. Spatial and Temporal Match-Up Between Field Data and Satellite Imagery
To ensure temporal consistency between field measurements and satellite observations, a MATLAB-based match-up algorithm was developed to pair the acquisition dates of Sentinel-2 images with the corresponding in situ sampling data. The code automatically reads the geographical coordinates (longitude and latitude) of each sampling station from the Excel dataset and identifies the specific Sentinel-2 image covering that location. Given the seasonal sampling design of the Canadian Rivers Institute program and the revisit frequency of Sentinel-2, field measurements were matched to satellite acquisitions using a temporal window of up to two weeks on either side of each image date (|Δt| ≤ 14 days). For each station, all DO and TOC measurements falling within this window were retrieved, and the measurement closest in time to the image acquisition was assigned as the representative SWQP value. This approach provides a practical balance between temporal representativeness and sample size in a multi-season, province-wide analysis.
To evaluate the impact of the chosen window on DO, which is more sensitive to short-term meteorological and hydrodynamic variability, we compared DO summary statistics for match-ups with |Δt| ≤ 7 days and 7 < |Δt| ≤ 14 days and found no appreciable differences in mean or variance. Additionally, for DO, a sensitivity experiment using a restricted ±7-day window, where feasible, yielded very similar model performance (R² and RMSE) to the full-window configuration, indicating that the CNN predictions are robust to moderate temporal offsets around the acquisition date. Nevertheless, we acknowledge that in highly dynamic reaches, a narrower matching window would be preferable, and future work will investigate same-day or sub-weekly synchronization strategies and temporally explicit deep learning architectures when higher-frequency in situ data become available. In cases where no field data are available within the temporal window, the algorithm reports “No available measurements” for the corresponding location.
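The temporal matching rule can be sketched in a few lines of Python. This is only an illustration of the logic described in this section, not the MATLAB routine itself: the function name `match_measurement`, the tuple layout of the measurements, and the tie-breaking behavior (the first of two equally distant samples is kept) are assumptions. The default half-width of 14 days follows the |Δt| ≤ 14 days range used in the sensitivity comparison.

```python
from datetime import date

NO_DATA = "No available measurements"


def match_measurement(image_date, measurements, max_offset_days=14):
    """Return the value measured closest in time to image_date.

    measurements: list of (sample_date, value) tuples for one station.
    max_offset_days: assumed half-width of the matching window, in days.
    """
    candidates = [
        (abs((sample_date - image_date).days), value)
        for sample_date, value in measurements
        if abs((sample_date - image_date).days) <= max_offset_days
    ]
    if not candidates:
        return NO_DATA
    # min() keeps the first candidate when two offsets are equal.
    return min(candidates, key=lambda item: item[0])[1]


# Hypothetical station record: (sampling date, DO in mg/L).
station = [(date(2021, 7, 1), 8.1), (date(2021, 7, 18), 7.4), (date(2021, 8, 20), 9.0)]
print(match_measurement(date(2021, 7, 15), station))                     # 7.4
print(match_measurement(date(2021, 7, 15), station, max_offset_days=2))  # No available measurements
```

Setting `max_offset_days=7` reproduces the restricted window used in the DO sensitivity experiment.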
5.5. Construction of 4D Training Dataset
The input data are organized in the form of a four-dimensional array with dimensions B × K × K × W, where each element represents a multispectral image patch. Specifically, B denotes the number of spectral bands, K × K corresponds to the spatial dimensions of each patch centered on the target pixel, and W represents the SWQP associated with that pixel, such as DO or TOC. Given that Sentinel-2 Level-2A imagery provides 12 spectral bands, the value of B is set to 12. Because all pixels in the mosaic are resampled to a spatial resolution of 10 m and the average width of water bodies within the study area is approximately 160 m, a patch size of 16 × 16 pixels (16 × 10 m = 160 m) was selected.
All twelve Sentinel-2 Level-2A spectral bands were retained as input channels in the 4D array in order to allow the CNN to learn spectral–spatial relationships without prior manual exclusion of potentially informative wavelengths. At this model scale, the inclusion of all bands did not constitute a computational bottleneck, and we therefore favored using the complete spectral information provided by Sentinel-2. Together with the 16 × 16 patch size, this configuration ensures that each extracted patch adequately captures the full extent of typical water bodies, minimizing the occurrence of missing or no-data pixels within the patches and enhancing the spatial representativeness of the training data.
To construct the 4D array, a MATLAB-based code was developed. The process begins by reading the geographic coordinates (longitude and latitude) of each water sampling station from an Excel sheet, originally recorded in the WGS 84 system, and converting them into the UTM projection of the Sentinel-2 imagery for accurate spatial alignment. The corresponding pixel locations on the water body GeoTIFF images are then identified, and 16 × 16 patches are extracted around the central pixels. For each patch, spectral reflectance values from all twelve bands are collected, ensuring a comprehensive representation of the multispectral characteristics at each sampling location. This procedure is repeated for all stations to generate a consistent training dataset ready for CNN modeling.
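The pixel lookup and patch extraction steps can be illustrated with the following Python sketch. It assumes the station coordinates have already been converted to UTM (the WGS 84 to UTM conversion is omitted here because it needs a projection library), that the raster is north-up with square 10 m pixels, and that an even-sized 16 × 16 window is placed so the station pixel sits at index 8 in each direction. The function names and the choice to skip patches that cross the raster edge are hypothetical, not taken from the original MATLAB code.

```python
def world_to_pixel(easting, northing, origin_x, origin_y, pixel_size=10.0):
    """Map UTM coordinates to (row, col) in a north-up raster.

    origin_x, origin_y: UTM coordinates of the raster's upper-left corner.
    """
    col = int((easting - origin_x) // pixel_size)
    row = int((origin_y - northing) // pixel_size)
    return row, col


def extract_patch(image, row, col, size=16):
    """Cut a size x size window around (row, col) from every band.

    image: list of bands, each a list of rows (bands x height x width).
    Returns None when the window would cross the raster edge (an assumed
    policy; padding would be an alternative).
    """
    half = size // 2
    top, left = row - half, col - half
    height, width = len(image[0]), len(image[0][0])
    if top < 0 or left < 0 or top + size > height or left + size > width:
        return None
    return [[r[left:left + size] for r in band[top:top + size]]
            for band in image]


def build_dataset(image, stations, origin_x, origin_y):
    """Collect (patch, value) pairs for stations given as (easting, northing, value)."""
    samples = []
    for easting, northing, value in stations:
        row, col = world_to_pixel(easting, northing, origin_x, origin_y)
        patch = extract_patch(image, row, col)
        if patch is not None:
            samples.append((patch, value))
    return samples
```

Each returned patch has the layout bands × 16 × 16, so stacking the patches for all stations gives the four-dimensional array described above, with the measured DO or TOC value kept alongside each patch as its regression target.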
5.6. The Proposed CNN Architecture Design
The proposed CNN architecture is a deep learning model specifically tailored for the accurate prediction of SWQPs from Sentinel-2 multispectral imagery. CNNs are suitable for water quality mapping [
45,
46,
47,
52]. Unlike traditional methods that depend on spectral indices, this architecture learns contextual and local features directly from small neighborhood patches within the imagery, thereby improving model robustness across diverse water conditions. Although the network in this study is trained directly on the local match-up dataset rather than initialized from a pre-trained model, transfer learning could be used to fine-tune it with localized datasets from other geographic regions, which would promote generalizability and facilitate application in operational water management systems [
8,
36].
The proposed CNN was developed entirely in MATLAB (R2024a), distinguishing it from most prior approaches that typically depend on pre-trained models or transfer learning frameworks. Model inputs comprised pixel patches, each containing 12 spectral bands corresponding to the Sentinel-2 reflectance channels, with all data pre-normalized before training. The CNN architecture includes three convolutional layers, followed by a flattening layer and two fully connected dense layers. The first convolutional layer (Conv1) receives the input and applies 16 filters of size 3 × 3 with same padding and ReLU activation; since inputs are normalized, batch normalization is omitted, and no pooling is performed to preserve spatial resolution. The second convolutional layer (Conv2) employs 32 filters of 3 × 3 with same padding and ReLU activation, followed by batch normalization and a 2 × 2 max-pooling layer, reducing spatial dimensions. The third convolutional layer (Conv3) increases depth with 64 filters of 3 × 3, same padding, and ReLU activation, again followed by batch normalization and 2 × 2 max-pooling. The resulting feature maps are flattened into a 1024-length vector feeding into two fully connected layers: Dense1 contains 128 neurons with ReLU activation and 30% dropout for regularization, and Dense2 has 64 neurons with ReLU activation. The final output layer is a single neuron with a linear activation function to predict continuous SWQP values. Pooling operations are limited to those described to balance spatial detail preservation with computational efficiency.
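The layer dimensions quoted above can be checked with a short shape-tracing sketch in Python. It assumes a 16 × 16 × 12 input patch, 3 × 3 kernels with same padding, and non-overlapping 2 × 2 max-pooling, as described in the text. The helper names are hypothetical, and the parameter count covers only convolutional and dense weights and biases (batch normalization parameters are left out), so it is an estimate and not a figure reported by the authors.

```python
def conv_same(shape, filters, kernel=3):
    """3x3 'same' convolution: spatial size unchanged, depth = filters."""
    height, width, channels = shape
    params = kernel * kernel * channels * filters + filters  # weights + biases
    return (height, width, filters), params


def max_pool(shape, window=2):
    """Non-overlapping max pooling divides each spatial dimension by window."""
    height, width, channels = shape
    return (height // window, width // window, channels)


def dense(inputs, units):
    """Fully connected layer: returns output width and parameter count."""
    return units, inputs * units + units


def trace_network(patch=16, bands=12):
    """Follow a patch through the layers; batch-norm parameters are ignored."""
    shape = (patch, patch, bands)
    total = 0
    shape, params = conv_same(shape, 16)   # Conv1, no pooling
    total += params
    shape, params = conv_same(shape, 32)   # Conv2 + 2x2 max-pooling
    total += params
    shape = max_pool(shape)
    shape, params = conv_same(shape, 64)   # Conv3 + 2x2 max-pooling
    total += params
    shape = max_pool(shape)
    flat = shape[0] * shape[1] * shape[2]
    width = flat
    for units in (128, 64, 1):             # Dense1, Dense2, output neuron
        width, params = dense(width, units)
        total += params
    return shape, flat, total


print(trace_network())  # ((4, 4, 64), 1024, 164401)
```

The trace confirms that two pooling stages reduce the 16 × 16 patch to 4 × 4 × 64 feature maps, which flatten to the 1024-length vector stated in the text. Under these assumptions, most of the weights sit in the first dense layer.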
To enhance the generalization capability of the CNN and mitigate overfitting, data augmentation was applied to the training patches through random geometric transformations. Specifically, random rotations within a range of −10° to +10° and random scaling between 0.8× and 1.2× of the original size were introduced, simulating variations in viewing angles and spatial resolution. These augmentations enable the network to learn more invariant and robust spatial-spectral representations of water quality features, especially valuable when training data is limited. Without data augmentation, the network would overfit to training patch orientations rather than learning generalizable spectral–spatial relationships. Augmented training improved generalization by 8–10% in R2 value, confirming its value beyond object detection for learning orientation-invariant surface water quality patterns.
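The geometric augmentation can be illustrated with a small Python sketch. It is not the authors' implementation: nearest-neighbour resampling, rotation about the patch centre, zero fill for pixels mapped from outside the patch, and the function names are all assumptions made for illustration.

```python
import math
import random


def sample_augmentation(rng):
    """Draw one rotation angle (degrees) and one scale factor."""
    return rng.uniform(-10.0, 10.0), rng.uniform(0.8, 1.2)


def augment_band(band, angle_deg, scale, fill=0.0):
    """Rotate and scale one 2-D band about its centre (nearest neighbour)."""
    n_rows, n_cols = len(band), len(band[0])
    cy, cx = (n_rows - 1) / 2.0, (n_cols - 1) / 2.0
    cos_t = math.cos(math.radians(angle_deg))
    sin_t = math.sin(math.radians(angle_deg))
    out = [[fill] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            # Inverse mapping: find the source pixel for each output pixel.
            dy, dx = (r - cy) / scale, (c - cx) / scale
            src_r = int(round(cy - sin_t * dx + cos_t * dy))
            src_c = int(round(cx + cos_t * dx + sin_t * dy))
            if 0 <= src_r < n_rows and 0 <= src_c < n_cols:
                out[r][c] = band[src_r][src_c]
    return out


def augment_patch(patch, rng):
    """Apply the same random transform to every band of a patch."""
    angle_deg, scale = sample_augmentation(rng)
    return [augment_band(band, angle_deg, scale) for band in patch]
```

Using one random draw per patch keeps all spectral bands co-registered, so the augmentation changes geometry without altering the spectral signature of any pixel.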
Additionally, a dropout layer with a rate of 0.3 was applied after the first fully connected layer, randomly deactivating 30% of neurons during each training iteration. This technique prevents the CNN from over-relying on specific features, promotes more distributed learning, and reduces co-adaptation among neurons. The ensemble-like effect of dropout, where each training pass trains a slightly different subnetwork, improves prediction stability and robustness during inference when dropout is disabled. In this architecture, dropping approximately 38 neurons out of 128 in the first dense layer compels the remaining neurons to develop more meaningful feature representations, thereby enhancing the model’s ability to generalize across diverse water quality conditions.
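The expected number of deactivated neurons (0.3 × 128 ≈ 38) can be reproduced with a short simulation. This Python sketch assumes the common "inverted dropout" convention, in which surviving activations are rescaled by 1/(1 − rate) during training; the text does not state which convention the custom training loop used.

```python
import random


def dropout(activations, rate=0.3, training=True, rng=None):
    """Inverted dropout: zero each unit with probability `rate` during
    training and rescale survivors; act as the identity at inference."""
    if not training:
        return list(activations)
    rng = rng or random
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def mean_dropped(n_units=128, rate=0.3, trials=500, seed=0):
    """Average number of zeroed units per training pass (Monte Carlo)."""
    rng = random.Random(seed)
    ones = [1.0] * n_units
    total = sum(dropout(ones, rate, rng=rng).count(0.0) for _ in range(trials))
    return total / trials
```

The count of dropped units varies from pass to pass around its mean of 38.4, which is why each iteration effectively trains a slightly different subnetwork.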
Figure 5 illustrates the overall CNN architecture, highlighting convolutional feature extraction, flattening, and fully connected layers optimized for efficient and accurate SWQP prediction from Sentinel-2 imagery.
The CNN and associated preprocessing routines were implemented in the MATLAB environment to build on existing code developed for this study area. However, the architecture and workflow are platform-independent in principle and could be implemented in Python or other environments using standard deep learning libraries.
5.7. The Proposed CNN Training Algorithm
The proposed CNN model was trained using a custom training loop implemented in MATLAB (R2024a), employing the Adam (Adaptive Moment Estimation) optimization algorithm. Adam combines the advantages of Momentum and Root Mean Square Propagation (RMSProp), providing adaptive learning rates for each parameter and ensuring robust convergence. During the training process, the network parameters (weights and biases), collectively denoted by θ, are iteratively updated to minimize the loss function L(θ).
At each iteration $t$, the gradient of the loss function with respect to the model parameters is computed as
$$g_t = \nabla_{\theta} L(\theta_t)$$
where $g_t$ represents the partial derivatives of the loss function $L$ with respect to each trainable parameter $\theta$. The symbol $\nabla_{\theta}$ denotes the gradient operator, which measures how sensitively the loss changes with respect to changes in the network parameters [62].
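2. Adam Optimization
Adam maintains two moving averages of past gradients to stabilize updates: the first moment estimate $m_t$, representing the mean of gradients, and the second moment estimate $v_t$, representing the uncentered variance [63]. For each parameter $\theta$ at iteration $t$, the Adam update is defined as
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^{t}}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^{t}}$$
$$\theta_{t+1} = \theta_t - \alpha_t \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$
where
$g_t = \nabla_{\theta} L(\theta_t)$: Gradient (partial derivatives) of the loss function $L$ with respect to each parameter $\theta$. It represents how much each parameter influences the loss,
$\nabla_{\theta}$: Gradient operator of $L$ w.r.t. parameters $\theta$,
$\hat{m}_t$, $\hat{v}_t$: Bias-corrected first and second moment estimates,
$\alpha_t$: Learning rate at iteration $t$,
$\beta_1$: Exponential decay rate for the first moment (typically 0.9–0.95),
$\beta_2$: Exponential decay rate for the second moment (typically 0.99–0.999),
$\epsilon$: Small constant (e.g., $10^{-8}$) added for numerical stability.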
In this study, the gradient decay and squared gradient decay parameters were set to $\beta_1 = 0.92$ and $\beta_2 = 0.99$, respectively. These values balance convergence speed and stability, limiting oscillations and overshooting during optimization [63].
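As an illustration of the update rule, the following Python sketch applies bias-corrected Adam to a single scalar parameter, using the decay coefficients reported in Section 6.6 (0.92 and 0.99). It is a toy example on a quadratic loss with hypothetical function names, not the custom MATLAB training loop.

```python
import math


def adam_step(theta, grad, m, v, t, lr, beta1=0.92, beta2=0.99, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based iteration."""
    m = beta1 * m + (1.0 - beta1) * grad           # first moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad    # second moment estimate
    m_hat = m / (1.0 - beta1 ** t)                 # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def minimise_quadratic(start=5.0, target=3.0, steps=2000, lr=0.005):
    """Minimise L(theta) = (theta - target)^2 with Adam (toy example)."""
    theta, m, v = start, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (theta - target)
        theta, m, v = adam_step(theta, grad, m, v, t, lr)
    return theta
```

Because of the bias correction, the very first step has magnitude close to the learning rate regardless of the gradient scale, which is one reason Adam is relatively insensitive to the scaling of the loss.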
3. Learning Rate Scheduling
To promote stable convergence and avoid oscillations near the optimal point, an exponential learning rate decay strategy was adopted. The learning rate at epoch $t$ is given by:
$$\alpha_t = \alpha_0 \cdot \text{decayRate}^{\,t/\text{decayStep}}$$
where $\alpha_0$ = the initial learning rate, decayRate = 0.90, and decayStep = 300 epochs.
Thus, the learning rate decreases by 10% every 300 epochs, ensuring smoother convergence throughout training.
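The schedule can be sketched in a few lines of Python. The initial rate of 0.005 is taken from Section 6.6; whether the decay is applied in discrete steps or continuously is not stated in the text, so both variants are exposed through a hypothetical `staircase` flag.

```python
def learning_rate(epoch, initial_lr=0.005, decay_rate=0.90,
                  decay_step=300, staircase=True):
    """Exponentially decayed learning rate at a given epoch.

    staircase=True  -> rate drops by 10% at epochs 300, 600, ...
    staircase=False -> rate decays smoothly, reaching the same values
                       at multiples of decay_step.
    """
    exponent = epoch // decay_step if staircase else epoch / decay_step
    return initial_lr * decay_rate ** exponent
```

Both variants agree at multiples of 300 epochs: the rate is 0.0045 at epoch 300 and 0.00405 at epoch 600.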
4. RMSProp Background
RMSProp is an adaptive learning rate optimization algorithm used to train neural networks. It was developed by Geoffrey Hinton and introduced in his lecture notes (not an official paper), and it is now one of the standard optimizers used for deep learning. RMSProp improves on plain Stochastic Gradient Descent (SGD) by scaling the learning rate individually for each parameter based on the magnitude of recent gradients. This helps prevent large oscillations in parameter updates and makes training more stable, especially when dealing with nonstationary objectives as in CNNs [
64,
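65]. At iteration $t$, the RMSProp updates are defined as
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$
$$\theta_{t+1} = \theta_t - \alpha_t \frac{g_t}{\sqrt{v_t} + \epsilon}$$
where $v_t$ is the running average of squared gradients $g_t^2$ and $\beta_2$ is its decay rate. Adam improves upon RMSProp by incorporating momentum via first moment estimates ($m_t$) and bias correction terms.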
5. Loss Function
The CNN model was trained to minimize a masked Mean Squared Error (MSE) loss, which ignores invalid or missing data [
28,
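29]. The loss function is defined as
$$L(\theta) = \frac{1}{N_{\text{valid}}} \sum_{i} M_i \left(y_i - \hat{y}_i\right)^2$$
where
$y_i$: True (observed) value,
$\hat{y}_i$: Predicted value,
$M_i$: Binary mask (1 for valid patch, 0 for invalid),
$N_{\text{valid}}$: Number of valid samples.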
This masking ensures that missing or invalid data points do not influence the gradient computations or optimization process.
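A minimal Python sketch of a masked MSE is given below. The handling of an all-invalid batch (returning zero) is an assumption; the text does not say how that case was treated.

```python
def masked_mse(y_true, y_pred, mask):
    """Mean squared error over samples whose mask entry is 1."""
    n_valid = sum(1 for m in mask if m)
    if n_valid == 0:
        return 0.0  # assumption: an all-invalid batch contributes no loss
    squared_errors = (
        (yt - yp) ** 2
        for yt, yp, m in zip(y_true, y_pred, mask)
        if m  # masked entries are skipped, so NaN targets cannot propagate
    )
    return sum(squared_errors) / n_valid
```

For example, with targets `[1, 2, 3]`, predictions `[1, 4, 100]` and mask `[1, 1, 0]`, the large error on the third sample is ignored and the loss is (0 + 4) / 2 = 2.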
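6. Evaluation Metrics
To assess predictive accuracy and model generalization, two standard regression metrics were used: RMSE and $R^2$.
$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N} \left(y_i - \bar{y}\right)^2}$$
where
$\bar{y}$: Mean of observed values,
$N$: Number of samples, with $y_i$ and $\hat{y}_i$ denoting the observed and predicted values.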
Both metrics were computed separately for the training, validation, and testing subsets to evaluate accuracy, robustness, and generalization of the proposed CNN model.
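Both metrics follow their standard definitions and can be computed as in this short Python sketch (function names are illustrative).

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_obs) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the mean of the observations scores $R^2 = 0$, so values close to 1 indicate that most of the observed variance is explained.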
In developing the proposed CNN model, the patch dataset was dynamically partitioned at each training epoch into three subsets for training, validation, and testing using a random split at the station level. This strategy maximizes the use of the data and provides stable estimates of model skill; however, it does not explicitly control for spatial dependence among samples originating from the same river reach or lake. As a result, some degree of spatial autocorrelation between training and testing subsets may be present, and the reported performance metrics should be interpreted with this limitation in mind. More stringent spatially explicit validation strategies (e.g., leave-one-water-body-out or geographic blocking) are identified as important directions for future work to further quantify large-scale transferability.
The training subset was employed to optimize the network parameters by updating the weights and biases through the Adam (Adaptive Moment Estimation) optimization algorithm, which adaptively adjusts the learning rate based on the first and second moment estimates of the gradients [
63]. The validation subset was used to continuously monitor model performance, guide learning rate decay, and enable early stopping to prevent overfitting. Meanwhile, the testing subset was reserved for the independent evaluation of the model’s generalization performance on unseen data. This dynamic random partitioning approach improved the robustness of the training process and reduced sampling bias across epochs, resulting in a more reliable assessment of the model’s predictive capability for surface water quality estimation.
During training, model performance was monitored on the validation subset, and an early-stopping criterion was applied: if the validation loss failed to improve for a predefined number of epochs, training was stopped, and the parameters from the best epoch were retained. This procedure reduces the risk of classical overfitting to the training data, although it does not eliminate the potential optimism introduced by spatial autocorrelation under random data splitting.
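The early-stopping rule described above can be sketched as a small Python helper. The class name and interface are hypothetical, and the patience value must be supplied by the caller because the text does not report it.

```python
import copy


class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs,
    keeping a copy of the parameters from the best epoch."""

    def __init__(self, patience):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.best_params = None
        self.bad_epochs = 0

    def update(self, epoch, val_loss, params):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.best_params = copy.deepcopy(params)
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

After the loop ends, `best_params` holds the weights to restore, so the final model corresponds to the epoch with the lowest validation loss rather than the last epoch run.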
6. Results and Discussion
6.1. Extraction of Water Bodies
In this study, the k-means clustering algorithm was employed to segment water pixels from Sentinel-2 imagery by grouping pixels based on spectral similarity. Among the generated clusters, the cluster identified as water exhibited the most consistent and coherent representation of water bodies. The initial k-means water/non-water segmentation was subsequently inspected and refined manually to correct obvious misclassifications, primarily along complex shorelines, in narrow channels, and at isolated artifacts such as cloud shadows or small islands. These edits consisted of local additions or removals of contiguous pixel groups from the water mask and affected only a small fraction of the total water-masking area (visually estimated at a few percent), with the vast majority of water pixels originating directly from the k-means clustering. Although this refinement improves the visual accuracy of the mask, it introduces an element of operator dependence.
A limitation of the current workflow is that the manual refinement of the k-means water mask was performed by a single operator, and we did not conduct a formal inter-operator reproducibility assessment. Given the relatively small proportion of pixels modified, we expect the influence on the CNN training dataset to be limited; however, future work will aim to reduce this subjectivity further by adopting fully automated or rule-based refinement strategies (e.g., morphological filtering, connectivity constraints, or deep learning–based water segmentation) that yield reproducible masks without manual editing.
This crucial step ensured accurate delineation of water boundaries, which is essential for reliable water quality analysis.
Figure 6 illustrates the refined water mask clipped to New Brunswick’s borders, clearly depicting the water bodies used for subsequent analyses. The use of k-means for water segmentation is well-supported in recent research, proving effective in distinguishing water surfaces with high accuracy.
In this study, after generating the final water mask, all spectral bands from the Sentinel-2 imagery were normalized to a 0 to 1 scale using QGIS software. This normalization standardizes the reflectance values across the entire dataset, ensuring uniform input data suitable for CNN training. The normalized water pixels were then structured into a four-dimensional (4D) array that facilitates batch extraction and efficient feeding into the CNN model. This preprocessing step is critical for maintaining consistency in input data, improving model convergence and prediction accuracy. The normalization approach used in QGIS aligns with best practices for satellite imagery preparation, preparing multispectral data for advanced machine learning applications.
6.2. Concentrations of SWQPs
The concentrations of SWQPs were screened for outliers, which may result from incorrect data entry, equipment malfunctions, or other measurement errors. Based on a 95% confidence level, a critical Z value of 1.96 was chosen, so retained values fall within the interval μ ± 1.96σ (approximately μ ± 2σ), where μ and σ are the mean and standard deviation of the population. Any observation with a Z value greater than 1.96 or less than −1.96 was considered an outlier and removed [
66,
67,
68].
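The Z-score screening can be expressed compactly in Python. This sketch uses the population standard deviation, as stated in the text; the sample values in the usage note are invented for illustration only.

```python
import statistics


def remove_outliers(values, z_crit=1.96):
    """Keep values whose Z-score lies within [-z_crit, +z_crit]."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    if sigma == 0:
        return list(values)            # all values identical: nothing to remove
    return [x for x in values if abs((x - mu) / sigma) <= z_crit]
```

For instance, in the made-up series `[8.0, 8.5, 9.0, 8.2, 8.8, 9.1, 8.4, 8.6, 8.9, 30.0]` the value 30.0 has a Z-score of about 3.0 and is removed, while the other nine values are retained.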
Table 2 shows comprehensive descriptive statistics of the selected SWQPs. Concentrations ranged from 4.28 to 12.6 mg/L with a mean value of 8.659 mg/L for DO, and from 0.50 to 18.30 mg/L with a mean value of 6.414 mg/L for TOC.
6.3. MATLAB Implementation: In Situ Measurements and Satellite Imagery Match-Up
In this study, a MATLAB-based match-up algorithm was developed and implemented to align in situ water quality measurements with Sentinel-2 satellite imagery acquisition dates, ensuring temporal consistency for accurate model training. The algorithm matched field sampling stations to satellite images within a 2-week window; where a station had multiple measurements in this period, the SWQP value sampled closest to the image acquisition date was assigned, while stations without a measurement in the window were marked as “No available measurements”. This automated synchronization process provides a reliable, temporally coherent dataset for subsequent construction of four-dimensional input arrays and CNN training, thereby enhancing the overall accuracy and robustness of the predictive model. The MATLAB codes used in this study are available at Zenodo:
https://zenodo.org/records/18760111 (accessed on 24 February 2026).
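The matching rule can be illustrated with the following Python sketch, which is independent of the published MATLAB code. It assumes the 2-week window means at most 14 days on either side of the image date, and it returns `None` where the text uses the label “No available measurements”; both choices are assumptions.

```python
from datetime import date


def match_measurement(image_date, samples, window_days=14):
    """Return the value sampled closest to image_date within the window.

    samples: list of (sample_date, value) pairs for one station.
    Returns None when no sample falls inside the window.
    """
    candidates = [
        (abs((sample_date - image_date).days), value)
        for sample_date, value in samples
        if abs((sample_date - image_date).days) <= window_days
    ]
    if not candidates:
        return None
    # Smallest time gap wins; ties keep the first sample listed.
    return min(candidates, key=lambda pair: pair[0])[1]
```

A station sampled 14 days and 3 days before an image would be assigned the value from the sample taken 3 days before, while a sample taken five weeks later would be ignored.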
6.4. MATLAB Implementation: 4D Array Construction
The construction of the 4D training dataset offers a comprehensive and automated framework for preparing input data tailored for CNN modeling. The developed code systematically extracts spatial patches centered around each water sampling station, retrieves the corresponding spectral reflectance values across all Sentinel-2 bands, and organizes this data into structured four-dimensional arrays.
These 4D arrays capture spectral, spatial (patch size), and temporal (sampling stations/SWQPs) dimensions, forming the essential input for CNN training. The MATLAB codes used in this study are available at Zenodo:
https://zenodo.org/records/18760111 (accessed on 24 February 2026). Additionally, an auxiliary script was utilized to export and visually inspect the extracted dataset, verifying patch integrity and ensuring minimal occurrence of no-data pixels, particularly across narrow river stretches. This rigorous validation process guaranteed the spatial representativeness and completeness of the training data, bolstering the dataset’s reliability before CNN design and training. By integrating automatic spatial patch extraction, spectral data retrieval, and quality control, the approach presents a robust foundation for accurate and efficient prediction of SWQPs.
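The patch-extraction step can be sketched in Python using nested lists in place of MATLAB arrays. Several details are assumptions made for illustration: the image is indexed as `image[band][row][col]`, an even-sized patch spans `row - size/2` to `row + size/2 - 1`, and any patch that crosses the image edge or contains a no-data pixel is rejected outright (the text only states that no-data pixels were kept to a minimum).

```python
def extract_patch(image, row, col, size=16, nodata=None):
    """Cut a size x size patch around (row, col) from every band.

    Returns a [band][row][col] nested list, or None if the patch crosses
    the image edge or contains the no-data value.
    """
    half = size // 2
    r0, c0 = row - half, col - half
    n_rows, n_cols = len(image[0]), len(image[0][0])
    if r0 < 0 or c0 < 0 or r0 + size > n_rows or c0 + size > n_cols:
        return None
    patch = [[band[r][c0:c0 + size] for r in range(r0, r0 + size)]
             for band in image]
    if nodata is not None and any(
            v == nodata for band in patch for line in band for v in line):
        return None
    return patch


def build_dataset(image, stations, size=16, nodata=None):
    """Stack valid patches into a 4-D structure [sample][band][row][col].

    stations: list of (row, col, swqp_value) tuples.
    """
    patches, targets = [], []
    for row, col, value in stations:
        patch = extract_patch(image, row, col, size, nodata)
        if patch is not None:
            patches.append(patch)
            targets.append(value)
    return patches, targets
```

The four nesting levels correspond to station, spectral band, and the two spatial axes; the ordering of these axes in the MATLAB arrays may differ.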
6.5. CNN Data Splitting Ratios
The SWQPs dataset was divided into three distinct subsets to support the training and evaluation of the proposed CNN model. Specifically, 70% of the data was allocated for training to optimize the network parameters, 15% for validation to monitor model performance and guide hyperparameter tuning during the learning process, and the remaining 15% for testing to independently evaluate the model’s generalization capability on unseen samples. This structured partitioning facilitated a comprehensive assessment of the model’s effectiveness across different deep learning stages. A detailed summary of the data distribution, including the number of monitoring stations associated with each parameter, is presented in
Table 3.
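A 70/15/15 split at the station level can be sketched as follows. The seed, rounding rule, and function name are assumptions; the sketch only illustrates that whole stations, rather than individual patches, are assigned to each subset.

```python
import random


def split_stations(station_ids, seed=0, train=0.70, val=0.15):
    """Randomly assign station IDs to train, validation, and test subsets."""
    ids = list(station_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(train * len(ids))
    n_val = round(val * len(ids))
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])
```

Replacing the shuffle with a grouping by water body would turn the same function into the geographically blocked split discussed as future work.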
6.6. Analysis of the CNN Models
The customized CNN architecture for SWQPs prediction successfully demonstrates strong performance on the available dataset. The network architecture features three convolutional layers, each utilizing 3 × 3 kernels with increasing filter depths, 16 in the first layer, 32 in the second, and 64 in the third, facilitating the extraction of fine-grained local spatial features in early layers and progressively more abstract, complex spectral–spatial representations in deeper layers. This kernel size was chosen to preserve spatial resolution and promote computational efficiency while maintaining stable gradient flows during training. The ReLU activation function was applied throughout convolutional and dense layers for its properties of accelerating convergence and mitigating vanishing gradient issues, thereby enabling effective learning of nonlinear relationships between spectral reflectance and SWQPs.
Training employed the Adam optimizer with an initial learning rate of 0.005, decayed by 10% every 300 epochs, tuned to balance rapid convergence and stability. The optimizer’s decay coefficients ($\beta_1$ = 0.92, $\beta_2$ = 0.99) further refined training behavior, while a maximum epoch limit of 5000 bounded the training time and, together with early stopping, limited overfitting. Data augmentation and dropout techniques were integrated to improve the model’s generalization.
Table 4 summarizes the detailed network configuration, including layer-wise filters or neuron counts, kernel sizes, activations, padding, pooling, batch normalization, dropout, and output shapes. This carefully crafted design enables the CNN to effectively capture complex spectral and spatial variations in water quality from multispectral Sentinel-2 data, ensuring robust prediction performance with balanced computational demands.
The trained CNN model was rigorously evaluated across training, validation, and testing datasets using the coefficient of determination (R2) and root mean square error (RMSE) metrics, confirming its accurate prediction capability for SWQPs. For DO, RMSE values were 0.220, 0.249, and 0.266 mg/L (3.07% relative error), with corresponding R2 values of 0.980, 0.975, and 0.974 across training, validation, and test sets. Convergence was reached at epoch 4417 after approximately 9 h of training. For TOC, RMSEs of 0.330, 0.603, and 0.440 mg/L (6.86% relative error) and R2 values of 0.993, 0.983, and 0.981 were achieved. The model converged at epoch 3866, requiring roughly 8.5 h of computation. These consistently high R2 values and low RMSEs across all datasets underscore the model’s robustness and its ability to capture complex spectral and spatial patterns in the Sentinel-2 imagery indicative of water quality. These <7% testing relative errors across New Brunswick’s diverse watersheds demonstrate robust generalization.
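The quoted relative errors are reproduced when the test RMSE is divided by the mean observed concentration from Table 2. This definition is inferred from the reported numbers and is not stated explicitly in the text, so the sketch below should be read as a consistency check under that assumption.

```python
def relative_error_percent(rmse, mean_observed):
    """RMSE expressed as a percentage of the mean observed concentration."""
    return 100.0 * rmse / mean_observed


# Test RMSE and mean concentration (mg/L) as reported in the text.
do_relative = relative_error_percent(0.266, 8.659)
toc_relative = relative_error_percent(0.440, 6.414)
```

Rounded to two decimals, these give 3.07% for DO and 6.86% for TOC, matching the values quoted above.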
Visual comparisons between predicted and observed values illustrate the reliability and flexibility of the fully custom-built MATLAB implementation. The MATLAB codes used in this study are available at Zenodo:
https://zenodo.org/records/18760111 (accessed on 24 February 2026). This comprehensive evaluation demonstrates the CNN’s strong generalization ability and suitability for accurate SWQP estimation in diverse environmental settings, affirming the effectiveness of deep learning for water quality monitoring. The customized CNN architecture exhibited strong adaptability in predicting both optically active parameters like TOC and non-optically active parameters such as DO, underscoring its capability to integrate complex spatial-spectral information from Sentinel-2 multispectral imagery. Unlike traditional pixel-wise or regression-based approaches, the convolutional framework enables the network to capture contextual spatial dependencies by analyzing local neighborhoods of pixels. This allows the model to infer patterns related to water mixing, influences from adjacent land surfaces, and reflectance gradients, which are essential spatial characteristics impacting water quality. Such integration of local spatial context significantly enhances prediction accuracy and robustness, especially across heterogeneous and dynamic water bodies.
Visual fits (
Figure 7) and statistical indicators (
Table 5) confirm the stability and reliability of the CNN models, demonstrating strong predictive performance and generalization to unseen data. This highlights the advantages of CNNs in modeling complex environmental phenomena by leveraging hierarchical spatial-spectral feature extraction, leading to more accurate and operationally relevant water quality predictions.
While the CNN exhibits high predictive performance for DO and TOC across the randomly partitioned training, validation, and testing subsets, we recognize that the spatial structure of the dataset can lead to optimistic estimates when splits are not explicitly constrained by geography or hydrology. Because multiple stations may be located within the same river reach or lake, some spatial autocorrelation between subsets is likely, and the reported R2 and RMSE values may therefore overestimate performance under strict leave-water-body-out conditions. Future work will address this by implementing spatially explicit cross-validation schemes, such as geographic blocking or leave-one-water-body-out validation, to provide a more conservative assessment of the model’s ability to generalize to entirely unseen water bodies.
Although the proposed CNN exhibits high accuracy for both DO and TOC, this study did not perform a systematic benchmarking against alternative machine learning algorithms (e.g., Random Forest, Support Vector Regression, XGBoost) or empirical/semi-analytical retrieval models using identical training/test splits and input features. Consequently, the present work should be interpreted as demonstrating that a customized, patch-based CNN can achieve strong performance and spatially coherent maps at the provincial scale. A rigorous, multi-model comparison, incorporating tree-based ensembles and widely used empirical algorithms, represents an important direction for future research to more fully quantify the relative gains of deep learning in Sentinel-2-based water quality retrieval.
6.7. SHAP Analysis of the Proposed CNN
SHAP (SHapley Additive exPlanations) analysis was applied to interpret the CNN predictions and identify the contribution of each Sentinel-2 spectral band to SWQP estimation, providing transparency to the model’s “black-box” behavior. In this study, SHAP values were computed for all 12 spectral bands across the 16 × 16 pixel patches in the test dataset, revealing the spectral sensitivity of the CNN for DO and TOC predictions.
The SHAP workflow was conducted in Python using the trained CNN model exported from MATLAB, with normalized input patches to ensure consistency with training conditions. The analysis used Python 3.11 with SHAP 0.45.0 and ONNX Runtime 1.16.3. The trained MATLAB (R2024a) CNN was exported to ONNX format using exportONNXNetwork() to ensure exact preservation of architecture and weights. Numerical equivalence was verified by comparing MATLAB predict(net, X) against Python ONNX ort_session.run(X) on 10 validation patches (the statistical metrics were identical).
The DeepExplainer method of SHAP estimated the additive contribution of each spectral band to individual patch predictions, which were then aggregated to produce summary plots and feature importance rankings.
Figure 8 and
Figure 9 illustrate the SHAP beeswarm plots for DO and TOC, respectively, with each dot representing a patch colored by feature value and positioned to indicate its impact on predictions.
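One common way to turn per-pixel SHAP values into a band ranking is to average their absolute values over all pixels and patches. The Python sketch below illustrates that aggregation; the input layout `[patch][band][row][col]` and the choice of mean absolute value are assumptions, since the text does not specify how the authors aggregated.

```python
def band_importance(shap_values, band_names):
    """Rank bands by mean absolute SHAP value over all patches and pixels.

    shap_values: nested list indexed as [patch][band][row][col].
    Returns (band_name, mean_abs_shap) pairs, most important first.
    """
    totals = [0.0] * len(band_names)
    counts = [0] * len(band_names)
    for patch in shap_values:
        for b, band in enumerate(patch):
            for line in band:
                for value in line:
                    totals[b] += abs(value)
                    counts[b] += 1
    means = [t / n if n else 0.0 for t, n in zip(totals, counts)]
    return sorted(zip(band_names, means), key=lambda kv: kv[1], reverse=True)
```

Taking absolute values prevents positive and negative contributions from cancelling, so a band that pushes predictions strongly in both directions is still ranked as influential.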
Key findings include the notable contribution of NIR and Red Edge bands (B8A, B6, B8) to DO predictions, reflecting water properties influenced by vegetation, turbidity, and algal biomass. Although Dissolved Oxygen (DO) is commonly classified as a non-optically active parameter, its spatial and temporal variability is tightly linked to processes that do have strong optical signatures. In particular, phytoplankton and aquatic vegetation (photosynthetic oxygen sources) affect reflectance in the green, red, and red-edge/NIR bands; suspended and dissolved organic matter influence red, red-edge, NIR, and SWIR reflectance and are associated with oxygen-consuming organic loads; and water mass properties, mixing regimes, and stratification imprint characteristic patterns in multispectral water color and local spatial texture. The dominance of NIR and red-edge bands in the SHAP ranking for DO is consistent with this indirect physical basis, reflecting the importance of phytoplankton, suspended matter, and water color patterns as proxies for oxygen production and consumption processes rather than direct optical detection of DO. By ingesting 2D spectral–spatial patches, the proposed CNN can exploit these indirect but physically meaningful proxies, together with contextual information about adjacent land cover and watershed setting, to infer DO distributions, even though DO itself does not exhibit a distinct absorption feature in the Sentinel-2 bands.
Visible bands showed mixed impacts, with B4 (Red) and B3 (Green) positively associated with photosynthetic activity, while the coastal blue band (B1) slightly decreased predictions due to effects in shallow or turbid waters. Short-wave infrared bands (SWIR1 and SWIR2) contributed moderately by capturing water content and suspended solids, whereas the water vapor band (B9) had minimal influence. The observation that certain bands, such as the water vapor band B9, exhibit near-zero SHAP contributions indicates that they play little role in the final DO and TOC predictions and could be candidates for removal in streamlined models. In this study, we did not train ablation models with reduced band sets, since our primary objective was to evaluate the feasibility and accuracy of a full-spectrum, patch-based CNN framework. However, the SHAP-derived importance ranking provides a principled basis for future band-reduction experiments, in which subsets focusing on the most influential NIR, red-edge, visible, and SWIR bands could be tested to improve efficiency and possibly enhance robustness by eliminating redundancies.
This SHAP analysis not only quantifies spectral band importance but also provides ecological insights into the model’s decision-making process, enhancing confidence in its predictive ability for diverse water quality conditions. Overall, SHAP provides insight into how spectral bands contribute to the CNN predictions and helps interpret the links between Sentinel-2 reflectance and SWQP estimates.
The Sentinel-2 spectral bands play distinct and crucial roles in predicting water quality parameters like TOC, as supported by extensive research. The Narrow Near-Infrared (NIR) band (B8A) exhibits the strongest positive influence on TOC predictions since it is sensitive to suspended matter, algal biomass, and vegetation-related organic content. The Blue band (B2) also contributes positively, indicative of clearer water and surface scattering, enhancing TOC estimation. Short-Wave Infrared bands (SWIR1—B11, and SWIR2—B12) moderately affect predictions by capturing water absorption, sediment presence, and suspended solids. Red Edge bands (B5, B6, and B7) and the Green band (B3) detect vegetation, algal pigments, and chlorophyll, moderately influencing TOC estimations. The Coastal band (B1) and Water Vapor band (B9) have limited to mixed impacts due to atmospheric and shallow water effects. The Red band (B4) supports prediction by reflecting chlorophyll-related absorption. Collectively, these findings confirm that the CNN model leverages the spectral and ecological characteristics captured by Sentinel-2’s diverse bands to effectively model complex water quality dynamics like TOC. This spectral integration aligns with known bio-optical properties of water, supporting robust and interpretable remote sensing-based water quality assessments.
The SHAP analysis conducted in this study not only enhances the interpretability of the CNN model by quantifying the contribution of each Sentinel-2 spectral band to SWQP predictions but also provides strategic guidance for future sensor selection and band prioritization. By identifying the most influential bands, the analysis enables the potential reduction in spectral bands used in operational water quality monitoring without significantly compromising predictive accuracy. This reduction can streamline data processing and sensor design, optimizing resource use. Additionally, SHAP visualizations reveal non-linear interactions and synergistic effects between spectral bands and water quality parameters, offering deeper insights into complex spectral water quality relationships captured by the model. Understanding these interactions supports the design of more efficient and targeted remote sensing systems, improving both the cost-effectiveness and accuracy of water quality assessments. Thus, SHAP contributes not only to model transparency but also to practical decision-making in remote sensing application design and environmental monitoring strategies.
SHAP-based band importance also highlights opportunities to simplify the input configuration. While this work retained all 12 Sentinel-2 bands, future studies will explore ablation experiments guided by SHAP rankings to determine whether reduced band sets can maintain prediction accuracy while lowering computational cost and facilitating sensor design and band-selection decisions.
6.8. Spatial Mapping of Concentrations Using the Proposed CNN
The CNN-based spatial prediction of SWQPs was executed by processing the full Sentinel-2-derived water mask TIFF image on a pixel-by-pixel basis. The MATLAB codes used in this study are available at Zenodo:
https://zenodo.org/records/18760111 (accessed on 24 February 2026). Each pixel served as the center of a spatial patch, enabling the model to evaluate not only the pixel’s spectral reflectance across twelve Sentinel-2 bands but also the contextual spatial dependencies from its neighboring pixels. This patch-based approach enhances predictions by incorporating local spatial texture and water body heterogeneity, critical for accurately modeling spatial variations in water quality across diverse aquatic environments.
Such spatial context integration is essential since water quality indicators often exhibit local spatial correlations influenced by mixing patterns, shoreline effects, and sediment dispersal, which cannot be captured by pixel-wise analysis alone. This approach aligns with recent advances in remote sensing-based water quality monitoring using convolutional neural networks that leverage both spectral and spatial information for continuous mapping at high resolution over large water bodies. By applying the trained CNN model to each patch centered on every water pixel of the mask, the study generates fine-grained, spatially continuous water quality maps, thus providing practical outputs for environmental monitoring and decision-making.
Due to the large size of the Sentinel-2 image (approximately 68 GB), direct processing exceeded the available computing memory capacity. To overcome this, a custom MATLAB routine was developed to divide the image into smaller, manageable tiles. The trained CNN models were then sequentially applied to each tile, processing the water mask segment by segment until the entire image was covered. This tiling strategy enabled efficient handling of large-scale remote sensing data within memory limits while preserving the spatial continuity and ensuring complete coverage. Such an approach is commonly used in large geospatial data processing to balance computational feasibility and data integrity, allowing high-resolution spatial prediction without requiring excessive hardware resources.
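The tiling strategy can be sketched as a Python generator that yields tile boundaries. The `halo` margin is an assumption introduced here: reading each tile with a border of half the patch size is one way to let pixels near a tile edge still see their full neighbourhood, but the text does not describe how the MATLAB routine preserved continuity.

```python
def iter_tiles(n_rows, n_cols, tile_size, halo=0):
    """Yield (core, read) windows as (row0, row1, col0, col1) tuples.

    core: non-overlapping block whose pixels are predicted in this tile.
    read: core expanded by `halo` pixels (clipped to the image), i.e. the
          region loaded into memory so edge pixels have full context.
    """
    for r0 in range(0, n_rows, tile_size):
        for c0 in range(0, n_cols, tile_size):
            r1 = min(r0 + tile_size, n_rows)
            c1 = min(c0 + tile_size, n_cols)
            core = (r0, r1, c0, c1)
            read = (max(r0 - halo, 0), min(r1 + halo, n_rows),
                    max(c0 - halo, 0), min(c1 + halo, n_cols))
            yield core, read
```

Because the core windows partition the image, every water pixel is predicted exactly once, while peak memory use is set by the tile size plus halo rather than by the full image.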
The computations were performed on a workstation featuring an Intel Core i7-7820HQ processor (2.90 GHz, 8 cores) with 32 GB RAM. Despite the tiling optimization to handle the large Sentinel-2 image, the overall processing time exceeded 11 h. This lengthy runtime mainly results from the computational intensity of pixel-level evaluations for each spatial patch encompassing the water mask, reflecting the complexity of capturing spatial relationships and context around every pixel for accurate modeling of surface water quality parameters. Such processing times are typical in CNN applications for high-resolution satellite imagery requiring detailed spatial-spectral analyses.
Figure 10 and
Figure 11 display the final spatial concentration maps for DO and TOC, respectively, illustrating the spatial variability retrieved via this comprehensive processing approach.
Generating spatially continuous CNN predictions of SWQPs enables a detailed, high-resolution assessment across the entire study area, providing insight into spatial patterns and concentration gradients. For instance, the DO spatial distribution map for New Brunswick reveals significant variability, with concentrations ranging from 3.9 to 20.5 mg/L. Understanding these spatial patterns is essential for targeted environmental monitoring, resource management, and intervention strategies to protect and improve water quality. Such spatial distribution analyses align with geostatistical and GIS-driven water quality assessment practices used in water resource planning and ecosystem health evaluation.
The regional variability of DO aligns with hydrological and ecological conditions. Lower DO levels (3.8–8.3 mg/L), depicted in green, dominate northwestern inland areas along narrow rivers and confined channels characterized by slower flow, limited mixing, and elevated organic decomposition. These conditions, combined with turbidity and shading that reduce photosynthesis and with microbial respiration fueled by runoff, likely contribute to oxygen depletion in these zones.
Conversely, higher DO concentrations (>10 mg/L), shown in yellow to red, are prevalent in southern and southeastern regions, especially within large lakes and coastal waters where greater circulation, aeration, and photosynthetic activity occur. Open water facilitates atmospheric oxygen exchange, and cooler temperatures in these southern waters further increase oxygen solubility. Areas exhibiting low oxygen may signal eutrophic or stagnant conditions linked to nutrient enrichment.
Parallel spatial patterns for TOC demonstrate lower concentrations (0.7–5.8 mg/L) primarily in northern narrow rivers with less development and faster-flowing waters that limit organic carbon accumulation. Higher TOC levels (>7.9 mg/L) concentrate in southern lakes and vegetated catchments due to organic runoff, enhanced biological productivity, slower flow, and inputs from wetlands and peat soils. These patterns reflect the influence of land cover, hydrological connectivity, and catchment geomorphology on water quality.
These spatially continuous CNN-generated maps describe hydrological and ecological patterns that can support targeted monitoring and water resource management in New Brunswick. This aligns with established regional water quality assessments emphasizing the role of watershed characteristics and land use in aquatic ecosystem health.
This province-wide CNN framework demonstrates operational potential for routine water quality monitoring across large watersheds. Key practical impacts include: (1) early warning: detecting low DO concentrations (<5 mg/L) near Saint John industrial discharges, enabling targeted interventions; (2) regulatory compliance: generating standardized DO/TOC maps for New Brunswick environmental reporting; and (3) public access: providing high-resolution maps that support citizen science by non-professionals and official recreational water safety advisories for beaches, fishing areas, and drinking water intakes. Applied province-wide, this system could reduce traditional sampling costs while increasing spatial coverage. Integration with provincial dashboards would enable near-real-time water management decisions for agriculture, industry, and public health.
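As a minimal sketch of the early-warning use case, the snippet below flags water pixels whose predicted DO falls below the 5 mg/L threshold mentioned above and summarizes their extent. The function name `summarize_low_do`, the NaN convention for non-water pixels, and the 10 m pixel size are assumptions for illustration, not details taken from the study.

```python
import numpy as np

def summarize_low_do(do_map, threshold=5.0, pixel_size_m=10.0):
    """Summarize water pixels with predicted DO below `threshold` (mg/L).

    do_map       : float array (H, W) of predicted DO; NaN marks non-water pixels
    threshold    : DO threshold in mg/L (5 mg/L, as in the text)
    pixel_size_m : pixel edge length in metres (assumed 10 m here)
    """
    valid = ~np.isnan(do_map)
    low = np.zeros(do_map.shape, dtype=bool)
    # Compare only valid pixels so NaN values never enter the comparison.
    low[valid] = do_map[valid] < threshold

    n_water = int(valid.sum())
    n_low = int(low.sum())
    return {
        "low_mask": low,
        "n_water": n_water,
        "n_low": n_low,
        "fraction_low": n_low / n_water if n_water else 0.0,
        "area_low_km2": n_low * pixel_size_m ** 2 / 1e6,
    }

# Toy map: two of the five water pixels fall below 5 mg/L.
toy_do = np.array([[4.0, 6.0, np.nan],
                   [3.0, 12.0, 5.0]])
summary = summarize_low_do(toy_do)
```

The resulting boolean mask could be overlaid on the concentration map, and the area estimate gives a single figure that is easy to track between acquisition dates.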
7. Conclusions
This study developed a customized CNN architecture for simultaneous estimation and spatial mapping of two key surface water quality parameters, dissolved oxygen (DO) and total organic carbon (TOC), from Sentinel-2 multispectral imagery. Addressing the limitations of traditional and transfer-learned models, the proposed CNN integrates optical and non-optical features within a unified framework, leveraging spectral–spatial feature extraction optimized through adaptive training, data augmentation, and rigorous regularization. The model demonstrated robust predictive performance with high accuracy (R² > 0.97) and realistic spatial distribution patterns consistent with known hydrological and ecological dynamics across New Brunswick. Importantly, the study introduces a comprehensive, end-to-end pipeline, from Sentinel-2 preprocessing to province-scale spatial mapping, that offers a flexible alternative to traditional empirical or site-specific regression models, particularly for applications requiring province-wide, high-resolution predictions. Its patch-based design explicitly exploits local spatial context and handles DO and TOC, which differ in how directly they are expressed in the optical signal, within a single architecture. For DO, the CNN leverages optically observable proxies of oxygen dynamics, such as phytoplankton, turbidity, and spatial water-mass structure, rather than direct DO absorption. This indicates that the model captures coupled biogeochemical and optical processes rather than a purely empirical black-box relationship.
Moreover, applying SHAP explainability adds interpretability by revealing the spectral bands that drive model predictions and offering insights into ecological influences on water quality, which can guide future sensor design and data reduction.
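One common way to turn patch-level SHAP values into a band ranking is to average their absolute values over samples and spatial positions. The sketch below assumes the SHAP values have already been computed and stored in an array of shape (samples, patch height, patch width, bands); the function name `rank_bands`, the array layout, and the toy values are illustrative assumptions, not the study's results.

```python
import numpy as np

def rank_bands(shap_values, band_names):
    """Rank input bands by mean absolute SHAP value.

    shap_values : float array (n_samples, patch_h, patch_w, n_bands),
                  assumed to be precomputed by a SHAP explainer
    band_names  : list of n_bands labels, e.g. Sentinel-2 band identifiers
    """
    # Average |SHAP| over samples and the two spatial axes, leaving one score per band.
    importance = np.abs(shap_values).mean(axis=(0, 1, 2))
    order = np.argsort(importance)[::-1]  # most influential band first
    return [(band_names[i], float(importance[i])) for i in order]

# Toy attributions: the second band has the largest magnitude, with a negative sign.
toy_shap = np.empty((4, 3, 3, 3))
toy_shap[..., 0] = 0.1
toy_shap[..., 1] = -0.5
toy_shap[..., 2] = 0.2
ranking = rank_bands(toy_shap, ["B2", "B3", "B4"])
```

Using the absolute value keeps bands with strong negative contributions from cancelling out against positive ones. Bands near the bottom of such a ranking are natural candidates for the data reduction mentioned above.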
The proposed CNN pipeline offers several practical advantages over many existing empirical or single-parameter ML approaches, including its end-to-end design, use of spectral–spatial patches, ability to handle both DO and TOC simultaneously, and SHAP-based interpretability. These features make the framework particularly suitable for scalable, province-wide water quality services, even though a full quantitative benchmark against other ML algorithms remains a subject for future work.
Overall, this study advances remote sensing of water quality by delivering a transparent, adaptable, scalable, and reproducible deep learning framework capable of continuous, cost-effective, multiparameter assessment. These contributions offer substantial value for environmental monitoring, resource management, and sustainable aquatic ecosystem protection. Future research will explore temporal and regional extensions, additional water quality indicators, and hybrid deep learning architectures to enhance spatiotemporal generalization and accuracy.