1. Introduction
Globally, pollution and the scarcity of surface water are among the main environmental challenges facing humanity. The rapid urban growth recorded in recent decades has significantly increased the demand for drinking water. As a consequence of this population expansion and the associated industrialization, numerous surface water sources near urban centers have been depleted or have deteriorated in quality [1,2].
Water resources provide ecosystem services of high ecological and economic value to society [3]. However, these water bodies are highly vulnerable to pollution, especially when overexploited. In many urban contexts, rivers receive domestic and industrial wastewater discharges, which increases the pollutant load derived from anthropogenic activities [4]. Consequently, water quality monitoring and analysis have become priority research areas in recent years [5,6].
In response to concerns about water pollution, several countries have implemented continuous water quality monitoring campaigns. Their aim is to understand and prevent threats by collecting data that allow changes in the parameters that determine water quality to be analyzed [7,8,9]. Documenting current and emerging risks provides a basis for developing conservation and restoration strategies for inland water bodies [3]. Furthermore, environmental monitoring is considered the most suitable strategy for ensuring that management practices are sustainable [2,8].
The conventional process for water quality assessment involves field visits, sample collection at different locations, laboratory analysis, and subsequent comparison of the results with regulatory standards [7,10]. Nevertheless, this traditional method has limitations, such as the need for trained personnel and the cost of laboratory tests. Furthermore, access to certain water bodies is difficult or even impossible, and the process is time-consuming, so it cannot provide real-time updates [4,11,12]. An alternative to in situ monitoring is the observation of water bodies through remote sensing (using satellite imagery), which can provide a valuable complementary source of data at local and global scales. Remote sensing methods for measuring the quality of inland waters date back almost 50 years; since then, hundreds of publications have demonstrated promising remote sensing models for estimating the biological, chemical, and physical properties of inland water bodies [13].
In recent years, the field has grown exponentially, driven by open access to medium- and high-resolution satellite platforms such as Sentinel-2, Landsat, and MODIS, as well as by the development of airborne hyperspectral sensors. Several recent reviews agree that most studies have focused on estimating optically active parameters, particularly chlorophyll-a (Chl-a), total suspended solids (TSS), and colored dissolved organic matter (CDOM) [14,15,16]. This focus arises because these constituents directly modify water reflectance in the visible and near-infrared ranges through absorption and scattering, which facilitates their spectral modeling. In contrast, parameters such as surface temperature are retrieved from thermal infrared (TIR) emissions and are not optically active in the VIS–NIR domain [17,18]. Although temperature plays an important role in regulating biogeochemical processes (e.g., phytoplankton growth), it does not directly influence water-leaving reflectance and is therefore considered a non-optically active parameter [19]. However, as recent reviews of remote sensing applied to water monitoring point out [20,21], this thematic concentration has left a gap in the retrieval of dissolved nutrients and other non-optically active parameters, whose estimation depends on indirect relationships, spectral proxies, or advanced machine learning schemes [19].
In this context, approaches based on artificial intelligence and deep learning have improved the estimation of water quality parameters in specific case studies, particularly in continental systems with high spatial variability [22]. These models can capture nonlinear relationships between multispectral or hyperspectral reflectance and pollutant concentrations, in some cases outperforming strictly physical models. However, as noted by Jaywant and Arif (2024) and Wu et al. (2025), these approaches depend on large, well-calibrated datasets, offer limited physical interpretability, and may generalize poorly outside the training domain [20,23]. Although deep learning architectures have evolved toward hybrid approaches with more sophisticated attention mechanisms and optimization strategies, weaknesses persist in the systematic construction of variables, regional stability, and the standardization of validation schemes [22]. At the applied level, satellite monitoring has been shown to strengthen the operational management of reservoirs, although it requires continuous validation and workflows adapted to end users [24]; furthermore, cloud platforms improve calibration in data-scarce regions but demand rigorous quality control to avoid bias [25]. Systematic reviews consistently confirm that non-optically active parameters remain underrepresented and that comparable methodological frameworks guaranteeing reproducibility and transferability across contrasting hydrological contexts are still lacking [15].
Despite the rapid development of retrieval algorithms, considerably less attention has been devoted to systematically analyzing the image processing workflows that precede and condition model performance. Recent reviews highlight that preprocessing choices, such as atmospheric correction schemes, spectral feature construction, band selection, spatial resampling, and multi-sensor harmonization, vary widely across studies and directly influence retrieval accuracy and model stability [15,19]. In particular, large-scale implementations on cloud-based platforms demonstrate the importance of consistent quality control and harmonization procedures when integrating multi-temporal and multi-sensor datasets [25]. Methodological inconsistencies also extend to validation strategies, including differences in data partitioning schemes, cross-validation design, selection of performance metrics, uncertainty quantification, and transferability testing [22,24]. This heterogeneity complicates inter-study comparison, limits reproducibility, and constrains the operational scalability of proposed models, particularly in optically complex or data-scarce environments.
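The difference between random and grouped data partitioning, one of the validation choices listed above, can be made concrete with a short Python sketch. It contrasts a random K-fold split with a leave-one-water-body-out split, in which every sample from a given lake or river is held out together, a common way to probe transferability. The function names and the toy lake labels are hypothetical and are not taken from any reviewed study.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def random_kfold(n_samples: int, k: int, seed: int = 0) -> Iterator[Split]:
    """Random K-fold: samples from the same water body can land in both train and test."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for j, fold in enumerate(folds) if j != f for i in fold)
        yield train, test


def leave_one_group_out(groups: Sequence[str]) -> Iterator[Split]:
    """Hold out all samples of one water body at a time (a transferability test)."""
    members: Dict[str, List[int]] = {}
    for i, g in enumerate(groups):
        members.setdefault(g, []).append(i)
    for g in sorted(members):
        test = members[g]
        train = [i for i in range(len(groups)) if groups[i] != g]
        yield train, test


if __name__ == "__main__":
    # Hypothetical matchup table: each sample is tagged with its water body.
    lakes = ["Taihu", "Taihu", "Poyang", "Poyang", "Geneva", "Geneva"]
    for train, test in leave_one_group_out(lakes):
        held_out = {lakes[i] for i in test}
        print(f"held out {held_out}: train={train} test={test}")
```

Because samples from the same water body tend to share optical conditions and atmospheric correction behavior, random K-fold scores may be optimistic relative to the grouped scheme, which is one reason the partitioning scheme should always be reported.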
In summary, recent evidence confirms substantial progress in spectral modeling and in the incorporation of artificial intelligence [19,22]. However, persistent challenges remain, related to multi-sensor harmonization and large-scale processing consistency [25], robust retrieval of non-optically active parameters [15], the lack of standardized validation frameworks and transferable modeling strategies [22], and operational constraints linked to atmospheric correction and in situ validation requirements [24]. These limitations are particularly critical in highly turbid or optically heterogeneous waters, where model uncertainty remains significant.
While several recent reviews have focused primarily on model architectures, sensors, or comparative performance metrics, this review emphasizes the structural role of preprocessing choices and validation design in shaping reported model performance. The intention is not to identify optimal algorithms but to highlight how upstream methodological decisions condition downstream results and uncertainty. Accordingly, the present study aims to provide a structured synthesis of the methodological approaches used in recent literature on water quality estimation in lakes and rivers using remote sensing and artificial intelligence, with an emphasis on image processing and validation strategies. Specifically, this review seeks to (i) synthesize the main methodological approaches, from image acquisition and correction to predictive modeling and statistical evaluation; (ii) comparatively analyze the artificial intelligence algorithms applied to optically active and non-optically active parameters; and (iii) identify trends, limitations, and standardization opportunities that strengthen the reproducibility and applicability of these models in aquatic systems.
3. Water Quality Parameters Estimable by Remote Sensing
The use of remote sensing in water quality monitoring relies on sensors capable of measuring the spectral response of water, and it has expanded rapidly because it can cover large water bodies in shorter timeframes and at lower cost [26,27]. In addition, the optical properties of water make it possible to estimate parameters associated with changes in water composition, providing relevant information for simulation and forecasting models in water quality studies [17,28] (Table 1).
Two types of optical properties of water are analyzed: inherent optical properties (IOPs) and apparent optical properties (AOPs) [17,18]. IOPs are physical quantities that describe how photons interact with the aquatic medium; they depend only on the medium itself and are independent of the ambient light field. They are easy to define but can be extremely difficult to measure, especially in the field; the most common IOPs are the absorption and scattering coefficients [29]. AOPs, in contrast, depend on both the medium and the directional structure of the ambient light field; they are generally much easier to measure but harder to interpret because they vary with environmental conditions. The most common AOPs are the diffuse attenuation coefficient for downwelling irradiance (Kd) and reflectance [17].
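Because Kd is defined through the exponential decay of downwelling irradiance with depth, Ed(z) = Ed(0) exp(-Kd z), it can be estimated from a measured profile with a log-linear least-squares fit. The Python sketch below assumes a single, depth-independent Kd over the fitted layer; the function name is illustrative and the profile values are synthetic, not field data.

```python
import math
from typing import Sequence


def estimate_kd(depths_m: Sequence[float], irradiance: Sequence[float]) -> float:
    """Fit ln(Ed(z)) = ln(Ed(0)) - Kd * z by ordinary least squares; returns Kd in 1/m.

    Assumes Kd is constant over the sampled depth range.
    """
    if len(depths_m) != len(irradiance) or len(depths_m) < 2:
        raise ValueError("need at least two paired depth/irradiance values")
    if any(e <= 0 for e in irradiance):
        raise ValueError("irradiance must be positive to take logarithms")
    z = list(depths_m)
    y = [math.log(e) for e in irradiance]
    z_mean = sum(z) / len(z)
    y_mean = sum(y) / len(y)
    sxy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    sxx = sum((zi - z_mean) ** 2 for zi in z)
    if sxx == 0:
        raise ValueError("depths must not all be equal")
    return -sxy / sxx  # Kd is the negative slope of ln(Ed) versus depth


if __name__ == "__main__":
    # Synthetic profile generated with Kd = 0.8 1/m and Ed(0) = 1000 (arbitrary units).
    depths = [0.5, 1.0, 1.5, 2.0, 3.0]
    ed = [1000 * math.exp(-0.8 * d) for d in depths]
    print(round(estimate_kd(depths, ed), 3))
```

Kd varies with wavelength and, in stratified or turbid waters, with depth, so such fits are typically made per band and over a limited near-surface layer.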
In remote-sensing-based water quality studies, water bodies are commonly classified according to their optical properties, reflecting the relative contributions of different constituents to the overall spectral signal. Optically active parameters that contribute to the total water-leaving radiance include phytoplankton (primarily represented by chlorophyll-a), organic and inorganic suspended solids, and CDOM, together with the related property of water clarity (see Table 1). The relative proportions and interactions of these constituents largely determine variations in water clarity, light attenuation, and spectral reflectance patterns, and they are therefore widely used as indicators of water quality status [13,30]. Their combined influence governs absorption and backscattering within the water column, forming the bio-optical basis for most remote sensing retrieval algorithms applied to inland waters.
Optically inactive parameters do not exhibit a direct spectral signature detectable by conventional optical sensors; nevertheless, they may be indirectly related to optically active constituents. On this basis, remote-sensing-based models have been developed to estimate dissolved nutrients (e.g., nitrogen and phosphorus species), dissolved oxygen, and even certain heavy metals through proxy variables and machine learning approaches. However, studies addressing these parameters remain comparatively limited, and their retrieval performance is often highly region-specific [28,31].
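A minimal Python sketch of the proxy idea, assuming (hypothetically) that a non-optically active variable such as total phosphorus (TP) follows a power-law relation with TSS in one water body. The relation is calibrated on paired in situ samples and then applied to satellite-derived TSS. The power-law form, the function names, and all values are illustrative assumptions; the fact that such a relation must be recalibrated for each region is exactly why proxy-based retrievals tend to be region-specific.

```python
import math
from typing import List, Sequence, Tuple


def fit_power_law(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = a * x**b by least squares in log-log space; returns (a, b)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b


def predict_proxy(tss_from_satellite: Sequence[float], a: float, b: float) -> List[float]:
    """Apply the calibrated (hypothetical) proxy relation to satellite-retrieved TSS."""
    return [a * t ** b for t in tss_from_satellite]


if __name__ == "__main__":
    # Hypothetical in situ pairs from one lake: TSS (mg/L) and TP (mg/L), noise-free.
    tss_insitu = [5.0, 12.0, 20.0, 35.0, 60.0]
    tp_insitu = [0.02 * t ** 0.6 for t in tss_insitu]
    a, b = fit_power_law(tss_insitu, tp_insitu)
    print(round(a, 4), round(b, 3))
    print([round(v, 4) for v in predict_proxy([8.0, 45.0], a, b)])
```

Errors in the satellite TSS retrieval pass directly into the proxy estimate, so the uncertainty of a non-optically active parameter retrieved this way is at least that of its optical proxy.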
The optical characteristics of inland waters are frequently dominated by variable sediment loads and CDOM, resulting in highly complex absorption and scattering regimes. This optical heterogeneity complicates inversion and reduces the transferability of globally developed algorithms [32]. Consequently, parameter retrieval models are rarely universal, and regional calibration using in situ measurements remains essential to ensure reliability and reduce uncertainty [19,30].
To synthesize these optical interactions and retrieval pathways, Figure 2 presents a conceptual framework linking IOPs and AOPs to the spectral response of inland waters and the subsequent estimation of water quality parameters. The scheme highlights how absorption and scattering govern water-leaving reflectance, which underpins the direct retrieval of optically active constituents such as chlorophyll-a, suspended solids, and CDOM [13,30]. In contrast, non-optically active parameters, such as nutrients, biochemical oxygen demand (BOD), chemical oxygen demand (COD), and trace metals, lack a direct spectral signature and are typically inferred through indirect relationships, proxy variables, or data-driven models, which often reduces their transferability across water bodies with different optical regimes.
Figure 2 also emphasizes that atmospheric correction and remote sensing reflectance (Rrs) retrieval are not neutral preprocessing steps but major sources of uncertainty that condition downstream model performance and comparability across studies.
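The IOP-to-reflectance link summarized in Figure 2 is often written with a quadratic bio-optical approximation, in which below-surface remote sensing reflectance depends on u = bb / (a + bb), followed by a conversion to above-surface Rrs. The Python sketch below assumes the coefficients g0 = 0.0949 and g1 = 0.0794 commonly attributed to Gordon et al. (1988) and the above/below-surface conversion Rrs = 0.52 rrs / (1 - 1.7 rrs) used in quasi-analytical algorithms; other coefficient sets exist, and the absorption and backscattering values are synthetic.

```python
from typing import List, Sequence

# Assumed Gordon-type coefficients (sr^-1); other published sets are also in use.
G0, G1 = 0.0949, 0.0794


def subsurface_rrs(a: float, bb: float, g0: float = G0, g1: float = G1) -> float:
    """Below-surface remote sensing reflectance: rrs = g0*u + g1*u**2, u = bb/(a+bb)."""
    u = bb / (a + bb)
    return g0 * u + g1 * u * u


def above_surface_rrs(rrs: float) -> float:
    """Convert below-surface rrs to above-surface Rrs = 0.52 rrs / (1 - 1.7 rrs)."""
    return 0.52 * rrs / (1.0 - 1.7 * rrs)


def forward_rrs(a: Sequence[float], bb: Sequence[float]) -> List[float]:
    """Per-band forward model from total absorption a and backscattering bb (both 1/m)."""
    return [above_surface_rrs(subsurface_rrs(ai, bi)) for ai, bi in zip(a, bb)]


if __name__ == "__main__":
    # Synthetic IOPs at three bands; higher backscattering (e.g., sediment) raises Rrs.
    a_tot = [0.9, 0.6, 0.5]
    print(forward_rrs(a_tot, [0.01, 0.01, 0.01]))
    print(forward_rrs(a_tot, [0.10, 0.10, 0.10]))
```

Semi-analytical retrievals invert a relation of this kind, whereas purely empirical or machine learning models learn the reflectance-to-concentration mapping without it.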
4. Remote Sensing Platforms and Sensors for Lakes and Rivers
Earth observation systems use diverse sensor technologies, which fall into two broad categories: optical and microwave systems. In inland water quality assessments, optical sensors remain the main source of information because they are sensitive to spectral variations induced by water constituents [15,19]. Optical sensors operate in the visible (VIS), near-infrared (NIR), shortwave infrared (SWIR), and thermal infrared (TIR) portions of the electromagnetic spectrum. Multispectral VIS–NIR sensors record reflected solar radiation and are well suited for estimating optically active parameters, including Chl-a, TSS, and CDOM [19]. The low reflectance of water in the NIR and SWIR regions facilitates water delineation and enhances contrast with adjacent land surfaces, providing the spectral basis for many retrieval algorithms. Nonetheless, optical sensors are constrained by cloud cover and solar illumination, which restricts observations to daytime, clear-sky conditions. Thermal infrared sensors, which detect emitted radiation that depends on surface temperature, are widely applied to study thermal stratification and surface heating patterns in lakes and reservoirs [24]. Unlike sensors operating in the visible-to-shortwave infrared (VIS–SWIR) range, TIR sensors do not depend on solar illumination and can therefore be used during both daytime and nighttime [28]. On platforms that integrate both optical and thermal sensors (e.g., Landsat missions), daytime acquisitions additionally provide complementary spectral information from the VIS–NIR–SWIR bands, enabling a more comprehensive interpretation of water quality dynamics [24,25]. Improvements in radiometric calibration and atmospheric correction have made satellite-based surface temperature products more consistent over inland environments. Microwave instruments (e.g., synthetic aperture radar, SAR) operate independently of solar illumination and can penetrate clouds. Although SAR is not widely used for the direct estimation of optically active water quality parameters, it can contribute to flood mapping and hydrodynamic monitoring, which affect sediment transport and nutrient dynamics [15].
The Landsat series provides long-term multispectral observations at moderate spatial resolution (15–30 m) across visible, near-infrared, and shortwave infrared bands. Thermal infrared bands are also available for surface temperature retrieval; on Landsat 8 and 9, these are acquired at a native spatial resolution of 100 m and are commonly resampled to 30 m to match the multispectral products. Recent studies highlight that the improved radiometric resolution and stray-light correction of newer missions enhance sensitivity over low-reflectance targets such as inland waters and improve atmospheric correction performance [22]. The Sentinel-2 MultiSpectral Instrument (MSI) has greatly expanded inland water applications owing to its higher spatial resolution (10–20 m) and its red-edge bands, which help detect phytoplankton and blooms [19,22]. Its shorter revisit time relative to Landsat improves the monitoring of dynamic processes in lakes and rivers. MODIS provides high-temporal-resolution observations that are well suited to large lakes and regional bloom monitoring, although its coarse spatial resolution limits its application in smaller or heterogeneous inland water bodies [15]. When selecting a remote sensing platform, the trade-off between spatial, spectral, and temporal resolution is resolved according to the characteristics of the target water bodies. Recent empirical work indicates that multi-sensor integration, notably of Landsat-8 and Sentinel-2 imagery, may help densify temporal records and enhance the consistency of surveillance. However, differences in spectral band definitions, radiometric responses, and atmospheric correction outputs introduce harmonization challenges that must be overcome to ensure comparability and reproducibility [25]. The lack of standardized harmonization protocols remains a critical methodological limitation that restricts long-term transferability across varied hydrological contexts.
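A minimal harmonization sketch in Python, assuming co-registered, near-simultaneous Sentinel-2 and Landsat-8 reflectance for the same band over the same water pixels, with the Sentinel-2 grid aligned to the Landsat grid. Sentinel-2 is first aggregated from 10 m to 30 m by 3 × 3 block averaging, and a per-band linear adjustment (slope and intercept fitted by least squares) then maps it onto the Landsat scale. This is a generic cross-calibration approach, not a published protocol; real workflows also handle grid alignment, bandpass differences, and viewing-geometry effects, which are omitted here, and the data are synthetic.

```python
import numpy as np


def block_mean(img: np.ndarray, factor: int) -> np.ndarray:
    """Aggregate a 2-D array by averaging non-overlapping factor x factor blocks.

    Rows/columns that do not fill a complete block are dropped.
    """
    h = (img.shape[0] // factor) * factor
    w = (img.shape[1] // factor) * factor
    trimmed = img[:h, :w]
    return trimmed.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def fit_band_adjustment(src: np.ndarray, ref: np.ndarray) -> tuple[float, float]:
    """Least-squares slope and intercept such that ref ~= slope * src + intercept."""
    valid = np.isfinite(src) & np.isfinite(ref)
    slope, intercept = np.polyfit(src[valid], ref[valid], deg=1)
    return float(slope), float(intercept)


def harmonize(s2_10m: np.ndarray, l8_30m: np.ndarray) -> tuple[np.ndarray, float, float]:
    """Resample one Sentinel-2 band to 30 m, then linearly adjust it to Landsat-8."""
    s2_30m = block_mean(s2_10m, 3)
    ref = l8_30m[: s2_30m.shape[0], : s2_30m.shape[1]]
    slope, intercept = fit_band_adjustment(s2_30m, ref)
    return slope * s2_30m + intercept, slope, intercept


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s2_red = rng.uniform(0.01, 0.10, size=(9, 9))    # synthetic 10 m band
    l8_red = 0.95 * block_mean(s2_red, 3) + 0.002     # synthetic 30 m reference
    adjusted, slope, intercept = harmonize(s2_red, l8_red)
    print(round(slope, 3), round(intercept, 4))
```

Fitting the adjustment on the target water bodies, rather than reusing land-derived coefficients, is a design choice that trades generality for accuracy over low-reflectance water; reporting which option was used is part of the harmonization documentation discussed above.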
The spectral characteristics of major satellite platforms and optically active water constituents are summarized in Figure 3. Figure 3a illustrates how Landsat 8 OLI/TIRS and Sentinel-2 MSI bands are distributed across atmospheric transmission windows in the visible, near-infrared, shortwave infrared, and thermal infrared regions. Figure 3b shows representative spectral responses of key inland water components, including chlorophyll-a, open water, non-algal particles or suspended sediments, and CDOM. Note that the thermal infrared region lies at much longer wavelengths (around 10–12.5 µm for the Landsat TIRS bands) than the reflective VIS–NIR–SWIR region (conventionally expressed in nanometers) and is therefore represented separately in the spectral interpretation. Together, the elements of Figure 3 highlight why band selection, atmospheric preprocessing, and sensor-specific spectral configurations are critical for reliable water quality retrieval, particularly when optically active constituents are estimated directly or used as proxies for non-optically active variables.
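To illustrate how band selection turns into feature construction, the Python sketch below computes two quantities from Sentinel-2 reflectance: a Normalized Difference Chlorophyll Index, NDCI = (R_B5 - R_B4) / (R_B5 + R_B4), pairing the red-edge band B5 (about 705 nm) with the red band B4 (about 665 nm), and a red/green ratio (B4/B3) sometimes used as a suspended-matter proxy. It also includes an exhaustive search over normalized-difference band pairs, the kind of empirical band-combination screening described in the reviewed literature. Function names and reflectance values are hypothetical; which feature suits a given lake remains an empirical question.

```python
import itertools

import numpy as np


def normalized_difference(a, b):
    """(a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom != 0, (a - b) / denom, np.nan)


def ndci(b5_red_edge, b4_red):
    """NDCI-style index from Sentinel-2 B5 (red edge) and B4 (red)."""
    return normalized_difference(b5_red_edge, b4_red)


def red_green_ratio(b4_red, b3_green):
    """Simple B4/B3 ratio, NaN where green reflectance is zero."""
    b4 = np.asarray(b4_red, dtype=float)
    b3 = np.asarray(b3_green, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(b3 != 0, b4 / b3, np.nan)


def best_band_pair(bands: dict, target: np.ndarray) -> tuple:
    """Return (band1, band2, r) for the normalized difference with the highest |Pearson r|."""
    best = ("", "", 0.0)
    for (n1, x1), (n2, x2) in itertools.combinations(bands.items(), 2):
        nd = normalized_difference(x1, x2)
        ok = np.isfinite(nd) & np.isfinite(target)
        if ok.sum() < 3:
            continue
        r = float(np.corrcoef(nd[ok], target[ok])[0, 1])
        if abs(r) > abs(best[2]):
            best = (n1, n2, r)
    return best


if __name__ == "__main__":
    bands = {  # hypothetical surface reflectance at four sampling points
        "B3": np.array([0.02, 0.03, 0.04, 0.05]),
        "B4": np.array([0.01, 0.015, 0.02, 0.03]),
        "B5": np.array([0.012, 0.02, 0.03, 0.05]),
    }
    print(ndci(bands["B5"], bands["B4"]))
    print(red_green_ratio(bands["B4"], bands["B3"]))
```

An exhaustive search like `best_band_pair` inflates apparent skill if it is run on the full dataset, so the selection should be repeated inside each training fold and only then evaluated on held-out samples.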
5. Image Processing Workflows for Water Quality Estimation
Although early techniques relied mainly on empirical band ratios derived from ocean color theory, recent research has moved toward multistage processing chains that combine radiometric correction, spectral transformation, nonlinear modeling, and structured validation frameworks, as examined systematically in recent reviews of inland and coastal water monitoring [20,23,27]. This shift has been enabled by enhanced sensor configurations, including the denser spectral sampling of Sentinel-2 MSI (with its red-edge bands) and the higher radiometric stability of Landsat OLI, together with advances in computational capacity and cloud-based processing environments that have increased the operational scalability of water quality retrieval frameworks [15,27]. Nevertheless, inland waters (lakes and rivers) remain optically complex systems. The nonlinear interplay of absorption and scattering by phytoplankton pigments, suspended particulate matter, and CDOM challenges conventional atmospheric correction assumptions, such as a negligible water-leaving signal in the NIR [35,36,37]. As a result, uncertainties introduced during reflectance retrieval can propagate through subsequent modeling stages, influencing feature sensitivity and ultimately limiting predictive robustness.
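The propagation argument can be illustrated with a small Monte Carlo experiment in Python: a two-band reflectance pair is perturbed with a hypothetical atmospheric correction residual, modeled here as independent, zero-mean Gaussian noise with an assumed standard deviation, and the spread of a downstream red-edge/red normalized difference is recorded. The noise model, the function name, and the reflectances are illustrative assumptions, not estimates of any processor's real error.

```python
import numpy as np


def propagate_ac_residual(red: float, red_edge: float, sigma: float,
                          n_draws: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Monte Carlo spread of (red_edge - red) / (red_edge + red) under additive noise.

    Assumes independent, zero-mean Gaussian residuals with the same sigma in both bands.
    Returns (mean, standard deviation) of the perturbed index.
    """
    rng = np.random.default_rng(seed)
    r = red + rng.normal(0.0, sigma, n_draws)
    re = red_edge + rng.normal(0.0, sigma, n_draws)
    index = (re - r) / (re + r)
    return float(index.mean()), float(index.std())


if __name__ == "__main__":
    # Same absolute residual applied to a brighter and a darker water spectrum.
    print(propagate_ac_residual(0.04, 0.05, sigma=0.001))
    print(propagate_ac_residual(0.008, 0.01, sigma=0.001))
```

Because the index divides by the reflectance sum, the same absolute residual produces a much wider spread over dark water than over bright water, which is one mechanism by which correction errors weigh most heavily on low-reflectance inland targets.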
To provide an overview of the methodological heterogeneity found in recent work, Table 2 compares typical image processing pipelines, showing differences in atmospheric correction strategies, spectral feature construction approaches, modeling design, and validation schemes. This comparison also shows that predictive accuracy seldom depends on algorithm choice alone; rather, it depends on the interplay between preprocessing robustness, spectral representation, and evaluation framework. The patterns in Table 2 suggest that inland water remote sensing workflows are structured around three closely related aspects: atmospheric correction and radiometric preprocessing, spectral feature construction, and modeling–validation design. Although these phases are frequently presented sequentially, the reviewed studies emphasize that their interactions ultimately dictate predictive robustness, transferability, and uncertainty propagation. The choice of correction strategy may affect spectral stability; feature engineering shapes the information available to learning algorithms; and validation design determines how performance metrics should be interpreted. The following subsections examine these components in turn, starting with atmospheric correction, continuing with spectral feature construction, and ending with modeling approaches and validation frameworks. Note that the geographical distribution of the analyzed case studies is uneven, with a strong concentration in China and limited representation from other regions such as Africa, South America, and parts of Europe. This pattern reflects the composition of the literature within the temporal scope of the present review, which covers studies published between 2022 and March 2026.
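One way to make the three workflow aspects reportable is a structured record that accompanies every model result. The Python dataclasses below are a hypothetical reporting schema, not an existing standard; field names are illustrative, and the example values are placeholders rather than the settings of any reviewed study.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AtmosphericCorrection:
    processor: str                          # e.g., "ACOLITE/DSF", "C2RCC", "POLYMER"
    version: str = ""
    parameters: dict = field(default_factory=dict)
    glint_correction: str = "not reported"


@dataclass
class WorkflowRecord:
    """Hypothetical provenance record covering preprocessing, features, and validation."""
    sensor: str
    correction: AtmosphericCorrection
    qc_steps: list = field(default_factory=list)   # masking/filtering before modeling
    features: list = field(default_factory=list)   # bands, ratios, indices used as inputs
    model: str = ""
    validation_scheme: str = ""                     # e.g., "random 5-fold", "leave-one-lake-out"
    metrics: dict = field(default_factory=dict)

    def missing_fields(self) -> list:
        """List reporting gaps that would hinder reproducibility."""
        gaps = []
        if not self.correction.version:
            gaps.append("correction.version")
        if not self.correction.parameters:
            gaps.append("correction.parameters")
        if self.correction.glint_correction == "not reported":
            gaps.append("correction.glint_correction")
        for name in ("qc_steps", "features", "model", "validation_scheme"):
            if not getattr(self, name):
                gaps.append(name)
        return gaps

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    record = WorkflowRecord(
        sensor="Sentinel-2 MSI",
        correction=AtmosphericCorrection(processor="C2RCC"),  # version/parameters left blank
        features=["B4", "B5"],
        model="<model name>",
    )
    print(record.missing_fields())
    print(record.to_json())
```

Flagging gaps automatically, rather than relying on free-text methods sections, is the design choice here: a record with an empty `correction.parameters` or `validation_scheme` makes the missing provenance visible at the point where results are reported.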
5.1. Atmospheric Correction Strategies in Inland Waters
Atmospheric correction (AC) is a cornerstone of inland water remote sensing because the water-leaving signal is typically weak relative to atmospheric path radiance and is further biased by adjacency effects and optical complexity. The comparative analysis in Table 2 highlights that correction strategies vary greatly between studies, reflecting sensor-dependent processing chains and the optical variability of lakes and rivers. Common solutions include land-oriented surface reflectance processors (Sen2Cor and LaSRC), aquatic processors (ACOLITE/DSF, C2RCC/C2X-Nets, POLYMER), and radiative-transfer-based tools (such as FLAASH and 6SV). Some studies use Level-2 surface reflectance or Rrs products directly, occasionally without stating the correction scheme, which hampers reproducibility and comparison between studies. Beyond methodological classification, the practical consequences of AC become evident when its influence on reflectance retrieval is examined.
The effects of atmospheric correction on the magnitude and composition of the water-leaving signal are illustrated in Figure 4. Figure 4a shows that the radiance measured by the sensor includes not only the desired water-leaving component but also atmospheric path radiance and surface-reflected contributions. Figure 4b provides an example of how top-of-atmosphere reflectance is transformed into aquatic reflectance after atmospheric correction, modifying both the spatial appearance of the water body and the spectral signal used for feature engineering and model development. Therefore, atmospheric correction should not be interpreted merely as a visual enhancement step but as a quantitative preprocessing stage that can propagate uncertainty into downstream water quality retrieval models.
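A deliberately simplified Python sketch of the decomposition shown in Figure 4: top-of-atmosphere reflectance is modeled as rho_TOA = rho_path + t * rho_w, with surface-reflected contributions folded into the path term, and the path term is estimated per band from the darkest pixels of the scene, in the spirit of classic dark-object subtraction. This is not ACOLITE's dark spectrum fitting or any operational processor; the per-band transmittance t and the percentile are assumptions, and over turbid water the assumption that the darkest pixels carry almost no water-leaving signal can fail.

```python
import numpy as np


def dark_object_correction(rho_toa: np.ndarray, transmittance: np.ndarray,
                           percentile: float = 0.1) -> tuple[np.ndarray, np.ndarray]:
    """Simplified per-band correction: rho_w = (rho_toa - rho_path) / t.

    rho_toa: TOA reflectance with shape (bands, rows, cols).
    transmittance: assumed per-band total transmittance, shape (bands,).
    rho_path is taken as a low percentile of each band over the scene, assuming the
    darkest pixels have near-zero water-leaving reflectance. Negative outputs are kept
    (not clipped) so that over-correction can be flagged downstream.
    """
    flat = rho_toa.reshape(rho_toa.shape[0], -1)
    rho_path = np.nanpercentile(flat, percentile, axis=1)  # shape (bands,)
    rho_w = (rho_toa - rho_path[:, None, None]) / transmittance[:, None, None]
    return rho_w, rho_path


if __name__ == "__main__":
    # Synthetic single-band scene: known water signal, path term 0.05, t = 0.8.
    true_w = np.array([[[0.0, 0.02], [0.03, 0.01]]])
    t = np.array([0.8])
    toa = 0.05 + t[:, None, None] * true_w
    rho_w, rho_path = dark_object_correction(toa, t, percentile=0.0)
    print(rho_path, rho_w.round(4))
```

Even in this toy form, the estimated path term changes the magnitude of every pixel and the relative shape of the spectrum, which is why the processor and its settings need to be reported alongside model results.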
This statement is supported by empirical evidence from the reviewed literature, in which AC is often treated explicitly as a variable that affects downstream performance. For instance, Fu et al. (2022) applied multiple processors (Sen2Cor, C2RCC, and FLAASH) in Poyang Lake before ensemble modeling [38], and Zhenyu et al. (2025) in Manas Lake [47] compared C2RCC, POLYMER, and SeaDAS/l2gen [50]. These comparisons suggest that correction residuals may propagate into predictive metrics and uncertainty estimates in turbid or CDOM-rich environments. In contrast, other studies rely on a single aquatic-oriented processor adapted to the optical conditions of the target water body. Chaojie et al. (2022) applied POLYMER with a semi-analytical TSM framework in Lake Geneva [55], whereas Salvatore et al. relied on C2RCC/C2X-Nets in a riverine context [11]. As Table 2 demonstrates, there is no universally dominant algorithm; instead, AC performance is context-dependent and influenced by sensor configuration, water type, and adjacency intensity. Taken together, the reviewed studies indicate that atmospheric correction uncertainty strongly affects predictive robustness, especially for high-capacity AI models. When processor choice and parameterization are not explicitly reported, improvements attributed to modeling advances may stem, at least in part, from upstream preprocessing variance.
In addition to multi-processor comparisons, a second methodological trend is the use of aquatic-specific correction systems explicitly developed for optically complex waters. Chaojie et al. combined POLYMER with a semi-analytical TSM retrieval framework in Lake Geneva, embedding inherent optical property relationships into the preprocessing stage [55]. In riverine systems with strong turbidity gradients, Salvatore et al. implemented C2RCC and C2X-Nets in the Chao Phraya River [11], demonstrating the promise of neural-network-based atmospheric correction where the land-surface assumption breaks down. Similar reasoning is applied in multi-lake and multi-sensor research [45,56], where processors such as iCOR, LaSRC, and POLYMER are selected according to sensor configuration and water optical properties. As Table 2 shows, these selections are context-dependent, indicating that the performance of a given atmospheric correction cannot be generalized across inland water systems. A recurring limitation identified across the reviewed studies is the poorly documented provenance of atmospheric correction procedures, including the specific processors, parameter settings, and post-processing steps applied. This lack of transparency critically limits reproducibility and complicates cross-study comparison, as differences in predictive performance may reflect upstream preprocessing choices rather than intrinsic model capabilities. Consequently, the absence of standardized reporting for atmospheric correction workflows remains a major barrier to the operational reliability and transferability of remote-sensing-based water quality models [36,37]. Some studies rely on Level-2 surface reflectance or Rrs products without explicitly specifying the processor or parameter settings [57,58]. Although operationally convenient, this omission reduces the reproducibility and interpretability of the performance differences described in Table 2. Validation metrics (e.g., R², RMSE, MAE, or MAPE) may indicate sufficient predictive skill, but without clear documentation of the AC pipeline, it remains unknown whether variation in performance across studies is related to model architecture, feature construction, or residual atmospheric effects. This issue is especially relevant for high-capacity machine learning frameworks. As summarized in Table 2, studies using more than one processor [38,50] implicitly acknowledge that inland water retrieval is sensitive to atmospheric residuals and adjacency contamination. When deep learning architectures (e.g., CNNs or Transformer-based models) are used, fine spectral artifacts introduced by imperfect correction risk being learned as predictive features [26].
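Because R², RMSE, MAE, and MAPE respond differently to skewed concentration ranges, reporting them together with their exact definitions aids cross-study comparison. A minimal plain-Python sketch follows; note that R² is computed here as the coefficient of determination, 1 - SS_res / SS_tot, which differs from the squared Pearson correlation that some studies report under the same symbol, and that MAPE is undefined when any observed value is zero. The function name is illustrative.

```python
import math
from typing import Dict, Sequence


def regression_metrics(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Coefficient of determination, RMSE, MAE, and MAPE (%) for paired values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired values")
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    if any(o == 0 for o in observed):
        mape = float("nan")  # undefined when any observation is zero
    else:
        mape = 100.0 * sum(abs(r / o) for r, o in zip(residuals, observed)) / n
    return {"R2": r2, "RMSE": rmse, "MAE": mae, "MAPE_percent": mape}


if __name__ == "__main__":
    # Hypothetical in situ vs. predicted concentrations (same units).
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

For log-normally distributed constituents such as Chl-a, some studies compute these metrics on log-transformed values; whichever convention is used should be stated explicitly.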
As a result, improvements attributed to modeling sophistication may partly reflect preprocessing bias rather than intrinsic algorithmic superiority. This structural dependency also affects uncertainty reporting: several studies report only aggregate error metrics, without explicitly separating uncertainty sources. Methodologically, uncertainty in inland water retrieval can be conceptually decomposed into at least three components: algorithmic uncertainty, caused by model variance and parameter uncertainty; matchup uncertainty, due to the temporal and spatial representativeness of in situ data; and preprocessing uncertainty, due to atmospheric correction residuals and sensor harmonization. This decomposition aligns with recent discussions in the remote sensing literature emphasizing the need to account for multiple sources of uncertainty when validating water quality products [22,59]. Specifically, uncertainty quantification has been identified as a key challenge in deep-learning-based inversion frameworks, where optimization strategies increasingly incorporate uncertainty estimation to improve model generalization and interpretability [22]. Similarly, systematic reviews of inland water quality remote sensing highlight that, despite advances in modeling, formal uncertainty quantification remains underdeveloped, particularly regarding preprocessing residuals and in situ matchup representativeness [59]. Without explicit decomposition of these components, uncertainty statements remain descriptive and cannot diagnose the origin of predictive variability, limiting interpretability and cross-study comparison.
An additional, often underreported source of preprocessing uncertainty is sun glint. In optical remote sensing of water bodies, specular reflection from the water surface can elevate reflectance values for reasons unrelated to water constituents, depending instead on viewing geometry and surface roughness. In many operational workflows, glint correction is handled implicitly within atmospheric correction processors, particularly for moderate-resolution sensors; however, its implementation and effectiveness are rarely documented. In higher-spatial-resolution imagery, individual pixels may exhibit strong glint due to localized wave orientation, producing anomalous reflectance values that can propagate into feature construction and modeling. These effects are often treated as outliers or filtered during preprocessing, but their presence highlights the need for clearer reporting of glint correction procedures within remote sensing workflows.

Finally, atmospheric correction is rarely implemented in isolation. Several workflows incorporate complementary quality control procedures prior to modeling, including cloud and cirrus masking, adjacency screening, threshold filtering, and removal of anomalous reflectance values [38,59]. Although these steps are sometimes only briefly reported, they are necessary to prevent invalid or mixed pixels from entering machine learning pipelines. In highly heterogeneous inland waters, these masking strategies may influence predictive stability as much as the choice of atmospheric correction algorithm itself, further reinforcing the need for transparent preprocessing documentation.
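The quality control steps described above can be sketched in Python as a chain of boolean masks: an MNDWI open-water test, removal of non-positive reflectance, an NIR ceiling that flags likely glint or adjacency contamination, and a robust (median absolute deviation) outlier screen. All thresholds are assumed values chosen for illustration, and operational workflows usually also rely on processor-provided flags and cloud masks, which are not reproduced here.

```python
import numpy as np


def qc_mask(green, red, nir, swir, mndwi_min=0.0, nir_max=0.05, z_max=3.5):
    """Return a boolean mask of pixels retained for modeling (True = keep).

    Inputs are reflectance arrays of identical shape; all thresholds are
    illustrative assumptions, not recommended operational values.
    """
    green, red, nir, swir = (np.asarray(b, dtype=float) for b in (green, red, nir, swir))
    with np.errstate(divide="ignore", invalid="ignore"):
        mndwi = (green - swir) / (green + swir)
    keep = (np.isfinite(green) & np.isfinite(red) & np.isfinite(nir)
            & np.isfinite(swir) & np.isfinite(mndwi))
    keep &= mndwi > mndwi_min          # open-water test
    keep &= (green > 0) & (red > 0)    # non-positive values suggest over-correction
    keep &= nir < nir_max              # bright NIR: likely glint or adjacency effects
    # Robust outlier screen on the red band (modified z-score based on the MAD).
    vals = red[keep]
    if vals.size >= 3:
        med = np.median(vals)
        mad = np.median(np.abs(vals - med))
        if mad > 0:
            robust_z = 0.6745 * (red - med) / mad
            keep &= np.abs(robust_z) <= z_max
    return keep


if __name__ == "__main__":
    # Hypothetical pixels: 0-2 valid water, 3 land-like, 4 red outlier, 5 glint-affected.
    green = np.array([0.05, 0.05, 0.05, 0.05, 0.05, 0.05])
    red = np.array([0.02, 0.021, 0.019, 0.02, 0.2, 0.02])
    nir = np.array([0.01, 0.01, 0.01, 0.01, 0.01, 0.08])
    swir = np.array([0.005, 0.005, 0.005, 0.2, 0.005, 0.005])
    print(qc_mask(green, red, nir, swir))
```

The order of the screens matters: computing the outlier statistics only on pixels that already passed the water and glint tests keeps land or glint pixels from distorting the median and MAD.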
5.2. Spectral Feature Construction and Feature Engineering
After surface reflectance or Rrs is identified, transformation of spectral information into predictive features is the crux of inland water quality workflows. Three of the strategies that we selected based on the reviewed papers include (i) actual multispectral bands, (ii) spectral indices and combinations, and (iii) a multiscale feature integration using hyperspectral, UAV, or multi-platform datasets. Lake-oriented works also involve multispectral bands based on Sentinel-2 MSI, Landsat OLI, MODIS, or Sentinel-3 OLCI models, where a feature construction method of filtering data is usually taken from band ratios, red-edge pairs, or reflectance relationships chosen empirically. For example, reflectance outputs derived from multiple atmospheric processors were used and subsequently employed ensemble learning models, treating the spectral inputs as model-driven features and not as implicitly predefined indices. Similarly, studied Taihu Lake (China) and implemented empirical spectral relationships to estimate trophic conditions and water quality parameters. The main objective was to show that band combination still takes center stage in optically productive lakes [
57,
60]. In contrast, riverine and multi-regional studies show that high optical heterogeneity calls for richer feature representations. Studies in the Pearl River (China) and the Chao Phraya River (Thailand) demonstrate the effectiveness of combining reflectance bands sensitive to suspended matter and CDOM variability in turbidity-dominated systems; in such settings, feature construction is driven by known absorption and scattering features, particularly in the red and near-infrared domains [
11,
61]. A clear trend toward multiscale spectral integration emerges in studies using hyperspectral or high-resolution platforms. One study integrated ASD field spectrometer measurements, UAV imagery, and Planet data in a karst wetland (China), supporting knowledge transfer across spectral and spatial domains [
60]. In this way, high-resolution spectral signatures can inform satellite-based models, improving feature richness and generalization in heterogeneous optical environments. Similarly, other authors combine hyperspectral or multisensor inputs to improve spectral separability in complex inland waters [
45,
51]. Semi-analytical feature construction has also been reported: a semi-analytical retrieval of TSM in Lake Geneva incorporated inherent optical property relationships into the feature extraction process. Such physics-informed strategies differ fundamentally from purely empirical or machine-learning-driven pipelines, as they constrain spectral–parameter relationships through radiative transfer principles [
55]. However, the transparency of feature engineering varies considerably across the reviewed literature. Some studies explicitly report the spectral combinations or transformations used, whereas others provide only generic statements such as “surface reflectance bands used as model inputs” [
56,
58]. This inconsistent reporting hinders cross-study comparison and makes reproducibility even more challenging, especially when performance differences are attributed to modeling approaches without any description of prior feature selection. Moreover, comparative evidence shows that spectral feature generation is closely tied to model capacity. Tree-based ensembles (RF, XGBoost, CatBoost) can accommodate multiband inputs without extensive feature engineering, whereas deep architectures (e.g., CNN/Transformer [
26]) may learn spectral interactions implicitly when provided with consistent reflectance inputs. However, high-capacity models are also more sensitive to spectral inconsistencies arising from atmospheric correction residuals or cross-sensor differences. Consequently, feature construction cannot be interpreted independently of preprocessing robustness and validation design.
The reviewed workflows suggest a shift from simple index-based representations toward richer multiband and multiscale spectral feature sets. Although this evolution increases the potential for capturing nonlinear optical relationships in complex inland waters, it also increases the need for clear documentation of feature pipelines, scaling procedures, and sensor harmonization steps to support methodological transparency and reproducibility.
5.3. Modeling Approaches and Validation Design
The reviewed literature reveals considerable methodological variability in modeling strategies for inland water quality estimation, ranging from empirical regression to ensemble machine learning and deep neural networks. As shown in Table 2, tree-based ensembles (especially RF, XGBoost, and CatBoost) are the most widely used methods, together with support vector machines (SVM/SVR), partial least squares regression (PLSR), and, more recently, convolutional and Transformer-based neural networks.
Figure 5 conceptually summarizes the main pathways through which spectral profiles are constructed, as reported in the reviewed literature.
Figure 5a presents direct multispectral band inputs, illustrating typical wavelength ranges and their relevance for water quality estimation, using Sentinel-2 MSI as an example.
Figure 5b shows commonly used engineered spectral indices in inland water remote sensing, including the red-edge ratio (RER), Normalized Difference Chlorophyll Index (NDCI), NIR/red ratio for turbidity, Normalized Difference Water Index (NDWI), and Modified NDWI (MNDWI), along with their formulas and target applications. Here, R_λ denotes reflectance at wavelength λ (nm).
Figure 5c illustrates a multiscale spectral integration framework that combines in situ spectra, UAV-based hyperspectral data, and satellite multispectral imagery through feature extraction. This process generates a feature matrix integrating raw bands, spectral indices, texture metrics, ancillary data, and metadata, which are subsequently used as inputs for machine learning models to retrieve water quality parameters (e.g., Chl-a, TSS, CDOM, and nutrients).
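As an illustration of how the indices in Figure 5b combine with raw bands into the feature matrix of Figure 5c, the sketch below builds one feature row from reflectance keyed by wavelength. It assumes Sentinel-2 MSI band centres (B3 560, B4 665, B5 705, B8 842, B11 1610 nm) and common textbook definitions: RER = R705/R665, NDCI = (R705 − R665)/(R705 + R665), NIR/red = R842/R665, NDWI from green and NIR, and MNDWI from green and SWIR. The exact band choices in Figure 5b and in individual studies may differ, and the function names are hypothetical.

```python
"""Sketch: one feature row of raw bands plus common spectral indices.

Band centres follow Sentinel-2 MSI (B3 560, B4 665, B5 705, B8 842, B11 1610 nm);
the exact bands used in a given study may differ (assumption).
"""


def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), returning NaN when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else float("nan")


def spectral_features(r: dict) -> dict:
    """r maps wavelength in nm to reflectance; returns raw bands and indices."""
    features = {f"R{wl}": value for wl, value in sorted(r.items())}
    features["RER"] = r[705] / r[665]                           # red-edge ratio
    features["NDCI"] = normalized_difference(r[705], r[665])    # chlorophyll-a proxy
    features["NIR_red"] = r[842] / r[665]                       # turbidity proxy
    features["NDWI"] = normalized_difference(r[560], r[842])    # open-water delineation
    features["MNDWI"] = normalized_difference(r[560], r[1610])  # water vs. land/built-up
    return features


row = spectral_features({560: 0.04, 665: 0.02, 705: 0.03, 842: 0.01, 1610: 0.005})
print(row)
```

The ratio features raise `ZeroDivisionError` if R665 is zero, which in practice is one more reason to mask invalid pixels before feature construction.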
Direct band inputs retain full spectral dimensionality, engineered indices provide targeted transformations grounded in optical reasoning, and multiscale integration expands representational capacity across spatial and spectral domains. The choice among these strategies affects model interpretability, transferability, and sensitivity to preprocessing residuals. Ensemble learning is well suited to lake environments with nonlinear spectral–optical relationships. One study applied RF, XGBoost, and CatBoost together with PLS/PLSR in Poyang Lake (China) and assessed model robustness using LOOCV. Similarly, other authors report combinations of RF, boosting algorithms, and regression baselines for multi-lake or multi-river systems in China. Such investigations typically benchmark several algorithms on the same spectral dataset, so reported performance gains are established through comparison rather than assumed a priori [
45,
56,
60].
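The LOOCV procedure mentioned above refits the model once per sample, each time predicting the single held-out observation. The sketch below shows that loop with a simple least-squares line standing in for RF, XGBoost or CatBoost (a deliberate simplification); the loop itself is model-agnostic, and `fit_line` and `loocv_predictions` are hypothetical helpers.

```python
"""Sketch of leave-one-out cross-validation (LOOCV) for a single-feature model.

A least-squares line stands in for tree ensembles (assumption); any function
that takes training data and returns a predictor can be passed as `fit`.
"""
from typing import Callable, List, Sequence

Predictor = Callable[[float], float]


def fit_line(x: Sequence[float], y: Sequence[float]) -> Predictor:
    """Ordinary least-squares fit of y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx  # fails if all x values are identical
    a = my - b * mx
    return lambda v: a + b * v


def loocv_predictions(x: Sequence[float], y: Sequence[float],
                      fit: Callable[..., Predictor] = fit_line) -> List[float]:
    """Refit n times; each prediction comes from a model that never saw that sample."""
    preds = []
    for i in range(len(x)):
        x_train = [v for j, v in enumerate(x) if j != i]
        y_train = [v for j, v in enumerate(y) if j != i]
        model = fit(x_train, y_train)
        preds.append(model(x[i]))
    return preds
```

Note that LOOCV still holds out individual samples rather than whole sites, so it does not by itself test spatial transferability.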
Kernel-based methods and classical regression remain common, albeit mostly in more localized settings. One study used empirical regression frameworks to quantify trophic state and water quality parameters in Taihu Lake (China), showing that simpler models can be effective under well-characterized optical regimes. Likewise, another study applied regression-based models in the Pearl River, showing that interpretability and local calibration are favored over algorithmic complexity in river systems [
57,
61]. More sophisticated architectures appear in studies that require greater generalization or more complex feature integration. Neural networks, convolutional layers, and Transformer-based models have been explicitly combined for multi-lake datasets, reflecting a trend toward architectures better able to learn high-order spectral interactions [
26]. However, the use of deep learning models does not automatically translate into better transferability. Without spatially or temporally structured validation, high-capacity models may overfit to region-specific reflectance patterns, particularly when atmospheric residuals or adjacency effects persist.
Validation design is a critical determinant of interpretability. The reviewed studies report a variety of strategies, including leave-one-out cross-validation (LOOCV), k-fold cross-validation, and split-sample train/test partitions, evaluated with performance metrics such as R², RMSE, MAE, and MAPE. Some works additionally report uncertainty estimates [
26,
50], although uncertainty is rarely decomposed into model variance, sampling error, and preprocessing uncertainty components. In multi-sensor classification studies such as Pooja et al. (2026), kappa statistics and accuracy scores are used, reflecting categorical evaluation rather than continuous regression assessment [
52].
A key pattern emerging from the comparative analysis is that predictive performance cannot be interpreted independently of validation structure. Random data splitting, commonly used in many workflows, may inflate performance estimates in spatially autocorrelated aquatic systems. By contrast, validation approaches that emphasize cross-validation rigor or multi-processor comparisons [
50] provide stronger evidence of generalization capacity. Even so, these designs would benefit from explicit spatial or temporal hold-out schemes that measure transferability across hydrological and seasonal conditions. Importantly, several studies reveal the influence of upstream preprocessing on modeling outcomes: ensemble and deep learning models can capture subtle spectral variations, but their apparent improvements may partly reflect differences in atmospheric correction provenance and reflectance quality rather than model sophistication alone. Algorithm selection should therefore be viewed as only one component of a broader workflow comprising atmospheric correction, feature engineering, and validation design. Overall, our analysis indicates that methodological development in inland water remote sensing is moving from isolated algorithm comparison toward integrated workflow optimization. Future advances will likely depend less on increasing architectural complexity and more on harmonizing preprocessing pipelines, strengthening validation frameworks, and explicitly quantifying uncertainty propagation across workflow stages. Such integration would enable more reliable multi-study comparison and increase confidence in the operational scalability of satellite-based surface reflectance monitoring.
6. Validation Strategies and Performance Metrics
Validation is foundational to inland water quality modeling workflows. Although atmospheric correction and spectral feature construction determine the reliability of the predictive inputs, validation design ultimately establishes the credibility, transferability, and scientific robustness of the reported results. The reviewed literature shows substantial variability in in situ data integration, temporal alignment, spatial representativeness, data partitioning, and uncertainty reporting. This heterogeneity hinders comparability between studies and highlights the lack of standardized validation criteria.
Field–satellite integration remains the primary step for calibrating water quality retrieval models. Most studies measured chlorophyll-a (Chl-a), total suspended matter (TSM), turbidity, or CDOM absorption during in situ campaigns matched to satellite acquisitions. Structured sampling designs are evident at Lake Taihu [
57] and Lake Geneva [
55], where laboratory measurements were systematically paired with Sentinel-2-derived reflectance. However, sampling density and frequency differ widely between systems. In riverine investigations, such as those on the Chao Phraya River [
11] and Pearl River [
61], hydrodynamic variability is high, increasing sensitivity to timing mismatches and spatial heterogeneity.
Temporal alignment is a critical but inconsistently treated source of uncertainty. While most lake-based studies enforce same-day matching [
38,
57,
60], others employ ±1–3-day windows to maximize matchup availability [
11,
56]. Although operationally pragmatic, relaxed windows may introduce matchup uncertainty in optically dynamic systems. Alignment is further complicated by differences in acquisition timing and illumination geometry across scales and platforms, as examined in a multiscale, multi-platform study [
60]. Temporal mismatch remains underappreciated and is rarely, if ever, quantified separately from overall model error.
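One way to make the temporal window explicit, and to quantify mismatch separately later, is to record the lag of every matchup. The sketch below pairs each in situ sample with the nearest overpass within ±`max_days`, where `max_days=0` reproduces same-day matching and 1 to 3 mimics the relaxed windows above. `match_samples` is a hypothetical helper, not a method from the reviewed studies.

```python
"""Sketch: nearest-overpass matchup within a ±max_days window, keeping the lag."""
from datetime import date


def match_samples(samples, overpasses, max_days=0):
    """samples: list of (sample_id, date); overpasses: list of dates.

    Returns (sample_id, sample_date, overpass_date, lag_days) tuples; the
    signed lag is retained so temporal mismatch can be analysed on its own.
    """
    matches = []
    for sample_id, sampled_on in samples:
        nearest = min(overpasses, key=lambda o: abs((o - sampled_on).days), default=None)
        if nearest is not None and abs((nearest - sampled_on).days) <= max_days:
            matches.append((sample_id, sampled_on, nearest, (nearest - sampled_on).days))
    return matches


samples = [("s1", date(2023, 6, 1)), ("s2", date(2023, 6, 10))]
overpasses = [date(2023, 6, 1), date(2023, 6, 12)]
print(match_samples(samples, overpasses, max_days=0))  # same-day only
print(match_samples(samples, overpasses, max_days=3))  # relaxed ±3-day window
```

Comparing model error across lag classes (0, 1, 2, 3 days) would give a first, rough estimate of matchup uncertainty.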
The differing spatial resolutions of Sentinel-2 (10–20 m), Landsat-8 (30 m), and Sentinel-3 (300 m) directly affect how well point-based field samples correspond to satellite pixels. Landsat imagery provides moderate-spatial-resolution data that are widely used for inland water quality applications. In this context, most studies rely on the visible-to-shortwave infrared (VIS–SWIR) bands, which are available at 30 m spatial resolution. Thermal infrared (TIR) bands, in contrast, are acquired at a native resolution of approximately 100 m and are primarily used for surface temperature retrieval rather than optical water quality parameter estimation. In relatively homogeneous lakes, mismatch effects may be moderate, whereas in narrow or optically complex rivers, adjacency effects and mixed pixels introduce bias [
11,
61]. Few studies document spatial buffering or multi-pixel averaging methods, and even fewer assess the impact of spatial aggregation on predictive robustness.
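Multi-pixel averaging can be documented as simply as stating the window size, the summary statistic, and the minimum fraction of valid pixels. The sketch below uses a 3×3 window, a median (robust to a few residual glint or adjacency outliers), and a 50% valid-pixel threshold; all three choices are illustrative assumptions, and `window_matchup` is a hypothetical helper.

```python
"""Sketch: window-based pixel extraction around a field station."""
import statistics


def window_matchup(grid, row, col, half=1, min_valid_fraction=0.5):
    """Summarize the (2*half+1) x (2*half+1) window centred on (row, col).

    grid: 2-D list of reflectance values, with None for masked pixels.
    Windows are truncated at the image edge. Returns (median, n_valid),
    or None when the fraction of valid pixels is below the threshold.
    """
    values, total = [], 0
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            if 0 <= r < len(grid) and 0 <= c < len(grid[r]):
                total += 1
                if grid[r][c] is not None:
                    values.append(grid[r][c])
    if total == 0 or len(values) / total < min_valid_fraction:
        return None  # too few valid pixels: discard this matchup
    return statistics.median(values), len(values)


grid = [
    [0.020, 0.030, None],
    [0.020, 0.025, 0.030],
    [None, None, 0.500],  # 0.5 mimics a glint- or land-contaminated pixel
]
print(window_matchup(grid, 1, 1))
```

Reporting `n_valid` alongside each matchup also makes it possible to study how spatial aggregation affects predictive robustness.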
Performance evaluation is commonly reported using R², RMSE, and MAE, with occasional use of MAPE. High coefficients of determination are commonly reported for tree ensemble methods [
38,
50], while deep learning architectures often demonstrate improved predictive performance under specific preprocessing and validation conditions [
26]. Nevertheless, performance metrics are typically dataset-specific and seldom normalized across studies, which undermines cross-study comparison. Crucially, most studies rely on internal partitioning schemes rather than geographically independent validation. To further elucidate these recurring methodological trends,
Table 3 consolidates the main validation designs identified across the reviewed studies, categorizing them by partition method, temporal matching, spatial independence, and uncertainty reporting.
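Because the metrics above are reported with varying definitions, it helps to state them explicitly. The sketch below computes R² as the coefficient of determination (1 − SS_res/SS_tot), RMSE, MAE, and MAPE in percent. Some studies instead report the squared Pearson correlation, which can differ from this R² for biased predictions; that is one concrete source of the cross-study inconsistency noted above. `regression_metrics` is a hypothetical helper.

```python
"""Sketch: explicit definitions of R², RMSE, MAE and MAPE."""
import math


def regression_metrics(obs, pred):
    """obs, pred: equal-length sequences; MAPE is undefined if any obs is zero."""
    n = len(obs)
    mean_obs = sum(obs) / n
    residuals = [p - o for o, p in zip(obs, pred)]
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return {
        "R2": 1 - ss_res / ss_tot,                       # coefficient of determination
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in residuals) / n,
        "MAPE": 100 * sum(abs(e / o) for e, o in zip(residuals, obs)) / n,
    }


print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

For this toy example the result is R² = 0.8, RMSE = 0.5, MAE = 0.25 and MAPE = 6.25%.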
Table 3 shows that random train–test splits within a single water body are the most common design [
55,
58], followed by k-fold cross-validation on the same water body [
38,
50,
56]. Semi-analytical calibration–validation frameworks are less common and are applied mainly in physics-driven investigations [
55]. Truly independent geographic validation, however, is rare.
The patterns described in
Table 3 suggest a structural limitation: most validation schemes retain spatial autocorrelation and do not test model transferability across separate water bodies. As a result, overfitting is a real risk, especially for high-capacity models. Random splits may lead models to learn site-specific spectral signatures rather than optical relationships that generalize across sites. Cross-validation reduces the variance caused by arbitrary partitioning, but it does not ensure geographic independence. Multiscale integration studies partly address this issue by evaluating performance across platforms, but cross-waterbody generalization remains minimally investigated [
60].
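A geographically independent alternative to random splitting is to hold out an entire water body per fold (leave-one-group-out). The sketch below generates such folds from a list of per-sample site labels; the labels and the `leave_one_group_out` helper are hypothetical, and in practice the groups could equally be seasons or sensors for cross-season or cross-sensor tests.

```python
"""Sketch: leave-one-water-body-out folds for transferability testing."""
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), one fold per distinct group.

    No sample from the held-out water body ever appears in the training set,
    so the fold score reflects transfer to an unseen site.
    """
    index = defaultdict(list)
    for i, g in enumerate(groups):
        index[g].append(i)
    for held_out, test_idx in index.items():
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


groups = ["lake_A", "lake_A", "lake_B", "lake_C", "lake_B"]
for held_out, train_idx, test_idx in leave_one_group_out(groups):
    print(held_out, train_idx, test_idx)
```

Scores from these folds are usually lower than random-split scores; the gap itself is a useful indicator of how much a model relies on site-specific spectral signatures.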
Uncertainty reporting is similarly inconsistent. While aggregated predictive error is commonly reported, decomposing uncertainty into algorithmic, matchup, and preprocessing components is rare. For example, studies comparing multiple atmospheric processors [
38,
50] implicitly acknowledge the importance of preprocessing sensitivity but seldom isolate its contribution to total predictive error. Without structured uncertainty decomposition, it is difficult to determine whether improvements stem from model architecture, feature engineering, or validation design. The comparative findings therefore suggest that advances in modeling sophistication should be matched by equally robust validation practices. Standardizing temporal windows, reporting spatial aggregation procedures, implementing geographically independent validation, and explicitly disaggregating uncertainty would all likely improve reproducibility and methodological transparency in inland water quality remote sensing. Although several of the examined studies report explicit error measures such as R² and RMSE, very few decompose uncertainty into identifiable components. Three main sources of uncertainty can be distinguished:
Algorithmic uncertainty (variance in the model and unstable model parameters),
Matchup uncertainty (mismatch of field–satellite time and space),
Preprocessing uncertainty (residuals in atmospheric correction and feature harmonization).
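If, as a simplifying assumption, the three components are independent and add linearly in the retrieved quantity, their variances sum. The worked example uses illustrative values in arbitrary units, not figures from any reviewed study:

```latex
\sigma_{\mathrm{total}} = \sqrt{\sigma_{\mathrm{alg}}^{2} + \sigma_{\mathrm{match}}^{2} + \sigma_{\mathrm{pre}}^{2}},
\qquad
\sigma_{\mathrm{alg}} = 3,\; \sigma_{\mathrm{match}} = 4,\; \sigma_{\mathrm{pre}} = 12
\;\Rightarrow\;
\sigma_{\mathrm{total}} = \sqrt{9 + 16 + 144} = \sqrt{169} = 13 .
```

In this example the preprocessing term alone accounts for about 85% of the total variance (144 of 169), which aggregated error metrics would hide. Correlated components would require additional covariance terms of the form 2 Cov(·,·).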
In addition to these commonly recognized sources, it is important to consider that uncertainty propagation begins at the sensor level. Uncertainties associated with at-sensor radiance measurements—including radiometric calibration, detector sensitivity, and spectral response functions—directly influence subsequent transformations to at-sensor reflectance and, ultimately, to atmospherically corrected remote sensing reflectance (Rrs) [
36,
37]. The propagation of these uncertainties is further affected by sensor-specific characteristics, such as spectral band width and configuration. Narrow-band sensors (e.g., MODIS or hyperspectral instruments) may facilitate more stable atmospheric correction and spectral discrimination, whereas broader multispectral bands (e.g., Landsat) can introduce additional challenges due to spectral mixing and reduced sensitivity to specific absorption features [
36].
Consequently, uncertainty propagation should be understood as a multistage process extending from at-sensor radiance to derived environmental parameters. Errors introduced during radiometric measurement and atmospheric correction may propagate through feature construction and machine learning models, ultimately affecting the reliability of retrieved variables such as chlorophyll-a, suspended solids, or nutrient concentrations. Despite its importance, this end-to-end uncertainty propagation is rarely quantified explicitly in current inland water remote sensing studies, representing a critical area for future methodological development [
22].
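A simple way to quantify one stage of this end-to-end propagation is Monte Carlo sampling: perturb the input reflectances according to their assumed uncertainties, push every draw through the retrieval, and summarize the spread of the outputs. The sketch below propagates Gaussian Rrs noise through NDCI into a linear chlorophyll-a retrieval. The coefficients `a=15.0` and `b=100.0`, the noise levels and all function names are hypothetical placeholders, not a calibrated algorithm.

```python
"""Sketch: Monte Carlo propagation of reflectance uncertainty to a retrieved value."""
import random
import statistics


def ndci(r705, r665):
    """Normalized Difference Chlorophyll Index from red-edge and red reflectance."""
    return (r705 - r665) / (r705 + r665)


def chl_from_ndci(x, a=15.0, b=100.0):
    """Hypothetical linear retrieval; a and b are placeholders, not calibrated values."""
    return a + b * x


def monte_carlo_chl(r705, r665, sigma705, sigma665, n=10_000, seed=0):
    """Return (mean, standard deviation) of retrieved Chl-a over n noisy draws."""
    rng = random.Random(seed)  # seeded for reproducibility
    draws = []
    for _ in range(n):
        p705 = rng.gauss(r705, sigma705)  # independent Gaussian noise per band
        p665 = rng.gauss(r665, sigma665)
        draws.append(chl_from_ndci(ndci(p705, p665)))
    return statistics.mean(draws), statistics.stdev(draws)


mean_chl, sd_chl = monte_carlo_chl(0.03, 0.02, 0.002, 0.002)
print(f"Chl-a = {mean_chl:.1f} ± {sd_chl:.1f} (placeholder units)")
```

With these inputs a 0.002 reflectance uncertainty in each band yields a spread of roughly 6 units around a nominal value of 35, showing how modest radiometric errors can be amplified by a ratio-based index. The same approach extends to correlated band errors or to a trained machine learning model in place of `chl_from_ndci`.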
Most publications report only aggregated predictive error, without separating these components. Consequently, uncertainty statements are typically descriptive rather than diagnostic. Future research would benefit from structured uncertainty frameworks that explicitly measure preprocessing sensitivity, temporal mismatch, and spatial representativeness effects. Such decomposition would substantially improve the interpretability and reproducibility of inland water quality modeling.
7. Current Challenges, Methodological Gaps and Future Research Directions
Despite the methodological improvements presented in
Section 4 and
Section 5, several persistent issues still limit the reliability and scalability of satellite-based inland water quality estimation. One major structural limitation is the lack of standardized pre-model data quality control, particularly in the application of masks (clouds, cloud shadows, land adjacency, and floating vegetation) and in the handling of invalid or missing pixels. Beyond introducing uncertainty, cloud masking can substantially reduce data availability, particularly in regions with persistent cloud cover, potentially biasing temporal analyses and limiting the detection of short-term events such as runoff-driven turbidity peaks or algal blooms. Recent reviews also highlight that cloud masking remains a major bottleneck for optical water applications, as default thresholding assumptions (e.g., low NIR water reflectance) often fail in optically complex (Case 2) waters, leading to the loss of valid observations and increasing the need for hybrid and ML-based masking (e.g., Fmask/CFmask, S2cloudless, IdePix) and sensor-based masking solutions [
62].
This is consistent with the operational practice of the reviewed empirical studies, in which invalid pixels are removed from matchup windows through mask filtering (e.g., flagging pixels within a matchup window as cloud, land, or floating vegetation and discarding matchups with insufficient valid pixels) [
11]. A second, more critical gap is the systematic treatment of missing data and temporal discontinuity. Even in well-studied areas, long time series are often disrupted by clouds, cloud shadows, and sensor artifacts, leading to discontinuous records and biased sampling of extreme events (e.g., runoff-driven turbidity peaks) [
57]. Recent syntheses highlight practical measures such as multi-day compositing (e.g., 8–16-day products) and complementary sensor integration, and identify reconstruction techniques (e.g., EOF-based reconstruction and newer progressive spatiotemporal gap-filling frameworks) as potential solutions for rebuilding discontinuous records when optical observations are unavailable [
62]. Consistent with this, our review suggests that masking should be treated as an explicit methodological step rather than a routine preprocessing detail, and that missing-data handling should be documented because it determines which optical regimes are represented in training and validation. A third problem is adjacency effects and mixed pixels, particularly in rivers and narrow water bodies, where land reflectance contamination can overpower the water-leaving signal and propagate to features and model outputs. Recent reviews also note that some atmospheric processors include adjacency-effect correction (e.g., SIMEC, adjacency modules within Sen2Cor/ATCOR), but these rely on assumptions that may fail in shallow, turbid, or bloom-dominated waters, precisely the conditions where robust monitoring is most needed [
62]. This reiterates a previously noted shortcoming: many pipelines continue to underreport masking settings and correction parameterization, thereby reducing reproducibility across sites and sensors. Finally, the evaluation of generalization and uncertainty remains underdeveloped relative to the increasing complexity of current models. While deep learning is increasingly used for nonlinear inversion, recent reviews stress the growing importance of uncertainty estimation (e.g., Bayesian Neural Networks, mixture density networks) to provide credibility bounds rather than deterministic maps alone [
23].
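The mask filtering and valid-pixel accounting described above are easy to report if each mask is tracked separately. The sketch below assumes a hypothetical per-pixel bit-flag layout (cloud, cloud shadow, land, floating vegetation); real processors such as IdePix or Fmask define their own flag encodings, so the bit values and `mask_report` are illustrative only.

```python
"""Sketch: per-mask pixel counts and remaining-pixel fraction from bit flags."""

# Hypothetical flag bits; real processors define their own layouts.
CLOUD, CLOUD_SHADOW, LAND, FLOATING_VEG = 1, 2, 4, 8
FLAG_NAMES = {CLOUD: "cloud", CLOUD_SHADOW: "cloud_shadow",
              LAND: "land", FLOATING_VEG: "floating_vegetation"}
ALL_MASKS = CLOUD | CLOUD_SHADOW | LAND | FLOATING_VEG


def mask_report(flags, applied=ALL_MASKS):
    """Count pixels hit by each applied mask and how many remain unmasked.

    A pixel may carry several flags, so per-mask counts can sum to more
    than the number of removed pixels.
    """
    flags = list(flags)
    report = {name: sum(1 for f in flags if f & bit)
              for bit, name in FLAG_NAMES.items() if applied & bit}
    remaining = sum(1 for f in flags if not f & applied)
    report["total"] = len(flags)
    report["remaining"] = remaining
    report["remaining_fraction"] = remaining / len(flags) if flags else 0.0
    return report


print(mask_report([0, CLOUD, LAND, CLOUD | CLOUD_SHADOW, 0]))
```

Publishing such a report per scene or per matchup window would cover the "which masks" and "how many pixels remained" items that are currently underreported.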
In inland waters, this is particularly relevant because uncertainty arises not only from model variance but also from matchup uncertainty (spatiotemporal mismatch) and preprocessing residuals, yet these components are rarely separated in current practice. Taken together, these gaps motivate several near-term research directions. First, the field would benefit from standardized, sensor-aware quality control protocols that report (i) which masks were applied (cloud/cloud shadow/land/floating vegetation), (ii) how many pixels remained after masking, and (iii) which thresholds or ML models were used, given that masking choices can remove a substantial fraction of otherwise usable observations in complex waters [
11,
62]. Second, inland water studies should more explicitly address missing data bias by reporting clear-sky availability and adopting transparent gap-handling strategies (e.g., compositing, spatiotemporal reconstruction, or multi-sensor fusion) when building long-term products [
55,
62].
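As a minimal example of a transparent gap-handling strategy, the sketch below builds fixed-period median composites (8 days by default, echoing the 8–16-day products mentioned earlier) from masked observations. Periods without any valid observation are omitted rather than filled, so the remaining gaps and the number of valid observations per period stay visible. The period length, the median statistic and the `composite` helper are illustrative assumptions.

```python
"""Sketch: fixed-period median compositing that keeps gaps explicit."""
import statistics
from collections import defaultdict
from datetime import date


def composite(observations, start, period_days=8):
    """observations: list of (date, value) with value None when masked.

    Returns {period_index: (median, n_valid)} in period order; periods
    with no valid data are left out instead of being silently filled.
    """
    bins = defaultdict(list)
    for observed_on, value in observations:
        if value is not None:
            bins[(observed_on - start).days // period_days].append(value)
    return {k: (statistics.median(vs), len(vs)) for k, vs in sorted(bins.items())}


obs = [
    (date(2023, 1, 1), 1.0),
    (date(2023, 1, 3), None),   # cloud-masked
    (date(2023, 1, 5), 3.0),
    (date(2023, 1, 10), 2.0),
    (date(2023, 1, 20), None),  # cloud-masked: period 2 has no valid data
]
print(composite(obs, start=date(2023, 1, 1)))
```

Reporting `n_valid` per period provides the clear-sky availability statistic recommended above and flags periods where composites rest on a single observation.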
Third, future work should prioritize transferability tests (cross-waterbody/cross-season/cross-sensor) paired with uncertainty decomposition, so that performance changes can be attributed either to modeling advances or to upstream data limitations. Finally, the increasing availability of cloud computing and harmonized archives creates an opportunity to benchmark workflows under common protocols, which would directly address the comparability limitations identified in
Section 4 and
Section 5.