Estimation of Water Quality in Lakes and Rivers Using Remote Sensing and Artificial Intelligence: A Review of Image Processing and Validation Strategies

Zúñiga-Grajeda, Virgilio; Lomeli, Jennifer Aleysha; Villota-González, Freddy Hernán; García-García, César Alejandro; Sulbarán-Rangel, Belkis

doi:10.3390/limnolrev26020019

Open Access Review


by Virgilio Zúñiga-Grajeda ¹, Jennifer Aleysha Lomeli ², Freddy Hernán Villota-González ³, César Alejandro García-García ¹ and Belkis Sulbarán-Rangel ²,*

¹ Department of Information Sciences and Technological Development, University of Guadalajara, Campus Tonalá, Tonalá 45425, Mexico

² Department of Water and Energy, University of Guadalajara, Campus Tonalá, Tonalá 45425, Mexico

³ Department of Sustainability and Territorial Sciences, University of Guadalajara, Campus Tlajomulco, Tlajomulco de Zúñiga 45641, Mexico

* Author to whom correspondence should be addressed.

Limnol. Rev. 2026, 26(2), 19; https://doi.org/10.3390/limnolrev26020019

Submission received: 29 March 2026 / Revised: 6 May 2026 / Accepted: 7 May 2026 / Published: 10 May 2026


Abstract

Freshwater ecosystems are increasingly affected by eutrophication, sediment loading, and other anthropogenic pressures, creating a growing need for monitoring frameworks that are spatially extensive, temporally consistent, and methodologically robust. Although in situ sampling remains essential, its limited spatial coverage and operational constraints have accelerated the use of satellite remote sensing combined with artificial intelligence (AI) and machine learning (ML) for water quality assessment. This review critically examines recent studies published between 2020 and March 2026 on the estimation of physicochemical water quality parameters in lakes and rivers using remote sensing, with particular attention to the methodological structure of image processing workflows rather than performance metrics alone. The synthesis shows that predictive performance is strongly conditioned by three interrelated stages: atmospheric correction (AC), spectral feature construction, and validation design. Across the reviewed studies, substantial variation is observed in atmospheric correction processors, spectral engineering strategies, and model architectures, leading to differences in the spectral inputs and analytical conditions used for model development. Validation approaches remain highly heterogeneous and often rely on internal data splits without geographically independent testing, which weakens claims of model generalizability. In addition, few studies explicitly distinguish algorithmic, matchup, and preprocessing uncertainties, revealing a persistent gap in uncertainty reporting. Overall, the review suggests that improvements attributed to newer ML models may partly reflect upstream preprocessing choices rather than algorithmic superiority alone. Future research should prioritize transparent reporting of atmospheric correction pipelines, structured uncertainty decomposition, standardized validation protocols, and cross-site transferability assessments. 
By consolidating these methodological patterns, this review aims to support improved reproducibility, comparability, and operational reliability of remote-sensing-based freshwater quality monitoring.

Keywords:

freshwater monitoring; remote sensing; machine learning; artificial intelligence; water quality; atmospheric correction; spectral feature engineering; validation design; uncertainty analysis; lakes and rivers

1. Introduction

Globally, surface water pollution and scarcity are among the main environmental challenges facing humanity. Rapid urban growth in recent decades has significantly increased the demand for drinking water. As a consequence of this population expansion and the associated industrialization, many surface water sources near urban centers have been depleted or have deteriorated in quality [1,2].

Water resources provide ecosystem services of high ecological and economic value to society [3]. However, these water bodies are highly vulnerable to pollution, especially when subjected to overexploitation. In many urban contexts, rivers serve as recipients of domestic and industrial wastewater discharges, which increases the pollutant load derived from anthropogenic activities [4]. Consequently, water quality monitoring and analysis have become priority research areas in recent years [5,6].

In response to concerns about water pollution, continuous water quality monitoring campaigns have been implemented in several countries. Their aim is to understand and prevent threats by collecting data that allow changes in the parameters that determine water quality to be analyzed [7,8,9]. These data also document current and emerging risks, providing a basis for developing conservation and restoration strategies for inland water bodies [3]. Furthermore, environmental monitoring is regarded as the most suitable strategy for ensuring sustainable management practices [2,8].

The conventional process for water quality assessment involves field visits, sample collection at different locations, laboratory analysis, and subsequent comparison of the results with regulatory standards [7,10]. Nevertheless, this traditional method has limitations, such as the need for trained personnel and the cost of laboratory tests. Furthermore, access to certain water bodies is difficult or even impossible, and the process is time-consuming, so it cannot provide real-time updates [4,11,12]. An alternative to in situ monitoring is the observation of water bodies through remote sensing, primarily satellite imagery, which can provide a valuable complementary source of data at local and global scales. Remote sensing methods for measuring the quality of inland waters date back almost 50 years; since then, hundreds of publications have demonstrated promising remote sensing models for estimating the biological, chemical, and physical properties of inland water bodies [13].

In recent years, the field has grown rapidly, driven by open access to data from satellite platforms spanning a range of spatial resolutions, such as Sentinel-2, Landsat, and MODIS, as well as by the development of airborne hyperspectral sensors. Several recent reviews agree that most studies have focused on estimating optically active parameters, particularly chlorophyll-a (Chl-a), total suspended solids (TSS), and colored dissolved organic matter (CDOM) [14,15,16]. These constituents directly modify water reflectance in the visible and near-infrared ranges through absorption and scattering, which facilitates their spectral modeling. In contrast, surface temperature is retrieved from thermal infrared (TIR) emission and is not optically active in the VIS–NIR domain [17,18]. Although temperature plays an important role in regulating biogeochemical processes (e.g., phytoplankton growth), it does not directly influence water-leaving reflectance and is therefore considered a non-optically active parameter [19]. However, as recent reviews of remote sensing applied to water monitoring point out [20,21], this thematic concentration has left a gap in the retrieval of dissolved nutrients and other non-optically active parameters, whose estimation depends on indirect relationships, spectral proxies, or advanced machine learning schemes [19].

In this context, approaches based on artificial intelligence and deep learning have improved the estimation of water quality parameters in specific case studies, particularly in inland systems where spatial variability is high [22]. These models can capture nonlinear relationships between multispectral or hyperspectral reflectance and pollutant concentrations, in some cases outperforming strictly physical models. However, as noted by Jaywant and Arif (2024) and Wu et al. (2025), these approaches depend on large, well-calibrated datasets, have limited physical interpretability, and may generalize poorly outside the training domain [20,23]. Although deep learning architectures have evolved toward hybrid approaches with more sophisticated attention mechanisms and optimization strategies, weaknesses persist in the systematic construction of input variables, regional stability, and the standardization of validation schemes [22]. At the applied level, satellite monitoring has been shown to strengthen the operational management of reservoirs, although it requires continuous validation and workflows adapted to end users [24]; furthermore, cloud platforms improve calibration processes in data-scarce regions but demand rigorous quality control to avoid bias [25]. Systematic reviews consistently confirm that non-optically active parameters remain underrepresented and that comparable methodological frameworks guaranteeing reproducibility and transferability across contrasting hydrological contexts are still lacking [15].

Despite the rapid development of retrieval algorithms, considerably less attention has been devoted to systematically analyzing the image processing workflows that precede and condition model performance. Recent reviews highlight that preprocessing choices, such as atmospheric correction schemes, spectral feature construction, band selection, spatial resampling, and multi-sensor harmonization, vary widely across studies and directly influence retrieval accuracy and model stability [15,19]. In particular, large-scale implementations using cloud-based platforms demonstrate the importance of consistent quality control and harmonization procedures when integrating multi-temporal and multi-sensor datasets [25]. Methodological inconsistencies also extend to validation strategies, including differences in data partitioning schemes, cross-validation design, the selection of performance metrics, uncertainty quantification, and transferability testing [22,24]. This heterogeneity complicates inter-study comparability, limits reproducibility, and constrains the operational scalability of proposed models, particularly in optically complex or data-scarce environments.
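The difference between internal data splits and geographically independent testing can be made concrete with a small sketch. Assuming each in situ sample carries a site (water body) label, a random k-fold split can place samples from the same site in both the training and test partitions, whereas a leave-one-site-out split never does. The function names and site labels below are hypothetical and are not taken from any reviewed study.

```python
import random
from collections import defaultdict


def random_kfold(n_samples, k, seed=0):
    """Yield (train, test) index lists for a shuffled k-fold split that ignores site identity."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


def leave_one_site_out(site_ids):
    """Yield (site, train, test), holding out all samples of one site per fold."""
    by_site = defaultdict(list)
    for i, site in enumerate(site_ids):
        by_site[site].append(i)
    for held_out in sorted(by_site):
        test = by_site[held_out]
        train = sorted(i for site, members in by_site.items() if site != held_out for i in members)
        yield held_out, train, test


def shared_sites(site_ids, train, test):
    """Sites present in both partitions, i.e. potential spatial leakage."""
    return {site_ids[i] for i in train} & {site_ids[i] for i in test}


# Hypothetical example: 12 matchups from three water bodies
sites = ["lake_A"] * 4 + ["lake_B"] * 4 + ["river_C"] * 4
for train, test in random_kfold(len(sites), k=3):
    print("random k-fold, shared sites:", shared_sites(sites, train, test))
for site, train, test in leave_one_site_out(sites):
    print("held out", site, "shared sites:", shared_sites(sites, train, test))
```

With random folds, the printed sets will typically contain several sites, meaning that test samples share location-specific optical conditions with training samples; the leave-one-site-out folds always print an empty set, which corresponds to the geographically independent testing that many studies omit.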

In summary, recent evidence confirms substantial progress in spectral modeling and the incorporation of artificial intelligence [19,22]. Nevertheless, challenges persist in multi-sensor harmonization and large-scale processing consistency [25], the robust retrieval of non-optically active parameters [15], and the development of standardized validation frameworks and transferable modeling strategies [22], as do operational constraints linked to atmospheric correction and in situ validation requirements [24]. These limitations are particularly critical in highly turbid or optically heterogeneous waters, where model uncertainty remains significant.

While several recent reviews have focused primarily on model architectures, sensors, or comparative performance metrics, this review emphasizes the structural role of preprocessing choices and validation design in shaping reported model performance. The intention is not to identify optimal algorithms but to show how upstream methodological decisions condition downstream results and uncertainty. Accordingly, the present study provides a structured synthesis of the methodological approaches used in recent studies of water quality estimation in lakes and rivers using remote sensing and artificial intelligence, with an emphasis on image processing and validation strategies. Specifically, this review seeks to (i) synthesize the main methodological approaches, from image acquisition and correction to predictive modeling and statistical evaluation; (ii) comparatively analyze the artificial intelligence algorithms applied to optically active and non-optically active parameters; and (iii) identify trends, limitations, and standardization opportunities that strengthen the reproducibility and applicability of these models in aquatic systems.

2. Materials and Methods

2.1. Search Strategy and Type of Review

This study is an integrative systematic review aimed at identifying, analyzing, and synthesizing recent advances (2020–March 2026) in remote-sensing-based water quality estimation, with particular emphasis on image processing workflows and model validation strategies. The integrative approach facilitated the analysis of methodological trends, preprocessing variability, and validation practices across heterogeneous case studies focused on lakes and rivers. The literature search was conducted in major academic databases with broad coverage of environmental sciences and geospatial technologies, including Scopus and ScienceDirect, with Google Scholar used as a complementary source. To improve transparency and reproducibility, structured Boolean search strings with quoted phrases were applied. An initial broad search string was defined as follows: (“remote sensing”) AND (“image processing” OR “atmospheric correction”) AND (“model validation” OR “accuracy assessment”). This exploratory query retrieved approximately 5000 records over the broader period 2002–March 2026. The purpose of this first stage was to characterize the general methodological landscape linking remote sensing preprocessing and validation frameworks, regardless of the specific application area. The temporal distribution of publications retrieved through this general search shows a marked increase in scientific output over the last 15 years (Figure 1), reflecting the rapid methodological expansion of remote sensing and validation frameworks in environmental monitoring.

2.2. Selection Criteria and Analytical Procedure

To narrow the focus to inland water quality applications, a second, more stringent search string was applied: (“water quality estimation”) AND (“remote sensing”) AND (“image processing” OR “atmospheric correction”) AND (“model validation” OR “accuracy assessment”). Based on this refined search and subsequent screening, 24 peer-reviewed papers constituted the final analytical corpus of this review. Although this corpus is relatively small, selection prioritized methodological transparency over sample size in order to support a robust comparative analysis. The full list of these studies is provided in the Supplementary Material (Table S1).

The final article selection was based on the following inclusion criteria: (i) publications between 2020 and March 2026; (ii) studies addressing inland water quality estimation in lakes or rivers using remote sensing; (iii) clear description of image preprocessing procedures, including atmospheric correction, cloud masking, reflectance derivation, or radiometric harmonization; and (iv) implementation and reporting of quantitative validation metrics, such as R², RMSE, MAE, or cross-validation schemes. Full-text access was required to enable a complete methodological assessment of preprocessing and validation procedures.
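Because inclusion criterion (iv) relies on metrics such as R², RMSE, and MAE, it helps to state how they are computed, since studies do not always define them identically. The minimal sketch below uses the coefficient-of-determination form of R² (1 − SS_res/SS_tot), which can differ from the squared Pearson correlation reported by some authors; MAPE is added as a commonly reported relative metric. The function name and the example values are illustrative.

```python
import math


def regression_metrics(observed, predicted):
    """R² (coefficient of determination), RMSE, MAE and MAPE (%) for paired values.

    Assumes at least two pairs, non-constant observations (so SS_tot > 0)
    and non-zero observations (required by MAPE).
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired values of equal length")
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
        "MAPE": 100.0 * sum(abs(r / o) for r, o in zip(residuals, observed)) / n,
    }


# Hypothetical in situ vs. predicted chlorophyll-a (mg m^-3)
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```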

The selection process was conducted in two phases. In the first phase, automated filters were applied according to publication year, document type, and language. In the second phase, a manual evaluation was performed using the predefined inclusion criteria, with particular attention to methodological transparency in preprocessing and validation strategies. The selected articles were then organized and comparatively analyzed to identify common methodological patterns, differences in image processing workflows, validation designs, and performance reporting practices.

3. Water Quality Parameters Estimable by Remote Sensing

Remote sensing of water quality is possible because sensors can measure the spectral response of water, and its use has expanded rapidly owing to its ability to cover large water bodies in shorter timeframes and at lower cost [26,27]. In addition, the optical properties of water make it possible to estimate parameters associated with changes in water composition, providing relevant information for simulation and forecasting models applied to water quality studies [17,28] (Table 1).

Two types of optical properties of water are analyzed: inherent optical properties (IOPs) and apparent optical properties (AOPs) [17,18]. IOPs are physical quantities that describe how photons are absorbed and scattered within the water; they depend only on the aqueous medium and its constituents and are independent of the ambient light field. IOPs are easy to define but can be extremely difficult to measure, especially in the field; the most common IOPs are the absorption and scattering coefficients [29]. AOPs, in contrast, depend both on the medium and on the directional structure of the ambient light field; they are generally much easier to measure but difficult to interpret because they vary with environmental conditions (e.g., solar elevation and sky conditions). The most common AOPs are the diffuse attenuation coefficient for downwelling irradiance (Kd) and reflectance [17].
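The dependence of an AOP on IOPs can be illustrated with the quadratic approximation used in the quasi-analytical algorithm (QAA), where the subsurface remote sensing reflectance is written as a function of u = b_b/(a + b_b). The sketch below assumes the commonly cited coefficients g0 ≈ 0.089 and g1 ≈ 0.125 and the below-to-above-surface conversion Rrs = 0.52·rrs/(1 − 1.7·rrs); these values are illustrative, and other parameterizations exist.

```python
def rrs_from_iops(a, bb, g0=0.089, g1=0.125):
    """Above-surface remote sensing reflectance Rrs (sr^-1) from total absorption `a`
    and total backscattering `bb` (both in m^-1) at a single wavelength.

    g0 and g1 are assumed QAA-style coefficients, not values from the reviewed studies.
    """
    u = bb / (a + bb)                      # backscattering relative to (absorption + backscattering)
    rrs_below = g0 * u + g1 * u ** 2       # subsurface reflectance, quadratic in u
    return 0.52 * rrs_below / (1.0 - 1.7 * rrs_below)  # transfer across the air-water interface


# More backscattering (e.g., more suspended sediment) raises Rrs;
# more absorption (e.g., more CDOM) lowers it.
print(rrs_from_iops(a=0.9, bb=0.1), rrs_from_iops(a=0.9, bb=0.2), rrs_from_iops(a=1.8, bb=0.1))
```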

In remote-sensing-based water quality studies, water bodies are commonly classified according to their optical properties, reflecting the relative contributions of different constituents to the overall spectral signal. Optically active parameters that contribute to the total water-leaving radiance include phytoplankton (primarily represented by chlorophyll-a), organic and inorganic suspended solids, and colored dissolved organic matter (CDOM), together with water clarity, which integrates their combined optical effect (see Table 1). The relative proportions and interactions among these constituents largely determine variations in water clarity, light attenuation, and spectral reflectance patterns and are therefore widely used as indicators of water quality status [13,30]. Their combined influence governs absorption and backscattering processes within the water column, forming the bio-optical basis for most remote sensing retrieval algorithms applied to inland waters.
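The additive nature of this bio-optical basis can be written explicitly: total absorption is the sum of the contributions of pure water, phytoplankton, non-algal particles, and CDOM, and CDOM absorption is commonly modeled as decaying exponentially with wavelength. In the sketch below, the reference wavelength (440 nm), the spectral slope S = 0.015 nm⁻¹, and all input values are assumptions chosen for illustration; S varies between water bodies.

```python
import math


def cdom_absorption(wavelength_nm, a_ref, ref_nm=440.0, slope=0.015):
    """a_CDOM(λ) = a_CDOM(λ_ref) · exp(-S · (λ - λ_ref)), in m^-1 (S in nm^-1, assumed value)."""
    return a_ref * math.exp(-slope * (wavelength_nm - ref_nm))


def total_absorption(a_water, a_phyto, a_nap, a_cdom):
    """Total absorption (m^-1) as the sum of constituent absorption coefficients."""
    return a_water + a_phyto + a_nap + a_cdom


# Hypothetical component values at 560 nm for a CDOM-rich lake
a_cdom_560 = cdom_absorption(560.0, a_ref=1.2)
print(round(a_cdom_560, 3), total_absorption(0.06, 0.05, 0.10, a_cdom_560))
```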

Optically inactive parameters do not exhibit a direct spectral signature detectable by conventional optical sensors; however, they may be indirectly related to optically active constituents. As a result, remote-sensing-based models have been developed for estimating dissolved nutrients (e.g., nitrogen and phosphorus species), dissolved oxygen, and even certain heavy metals through proxy variables and machine learning approaches. Nevertheless, studies addressing these parameters remain comparatively limited, and their retrieval performance is often highly region-specific [28,31].

The optical characteristics of inland waters are frequently dominated by variable sediment loads and colored dissolved organic matter (CDOM), resulting in highly complex absorption and scattering regimes. This optical heterogeneity complicates inversion processes and reduces the transferability of globally developed algorithms [32]. Consequently, parameter retrieval models are rarely universal, and regional calibration using in situ measurements remains essential to ensure reliability and reduce uncertainty [19,30].
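Regional calibration typically means fitting an empirical relationship between a spectral feature and in situ concentrations at matchup points. As a minimal sketch, assuming a log-linear model between a hypothetical band ratio and chlorophyll-a, the coefficients can be estimated by ordinary least squares. Because the fitted coefficients absorb the local optical regime, coefficients fitted in one lake should not be expected to transfer to another.

```python
import math


def fit_log_linear(ratios, concentrations):
    """Ordinary least-squares fit of log10(C) = a + b·log10(ratio); returns (a, b)."""
    xs = [math.log10(r) for r in ratios]
    ys = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def predict(ratio, a, b):
    """Apply the fitted model to a new band ratio."""
    return 10 ** (a + b * math.log10(ratio))


# Hypothetical matchups: band ratio vs. in situ Chl-a (mg m^-3)
a, b = fit_log_linear([0.8, 1.0, 1.3, 1.7], [4.0, 6.5, 11.0, 19.0])
print(round(a, 3), round(b, 3), round(predict(1.5, a, b), 2))
```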

To synthesize these optical interactions and retrieval pathways, Figure 2 presents a conceptual framework linking inherent optical properties (IOPs) and apparent optical properties (AOPs) to the spectral response of inland waters and the subsequent estimation of water quality parameters. The scheme highlights how absorption and scattering processes govern water-leaving reflectance, which underpins the direct retrieval of optically active constituents such as chlorophyll-a, suspended solids, and CDOM [13,30]. In contrast, non-optically active parameters—such as nutrients, Biochemical Oxygen Demand (BOD)/Chemical Oxygen Demand (COD), and trace metals—lack a direct spectral signature and are typically inferred through indirect relationships, proxy variables, or data-driven models, often leading to reduced transferability across water bodies with different optical regimes. Figure 2 also emphasizes that atmospheric correction and remote sensing reflectance (Rrs) retrieval are not neutral preprocessing steps but major sources of uncertainty that condition downstream model performance and comparability across studies.

4. Remote Sensing Platforms and Sensors for Lakes and Rivers

Earth observation systems use diverse sensor technologies that fall broadly into optical and microwave systems. In inland water quality assessments, optical sensors remain the main source of information because they are sensitive to spectral variations induced by water constituents [15,19]. Optical sensors operate in the visible (VIS), near-infrared (NIR), shortwave infrared (SWIR), and thermal infrared (TIR) portions of the electromagnetic spectrum. Multispectral VIS–NIR sensors measure reflected solar radiation and are well suited for estimating optically active parameters including chlorophyll-a (Chl-a), total suspended solids (TSS), and colored dissolved organic matter (CDOM) [19]. The low reflectance of water in the NIR and SWIR regions facilitates water delineation and enhances contrast with adjacent land surfaces, providing the spectral basis for many retrieval algorithms. Nonetheless, sensors operating in the reflective VIS–SWIR range are constrained by cloud cover and solar illumination, which restricts observations to daylight, clear-sky conditions. Thermal infrared sensors, which detect radiation emitted as a function of surface temperature, are widely applied in the study of thermal stratification and surface heating patterns in lakes and reservoirs [24]. Unlike VIS–SWIR sensors, TIR sensors do not depend on solar illumination and can therefore be used during both daytime and nighttime conditions [28]. On satellite platforms that carry both optical and thermal sensors (e.g., Landsat missions), daytime acquisitions provide complementary spectral information from VIS–NIR–SWIR bands, enabling a more comprehensive interpretation of water quality dynamics [24,25]. Improvements in radiometric calibration and atmospheric correction have made satellite-based surface temperature products more consistent over inland environments.
Microwave instruments such as synthetic aperture radar (SAR) operate independently of solar illumination and can penetrate clouds. Although SAR is not widely used for the direct estimation of optically active water quality parameters, it can support flood mapping and the monitoring of hydrodynamic processes that affect sediment transport and nutrient dynamics [15].
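The low NIR and SWIR reflectance of water is the basis of simple water masks. A minimal sketch, assuming the Modified Normalized Difference Water Index MNDWI = (Green − SWIR)/(Green + SWIR) (on Sentinel-2, roughly bands B3 and B11, after resampling B11 from 20 m to the B3 grid) and a threshold of 0, is shown below. In practice the threshold is scene-dependent, and turbid or very shallow pixels can be misclassified; the reflectance values are hypothetical.

```python
def normalized_difference(band_1, band_2):
    """Per-pixel (b1 - b2) / (b1 + b2); None where the denominator is zero."""
    return [(x - y) / (x + y) if (x + y) != 0 else None for x, y in zip(band_1, band_2)]


def water_mask(green, swir, threshold=0.0):
    """True where MNDWI = (Green - SWIR) / (Green + SWIR) exceeds the (assumed) threshold."""
    return [v is not None and v > threshold for v in normalized_difference(green, swir)]


# Hypothetical surface reflectance for three pixels: open water, vegetated shore, bare soil
green = [0.06, 0.08, 0.12]
swir = [0.01, 0.20, 0.25]
print(water_mask(green, swir))  # → [True, False, False]
```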

The Landsat series provides long-term multispectral observations at moderate spatial resolution (30 m, with a 15 m panchromatic band on recent missions) across visible, near-infrared, and shortwave infrared bands. In addition, thermal infrared (TIR) bands are available for surface temperature retrieval; on Landsat 8 and 9, these are acquired at a native spatial resolution of 100 m and commonly resampled to 30 m to match the multispectral products. Recent studies highlight that the improved radiometric resolution and stray-light correction of newer missions enhance sensitivity over low-reflectance targets such as inland waters and improve atmospheric correction performance [22]. The Sentinel-2 MultiSpectral Instrument (MSI) has greatly expanded inland water applications because of its higher spatial resolution (10–20 m) and its red-edge bands, which help detect phytoplankton and blooms [19,22]. Its shorter revisit time compared with Landsat improves the monitoring of dynamic processes in lakes and rivers. MODIS provides high-temporal-resolution observations that are well suited to large lakes and regional bloom monitoring, although its coarse spatial resolution limits its application in smaller or heterogeneous inland water bodies [15]. Platform selection therefore involves a trade-off between spatial, spectral, and temporal resolution that depends on the characteristics of the target water bodies. Recent empirical work indicates that integrating imagery from multiple sensors, notably Landsat-8 and Sentinel-2, can densify time series and improve the consistency of surveillance. However, differences in spectral band definitions, radiometric responses, and atmospheric correction outputs introduce harmonization challenges that must be overcome to ensure comparability and reproducibility [25]. The lack of standardized harmonization protocols remains a critical methodological limitation that constrains long-term transferability across varied hydrological contexts.
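The value of the red-edge bands can be illustrated with the Normalized Difference Chlorophyll Index (NDCI), which contrasts reflectance near 705 nm (Sentinel-2 B5, red edge) with reflectance near 665 nm (B4, red), where chlorophyll-a absorbs strongly. The sketch assumes both bands are already on a common grid (B5 is natively 20 m and B4 10 m, so one must be resampled) and uses hypothetical reflectance values; NDCI is a relative index and still requires local calibration against in situ Chl-a.

```python
import math


def ndci(red_edge, red):
    """NDCI = (R_705 - R_665) / (R_705 + R_665) per pixel; NaN where undefined."""
    values = []
    for r_edge, r_red in zip(red_edge, red):
        denominator = r_edge + r_red
        values.append((r_edge - r_red) / denominator if denominator != 0 else math.nan)
    return values


# Hypothetical Rrs-like values: a bloom pixel shows a red-edge peak relative to red
print(ndci(red_edge=[0.020, 0.010], red=[0.010, 0.010]))
```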

The spectral characteristics of major satellite platforms and optically active water constituents are summarized in Figure 3. Figure 3a illustrates how Landsat 8 OLI/TIRS and Sentinel-2 MSI bands are distributed across atmospheric transmission windows in the visible, near-infrared, shortwave infrared, and thermal infrared regions. Figure 3b shows representative spectral responses of key inland water components, including chlorophyll-a, open water, non-algal particles or suspended sediments, and colored dissolved organic matter (CDOM). It is important to note that the thermal infrared (TIR) region operates in a different wavelength domain (µm) than the reflective VIS–NIR–SWIR region (nm) and is therefore represented separately in the spectral interpretation. Together, these elements in Figure 3 highlight why band selection, atmospheric preprocessing, and sensor-specific spectral configurations are critical for reliable water quality retrieval, particularly when estimating optically active constituents directly or using them as proxies for non-optically active variables.
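Because TIR bands record emitted rather than reflected radiance, they are converted to temperature instead of reflectance. A minimal sketch of the inverse Planck relation used for Landsat 8 TIRS Band 10 is given below; the thermal constants K1 = 774.8853 W m⁻² sr⁻¹ µm⁻¹ and K2 = 1321.0789 K are the values commonly listed in Landsat 8 Level-1 metadata and should be read from each scene's own metadata in practice. The result is at-sensor brightness temperature, not water surface temperature, which additionally requires emissivity and atmospheric correction.

```python
import math

# Thermal constants for Landsat 8 TIRS Band 10 as commonly listed in Level-1 (MTL) metadata;
# in real processing, read them from the scene metadata.
K1_BAND10 = 774.8853   # W m^-2 sr^-1 µm^-1
K2_BAND10 = 1321.0789  # K


def brightness_temperature(radiance, k1=K1_BAND10, k2=K2_BAND10):
    """At-sensor brightness temperature (K) from TOA spectral radiance (W m^-2 sr^-1 µm^-1)."""
    return k2 / math.log(k1 / radiance + 1.0)


def radiance_from_temperature(temp_k, k1=K1_BAND10, k2=K2_BAND10):
    """Inverse relation, useful for checking the conversion."""
    return k1 / (math.exp(k2 / temp_k) - 1.0)


t = brightness_temperature(10.0)
print(round(t, 1), round(t - 273.15, 1))  # brightness temperature in K and °C
```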

5. Image Processing Workflows for Water Quality Estimation

Although early techniques were based mainly on empirical band ratios derived from ocean color theory, recent research has moved toward multistage processing systems that combine radiometric correction, spectral transformation, nonlinear modeling, and structured validation frameworks, as examined systematically in recent reviews of inland and coastal water monitoring [20,23,27]. This shift has been enabled by enhanced sensor configurations, including the finer spectral sampling of Sentinel-2 MSI (with its red-edge bands) and the higher radiometric stability of Landsat OLI, together with advances in computational capacity and cloud-based processing environments that have increased the operational scalability of water quality retrieval frameworks [15,27]. Nevertheless, inland waters (lakes and rivers) remain optically complex systems. The overlapping and nonlinear absorption and scattering contributions of phytoplankton pigments, suspended particulate matter, and colored dissolved organic matter (CDOM) challenge the assumptions of conventional atmospheric correction [35,36,37]. As a result, uncertainties introduced during reflectance retrieval can propagate through subsequent modeling stages, influencing feature sensitivity and ultimately limiting predictive robustness.
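The propagation of reflectance uncertainty into retrieved concentrations can be explored with a simple Monte Carlo experiment. The sketch below assumes a hypothetical blue/green band-ratio model for chlorophyll-a (coefficients chosen for illustration only) and an assumed 10% relative residual error from atmospheric correction, drawn independently for each band. Real correction errors are usually spectrally correlated, so this is a simplification rather than an uncertainty budget for any processor.

```python
import math
import random
import statistics


def band_ratio_chl(r_blue, r_green, a=0.3, b=-2.5):
    """Hypothetical empirical model: log10(Chl) = a + b·log10(R_blue / R_green)."""
    return 10 ** (a + b * math.log10(r_blue / r_green))


def propagate(r_blue, r_green, rel_sd, n=5000, seed=42):
    """Median and 5th/95th percentiles of Chl-a under assumed relative reflectance noise."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        blue = r_blue * (1.0 + rng.gauss(0.0, rel_sd))
        green = r_green * (1.0 + rng.gauss(0.0, rel_sd))
        if blue > 0 and green > 0:  # discard non-physical negative reflectances
            draws.append(band_ratio_chl(blue, green))
    cuts = statistics.quantiles(draws, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return statistics.median(draws), cuts[0], cuts[-1]


nominal = band_ratio_chl(0.004, 0.006)
median, p05, p95 = propagate(0.004, 0.006, rel_sd=0.10)
print(round(nominal, 2), round(median, 2), (round(p05, 2), round(p95, 2)))
```

Because the model is a power law of a band ratio, a symmetric reflectance error produces an asymmetric spread in Chl-a, which is one reason why reporting percentiles rather than a single ± value can be more informative.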

To provide an overview of the methodological heterogeneity found in recent work, Table 2 compares typical image processing pipelines, showing differences in atmospheric correction strategies, spectral feature construction approaches, modeling design, and validation schemes. This comparison suggests that predictive accuracy depends less on the choice of algorithm alone than on the interplay between preprocessing robustness, spectral representation, and evaluation framework. The patterns in Table 2 indicate that inland water remote sensing workflows can be structured into three closely related stages: atmospheric correction and radiometric preprocessing, spectral feature construction, and modeling–validation design. Although these stages are frequently presented sequentially, the reviewed studies show that their interactions ultimately determine predictive robustness, transferability, and uncertainty propagation. The choice of correction strategy may affect spectral stability; feature engineering determines the information available to learning algorithms; and validation design determines how performance metrics should be interpreted. The following subsections examine these components, beginning with atmospheric correction, continuing with spectral feature construction, and ending with modeling approaches and validation frameworks. The geographical distribution of the analyzed case studies is uneven, with a strong concentration in China and limited representation from other regions such as Africa, South America, and parts of Europe. This pattern likely reflects the geographical distribution of research output within the period covered by this review (2020–March 2026).

5.1. Atmospheric Correction Strategies in Inland Waters

Atmospheric correction (AC) is a cornerstone of inland water remote sensing because the water-leaving signal is typically weak relative to atmospheric path radiance and is further biased by adjacency effects and optical complexity. The comparative analysis in Table 2 shows that correction strategies vary greatly between studies, reflecting sensor-dependent processing chains and the optical variability of lakes and rivers. Commonly used solutions include land-oriented surface reflectance processors (Sen2Cor and LaSRC), water-specific processors (ACOLITE/DSF, C2RCC/C2X-Nets, and POLYMER), and general radiative-transfer-based tools (such as FLAASH and 6SV). Other studies use Level-2 surface reflectance or Rrs products directly, sometimes without stating which correction scheme produced them, which hampers reproducibility and cross-study comparison. Beyond methodological classification, the practical consequences of AC become evident when its influence on reflectance retrieval is examined.

The effects of atmospheric correction on the magnitude and composition of the water-leaving signal are illustrated in Figure 4. Figure 4a shows that the radiance measured by the sensor includes not only the desired water-leaving component but also atmospheric path radiance and surface-reflected contributions. Figure 4b provides an example of how top-of-atmosphere reflectance is transformed into aquatic reflectance after atmospheric correction, modifying both the spatial appearance of the water body and the spectral signal used for feature engineering and model development. Therefore, atmospheric correction should not be interpreted merely as a visual enhancement step but as a quantitative preprocessing stage that can propagate uncertainty into downstream water quality retrieval models.
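The decomposition in Figure 4a can be written as a deliberately simplified radiance budget, L_TOA = L_path + t·(L_sr + L_w), where L_sr is surface-reflected radiance and t is a single diffuse transmittance assumed to apply to both surface terms. The sketch below inverts this budget for the water-leaving radiance L_w, converts it to remote sensing reflectance Rrs = L_w/E_d, and gives the dimensionless water-leaving reflectance ρ_w = π·Rrs, which some processors report instead of Rrs. Sun glint, whitecaps, and adjacency effects are ignored, and all numbers are illustrative.

```python
import math


def water_leaving_radiance(l_toa, l_path, l_surface, t_diffuse):
    """Invert the simplified budget L_TOA = L_path + t·(L_surface + L_w) for L_w."""
    return (l_toa - l_path) / t_diffuse - l_surface


def remote_sensing_reflectance(l_w, e_d):
    """Rrs = L_w / E_d in sr^-1 (E_d: downwelling irradiance just above the surface)."""
    return l_w / e_d


def rho_w_from_rrs(rrs):
    """Dimensionless water-leaving reflectance, rho_w = pi · Rrs."""
    return math.pi * rrs


# Illustrative values in consistent radiance units: the water signal is a small part of L_TOA
l_w = water_leaving_radiance(l_toa=60.0, l_path=50.0, l_surface=1.0, t_diffuse=0.8)
l_w_biased = water_leaving_radiance(l_toa=60.0, l_path=50.5, l_surface=1.0, t_diffuse=0.8)
print(l_w, l_w_biased, round(100 * (l_w - l_w_biased) / l_w, 1))
```

In this toy example, a 1% error in the estimated path radiance produces an error of roughly 5% in L_w, illustrating why small atmospheric residuals matter for weak water signals.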

This statement is supported by empirical evidence from the reviewed literature, in which AC is sometimes treated explicitly as a variable that affects downstream performance. For instance, Fu et al. (2022) applied multiple processors (Sen2Cor, C2RCC, and FLAASH) in Poyang Lake before ensemble modeling [38], and Zhenyu et al. (2025), working in Manas Lake [47], compared C2RCC, POLYMER, and SeaDAS/l2gen [50]. These comparisons suggest that correction residuals may propagate into predictive metrics and uncertainty estimates in turbid or CDOM-rich environments. In contrast, other studies rely on a single aquatic-oriented processor adapted to the optical conditions of the target water body. Chaojie et al. (2022) applied POLYMER with a semi-analytical TSM framework in Lake Geneva [55], whereas Salvatore et al. relied on C2RCC/C2X-Nets in a riverine context [11]. As Table 2 shows, no processor is universally dominant; instead, AC performance is context-dependent and influenced by sensor configuration, water type, and adjacency intensity. Taken together, the reviewed studies indicate that atmospheric correction uncertainty strongly affects predictive robustness, especially for high-capacity AI models. When processor choice and parameterization are not reported, improvements attributed to modeling advances may be due, at least in part, to upstream preprocessing differences.
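When several processors are compared, as in the multi-processor studies above, a simple and transparent summary is the median relative bias and median absolute percentage difference of each processor's Rrs against in situ Rrs at the same matchups and band. The sketch below assumes such matchups are already paired; the processor labels and values are hypothetical and do not reproduce any reviewed study.

```python
import statistics


def matchup_summary(in_situ, retrieved):
    """Median relative bias (%) and median absolute percentage difference (%) vs. in situ."""
    rel = [(r - m) / m for m, r in zip(in_situ, retrieved)]
    return {
        "median_bias_pct": 100.0 * statistics.median(rel),
        "median_abs_pct": 100.0 * statistics.median([abs(x) for x in rel]),
    }


def rank_processors(in_situ, outputs):
    """Sort processors by median absolute percentage difference (lower is better)."""
    summaries = {name: matchup_summary(in_situ, values) for name, values in outputs.items()}
    return sorted(summaries.items(), key=lambda item: item[1]["median_abs_pct"])


# Hypothetical Rrs (sr^-1) at one band for three matchups
in_situ_rrs = [0.010, 0.020, 0.040]
outputs = {
    "processor_A": [0.011, 0.022, 0.044],
    "processor_B": [0.012, 0.020, 0.050],
}
for name, summary in rank_processors(in_situ_rrs, outputs):
    print(name, {k: round(v, 1) for k, v in summary.items()})
```

Medians are used here so that a few poor matchups (e.g., adjacency-contaminated pixels) do not dominate the comparison; reporting both bias and absolute difference distinguishes systematic offsets from scatter.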

In addition to multi-processor comparisons, a second methodological trend is the use of aquatic-specific correction schemes explicitly developed for optically complex waters. In Lake Geneva, POLYMER was applied together with a semi-analytical TSM retrieval framework, embedding inherent optical property relationships into the preprocessing stage [55]. In riverine systems with strong turbidity gradients, C2RCC and C2X-Nets were implemented in the Chao Phraya River [11], demonstrating the promise of neural-network-based atmospheric correction in settings where the land-surface assumption breaks down. Similar reasoning is followed in multi-lake and multi-sensor research [45,56], where processors such as iCOR, LaSRC, and POLYMER are selected according to sensor configuration and water optical properties. As Table 2 shows, these selections are context-dependent, indicating that atmospheric correction performance cannot be generalized across inland water systems. A recurring limitation identified across the reviewed studies is the poorly documented provenance of atmospheric correction procedures, including the specific processors, parameter settings, and post-processing steps applied. This lack of transparency critically limits reproducibility and complicates cross-study comparison, as differences in predictive performance may reflect upstream preprocessing choices rather than intrinsic model capabilities. Consequently, the absence of standardized reporting for atmospheric correction workflows remains a major barrier to the operational reliability and transferability of remote-sensing-based water quality models [36,37]. Some studies rely on Level-2 surface reflectance or Rrs products without specifying the processor or parameter settings [57,58]. Although operationally convenient, this omission reduces the reproducibility and interpretability of the performance differences described in Table 2.
Validation metrics (e.g., R², RMSE, MAE, or MAPE) may indicate sufficient predictive skill. Without clear documentation of the AC pipeline, however, it is unclear whether performance differences across studies stem from model architecture, feature construction, or residual atmospheric effects. This issue is especially relevant to high-capacity machine learning frameworks. As summarized in Table 2, studies using more than one processor [38,50] implicitly acknowledge that inland water retrieval is sensitive to atmospheric residuals and adjacency contamination. Deep learning architectures (e.g., CNNs or Transformer-based models) risk learning fine spectral artifacts introduced by imperfect correction as if they were genuine predictive features [26].

As a result, improvements ascribed to modeling sophistication may partly reflect preprocessing bias rather than intrinsic algorithmic superiority. This structural dependency extends to uncertainty reporting: several studies report only aggregate error metrics and do not explicitly separate uncertainty sources. Methodologically, uncertainty in inland water retrieval can be conceptually decomposed into at least three parts. The first is algorithmic uncertainty, caused by model variance and parameter instability. The second is matchup uncertainty, arising from the temporal and spatial representativeness of in situ data. The third is preprocessing uncertainty, arising from atmospheric correction residuals and sensor harmonization. This conceptual decomposition aligns with recent discussions in the remote sensing literature emphasizing the need to account for multiple sources of uncertainty when validating water quality products [22,59]. Specifically, uncertainty quantification has been identified as a key challenge in deep-learning-based inversion frameworks, where optimization strategies increasingly incorporate uncertainty estimation to improve model generalization and interpretability [22]. Similarly, systematic reviews of inland water quality remote sensing highlight that, despite advances in modeling, formal uncertainty quantification remains underdeveloped, particularly regarding preprocessing residuals and in situ matchup representativeness [59]. Without explicit decomposition of these components, uncertainty statements remain descriptive and cannot diagnose the origin of predictive variability, limiting interpretability and cross-study comparability.
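A minimal sketch of the three-part decomposition follows. It assumes the components are independent, so that they combine in quadrature. Each component is estimated from a different spread:

- re-trained model replicates, for algorithmic uncertainty;
- predictions for the pixels inside each matchup window, as a proxy for matchup representativeness;
- predictions obtained with alternative AC processors, for preprocessing uncertainty.

These estimators and the function name are hypothetical choices made for illustration. Correlated components would require covariance terms.

```python
import numpy as np

def decompose_uncertainty(ensemble_preds, window_preds, processor_preds):
    """Per-sample standard uncertainties of a retrieved parameter.

    ensemble_preds:  (n_models, n_samples) predictions from re-trained model replicates
    window_preds:    (n_pixels, n_samples) predictions for the pixels in each matchup window
    processor_preds: (n_processors, n_samples) predictions using different AC processors
    """
    u_alg = np.std(np.asarray(ensemble_preds, float), axis=0, ddof=1)
    u_match = np.std(np.asarray(window_preds, float), axis=0, ddof=1)
    u_pre = np.std(np.asarray(processor_preds, float), axis=0, ddof=1)
    variances = np.stack([u_alg ** 2, u_match ** 2, u_pre ** 2])
    # Quadrature sum: valid only if the three components are independent.
    u_total = np.sqrt(variances.sum(axis=0))
    with np.errstate(divide="ignore", invalid="ignore"):
        share = variances / u_total ** 2  # fraction of total variance per component
    return {"algorithmic": u_alg, "matchup": u_match, "preprocessing": u_pre,
            "total": u_total, "variance_share": share}
```

The variance shares make the diagnosis explicit. A sample dominated by the preprocessing share points to the AC stage rather than the model.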

An additional, often underreported source of preprocessing uncertainty is sun glint. In optical remote sensing of water bodies, specular reflection from the water surface can raise reflectance values for reasons unrelated to water constituents; the elevation is driven instead by viewing geometry and surface roughness. In many operational workflows, glint correction is handled implicitly within atmospheric correction processors, particularly for moderate-resolution sensors. However, its implementation and effectiveness are rarely documented explicitly. In higher-spatial-resolution imagery, individual pixels may exhibit strong glint due to local wave orientation, producing anomalous reflectance values that can propagate into feature construction and modeling. These effects are often treated as outliers or filtered during preprocessing, but their presence highlights the need for clearer reporting of glint correction procedures within remote sensing workflows.

Finally, atmospheric correction is rarely implemented in isolation. Several workflows incorporate complementary quality control procedures prior to modeling, including cloud and cirrus masking, adjacency screening, threshold filtering, and removal of anomalous reflectance values [38,59]. Although these steps are often only briefly reported, they are necessary to keep invalid or mixed pixels out of machine learning pipelines. In highly heterogeneous inland waters, these masking strategies may influence predictive stability to a degree comparable to the atmospheric correction algorithm itself, further reinforcing the need for transparent preprocessing documentation.
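For high-resolution imagery, one widely used per-pixel deglinting approach is that of Hedley et al. (2005). It regresses each visible band on the NIR band over a sample of glint-affected, optically deep pixels and then subtracts the glint predicted from NIR. The method is swapped in here for illustration; the reviewed studies do not report it as their method. Its core assumption, that NIR water-leaving reflectance is near-constant across the sample, often fails in turbid inland waters. That failure mode is one reason glint handling deserves explicit reporting.

```python
import numpy as np

def hedley_deglint(bands, nir, sample_mask):
    """NIR-regression sun-glint removal (after Hedley et al., 2005).

    bands:       (n_bands, rows, cols) visible reflectance to correct
    nir:         (rows, cols) NIR reflectance
    sample_mask: (rows, cols) boolean, True for optically deep water pixels
                 spanning a range of glint intensities
    """
    bands = np.asarray(bands, float)
    nir = np.asarray(nir, float)
    nir_sample = nir[sample_mask]
    nir_min = nir_sample.min()  # taken as the glint-free NIR level
    corrected = np.empty_like(bands)
    for i, band in enumerate(bands):
        # Slope of band reflectance against NIR over the sample pixels.
        slope, _intercept = np.polyfit(nir_sample, band[sample_mask], 1)
        corrected[i] = band - slope * (nir - nir_min)
    return corrected
```

In turbid waters, the NIR signal also carries water-leaving reflectance from suspended matter. The fitted slope would then remove part of the real signal, so the sample mask should be restricted to pixels where the NIR assumption is defensible.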

5.2. Spectral Feature Construction and Feature Engineering

Once surface reflectance or Rrs has been derived, transforming spectral information into predictive features is the crux of inland water quality workflows. Three strategies can be distinguished in the reviewed papers: (i) direct multispectral bands, (ii) spectral indices and band combinations, and (iii) multiscale feature integration using hyperspectral, UAV, or multi-platform datasets. Lake-oriented studies typically use multispectral bands from Sentinel-2 MSI, Landsat OLI, MODIS, or Sentinel-3 OLCI, with features usually built from band ratios, red-edge pairs, or empirically chosen reflectance relationships. For example, one study fed reflectance outputs from multiple atmospheric processors into ensemble learning models, treating the spectral inputs as model-driven features rather than predefined indices. Similarly, a study of Taihu Lake (China) applied empirical spectral relationships to estimate trophic conditions and water quality parameters, showing that band combinations remain central in optically productive lakes [57,60]. In contrast, several riverine and multi-regional studies show that high optical heterogeneity requires a richer feature representation. Studies in the Pearl River (China) and the Chao Phraya River (Thailand) demonstrate the effective use of reflectance bands sensitive to suspended matter and CDOM variability in turbidity-dominated systems. In such settings, feature construction is driven by known absorption and scattering features, particularly in the red and near-infrared domain [11,61]. A marked trend toward multiscale spectral integration is found in studies using hyperspectral or high-resolution platforms.
One study integrated ASD field spectrometer measurements, UAV imagery, and Planet data in a karst wetland (China), supporting transfer across spectral and spatial domains [60]. In this way, high-resolution spectral signatures provide fine-grained information to satellite-based models, improving feature richness and generalization in heterogeneous optical environments. Similarly, other authors integrate hyperspectral or multisensor inputs to improve spectral separability in complex inland waters [45,51]. Semi-analytical feature construction is also reported. In Lake Geneva, a semi-analytical TSM retrieval incorporated inherent optical property relationships within the feature extraction process. Such physics-informed strategies differ fundamentally from purely empirical or machine-learning-driven pipelines because they constrain spectral–parameter relationships through radiative transfer principles [55]. However, the transparency of feature engineering varies markedly across the reviewed literature. Some studies explicitly report the spectral combinations or transformations used, whereas others provide only generic statements such as “surface reflectance bands used as model inputs” [56,58]. This inconsistency makes cross-study comparison difficult and reproducibility even more challenging. The problem is most acute when performance differences are attributed to modeling approaches without any description of prior feature selection. Comparative evidence also shows that feature construction is closely tied to model capacity.
Tree-based ensembles (RF, XGBoost, CatBoost) can accommodate multiband inputs without heavy pre-engineering, whereas deep architectures (e.g., CNN/Transformer [26]) may implicitly learn spectral interactions if provided with consistent reflectance inputs. However, high-capacity models are also more sensitive to spectral inconsistencies arising from atmospheric residuals or cross-sensor differences. Consequently, feature construction cannot be interpreted independently of preprocessing robustness and validation design.
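Physics-informed feature construction of the kind described for Lake Geneva can be illustrated with a single-band semi-analytical TSM relation, TSM = A·ρw / (1 − ρw/C). This form was proposed by Nechad et al. (2010) for turbid waters, where ρw is the water-leaving reflectance (π·Rrs). It is not necessarily the formulation used in [55]. A and C are band-specific calibration coefficients, and the values used in the test are placeholders, not calibrated constants.

```python
import numpy as np

def tsm_single_band(rho_w, A, C):
    """Nechad-type semi-analytical TSM (g m^-3) from red or NIR water reflectance.

    rho_w: water-leaving reflectance (dimensionless, pi * Rrs)
    A, C:  band-specific calibration coefficients (must be supplied by the user)
    """
    rho_w = np.asarray(rho_w, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        tsm = A * rho_w / (1.0 - rho_w / C)
    # The relation saturates as rho_w approaches C; return NaN at or beyond it.
    return np.where(rho_w < C, tsm, np.nan)
```

Unlike a purely empirical band ratio, the saturation term C encodes the physical limit of reflectance at high particle loads, and that limit constrains the feature.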

The reviewed workflows suggest a shift from simple index-based representations toward richer multiband and multiscale spectral feature sets. This evolution increases the potential for capturing nonlinear optical relationships in complex inland waters. It also heightens the need for clear documentation of feature pipelines, scaling procedures, and sensor harmonization steps to support methodological transparency and reproducibility.
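Two of these documentation items, sensor harmonization and feature scaling, can be made explicit in a few lines. The sketch assumes a simple per-band linear adjustment fitted on coincident clear-water observations from two sensors. That adjustment is an assumption; operational harmonized products follow their own published procedures. It also applies z-score scaling fitted on the training split only, so that test statistics do not leak into the model.

```python
import numpy as np

def fit_band_adjustment(ref_values, target_values):
    """Least-squares fit target ~= slope * ref + offset for one band.

    Inputs are coincident, co-located clear-water observations from two sensors.
    """
    slope, offset = np.polyfit(np.asarray(ref_values, float),
                               np.asarray(target_values, float), 1)
    return float(slope), float(offset)

def apply_band_adjustment(values, slope, offset):
    """Map reference-sensor reflectance onto the target sensor's scale."""
    return slope * np.asarray(values, float) + offset

def standardize(train, test):
    """Z-score features using statistics from the training split only."""
    train = np.asarray(train, float)
    test = np.asarray(test, float)
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (train - mean) / std, (test - mean) / std
```

Reporting the fitted slope, offset, and scaling statistics alongside the model is enough for another group to reproduce the exact inputs.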

5.3. Modeling Approaches and Validation Design

The reviewed literature shows considerable variability in modeling strategies for inland water quality estimation, ranging from empirical regression to ensemble machine learning and deep learning networks. As Table 2 shows, tree-based ensembles (especially RF, XGBoost, and CatBoost) are the most widely used methods. They are followed in use by support vector machines (SVM/SVR), partial least squares regression (PLSR), and, more recently, convolutional and Transformer-based neural networks.

Figure 5 conceptually summarizes the main pathways through which spectral profiles are constructed, as reported in the reviewed literature. Figure 5a presents direct multispectral band inputs, illustrating typical wavelength ranges and their relevance for water quality estimation, using Sentinel-2 MSI as an example.

Figure 5b shows commonly used engineered spectral indices in inland water remote sensing, including the red-edge ratio (RER), Normalized Difference Chlorophyll Index (NDCI), NIR/red ratio for turbidity, Normalized Difference Water Index (NDWI), and Modified NDWI (MNDWI), along with their formulas and target applications. Here, R_λ denotes reflectance at wavelength λ (nm). Figure 5c illustrates a multiscale spectral integration framework that combines in situ spectra, UAV-based hyperspectral data, and satellite multispectral imagery through feature extraction. This process generates a feature matrix integrating raw bands, spectral indices, texture metrics, ancillary data, and metadata, which are subsequently used as inputs for machine learning models to retrieve water quality parameters (e.g., Chl-a, TSS, CDOM, and nutrients).
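The engineered indices in Figure 5b and the resulting feature matrix can be sketched as follows. Where the figure is not specific, the band definitions are assumptions:

- NDCI uses B5 (~705 nm) and B4 (~665 nm), as in Mishra and Mishra (2012).
- RER is taken here as B5/B4.
- The NIR/red ratio uses B8A (~865 nm).
- NDWI follows McFeeters (green versus NIR).
- MNDWI follows Xu (green versus SWIR, B11 ~1610 nm).

All bands are assumed to have been resampled to a common grid.

```python
import numpy as np

def normalized_difference(a, b):
    """(a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    denom = a + b
    with np.errstate(divide="ignore", invalid="ignore"):
        out = (a - b) / denom
    return np.where(denom == 0, np.nan, out)

def ratio(a, b):
    """a / b, with NaN where b is zero."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    with np.errstate(divide="ignore", invalid="ignore"):
        out = a / b
    return np.where(b == 0, np.nan, out)

def spectral_indices(b3, b4, b5, b8a, b11):
    """Sentinel-2 MSI indices; band choices follow the assumptions stated above."""
    return {
        "NDCI": normalized_difference(b5, b4),
        "RER": ratio(b5, b4),
        "NIR_red": ratio(b8a, b4),
        "NDWI": normalized_difference(b3, b8a),
        "MNDWI": normalized_difference(b3, b11),
    }

def feature_matrix(bands, indices):
    """Stack raw bands and indices column-wise: one row per sample or pixel."""
    columns = [np.ravel(np.asarray(b, float)) for b in bands]
    columns += [np.ravel(v) for v in indices.values()]
    return np.column_stack(columns)
```

Keeping raw bands and indices side by side in one matrix, with the column order recorded, is one way to document exactly which features a model received.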

Direct band inputs retain full spectral dimensionality. Engineered indices provide targeted transformations informed by optical reasoning, and multiscale integration expands representational capacity across spatial and spectral domains. The choice among these strategies affects model interpretability, transferability, and sensitivity to preprocessing residuals. Ensemble learning is well suited to lake environments with nonlinear spectral–optical relationships. One study applied RF, XGBoost, and CatBoost together with PLS/PLSR in Poyang Lake (China) and assessed model robustness with leave-one-out cross-validation (LOOCV). Similarly, other authors report combinations of RF, boosting algorithms, and regression baselines for multi-lake or multi-river systems in China. Such investigations tend to benchmark several algorithms on the same spectral dataset, so performance gains are established through comparison rather than assumed a priori [45,56,60].
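A LOOCV evaluation of a tree ensemble, as described above, might look like the following sketch. It uses scikit-learn's RandomForestRegressor as a stand-in for the RF/XGBoost/CatBoost family. The hyperparameters and function name are illustrative assumptions, not those of the cited work.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def loocv_random_forest(X, y, n_estimators=100, seed=0):
    """Return LOOCV predictions, RMSE, and R^2 for a random forest regressor."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    # Each sample is predicted by a model trained on all the other samples.
    y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    residuals = y - y_pred
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    r2 = float(1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2))
    return y_pred, rmse, r2
```

When all samples come from the same water body, the LOOCV folds share that water body. The resulting scores therefore estimate within-site skill rather than transferability.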

Kernel-based methods and classical regression remain common, particularly in more localized settings. Empirical regression frameworks were used to quantify trophic state and water quality parameters in Taihu Lake (China), showing that simpler models can be effective under well-characterized optical regimes. Likewise, regression-based models applied in the Pearl River indicate that interpretability and local calibration are favored over algorithmic complexity in river systems [57,61]. More sophisticated architectures appear in studies that seek broader generalization or richer feature integration. Neural networks, convolutional layers, and Transformer-based models have been explicitly combined for multi-lake datasets, reflecting a trend toward architectures capable of learning high-order spectral interactions [26]. However, the use of deep learning does not automatically translate into better transferability. Without spatially or temporally structured validation, high-capacity models may overfit to region-specific reflectance patterns, particularly when atmospheric residuals or adjacency effects persist.

Validation design is a critical determinant of interpretability. The reviewed studies report a variety of strategies, including leave-one-out cross-validation (LOOCV), k-fold cross-validation, split-sample train/test partitions, and performance metrics such as R², RMSE, MAE, and MAPE. Some works additionally report uncertainty estimates [26,50], although uncertainty is rarely decomposed into model variance, sampling error, and preprocessing uncertainty components. In multi-sensor classification studies such as Pooja et al. (2026), kappa statistics and accuracy scores are used, reflecting categorical evaluation rather than continuous regression assessment [52].
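For reference, the metrics named above can be computed as in the following plain NumPy sketch. MAPE is undefined where observed values are zero. The kappa function implements Cohen's kappa for the categorical evaluations used in classification studies; it is undefined when expected agreement equals one.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R^2, RMSE, MAE, and MAPE (%) for continuous retrievals."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    err = y_pred - y_true
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE_pct": float(100.0 * np.mean(np.abs(err) / np.abs(y_true))),
    }

def cohen_kappa(labels_true, labels_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    t = np.asarray(labels_true)
    p = np.asarray(labels_pred)
    observed = np.mean(t == p)
    expected = sum(np.mean(t == c) * np.mean(p == c) for c in np.union1d(t, p))
    return float((observed - expected) / (1.0 - expected))
```

Because R² and MAPE depend on the range and magnitude of the observed values, reporting RMSE and MAE in physical units alongside them makes comparisons across studies less misleading.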

A key pattern emerging from the comparative analysis is that predictive performance cannot be interpreted independently of validation structure. Random data splitting, common in many workflows, may inflate performance estimates in spatially autocorrelated aquatic systems. By contrast, validation approaches that emphasize cross-validation rigor or multi-processor comparisons [50] provide stronger evidence of generalization capacity. Even so, these designs would benefit from explicit spatial or temporal hold-out tests that measure transferability across different hydrological or seasonal conditions. The influence of upstream preprocessing on modeling outcomes is also evident across studies. Ensemble and deep learning models can capture subtle spectral variations; however, their apparent improvements may partly reflect differences in atmospheric correction provenance and reflectance quality rather than model sophistication alone. Algorithm selection should therefore be seen as only one component of a larger workflow comprising atmospheric correction, feature engineering, and validation design. Overall, our analysis indicates that methodological development in inland water remote sensing is moving from isolated algorithm comparison toward integrated workflow optimization. Future advances will likely depend less on scaling architectural complexity and more on harmonizing preprocessing pipelines, strengthening validation frameworks, and explicitly quantifying uncertainty propagation across workflow stages. Such integration would enable more reliable multi-study comparison and increase confidence in the operational scalability of satellite-based water quality monitoring.
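The contrast between random splitting and a geographically independent hold-out can be tested directly by scoring the same model twice. The first run uses shuffled k-fold cross-validation; the second leaves out one water body at a time. The sketch assumes each sample carries a water-body identifier. It uses scikit-learn's KFold and LeaveOneGroupOut splitters, and the random forest is only a placeholder model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_predict

def rmse(observed, predicted):
    return float(np.sqrt(np.mean((np.asarray(observed) - np.asarray(predicted)) ** 2)))

def compare_random_vs_site_holdout(X, y, site_ids, seed=0):
    """Score one model with shuffled 5-fold CV and with leave-one-water-body-out CV."""
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    random_cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    pred_random = cross_val_predict(model, X, y, cv=random_cv)
    # Every fold holds out all samples of one water body.
    pred_site = cross_val_predict(model, X, y, cv=LeaveOneGroupOut(), groups=site_ids)
    return {"rmse_random_kfold": rmse(y, pred_random),
            "rmse_leave_one_site_out": rmse(y, pred_site)}
```

A large gap between the two scores suggests that the random split is rewarding site-specific patterns that do not transfer.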

6. Validation Strategies and Performance Metrics

Validation is foundational to inland water quality modeling workflows. Atmospheric correction and spectral feature construction determine the reliability of the predictive inputs, but validation design ultimately determines the credibility, transferability, and scientific robustness of the reported results. The reviewed literature shows substantial variability in in situ integration, temporal coherence, spatial representativeness, data partitioning, and uncertainty reporting. This heterogeneity complicates comparison between studies and highlights the lack of standardized validation criteria.

Field–satellite integration remains the primary step for calibrating water quality retrieval models. Most studies measured chlorophyll-a (Chl-a), total suspended matter (TSM), turbidity, or CDOM absorption during in situ campaigns matched with satellite acquisitions. Structured sampling designs are evident at Lake Taihu [57] and Lake Geneva [55], where laboratory measurements were systematically combined with Sentinel-2-derived reflectance. However, sampling density and frequency differ widely between systems. In riverine investigations such as those on the Chao Phraya River [11] and the Pearl River [61], hydrodynamic variability is high, which increases sensitivity to timing mismatches and spatial heterogeneity.

Temporal alignment is a crucial and inconsistently treated source of uncertainty. While most lake-based studies enforce same-day matching [38,57,60], some employ ±1–3-day windows to maximize matchup availability [11,56]. Although operationally sensible, relaxed windows may introduce matchup uncertainty in optically dynamic systems. Alignment is further complicated by differences in acquisition timing and illumination geometry, as examined in multiscale, multi-platform studies [60]. Temporal mismatch is underappreciated and is rarely, if ever, quantified separately from overall model error. Spatial representativeness imposes a further limitation.
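Temporal matching rules such as same-day or ±N-day windows can be made explicit with a small helper like the following, which also records the resulting time offsets. The data layout (lists of identifier–timestamp pairs) and the closest-scene rule are assumptions for illustration. Recording the offset for every matchup allows temporal mismatch to be analysed separately from model error.

```python
from datetime import datetime

def find_matchups(samples, scenes, max_days=0):
    """Pair each in situ sample with the closest scene within +/- max_days calendar days.

    samples, scenes: lists of (identifier, datetime) pairs
    max_days=0 means same-day matching.
    Returns (sample_id, scene_id, offset_hours) with offset = scene time - sample time.
    """
    matchups = []
    for sample_id, t_sample in samples:
        candidates = [
            (abs(t_scene - t_sample), scene_id, t_scene)
            for scene_id, t_scene in scenes
            if abs((t_scene.date() - t_sample.date()).days) <= max_days
        ]
        if not candidates:
            continue  # no scene inside the window: sample is not used
        _, scene_id, t_scene = min(candidates)
        offset_hours = (t_scene - t_sample).total_seconds() / 3600.0
        matchups.append((sample_id, scene_id, offset_hours))
    return matchups

if __name__ == "__main__":
    field = [("S1", datetime(2024, 5, 10, 9, 0)), ("S2", datetime(2024, 5, 12, 11, 0))]
    images = [("T1", datetime(2024, 5, 10, 10, 30)), ("T2", datetime(2024, 5, 14, 10, 30))]
    print(find_matchups(field, images, max_days=0))
    print(find_matchups(field, images, max_days=2))
```

Comparing model error within bins of the offset is then a direct, if simple, way to test whether relaxed windows degrade accuracy.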

The differing spatial resolutions of Sentinel-2 (10–20 m), Landsat-8 (30 m), and Sentinel-3 (300 m) directly influence how point-based field samples correspond to satellite pixels. Landsat imagery provides moderate-spatial-resolution data that are widely used for inland water quality applications. In this context, most studies rely on the visible-to-shortwave infrared (VIS–SWIR) bands, which are available at 30 m spatial resolution. Thermal infrared (TIR) bands, in contrast, are acquired at a native resolution of approximately 100 m and are primarily used for surface temperature retrieval rather than optical water quality parameter estimation. In relatively homogeneous lakes, mismatch effects may be moderate, whereas in narrow or optically complex rivers, adjacency effects and mixed pixels introduce bias [11,61]. Few studies document spatial buffering or multi-pixel averaging methods, and even fewer examine how spatial aggregation affects predictive robustness.
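Multi-pixel averaging around a station can be documented with a helper that extracts a square window and ignores masked pixels, which are assumed to be NaN. The helper rejects the matchup when too few valid pixels remain, similar in spirit to the valid-pixel screening reported in [11]. The 3 × 3 window and the 50% threshold are illustrative defaults, not recommendations from the reviewed studies.

```python
import numpy as np

def window_stat(image, row, col, size=3, min_valid_fraction=0.5):
    """Median, spread, and valid count of a size x size window centred on (row, col).

    Masked pixels are expected to be NaN. Returns None if the window extends beyond
    the image or if the valid fraction is below min_valid_fraction. size should be odd.
    """
    image = np.asarray(image, float)
    half = size // 2
    r0, r1 = row - half, row + half + 1
    c0, c1 = col - half, col + half + 1
    if r0 < 0 or c0 < 0 or r1 > image.shape[0] or c1 > image.shape[1]:
        return None
    window = image[r0:r1, c0:c1]
    valid = window[np.isfinite(window)]
    if valid.size == 0 or valid.size < min_valid_fraction * window.size:
        return None
    return {"median": float(np.median(valid)), "std": float(np.std(valid)),
            "n_valid": int(valid.size)}
```

The within-window standard deviation is itself informative. A high value flags stations where a single point sample may not represent the pixel footprint.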

Performance is commonly evaluated using R², RMSE, and MAE, with occasional use of MAPE. High coefficients of determination are frequently reported for tree ensemble methods [38,50], while deep learning architectures often show improved predictive performance under specific preprocessing and validation conditions [26]. Nevertheless, performance metrics are typically dataset-specific and seldom normalized across studies, which confounds cross-study comparison. Crucially, most studies rely on internal partitioning schemes rather than geographically independent validation. To clarify these recurrent methodological trends, Table 3 consolidates the main validation designs found across the reviewed studies. It categorizes them by partition method, temporal matching, spatial independence, and uncertainty reporting. Table 3 shows that random train–test splits within a single water body are the most common option [55,58], followed by k-fold cross-validation on the same water body [38,50,56]. Semi-analytical calibration–validation frameworks are less common and appear mainly in physics-driven investigations [55]. Truly independent geographic validation, however, is rare.

The patterns described in Table 3 point to a structural limitation: most validation schemes retain spatial autocorrelation and do not test model transferability across separate water bodies. As a result, overfitting is a real risk, especially for high-capacity models. Random splits may lead the model to learn site-specific spectral signatures rather than optical relationships that generalize to other sites. Cross-validation reduces the variance caused by arbitrary partitioning, but it does not ensure geographic independence. Multiscale integration studies aim to alleviate this issue through cross-platform integration, but cross-waterbody generalization remains minimally investigated [60].
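When only one water body is available, spatial blocking is a common compromise between random splits and full cross-waterbody validation. Samples are grouped into square blocks so that nearby, autocorrelated samples never fall on both sides of a split. The sketch assumes projected coordinates in metres. The block size is a user choice and should ideally exceed the spatial autocorrelation range. The resulting labels can be passed as the groups argument of a grouped splitter such as scikit-learn's GroupKFold.

```python
import numpy as np

def spatial_block_ids(x, y, block_size):
    """Assign each sample to a square spatial block; returns one integer label per sample.

    x, y: projected coordinates (e.g., metres); block_size in the same units.
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    bx = np.floor((x - x.min()) / block_size).astype(int)
    by = np.floor((y - y.min()) / block_size).astype(int)
    # Combine column and row indices into a single unique label per block.
    return bx * (by.max() + 1) + by
```

Reporting the block size together with the resulting scores makes it clear how much spatial independence the validation actually enforced.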

Uncertainty reporting shows comparable inconsistency. While aggregated predictive error is common, decomposition of uncertainty into algorithmic, matchup, and preprocessing components is rare. For example, studies comparing multiple atmospheric processors [38,50] implicitly acknowledge the importance of preprocessing sensitivity but seldom isolate its contribution to total predictive error. Without structured uncertainty decomposition, it is difficult to determine whether improvements stem from model architecture, feature engineering, or validation design. The comparative findings therefore suggest that progress in modeling sophistication should be accompanied by equally robust validation tools. Several measures are all likely to substantially improve reproducibility and methodological transparency in inland water quality remote sensing: standardizing temporal windows, reporting spatial aggregation procedures, implementing geographically independent validation, and explicitly disaggregating uncertainty.

Uncertainty reporting differs significantly among the studies examined. Although several report explicit accuracy measures such as R² and RMSE, very few decompose uncertainty into identifiable components. Three main sources of uncertainty can be distinguished:

Algorithmic uncertainty (variance in the model and unstable model parameters),
Matchup uncertainty (mismatch of field–satellite time and space),
Preprocessing uncertainty (residuals in atmospheric correction and feature harmonization).

In addition to these commonly recognized sources, it is important to consider that uncertainty propagation begins at the sensor level. Uncertainties associated with at-sensor radiance measurements—including radiometric calibration, detector sensitivity, and spectral response functions—directly influence subsequent transformations to at-sensor reflectance and, ultimately, to atmospherically corrected remote sensing reflectance (Rrs) [36,37]. The propagation of these uncertainties is further affected by sensor-specific characteristics, such as spectral band width and configuration. Narrow-band sensors (e.g., MODIS or hyperspectral instruments) may facilitate more stable atmospheric correction and spectral discrimination, whereas broader multispectral bands (e.g., Landsat) can introduce additional challenges due to spectral mixing and reduced sensitivity to specific absorption features [36].

Consequently, uncertainty propagation should be understood as a multistage process extending from at-sensor radiance to derived environmental parameters. Errors introduced during radiometric measurement and atmospheric correction may propagate through feature construction and machine learning models, ultimately affecting the reliability of retrieved variables such as chlorophyll-a, suspended solids, or nutrient concentrations. Despite its importance, this end-to-end uncertainty propagation is rarely quantified explicitly in current inland water remote sensing studies, representing a critical area for future methodological development [22].
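End-to-end propagation of this kind can be approximated by Monte Carlo simulation. The procedure perturbs the at-sensor reflectance within its calibration uncertainty and perturbs the estimated path reflectance. It then pushes every realization through the correction and retrieval steps and summarizes the spread. The following sketch is deliberately simplified:

- the correction is reduced to ρw = (ρTOA − ρpath)/t;
- calibration errors are treated as independent between bands;
- the band-ratio retrieval with coefficients a and b is hypothetical rather than a calibrated algorithm.

```python
import numpy as np

def propagate_mc(rho_toa, rho_path, t_diff, retrieval, rel_cal_unc=0.03,
                 path_unc=0.0, n=10000, seed=0):
    """Monte Carlo propagation from TOA reflectance to a retrieved parameter.

    rho_toa, rho_path, t_diff: per-band arrays of shape (n_bands,)
    retrieval: function mapping water reflectance (n, n_bands) -> values (n,)
    Returns the mean and standard deviation of the retrieved parameter.
    """
    rng = np.random.default_rng(seed)
    toa = np.asarray(rho_toa, float)
    path = np.asarray(rho_path, float)
    n_bands = toa.size
    # Relative calibration error, drawn independently per band (a simplification).
    toa_draws = toa * (1.0 + rel_cal_unc * rng.standard_normal((n, n_bands)))
    # Absolute error in the estimated path reflectance.
    path_draws = path + path_unc * rng.standard_normal((n, n_bands))
    rho_w = (toa_draws - path_draws) / np.asarray(t_diff, float)
    values = retrieval(rho_w)
    return float(np.mean(values)), float(np.std(values, ddof=1))

def toy_ratio_retrieval(rho_w, a=1.0, b=2.0):
    """Hypothetical band-ratio power law a * (band1 / band0) ** b (not calibrated)."""
    return a * (rho_w[:, 1] / rho_w[:, 0]) ** b
```

Because ρTOA − ρpath is small over water, a few percent of calibration error at the sensor can become a much larger relative error in ρw and in the retrieved parameter. This amplification is the propagation effect the paragraph describes.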

Most publications report aggregated predictive error without decoupling these components. Consequently, uncertainty statements are typically descriptive rather than diagnostic. Future research would benefit from structured uncertainty frameworks that explicitly measure preprocessing sensitivity, temporal mismatch, and spatial representativeness effects. Such decomposition would substantially improve the interpretability and reproducibility of inland water quality modeling.

7. Current Challenges, Methodological Gaps and Future Research Directions

Despite the methodological improvements presented in Section 4 and Section 5, several persistent issues still limit the reliability and scalability of satellite-based inland water quality estimation. One major structural limitation is the lack of standardized pre-model data quality control, particularly in the application of masks (clouds, cloud shadows, land adjacency, and floating vegetation) and in the handling of invalid or missing pixels. In addition to introducing uncertainty, cloud masking can substantially reduce data availability, particularly in regions with persistent cloud cover. This loss can bias temporal analyses and limit the detection of short-term events such as runoff-driven turbidity peaks or algal blooms. Recent reviews also highlight that cloud masking remains a major bottleneck for optical water applications, because default thresholding assumptions (e.g., low NIR water reflectance) often fail in optically complex (Case 2) waters. The result is the loss of valid observations and a growing need for hybrid and ML-based masking (e.g., Fmask/CFmask, S2cloudless, IdePix) and sensor-based masking solutions [62].
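The masking and reporting practice discussed here can be sketched with bit-encoded quality flags. The flag bits below are hypothetical, since products such as IdePix or Fmask define their own encodings. The optional NIR threshold is included precisely because such thresholds can discard valid pixels in turbid waters. The returned report counts how many pixels each mask removed and how many remained.

```python
import numpy as np

# Hypothetical flag bits for illustration only; real products use their own encodings.
FLAG_BITS = {"cloud": 1 << 0, "cloud_shadow": 1 << 1, "land": 1 << 2,
             "floating_vegetation": 1 << 3}

def apply_masks(reflectance, flags, masks, nir=None, nir_max=None):
    """Set rejected pixels to NaN and report how many pixels each step removed."""
    flags = np.asarray(flags)
    reject = np.zeros(flags.shape, dtype=bool)
    report = {"n_pixels": int(flags.size)}
    for name in masks:
        hit = (flags & FLAG_BITS[name]) != 0
        report["removed_by_" + name] = int(np.sum(hit & ~reject))  # newly removed only
        reject |= hit
    if nir is not None and nir_max is not None:
        # Optional NIR threshold; may wrongly reject valid pixels in turbid waters.
        hit = np.asarray(nir, float) > nir_max
        report["removed_by_nir_threshold"] = int(np.sum(hit & ~reject))
        reject |= hit
    report["n_valid"] = int(np.sum(~reject))
    return np.where(reject, np.nan, np.asarray(reflectance, float)), report
```

Publishing such a report with each study would answer directly which masks were applied, how many pixels survived, and which thresholds were used.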

This is consistent with the operational practices of the reviewed empirical studies, where invalid pixels are removed from matchup windows by mask filtering (e.g., flagging pixels classified as cloud, land, or floating plants and discarding matchups with insufficient valid pixels) [11]. A second, more critical gap is the systematic treatment of missing data and temporal discontinuity. Even in well-studied areas, long time series are often disrupted by clouds, cloud shadows, and sensor artifacts, leading to discontinuous records and biased sampling of extreme events (e.g., runoff-driven turbidity peaks) [57]. Recent synthesis studies emphasize practical steps such as multi-day compositing (e.g., 8–16 day products) and complementary sensor integration. They also identify novel reconstruction techniques (e.g., EOF-based reconstructions and newer progressive spatiotemporal gap-filling frameworks) as potential solutions for rebuilding discontinuous records when optical observations are unavailable [62]. In line with this view, our review suggests that masking should be treated as an explicit methodological step rather than routine preprocessing. Missing-data management should likewise be documented, because both choices influence which optical regimes are represented in training and validation. A third problem is adjacency effects and mixed pixels, particularly in rivers and narrow water bodies, where land reflectance contamination can overwhelm the water-leaving signal and propagate to features and model outputs. Recent reviews also note that some atmospheric processors include adjacency-effect correction (e.g., SIMEC, adjacency modules within Sen2Cor/ATCOR). These corrections, however, rely on assumptions that may fail in shallow, turbid, or bloom-dominated waters, precisely the conditions where robust monitoring is most needed [62].
This reiterates a previously noted shortcoming: many pipelines continue to underreport masking settings and correction parameterization, thereby reducing reproducibility across sites and sensors. Finally, the evaluation of generalization and uncertainty remains underdeveloped relative to the increasing complexity of current models. While deep learning is increasingly used for nonlinear inversion, recent reviews stress the growing importance of uncertainty estimation (e.g., Bayesian Neural Networks, mixture density networks) to provide credibility bounds rather than deterministic maps alone [23].

In inland waters, this is particularly relevant because uncertainty is often dominated not only by model variance but also by matchup uncertainty (spatiotemporal mismatch) and preprocessing residuals, yet these components are rarely separated in current practice. Taken together, these gaps motivate several near-term research directions. First, the field would benefit from standardized, sensor-aware quality control protocols that report (i) which masks were applied (cloud/cloud shadow/land/floating vegetation), (ii) how many pixels remained after masking, and (iii) what thresholds or ML models were used, given that masking choices can remove a substantial fraction of otherwise usable observations in complex waters [11,62]. Second, inland water studies should more explicitly address missing data bias by reporting clear-sky availability and adopting transparent gap-handling strategies (e.g., compositing, spatiotemporal reconstruction, or multi-sensor fusion) when building long-term products [55,62].
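Compositing is one of the gap-handling strategies listed above, and it depends on clear-sky availability. Both can be sketched as a NaN-aware median over fixed day-of-year windows that also returns the number of valid observations per pixel. The 8-day period and the binning by day of year are assumptions. Reconstruction methods such as EOF-based gap filling are beyond this sketch.

```python
import warnings
import numpy as np

def temporal_composites(stack, day_of_year, period=8):
    """NaN-aware median composites over fixed day-of-year windows.

    stack:       (n_scenes, rows, cols) reflectance or retrieved values, NaN where masked
    day_of_year: (n_scenes,) acquisition day of year (1-366)
    Returns {first_day_of_window: (median composite, valid-observation count)}.
    """
    stack = np.asarray(stack, float)
    window = (np.asarray(day_of_year) - 1) // period
    composites = {}
    for w in np.unique(window):
        scenes = stack[window == w]
        count = np.isfinite(scenes).sum(axis=0)
        with warnings.catch_warnings():
            # Pixels with no valid observation yield NaN plus a RuntimeWarning.
            warnings.simplefilter("ignore", category=RuntimeWarning)
            median = np.nanmedian(scenes, axis=0)
        composites[int(w) * period + 1] = (median, count)
    return composites
```

The per-pixel count doubles as the clear-sky availability statistic that the text recommends reporting. Composites built from one observation and from several can then be distinguished in training and validation.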

Third, future work should prioritize transferability tests (cross-waterbody/cross-season/cross-sensor) paired with uncertainty decomposition, so that improvements can be attributed to modeling advances versus upstream data limitations. Finally, the increasing availability of cloud computing and harmonized archives creates an opportunity to benchmark workflows under common protocols, which would directly address the comparability limitations identified in Section 4 and Section 5.

8. Conclusions

This review provides a structured synthesis of the use of remote sensing and artificial intelligence for water quality estimation in lakes and rivers, with particular attention to image processing workflows and validation strategies. The analysis indicates that reported model performance is not solely a function of algorithm selection but is strongly conditioned by the interaction among atmospheric correction, spectral feature construction, modeling approach, and validation design. Atmospheric correction emerges as a critical stage, especially in optically complex inland waters, where residual errors may propagate into feature sensitivity and influence downstream predictive modeling. In parallel, feature engineering has evolved from simple band ratios to multiband and multiscale representations, which can enhance modeling capacity but also increase sensitivity to preprocessing inconsistencies and potentially limit generalizability. A central methodological challenge identified across the reviewed studies relates to validation design. Many studies rely on internally partitioned datasets, with limited use of geographically independent validation and insufficient characterization of uncertainty sources. As a result, reported performance metrics may not fully reflect model transferability or operational robustness. Rather than resolving ongoing methodological debates, this review aims to provide guidance by organizing and clarifying how preprocessing choices and validation strategies influence reported outcomes. The synthesis suggests that improving transparency in preprocessing workflows, adopting more consistent validation practices, and incorporating explicit uncertainty analysis can support more reliable and comparable assessments. 
Overall, advancing remote-sensing-based water quality estimation will likely depend on continued efforts to improve methodological consistency, reporting clarity, and cross-site evaluation, particularly in the context of heterogeneous and optically complex inland water systems.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/limnolrev26020019/s1, Table S1: Selected studies included in the analytical corpus (2020–early 2026).

Author Contributions

Conceptualization, B.S.-R. and V.Z.-G.; methodology, J.A.L. and F.H.V.-G.; investigation, C.A.G.-G., J.A.L. and F.H.V.-G.; writing – original draft preparation, V.Z.-G.; writing – review and editing, C.A.G.-G., B.S.-R., J.A.L. and F.H.V.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The author would like to thank the Secretaría de Ciencia, Humanidades, Tecnología e Innovación de México (SECIHTI) for support through the student maintenance scholarship CVU No. 1324892. During the preparation of this study, the authors used OpenAI's GPT-5 for language editing, figure generation, and stylistic enhancement. The authors have reviewed and edited the output and assume full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AC	Atmospheric Correction
ACOLITE	Atmospheric Correction for OLI ‘Lite’
AOPs	Apparent Optical Properties
ANN	Artificial Neural Network
ASDs	Analytical Spectral Devices
BOD	Biochemical Oxygen Demand
BOD5	Biochemical Oxygen Demand (5-day)
C2RCC	Case 2 Regional CoastColour
C2X-Nets	Case 2 Extreme Neural Networks
CART	Classification and Regression Trees
CDOM	Colored Dissolved Organic Matter
CNN	Convolutional Neural Network
COD	Chemical Oxygen Demand
CODMn	Permanganate Index (Chemical Oxygen Demand Mn method)
Chl-a	Chlorophyll-a
DEM	Digital Elevation Model
DL	Deep Learning
DO	Dissolved Oxygen
EOF	Empirical Orthogonal Functions
ETR	Extremely Randomized Trees
FLAASH	Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes
GBT	Gradient Boosting Trees
GCPs	Ground Control Points
HydroSHEDS	Hydrological data and maps based on Shuttle Elevation Derivatives at multiple Scales
IOPs	Inherent Optical Properties
iCOR	image CORrection for atmospheric effects
JRC	Joint Research Centre (Global Surface Water dataset)
KNN	K-Nearest Neighbors
LGBM	Light Gradient Boosting Machine
LOOCV	Leave-One-Out Cross-Validation
LULC	Land Use / Land Cover
LaSRC	Land Surface Reflectance Code
LWIR	Long-Wave Infrared
MAE	Mean Absolute Error
MAPE	Mean Absolute Percentage Error
MNDWI	Modified Normalized Difference Water Index
ML	Machine Learning
MODIS	Moderate Resolution Imaging Spectroradiometer
MSI	MultiSpectral Instrument
NB	Naïve Bayes
NDBI	Normalized Difference Built-up Index
NDCI	Normalized Difference Chlorophyll Index
NDWI	Normalized Difference Water Index
NDVI	Normalized Difference Vegetation Index
NH4-N	Ammonium Nitrogen
NH3-N	Ammonia Nitrogen
NIR	Near-Infrared
OLCI	Ocean and Land Color Instrument
OLI	Operational Land Imager
PLS	Partial Least Squares
PLSR	Partial Least Squares Regression
POLYMER	POLYnomial-based algorithm applied to MERIS
RER	Red-edge Ratio
RF	Random Forest
RMSE	Root Mean Square Error
RNN	Recurrent Neural Network
RPD	Ratio of Performance to Deviation
Rrs	Remote Sensing Reflectance
SAR	Synthetic Aperture Radar
SeaDAS	SeaWiFS Data Analysis System
SR	Surface Reflectance
SVM	Support Vector Machine
SVR	Support Vector Regression
SWIR	Shortwave Infrared
Sen2Cor	Sentinel-2 Atmospheric Correction Processor
TIR	Thermal Infrared
TN	Total Nitrogen
TP	Total Phosphorus
TSI	Trophic State Index
TSM	Total Suspended Matter
TSS	Total Suspended Solids
TOA	Top of Atmosphere reflectance
UAV	Unmanned Aerial Vehicle
VH	Vertical transmit–Horizontal receive (polarization)
VIS	Visible Spectrum
XGBoost	Extreme Gradient Boosting
6SV	Second Simulation of a Satellite Signal in the Solar Spectrum, Vector
k-fold	k-fold Cross-Validation

References

1. Strokal, M.; Spanier, J.E.; Kroeze, C.; Koelmans, A.A.; Flörke, M.; Franssen, W.; Hofstra, N.; Langan, S.; Tang, T.; van Vliet, M.T.H.; et al. Global multi-pollutant modelling of water quality: Scientific challenges and future directions. Curr. Opin. Environ. Sustain. 2019, 36, 116–125.
2. Lyu, F.; Zhang, H.; Dang, C.; Gong, X. A novel framework for water accounting and auditing for efficient management of industrial water use. J. Clean. Prod. 2023, 395, 136458.
3. Chai, Y.; Yue, Y.; Borthwick, A.G.L.; Wang, Y.; Slater, L.; She, D.; Feng, D.; Miao, C. Water resources remain sustainable in global revegetated regions. Sci. Bull. 2025, 70, 4080–4090.
4. Bose, D.; Bhattacharya, R.; Kaur, T.; Banerjee, R.; Bhatia, T.; Ray, A.; Batra, B.; Mondal, A.; Ghosh, P.; Mondal, S. Overcoming water, sanitation, and hygiene challenges in critical regions of the global community. Water-Energy Nexus 2024, 7, 277–296.
5. Vardon, M.J.; Le, T.H.L.; Martinez-Lagunes, R.; Pule, O.B.; Schenau, S.; May, S.; Grafton, R.Q. Accounting for water: A global review and indicators of best practice for improved water governance. Ecol. Econ. 2025, 227, 108396.
6. Lee, Q.X.; Teo, F.Y.; Selvarajoo, A.; Lim, S.P.; Goh, H.B.; Falconer, R.A. A Review of Assessment Methods for Coastal Hydro-Environmental Processes: Research Trends and Challenges. Water 2025, 17, 3278.
7. Denpetkul, T.; Pumkaew, M.; Sittipunsakda, O.; Srathongneam, T.; Mongkolsuk, S.; Sirikanchana, K. Risk-based critical concentrations of enteric pathogens for recreational water criteria and recommended minimum sample volumes for routine water monitoring. Sci. Total Environ. 2024, 950, 175234.
8. Marisa, M.-H.; Adrián, F.-R.; Jannice, A.-V.; Misael Sebastián, G.-H.; Diego, D.-V. Water quality management in a tropical karstic system influenced by land use in Chiapas, Mexico. Environ. Chall. 2024, 16, 100981.
9. Saadi, H.; Diongue, D.M.L.; Ogilvie, A.; Martin, D.; Tall, O.; Bellanger, J.; Ndiaye, A.; Faye, S. Seasonal variations and drivers of water quality in semi-arid freshwater lakes: Multivariate spatial analysis in Lake Guiers, Senegal. J. Hydrol. Reg. Stud. 2025, 61, 102695.
10. Silva, G.M.; Campos, D.F.; Brasil, J.A.; Tremblay, M.; Mendiondo, E.M.; Ghiglieno, F. Advances in Technological Research for Online and In Situ Water Quality Monitoring—A Review. Sustainability 2022, 14, 5059.
11. Virdis, S.G.P.; Xue, W.; Winijkul, E.; Nitivattananon, V.; Punpukdee, P. Remote sensing of tropical riverine water quality using sentinel-2 MSI and field observations. Ecol. Indic. 2022, 144, 109472.
12. Villota-González, F.H.; Sulbarán-Rangel, B.; Zurita-Martínez, F.; Gurubel-Tun, K.J.; Zúñiga-Grajeda, V. Assessment of Machine Learning Models for Remote Sensing of Water Quality in Lakes Cajititlán and Zapotlán, Jalisco—Mexico. Remote Sens. 2023, 15, 5505.
13. Topp, S.N.; Pavelsky, T.M.; Jensen, D.; Simard, M.; Ross, M.R.V. Research Trends in the Use of Remote Sensing for Inland Water Quality Science: Moving Towards Multidisciplinary Applications. Water 2020, 12, 169.
14. Majid, A.; Ikhsan, N.; Hassan, Z. Utility of satellite imagery in estimating coastal marine water attributes. Cont. Shelf Res. 2025, 292, 105509.
15. Ngamile, S.; Madonsela, S.; Kganyago, M. Trends in remote sensing of water quality parameters in inland water bodies: A systematic review. Front. Environ. Sci. 2025, 13, 1549301.
16. Arias-Rodriguez, L.F.; Duan, Z.; Díaz-Torres, J.D.; Basilio Hazas, M.; Huang, J.; Kumar, B.U.; Tuo, Y.; Disse, M. Integration of Remote Sensing and Mexican Water Quality Monitoring System Using an Extreme Learning Machine. Sensors 2021, 21, 4118.
17. Casey, K.A.; Rousseaux, C.S.; Gregg, W.W.; Boss, E.; Chase, A.P.; Craig, S.E.; Mouw, C.B.; Reynolds, R.A.; Stramski, D.; Ackleson, S.G.; et al. A global compilation of in situ aquatic high spectral resolution inherent and apparent optical property data for remote sensing applications. Earth Syst. Sci. Data 2020, 12, 1123–1139.
18. Lo Prejato, M.; McKee, D.; Mitchell, C. Inherent Optical Properties-Reflectance Relationships Revisited. J. Geophys. Res. Ocean. 2020, 125, e2020JC016661.
19. Sun, Y.; Wang, D.; Li, L.; Ning, R.; Yu, S.; Gao, N. Application of remote sensing technology in water quality monitoring: From traditional approaches to artificial intelligence. Water Res. 2024, 267, 122546.
20. Jaywant, S.A.; Arif, K.M. Remote Sensing Techniques for Water Quality Monitoring: A Review. Sensors 2024, 24, 8041.
21. Arias-Rodriguez, L.F.; Tüzün, U.F.; Duan, Z.; Huang, J.; Tuo, Y.; Disse, M. Global Water Quality of Inland Waters with Harmonized Landsat-8 and Sentinel-2 Using Cloud-Computed Machine Learning. Remote Sens. 2023, 15, 1390.
22. Pang, Z.; Zhou, Z.; Fu, J.e.; Jiang, W.; Qin, X.; Sun, M. Deep learning-based remote sensing retrieval of inland water quality: A review. J. Hydrol. Reg. Stud. 2025, 61, 102759.
23. Wu, Z.; Pang, J.; Li, J.; Wang, Y.; Ruan, J.; Zhang, X.; Yang, L.; Pang, Y.; Gao, Y. A review of remote sensing-based water quality monitoring in turbid coastal waters. Intell. Mar. Technol. Syst. 2025, 3, 24.
24. Lioumbas, J.; Christodoulou, A.; Katsiapi, M.; Xanthopoulou, N.; Stournara, P.; Spahos, T.; Seretoudi, G.; Mentes, A.; Theodoridou, N. Satellite remote sensing to improve source water quality monitoring: A water utility’s perspective. Remote Sens. Appl. Soc. Environ. 2023, 32, 101042.
25. Rossoni, R.B.; Laipelt, L.; Paiva, R.C.D.d.; Fan, F.M. Remote sensing and big data: Google Earth Engine data to assist calibration processes in hydro-sediment modeling on large scales. Remote Sens. Appl. Soc. Environ. 2024, 36, 101352.
26. Qiao, H.; Lee, Z.; Wang, D.; Zheng, Z.; Ye, X.; Dou, C. One-step retrieval of water-quality parameters from satellite top-of-atmosphere measurements. Remote Sens. Environ. 2025, 323, 114709.
27. Deng, Y.; Zhang, Y.; Pan, D.; Yang, S.X.; Gharabaghi, B. Review of Recent Advances in Remote Sensing and Machine Learning Methods for Lake Water Quality Management. Remote Sens. 2024, 16, 4196.
28. Yang, H.; Kong, J.; Hu, H.; Du, Y.; Gao, M.; Chen, F. A Review of Remote Sensing for Water Quality Retrieval: Progress and Challenges. Remote Sens. 2022, 14, 1770.
29. Niu, C.; Tan, K.; Wang, X.; Du, P.; Pan, C. A semi-analytical approach for estimating inland water inherent optical properties and chlorophyll a using airborne hyperspectral imagery. Int. J. Appl. Earth Obs. Geoinf. 2024, 128, 103774.
30. Maciel, D.A.; Pahlevan, N.; Barbosa, C.C.F.; Martins, V.S.; Smith, B.; O’Shea, R.E.; Balasubramanian, S.V.; Saranathan, A.M.; Novo, E.M.L.M. Towards global long-term water transparency products from the Landsat archive. Remote Sens. Environ. 2023, 299, 113889.
31. Pahlevan, N.; Smith, B.; Schalles, J.; Binding, C.; Cao, Z.; Ma, R.; Alikas, K.; Kangro, K.; Gurlin, D.; Hà, N.; et al. Seamless retrievals of chlorophyll-a from Sentinel-2 (MSI) and Sentinel-3 (OLCI) in inland and coastal waters: A machine-learning approach. Remote Sens. Environ. 2020, 240, 111604.
32. Atwood, E.C.; Jackson, T.; Laurenson, A.; Jönsson, B.F.; Spyrakos, E.; Jiang, D.; Sent, G.; Selmes, N.; Simis, S.; Danne, O.; et al. Framework for Regional to Global Extension of Optical Water Types for Remote Sensing of Optically Complex Transitional Water Bodies. Remote Sens. 2024, 16, 3267.
33. Monitoring Coastal and Estuarine Water Quality: Transitioning from MODIS to VIIRS. NASA Applied Remote Sensing Training Program (ARSET). Available online: https://www.earthdata.nasa.gov/learn/trainings/monitoring-coastal-estuarine-water-quality-transitioning-from-modis-viirs (accessed on 30 April 2026).
34. Kääb, A.; Winsvold, S.H.; Altena, B.; Nuth, C.; Nagler, T.; Wuite, J. Glacier Remote Sensing Using Sentinel-2. Part I: Radiometric and Geometric Performance, and Application to Ice Velocity. Remote Sens. 2016, 8, 598.
35. Gholizadeh, M.H.; Melesse, A.M.; Reddi, L. A Comprehensive Review on Water Quality Parameters Estimation Using Remote Sensing Techniques. Sensors 2016, 16, 1298.
36. Warren, M.A.; Simis, S.G.H.; Martinez-Vicente, V.; Poser, K.; Bresciani, M.; Alikas, K.; Spyrakos, E.; Giardino, C.; Ansper, A. Assessment of atmospheric correction algorithms for the Sentinel-2A MultiSpectral Imager over coastal and inland waters. Remote Sens. Environ. 2019, 225, 267–289.
37. Vanhellemont, Q.; Ruddick, K. Atmospheric correction of Sentinel-3/OLCI data for mapping of suspended particulate matter and chlorophyll-a concentration in Belgian turbid coastal waters. Remote Sens. Environ. 2021, 256, 112284.
38. Fu, B.; Lao, Z.; Liang, Y.; Sun, J.; He, X.; Deng, T.; He, W.; Fan, D.; Gao, E.; Hou, Q. Evaluating optically and non-optically active water quality and its response relationship to hydro-meteorology using multi-source data in Poyang Lake, China. Ecol. Indic. 2022, 145, 109675.
39. Guo, H.; Huang, J.J.; Chen, B.; Guo, X.; Singh, V.P. A machine learning-based strategy for estimating non-optically active water quality parameters using Sentinel-2 imagery. Int. J. Remote Sens. 2021, 42, 1841–1866.
40. Tian, S.; Guo, H.; Xu, W.; Zhu, X.; Wang, B.; Zeng, Q.; Mai, Y.; Huang, J.J. Remote sensing retrieval of inland water quality parameters using Sentinel-2 and multiple machine learning algorithms. Environ. Sci. Pollut. Res. 2023, 30, 18617–18630.
41. Salas, E.A.L.; Kumaran, S.S.; Partee, E.B.; Willis, L.P.; Mitchell, K. Potential of mapping dissolved oxygen in the Little Miami River using Sentinel-2 images and machine learning algorithms. Remote Sens. Appl. Soc. Environ. 2022, 26, 100759.
42. Adusei, Y.Y.; Quaye-Ballard, J.; Adjaottor, A.A.; Mensah, A.A. Spatial prediction and mapping of water quality of Owabi reservoir from satellite imageries and machine learning models. Egypt. J. Remote Sens. Space Sci. 2021, 24, 825–833.
43. Li, S.; Song, K.; Wang, S.; Liu, G.; Wen, Z.; Shang, Y.; Lyu, L.; Chen, F.; Xu, S.; Tao, H.; et al. Quantification of chlorophyll-a in typical lakes across China using Sentinel-2 MSI imagery with machine learning algorithm. Sci. Total Environ. 2021, 778, 146271.
44. Woo Kim, Y.; Kim, T.; Shin, J.; Lee, D.-S.; Park, Y.-S.; Kim, Y.; Cha, Y. Validity evaluation of a machine-learning model for chlorophyll a retrieval using Sentinel-2 from inland and coastal waters. Ecol. Indic. 2022, 137, 108737.
45. Shi, L.; Gao, C.; Wang, T.; Liu, L.; Wu, Y.; You, X. Information extraction of seasonal dissolved oxygen in urban water bodies based on machine learning using sentinel-2 imagery: An open access application in Baiyangdian Lake. Ecol. Inform. 2024, 82, 102782.
46. Gao, L.; Shangguan, Y.; Sun, Z.; Shen, Q.; Shi, Z. Estimation of Non-Optically Active Water Quality Parameters in Zhejiang Province Based on Machine Learning. Remote Sens. 2024, 16, 514.
47. Zhang, J.; Meng, F.; Fu, P.; Jing, T.; Xu, J.; Yang, X. Tracking changes in chlorophyll-a concentration and turbidity in Nansi Lake using Sentinel-2 imagery: A novel machine learning approach. Ecol. Inform. 2024, 81, 102597.
48. Liu, Y.; Wang, Z.; Yue, H. Water quality parameters retrieval and nutrient status evaluation based on machine learning methods and Sentinel-2 imagery: A case study of the Hongjiannao Lake. Environ. Monit. Assess. 2025, 197, 556.
49. Greene, J.A.; Metlitsky, L.; Levine, A.; Foley, E.; Henry, M.; Azarderakhsh, M.; Blake, R.A.; Norouzi, H. A new perspective on estimating Chlorophyll-a concentrations using machine learning and remote sensing: A case study of New York state lakes. Ecol. Indic. 2025, 180, 114316.
50. Tan, Z.; Simis, S.G.H.; Yang, C.; Shen, M.; Li, J.; Duan, H. Revealing two decades of chlorophyll-a dynamics in arid oligotrophic lakes of Xinjiang, China using a deep recurrent approach. Water Res. 2025, 285, 124058.
51. Niu, C.; Tan, K.; Wang, X.; Pan, C. Mapping nutrient pollution in inland water bodies using multi-platform hyperspectral imagery and deep regression network. J. Hazard. Mater. 2025, 488, 137314.
52. Singh, P.; Yadav, B. Ensemble-based mapping and trophic characterization of lentic water bodies in the Middle Ganga Basin under climate and land use change. J. Environ. Manag. 2026, 400, 128754.
53. Monitoring Water Quality in Lakes and Coastal Regions Using STREAM. NASA Applied Remote Sensing Training Program (ARSET). Available online: https://www.earthdata.nasa.gov/learn/trainings/monitoring-water-quality-lakes-coastal-regions-using-stream (accessed on 30 April 2026).
54. Remote Sensing of Water Quality. U.S. Department of the Interior. Available online: https://eros.usgs.gov/doi-remote-sensing-activities/2023/usgs/remote-sensing-water-quality (accessed on 30 April 2026).
55. Li, C.; Odermatt, D.; Bouffard, D.; Wüest, A.; Kohn, T. Coupling remote sensing and particle tracking to estimate trajectories in large water bodies. Int. J. Appl. Earth Obs. Geoinf. 2022, 110, 102809.
56. Pan, D.; Deng, Y.; Yang, S.X.; Gharabaghi, B. Recent Advances in Remote Sensing and Artificial Intelligence for River Water Quality Forecasting: A Review. Environments 2025, 12, 158.
57. Li, N.; Zhang, Y.; Shi, K.; Zhang, Y.; Sun, X.; Wang, W.; Huang, X. Monitoring water transparency, total suspended matter and the beam attenuation coefficient in inland water using innovative ground-based proximal sensing technology. J. Environ. Manag. 2022, 306, 114477.
58. Karthick, M.; Shanmugam, P.; Saravana Kumar, G. Long-term water quality assessment in coastal and inland waters: An ensemble machine-learning approach using satellite data. Mar. Pollut. Bull. 2024, 209, 117036.
59. Chen, L.; Liu, L.; Liu, S.; Shi, Z.; Shi, C. The Application of Remote Sensing Technology in Inland Water Quality Monitoring and Water Environment Science: Recent Progress and Perspectives. Remote Sens. 2025, 17, 667.
60. Fu, B.; Li, S.; Lao, Z.; Yuan, B.; Liang, Y.; He, W.; Sun, W.; He, H. Multi-sensor and multi-platform retrieval of water chlorophyll a concentration in karst wetlands using transfer learning frameworks with ASD, UAV, and Planet CubeSate reflectance data. Sci. Total Environ. 2023, 901, 165963.
61. Cheng, K.H.; Chan, S.N.; Lee, J.H.W. Remote sensing of coastal algal blooms using unmanned aerial vehicles (UAVs). Mar. Pollut. Bull. 2020, 152, 110889.
62. Diganta, M.T.M.; Uddin, M.G.; Rahman, A.; Olbert, A.I. A comprehensive review of various environmental factors’ roles in remote sensing techniques for assessing surface water quality. Sci. Total Environ. 2024, 957, 177180.

Figure 1. Temporal distribution of publications retrieved using the general search string (2002–March 2026).

Figure 2. Conceptual framework of aquatic remote sensing: (A) optical foundations linking inherent optical properties (IOPs) and apparent optical properties (AOPs); (B) water constituents and their spectral response, distinguishing optically active and non-active components; and (C) retrieval pathway including atmospheric correction, surface reflectance, and direct and indirect approaches (including machine learning).

Figure 3. Spectral basis for inland water quality remote sensing: (a) atmospheric transmission windows and spectral bands of Sentinel-2 and Landsat 8 across the visible-to-thermal infrared range; (b) representative spectral responses of key water constituents (chlorophyll-a, water, suspended sediments, and CDOM), highlighting wavelength regions relevant for their detection. Adapted from [33,34].

Figure 4. Atmospheric correction effects on inland water remote sensing signals: (a) conceptual decomposition of the total at-sensor radiance above a water body into atmospheric path radiance, surface-reflected radiance, and water-leaving radiance; (b) conceptual example of atmospheric correction, comparing top-of-atmosphere reflectance with water-leaving reflectance (the wavelengths shown represent spectral regions commonly used in multispectral remote sensing and are illustrative rather than tied to a specific sensor configuration); (c,d) representative imagery illustrating how atmospheric correction modifies both the magnitude of the spectral signal and the spatial contrast of the water body, which directly affects feature construction, model performance, and uncertainty in water quality retrieval. Adapted from [53,54].
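The decomposition in Figure 4a is often written in the simplified form below. This is a conceptual sketch rather than the formulation of any particular processor. Operational schemes typically split the path term into Rayleigh, aerosol and coupling contributions, use a direct transmittance for sun glint, and may add adjacency corrections. The following symbols are assumptions of this sketch:

- $t$: diffuse transmittance, applied here to both surface terms for simplicity.
- $d$: Earth–Sun distance in astronomical units.
- $E_0$: mean extraterrestrial solar irradiance.
- $\theta_s$: solar zenith angle.
- $E_d(0^+)$: downwelling irradiance just above the water surface.

```latex
\begin{aligned}
L_t(\lambda) &= L_{\mathrm{path}}(\lambda) + t(\lambda)\,L_{\mathrm{sr}}(\lambda) + t(\lambda)\,L_w(\lambda),\\
\rho_{\mathrm{TOA}}(\lambda) &= \frac{\pi\,L_t(\lambda)\,d^2}{E_0(\lambda)\cos\theta_s},\\
R_{rs}(\lambda) &= \frac{L_w(\lambda)}{E_d(0^+,\lambda)}, \qquad \rho_w(\lambda) = \pi\,R_{rs}(\lambda).
\end{aligned}
```

In this notation, atmospheric correction amounts to estimating $L_{\mathrm{path}}$, $L_{\mathrm{sr}}$ and $t$ so that $L_w$, and hence $R_{rs}$ or $\rho_w$, can be recovered from $L_t$. Over water, $L_w$ is often a small fraction of $L_t$. Modest errors in the estimated atmospheric terms can therefore translate into large relative errors in $R_{rs}$.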

Figure 5. Spectral feature engineering strategies for inland water quality retrieval: (a) direct multispectral bands; (b) engineered spectral indices commonly applied in remote sensing of inland waters; (c) multiscale spectral integration framework.
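Panels (a) and (b) of Figure 5 can be illustrated with a short sketch that turns one pixel's reflectances into direct-band and index features. The sketch makes the following assumptions:

- Inputs are atmospherically corrected Sentinel-2 MSI reflectances.
- Indices use common definitions: NDCI from red edge and red, NDWI from green and NIR, MNDWI from green and SWIR.
- The band keys, the `spectral_features` helper and the feature set are illustrative and are not taken from any reviewed study.

```python
"""Sketch: direct-band and spectral-index features for one Sentinel-2 pixel.

Assumed band keys (Sentinel-2 MSI, approximate centre wavelengths):
B2 blue ~490 nm, B3 green ~560 nm, B4 red ~665 nm, B5 red edge ~705 nm,
B8 NIR ~842 nm, B11 SWIR ~1610 nm. Inputs are atmospherically corrected
reflectances; the feature set is illustrative only.
"""


def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None if the denominator is zero."""
    denom = a + b
    return None if denom == 0 else (a - b) / denom


def spectral_features(px):
    """Combine direct bands (Figure 5a) with engineered indices (Figure 5b)."""
    feats = dict(px)  # direct multispectral bands
    feats["NDCI"] = normalized_difference(px["B5"], px["B4"])    # red edge vs red (chlorophyll-a proxy)
    feats["NDWI"] = normalized_difference(px["B3"], px["B8"])    # green vs NIR (McFeeters)
    feats["MNDWI"] = normalized_difference(px["B3"], px["B11"])  # green vs SWIR (Xu)
    feats["NDVI"] = normalized_difference(px["B8"], px["B4"])    # NIR vs red
    feats["B3_B2"] = px["B3"] / px["B2"] if px["B2"] else None   # simple green/blue ratio
    return feats


# Hypothetical reflectances for a single water pixel
pixel = {"B2": 0.02, "B3": 0.04, "B4": 0.03, "B5": 0.05, "B8": 0.01, "B11": 0.005}
for name, value in spectral_features(pixel).items():
    print(name, value)
```

Positive NDWI or MNDWI values are conventionally used to flag water pixels, while NDCI serves as a chlorophyll-a proxy. The multiscale integration in panel (c) would add neighbourhood or multi-resolution statistics and is not covered by this sketch.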

Table 1. Summary of optically active water quality indicators monitored by remote sensing.

Parameter	Definition	Purpose of Measurement	Dominant Spectral Response
Chlorophyll-a	Photosynthetically active compounds found in plants, algae, and cyanobacteria that convert light into energy for photosynthesis.	Lake productivity/trophic state, detection of harmful algal blooms.	Fluorescence: ~680 nm. Absorption: 450–475 nm and 670 nm. Backscattering: ~550 nm and ~700 nm.
Total suspended solids (TSS)	Inorganic and organic particles suspended throughout a water column.	Inorganic sediment flow, biogeochemical cycles, light conditions.	Peak reflectance between ~500 and ~800 nm, depending on concentration.
Colored dissolved organic matter (CDOM)	Colored portion of total dissolved organic carbon.	Carbon production and cycle, light conditions.	Strongly absorbing, especially below 500 nm.
Turbidity	Turbidity is an optical property related to the scattering and attenuation of light caused by suspended particles in water.	Lake productivity/trophic state, light conditions, sediment concentrations, detection of harmful algal blooms.	Highly dependent on particle composition and concentration but generally associated with increased reflectance as turbidity rises.
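Table 1 notes that CDOM absorbs strongly at short wavelengths. This behaviour is commonly described with an exponential spectral model. The sketch below uses illustrative values, not values from the reviewed studies:

- Reference wavelength: 440 nm.
- $a_{CDOM}(440)$: 1.0 m⁻¹.
- Spectral slope $S$: 0.015 nm⁻¹.

Published slopes vary among waterbodies and with the wavelength range used for fitting.

```python
"""Sketch: exponential spectral model of CDOM absorption.

a_cdom(wl) = a_cdom(wl_ref) * exp(-S * (wl - wl_ref))

Illustrative assumptions: wl_ref = 440 nm, a_cdom(440) = 1.0 m^-1 and
S = 0.015 nm^-1; measured slopes vary among waterbodies and fitting ranges.
"""
import math


def cdom_absorption(wavelength_nm, a_ref=1.0, ref_nm=440.0, slope=0.015):
    """CDOM absorption coefficient (m^-1) at wavelength_nm."""
    return a_ref * math.exp(-slope * (wavelength_nm - ref_nm))


for wl in (400, 440, 500, 560, 665):
    print(f"{wl} nm: {cdom_absorption(wl):.3f} m^-1")
```

With these assumed parameters, absorption at 400 nm is roughly 1.8 times the 440 nm value and drops below 5% of it by 665 nm. This illustrates why CDOM mainly affects the blue part of the reflectance spectrum.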

Table 2. Comparative analysis of image processing workflows for inland water quality estimation (2020–March 2026).

Water Body/Region	Sensor/Platform	Atmospheric Correction/Preprocessing	Modeling Approach	Validation	Reference
Poyang Lake, China	Multi-source multispectral + hyperspectral imagery	Multi-source workflow; exact AC not explicit in the abstract	Ensemble learning + LOOCV-ML	LOOCV; estimation of Chl-a, TSM, TP, TN, NH₄-N and BOD₅	[38]
Small urban waterbodies, China	Sentinel-2	Screening of 255 band combinations; AC not explicit in the accessible abstract	RF, SVR, NN	Comparison of models for TP, TN and COD	[39]
Inland reservoirs	Sentinel-2	AC not explicit in the abstract	XGBoost, SVR, RF, ANN	XGBoost performed best; Chl-a, DO and NH₃-N reconstructed for 2018–2020	[40]
Little Miami River, USA	Sentinel-2 (10 m)	Spectral predictors derived from Sentinel-2	RF, SVM	RMSE of 0.201–0.241 mg/L for DO	[41]
Owabi Reservoir, Ghana	Sentinel-2, Landsat-8	Intersensor comparison; AC not explicit in the abstract	RF, SVM, MLR	Repeated k-fold CV; Sentinel-2 + RF was the best	[42]
45 typical lakes, China	Sentinel-2 MSI	C2R TSSCC processor + k-means clustering of Rrs	LR, SVM, CatBoost	SVM was the best; validation with R² = 0.88	[43]
78 lakes and estuaries, South Korea	Sentinel-2 MSI	6-band Rrs + 4 spectral ratios	LGBM and other ML models	R² = 0.75; RMSE = 15.15 mg m⁻³	[44]
Baiyangdian Lake, China	Sentinel-2	AC not explicitly stated in the abstract	9 ML algorithms; ETR was the best	251 in situ–image matchups; seasonal assessment of DO	[45]
Zhejiang Province, China	Sentinel-2 + water quality station data	Data preparation + optical combinations; processor not explicitly stated in accessible text	SVR, RF, XGBoost, KNN	Spatial mapping of CODMn, DO, TN and TP	[46]
Nansi Lake, China	Sentinel-2	20/17 bands–features + SHAP	Stacking of 8 ML models	Stacking improved ~12% compared to XGBoost at extreme values	[47]
Hongjiannao Lake, China	Sentinel-2	Boruta, RFE and SHAP for variable selection	RF, BP Neural Network, SVM	Evaluation using R², RMSE, MAE and RPD	[48]
New York State lakes, USA	Landsat-8/9 + Sentinel-2	Inland water atmospheric correction + basin covariates	Several ML models; ETR was the best	R² = 0.72; RMSE = 8.19 μg/L	[49]
Major lakes >100 km², Xinjiang, China	MODIS (2002–2023)	Spectral sequence modeling	RNN + comparison with conventional models	R² = 0.72; RMSE = 0.75 mg m⁻³	[50]
Yangtze Delta “One River and Three Lakes”, China	Ground–air–satellite hyperspectral data	Multi-source hyperspectral workflow; patch size analysis	Channel-attention deep regression	R² of 0.8137–0.8315 for TN, TP and NH₃-N	[51]
Cross-mission inland/coastal waters	Sentinel-2 MSI + Sentinel-3 OLCI	SeaDAS, POLYMER and ACOLITE	Mixture density network (MDN)	Independent matchups; useful for cross-mission harmonization	[31]
Haridwar watershed (LSWBs), India	Multi-source: Landsat-7/8 + Sentinel-2 + Sentinel-1 (SAR)	Multi-sensor workflow; TOA (Landsat) + SR (Sentinel-2); spectral indices (NDWI, MNDWI, NDVI, NDBI); SAR (VH); DEM, LULC, HydroSHEDS, JRC masking	Ensemble learning (RF, SVM, GBT, KNN, CART, NB) + majority voting; TSI modeling	Independent field validation; mapping accuracy = 87.27% (kappa ≈ 0.69); TSI accuracy = 78.57% (kappa ≈ 0.74)	[52]
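Table 2 reports accuracy with several metrics (R², RMSE, MAE, RPD), but their exact definitions are not always stated, which complicates cross-study comparison. The sketch below uses common definitions under two assumptions:

- R² is the coefficient of determination.
- RPD is the sample standard deviation of the observations divided by RMSE.

Some studies instead report the squared Pearson correlation, or use a bias-corrected error (SEP) in the RPD denominator. The `regression_metrics` helper and the example values are hypothetical.

```python
"""Sketch: common regression metrics for water quality retrieval.

Assumed definitions (they vary between studies):
R2   = 1 - SS_res / SS_tot (coefficient of determination)
RPD  = sample standard deviation of observations / RMSE
MAPE = mean of |residual / observation|, in percent
"""
import math
import statistics


def regression_metrics(obs, pred):
    """Return R2, RMSE, MAE, MAPE (%) and RPD for paired observations."""
    n = len(obs)
    resid = [o - p for o, p in zip(obs, pred)]
    mean_obs = sum(obs) / n
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    rmse = math.sqrt(ss_res / n)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "MAE": sum(abs(r) for r in resid) / n,
        # MAPE is undefined when any observation is zero
        "MAPE": 100.0 * sum(abs(r / o) for r, o in zip(resid, obs)) / n,
        # RPD is undefined when RMSE is zero (perfect predictions)
        "RPD": statistics.stdev(obs) / rmse,
    }


# Hypothetical in situ values and model predictions (e.g., Chl-a in ug/L)
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
for name, value in regression_metrics(observed, predicted).items():
    print(f"{name}: {value:.3f}")
```

Stating these definitions explicitly, together with the units of RMSE and MAE, would make values such as those in Table 2 easier to compare across studies.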

Table 3. Dominant validation patterns highlighting temporal matching practices, spatial independence, partition strategies, and uncertainty reporting.

Validation Pattern	Temporal Matching	Spatial Independence	Partition Strategy	Uncertainty Reporting	Representative Studies
Random train–test split within study dataset	Same-day or ±1 day	No geographic independence	Random split	Aggregated error only	[57,58]
Calibration–validation subset within single lake/system	Same-day	Single lake	Calibration–validation division	Partial	[55]
Multiscale/cross-platform validation	Coordinated acquisition	Partial scale independence	Cross-scale CV	Limited	[51,60]
Deep learning architectures with internal partition	Not clearly specified	Multi-site (not independent testing)	Random split	Aggregated	[26]
Independent field validation with external GCPs	Not strictly same-day (field campaign vs annual composites)	Multi-site independent validation (external dataset)	Train/test split + independent validation dataset	Accuracy + kappa coefficient (reported separately for mapping and TSI)	[52]
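The temporal-matching column of Table 3 (e.g., same-day or ±1 day) can be made explicit with a simple matchup filter. The sketch assumes:

- Timestamps are dates only.
- A single list of cloud-free scene dates applies to all stations.

The station names, dates and the `match_samples` helper are hypothetical. Real workflows would also apply cloud and quality masks and extract a spatial window around each station.

```python
"""Sketch: pairing in situ samples with satellite scenes within a time window."""
from datetime import date


def match_samples(samples, scene_dates, max_days=1):
    """Pair each in situ sample with the closest scene within max_days.

    samples: list of (station_id, sample_date) tuples.
    scene_dates: iterable of acquisition dates (assumed cloud-free).
    Returns a list of ((station_id, sample_date), scene_date, lag_days);
    ties resolve to the earlier scene and unmatched samples are dropped.
    """
    scenes = sorted(scene_dates)
    matches = []
    for station, sampled in samples:
        best = None
        for scene in scenes:
            lag = abs((scene - sampled).days)
            # strict "<" keeps the earlier scene when two lags are equal
            if lag <= max_days and (best is None or lag < best[1]):
                best = (scene, lag)
        if best is not None:
            matches.append(((station, sampled), best[0], best[1]))
    return matches


# Hypothetical stations and acquisition dates
samples = [("S1", date(2024, 5, 3)), ("S2", date(2024, 5, 10)), ("S3", date(2024, 5, 20))]
scenes = [date(2024, 5, 2), date(2024, 5, 4), date(2024, 5, 12), date(2024, 5, 20)]

for (station, sampled), scene, lag in match_samples(samples, scenes, max_days=1):
    print(station, sampled, "->", scene, f"(lag {lag} d)")
```

With a ±1 day window, station S2 is dropped because its nearest scene is two days away. Reporting the retained lags alongside the accuracy metrics is one simple way to document the matchup component of uncertainty.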

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zúñiga-Grajeda, V.; Lomeli, J.A.; Villota-González, F.H.; García-García, C.A.; Sulbarán-Rangel, B. Estimation of Water Quality in Lakes and Rivers Using Remote Sensing and Artificial Intelligence: A Review of Image Processing and Validation Strategies. Limnol. Rev. 2026, 26, 19. https://doi.org/10.3390/limnolrev26020019

