Article

Deciphering Multi-Scale Anthropogenic Drivers of River Water Quality: A Synergistic ML-GAM Cascade Framework with Sentinel-2

1 Key Laboratory of Ministry of Education for Coastal and Wetland Ecosystems, Xiamen University, Xiamen 361102, China
2 Global Ocean Negative Carbon Emissions (ONCE) Program, Carbon Neutral Innovation Research Center, Fujian Key Laboratory of Marine Carbon Sequestration, Xiamen University, Xiamen 361102, China
3 Fujian Provincial Environmental Monitoring Center Station, Fuzhou 350003, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Remote Sens. 2026, 18(5), 840; https://doi.org/10.3390/rs18050840
Submission received: 15 January 2026 / Revised: 14 February 2026 / Accepted: 26 February 2026 / Published: 9 March 2026
(This article belongs to the Special Issue Remote Sensing of Inland Waters and Their Catchments (2nd Edition))

Highlights

What are the main findings?
  • CatBoost was identified as the optimal model for retrieving key water quality parameters (TN, TP, CODMn, and turbidity) from Sentinel-2 imagery, demonstrating superior accuracy and robustness in a dynamic fluvial system.
  • Generalized additive models (GAMs) revealed scale-dependent and nonlinear responses of water quality to natural and anthropogenic drivers across buffer zones ranging from 50 m to 20 km, highlighting the multiphasic effects of factors such as forest cover, land use, and population density.
What are the implications of the main findings?
  • This study provides a transferable remote sensing–ML-GAM framework that moves beyond water quality mapping to quantitatively decipher multi-scale driver thresholds, supporting spatially explicit watershed zoning and targeted management strategies.
  • The findings offer actionable insights for differentiated pollution control, such as optimizing riparian buffers for nitrogen and phosphorus interception, and establish a basis for near-real-time, satellite-based monitoring to track management effectiveness in subtropical coastal rivers.

Abstract

While understanding the drivers of river water quality is crucial, reliance on sparse ground observations hinders the accurate quantification of driver thresholds and of the scale-dependent effects of buffer zones. Satellite remote sensing overcomes this limitation by providing the spatially continuous data needed to define effective buffer zones and determine threshold intervals for natural and anthropogenic drivers, thereby supporting sustainable watershed management. Herein, we retrieved total nitrogen (TN), total phosphorus (TP), the permanganate index (CODMn), and turbidity in the Minjiang River, Fujian Province, by combining Sentinel-2 imagery with in situ data (2021–2024). We then employed generalized additive models (GAMs) across spatial scales from 50 m to 20 km to screen and evaluate the natural and anthropogenic factors influencing these water quality indicators. The GAMs revealed that TN exhibited multiphasic responses to forest cover and water area, with alternating positive and negative effects across their ranges. TP was predominantly driven by agricultural and urban land use and showed clear scale–threshold effects. This study provides an integrated framework that moves beyond retrieval to quantitatively assess the impact of multi-scale natural–anthropogenic factors, offering actionable insights for precise watershed zoning and science-based management for the sustainable development of river systems.

1. Introduction

Inland waters support fundamental ecological functions and socio-economic development worldwide [1]. Global warming and intensive human activities are exerting unprecedented pressure on these systems [2,3,4], severely degrading the water quality of many rivers [5,6]. Because water quality parameters (WQPs) serve as key indicators of aquatic ecosystem health [7], effective monitoring is essential for timely risk detection and management. Conventional water quality assessment relies primarily on laboratory analyses, on-site testing, and automated monitoring stations [8]. Although these approaches provide high-precision and, in the case of automated stations, temporally continuous data, they suffer from long processing times, high costs, and limited spatial coverage [9,10]. As a result, they cannot provide spatially continuous monitoring of river systems, which remains a challenge for regulatory agencies.
The sparse distribution of monitoring stations cannot capture the continuous gradients along which key natural and anthropogenic factors act, which limits the usefulness of long-term WQP records for shaping sustainable management strategies. River WQPs are influenced by a complex combination of human activities (e.g., land use and population density) and natural factors (e.g., temperature, precipitation, and vegetation) [11,12,13,14,15,16,17]. Crucially, the influences of these drivers are nonlinear and strongly scale-dependent [2,14,15]: the effect of a factor can vary dramatically from the immediate riparian zone to the broader watershed. A key bottleneck is therefore the disconnect between point-based monitoring and socio-ecological interpretation, which hinders the translation of limited observations into the spatially explicit evidence needed to understand complex watershed systems.
Remote sensing, with its frequent revisits and broad coverage, offers a powerful and cost-effective means of monitoring water quality over large areas [18,19]. The availability of land-observing satellite sensors such as Sentinel-2, which provides 10–60 m spatial resolution and a 5-day revisit with its two satellites, has advanced the retrieval of WQPs in inland waters [20,21,22,23,24]. Furthermore, integrating high-resolution satellite imagery with machine learning (ML) and deep learning methods holds great potential for estimating both optically active constituents and non-optically active parameters such as total nitrogen (TN) and total phosphorus (TP) in diverse water bodies [22,23,24,25,26,27,28,29].
Although parameters such as TN and TP do not possess direct optical signatures, they often co-vary with optically active constituents (OACs) such as chlorophyll-a, total suspended solids (TSSs), and colored dissolved organic matter (CDOM) due to coupled biogeochemical processes in aquatic systems [30,31]. Remote sensing retrieves these parameters indirectly by capturing the spectral features of OACs, which serve as proxies. This indirect relationship, governed by the inherent optical properties (IOPs) and apparent optical properties (AOPs) of water bodies [32,33], forms the theoretical basis for employing statistical and machine learning models to establish empirical links between satellite reflectance spectra and in situ WQPs.
The Minjiang River, the largest river in Fujian Province, drains nearly half of the province. Its complex topography, diverse land use, and significant anthropogenic pressures make it representative of subtropical coastal rivers in Southeast China, and its complete hydrological system from source to estuary, subtropical monsoon climate, and intense socio-economic activity make it an ideal case for investigating water quality and its drivers in a dynamic system. The objectives of this study are (1) to map the spatial patterns of WQPs (TN, TP, turbidity, and CODMn) by applying multiple ML models to in situ water quality data and Sentinel-2 imagery and (2) to investigate how natural and anthropogenic factors influence WQPs by quantifying these relationships across multiple spatial scales (from 50 m to 20 km) using generalized additive models (GAMs). This study establishes a transferable framework for high-precision riverine water quality monitoring and provides actionable insights into multi-scale environmental controls, supporting targeted zoning strategies and science-based management.

2. Materials and Methods

2.1. Study Area: The Minjiang River as a Representative Fluvial System

The Minjiang River is the largest independent river in Fujian Province, China. It originates in Junkou Town, Jianning County, on the border between Fujian and Jiangxi Provinces. Its three major tributaries, the Jian River, Futun River, and Sha River, converge near Nanping City to form the main stem, which flows into the East China Sea. The river is 562 km long, and its watershed spans 60,992 km² (Figure 1), approximately half of the province's total area. In terms of drainage area and discharge, the Jian River is the largest of the three headwater tributaries and the Sha River the smallest. The river performs vital ecological functions, providing habitats and breeding grounds for numerous rare plant and animal species; its water supply also supports 40% of the province's economic output and provides drinking water for one-third of its population.

2.2. Data Acquisition and Preprocessing

2.2.1. In Situ Water Quality Parameters

High-frequency water quality monitoring data were obtained from the China National Environmental Monitoring Centre (CNEMC) for 11 national-level monitoring stations in the Minjiang River basin, including Pengdun, Fangcun, Shilian, Yangkeng, Huangtian, Xiongjiang, Xiaxiyuan, Minan, Wenshanli, and Guantou (Figure 1a). These stations span the main stem, the three major tributaries (Jianxi, Futunxi, and Shaxi, i.e., the Jian, Futun, and Sha Rivers), and the estuarine transition zone, covering the key hydrological regions of the Minjiang River. Data were recorded at 4 h intervals from 1 January 2021 to 31 December 2024.
Four WQPs were selected as retrieval targets: total phosphorus (TP, mg L−1), total nitrogen (TN, mg L−1), permanganate index (CODMn, mg L−1), and turbidity (NTU). Missing and anomalous values were removed from the raw 4 h records, and the remaining values were averaged to daily means, which were then paired with Sentinel-2 overpass dates to represent surface conditions on each day. Daily averaging reduces noise from sub-daily variability and increases the number of valid satellite–field match-ups, a common practice in water quality remote sensing studies [34,35]. Descriptive statistics for each parameter are presented in Table 1.
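The daily averaging and match-up step can be sketched with the Python standard library as below. The plausibility bounds used to flag anomalous values are hypothetical, since the section does not state how anomalies were identified, and the records are illustrative rather than CNEMC data.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean


def daily_means(records, valid_range):
    """Average 4 h records into daily means after dropping missing/anomalous values.

    records: iterable of (datetime, value-or-None) pairs for one station and one WQP.
    valid_range: (low, high) plausibility bounds; a hypothetical screening rule.
    """
    low, high = valid_range
    by_day = defaultdict(list)
    for ts, value in records:
        if value is None or not (low <= value <= high):
            continue  # drop missing and out-of-range (anomalous) values
        by_day[ts.date()].append(value)
    return {day: mean(vals) for day, vals in by_day.items()}


def match_ups(daily, overpass_dates):
    """Keep only days that have both a daily in situ mean and a Sentinel-2 overpass."""
    return {d: daily[d] for d in sorted(overpass_dates) if d in daily}


# Usage with made-up values (not study data).
records = [
    (datetime(2022, 5, 3, 0), 1.8),
    (datetime(2022, 5, 3, 4), 2.0),
    (datetime(2022, 5, 3, 8), None),   # missing
    (datetime(2022, 5, 3, 12), 55.0),  # anomalous under the assumed bounds
    (datetime(2022, 5, 8, 4), 1.6),
]
daily = daily_means(records, valid_range=(0.0, 20.0))
pairs = match_ups(daily, {date(2022, 5, 3), date(2022, 5, 13)})
```

Here 3 May keeps the mean of the two valid records, 8 May has no overpass, and 13 May has no in situ data, so only one match-up survives.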

2.2.2. Satellite Imagery and Cloud Masking

All satellite imagery was acquired and preprocessed on the Google Earth Engine (GEE) platform, a cloud-based geospatial processing system widely used for large-scale environmental analyses [36]. This study used Sentinel-2 Level-2A surface reflectance data (Collection: COPERNICUS/S2_SR_HARMONIZED), which is atmospherically corrected and suitable for water quality remote sensing [37]. The data span January 2021 to December 2024, consistent with the in situ monitoring period. From each image, 12 spectral bands were extracted, including visible, red-edge, near-infrared (NIR), and short-wave infrared (SWIR) bands (B1, B2, B3, B4, B5, B6, B7, B8, B8A, B11, and B12), as well as two quality bands (QA60 and MSK_CLDPRB) [38].
To minimize the impact of clouds and their shadows on water-surface spectra, a dual cloud-masking strategy was applied. First, the QA60 band was used for bitmask-based masking to exclude pixels flagged as opaque cloud or cirrus (bits 10 and 11). Second, the cloud probability band (MSK_CLDPRB) was used for probabilistic masking, removing pixels with a cloud probability above 15% [21,39]. This dual masking was intended to retain only clear, high-quality pixels for subsequent analysis. To minimize land adjacency effects and mixed-pixel contamination in narrow river segments, water pixels were extracted using a modified normalized difference water index (MNDWI) threshold combined with manual visual inspection.
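The per-pixel logic of the dual cloud mask and water extraction can be sketched with NumPy as follows. The study performed this step in GEE; here the inputs are assumed to be co-registered arrays, and the MNDWI threshold of 0 is a hypothetical placeholder because the study's threshold is not given (the manual inspection step is not reproduced).

```python
import numpy as np

CLOUD_BIT = 1 << 10   # QA60 bit 10: opaque clouds
CIRRUS_BIT = 1 << 11  # QA60 bit 11: cirrus clouds


def clear_water_mask(qa60, cld_prb, green, swir1, prob_max=15.0, mndwi_min=0.0):
    """Return True for pixels that pass both cloud tests and look like water.

    qa60     : QA60 bitmask values
    cld_prb  : MSK_CLDPRB cloud probability (0-100 %)
    green    : B3 reflectance; swir1: B11 reflectance (assumed on one grid)
    mndwi_min: hypothetical MNDWI water threshold (not given in the study)
    """
    qa60 = np.asarray(qa60, dtype=np.int64)
    no_flag = (qa60 & (CLOUD_BIT | CIRRUS_BIT)) == 0         # step 1: bitmask test
    low_prob = np.asarray(cld_prb, dtype=float) <= prob_max  # step 2: drop prob > 15 %
    green = np.asarray(green, dtype=float)
    swir1 = np.asarray(swir1, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        mndwi = (green - swir1) / (green + swir1)            # modified NDWI
    is_water = np.nan_to_num(mndwi, nan=-1.0) > mndwi_min    # undefined ratio -> not water
    return no_flag & low_prob & is_water


# Five illustrative pixels (not study data).
mask = clear_water_mask(
    qa60=[0, 1024, 2048, 0, 0],            # clear, cloud bit, cirrus bit, clear, clear
    cld_prb=[5, 0, 0, 40, 10],             # index 3 fails the 15 % probability test
    green=[0.05, 0.05, 0.05, 0.05, 0.10],
    swir1=[0.01, 0.01, 0.01, 0.01, 0.20],  # index 4 has a negative (land-like) MNDWI
)
```

Because the two cloud tests and the water test are combined with a logical AND, a pixel is kept only if it passes all three.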

2.2.3. Multi-Scale Geospatial Driver Dataset

To investigate how human activities and natural factors influence the spatiotemporal variation in WQPs, multiple geospatial drivers were selected and analyzed at various spatial scales. All data were acquired and processed on the GEE platform. We generated 100 randomly distributed sampling points within the Minjiang River to provide spatial representation; each point contributed multi-year (2019–2024) monthly water quality retrievals, yielding thousands of observations for the subsequent GAM analysis. Around each point, circular buffers were created with nine radii from 50 m to 20 km (50, 100, 200, 500, 1000, 2000, 5000, 10,000, and 20,000 m) [2]. The roughly geometric spacing of the radii was chosen to capture the decay of landscape influences with distance and to span local (riparian), catchment, and broader watershed influences [12].
The specific factors and their data sources are listed in Table 2. For continuous variables (e.g., population density, precipitation, temperature, and the NDVI), the arithmetic mean was calculated within each buffer. For categorical variables (e.g., land use), the proportional area of each land cover type (such as urban, agricultural, forest, and water body cover) was computed. The temporal range of all factors was aligned with that of the water quality retrievals (2019–2024). The water quality value retrieved for each sampling point and month was linked to the driver variables (e.g., land use composition, the NDVI, and climatic factors) extracted for the corresponding month and year, so that the analysis captured concurrent driver–response relationships. The result was a multi-scale geospatial driver dataset for each sampling point, used to quantitatively assess the mechanisms driving WQP variations.
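The buffer summaries described in the two preceding paragraphs can be sketched on a projected raster grid as follows. The study performed this step in GEE; this NumPy version is a minimal illustration, assuming co-registered rasters with square pixels, pixel-centre inclusion in each circular buffer, and hypothetical land-cover class codes.

```python
import numpy as np

RADII_M = [50, 100, 200, 500, 1000, 2000, 5000, 10_000, 20_000]  # radii from the text


def buffer_stats(point_rc, pixel_size_m, continuous, landcover, classes, radii=RADII_M):
    """Per-radius buffer mean of a continuous raster and land-cover proportions.

    point_rc    : (row, col) of the sampling point on the raster grid
    continuous  : 2-D array (e.g., NDVI); NaN marks no-data
    landcover   : 2-D integer class array on the same grid
    classes     : dict of name -> class code (codes here are hypothetical)
    """
    rows, cols = np.indices(continuous.shape)
    dist = np.hypot(rows - point_rc[0], cols - point_rc[1]) * pixel_size_m
    out = {}
    for r in radii:
        inside = dist <= r                                  # pixel centres inside the circle
        stats = {"mean": float(np.nanmean(continuous[inside]))}
        lc = landcover[inside]
        for name, code in classes.items():
            stats[name] = float(np.mean(lc == code))        # proportional area of the class
        out[r] = stats
    return out


# Usage on a tiny synthetic 5 x 5 grid with 10 m pixels (not study data).
ndvi = np.arange(25, dtype=float).reshape(5, 5)
landcover = np.zeros((5, 5), dtype=int)
landcover[:, :2] = 1                                        # hypothetical code 1 = forest
stats = buffer_stats((2, 2), 10.0, ndvi, landcover, {"forest": 1}, radii=[10, 50])
```

With a 10 m radius only the centre pixel and its four direct neighbours are included (forest share 0.2); with 50 m the whole grid is included (forest share 0.4).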
To mitigate multicollinearity arising from the nested multi-scale buffer design, a two-step preprocessing procedure was applied before GAM construction. Continuous variables (excluding the NDVI) were first log1p-transformed, i.e., log(1 + x), to reduce skewness and stabilize variance. Variance inflation factor (VIF) screening was then performed on the transformed dataset, and variables with VIF > 10 were sequentially removed to eliminate severe multicollinearity.
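The VIF screening step can be sketched as below, where VIF_j = 1/(1 − R_j²) and R_j² comes from regressing predictor j on all other predictors. The text states only that variables with VIF > 10 were removed sequentially; this sketch assumes the common convention of dropping the highest-VIF variable and recomputing until all remaining VIFs are at or below 10. Variable names and data are hypothetical.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R^2_j)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # intercept + others
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def vif_screen(X, names, threshold=10.0):
    """Drop the highest-VIF variable one at a time until all VIFs <= threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return names


# Synthetic example: two nested-buffer population layers are nearly identical.
rng = np.random.default_rng(0)
pop_500 = rng.gamma(2.0, 50.0, size=200)
pop_1000 = pop_500 * rng.uniform(0.98, 1.02, size=200)
precip = rng.gamma(5.0, 30.0, size=200)
X = np.log1p(np.column_stack([pop_500, pop_1000, precip]))  # log1p as in the text
kept = vif_screen(X, ["pop_500m", "pop_1000m", "precip"])
```

One of the two population layers is removed and the independent precipitation layer is kept, which is the behaviour the nested buffer design calls for.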

2.3. Methods

To overcome the limitations of conventional approaches, this study introduced an analytical framework that couples ML-based retrieval with GAM-based explanation. The framework comprised three main steps (Figure 2). First, rather than relying on a single model, we systematically compared a suite of machine learning models. Second, using the best retrieval results, we identified hotspots and areas of high variability, which represent critical zones for watershed management. Finally, we moved from mapping to mechanistic understanding by using generalized additive models (GAMs) to quantify the nonlinear, multi-scale drivers of the observed water quality patterns. This integrated pipeline transforms satellite data into actionable insights for watershed management.

2.3.1. WQP Retrieval Using Machine Learning Methods

To develop robust and accurate WQP retrieval models, this study employed eight machine learning algorithms with distinct principles and advantages, including Random Forest (RF), Gradient Boosting Tree (GBT), Support Vector Regression (SVR), XGBoost, CatBoost, and LightGBM [40,41,42,43,44,45]. Given the potential correlations and shared underlying mechanisms among WQPs, a neural network-based multi-task learning model (PLE) was also implemented [46]. This model learns several related tasks simultaneously, namely the retrieval of TN, TP, turbidity, and CODMn, by sharing latent feature representations [47], with the aim of improving generalizability and retrieval accuracy through knowledge shared across tasks. To avoid data leakage, the matched dataset was split into a training set (70%) and a completely independent test set (30%). All hyperparameter tuning and model selection were conducted on the training set, and all performance metrics were calculated on the test set. All models were trained and tested on identical data splits to ensure comparability. Model development and validation were conducted in Python (3.12.4). The multi-task learning model was built with the TensorFlow (2.19.0)/Keras (3.9.2) framework, while the other models primarily used the scikit-learn, XGBoost, CatBoost, and LightGBM libraries.
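The leakage-safe protocol described above (one 70/30 split, all tuning inside the training portion, a single final evaluation on the test set) can be sketched as follows. This is a minimal illustration, assuming k-fold cross-validation within the training set as the tuning scheme (the section does not name one) and using closed-form ridge regression as a stand-in model so that the example needs only NumPy; it is not the study's CatBoost or PLE pipeline, and the data are synthetic.

```python
import numpy as np


def split_70_30(n, seed=42):
    """Random 70/30 split into disjoint train/test index arrays."""
    idx = np.random.default_rng(seed).permutation(n)
    cut = int(round(0.7 * n))
    return idx[:cut], idx[cut:]


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression (stand-in model) with an unpenalized intercept."""
    mu, ym = X.mean(axis=0), y.mean()
    Xc = X - mu
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - ym))
    return lambda Z: (Z - mu) @ w + ym


def tune_on_train(X, y, alphas, k=5, seed=0):
    """Choose alpha by k-fold CV run only on the training data passed in."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)

    def cv_rmse(alpha):
        errs = []
        for f in folds:
            tr = np.setdiff1d(np.arange(len(y)), f)
            pred = ridge_fit(X[tr], y[tr], alpha)(X[f])
            errs.append(np.sqrt(np.mean((pred - y[f]) ** 2)))
        return np.mean(errs)

    return min(alphas, key=cv_rmse)


# Synthetic stand-in for the 18 inputs (10 bands + 8 indices); not study data.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 18))
y = X[:, 0] - 0.5 * X[:, 5] + 0.1 * rng.normal(size=300)
train, test = split_70_30(len(y))
alpha = tune_on_train(X[train], y[train], alphas=[0.01, 0.1, 1.0, 10.0])
model = ridge_fit(X[train], y[train], alpha)                      # refit on all training data
test_rmse = np.sqrt(np.mean((model(X[test]) - y[test]) ** 2))    # touched exactly once
```

The key design point is that `tune_on_train` never receives the test indices, so the reported test error is not influenced by hyperparameter selection.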
To comprehensively evaluate model performance, this study employed a suite of widely used metrics: the coefficient of determination (R2), root mean square error (RMSE), normalized root mean square error (NRMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The MAPE expresses error as a relative percentage and is favored in applied contexts for its interpretability. As a general guideline, MAPE values below 10% indicate high predictive accuracy, whereas values between 10% and 20% are considered acceptable.
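The metrics listed above can be computed as in this sketch. It assumes R2 is the coefficient of determination 1 − SSE/SST and that NRMSE is RMSE divided by the mean observation; the section does not state the normalization, and range-normalized NRMSE is also common.

```python
import numpy as np


def regression_metrics(obs, pred):
    """R2, RMSE, NRMSE, MAE, and MAPE for paired observed/predicted values.

    NRMSE is normalized by the mean observation (an assumption).
    MAPE assumes that no observation is zero.
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "R2": 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2),
        "RMSE": rmse,
        "NRMSE": rmse / obs.mean(),
        "MAE": np.mean(np.abs(err)),
        "MAPE": 100.0 * np.mean(np.abs(err / obs)),  # in percent
    }


m = regression_metrics([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
```

For these toy values R2 is 0.98 and MAPE is about 6.7%, which would fall in the "high accuracy" band of the guideline above.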
Direct use of raw band reflectance, while preserving full spectral information, may not optimally capture spectral features sensitive to specific water quality parameters. Therefore, this study performed feature engineering to compute several validated spectral indices known to be sensitive to inland water characteristics. These band-combination indices enhance specific absorption and scattering features associated with TN, TP, CODMn, turbidity, and other parameters. The final set of 18 input variables comprised 10 core bands (B2, B3, B4, B5, B6, B7, B8, B8A, B11, and B12) and 8 derived spectral indices. The selected bands span the blue, green, red, red-edge, near-infrared, and short-wave infrared regions and were chosen to capture the primary optical properties of the water body. The 8 derived spectral indices are detailed in Table 3.
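Table 3 is not reproduced in this section, so the sketch below computes only the five indices named elsewhere in the text (NDVI, NDCI, NDTI, NDWI, and MNDWI), using their widely published normalized-difference forms with Sentinel-2 band assignments. Whether the study used exactly these formulas, and what the remaining three indices are, is left open here.

```python
import numpy as np


def _nd(a, b):
    """Normalized difference (a - b) / (a + b); NaN where a + b == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    s = a + b
    return np.divide(a - b, s, out=np.full_like(s, np.nan), where=s != 0)


def spectral_indices(b):
    """Common normalized-difference indices from a dict of Sentinel-2 bands."""
    return {
        "NDVI": _nd(b["B8"], b["B4"]),    # NIR vs red: vegetation / algal biomass
        "NDCI": _nd(b["B5"], b["B4"]),    # red-edge vs red: chlorophyll-a
        "NDTI": _nd(b["B4"], b["B3"]),    # red vs green: turbidity
        "NDWI": _nd(b["B3"], b["B8"]),    # green vs NIR (McFeeters form)
        "MNDWI": _nd(b["B3"], b["B11"]),  # green vs SWIR1 (modified NDWI)
    }


# Two illustrative pixels; the second has zero reflectance in every band.
bands = {k: np.array([v, 0.0]) for k, v in
         {"B3": 0.04, "B4": 0.03, "B5": 0.05, "B8": 0.02, "B11": 0.01}.items()}
idx = spectral_indices(bands)
```

Returning NaN for a zero denominator keeps invalid pixels visible rather than silently assigning them an index value of zero.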

2.3.2. Quantifying Driving Mechanisms Using GAMs

In this study, GAMs were used to quantify the nonlinear impacts of geographic, climatic, and temporal factors at multiple spatial scales on the water quality parameters of the Minjiang River [48]. GAMs are particularly suitable for analyzing ecological factor–water quality relationships due to their flexible smoothing functions, which can handle complex nonlinear relationships more effectively than traditional linear models (Figure 3).
The four predictor variables were log1p-transformed to mitigate the influence of extreme values, or standardized where appropriate. The models combined tensor product smooths with univariate smooths. Residual analyses were conducted to verify model fit and to check that the residuals satisfied the assumptions of independence and normality.
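For reference, the model structure described above (univariate smooths plus tensor product smooths) can be written compactly as follows. This is a generic form, not the study's fitted formula; the link function and error distribution are not stated in this section, and a Gaussian family with identity link is only an assumption.

```latex
g\!\left(\mathbb{E}[y_i]\right) = \beta_0 + \sum_{j=1}^{J} f_j\!\left(x_{ij}\right) + \sum_{(k,l)\in\mathcal{P}} te_{kl}\!\left(x_{ik}, x_{il}\right)
```

Here y_i is a WQP value for sampling point and month i, x_ij are the screened driver variables at a given buffer scale, β0 is the intercept, f_j are penalized univariate smooth functions, te_kl are tensor product smooths over the variable pairs in the set P, and g is the link function (the identity under the Gaussian assumption).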

3. Results

3.1. Comparative Performance of Machine Learning Retrieval Models

CatBoost achieved the best overall performance (Table 4), with R2 values of 0.740 (TN), 0.726 (TP), 0.496 (turbidity), and 0.534 (CODMn) and low RMSEs across all four parameters, demonstrating high prediction accuracy and stability. Ensemble algorithms based on gradient boosting, particularly CatBoost, XGBoost, and GBT, consistently outperformed traditional machine learning approaches such as SVR, highlighting their advantage in capturing complex nonlinear spectral–water quality relationships. Turbidity and CODMn were more difficult to retrieve, with lower performance across the single-task models (R2: 0.319–0.534), likely owing to their more complex optical behaviors; CatBoost and XGBoost nevertheless remained the top performers. The multi-task learning model (PLE) achieved the highest R2 (0.620) for turbidity, suggesting a particular strength in retrieving parameters with intricate optical properties. Overall, with the exception of turbidity, for which PLE performed best, CatBoost delivered the most consistent and robust performance for the selected features and dataset (Figure 4).
SHapley Additive exPlanations (SHAP) analysis identified the NDVI, the NDCI, the NDTI, and B5 as the most influential features (Figure 5). The dominance of red-edge bands and the NDCI for TN and TP is consistent with their well-documented roles as proxies for chlorophyll-a, reflecting the empirical coupling between nutrient levels and phytoplankton biomass in eutrophic inland waters. The high importance of the NDTI and SWIR bands for turbidity and TP aligns with the particulate nature of suspended sediments and phosphorus transport. For CODMn, the contribution of water-sensitive indices (the MNDWI and NDWI) and red-edge bands captures both allochthonous (terrestrial humic) and autochthonous (algal-derived) organic matter sources. These feature–response linkages are grounded in established bio-optical and biogeochemical principles, lending credibility to the empirical retrieval framework.

3.2. Spatiotemporal Patterns and Trends of Water Quality Parameters

Applying the optimal machine learning model to Sentinel-2 imagery reveals pronounced heterogeneity in the annual mean spatial distributions of the four WQPs (TN, TP, CODMn, and turbidity) in the Minjiang River. TN concentrations range from 1.45 to 2.39 mg L−1 (Figure 6a) and TP concentrations from 1.61 to 2.10 mg L−1 (Figure 6c), with elevated levels concentrated in the estuary and certain tributaries, reflecting the coupling of anthropogenic inputs and hydrological conditions. CODMn ranges from 0.73 to 1.56 mg L−1 (Figure 6e), indicating spatial heterogeneity in organic pollution that is likely linked to land use and sewage discharge. Turbidity varies from 11.4 to 159.4 NTU (Figure 6g), with higher values in reaches experiencing strong hydrological disturbance, reflecting dynamic sediment transport in flowing rivers.
Temporal trend (slope) analysis shows positive slopes for TN and TP in some regions, indicating increasing nutrient concentrations (Figure 6b,d) potentially due to rising agricultural and urban inputs, whereas negative slopes elsewhere suggest effective localized pollution control. The slope patterns of CODMn and turbidity (Figure 6f,h) reflect a dynamic balance between organic pollution and sediment transport driven by human activities and natural processes such as precipitation and vegetation cover. Together, the six-year mean concentrations and trends provide a quantitative basis for understanding the drivers of water quality change and for targeting management in complex river systems.
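The section does not state how the per-pixel slopes were estimated. The sketch below assumes an ordinary least-squares slope fitted to annual means at each pixel (Sen's slope is a common robust alternative); pixels with fewer than three valid years are left as NaN, a threshold chosen here for illustration.

```python
import numpy as np


def pixel_trends(annual_means, years, min_years=3):
    """Per-pixel least-squares slope (units per year) of an annual-mean stack.

    annual_means: array of shape (n_years, rows, cols); NaN marks missing years.
    min_years   : hypothetical minimum number of valid years for a slope.
    """
    stack = np.asarray(annual_means, dtype=float)
    t = np.asarray(years, dtype=float)
    flat = stack.reshape(stack.shape[0], -1)
    slopes = np.full(flat.shape[1], np.nan)
    for p in range(flat.shape[1]):
        ok = ~np.isnan(flat[:, p])                 # ignore missing years per pixel
        if ok.sum() >= min_years:
            slopes[p] = np.polyfit(t[ok], flat[ok, p], deg=1)[0]  # slope coefficient
    return slopes.reshape(stack.shape[1:])


# Synthetic 1 x 3 "image" over 2019-2024 (not study data).
years = np.arange(2019, 2025)
k = years - 2019
stack = np.empty((6, 1, 3))
stack[:, 0, 0] = 1.5 + 0.05 * k    # rising TN-like series
stack[:, 0, 1] = 2.0 - 0.02 * k    # declining series
stack[2, 0, 1] = np.nan            # one year without a valid retrieval
stack[:, 0, 2] = np.nan            # never observed
slopes = pixel_trends(stack, years)
```

Positive values correspond to the increasing-concentration areas in Figure 6b,d and negative values to the declining ones.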
To examine monthly mean water quality, a continuous spatial sequence of “river source–tributary of Jiulong River–confluence–main stream–estuary” was selected. The monthly mean WQPs from 2021 to 2024 (Figure 6i–l) reveal both geographical gradients and temporal fluctuations: many WQPs vary markedly along the elevation gradient from source to estuary, and persistent anomalies (e.g., consistently high values) in tributary or confluence zones may indicate pollution entry points.
The monthly heatmaps in Figure 6i–l help reveal polluted segments and underscore the value of remote sensing for water quality surveillance: persistent high-concentration zones in the heatmaps directly locate pollution hotspots. By continuously capturing large-scale environmental heterogeneity, this high-resolution spatiotemporal approach overcomes the limitations of traditional point-based monitoring and provides quantitative support for pollution source tracing and zoned watershed management.
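The text does not specify how persistent high-concentration zones were flagged. The following is a minimal sketch of one possible rule on a segment-by-month matrix like the one behind Figure 6i–l: a segment is flagged if it exceeds the monthly q-th percentile across segments in at least a given fraction of its valid months. Both q and the fraction are hypothetical choices, not the study's criteria.

```python
import numpy as np


def persistent_hotspots(values, q=90, min_fraction=0.5):
    """Flag segments in the top (100 - q) % in at least min_fraction of valid months.

    values: array (n_segments, n_months) of retrieved concentrations ordered along
    the source-to-estuary sequence; NaN marks months without a clear-sky retrieval.
    """
    v = np.asarray(values, dtype=float)
    thresholds = np.nanpercentile(v, q, axis=0)   # per-month percentile across segments
    high = v > thresholds                         # comparisons with NaN are False
    valid = ~np.isnan(v)
    frac = high.sum(axis=1) / np.maximum(valid.sum(axis=1), 1)
    return frac >= min_fraction


# Ten synthetic segments over four months (not study data).
values = np.tile((1.0 + 0.1 * np.arange(10))[:, None], (1, 4))
values[3, 0] = 5.0          # a one-off spike: high once, but not persistent
flags = persistent_hotspots(values)
```

The persistence requirement is what separates a chronic hotspot (segment 9, high in three of four months) from a single anomalous month (segment 3).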

3.3. Scale-Dependent Responses of Water Quality to Geospatial Drivers

The diagnostic plots in Figure S1 support the adequacy of the GAMs. Observed and fitted values align closely along the 1:1 line without systematic bias, the normal Q-Q plot indicates that the residuals follow the theoretical distribution, and the residual-versus-predicted plot shows random scatter around zero, supporting the reliability of the models. Table 5 summarizes model performance for each WQP. The turbidity and CODMn models had strong explanatory power, with adjusted R2 values of 0.678 and 0.658, respectively, and deviance explained of 68.2% and 66.3%. The TN model performed substantially worse (adjusted R2 = 0.293), and the TP model was intermediate. The consistency among the R2, deviance explained, AIC, and BIC values confirms the robustness of these rankings.
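For a Gaussian GAM, the deviance is the residual sum of squares, so deviance explained equals the unadjusted R2, while the adjusted R2 additionally penalizes the model's effective degrees of freedom. A minimal sketch, assuming a Gaussian identity-link fit and the classical adjusted form with effective degrees of freedom (software packages may differ in detail; the study's software is not named here):

```python
import numpy as np


def gaussian_gam_summary(y, fitted, edf_total):
    """Deviance explained and adjusted R^2 for a Gaussian (identity-link) fit.

    edf_total: total effective degrees of freedom, including the intercept.
    """
    y = np.asarray(y, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    n = y.size
    rss = np.sum((y - fitted) ** 2)           # Gaussian deviance
    tss = np.sum((y - y.mean()) ** 2)         # null deviance
    dev_expl = 1.0 - rss / tss                # equals the unadjusted R^2 here
    adj_r2 = 1.0 - (rss / (n - edf_total)) / (tss / (n - 1))
    return dev_expl, adj_r2


dev_expl, adj_r2 = gaussian_gam_summary([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0], 2.0)
```

When the number of observations is large relative to the effective degrees of freedom, the two statistics nearly coincide, which is consistent with the close agreement reported for turbidity (0.678 vs. 68.2%) and CODMn (0.658 vs. 66.3%).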
The TN model reveals a complex, nonlinear relationship between TN and riparian and watershed characteristics. At the 50 m scale (Figure 7a), the NDVI has a pronounced positive effect on TN at moderate coverage that shifts to inhibition at high coverage. This pattern may reflect a shift in the balance between processes that release nitrogen (e.g., litter decomposition) and those that retain it (e.g., plant uptake and interception) as vegetation cover increases. At the broad watershed scale (20 km), TN rises continuously with forest proportion, whereas the NDVI has an inhibitory effect (Figure 7d,f), implying that land use composition (forest proportion) and vegetation functional intensity (the NDVI) can exert divergent effects on TN at large scales. The response to water bodies also varies with scale: TN first decreases and then increases with water area at 1 km but increases continuously at 20 km (Figure 7b,e), which may reflect the interplay between local denitrification and the broader hydrological connectivity that governs nitrogen transport.
TP dynamics are strongly influenced by land use and show clear scale–threshold effects. TP also responds nonlinearly to the NDVI at 50 m (Figure 8a), although the underlying mechanisms likely differ from those for TN because of phosphorus's particulate nature. A key finding is that TP decreases with increasing crop proportion at 1 km (Figure 8e), potentially reflecting the combined effects of crop uptake and soil conservation practices in agricultural areas. At 20 km, the relationship with built-up area is more complex: TP initially decreases, and this inhibitory effect then gradually weakens (Figure 8g). This pattern may indicate a balance between centralized sewage collection, which reduces point-source inputs (inhibitory), and non-point source runoff from expanding impervious surfaces (promotive). TP also shows a “small-scale continuous decrease, large-scale decrease-then-increase” pattern (Figure 8d,f), consistent with phosphorus's strong affinity for sediments and its potential for long-distance transport under high-flow conditions.
CODMn and turbidity share several driver–response patterns but differ in factors related to their composition (Figure 9 and Figure 10). At 50 m, both show a “rise-then-fall” response to the NDVI (Figure 9a and Figure 10a), indicating a potential optimal level of vegetation cover for intercepting organic matter and sediment. The proportion of water bodies at 50 m also has a “rise-then-fall” effect on both (Figure 9c and Figure 10c), reflecting a balance between local purification and pollutant or sediment input. At large scales, water area has a “decrease-then-increase” effect (Figure 9f and Figure 10f), possibly reflecting a shift from local purification to watershed-scale transport of pollutants and sediment. Factors related to population density show fluctuating patterns (Figure 9d and Figure 10e), indicating nonlinear balances between emissions and governance.
The two parameters nevertheless differ in their sensitivity to specific drivers. CODMn is more sensitive to an increasing tree proportion at 50 m, likely because humus input raises organic pollution (Figure 9b). For turbidity, the effect of built-up area proportion shifts from positive to negative as the proportion increases, possibly because impervious surfaces and drainage networks reduce sediment mobilization (Figure 10b). CODMn may also increase with wind speed, possibly owing to the resuspension of organic particles (Figure 9h). Thus, while both parameters share mechanisms involving vegetation function, water-body scale effects, and the balance of human activities, their distinct drivers arise from differences between organic pollution (humus input and particle resuspension) and sediment characteristics (soil conservation and resuspension).

4. Discussion

The results collectively demonstrate the efficacy of an integrated remote sensing and machine learning framework for retrieving key WQPs and deciphering their complex drivers in a dynamic fluvial system. The generated spatiotemporal maps successfully reveal pronounced heterogeneity and trends, pinpointing critical pollution hotspots. The influence of geospatial factors on each WQP exhibits significant scale dependence and parameter specificity. These findings lay the foundation for a deeper mechanistic interpretation of model performance, underlying processes, and broader implications for watershed management, elaborated in the following sections.

4.1. Superiority of CatBoost and the Value of Multi-Task Learning in Fluvial Remote Sensing

This study advances both river water quality monitoring and driver analysis. Among the eight machine learning models compared, CatBoost was the best overall model for WQP retrieval, providing a reliable benchmark for future river remote sensing applications. Its superior performance is likely attributable to its algorithmic design: oblivious (symmetric) trees regularize model complexity, ordered boosting mitigates the prediction shift caused by target leakage, and its robustness to noisy, correlated features is particularly advantageous for capturing complex spectral–water quality relationships in optically complex inland waters [45,49]. The multi-scale geographic factor analysis (50 m to 20 km) revealed scale-dependent mechanisms of natural and human influences, overcoming the limitations of the single-scale analyses used in prior studies.
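To illustrate why oblivious trees limit model complexity, the sketch below evaluates a single oblivious (symmetric) tree: every node at a given depth shares one feature–threshold test, so a depth-d tree is specified by only d tests and its leaf index is simply the d binary outcomes read as an integer. This is an illustrative toy, not CatBoost's implementation, and the split features, thresholds, and leaf values are hypothetical.

```python
import numpy as np


def oblivious_tree_predict(X, split_features, split_thresholds, leaf_values):
    """Predict with one oblivious tree of depth d = len(split_features).

    All nodes at the same level apply the same (feature, threshold) test, so the
    tree has exactly 2**d leaves addressed by a d-bit index.
    """
    X = np.asarray(X, dtype=float)
    leaf_values = np.asarray(leaf_values)
    depth = len(split_features)
    if leaf_values.size != 2 ** depth:
        raise ValueError("an oblivious tree of depth d needs 2**d leaf values")
    leaf = np.zeros(X.shape[0], dtype=np.int64)
    for level, (f, t) in enumerate(zip(split_features, split_thresholds)):
        leaf |= (X[:, f] > t).astype(np.int64) << level   # set bit `level` if test passes
    return leaf_values[leaf]


# Depth-2 toy tree on two hypothetical features (e.g., a band and an index).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
pred = oblivious_tree_predict(X, [0, 1], [0.5, 0.5], [10.0, 20.0, 30.0, 40.0])
```

A conventional depth-d tree may use up to 2**d − 1 different tests; restricting it to d shared tests is the structural constraint that acts as a regularizer.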
Furthermore, high-resolution spatiotemporal mapping captures the spatial gradients and seasonal dynamics of WQPs, offering a quantitative basis for targeted management. These methodological advances improve the accuracy and efficiency of water quality assessment in the Minjiang River and provide a replicable framework for dynamic monitoring and mechanism exploration in complex watersheds.

4.2. Mechanistic Interpretation of Scale-Dependent Driver Effects

The drivers of WQPs in the Minjiang River exhibit significant scale dependence and parameter specificity [50], rooted in interactions between natural ecological processes and human activities [51,52]. For TN and TP, vegetation at 50 m (the NDVI) produces nonlinear “promotion-then-inhibition” effects via nutrient uptake, interception, and litter decomposition, while wind speed uniformly promotes both through sediment resuspension and nutrient release.
However, their land use responses differ: TP is more sensitive to crop proportion (1000 m) and built-up area proportion (20 km), reflecting agricultural phosphorus control, crop uptake, and a balance between urban sewage collection and non-point source pollution [53]. In contrast, TN responds more strongly to forest proportion (20 km) and water connectivity (50 m to 20 km), driven by nitrogen’s gaseous/dissolved migration and dependence on denitrification [54,55]. CODMn is primarily influenced by forest-derived humus input and wind-induced resuspension of organic particles. Turbidity is closely tied to sediment transport and land use (e.g., reduced sediment in built-up areas due to surface hardening and vegetation affecting runoff).
These differences likely arise from three core mechanisms: (1) the distinct chemical behaviors of the WQPs (nitrogen cycling in gaseous and dissolved forms vs. phosphorus transported mainly in particulate form), (2) the scale dependence of watershed processes (local purification vs. large-scale, connectivity-driven transport), and (3) the specificity of human activities (fertilizer types and sewage treatment efficiency). These findings agree with previous studies emphasizing the regulatory role of natural land cover and the detrimental impacts of agricultural and urban land use, and they further reveal how these effects differ across spatial scales.
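The buffer-scale predictors discussed above (e.g., crop proportion within 1000 m, built-up proportion within 20 km) are land-cover fractions around each sampling site. The pure-Python sketch below shows one way to compute such a fraction from a classified raster, assuming circular buffers centred on the site with distances measured between pixel centres; the study's exact buffer geometry and GEE workflow are not described in this section. The class codes are assumed to mirror Dynamic World's label order (0 = water, 1 = trees, 4 = crops), and the grid is hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Grid = Sequence[Sequence[Optional[int]]]  # class code per pixel; None = no data


def class_proportion_in_buffer(grid: Grid, pixel_size: float, center: Tuple[int, int],
                               radius: float, target_class: int) -> float:
    """Fraction of valid pixels within `radius` of `center` that belong to target_class.

    Distances are measured between pixel centres, in the same units as pixel_size.
    """
    r0, c0 = center
    reach = int(math.ceil(radius / pixel_size))  # bounding-box half-width in pixels
    total = hits = 0
    for r in range(max(0, r0 - reach), min(len(grid), r0 + reach + 1)):
        row = grid[r]
        for c in range(max(0, c0 - reach), min(len(row), c0 + reach + 1)):
            if math.hypot((r - r0) * pixel_size, (c - c0) * pixel_size) > radius:
                continue  # outside the circular buffer
            value = row[c]
            if value is None:
                continue  # skip no-data pixels
            total += 1
            hits += int(value == target_class)
    return hits / total if total else float("nan")


# Hypothetical 10 m land-cover grid around a sampling site at row 2, column 2.
grid = [
    [1, 1, 1, 1, 1],
    [1, 4, 4, 4, 1],
    [1, 4, 0, 4, 1],
    [1, 4, 4, 4, 1],
    [1, 1, 1, 1, 1],
]
for radius_m in (10, 20):
    print(radius_m, class_proportion_in_buffer(grid, 10.0, (2, 2), radius_m, target_class=4))
```

The same site yields different crop proportions at different radii (0.8 at 10 m, 8/13 at 20 m here), which is the mechanism by which one predictor can carry different signals at 50 m, 1000 m, and 20 km.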

4.3. Implications for Watershed Zoning and Differentiated Management Strategies

Based on the multi-scale driving mechanisms of WQPs, we propose tailored water quality management strategies for the Minjiang River. First, prioritize the establishment and protection of riparian ecological buffer zones: restore native vegetation along riverbanks to leverage the purification effect associated with riparian vegetation (NDVI at 50 m) and enhance the interception of non-point source pollution. Second, implement zoned, classified management: in downstream areas with intensive agriculture and urbanization, strengthen phosphorus fertilizer control and optimize urban sewage treatment to reduce TP enrichment.
In upstream forested areas, maintain natural purification capacity while adjusting forest composition (e.g., avoiding large-scale monocultures) to limit nitrogen release from litter decomposition, which acts at the 20 km scale. Additionally, adopt synergistic nitrogen–phosphorus control: for TN, focus on optimizing watershed hydrological connectivity (e.g., by protecting small wetlands) to promote denitrification; for TP, given its particulate nature, manage sediment transport (e.g., by reducing soil erosion in agricultural zones).
Finally, establish a dynamic monitoring system based on the remote sensing retrievals, using high-resolution spatiotemporal maps to help identify likely pollution sources and track management effectiveness in near real time.

4.4. Limitations and Future Research Directions

This study has several limitations. Despite a dual cloud-masking procedure, persistent cloud cover during the rainy season left gaps in high-quality satellite imagery, affecting the continuity of the time-series inversion. Because cloud cover is associated with precipitation events, the preferential omission of rainy-day observations may lead to an underestimation of the influence of precipitation-driven factors, such as surface runoff and non-point source pollution, on water quality dynamics. The effect sizes of such factors in our GAMs should therefore be interpreted as conservative lower bounds.
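The dual cloud-masking procedure is not detailed in this section. As one plausible illustration (an assumption, not the authors' documented procedure), Sentinel-2 Level-2A products carry two cloud descriptors that are often combined: the QA60 bitmask (bit 10 = opaque clouds, bit 11 = cirrus) and the Scene Classification Layer (SCL), whose classes 3, 8, 9, and 10 denote cloud shadow, medium- and high-probability cloud, and thin cirrus. The sketch keeps a pixel only if neither descriptor flags it; the pixel values are hypothetical.

```python
from typing import Iterable, Tuple

QA60_CLOUD_BITS = (1 << 10) | (1 << 11)  # bit 10: opaque clouds, bit 11: cirrus
# SCL classes treated as contaminated: 3 cloud shadow, 8/9 cloud medium/high
# probability, 10 thin cirrus.
SCL_CONTAMINATED = frozenset({3, 8, 9, 10})


def is_clear(qa60: int, scl: int) -> bool:
    """A pixel is kept only if neither the QA60 bitmask nor the SCL flags it."""
    return (qa60 & QA60_CLOUD_BITS) == 0 and scl not in SCL_CONTAMINATED


def clear_fraction(pixels: Iterable[Tuple[int, int]]) -> float:
    """Share of (qa60, scl) pixels that survive both masks."""
    flags = [is_clear(qa60, scl) for qa60, scl in pixels]
    return sum(flags) / len(flags) if flags else 0.0


# Hypothetical pixels over a river reach: (QA60 value, SCL class); SCL 6 = water.
pixels = [(0, 6), (1 << 10, 6), (0, 9), (1 << 11, 4), (0, 4)]
print([is_clear(q, s) for q, s in pixels], clear_fraction(pixels))
```

Requiring agreement between two independent flags reduces missed clouds at the cost of discarding more observations, which is consistent with the rainy-season gaps described above.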
Additionally, the absence of routine monitoring records for the Shaxi and Jianxi tributaries introduced spatial bias into model training. This limitation highlights the complementary value of satellite-based retrieval for assessing water quality in ungauged reaches and underscores the need to expand in situ networks in these data-sparse tributaries. Turbidity spans an exceptionally wide range (1.85–351 NTU) with several extremely high values, whereas CODMn represents a heterogeneous mixture of organic compounds with diverse and often weak spectral signatures, making a consistent spectral response inherently difficult to establish. These characteristics may explain the comparatively weak retrieval performance for turbidity and CODMn. Model input features consisted primarily of standard spectral bands and indices; incorporating more targeted spectral indicators for nitrogen and phosphorus could further improve accuracy [56].
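The turbidity difficulty can be made concrete with the reported range: on a linear scale, squared-error losses are dominated by a few extreme samples. One common mitigation, offered here as an assumption rather than a description of this study's preprocessing, is to train on log10-transformed targets and back-transform the predictions. The helper names below are hypothetical.

```python
import math
from typing import List, Sequence


def to_log10(values: Sequence[float], offset: float = 0.0) -> List[float]:
    """Log10-transform positive targets; offset guards against zeros if needed."""
    return [math.log10(v + offset) for v in values]


def from_log10(log_values: Sequence[float], offset: float = 0.0) -> List[float]:
    """Back-transform log10 predictions to the original units (e.g., NTU)."""
    return [10.0 ** lv - offset for lv in log_values]


turbidity_min, turbidity_max = 1.85, 351.0  # NTU, range reported in this study
fold_range = turbidity_max / turbidity_min
log_min, log_max = to_log10([turbidity_min, turbidity_max])
print(f"linear: {fold_range:.1f}-fold range; log10: {log_min:.3f} to {log_max:.3f}")
```

On the log scale the roughly 190-fold range shrinks to about 2.3 log units (about 0.27 to 2.55), so extreme samples no longer dominate the loss; note that back-transformed predictions estimate something closer to a geometric than an arithmetic mean.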
Although multiple drivers were considered, their interactions (e.g., between land use and meteorology) were not examined. Sewage treatment rates and agricultural fertilizer application were not explicitly quantified in our GAMs because no basin-wide datasets were available, although these factors substantially influence riverine nitrogen and phosphorus levels [57]. Additionally, while CatBoost performed robustly in this study, its performance may be basin-specific, and its transferability to other regions remains unverified.
Retrieval accuracy in narrow, forest-shaded headwater streams is subject to additional uncertainty, likely because of mixed land–water pixels and canopy shading at Sentinel-2's 10–20 m spatial resolution. Future research should integrate multi-source data (e.g., drone-based spectral data and higher-resolution satellite imagery) and deep learning models [31,58,59] to improve inversion accuracy for complex water bodies. Factor interactions could be quantified using structural equation modeling and coupled with hydrological models to elucidate the “pollution source–migration–water quality response” chain [60]. Expanding the scope to longer time series and broader watersheds will allow model transferability to be tested, ultimately providing more comprehensive scientific support for global river water quality management.

5. Conclusions

This study developed an integrated Sentinel-2 retrieval and multi-scale GAM framework for assessing river water quality and its drivers in the Minjiang River. CatBoost achieved the best inversion performance for TN, TP, and CODMn (R² = 0.74, 0.73, and 0.53, respectively), whereas PLE performed best for turbidity (R² = 0.62). SHAP analysis highlighted red-edge information, identifying the NDVI, the NDCI, the NDTI, and the red-edge band B5 as the most influential features. The retrieved maps revealed spatial heterogeneity, with elevated levels consistently concentrated in estuarine and tributary segments. Temporal trend analysis identified both increasing and decreasing trajectories, indicating divergent pressures and localized management effectiveness.
The GAMs further revealed that water quality drivers were scale-dependent and parameter-specific. TN and TP both showed nonlinear responses to the NDVI at the 50 m riparian scale, but likely via different mechanisms. TP was more directly influenced by agricultural (1 km) and urban (20 km) land use, while TN responded more to forest cover and hydrological connectivity. CODMn and turbidity shared vegetation and water body controls but diverged in sensitivity to humus input and impervious surfaces.
These findings support a shift from uniform watershed management to scale-explicit, parameter-targeted, and zoned strategies. Looking ahead, integrating multi-source data could help overcome cloud constraints and enrich feature representation. Exploring driver interactions via structural equation modeling and rigorously testing the transferability of CatBoost across diverse basins will be crucial steps toward a universal, predictive framework for global river water quality assessment.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs18050840/s1, Figure S1. GAM fitted vs. observed values of TN; Figure S2. Diagnostic plots for the GAM of TN; Figure S3. Smooth effect plots of environmental and spatial variables on TN from GAM; Figure S4. GAM fitted vs. observed values of TP; Figure S5. Diagnostic plots for the GAM of TP; Figure S6. Smooth effect plots of environmental and spatial variables on TP from GAM; Figure S7. GAM fitted vs. observed values of CODMn; Figure S8. Diagnostic plots for the GAM of CODMn; Figure S9. Smooth effect plots of environmental and spatial variables on CODMn from GAM; Figure S10. GAM fitted vs. observed values of turbidity; Figure S11. Diagnostic plots for the GAM of turbidity; Figure S12. Smooth effect plots of environmental and spatial variables on turbidity from GAM; Table S1. The performance assessment of retrieval modeling.

Author Contributions

J.D.: conceptualization, formal analysis, investigation, methodology, visualization, and writing—original draft. X.X.: resources, data curation, and writing—original draft. D.L.: resources. G.Z.: investigation. H.L. (Hanyi Li): investigation. Y.L. (Yiming Lei): visualization. J.L.: writing—review and editing. H.L. (Haoliang Lu): writing—review and editing. Y.L. (Yi Li): writing—review and editing. H.H.: conceptualization, investigation, resources, project administration, funding acquisition, and writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Key R&D Program of China (2022YFF0803100), the Ocean Negative Carbon Emissions (ONCE) Program, the National Natural Science Foundation of China (No. U25A20801), and the Environmental Protection Technology Plan Project of Fujian Province (2022R004).

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Cheng, C.; Zhang, F.; Shi, J.; Kung, H.-T. What is the relationship between land use and surface water quality? A review and prospects from remote sensing perspective. Environ. Sci. Pollut. Res. 2022, 29, 56887–56907.
2. Wu, J.; Lu, J. Spatial scale effects of landscape metrics on stream water quality and their seasonal changes. Water Res. 2021, 191, 116811.
3. Huisman, J.; Codd, G.A.; Paerl, H.W.; Ibelings, B.W.; Verspagen, J.M.H.; Visser, P.M. Cyanobacterial blooms. Nat. Rev. Microbiol. 2018, 16, 471–483.
4. Oliveira Santos, V.; Guimarães, B.M.; Neto, I.E.; de Souza Filho, F.D.; Costa Rocha, P.A.; Thé, J.V.; Gharabaghi, B. Chlorophyll-a Estimation in 149 Tropical Semi-Arid Reservoirs Using Remote Sensing Data and Six Machine Learning Methods. Remote Sens. 2024, 16, 1870.
5. Harvey, E.T.; Kratzer, S.; Philipson, P. Satellite-based water quality monitoring for improved spatial and temporal retrieval of chlorophyll-a in coastal waters. Remote Sens. Environ. 2015, 158, 417–430.
6. Sahoo, D.P.; Sahoo, B.; Tiwari, M.K. MODIS-Landsat fusion-based single-band algorithms for TSS and turbidity estimation in an urban-waste-dominated river reach. Water Res. 2022, 224, 119082.
7. Terry, J.A.; Sadeghian, A.; Lindenschmidt, K.-E. Modelling Dissolved Oxygen/Sediment Oxygen Demand under Ice in a Shallow Eutrophic Prairie Reservoir. Water 2017, 9, 131.
8. Cai, X.; Li, Y.; Lei, S.; Zeng, S.; Zhao, Z.; Lyu, H.; Dong, X.; Li, J.; Wang, H.; Xu, J.; et al. A hybrid remote sensing approach for estimating chemical oxygen demand concentration in optically complex waters: A case study in inland lake waters in eastern China. Sci. Total Environ. 2023, 856, 158869.
9. Du, C.; Wang, Q.; Li, Y.; Lyu, H.; Zhu, L.; Zheng, Z.; Wen, S.; Liu, G.; Guo, Y. Estimation of total phosphorus concentration using a water classification method in inland water. Int. J. Appl. Earth Obs. Geoinf. 2018, 71, 29–42.
10. Sowrav, S.F.F.; Debsarma, S.K.; Das, M.K.; Ibtehal, K.M.; Rahman, M.; Hridita, N.T.; Broty, A.A.; Hoque, M.S.A. Developing a semi-automated technique of surface water quality analysis using GEE and machine learning: A case study for Sundarbans. Heliyon 2025, 11, e42404.
11. Feng, L.; Hou, X.; Zheng, Y. Monitoring and understanding the water transparency changes of fifty large lakes on the Yangtze Plain based on long-term MODIS observations. Remote Sens. Environ. 2019, 221, 675–686.
12. Zhang, J.; Li, S.; Dong, R.; Jiang, C.; Ni, M. Influences of land use metrics at multi-spatial scales on seasonal water quality: A case study of river systems in the Three Gorges Reservoir Area, China. J. Clean. Prod. 2019, 206, 76–85.
13. Zhang, S.; Yan, X.; Feng, T.; Zhang, X.; Qiao, R.; Ren, Y.; Chen, Q. Unraveling nonlinear impacts of land use change on riverine water quality under future scenarios. Ecol. Indic. 2025, 179, 114258.
14. Shi, P.; Zhang, Y.; Li, Z.; Li, P.; Xu, G. Influence of land use and land cover patterns on seasonal water quality at multi-spatial scales. Catena 2017, 151, 182–190.
15. Xiao, M.; Yi, Y.; Zhang, W.; Yue, F. Spatial scale effects of the relationships between land use and water quality: Example from the urban rivers, Northern China. Earth Crit. Zone 2025, 2, 100049.
16. Mello, K.D.; Randhir, T.O.; Valente, R.A.; Vettorazzi, C.A. Riparian restoration for protecting water quality in tropical agricultural watersheds. Ecol. Eng. 2017, 108, 514–524.
17. Aparicio-Ibáñez, J.; Pimentel, R.; Bonet-García, F.J.; Polo, M.J. Using NDVI-derived vegetation vigour as a proxy for soil water content in Mediterranean-mountain traditional water management systems: Seasonal variability and restoration impacts. Ecol. Indic. 2025, 174, 113468.
18. Yang, H.; Kong, J.; Hu, H.; Du, Y.; Gao, M.; Chen, F. A Review of Remote Sensing for Water Quality Retrieval: Progress and Challenges. Remote Sens. 2022, 14, 1770.
19. Salah, M.; Salem, S.I.; Utsumi, N.; Higa, H.; Ishizaka, J.; Oki, K. 3LATNet: Attention based deep learning model for global Chlorophyll-a retrieval from GCOM-C satellite. ISPRS J. Photogramm. Remote Sens. 2025, 220, 490–508.
20. Shang, Y.; Song, K.; Lai, F.; Lyu, L.; Liu, G.; Fang, C.; Hou, J.; Qiang, S.; Yu, X.; Wen, Z. Remote sensing of fluorescent humification levels and its potential environmental linkages in lakes across China. Water Res. 2023, 230, 119540.
21. Pahlevan, N.; Smith, B.; Schalles, J.; Binding, C.; Cao, Z.; Ma, R.; Alikas, K.; Kangro, K.; Gurlin, D.; Hà, N.; et al. Seamless retrievals of chlorophyll-a from Sentinel-2 (MSI) and Sentinel-3 (OLCI) in inland and coastal waters: A machine-learning approach. Remote Sens. Environ. 2020, 240, 111604.
22. Dong, L.; Gong, C.; Huai, H.; Wu, E.; Lu, Z.; Hu, Y.; Li, L.; Yang, Z. Retrieval of Water Quality Parameters in Dianshan Lake Based on Sentinel-2 MSI Imagery and Machine Learning: Algorithm Evaluation and Spatiotemporal Change Research. Remote Sens. 2023, 15, 5001.
23. Smith, M.E.; Robertson Lain, L.; Bernard, S. An optimized Chlorophyll a switching algorithm for MERIS and OLCI in phytoplankton-dominated waters. Remote Sens. Environ. 2018, 215, 217–227.
24. Aurin, D.; Mannino, A.; Franz, B. Spatially resolving ocean color and sediment dispersion in river plumes, coastal systems, and continental shelf waters. Remote Sens. Environ. 2013, 137, 212–225.
25. He, H.; Li, X.; Wang, D.; Qiao, W.; Sun, Y.; Han, Y.; Zhang, F.; Zhao, X. A novel quad-modality deep neural network for estimating chlorophyll-a concentrations in Lianyungang’s lakes and reservoirs using Sentinel-2 MSI data. Water Res. 2025, 286, 124246.
26. Shi, X.; Yu, H.; Zhao, S.; Sun, B.; Liu, Y.; Huo, J.; Wang, S.; Wang, J.; Wu, Y.; Wang, Y.; et al. Impacts of environmental factors on Chlorophyll-a in lakes in cold and arid regions: A 10-year study of Wuliangsuhai Lake, China. Ecol. Indic. 2023, 148, 110133.
27. Kumar, A.; Equeenuddin, S.M.; Mishra, D.R.; Acharya, B.C. Remote monitoring of sediment dynamics in a coastal lagoon: Long-term spatio-temporal variability of suspended sediment in Chilika. Estuar. Coast. Shelf Sci. 2016, 170, 155–172.
28. Zhang, Y.; Wu, L. Quantify water quality variation of urban river from hyperspectral images through ripple propagation network with spatially inconsecutive sampling. J. Hydrol. Reg. Stud. 2025, 62, 102765.
29. Qiao, Z.; Sun, S.; Jiang, Q.O.; Xiao, L.; Wang, Y.; Yan, H. Retrieval of Total Phosphorus Concentration in the Surface Water of Miyun Reservoir Based on Remote Sensing Data and Machine Learning Algorithms. Remote Sens. 2021, 13, 4662.
30. Guo, H.; Huang, J.J.; Chen, B.; Guo, X.; Singh, V.P. A machine learning-based strategy for estimating non-optically active water quality parameters using Sentinel-2 imagery. Int. J. Remote Sens. 2021, 42, 1841–1866.
31. Shen, L.Q.; Amatulli, G.; Sethi, T.; Raymond, P.; Domisch, S. Estimating nitrogen and phosphorus concentrations in streams and rivers, within a machine learning framework. Sci. Data 2020, 7, 161.
32. Kirk, J.T.O. Light and Photosynthesis in Aquatic Ecosystems, 3rd ed.; Cambridge University Press: Cambridge, UK, 2010.
33. Mobley, C. Light and Water: Radiative Transfer in Natural Waters; Academic Press: Cambridge, MA, USA, 1994.
34. Zhang, Y.; He, X.; Lian, G.; Bai, Y.; Yang, Y.; Gong, F.; Wang, D.; Zhang, Z.; Li, T.; Jin, X. Monitoring and spatial traceability of river water quality using Sentinel-2 satellite images. Sci. Total Environ. 2023, 894, 164862.
35. Zhao, Y.; He, X.; Pan, S.; Bai, Y.; Wang, D.; Li, T.; Gong, F.; Zhang, X. Satellite retrievals of water quality for diverse inland waters from Sentinel-2 images: An example from Zhejiang Province, China. Int. J. Appl. Earth Obs. Geoinf. 2024, 132, 104048.
36. Sherjah, P.Y.; Sajikumar, N.; Nowshaja, P.T. Quality monitoring of inland water bodies using Google Earth Engine. J. Hydroinform. 2023, 25, 432–450.
37. Aschbacher, J.; Milagro-Pérez, M.P. The European Earth monitoring (GMES) programme: Status and perspectives. Remote Sens. Environ. 2012, 120, 3–8.
38. Drusch, M.; Del Bello, U.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P.; et al. Sentinel-2: ESA’s Optical High-Resolution Mission for GMES Operational Services. Remote Sens. Environ. 2012, 120, 25–36.
39. Zhao, Y.; Chen, M.; He, J.; Ma, Y. Monitoring water quality parameters using multi-source data-driven machine learning models. Eng. Appl. Comput. Fluid Mech. 2025, 19, 2509658.
40. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
41. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
42. Awad, M.; Khanna, R. Support Vector Regression. In Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers; Awad, M., Khanna, R., Eds.; Apress: Berkeley, CA, USA, 2015; pp. 67–80.
43. Georganos, S.; Grippa, T.; Vanhuysse, S.; Lennert, M.; Shimoni, M.; Wolff, E. Very High Resolution Object-Based Land Use–Land Cover Urban Classification Using Extreme Gradient Boosting. IEEE Geosci. Remote Sens. Lett. 2018, 15, 607–611.
44. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: Red Hook, NY, USA, 2017.
45. Ostroumova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased Boosting with Categorical Features. In Proceedings of the Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: Red Hook, NY, USA, 2017.
46. Tang, H.; Liu, J.; Zhao, M.; Gong, X. Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations. In Proceedings of the 14th ACM Conference on Recommender Systems, Virtual Event, Brazil, 22–26 September 2020; pp. 269–278.
47. Román-Herrera, J.C.; Rodríguez-Peces, M.J.; Garzón-Roca, J. Comparison between Machine Learning and Physical Models Applied to the Evaluation of Co-Seismic Landslide Hazard. Appl. Sci. 2023, 13, 8285.
48. Wood, S.N. Generalized Additive Models. Annu. Rev. Stat. Its Appl. 2025, 12, 497–526.
49. Dorogush, A.V.; Ershov, V.; Gulin, A. CatBoost: Gradient boosting with categorical features support. arXiv 2018, arXiv:1810.11363.
50. Ji, J.; Xu, M.; Wang, S.; Cao, C.; Zhang, X.; Tian, F.; Zheng, J.; Sang, Y. Analysis of spatial pattern of vegetation resilience and influencing factors in Hubei Province based on long time series remote sensing data. Environ. Sustain. Indic. 2025, 27, 100742.
51. Ai, L.; Shi, Z.H.; Yin, W.; Huang, X. Spatial and seasonal patterns in stream water contamination across mountainous watersheds: Linkage with landscape characteristics. J. Hydrol. 2015, 523, 398–408.
52. Liu, J.; Shen, Z.; Chen, L. Assessing how spatial variations of land use pattern affect water quality across a typical urbanized watershed in Beijing, China. Landsc. Urban Plan. 2018, 176, 51–63.
53. Sun, W.; Song, X.; Mu, X.; Gao, P.; Wang, F.; Zhao, G. Spatiotemporal vegetation cover variations associated with climate change and ecological restoration in the Loess Plateau. Agric. For. Meteorol. 2015, 209–210, 87–99.
54. Chen, J.; Xiao, H.; Li, Z.; Liu, C.; Wang, D.; Wang, L.; Tang, C. Threshold effects of vegetation coverage on soil erosion control in small watersheds of the red soil hilly region in China. Ecol. Eng. 2019, 132, 109–114.
55. Liu, J.; Gao, G.; Wang, S.; Jiao, L.; Wu, X.; Fu, B. The effects of vegetation on runoff and soil loss: Multidimensional structure analysis and scale characteristics. J. Geogr. Sci. 2018, 28, 59–78.
56. Zhang, Y.; Wu, L. Surveillance of urban river environment by quantifying distributions of water quality parameters using hyperspectral remote sensing-based ripple propagation graph network. Environ. Pollut. 2025, 384, 126875.
57. Chen, X.; Strokal, M.; van Vliet, M.; Liu, L.; Bai, Z.; Ma, L.; Kroeze, C. Keeping Nitrogen Use in China within the Planetary Boundary Using a Spatially Explicit Approach. Environ. Sci. Technol. 2024, 58, 9689–9700.
58. Hong, S.; Morgan, B.J.; Stocker, M.D.; Smith, J.; Pachepsky, Y.A. Spatial patterns of water quality and remote sensing indices from UAV-based multispectral imagery across an irrigation pond. Heliyon 2025, 11, e42622.
59. Ukwuoma, C.C.; Cai, D.; Bamisile, O.; Ukwuoma, C.D.; Otuka, C.I.; Anyanwu, N.O.; Ukwuoma, C.O.; Huang, Q. Optimised multi-hierarchical feature fusion with multi-kernel CNN and spectral-spatial convolutions for remote sensing image classification. Remote Sens. Appl. Soc. Environ. 2025, 40, 101727.
60. Chen, Y.; Yao, K.; Zhu, B.; Gao, Z.; Xu, J.; Li, Y.; Hu, Y.; Lin, F.; Zhang, X. Water Quality Inversion of a Typical Rural Small River in Southeastern China Based on UAV Multispectral Imagery: A Comparison of Multiple Machine Learning Algorithms. Water 2024, 16, 553.
Figure 1. Location and situation of the study area and sampling sites. (a) Location of the study area. (b) Land use/land cover of the study area in 2024. (c) Population density of the study area in 2020.
Figure 2. Research roadmap for WQP inversion model of the Minjiang River.
Figure 3. Research roadmap for analyzing the driving factors of WQPs in the Minjiang River.
Figure 4. Optimal model performance and evaluation for TN (a), TP (b), CODMn (c), and turbidity (d).
Figure 5. SHAP feature importance summary for TN (a), TP (b), CODMn (c), and turbidity (d) derived from the optimal model.
Figure 6. Spatial distribution of and changes in WQPs in the Minjiang River, 2019–2024. Mean values of (a) TN, (c) TP, (e) CODMn, and (g) turbidity from 2019 to 2024. Trend slopes of (b) TN, (d) TP, (f) CODMn, and (h) turbidity from 2019 to 2024. Monthly heatmaps of (i) TN, (j) TP, (k) turbidity, and (l) CODMn.
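Panels (b), (d), (f), and (h) map per-pixel trend slopes, but the caption does not name the estimator. As an assumption, the sketch below shows two common choices applied to a single pixel's annual series: an ordinary least-squares slope and the outlier-resistant Theil–Sen slope (the median of all pairwise slopes). The annual TN values are hypothetical.

```python
from statistics import median
from typing import Sequence


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def theil_sen_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Median of all pairwise slopes; robust to a few outlying years."""
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[j] != xs[i]
    ]
    return median(slopes)


years = [2019, 2020, 2021, 2022, 2023, 2024]
# Hypothetical annual mean TN (mg/L) for one pixel, with an anomalous final year.
tn = [1.50, 1.55, 1.60, 1.65, 1.70, 3.00]
print(f"OLS: {ols_slope(years, tn):.3f} mg/L per year")
print(f"Theil-Sen: {theil_sen_slope(years, tn):.3f} mg/L per year")
```

Here the single anomalous year pulls the OLS slope to about 0.23 mg/L per year, while the Theil–Sen slope stays at 0.05 mg/L per year, which is why robust estimators are often preferred for short (six-year) pixel series.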
Figure 7. Marginal effect analysis and spatiotemporal patterns of TN in the Minjiang River. (a) NDVI at 50 m, (b) proportion of water at 1000 m, (c) NDVI at 1000 m, (d) NDVI at 20,000 m, (e) proportion of water at 20,000 m, (f) proportion of trees at 20,000 m, (g) population density at 50 m, and (h) wind speed at 0.25°. The red dashed line marks zero marginal effect, and the solid blue line shows how the marginal effect of each factor changes. Only variables whose marginal effects changed significantly (according to the selection criterion) are displayed.
Figure 8. Marginal effect analysis and spatiotemporal patterns of TP in the Minjiang River. (a) NDVI at 50 m, (b) proportion of crops at 1000 m, (c) proportion of trees at 50 m, (d) proportion of water at 50 m, (e) proportion of crops at 1000 m, (f) proportion of water at 20,000 m, (g) proportion of built-up area at 20,000 m, and (h) wind speed at 0.25°. The red dashed line marks zero marginal effect, and the solid blue line shows how the marginal effect of each factor changes. Only variables whose marginal effects changed significantly (according to the selection criterion) are displayed.
Figure 9. Marginal effect analysis and spatiotemporal patterns of CODMn in the Minjiang River. (a) NDVI at 50 m, (b) proportion of trees at 50 m, (c) proportion of water at 50 m, (d) population density at 1000 m, (e) NDVI at 20,000 m, (f) proportion of water at 20,000 m, (g) wind speed at 0.25°, and (h) temperature at 0.25°. The red dashed line marks zero marginal effect, and the solid blue line shows how the marginal effect of each factor changes. Only variables whose marginal effects changed significantly (according to the selection criterion) are displayed.
Figure 10. Marginal effect analysis and spatiotemporal patterns of turbidity in the Minjiang River. (a) NDVI at 50 m, (b) proportion of built-up area at 50 m, (c) proportion of water at 50 m, (d) proportion of trees at 50 m, (e) population density at 1000 m, and (f) proportion of water at 20,000 m. The red dashed line marks zero marginal effect, and the solid blue line shows how the marginal effect of each factor changes. Only variables whose marginal effects changed significantly (according to the selection criterion) are displayed.
Table 1. Statistical information of the four WQPs.
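| Water Quality Parameter | Maximum | Minimum | Mean | Sample Size |
|---|---|---|---|---|
| TP (mg L⁻¹) | 5.90 | 0.59 | 1.90 | 812 |
| TN (mg L⁻¹) | 4.95 | 0.58 | 1.90 | 810 |
| CODMn (mg L⁻¹) | 4.68 | 0.75 | 2.16 | 887 |
| Turbidity (NTU) | 351 | 1.85 | 50.0 | 903 |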
Table 2. Inventory of geospatial variables and their data sources for analysis.
| Category | Factor | GEE ImageCollection ID | Resolution | Time Interval | Time Range |
|---|---|---|---|---|---|
| Human Activities | Population Density | WorldPop/GP/100m/pop (https://developers.google.com/earth-engine/datasets/catalog/WorldPop_GP_100m_pop?hl=th, accessed on 15 June 2025) | 100 m | / | 2020 |
| | LULC | GOOGLE/DYNAMICWORLD/V1 (https://developers.google.com/earth-engine/datasets/catalog/GOOGLE_DYNAMICWORLD_V1, accessed on 15 June 2025) | 10 m | Month | 2019–2024 |
| Vegetation | NDVI | COPERNICUS/S2_SR_HARMONIZED (https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR_HARMONIZED?hl=zh-cn, accessed on 15 June 2025) | 10 m | Month | 2019–2024 |
| Climatic Factors | Precipitation | ECMWF/ERA5/HOURLY (https://developers.google.com/earth-engine/datasets/catalog/ECMWF_ERA5_LAND_HOURLY, accessed on 15 June 2025) | 0.25° | Month | 2019–2024 |
| | Temperature | ECMWF/ERA5/HOURLY (https://developers.google.com/earth-engine/datasets/catalog/ECMWF_ERA5_LAND_HOURLY?hl=th, accessed on 15 June 2025) | 0.25° | Month | 2019–2024 |
Table 3. Calculation formula and significance of remote sensing index.
| Remote Sensing Index | Calculation Formula (Sentinel-2) | Advantage |
|---|---|---|
| Normalized Difference Moisture Index (NDMI) | NDMI = (B8 − B11)/(B8 + B11) | Displays moisture |
| Normalized Difference Turbidity Index (NDTI) | NDTI = (B4 − B3)/(B4 + B3) | Assesses water turbidity |
| Normalized Difference Wetland Index (NDWI) | NDWI = (B3 − B11)/(B3 + B11) | Identifies water body |
| Shortwave Infrared Vegetation Index (SWIRVI) | SWIRVI = (B8 − B11)/(B8 + B11) | Detects vegetation water stress |
| Normalized Difference Vegetation Index (NDVI) | NDVI = (B8 − B4)/(B8 + B4) | Quantifies green vegetation |
| Normalized Difference Water Index (NDWI) | NDWI = (B3 − B8)/(B3 + B8) | Monitors changes to water content |
| Normalized Difference Chlorophyll Index (NDCI) | NDCI = (B5 − B4)/(B5 + B4) | Retrieves chlorophyll concentration |
| Modified Normalized Difference Water Index (MNDWI) | MNDWI = (B3 − B11)/(B3 + B11) | Enhances water extraction accuracy |
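Every index in Table 3 is a normalized difference of two Sentinel-2 bands, so a single helper covers them all. The sketch below encodes the band pairs from the table and evaluates them for one pixel of surface reflectance; a common multiplicative scale factor applied to both bands cancels in the ratio. The wetland index (listed under the abbreviation NDWI) and SWIRVI are left out only because their formulas in Table 3 coincide with MNDWI and NDMI, respectively. The function and dictionary names are hypothetical.

```python
import math

# Band pairs (a, b) for index = (a - b) / (a + b), as listed in Table 3.
INDEX_BANDS = {
    "NDMI": ("B8", "B11"),
    "NDTI": ("B4", "B3"),
    "NDVI": ("B8", "B4"),
    "NDWI": ("B3", "B8"),
    "NDCI": ("B5", "B4"),
    "MNDWI": ("B3", "B11"),
}

def normalized_difference(a, b):
    """(a - b) / (a + b), or NaN when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else math.nan

def compute_indices(pixel):
    """pixel maps Sentinel-2 band names to reflectance values for one location."""
    return {name: normalized_difference(pixel[a], pixel[b])
            for name, (a, b) in INDEX_BANDS.items()}


pixel = {"B3": 0.10, "B4": 0.20, "B5": 0.30, "B8": 0.50, "B11": 0.30}
indices = compute_indices(pixel)
print(round(indices["NDVI"], 3))  # (0.5 - 0.2) / (0.5 + 0.2) -> 0.429
```

Keeping the band pairs in a table-like dictionary mirrors Table 3 directly, so adding or correcting an index is a one-line change rather than a new function.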
Table 4. Performance assessment of the optimal retrieval models.
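| Model | Metric | TN | TP | Turbidity | CODMn |
|---|---|---|---|---|---|
| CatBoost | R² | 0.740 | 0.731 | 0.496 | 0.534 |
| CatBoost | RMSE | 0.581 | 0.591 | 0.821 | 0.606 |
| CatBoost | MAPE (%) | 21.36 | 21.75 | 23.12 | 20.81 |
| CatBoost | NRMSE | 0.095 | 0.0941 | 0.1670 | 0.110 |
| PLE | R² | 0.673 | 0.294 | 0.620 | 0.378 |
| PLE | RMSE | 0.640 | 0.046 | 0.846 | 0.791 |
| PLE | MAPE (%) | 24.72 | 34.88 | 21.80 | 33.4 |
| PLE | NRMSE | 0.101 | 0.0135 | 0.099 | 0.187 |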
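The four metrics in Table 4 can be computed from paired in situ and retrieved values as sketched below. The sketch assumes R² is the coefficient of determination (1 − SS_res/SS_tot) rather than a squared correlation, MAPE is expressed in percent, and NRMSE is RMSE divided by the range of the observations. The table does not state which normalization was used (range and mean are both common), so that choice in particular is an assumption. The function name is hypothetical.

```python
import math

def retrieval_metrics(observed, predicted):
    """Accuracy metrics for paired observed/predicted water quality values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": rmse,
        # MAPE in percent; undefined if any observation is zero
        "MAPE": 100 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n,
        # NRMSE normalized by the observed range (assumed convention)
        "NRMSE": rmse / (max(observed) - min(observed)),
    }


obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
for name, value in retrieval_metrics(obs, pred).items():
    print(name, round(value, 4))
```

Because NRMSE and MAPE are scale-free, they allow comparison across parameters with different units (mg L⁻¹ versus NTU), whereas RMSE stays in each parameter's own units.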
Table 5. Comparative performance metrics of the GAM across WQPs.
| Quantitative Indicator | TN | TP | CODMn | Turbidity |
|---|---|---|---|---|
| Adjusted R-squared (R-sq.(adj)) | 0.241 | 0.293 | 0.658 | 0.678 |
| Deviance Explained (%) | 25.3 | 30.4 | 66.3 | 68.2 |
| Akaike Information Criterion | 0.253 | 0.304 | 0.663 | 0.682 |
| Bayesian Information Criterion | 0.241 | 0.293 | 0.658 | 0.678 |
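Table 5 reports both deviance explained and adjusted R². For a Gaussian GAM the deviance is the residual sum of squares, so deviance explained equals the unadjusted R², while the adjusted R² additionally penalizes the effective degrees of freedom (edf) consumed by the smooth terms. This is consistent with the adjusted values in Table 5 sitting slightly below the deviance-explained percentages. The sketch below assumes a Gaussian family and takes the model's total edf as a given input; the function name is hypothetical, and non-Gaussian families would need their own deviance definition.

```python
def gaussian_gam_fit_summary(observed, fitted, total_edf):
    """Deviance explained (%) and adjusted R-squared for a Gaussian GAM fit.

    total_edf is the model's total effective degrees of freedom, including the
    intercept; it must be smaller than the number of observations.
    """
    n = len(observed)
    if not 0 < total_edf < n:
        raise ValueError("total_edf must lie between 0 and the number of observations")
    mean_obs = sum(observed) / n
    rss = sum((o - f) ** 2 for o, f in zip(observed, fitted))  # Gaussian deviance
    tss = sum((o - mean_obs) ** 2 for o in observed)           # null-model deviance
    deviance_explained = 1 - rss / tss
    # residual variance on n - edf df compared with total variance on n - 1 df
    adj_r_sq = 1 - (rss / (n - total_edf)) / (tss / (n - 1))
    return {"deviance_explained_pct": 100 * deviance_explained, "adj_r_sq": adj_r_sq}


obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fit = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]
print(gaussian_gam_fit_summary(obs, fit, total_edf=3.0))
```

In this toy case the deviance explained is about 91.4% while the adjusted R² is about 0.857; the gap widens as the smooths use more edf relative to the sample size.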
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Du, J.; Xiao, X.; Lin, D.; Zhang, G.; Li, H.; Lei, Y.; Liu, J.; Lu, H.; Li, Y.; Hong, H. Deciphering Multi-Scale Anthropogenic Drivers of River Water Quality: A Synergistic ML-GAM Cascade Framework with Sentinel-2. Remote Sens. 2026, 18, 840. https://doi.org/10.3390/rs18050840

AMA Style

Du J, Xiao X, Lin D, Zhang G, Li H, Lei Y, Liu J, Lu H, Li Y, Hong H. Deciphering Multi-Scale Anthropogenic Drivers of River Water Quality: A Synergistic ML-GAM Cascade Framework with Sentinel-2. Remote Sensing. 2026; 18(5):840. https://doi.org/10.3390/rs18050840

Chicago/Turabian Style

Du, Jinfang, Xilin Xiao, Da Lin, Guanglong Zhang, Hanyi Li, Yiming Lei, Jingchun Liu, Haoliang Lu, Yi Li, and Hualong Hong. 2026. "Deciphering Multi-Scale Anthropogenic Drivers of River Water Quality: A Synergistic ML-GAM Cascade Framework with Sentinel-2" Remote Sensing 18, no. 5: 840. https://doi.org/10.3390/rs18050840

APA Style

Du, J., Xiao, X., Lin, D., Zhang, G., Li, H., Lei, Y., Liu, J., Lu, H., Li, Y., & Hong, H. (2026). Deciphering Multi-Scale Anthropogenic Drivers of River Water Quality: A Synergistic ML-GAM Cascade Framework with Sentinel-2. Remote Sensing, 18(5), 840. https://doi.org/10.3390/rs18050840

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
