1. Introduction
Inland lakes are vital to both human societies and ecological systems, supplying essential resources such as drinking water, irrigation, and fisheries. They also play a critical role in ecosystem balance, regional climate regulation, groundwater recharge, and flood risk mitigation [1, 2]. Chlorophyll-a (Chl-a) concentration serves as a key indicator of phytoplankton biomass and eutrophication, offering a direct measure of primary productivity and water quality. Consequently, accurate monitoring of Chl-a holds considerable significance for assessing ecological health and conducting comprehensive water quality surveillance.
Traditional assessments of Chl-a primarily rely on field sampling, which is costly, time-consuming, potentially hazardous, and limited in its capacity to achieve broad spatial coverage and continuous monitoring. In contrast, satellite remote sensing can provide large-scale, noninvasive water quality data with high spatial and temporal resolutions. With ongoing advancements in sensor performance and accessibility, the utilization of remote sensing data for quantitative Chl-a retrieval has gained increasing prevalence [3]. Nevertheless, the complexity of optical properties in inland waters, coupled with pronounced regional and seasonal variations, frequently results in a highly non-linear relationship between Chl-a concentrations and spectral signals.
Machine learning (ML) algorithms, such as random forest (RF) [4, 5], extreme gradient boosting (XGBoost) [6, 7], and support vector machine (SVM) [8, 9], have been extensively applied in water quality retrieval, particularly for addressing non-linear relationships. The performance of these ML models largely depends on the quality and suitability of input features, rendering feature engineering a critical step. Various band combinations and band-based indices have been proposed to enhance the extraction of Chl-a signal patterns from spectral data [10]. Examples include the normalized difference chlorophyll index (NDCI) [11], maximum chlorophyll index (MCI) [12], enhanced three-band index [13], GrB2 index [14], and near-infrared to red ratio [15, 16]. These approaches can achieve satisfactory performance in specific study areas or with certain sensors [17]. However, adapting or reconstructing the feature engineering process is often necessary when the sensor type or research region changes, which can limit the model’s robustness and generalizability.
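To make the notion of a band-based index concrete, the sketch below computes NDCI and a simple band ratio from reflectance arrays. It is a minimal illustration rather than the formulation used in any cited study: the mapping of NDCI to Sentinel-2 bands B5 (about 705 nm) and B4 (about 665 nm) is a common convention that we assume here, and the function names are illustrative.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b); pixels with a zero denominator become NaN."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def ndci(red_edge_b5, red_b4):
    """NDCI, assuming Sentinel-2 B5 (red edge) and B4 (red) as inputs."""
    return normalized_difference(red_edge_b5, red_b4)

def band_ratio(numerator, denominator):
    """Simple band ratio (e.g., NIR to red); zero denominators become NaN."""
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)
    out = np.full(denominator.shape, np.nan)
    np.divide(numerator, denominator, out=out, where=denominator != 0)
    return out

# Two example pixels with hypothetical reflectance values
b5 = np.array([0.06, 0.04])
b4 = np.array([0.04, 0.04])
print(ndci(b5, b4))
print(band_ratio(b5, b4))
```

Each such index hard-codes one assumed spectral relationship, which is why the feature set typically has to be revisited when the sensor or the study region changes.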
The emergence of deep learning presents a promising avenue for retrieving Chl-a. A deep neural network (DNN) leverages multiple layers of non-linear activation functions to automatically extract complex features from raw inputs, reducing dependence on prior knowledge and feature engineering. This approach promotes the broad applicability of the extracted Chl-a features and enhances the model’s capacity to represent complex functions [18, 19]. With high-resolution remote sensing data, the increased spectral purity and pixel-level information content further help convolutional neural networks (CNNs) and similar architectures extract local spectral and spatial features effectively [20]. As a result, deep learning models have demonstrated significant potential in Chl-a retrieval, offering automated feature extraction, robust generalization, and high predictive accuracy.
However, current applications of deep learning in Chl-a retrieval still face several challenges. First, deep neural networks typically require large amounts of training data, which are difficult to obtain in practical settings. Second, while multilayer perceptron (MLP)-based frameworks can represent complex input–output mappings, they provide limited interpretability regarding the contribution of specific features to Chl-a estimates [21, 22]. Achieving both high accuracy and automated feature extraction while maintaining clear interpretability of the relationship between spectral characteristics and Chl-a remains an unresolved research question.
Recent advancements in artificial intelligence have introduced deep models based on the Kolmogorov–Arnold representation theorem, known as Kolmogorov–Arnold networks (KANs) [23, 24]. These models have demonstrated strong interpretability and robustness under limited data conditions and are increasingly being applied across various domains [25, 26]. We employ a KAN for per-pixel Chl-a retrieval because the task is inherently spectral. Predictions are governed by per-band reflectance values rather than spatial context; hence, the convolutional inductive bias of CNNs offers limited advantage. In contrast, KANs are designed to operate on vector inputs, aligning with the input structure of multispectral or hyperspectral data. More importantly, KANs provide built-in, function-level interpretability: each edge is parameterized by a learnable univariate function, enabling direct extraction of band-wise activation profiles and edge-wise attributions. This architecture affords an explicit understanding of how individual spectral bands and their non-linear compositions influence Chl-a predictions; with MLPs or CNNs, comparable insight typically requires post hoc interpretability tools. These properties (task–model alignment, built-in interpretability, and data-efficient learning) make KAN a theoretically grounded and practically robust choice for inland water Chl-a estimation.
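The edge-wise parameterization described above can be illustrated with a deliberately simplified layer. The sketch below is not the KAN implementation used in this study: it replaces the spline parameterization with piecewise-linear functions on a fixed knot grid, omits training, and uses illustrative names (`PiecewiseLinearKANLayer`, `edge_outputs`) and a hypothetical input range of [0, 1].

```python
import numpy as np

class PiecewiseLinearKANLayer:
    """Toy KAN-style layer: one univariate function per edge, summed at each node."""

    def __init__(self, n_in, n_out, grid_min=0.0, grid_max=1.0, n_knots=8, seed=0):
        rng = np.random.default_rng(seed)
        # Knot positions shared by all edge functions (assumed input range)
        self.knots = np.linspace(grid_min, grid_max, n_knots)
        # values[j, i, k] is the value of edge function phi_{j,i} at knot k;
        # in a trained model these would be the learnable parameters
        self.values = rng.normal(scale=0.1, size=(n_out, n_in, n_knots))

    def edge_outputs(self, x):
        """Evaluate phi_{j,i}(x_i); returns shape (n_samples, n_out, n_in)."""
        x = np.asarray(x, dtype=float)
        n_out, n_in, _ = self.values.shape
        out = np.empty((x.shape[0], n_out, n_in))
        for j in range(n_out):
            for i in range(n_in):
                # Linear interpolation between knots; inputs outside the grid are clamped
                out[:, j, i] = np.interp(x[:, i], self.knots, self.values[j, i])
        return out

    def forward(self, x):
        """Each output node sums its incoming edge functions, with no extra non-linearity."""
        return self.edge_outputs(x).sum(axis=2)

# Eight reflectance bands in, three hidden nodes out (sizes chosen for illustration)
layer = PiecewiseLinearKANLayer(n_in=8, n_out=3)
spectra = np.random.default_rng(1).uniform(0.0, 1.0, size=(5, 8))
print(layer.forward(spectra).shape)
```

Because every edge function depends on a single input band, `edge_outputs` can be plotted against each band directly, which is the property that makes band-wise activation profiles available without post hoc tools.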
This study evaluates the KAN model in Baiyangdian Lake (Hebei Province, China), a representative inland lake test site. As one of the largest freshwater lakes in northern China, Baiyangdian exhibits diverse aquatic habitats, varying degrees of anthropogenic influence, and complex water-optical properties. These attributes make it a suitable proxy for other inland water systems and support the transfer of our findings to environments with similar ecological characteristics.
The objectives of this study included the following:
1. To develop a robust KAN model for retrieving Chl-a concentrations in inland lakes and validate its performance.
2. To identify the most influential spectral variables in retrieving Chl-a concentrations within Baiyangdian using attribution scores and to further forge a theoretical foundation for model interpretability.
3. To investigate the driving factors underlying water quality dynamics during 2020–2024 by generating remote sensing-based Chl-a maps spanning this time frame and analyzing their spatio-temporal variations.
Figure 1 illustrates the overall technical framework and workflow of this research.
3. Results
3.1. Comparison with Other Models
In this study, five representative models for Chl-a concentration retrieval were constructed and compared, involving three machine learning models (SVM, XGBoost, and RF) and two deep learning models (DNN and CNN). The hyperparameters of SVM, XGBoost, and RF were optimized utilizing a randomized search strategy to determine their optimal configurations. For the deep learning models, the DNN was designed with three hidden layers (comprising 128, 256, and 32 neurons, respectively) and an initial learning rate of , and the Adam optimizer was employed. The CNN model uses a pixel patch centered on each target pixel to form a spatio-spectral input constructed from the eight raw Sentinel-2 reflectance bands (B2–B8A). All bands are co-registered to a common 10 m grid, cloud/shadow and land pixels are masked, reflection or replicate padding is applied at image borders to preserve the full context, and each channel is standardized using training-set statistics. The network then applies a convolutional kernel in two hidden convolutional layers with 8 and 16 filters, respectively (ReLU activations; no pooling to retain local context), followed by a fully connected layer with 32 neurons to produce a scalar Chl-a prediction. The CNN is optimized with Adam (initial learning rate ), and all models are trained and evaluated on the same dataset with early stopping on the validation split to mitigate overfitting.
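The border handling and per-channel standardization used for the CNN inputs can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the study's code: the patch size is not given here, so `patch_size=3` is a hypothetical value, and the function names are illustrative.

```python
import numpy as np

def standardize(bands, train_mean, train_std):
    """Standardize a (channels, height, width) array with training-set statistics."""
    return (bands - train_mean[:, None, None]) / train_std[:, None, None]

def extract_patch(bands, row, col, patch_size=3):
    """Return the patch centered on (row, col), using reflection padding at borders.

    patch_size is assumed odd; 3 is a placeholder, not the value used in the study.
    """
    half = patch_size // 2
    padded = np.pad(bands, ((0, 0), (half, half), (half, half)), mode="reflect")
    # After padding, original pixel (row, col) sits at (row + half, col + half)
    return padded[:, row:row + patch_size, col:col + patch_size]

# Hypothetical 8-band scene of 4 x 5 pixels
scene = np.arange(8 * 4 * 5, dtype=float).reshape(8, 4, 5)
mean = scene.mean(axis=(1, 2))
std = scene.std(axis=(1, 2))
patch = extract_patch(standardize(scene, mean, std), row=0, col=0)
print(patch.shape)
```

Computing `mean` and `std` on the training split only, and reusing them for validation and test pixels, avoids leaking information from held-out samples into the inputs.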
Upon the determination of their optimal parameters, the proposed KAN model was compared with the five baseline models (CNN, DNN, SVM, RF, and XGBoost) for performance evaluation.
Figure 5 shows the comparison between measured and predicted values, and Table 2 presents the quantitative evaluation indices. As shown in Table 2, the KAN model achieved superior performance in terms of , MAE, and RMSE, yielding , MAE = 1.1920 μg/L, and RMSE = 1.6705 μg/L, which were markedly better than the results of other models. These findings demonstrate that the KAN model maximizes the fit between measured and predicted Chl-a concentrations while effectively reducing prediction bias and uncertainty. This strong performance underscores the potential of KAN in high-precision Chl-a retrieval tasks in remote sensing applications.
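For reference, the error metrics named above can be computed as in the sketch below. MAE and RMSE follow their standard definitions; the coefficient of determination is included on the assumption that it is the goodness-of-fit index reported alongside them. The sample values are hypothetical and are not taken from the study's dataset.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, and RMSE for measured versus predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    rmse = np.sqrt(np.mean(residuals ** 2))
    ss_res = np.sum(residuals ** 2)                    # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "mae": mae, "rmse": rmse}

# Hypothetical Chl-a values in ug/L
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
print(regression_metrics(measured, predicted))
```

MAE and RMSE carry the units of the target (μg/L), whereas the coefficient of determination is dimensionless, so the three indices are complementary rather than interchangeable.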
3.2. Comparison with Machine Learning Algorithms Using Domain-Specific Engineered Features
Unlike many lake water-quality retrieval studies that boost the signal-to-noise ratio (SNR) by handcrafting spectral indices, we deliberately restricted the KAN inputs to the eight raw Sentinel-2 reflectance bands (B2–B8A). First, hand-engineered indices impose strong priors and can obscure higher-order, subtle non-linear interactions; in contrast, the KAN’s adaptive functional bases learn such relationships directly from the data. Second, indices tuned to a particular optical regime often lack portability across lakes with differing biogeophysical conditions. Training on raw bands improves cross-system transferability, allowing a single trained KAN to be applied across multiple water types without redesigning features. Third, attribution scores computed on raw bands map directly to physically observed quantities, facilitating sensor selection and management decisions, whereas mixing composite indices would dilute causal interpretability and inflate model complexity. Nevertheless, to quantify the incremental value of handcrafted features and to provide a comparable benchmark, we also applied the same feature-engineering scheme to three representative machine-learning baselines and evaluated them on the same dataset.
Further experiments were conducted to enhance the performance of three traditional machine learning algorithms (SVM, RF, and XGBoost) on Chl-a retrieval by integrating carefully selected feature engineering approaches. These engineered features were derived from the original Sentinel-2 bands, leveraging commonly used spectral bands and remote sensing indices from previous studies on similar environments and water bodies (see Table 3). The engineered features included various band ratios, normalized difference indices, and empirically chosen sensitive band combinations. Such features have been extensively validated in the literature as beneficial for retrieving water-related parameters.
Figure 6 illustrates the comparisons between measured and predicted values using these feature-engineered machine learning algorithms. Upon training and validation on the expanded feature sets (see Table 4), XGBoost demonstrated the best performance among these three models, achieving , RMSE = 1.7898 μg/L, and MAE = 1.0017 μg/L. XGBoost slightly outperformed RF ( , RMSE = 1.8862 μg/L, and MAE = 1.5849 μg/L) and SVM ( , RMSE = 1.9827 μg/L, and MAE = 1.7023 μg/L).
Despite these improvements, feature-engineered traditional machine learning methods still lag behind the proposed KAN model. Notably, KAN achieved superior accuracy in estimating Chl-a concentrations without the need for additional, labor-intensive feature engineering. This result further highlights the model’s robustness and adaptability, suggesting that KAN can effectively capture and represent essential spectral characteristics without relying on domain expertise or extensive prior knowledge.
3.3. Model Interpretability Analysis
Feature attribution analysis facilitated deeper comprehension of the relative importance of each spectral band in Chl-a concentration retrieval.
Figure 7 illustrates that pruning reduces the KAN model’s hierarchical structure to two hidden layers and three nodes. This streamlined architecture simplifies the calculation of attribution scores for each input band and its associated edges. These scores quantify the contribution of each feature to the Chl-a predictions, emphasizing the critical influence of key bands on achieving high retrieval accuracy.
Table 5 provides detailed attribution scores. As shown in the table, B5 and B3 exert the strongest influence on Chl-a retrieval, presenting respective scores of 1.007 and 0.497. The prominence of band B5 coincides with a notable fluorescence peak at approximately 710 nm, commonly associated with Chl-a. Similarly, the substantial contribution of band B3 aligns with increased reflectance characteristics observed in phytoplankton-rich waters. In contrast, bands B2, B6, and B4 exhibit moderate importance (attribution scores of 0.485, 0.428, and 0.294, respectively), while bands B8, B8A, and B7 contribute the least (0.101, 0.199, and 0.0178, respectively). This suggests that the latter bands yield comparatively weaker signals or increased noise in capturing Chl-a fluorescence features, which is consistent with their limited spectral sensitivity.
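The band ranking discussed above can be reproduced directly from the attribution scores quoted in the text. In the short sketch below, the scores are those reported for Table 5; the normalization to relative shares is our own convenience for comparison and is not part of the reported analysis.

```python
# Attribution scores per input band, as quoted in the text for Table 5
band_scores = {
    "B2": 0.485, "B3": 0.497, "B4": 0.294, "B5": 1.007,
    "B6": 0.428, "B7": 0.0178, "B8": 0.101, "B8A": 0.199,
}

def rank_bands(scores):
    """Sort bands by attribution score and attach each band's share of the total."""
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(band, score, score / total) for band, score in ranked]

for band, score, share in rank_bands(band_scores):
    print(f"{band:>3}  score={score:.4f}  share={share:.1%}")
```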
Analysis of the hierarchical tree structure’s edges offers additional insights into inter-band relationships (Table 6). For example, edges , , and present attribution scores of 0.8973, 0.485, and 0.373, respectively, underscoring the essential roles of B5, B2, and B3. These high-contributing edges indicate non-linear interactions among these key spectral bands, thereby enabling the model to capture the dynamic changes in Chl-a more accurately. In contrast, edges and yield relatively low attribution scores of 0.040 and 0.253, respectively, reflecting their limited impact on the final predictions. At the hidden layer level, the second-layer node achieves a notably high attribution score of 0.897, highlighting its critical role in synthesizing complex multi-band interactions and underlying Chl-a variation mechanisms. Additionally, the single node at the final output layer attains an attribution score of 0.999, which further emphasizes its pivotal role in aggregating information and guiding final predictions.
This detailed examination of the KAN model’s internal mechanisms elucidates the direct impact of individual input bands on Chl-a concentration estimates and reveals stable, meaningful interaction structures across multiple feature extraction layers. The presence of nodes and edges with lower contribution scores provides a rationale for simplifying the model and refining the feature set in subsequent studies. Collectively, this attribution analysis method establishes a robust foundation for enhancing model interpretability and predictive robustness, which also paves novel avenues for probing the relationships between water spectral responses and changes in Chl-a concentrations.
3.4. Spatio-Temporal Variation in Chl-a Concentration
Remote sensing maps of Baiyangdian Lake from 2020 to 2024 were generated using the KAN model, covering the spring (March–May), summer (June–August), and autumn (September–November) seasons. Winter data were excluded due to the cold climate and ice cover in northern China, which hinder effective retrieval. To minimize cloud contamination, we adopted an image-acquisition strategy targeting one scene per 10-day interval that satisfied CLOUDY_PIXEL_PERCENTAGE ≤ 10%. If no suitable scene was available on the target date, we searched within a -day window and selected the scene with the lowest cloud fraction. When no scene within this window met the threshold, monthly data gaps were filled using inverse-distance-in-time weighted interpolation based on the remaining available scenes. For data processing, the normalized difference water index (NDWI) was employed to delineate the water boundaries and compute the intersection of water areas for each season. The seasonal average products were obtained by summing the per-scene Chl-a maps and dividing by the number of seasonal images. The three seasonal products were then summed and divided by three to derive the annual average products. The entire workflow was implemented in Python v3.10.16 using the GDAL library.
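The compositing steps described above can be sketched as follows. This is a minimal NumPy illustration rather than the GDAL-based workflow used in the study; the NDWI formulation (green and near-infrared bands), the water threshold of 0, the weighting exponent, and all function names are assumptions made for the example.

```python
import numpy as np

def ndwi(green, nir):
    """NDWI = (green - NIR) / (green + NIR); zero denominators become NaN."""
    green = np.asarray(green, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = green + nir
    out = np.full(denom.shape, np.nan)
    np.divide(green - nir, denom, out=out, where=denom != 0)
    return out

def common_water_mask(ndwi_stack, threshold=0.0):
    """Intersection of water areas: True where every scene exceeds the threshold."""
    return np.all(np.asarray(ndwi_stack) > threshold, axis=0)

def seasonal_mean(chl_stack, water_mask):
    """Sum of per-scene maps divided by the number of scenes, masked to water."""
    mean = np.asarray(chl_stack, dtype=float).mean(axis=0)
    return np.where(water_mask, mean, np.nan)

def annual_mean(spring, summer, autumn):
    """Average of the three seasonal products (winter is excluded)."""
    return (spring + summer + autumn) / 3.0

def time_weighted_fill(target_day, days, scenes, power=1.0):
    """Inverse-distance-in-time weighted average of the available scenes."""
    days = np.asarray(days, dtype=float)
    scenes = np.asarray(scenes, dtype=float)
    distance = np.abs(days - target_day)
    if np.any(distance == 0):
        return scenes[int(np.argmin(distance))]   # a scene exists on the target day
    weights = 1.0 / distance ** power
    weights /= weights.sum()
    return np.tensordot(weights, scenes, axes=1)

# Two hypothetical scenes acquired on days 10 and 30, interpolated to day 15
scenes = np.stack([np.full((2, 2), 2.0), np.full((2, 2), 6.0)])
print(time_weighted_fill(15, [10, 30], scenes))
```

Using the intersection of per-scene water masks keeps the seasonal average restricted to pixels that are water in every contributing image, at the cost of excluding shoreline pixels whose state changes within the season.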
Figure 8 reveals that the overall Chl-a concentration in Baiyangdian Lake remains relatively low, exhibiting pronounced seasonal variation: lowest in spring, intermediate in autumn, and highest in summer. This pattern largely reflects favorable temperature conditions during the summer and autumn months, which foster rapid phytoplankton growth. Similar seasonal dynamics have also been documented in other Chinese lakes [31].
Further analysis shows a declining trend in the lake’s annual mean Chl-a concentration between 2020 and 2022 (annual averages around 7 μg/L), followed by a marked increase in 2023–2024 (approximately 10 μg/L). Although the timing aligns with periods of reduced human activity during COVID-19 restrictions and subsequent resumption [32], our dataset does not include independent indicators of anthropogenic pressure. Consequently, the interpretations presented here are correlative rather than demonstrably causal. Therefore, we frame these associations as hypotheses and emphasize the importance of continued monitoring and the systematic collection of pressure indicators. Such data are essential for disentangling the relative contributions of climatic and anthropogenic drivers to inter-annual variability and for supporting more targeted and effective management strategies.
4. Discussion
4.1. Impact of Data and Preprocessing on Chl-a Retrieval
In terms of data acquisition, the number of cloud-free scenes available within a given year (N) fluctuates markedly across years and seasons due to cloudiness and satellite-revisit constraints. When coverage is insufficient, we rely on -day substitution or within-month time-weighted interpolation. While these methods help fill temporal gaps, they can introduce sampling aliasing and interpolation errors, potentially biasing the annual mean and thereby affecting the interpretation of inter-annual change. To quantify the impact of temporal sampling density, we performed Monte Carlo resamples for each year at (month-stratified, without-replacement subsampling, with -day substitution and within-month time-weighted interpolation applied).
The results (Table 7) show that the accuracy of the annual mean Chl-a depends strongly on N. When coverage is high (e.g., ), annual averages are stable and preserve inter-annual rank ordering; as coverage decreases, reliance on substitution/interpolation increases and uncertainty inflates. Our resampling indicates that the standard deviation of the annual mean is for , rising to for . This suggests that, under sparse coverage, part of the apparent “anomalies” may be sampling artifacts rather than genuine biogeochemical change. Accordingly, when reporting annual means, we include the corresponding N and uncertainty estimate (SD/CI), use as the reference baseline, and flag years with as unsuitable for trend assessment.
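The sensitivity of the annual mean to scene density can be explored with a resampling routine along the following lines. This is a simplified sketch, not the procedure used in the study: it assumes one lake-mean Chl-a value per scene, keeps at least one scene per month instead of applying substitution or interpolation, and computes the annual mean as the average of monthly means. The function name, sample sizes, and data are hypothetical.

```python
import numpy as np

def annual_mean_sd(values, months, n_keep, n_resamples=1000, seed=0):
    """Standard deviation of the annual mean under month-stratified subsampling."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    months = np.asarray(months)
    unique_months = np.unique(months)
    if not unique_months.size <= n_keep <= values.size:
        raise ValueError("n_keep must lie between the number of months and scenes")
    means = np.empty(n_resamples)
    for r in range(n_resamples):
        # Stratification: keep one random scene from every month
        keep = [rng.choice(np.flatnonzero(months == m)) for m in unique_months]
        # Fill the remaining slots without replacement from the unused scenes
        remaining = np.setdiff1d(np.arange(values.size), keep)
        extra = n_keep - len(keep)
        if extra > 0:
            keep.extend(rng.choice(remaining, size=extra, replace=False))
        kept_values, kept_months = values[keep], months[keep]
        monthly = [kept_values[kept_months == m].mean() for m in unique_months]
        means[r] = np.mean(monthly)
    return means.std(ddof=1)

# Hypothetical year: nine months (March to November) with three scenes each
rng = np.random.default_rng(42)
months = np.repeat(np.arange(3, 12), 3)
chl = rng.normal(loc=8.0, scale=2.0, size=months.size)
for n in (9, 18, 27):
    print(n, round(annual_mean_sd(chl, months, n_keep=n), 3))
```

When all scenes are retained, every resample is identical and the spread collapses to zero, so the printed values isolate the contribution of sampling density alone.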
From a data-processing standpoint, we favored the ESA L2A/Sen2Cor workflow, supplemented by scene-wise reflectance normalization and conversion from bottom-of-atmosphere (BOA) reflectance to remote-sensing reflectance. This configuration provides a standardized, globally supported BOA baseline and, after normalization, yields stable cross-date spectra over optically complex inland waters. Alternative atmospheric-correction processors (e.g., ACOLITE, C2RCC, and iCOR) are primarily optimized for coastal/marine conditions or require site-specific parameterization, whereas our objective was to establish an operational and reproducible pipeline applicable across seasons and years. A formal multi-processor intercomparison will be pursued in future work to further assess performance differences.
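A common way to carry out the BOA-to-remote-sensing-reflectance step is to divide the dimensionless BOA reflectance by π. The sketch below assumes that convention, which may differ from the exact conversion applied in this study, and it does not reproduce the scene-wise normalization. The digital-number scaling uses the quantification value and additive offset documented for Sentinel-2 L2A products; both should be read from the product metadata, and the defaults shown are assumptions.

```python
import numpy as np

def dn_to_reflectance(dn, quantification=10000.0, offset=0.0):
    """Convert L2A digital numbers to BOA reflectance.

    offset is the additive BOA offset from the product metadata
    (for example, -1000 for processing baseline 04.00 and later).
    """
    return (np.asarray(dn, dtype=float) + offset) / quantification

def boa_to_rrs(rho_boa):
    """Approximate remote-sensing reflectance (sr^-1) as rho / pi (assumed convention)."""
    return np.asarray(rho_boa, dtype=float) / np.pi

# Hypothetical digital numbers from a product that carries the -1000 offset
dn = np.array([1300.0, 1500.0, 2000.0])
rho = dn_to_reflectance(dn, offset=-1000.0)
print(boa_to_rrs(rho))
```

Because the study period spans 2020–2024, scenes processed before and after the introduction of the additive offset would need the appropriate `offset` value to remain comparable across dates.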
4.2. Summary of Algorithmic Performance and Interpretability
The proposed KAN model achieved the highest Chl-a retrieval performance , significantly outperforming all other tested models. Handcrafted features improved classical ML baselines (SVM, RF, and XGBoost) with , respectively. XGBoost showed the largest reductions in error (MAE and RMSE ), highlighting the importance of feature engineering in classical frameworks. Among deep models, a fully connected DNN trained on raw spectra outperformed the original ML baselines but did not exceed feature-engineered ML, likely due to the small dataset. The CNN was slightly inferior overall, plausibly reflecting the limited utility of spatial convolutions for medium-sized lakes, where mixed-pixel effects and environmental interference can diminish the benefits of spatial-context aggregation.
Regarding interpretability, the KAN model places learnable activation functions on the edges (weights) while retaining a fully connected structure, unlike traditional MLPs, which use fixed activation functions at the nodes. Specifically, an MLP computes its output by alternating linear transformations with fixed non-linear activations.
MLPs therefore treat linear transformations and non-linearities separately, implementing them through W and σ, respectively. In contrast, KANs handle these components jointly through Φ. Consequently, traditional linear weight matrices are not utilized; instead, each weight parameter is replaced by a one-dimensional learnable function parameterized by spline functions. In KANs, the nodes aggregate incoming signals without introducing additional non-linearities. Analyzing attribution scores allows for intuitive visualization of each sub-node’s contributions to the parent node. In this way, B3 and B5 were identified as the critical feature variables for Chl-a concentration estimation.
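For reference, the contrast between the two architectures can be written compactly in the notation commonly used for KANs. The expressions below are a sketch under that assumption: the depth L, the layer index l, and the edge functions φ are illustrative symbols rather than quantities defined elsewhere in this text.

```latex
\begin{align}
\mathrm{MLP}(\mathbf{x}) &= \left(W_{L} \circ \sigma \circ W_{L-1} \circ \cdots \circ \sigma \circ W_{1}\right)(\mathbf{x}), \\
\mathrm{KAN}(\mathbf{x}) &= \left(\Phi_{L} \circ \Phi_{L-1} \circ \cdots \circ \Phi_{1}\right)(\mathbf{x}), \\
x_{l+1,j} &= \sum_{i=1}^{n_{l}} \phi_{l,j,i}\!\left(x_{l,i}\right), \qquad j = 1, \ldots, n_{l+1}.
\end{align}
```

Here each W is a linear map, σ is a fixed activation, and each Φ is a matrix of learnable univariate functions φ whose outputs are summed at the receiving node, which is why no separate node-level non-linearity appears in the KAN composition.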
4.3. Limitations and Recommendations
Although the KAN model demonstrates excellent performance on the held-out test set, the dataset size (n = 104) warrants a more rigorous validation protocol to ensure robustness and reproducibility. While KAN attains high predictive accuracy with a degree of interpretability, its use of learnable edge functions yields complex functional compositions that limit full model transparency. Attribution analysis provides valuable insights but remains local and model-dependent, rather than constituting demonstrably causal explanations. Inference is further constrained by data availability and processing choices: the in situ matchup set is relatively small and restricted to a single lake, temporal alignment relies on a narrow -day window, which introduces uncertainty, and conclusions are conditioned on a specific processing pipeline ( normalization → NDWI mask ), under which residual thin-cloud, adjacency, or sun-glint effects may persist. Annual means are sensitive to the density of cloud-free scenes N; under sparse coverage, Monte Carlo analysis indicates markedly wider uncertainty bands, which can affect the interpretation of inter-annual trends.
To validate and stress-test KAN across diverse inland-water systems, future work should incorporate multi-lake, multi-year datasets spanning different optical water types and trophic states, with same-day satellite–in situ matchups whenever feasible. We will employ stratified k-fold cross-validation (e.g., –10) and, where appropriate, spatio-temporal blocking or leave-one-lake/leave-one-year designs to obtain more stable performance estimates, mitigate overfitting, and more rigorously assess transferability. Uncertainty should be quantified by systematically varying scene density and preprocessing parameters (e.g., alignment windows and normalization strategies), and communicating predictive uncertainty via confidence intervals and calibration curves alongside standard performance metrics. Finally, cross-sensor evaluations (e.g., Landsat-8/9, PRISMA, and UAV hyperspectral) and open code/data releases will further enhance the robustness, reproducibility, and operational relevance of the proposed approach.
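The validation designs proposed above can be prototyped with a few lines of NumPy. The sketch below is illustrative only: stratification of a continuous target is approximated by quantile binning, `n_splits=5` and `n_bins=4` are hypothetical choices, and the function names are ours. Established library implementations may be preferable in practice.

```python
import numpy as np

def stratified_kfold_indices(y, n_splits=5, n_bins=4, seed=0):
    """Yield (train, test) index arrays with folds balanced across Chl-a quantile bins."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # Interior quantiles define the bin edges used as strata
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    bins = np.digitize(y, edges)
    fold_of = np.empty(y.size, dtype=int)
    offset = 0
    for b in np.unique(bins):
        idx = rng.permutation(np.flatnonzero(bins == b))
        # Round-robin assignment; the running offset keeps fold sizes balanced
        fold_of[idx] = (np.arange(idx.size) + offset) % n_splits
        offset += idx.size
    for k in range(n_splits):
        yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)

def leave_one_group_out(groups):
    """Yield (train, test, group) for each lake or year held out in turn."""
    groups = np.asarray(groups)
    for g in np.unique(groups):
        yield np.flatnonzero(groups != g), np.flatnonzero(groups == g), g

# Hypothetical targets for a dataset of the same size as the one used here (n = 104)
chl = np.random.default_rng(0).gamma(shape=2.0, scale=4.0, size=104)
for train_idx, test_idx in stratified_kfold_indices(chl):
    print(train_idx.size, test_idx.size)
```

Passing sampling years or lake identifiers to `leave_one_group_out` gives the blocked designs mentioned above, in which no held-out sample shares a year or lake with the training data.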
5. Conclusions
This study demonstrates that a KAN can accurately retrieve Chl-a concentrations from Sentinel-2 imagery over optically complex inland lakes, achieving performance comparable to, and in some cases exceeding, that of conventional machine-learning and deep-learning models, while retaining the added benefits of meaningful interpretability. Attribution analysis identifies bands B3 and B5 as the primary spectral predictors. These findings have direct operational value: water-resource agencies can automate Chl-a mapping at a weekly cadence and 10 m spatial resolution without heavy reliance on in situ sampling, monitoring programs can prioritize quality control for bands B3/B5 when scheduling acquisitions or assessing scene usability, and environmental managers can embed KAN-derived concentration thresholds into early warning systems to trigger rapid mitigation during bloom-risk periods. For Baiyangdian Lake, the resulting maps from 2020–2024 reveal a pronounced seasonal cycle and marked inter-annual variability, with annual mean Chl-a declining during 2020–2022 and rising again in 2023–2024, providing a quantitative basis for refining lake nutrient-reduction targets. Looking forward, future work should extend the KAN framework to support multi-parameter retrieval (e.g., TSS, CDOM, and SDD), test its transferability across diverse lake types globally, and integrate the model with real-time data streams, from UAV-based hyperspectral imaging to hydrodynamic forecasts, to build a fully integrated, adaptive decision-support system for inland water management.