Review Reports - Retrieving Chlorophyll-a Concentrations in Baiyangdian Lake from Sentinel-2 Data Using Kolmogorov–Arnold Networks

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This is a well-written study that introduces a novel and powerful method, the Kolmogorov-Arnold Network (KAN), to the field of inland water quality monitoring. The work is a contribution demonstrating the high potential of KANs for achieving both accuracy and interpretability in complex remote sensing applications. The manuscript is well-structured and the results are promising. I recommend this paper for publication after a few minor revisions that will further strengthen the analysis and enhance its clarity for the reader.

The KAN model demonstrates excellent performance on the test set. To make the validation even more robust, especially given the dataset size of 104 samples, the authors might consider adding a note in the discussion about the potential benefits of using k-fold cross-validation in future studies to further confirm the model's generalization capabilities.

The authors do an excellent job of preprocessing the Sentinel-2 data, including applying an atmospheric correction to convert surface reflectance to remote-sensing reflectance. The comparison shown in Figure 3, which validates the corrected spectra against in-situ measurements, is a commendable inclusion. To further clarify the methodology, it would be beneficial to add a sentence or two in the discussion section briefly explaining why this specific method was chosen over other available processors.

In the description of the CNN model, the authors mention using a "5×5 pixel patch". A brief clarification on how these patches were integrated with the spectral bands as input to the model would be helpful for readers interested in replicating the comparison.

The analysis of Chl-a maps from 2020-2024 is descriptive but lacks depth. The attribution of the 2020-2022 decline in Chl-a to the COVID-19 pandemic and the subsequent increase to resumed human activities is speculative. No ancillary data are presented to support this causal link. Consequently, the strong recommendation that "it is imperative for relevant management authorities to implement proactive measures" is not sufficiently supported by the evidence presented in the manuscript.

For improved readability, it is recommended to standardize the unit for chlorophyll-a concentration throughout the manuscript. The text, tables, and figures variously use µg/L, mg/m³, and at one point, µg/I. Using µg/L consistently would be ideal.

In Figure 1, there appears to be a minor typo: "Hyperparameter Turing" should likely be "Hyperparameter Tuning".

Author Response

Comment 1: The KAN model demonstrates excellent performance on the test set. To make the validation even more robust, especially given the dataset size of 104 samples, please add a note about the potential benefits of using k-fold cross-validation in future studies.

Response 1: Thank you for the helpful suggestion. We have added a sentence in the Discussion noting that, despite strong test performance, the limited dataset motivates stratified k-fold cross-validation in future work.

Location: Discussion — Limitations and Future Validation (page 17, paragraph 2).
Added text: “To validate and stress-test KAN across diverse inland-water systems, future work should incorporate multi-lake, multi-year datasets spanning different optical water types and trophic states, with same-day satellite–in situ matchups whenever feasible. We will employ stratified k-fold cross-validation (e.g., k = 5–10) and, where appropriate, spatio-temporal blocking or leave-one-lake/leave-one-year designs to obtain more stable performance estimates, mitigate overfitting, and more rigorously assess transferability.”
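
To illustrate the stratified k-fold design mentioned in the added text, here is a minimal Python sketch. It is not the authors' code. Because Chl-a is a continuous target, the sketch assumes stratification on quantile bins of the measured values, which is one common choice. The function name `stratified_kfold_indices`, the bin count, and the synthetic concentrations are hypothetical; only the sample count of 104 comes from the manuscript.

```python
import random

def stratified_kfold_indices(y, k=5, n_bins=4, seed=0):
    """Return k (train_idx, test_idx) pairs, stratified on quantile bins of y."""
    rng = random.Random(seed)
    n = len(y)
    # Rank samples by target value, then cut the ranking into n_bins bins.
    order = sorted(range(n), key=lambda i: y[i])
    bins = [order[b * n // n_bins:(b + 1) * n // n_bins] for b in range(n_bins)]
    folds = [[] for _ in range(k)]
    offset = 0
    for members in bins:
        rng.shuffle(members)
        # Deal each bin round-robin so every fold receives samples from every bin.
        # The running offset keeps the fold sizes balanced across bins.
        for pos, idx in enumerate(members):
            folds[(pos + offset) % k].append(idx)
        offset += len(members)
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits

# Synthetic Chl-a-like values (NOT the study's data); 104 is the reported sample count.
data_rng = random.Random(42)
chl_a = [data_rng.lognormvariate(3.0, 0.6) for _ in range(104)]
splits = stratified_kfold_indices(chl_a, k=5)
for fold, (train, test) in enumerate(splits):
    print(fold, len(train), len(test))
```

Each sample lands in exactly one test fold, and every fold spans the low, middle, and high ends of the concentration range, which matters when only about 20 samples are held out at a time.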

Comment 2: Please add a brief rationale clarifying why the Sentinel-2 processing workflow (Sen2Cor + normalization + R→Rrs) was chosen over other available processors.

Response 2: We agree this clarification is valuable. Two sentences were inserted in Methods — Data Processing and referenced in the Discussion.

Location: Discussion—Impact of Data and Preprocessing on Chl‑a Retrieval (page 16, paragraph 2).
Added text: “From a data-processing standpoint, we favored the ESA L2A/Sen2Cor workflow, supplemented by scene-wise reflectance normalization and conversion from bottom-of-atmosphere reflectance R(λ) to remote-sensing reflectance Rrs(λ). This configuration provides a standardized, globally supported BOA baseline and, after normalization, yields stable cross-date spectra over optically complex inland waters. Alternative atmospheric-correction processors (e.g., ACOLITE, C2RCC, iCOR) are primarily optimized for coastal/marine conditions or require site-specific parameterization, whereas our objective was to establish an operational and reproducible pipeline applicable across seasons and years. A formal multi-processor intercomparison will be pursued in future work to further assess performance differences.”

Comment 3: In the CNN description, you mention a ‘5×5 pixel patch’. Please clarify how the spatial patches were integrated with the spectral bands as input to the model.

Response 3: We expanded the CNN section with explicit input construction details. Location: Results—Comparison with Other Models (page 10, paragraph 1).
Added text: “The CNN model uses a 5×5 pixel patch centered on each target pixel to form a 5×5×8 spatio-spectral input constructed from the eight raw Sentinel-2 reflectance bands (B2–B8A). All bands are co-registered to a common 10 m grid; cloud/shadow and land pixels are masked; reflection or replicate padding is applied at image borders to preserve the full 5×5 context; and each channel is standardized using training-set statistics. The network then applies a 3×3 convolutional kernel in two hidden convolutional layers with 8 and 16 filters, respectively (ReLU activations; no pooling to retain local context), followed by a fully connected layer with 32 neurons to produce a scalar Chl-a prediction. The CNN is optimized with Adam (initial learning rate 1×10⁻³), and all models are trained and evaluated on the same dataset with early stopping on the validation split to mitigate overfitting.”
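
The input construction described in the added text (a 5×5 window over eight bands, reflection padding at image borders, per-channel standardization) can be sketched as follows. This is an illustrative reading of the description, not the authors' implementation. The helper names, the tiny synthetic image, and the training-set means and standard deviations are assumptions made for the example.

```python
def reflect_index(i, n):
    """Mirror an out-of-range index back into [0, n) without repeating the edge pixel."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i = i % period  # Python's modulo is non-negative for a positive period
    return i if i < n else period - i

def extract_patch(image, row, col, size=5):
    """image is a list of bands, each a 2-D list [rows][cols].

    Returns patch[dr][dc][band], i.e. a size x size x n_bands block centered on (row, col).
    """
    half = size // 2
    n_rows, n_cols = len(image[0]), len(image[0][0])
    patch = []
    for dr in range(-half, half + 1):
        r = reflect_index(row + dr, n_rows)
        patch_row = []
        for dc in range(-half, half + 1):
            c = reflect_index(col + dc, n_cols)
            patch_row.append([band[r][c] for band in image])
        patch.append(patch_row)
    return patch

def standardize(patch, means, stds):
    """Apply per-band z-scoring using statistics taken from the training set."""
    return [[[(v - m) / s for v, m, s in zip(pixel, means, stds)]
             for pixel in row] for row in patch]

# Synthetic 8-band, 6 x 7 image (hypothetical values): band * 100 + row * 10 + col.
image = [[[b * 100 + r * 10 + c for c in range(7)] for r in range(6)] for b in range(8)]
patch = extract_patch(image, row=0, col=0)           # corner pixel, so padding is exercised
means = [100.0 * b for b in range(8)]                # assumed training-set means
stds = [10.0] * 8                                    # assumed training-set standard deviations
z = standardize(patch, means, stds)
print(len(patch), len(patch[0]), len(patch[0][0]))   # 5 5 8
```

Masking of cloud, shadow, and land pixels is left out of the sketch; in practice those pixels would need to be excluded or filled before a patch is passed to the network.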

Comment 4: The analysis of 2020–2024 Chl-a is descriptive but the attribution to COVID-19 vs. resumed human activity is speculative. No ancillary data are provided; please temper causal language and avoid strong policy prescriptions.

Response 4: We appreciate this point and have moderated the wording to be strictly correlative and removed the prescriptive statement.

Location: Results—Spatio-Temporal Variation of Chl-a Concentration (page 14, paragraph 2).
Revised text: “Our dataset does not include independent indicators of anthropogenic pressure. Consequently, the interpretations presented here are correlative rather than demonstrably causal. We therefore frame these associations as hypotheses and emphasize the importance of continued monitoring and the systematic collection of pressure indicators. Such data are essential for disentangling the relative contributions of climatic and anthropogenic drivers to interannual variability and for supporting more targeted and effective management strategies.”

Comment 5: For readability, please standardize the unit for chlorophyll-a concentration throughout the manuscript (µg/L vs mg/m³; typo ‘µg/I’).

Response 5: Done. All text, tables, and figures now consistently use µg L⁻¹; the typo ‘µg/I’ has been corrected. Location: throughout; specific correction on page 8, line 17. Where relevant, we note numerical equivalence (1 mg/m³ ≈ 1 µg/L) but retain µg/L for consistency.

Comment 6: Minor typo in Figure 1: “Hyperparameter Turing” → “Hyperparameter Tuning”.

Response 6: Corrected in Figure 1 and its caption. Location: Figure 1.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This manuscript provides a relevant integration of Sentinel-2 imagery with Kolmogorov-Arnold Networks (KAN) for retrieving Chl-a concentrations.

First of all, the plagiarism rate of this paper is a bit high, at about 21%.
Paraphrasing Materials and Methods Sections 2.3 and 2.4.1 will help reduce the plagiarism rate.

- 1. The authors effectively highlight the importance of Chl-a monitoring as an indicator of ecological health.
1) However, they need to explain more clearly why they chose KAN over other deep learning approaches, such as CNN or MLP, especially the issue of interpretability.
2) In addition, they need to explain how Baiyangdian Lake reflects the conditions of the wider inland waters.

- 2. I think that additional rationale for the methodological decisions is needed.
1) The selection of sampling sites in Shaoche Dian and Quantou Township could be more clearly explained in terms of the representativeness of the wider lake area and anthropogenic influences.
2) The temporal alignment between satellite image acquisition and field sampling, and the reason why additional reflectance normalization was needed in addition to Sen2Cor atmospheric correction, should be clarified.
3) In addition, regarding the KAN model design, please clarify the rationale for choosing two hidden layers (16 and 8 nodes, respectively) and setting the attribution score threshold for pruning to 10^-2.

- 3. The results section should explain why domain-specific engineered features are not incorporated into the KAN approach to improve accuracy or interpretability.
1) It is necessary to discuss how the availability of cloud-free imagery may affect the accuracy of the annual Chl-a average, thereby increasing the reliability of the interpretation.
2) Since cloud cover varies from year to year, this could be a variable that causes different results from year to year.

- 4. Limitations related to model transparency, interpretability, and data availability constraints should be clearly described. I recommend providing specific suggestions for future validation or scalability testing of KAN across a variety of inland water systems.

- 5. Although this conclusion effectively summarizes the study's findings, it should emphasize the applied value of this study by explicitly presenting practical implications and detailed operational recommendations for environmental monitoring and management organizations.
1) Also, it is recommended that the conclusion be a single paragraph (currently three paragraphs).
2) The last part of the conclusion should present a concept for further research on areas that were not covered or reviewed in this manuscript, providing future goals for future researchers.

To adhere to scientific notation, I suggest that all Chl-a be italicized (i.e., Chl-a).

Below are some points in the abstract that should be considered for revision.

L4 hereby designed -> was designed
L5-6 As indicated by the results -> The results demonstrate that...
L14 utilizes -> leverages
L17 can inform -> may guide

Please consider the above and improve the clarity of your manuscript.

Thank you.

Author Response

Reviewer #2:

Comment 1.1:

The authors effectively highlight the importance of Chl-a monitoring … However, they need to explain more clearly why they chose KAN over CNN/MLP, especially interpretability.

Response 1.1:

Thank you for this suggestion. We added a concise rationale clarifying task–model alignment and interpretability.

Location: Introduction (page 2, paragraph 4).

We employ a Kolmogorov–Arnold Network (KAN) for per-pixel Chl-a retrieval because the task is inherently spectral. Predictions are governed by per-band reflectance values rather than spatial context; hence, the convolutional inductive bias of CNNs offers limited advantage. In contrast, KANs are designed to operate on vector inputs, aligning with the input structure of multispectral or hyperspectral data. More importantly, KANs provide built-in, function-level interpretability: each edge is parameterized by a learnable univariate kernel function, enabling direct extraction of band-wise activation profiles and edge-wise attributions. This architecture affords an explicit understanding of how individual spectral bands and their nonlinear compositions influence Chl-a predictions—capabilities that typically require post hoc interpretability tools when using MLPs or CNNs. These properties—task–model alignment, built-in interpretability, and data-efficient learning—make KAN a theoretically grounded and practically robust choice for inland-water Chl-a estimation.
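
The idea that each edge carries its own learnable univariate function, and that a node simply sums its incoming edge outputs, can be made concrete with a small sketch. It is a simplification for illustration only: piecewise-linear interpolation on fixed knots stands in for the spline parameterization normally used in KANs, no training is performed, and the class name, knot positions, and values are hypothetical.

```python
from bisect import bisect_right

class EdgeFunction:
    """Univariate function phi(x) stored as values at fixed knots, linearly interpolated."""

    def __init__(self, knots, values):
        self.knots = list(knots)
        self.values = list(values)  # these would be the learnable parameters

    def __call__(self, x):
        k, v = self.knots, self.values
        if x <= k[0]:
            return v[0]
        if x >= k[-1]:
            return v[-1]
        j = bisect_right(k, x) - 1
        t = (x - k[j]) / (k[j + 1] - k[j])
        return (1 - t) * v[j] + t * v[j + 1]

def kan_layer(x, edges):
    """edges[j][i] is phi_{j,i}; output_j is the sum over i of phi_{j,i}(x_i)."""
    return [sum(phi(xi) for phi, xi in zip(row, x)) for row in edges]

knots = [0.0, 0.5, 1.0]
phi_a = EdgeFunction(knots, [0.0, 1.0, 0.0])   # hump-shaped response to the first input
phi_b = EdgeFunction(knots, [0.0, 0.5, 1.0])   # identity-like response to the second input
out = kan_layer([0.25, 0.5], [[phi_a, phi_b]])
print(out)

# A band-wise "activation profile" is just the edge function evaluated on a grid.
profile = [phi_a(i / 4) for i in range(5)]
print(profile)
```

Because every edge function depends on a single input, it can be plotted against that input directly, which is the kind of band-wise inspection the response refers to.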

Comment 1.2:

Explain how Baiyangdian Lake reflects the wider inland waters.

Response 1.2:

We added a paragraph explaining representativeness in terms of size, habitat diversity, optical complexity, and human-pressure gradients.

Location: Introduction (page 2, paragraph 5).

This study evaluates the KAN model in Baiyangdian Lake (Hebei Province, China), a representative inland-lake test site. As one of the largest freshwater lakes in northern China, Baiyangdian exhibits diverse aquatic habitats, varying degrees of anthropogenic influence, and complex water-optical properties. These attributes make it a strong proxy for wider inland-water systems and support the generalization of our findings to other environments with similar ecological characteristics.

Comment 2.1:

Clarify the selection of sampling sites (Shaoche Dian, Quantou) for representativeness and anthropogenic influences.

Response 2.1:

We expanded the justification for site selection to cover hydrological centrality, habitat coverage, and pressure gradients (tourism traffic, agricultural drainage).

Location: Materials and Methods—Study Area (page 4, paragraph 2).

Revised text (excerpt):

Shaoche Dian and the waters of Quantou Village form the hydrological core of Baiyangdian Lake, together accounting for approximately 18% of its open-water surface. Real-time buoy observations indicate that these sub-basins capture the full north–south gradients in salinity and nutrient concentrations and encompass key habitat types—including reed fringes, open pelagic zones, and semi-enclosed bays—thereby reflecting the lake’s overall ecological heterogeneity. They also experience the heaviest anthropogenic loading: Shaoche Dian receives ~1.4 million visitors per year, accompanied by intensive motorized boat traffic, while Quantou Village is adjacent to ~7,500 ha of peri-urban agricultural land and supports a resident population exceeding 9,600 people. Collectively, these areas account for >35% of diffuse nutrient inputs to the lake (Baiyangdian Environmental Protection Bureau, 2024). Monitoring at these sites therefore provides a sensitive and representative basis for assessing lake-wide ecological dynamics and the effectiveness of ongoing restoration efforts.

Comment 2.2:

Clarify the temporal alignment between satellite acquisitions and field sampling, and why reflectance normalization was needed beyond Sen2Cor.

Response 2.2:

We clarified the monthly alignment protocol and the motivation for additional scene-wise normalization after atmospheric correction.

Location: Materials and Methods—Sentinel-2 data (page 5, paragraph 2).

Revised text (excerpt):

Sentinel-2 Level-2A (L2A) surface-reflectance imagery was obtained from the ESA Copernicus Open Access Hub. For each monthly in situ campaign, we selected the nearest cloud-free scene acquired under comparable meteorological conditions (low wind, no precipitation)—preferably on the same day or, if unavailable, within a narrow ±5-day window. This ensured temporal and spectral comparability at the monthly scale. All spectral bands were resampled to a common 10 m spatial resolution in SNAP and exported to ENVI format. Subsequent preprocessing (band stacking, mosaicking, cropping, and water masking) was performed in ENVI. The water extent of Baiyangdian was delineated using the Normalized Difference Water Index (NDWI; Gao, 1996); seasonal analyses used the spatial intersection of seasonal water masks to maintain a consistent spatial support.
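
The matchup rule described above (same-day scene preferred, otherwise the nearest cloud-free scene within ±5 days) can be written down in a few lines. The sketch below is an assumption about how such a rule might be coded, not the authors' procedure. The dates and cloud flags are invented, and the screening for comparable meteorological conditions is not modeled.

```python
from datetime import date

def select_scene(sampling_day, scenes, max_offset_days=5):
    """scenes is a list of (acquisition_date, is_cloud_free) pairs.

    Returns the chosen acquisition date, or None if no cloud-free scene
    falls inside the +/- max_offset_days window.
    """
    candidates = [d for d, cloud_free in scenes
                  if cloud_free and abs((d - sampling_day).days) <= max_offset_days]
    if not candidates:
        return None
    # The smallest absolute offset wins, so a same-day scene is always preferred.
    # Ties between an earlier and a later scene go to the earlier date.
    return min(candidates, key=lambda d: (abs((d - sampling_day).days), d))

# Hypothetical acquisitions around a hypothetical sampling day.
scenes = [
    (date(2023, 7, 10), True),
    (date(2023, 7, 15), False),   # same day, but cloudy
    (date(2023, 7, 18), True),
    (date(2023, 7, 25), True),    # outside the window
]
chosen = select_scene(date(2023, 7, 15), scenes)
missing = select_scene(date(2023, 8, 1), scenes)
print(chosen, missing)
```

Recording the offset in days for each matchup alongside the selected scene would make it straightforward to test later how sensitive the results are to the window width.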

Although the Sen2Cor processor provides bottom-of-atmosphere (BOA) reflectance, we further applied scene-wise reflectance normalization to improve cross-date spectral consistency over optically complex inland waters. This additional step mitigates residual spectral heterogeneity that can remain after atmospheric correction, including minor aerosol-model mismatches, adjacency effects from bright shorelines, thin-cloud or sun-glint residues, and variations in sun–sensor geometry.

Comment 2.3:

Clarify the rationale for choosing two hidden layers (16 and 8 nodes) and the pruning threshold 10^-2.

Response 2.3:

We added the architectural and pruning rationale based on cross-validation and ablation.

Location:

Materials and Methods—Methodology—KAN Network Design for Chl-a Retrieval (page 9, paragraphs 2–3).

Revised text (excerpt):

For model construction, reflectance values from eight Sentinel-2 bands (B2–B8A) were used as input features. Extensive five-fold cross-validation indicated that a shallow network with two hidden layers achieved the best bias–variance trade-off for this relatively small tabular dataset. The first hidden layer comprised 16 neurons (i.e., 2× the input dimensionality), providing sufficient capacity to capture higher-order feature interactions, while the second layer compressed the representation to 8 neurons to reduce overfitting and facilitate subsequent pruning. Model parameters were optimized using the L-BFGS algorithm with an initial learning rate of 1×10⁻²; training proceeded until convergence.

Following training, the model was pruned to enhance interpretability. Attribution scores were computed for each neuron and connection, and a threshold of 1×10⁻² was applied. This value was selected because (a) it corresponded to the “elbow” of the attribution-score distribution, below which further reductions in the threshold yielded only marginal additional sparsity, and (b) ablation studies confirmed that stricter thresholds (e.g., ≤ 1×10⁻³) did not improve sparsity but reduced validation R² by more than 1%. Accordingly, all nodes and edges with attribution scores below 1×10⁻² were deemed negligible and removed.
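
The pruning rule stated above (remove anything with an attribution score below 1×10⁻²) and the threshold sweep used to locate an elbow can be sketched as follows. The edge identifiers and scores are made up for illustration, and the way attribution scores are computed is outside the scope of the sketch.

```python
def prune_edges(scores, threshold=1e-2):
    """scores maps (layer, src, dst) to an attribution score.

    Edges with a score below the threshold are removed; the rest are kept.
    """
    kept = {edge: s for edge, s in scores.items() if s >= threshold}
    removed = sorted(edge for edge in scores if edge not in kept)
    return kept, removed

def sparsity(scores, threshold):
    """Fraction of edges removed at a given threshold."""
    kept, _ = prune_edges(scores, threshold)
    return 1 - len(kept) / len(scores)

# Hypothetical attribution scores for six edges.
scores = {
    (0, 0, 0): 0.41,
    (0, 1, 0): 0.12,
    (0, 2, 1): 0.03,
    (0, 3, 1): 0.008,
    (1, 0, 0): 0.002,
    (1, 1, 0): 0.0004,
}
kept, removed = prune_edges(scores)
print(len(kept), removed)

# Sweeping the threshold shows how sparsity responds, which is how an elbow would be read off.
for thr in (1e-1, 1e-2, 1e-3, 1e-4):
    print(thr, round(sparsity(scores, thr), 3))
```

Pairing each threshold in the sweep with the validation R² of the pruned model would reproduce the kind of ablation the response describes.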

Comment 3:

Explain why domain-specific engineered features are not incorporated into the KAN approach to improve accuracy or interpretability.

Response 3:

We added a paragraph explaining that raw-band inputs avoid hard priors, improve transferability, and enable direct physical attribution; we also retained a fair benchmark by applying engineered features to ML baselines.

Location:

Results—Comparison with Machine Learning Algorithms Using Domain-Specific Engineered Features (page 11, paragraph 1)

Added text (excerpt):

Unlike many lake water–quality retrieval studies that boost signal-to-noise ratio (SNR) by handcrafting spectral indices, we deliberately restricted the KAN inputs to the eight raw Sentinel-2 reflectance bands (B2–B8A). First, hand-engineered indices impose strong priors and can obscure higher-order, subtle nonlinear interactions; in contrast, KAN’s adaptive functional bases learn such relationships directly from the data. Second, indices tuned to a particular optical regime often lack portability across lakes with differing biogeophysical conditions. Training on raw bands improves cross-system transferability, allowing a single trained KAN to be applied across multiple water types without redesigning features. Third, attribution scores computed on raw bands map directly to physically observed quantities, facilitating sensor selection and management decisions, whereas mixing composite indices would dilute interpretability and inflate model complexity. Nevertheless, to quantify the incremental value of handcrafted features and to provide a comparable benchmark, we also applied the same feature-engineering scheme to three representative machine-learning baselines and evaluated them on the same dataset.

Comment 3.1:

Discuss how the availability of cloud-free imagery may affect the accuracy of the annual Chl-a average.

Response 3.1:

We added a Monte Carlo resampling analysis across N ∈ {27, 24, 21, 18, 15} to quantify uncertainty.

Location:

Discussion—Impact of Data and Preprocessing on Chl‑a Retrieval (page 14, paragraph 3; Table 7)

Added text:

In terms of data acquisition, the number of cloud-free scenes available within a given year (N) fluctuates markedly across years and seasons due to cloudiness and satellite-revisit constraints. When coverage is insufficient, we rely on ±5-day substitution or within-month time-weighted interpolation. While these methods help fill temporal gaps, they can introduce sampling aliasing and interpolation errors, potentially biasing the annual mean and thereby affecting the interpretation of interannual change. To quantify the impact of temporal sampling density, we performed B = 1000 Monte Carlo resamples for each year at N ∈ {27, 24, 21, 18, 15} (month-stratified, without-replacement subsampling, with ±5-day substitution and within-month time-weighted interpolation applied).
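
A rough sketch of the resampling experiment described above is given below, under several stated assumptions. The scene values are synthetic (three per month with an arbitrary seasonal cycle), the annual mean is taken as the mean of monthly means, and the month-stratified draw is implemented as repeated passes over the months in random order. The ±5-day substitution and within-month interpolation steps are not reproduced. None of the function names or numbers come from the manuscript except B = 1000 and the list of N values.

```python
import math
import random
import statistics

def month_stratified_sample(scenes_by_month, n, rng):
    """Draw n scenes without replacement, spreading the draw across months."""
    pools = {m: rng.sample(v, len(v)) for m, v in scenes_by_month.items()}  # shuffled copies
    chosen = {m: [] for m in pools}
    months = sorted(pools)
    drawn = 0
    while drawn < n:
        progressed = False
        # One pass gives each month at most one more scene; the pass order is random
        # so that no month is systematically favored when n is not a multiple of 12.
        for m in rng.sample(months, len(months)):
            if drawn == n:
                break
            if pools[m]:
                chosen[m].append(pools[m].pop())
                drawn += 1
                progressed = True
        if not progressed:
            raise ValueError("n exceeds the number of available scenes")
    return chosen

def annual_mean(chosen):
    """Mean of monthly means over the months that received at least one scene."""
    return statistics.fmean(statistics.fmean(v) for v in chosen.values() if v)

def resample_sd(scenes_by_month, n, b=1000, seed=0):
    """Standard deviation of the annual mean across b Monte Carlo subsamples of size n."""
    rng = random.Random(seed)
    means = [annual_mean(month_stratified_sample(scenes_by_month, n, rng)) for _ in range(b)]
    return statistics.stdev(means)

# Synthetic scene-level lake means (NOT the study's data): 3 scenes per month.
data_rng = random.Random(1)
scenes_by_month = {
    m: [25 + 10 * math.sin(2 * math.pi * (m - 4) / 12) + data_rng.gauss(0, 4) for _ in range(3)]
    for m in range(1, 13)
}
sd_by_n = {n: resample_sd(scenes_by_month, n) for n in (27, 24, 21, 18, 15)}
for n, sd in sd_by_n.items():
    print(n, round(sd, 2))
```

On synthetic data like this, the spread of the annual mean widens as N shrinks, which is the qualitative behavior the response reports; the magnitudes printed here have no bearing on the values in the manuscript.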

Comment 3.2:

Because cloud cover varies by year, it may cause year-to-year differences.

Response 3.2:

We explicitly acknowledged interannual variability in scene availability and its effect on uncertainty.

Location:

Discussion—Impact of Data and Preprocessing on Chl‑a Retrieval (page 15, paragraph 1; Table 7)

Added text:

The results (see Table 7) show that the accuracy of the annual mean Chl-a depends strongly on N. When coverage is high (e.g., N = 27), annual averages are stable and preserve interannual rank ordering; as coverage decreases, reliance on substitution/interpolation increases and uncertainty inflates. Our resampling indicates that the standard deviation of the annual mean is ≈ 1.2 μg L⁻¹ for N ≥ 24, rising to ≈ 3.5 μg L⁻¹ for N < 15. This suggests that, under sparse coverage, part of the apparent “anomalies” may be sampling artifacts rather than genuine biogeochemical change. Accordingly, when reporting annual means, we include the corresponding N and an uncertainty estimate (SD/CI), use N = 27 as the reference baseline, and flag years with N ≤ 18 as unsuitable for trend assessment.

Comment 4:

Clearly describe limitations on transparency/interpretability/data, and provide concrete suggestions for broader validation or scalability.

Response 4:

We added a dedicated limitations paragraph and a concrete validation plan.

Location:

Discussion—Limitations & Recommendations (page 17, paragraphs 1–2)

Revised text (excerpt):

Although the KAN model demonstrates excellent performance on the held-out test set, the dataset size (n = 104) warrants a more rigorous validation protocol to ensure robustness and reproducibility. While KAN attains high predictive accuracy with a degree of interpretability, its use of learnable edge functions yields complex functional compositions that limit full model transparency. Attribution analysis provides valuable insights but remains local and model-dependent, rather than constituting demonstrably causal explanations. Inference is further constrained by data availability and processing choices: the in situ matchup set is relatively small and restricted to a single lake; temporal alignment relies on a narrow ±5-day window, which introduces uncertainty; and conclusions are conditioned on a specific processing pipeline (L2A → normalization → NDWI mask → R(λ) → Rrs(λ)), under which residual thin-cloud, adjacency, or sun-glint effects may persist. Annual means are sensitive to the density of cloud-free scenes N; under sparse coverage, Monte Carlo analysis indicates markedly wider uncertainty bands, which can affect interpretation of interannual trends.

To validate and stress-test KAN across diverse inland-water systems, future work should incorporate multi-lake, multi-year datasets spanning different optical water types and trophic states, with same-day satellite–in situ matchups whenever feasible. We will employ stratified k-fold cross-validation (e.g., k = 5–10) and, where appropriate, spatio-temporal blocking or leave-one-lake/leave-one-year designs to obtain more stable performance estimates, mitigate overfitting, and more rigorously assess transferability. Uncertainty should be quantified by systematically varying scene density and preprocessing parameters (e.g., alignment windows, normalization strategies), and communicating predictive uncertainty via confidence intervals and calibration curves alongside standard performance metrics. Finally, cross-sensor evaluations (e.g., Landsat 8/9, PRISMA, UAV hyperspectral) and open code/data releases will further enhance the robustness, reproducibility, and operational relevance of the proposed approach.
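
The leave-one-lake and leave-one-year designs mentioned above differ from ordinary k-fold splitting in that a whole group is held out at once. A minimal sketch follows, assuming each sample carries a group label such as its sampling year. The labels, the concentrations, and the mean-value predictor are placeholders chosen only to make the example runnable; a real evaluation would refit the retrieval model on each training subset.

```python
import math

def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx) for each distinct group label."""
    for g in sorted(set(groups)):
        test = [i for i, label in enumerate(groups) if label == g]
        train = [i for i, label in enumerate(groups) if label != g]
        yield g, train, test

def rmse(y_true, y_pred):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

# Hypothetical sampling years and Chl-a values (NOT the study's data).
years = [2021, 2021, 2022, 2022, 2022, 2023]
chl_a = [10.0, 14.0, 20.0, 22.0, 24.0, 30.0]

results = {}
for year, train, test in leave_one_group_out(years):
    # Placeholder model: predict the training-set mean for every held-out sample.
    baseline = sum(chl_a[i] for i in train) / len(train)
    results[year] = rmse([chl_a[i] for i in test], [baseline] * len(test))

for year, score in results.items():
    print(year, round(score, 3))
```

Reporting the per-group scores rather than only their average shows whether performance degrades for a particular year or lake, which is the transferability question the proposed designs are meant to answer.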

Comment 5.1:

Emphasize the applied value by presenting practical implications and operational recommendations.

Response 5.1:

We rewrote the conclusion as a single paragraph emphasizing operational implications (10-m weekly mapping, B3/B5 QC prioritization, threshold-based alerts).

Location:

Conclusions (page 17, paragraph 3)

Revised text:

This study demonstrates that a Kolmogorov–Arnold Network (KAN) can accurately retrieve Chl-a concentrations from Sentinel-2 imagery over optically complex inland lakes, achieving performance comparable to—and in some cases exceeding—that of conventional machine learning and deep learning models, while retaining the added benefits of meaningful interpretability. Attribution analysis identifies bands B3 and B5 as the primary spectral predictors. These findings have direct operational value: water-resources agencies can automate Chl-a mapping at a weekly cadence and 10 m spatial resolution without heavy reliance on in situ sampling; monitoring programs can prioritize quality control for bands B3/B5 when scheduling acquisitions or assessing scene usability; and environmental managers can embed KAN-derived concentration thresholds into early-warning systems to trigger rapid mitigation during bloom-risk periods.

Comment 5.2:

Consolidate the conclusion into one paragraph and outline future research not covered here.

Response 5.2:

Done. We present concise future directions (multi-parameter retrieval, global transferability, real-time integration with UAV/hydrodynamic forecasts).

Location:

Conclusions (page 17, paragraph 3)

Revised text (excerpt):

Looking forward, future work should extend the KAN framework to support multi-parameter retrieval (e.g., TSS, CDOM, SDD), test its transferability across diverse lake types globally, and integrate the model with real-time data streams—from UAV-based hyperspectral imaging to hydrodynamic forecasts—to build a fully integrated, adaptive decision-support system for inland-water management.

Comment 6 (abstract):

Abstract edits: L4 hereby designed → was designed; L5–6 As indicated by the results → The results demonstrate that …; L14 utilizes → leverages; L17 can inform → may guide.