Article
Peer-Review Record

Physics-Informed Deep Learning for Karst Spring Prediction: Integrating Variational Mode Decomposition and Long Short-Term Memory with Attention

Water 2025, 17(14), 2043; https://doi.org/10.3390/w17142043
by Liangjie Zhao 1,*, Stefano Fazi 2, Song Luan 1, Zhe Wang 1, Cheng Li 1, Yu Fan 1 and Yang Yang 1,*
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 4 June 2025 / Revised: 1 July 2025 / Accepted: 3 July 2025 / Published: 8 July 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper "Physics-Informed VMD-LSTM with Attention for Karst Spring Prediction and Uncertainty Quantification" has some elements of novelty but cannot be accepted in its current form. Please find my comments below.

1. The karst spring data from 2013 to 2018 are too limited.

2. The authors use an 80% training, 15% validation, and 5% testing data partitioning strategy, which is not realistic; a 5% testing phase has never been seen in the literature. The testing phase is the most important in modeling, and taking only 5% would not be realistic.
3. How can you justify this statement: “During testing, RMSE decreased dramatically from 0.726 to 0.220, and NSE improved from 0.867 to 0.988”?
4. Visual inspection of Fig. 2 indicates considerable noise in the data; how did the authors resolve this problem?
5. Explain clearly the Model 1-12 methods you follow for model combinations.
6. How were the physics-informed constraints specifically formulated and integrated into the LSTM architecture, and what physical laws or assumptions were they based on for karst systems?
7. Can the authors elaborate on how the VMD modes were selected or interpreted in a physically meaningful way, and how their relevance to karst hydrodynamics was validated?
8. Regarding uncertainty quantification, how sensitive were the Monte Carlo dropout results to the number of realizations (e.g., 100), and was any comparison made with other uncertainty estimation techniques (e.g., Bayesian LSTM or quantile regression)?

Author Response

Comments 1: The karst spring data from 2013 to 2018 are too limited.

Response 1: We appreciate the reviewer’s concern regarding the length of the dataset (2013–2018). While it is true that a longer time series is generally desirable, the five and a half years of high-resolution hourly data (approximately 48,000 records) used in this study provide a rich dataset that captures numerous hydrological events, including diverse rainfall-runoff dynamics, seasonal variations, flood peaks, and dry periods. The temporal resolution and data density compensate, to some extent, for the shorter total duration. Additionally, we selected this specific dataset because: (1) it is the only continuous, high-quality hourly spring discharge dataset available for this karst system; (2) the spring exhibits strong seasonal and event-based variability, and these are well represented within the selected period; (3) other published studies on karst spring forecasting with machine learning often rely on daily data, and our use of hourly resolution allows more detailed modeling of short-term response processes such as fast conduit flow and delayed epikarst contributions. To clarify this in the manuscript, we have added the following sentence in Section 2 (Line 121): “Although the dataset spans five and a half years (2013–2018), its hourly resolution ensures comprehensive coverage of hydrological variability, including seasonal transitions, storm events, and prolonged dry periods, providing sufficient information for robust model training and validation.”

Comments 2: The authors use an 80% training, 15% validation, and 5% testing data partitioning strategy, which is not realistic; a 5% testing phase has never been seen in the literature. The testing phase is the most important in modeling, and taking only 5% would not be realistic.

Response 2: We appreciate the reviewer’s critical observation. We fully agree that the testing phase plays a vital role in evaluating the generalization ability of a predictive model. Although only 5% of the data was assigned to the test phase, this corresponds to 2,400 hourly samples, a substantial number, sufficient to cover both wet and dry seasons. The test period includes distinct rainfall events and recession periods, capturing key karst hydrodynamic regimes. We calculated the mean and standard deviation of flow across the three subsets: Training set: Mean = 1.119, Std = 2.486; Validation set: Mean = 0.798, Std = 2.094; Test set: Mean = 0.765, Std = 2.028. These statistics indicate that the test data are representative of the overall hydrological conditions and distributional characteristics. To further examine the reviewer’s concern, we also tested an alternative data split (70% training, 15% validation, 15% testing). The results were similar, with no significant degradation or improvement in model performance metrics. We have clarified this discussion in the revised manuscript (Section 4.2.1, Line 285) and added the following explanation: “While the test set accounts for 5% of the dataset, it includes both flood and dry periods. Statistical properties of mean and standard deviation confirm their mutual representativeness.”
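The chronological 80/15/5 split and the per-subset statistics described above can be sketched as follows. This is a minimal illustration: the synthetic gamma-distributed series stands in for the observed 48,000 hourly records, and only the split logic mirrors the response.

```python
import numpy as np

# Synthetic stand-in for the ~48,000 hourly discharge records
rng = np.random.default_rng(0)
flow = rng.gamma(shape=0.5, scale=2.0, size=48_000)

# Chronological split: first 80% train, next 15% validation, last 5% test
n = len(flow)
train, val, test = np.split(flow, [int(0.80 * n), int(0.95 * n)])

for name, part in [("train", train), ("val", val), ("test", test)]:
    print(f"{name}: n={len(part)}, mean={part.mean():.3f}, std={part.std():.3f}")
```

Even at 5%, the test partition holds 2,400 hourly samples, which is what makes the subset-statistics comparison in the response meaningful.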

Comments 3: How can you justify this statement.... During testing, RMSE decreased dramatically from 0.726 to 0.220, and NSE improved from 0.867 to 0.988.

Response 3: Thank you for pointing this out. The sentence in the abstract was intended to describe the performance improvement of our proposed model relative to the baseline LSTM, using the same test dataset and experimental settings. To eliminate this ambiguity, we have rewritten the sentence in the revised abstract as follows: “Compared to the baseline LSTM, RMSE during testing decreased dramatically from 0.726 to 0.220, and NSE improved from 0.867 to 0.988.”

Comments 4: Visual inspection of Fig. 2 indicates considerable noise in the data; how did the authors resolve this problem?

Response 4: Thank you for this valuable observation. We acknowledge that Fig. 2 reveals some high-frequency fluctuations in the discharge time series, which may visually appear as noise. Karst spring discharge often exhibits genuine high-frequency variations due to the rapid response of conduit flow to localized rainfall inputs. To distinguish these from artificial noise, we applied Variational Mode Decomposition (VMD), which decomposes the signal into intrinsic mode functions (IMFs) with narrow frequency bands. We retained only the physically interpretable IMFs (Mode 2–6) based on energy contribution, spectral coherence, and correlation with precipitation. Higher-frequency IMFs (Mode 7–12), which captured <1% variance and showed weak physical relevance or susceptibility to noise, were explicitly excluded from the model inputs.

Comments 5: Explain clearly the Model 1-12 methods you follow for model combinations.

Response 5: We thank the reviewer for this observation. The numbering from 1 to 12 refers to Mode 1–12, the twelve intrinsic mode functions (IMFs) derived from Variational Mode Decomposition (VMD) of the spring discharge time series. These modes are not model architectures but band-limited signal components with different dominant periods and hydrological relevance. We do not train separate models for each mode, nor do we combine model outputs. Instead, we select a subset of physically meaningful modes (Mode2–6) and use them together as multivariate input features to the LSTM-based forecasting model. To avoid any potential confusion, we have made the following clarifications in the revised manuscript: (Line 244) “These modes were used simultaneously as multivariate inputs representing distinct hydrological processes.”

Comments 6: How were the physics-informed constraints specifically formulated and integrated into the LSTM architecture, and what physical laws or assumptions were they based on for karst systems?

Response 6: We thank the reviewer for this insightful question. The physics-informed constraint in our model is implemented through an entropy-based regularization term applied to the attention weights within the LSTM architecture. Karst systems are governed by nonlinear and multi-scale flow processes, including: (1) fast conduit flow immediately after precipitation; (2) delayed responses through epikarst and matrix pathways; (3) extended recession behavior due to storage–release effects. These processes imply that only a few recent inputs (e.g., significant rainfall events) may strongly influence current discharge, while older or minor events contribute little. Our entropy-based regularization encourages the attention mechanism to reflect this selective hydrological memory. Thus, our physics-informed constraint is not based on a fixed physical equation (e.g., Darcy’s law), but rather on a behavioral assumption derived from physical understanding of karst hydrodynamics. As described in Section 3.2, the model employs a temporal attention mechanism that computes attention weights over the hidden states of the LSTM. This encourages the model to assign higher weights to a few key time steps—typically corresponding to rainfall peaks, rapid responses, or recession inflection points—rather than distributing attention uniformly.
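The entropy-based regularization described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: `lambda_ent` is a hypothetical hyperparameter, and the attention scores are synthetic. The point is that a uniform attention distribution has maximal entropy, so penalizing entropy pushes the model toward sharp, selective attention.

```python
import numpy as np

def softmax(scores):
    """Normalize raw scores into an attention distribution."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def attention_entropy(weights, eps=1e-12):
    """Shannon entropy of a normalized attention distribution."""
    w = np.clip(weights, eps, None)
    return -np.sum(w * np.log(w))

# Sharp attention (one rainfall peak dominates) vs. uniform attention
sharp = softmax(np.array([8.0, 0.0, 0.0, 0.0]))
flat = softmax(np.zeros(4))

# Entropy term added to the training loss (lambda_ent is illustrative);
# minimizing it rewards concentrating weight on a few key time steps.
lambda_ent = 0.01
penalty_sharp = lambda_ent * attention_entropy(sharp)
penalty_flat = lambda_ent * attention_entropy(flat)
assert penalty_sharp < penalty_flat  # uniform attention is penalized more
```

For a uniform distribution over T steps the entropy is ln T, the maximum, so the penalty is largest exactly when attention is spread evenly, matching the "selective hydrological memory" assumption.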

Comments 7: Can the authors elaborate on how the VMD modes were selected or interpreted in a physically meaningful way, and how their relevance to karst hydrodynamics was validated?

Response 7: We appreciate the reviewer’s question regarding the selection and hydrological interpretation of the VMD modes. A total of twelve intrinsic mode functions (IMFs) were extracted using Variational Mode Decomposition (VMD) from the hourly spring discharge signal. To retain only physically meaningful modes, we applied a three-stage filtering strategy: (1) variance contribution ≥ 1%, to ensure each mode retains non-negligible signal energy; (2) spectral energy concentration > 70%, computed via power spectral density to filter out noise-dominated or mode-mixed components; (3) significant correlation with precipitation (|r| > 0.3, p < 0.05), to identify modes that respond meaningfully to rainfall input, consistent with runoff generation processes. Only Modes 2 to 6 satisfied all criteria and were retained as inputs to the LSTM model. We also provide a summary table (Table 1) and visual decomposition (Figure 4), which highlight the distinct behavior of the selected modes. The inclusion of only these modes led to improved model performance, as shown in Section 4.4.
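The three-stage filter can be sketched as follows. This is a simplified stand-in, not the authors' code: the spectral-concentration measure (fraction of FFT power within a narrow band around the peak frequency) approximates the PSD criterion, the p-value test is omitted, and the modes are synthetic.

```python
import numpy as np

def select_modes(modes, rainfall, signal_var):
    """Indices of modes passing the three criteria from the response.

    modes: (n_modes, n_samples); rainfall: (n_samples,);
    signal_var: variance of the full discharge signal.
    """
    keep = []
    for k, m in enumerate(modes):
        # (1) variance contribution >= 1% of total signal variance
        if m.var() / signal_var < 0.01:
            continue
        # (2) >70% of spectral power concentrated near the dominant frequency
        psd = np.abs(np.fft.rfft(m)) ** 2
        peak = int(psd.argmax())
        lo, hi = max(peak - 2, 0), peak + 3
        if psd[lo:hi].sum() / psd.sum() <= 0.70:
            continue
        # (3) meaningful correlation with precipitation (|r| > 0.3;
        # significance test omitted in this sketch)
        if abs(np.corrcoef(m, rainfall)[0, 1]) <= 0.3:
            continue
        keep.append(k)
    return keep

# Synthetic check: a rainfall-driven diurnal component passes, while
# broadband noise and a negligible-variance component are rejected.
rng = np.random.default_rng(1)
t = np.arange(2000)
rainfall = np.sin(2 * np.pi * t / 24)
modes = np.vstack([
    0.9 * rainfall,                     # narrowband, rainfall-correlated
    rng.normal(size=2000),              # broadband noise
    1e-3 * np.sin(2 * np.pi * t / 12),  # negligible variance
])
kept = select_modes(modes, rainfall, modes.sum(axis=0).var())
print(kept)  # only the first mode survives
```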

Comments 8: Regarding uncertainty quantification, how sensitive were the Monte Carlo dropout results to the number of realizations (e.g., 100), and was any comparison made with other uncertainty estimation techniques (e.g., Bayesian LSTM or quantile regression)?

Response 8: We thank the reviewer for this important observation regarding the robustness of our uncertainty quantification strategy. To assess the stability of the Monte Carlo (MC) dropout results, we performed a sensitivity analysis by varying the number of realizations from 10 to 200. As shown in Section 4.3 and Supplementary Figure 11, both the RMSE of the predictive mean and the average width of the 95% confidence interval (CI) stabilized when the number of samples exceeded 100. Specifically, the changes in RMSE and CI width were less than 1% and 2%, respectively, between 100 and 200 realizations. This indicates that the choice of 100 realizations provides sufficiently robust and converged uncertainty estimates. 
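The convergence check behind this sensitivity analysis can be sketched as follows. This is a hedged illustration: Gaussian perturbations of a base series stand in for dropout-perturbed forward passes, and the stabilization it shows is generic, not a reproduction of the reported 1%/2% figures.

```python
import numpy as np

rng = np.random.default_rng(42)

def mc_dropout_forecast(n_realizations):
    """Stand-in for MC dropout: each 'realization' perturbs a base forecast,
    mimicking the variability of stochastic forward passes."""
    base = np.sin(np.linspace(0, 6, 200))  # hypothetical forecast series
    draws = base + rng.normal(0, 0.1, (n_realizations, 200))
    mean = draws.mean(axis=0)
    lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)
    return mean, (hi - lo).mean()  # predictive mean, average 95% CI width

# Average CI width as a function of the number of realizations
widths = {n: mc_dropout_forecast(n)[1] for n in (10, 50, 100, 200)}
change = abs(widths[200] - widths[100]) / widths[100]
print(f"relative CI-width change from 100 to 200 draws: {change:.1%}")
```

Once the empirical interval width stops moving between 100 and 200 draws, additional realizations buy little, which is the rationale for fixing the count at 100.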

Reviewer 2 Report

Comments and Suggestions for Authors

Authors try to develop a prediction tool to predict Karst spring discharge rate.  Using decomposition flow rate signal at different frequencies + precipitation data to predict spring discharge is a good idea and prediction accuracy is impressively high.

The manuscript is also professionally prepared with little editorial errors.

However, there are a few things that need to be clarified about the research objective.

  1. The aim of this study is to establish a robust, physically informed modeling framework for accurate forecasting of karst spring discharge under conditions of data limitation and hydrological uncertainty.

By reading the manuscript, readers are not clear what the data limitation is.  The model only requires the decomposed historical spring discharge data and the precipitation data.  Is the records of data limitations or are the measurement frequency a limitation?

  2. Physics-based constraints enhance model interpretability and prevent physically implausible outputs.

Further clarification is needed on how this is implemented in your model, or on how the model is physically informed.

  3. The most important questions: what can the model forecast, and what are the data requirements for the forecast? An example could be using precipitation data of the past two years to predict the spring discharge today, tomorrow, or next week; or predicting hourly discharge rate, or daily discharge rate, using 10 years of data records, etc. The data requirement could be a minimum of 10 years of discharge data, or 10 days of discharge data plus precipitation.
  4. What is the forecasting accuracy? We can see the prediction accuracy is fairly high and the model is very good at making predictions. But there are differences between prediction and forecasting. The main contribution and the objective of this research is forecasting. The authors may clarify whether the prediction accuracy given is the forecasting accuracy.
  5. Regarding uncertainty, I would like clarification on which uncertainties are truly simulated, and a justification for why the chosen uncertainties are important. The current description is not clear. For doing a forecast, for example, what happens if the historical data is not long enough or the resolution is not high enough, or what is the impact of data gaps? If one wants to use weather forecasts (precipitation), what is the uncertainty of weather forecasts on spring discharge forecasts? This is meaningful because the very low frequency signal has been discarded from model training, which accounts for close to half of the total variation.
  6. Interpolation in 2016 is problematic. The interpolated data was labelled and should not be used for training. If possible, data before 2016 should be used as the training dataset. Also, if possible, a test should be conducted on the impact of missing data on model prediction accuracy, similar to the 2016 data gaps.

Minor editorial improvements:

Line 135: “Rainfall intensities exceeding Rainfall intensities exceeding 50 mm h-1 frequently generate sharp increases in spring”

Author Response

Comments 1: The aim of this study is to establish a robust, physically informed modeling framework for accurate forecasting of karst spring discharge under conditions of data limitation and hydrological uncertainty. By reading the manuscript, readers are not clear what the data limitation is.  The model only requires the decomposed historical spring discharge data and the precipitation data.  Is the records of data limitations or are the measurement frequency a limitation?

Response 1: We thank the reviewer for pointing out this important ambiguity. The term “data limitation” in the original manuscript was indeed too vague. In the revised version, we have clarified the specific data challenges faced in karst hydrological modeling, including: (1) missing or interpolated discharge records due to sensor gaps and extreme conditions; (2) lack of internal hydrogeological observations; and (3) limited input variables—only precipitation data were available. To address this, we have revised the sentence in the Abstract as follows (Line 98): “The aim of this study is to establish a robust, physically informed modeling framework for accurate forecasting of karst spring discharge under conditions of missing values and limited input variables, which are common in karst hydrology.”

Comments 2: Physics-based constraints enhance model interpretability and prevent physically implausible outputs. Need further clarification on how does this being implemented in your model, or how does the model is physically informed.

Response 2: We appreciate the reviewer’s request for clarification. In our revised manuscript, we have more explicitly explained how physics-based constraints are implemented and how the model is physically informed: (1) the selection of VMD modes for model input is based on their dominant periods and hydrological interpretability. Only modes reflecting karst-relevant frequencies (e.g., daily to sub-weekly) are retained, which aligns with physical understanding of flow and recharge processes. (2) we introduce a physics-guided attention penalty within the LSTM architecture. During dry periods, a penalty loss term is applied to suppress attention weights. This encourages the model to assign negligible importance to input features during physically implausible conditions, and helps mimic the causal behavior in karst systems, where spring discharge predominantly responds to rainfall events. The revised sentence now reads: (Line 92) “Physics-based constraints enhance model interpretability and prevent physically implausible outputs, by incorporating rainfall-informed attention penalties and hydrologically consistent mode selection (see Section 3.2).”

Comments 3: The most important questions: what the model can forecast and what is the data requirement for the forecast. An example could be using precipitation data of the past two years to predict the spring discharge today, tomorrow or next week.  Predicting hourly discharge rate, or daily discharge rate, using 10 years data records etc.  The data requirement could be minimum 10 years discharge data, or 10 days discharge data + precipitation.

Response 3: Thank you for the insightful comment. In the revised manuscript, we have added a detailed explanation in Section 3.1. To clarify: (1) the model performs multi-step-ahead forecasting at an hourly resolution, predicting spring discharge at t+1 to t+6; (2) the inputs include the previous 72 hours of precipitation and decomposed flow components (Mode2–Mode6); (3) the model was trained using approximately 5.5 years of hourly data, but operational prediction requires only recent input data; (4) importantly, no future rainfall forecast is needed, ensuring real-time applicability. To address this, we have revised Lines 194–202 as follows: “The model is designed for multi-step-ahead forecasting of karst spring discharge at an hourly resolution, predicting discharge for the next 6 hours (t+1 to t+6) using historical data up to the current time t. Specifically, the model takes as input the past 72 hours of VMD-decomposed flow components (Mode2–Mode6), hourly precipitation, and optionally previous spring discharge observations. The typical training dataset covers approximately 5–6 years of hourly data. For operational forecasting, short sequences (e.g., the most recent 72-hour inputs) are sufficient to produce near-term discharge forecasts. No future precipitation forecast is required, making the system suitable for real-time application.”
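The 72-hour-lookback, 6-hour-horizon windowing described above can be sketched as follows. This is shape bookkeeping only: `make_windows` is a hypothetical helper, not the authors' code, and the zero arrays stand in for the real Mode2–Mode6 and rainfall features.

```python
import numpy as np

def make_windows(features, target, lookback=72, horizon=6):
    """Build (X, y) pairs: 72 h of inputs -> discharge at t+1..t+6.

    features: (n_samples, n_features) array of mode components plus rainfall;
    target: (n_samples,) spring discharge. Inputs cover t-72..t-1, so the
    targets target[t:t+horizon] are the next 1..6 hours after the window.
    """
    X, y = [], []
    for t in range(lookback, len(target) - horizon + 1):
        X.append(features[t - lookback:t])
        y.append(target[t:t + horizon])
    return np.asarray(X), np.asarray(y)

# Hypothetical shapes: 1000 hourly steps, 6 input features (5 modes + rain)
feats = np.zeros((1000, 6))
flow = np.zeros(1000)
X, y = make_windows(feats, flow)
print(X.shape, y.shape)  # (923, 72, 6) (923, 6)
```

Note that only the most recent 72-hour slice is needed at inference time, which is why the response can claim real-time applicability without future rainfall forecasts.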

Comments 4: What is the forecasting accuracy? We can see the prediction accuracy is fairly high and the model is very good in making predictions.  But there are differences between prediction and forecasting.  The main contribution and the objective for this research is forecasting.  Authors may provide a clarification of the prediction accuracy given is the forecasting accuracy.

Response 4: Thank you for this important clarification. We agree that prediction and forecasting must be distinguished. In our revised manuscript, we have now made this distinction explicit by clarifying that: (1) All predicted results are based on historical input only (no future rainfall or discharge used); (2) The model performs multi-step forecasting (t+1 to t+6 hours); (3) All metrics (e.g., RMSE, NSE) reported reflect real-time forecasting accuracy. This information has been added in Sections 3.1 and 4.3 to ensure the forecasting nature of the model is unambiguous.

Comments 5: Regarding uncertainty, I like clarification on what uncertainties are truly simulated give provide a justification for why the chosen uncertainties are important. The current description is not clear.  For doing forecast, for example, what happened if the historical data is not long enough or the resolution is not high enough, or what’s the impact of data gaps.  If one what to use weather forecasts (precipitation), what is the uncertainty of weather forecasts on spring discharge forecasts?   This is meaningful because the very low frequency sign has been discarded from model training, which accounts for close to half of the total variations.

Response 5: We acknowledge that the current model explicitly quantifies only epistemic uncertainty, which stems from variability in neural network parameters. This is captured using Monte Carlo Dropout, a standard approach for approximating uncertainty in deep learning models with limited samples. (1) In this study, we quantify epistemic uncertainty, i.e., model parameter uncertainty due to structural variability in neural networks. This is implemented using Monte Carlo Dropout, which introduces stochasticity at inference time by randomly deactivating neurons, thereby sampling different plausible models from the same trained network. (2) We acknowledge the existence of other important sources of uncertainty, including input uncertainty and structural uncertainty. These were not explicitly quantified in the current study for the following reasons: our short-term forecast setup relies only on observed past rainfall rather than forecast rainfall, so input uncertainty is not yet introduced; structural uncertainty from the VMD decomposition is mitigated by selecting physically meaningful modes with dominant periods aligned with known hydrological response times (see revised Table 1 and Section 3.1). (3) It is true that the lowest-frequency modes were excluded from LSTM modeling; however, low-frequency VMD modes primarily represent seasonal and multi-year background trends, which change slowly and are less relevant for short-term forecasting. Including them may dilute the short-term signal used for high-resolution predictions. Nevertheless, their contribution is implicitly captured in baseline trends and could be used in long-term scenario modeling. (4) We agree with the reviewer that historical data length and quality affect forecast reliability. While our current model assumes reasonably continuous hourly data, forecast accuracy and uncertainty coverage can degrade when using short or low-resolution historical datasets. Wider uncertainty bands are typically observed in interpolated or data-sparse regions. Future work should systematically evaluate model robustness under varying data completeness and sampling intervals.

Comments 6: Interpolation in 2016 is problematic. The interpolated data was labelled and should not be used for training.  If possible, should use data before 2016 as training dataset.   Also, if possible, should conduct a test on the impact of missing data on model prediction accuracy, that similar to 2016 data gaps.

Response 6: We sincerely thank the reviewer for raising this important concern. We fully agree that interpolated data should not influence model training, especially when gaps are substantial (such as during 2016). (1) In our modeling framework, we introduced a binary flag flow_missing, which labels all interpolated discharge values. To ensure that these do not bias model learning, we implemented a loss-weighting strategy: all data points with flow_missing = 1 are assigned zero weight in the training loss function. As a result, these samples are effectively excluded from training and do not influence parameter updates. We have now explicitly described this strategy in Section 3.2 of the revised manuscript (Line 218). (2) Our uncertainty quantification results (Section 4.3) indicate that prediction intervals are notably wider during interpolated periods, confirming that the model expresses higher uncertainty where observational support is weak.
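The zero-weight masking described above can be sketched as follows. This is a minimal NumPy illustration of the idea, not the authors' training code: in the actual model the same weighting is applied inside the training loss so that gradients from interpolated samples vanish.

```python
import numpy as np

def weighted_mse(y_true, y_pred, flow_missing):
    """MSE in which interpolated samples (flow_missing == 1) receive zero
    weight, so they contribute nothing to parameter updates."""
    w = 1.0 - flow_missing.astype(float)
    if w.sum() == 0:
        return 0.0  # every sample masked: nothing to learn from
    return float(np.sum(w * (y_true - y_pred) ** 2) / w.sum())

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 99.0, 4.0])  # large error only on gap-filled point
missing = np.array([0, 0, 1, 0])          # third value was interpolated
print(weighted_mse(y_true, y_pred, missing))  # 0.0 -- interpolated error ignored
```

With the flag cleared, the same large error would dominate the loss, which is exactly the bias the `flow_missing` weighting is designed to prevent.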

Comments 7: Minor editorial improvements: 135 :Rainfall intensities exceeding Rainfall intensities exceeding 50 mm h-1 frequently generate sharp increases in spring

Response 7: We thank the reviewer for pointing out the repetition in line 135. This has been corrected to: “Rainfall intensities exceeding 50 mm h⁻¹ frequently generate sharp increases in spring discharge.”

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I think it is ok now.

Author Response

Comments : I think it is ok now.

Response: We sincerely thank the reviewer for their time and positive feedback. We are pleased to hear that the revised manuscript is acceptable. We appreciate your helpful comments during the review process, which greatly improved the quality of our work.

Reviewer 2 Report

Comments and Suggestions for Authors

There are some improvements that could be made, such as presenting the prediction accuracies for Mode 1 to 6 differently. The uncertainty analysis could also be improved by incorporating the "data limitation", such as increased data gaps, on and off interpolation flags, etc. The t+6 uncertainties could also include the precipitation forecast. After the objective of the model is clearly defined, the improvement would be easy, and I do not need to review this paper again before publication.

Author Response

Comments : There are some improvements could be made, such as presenting the prediction accuracies for Mode 1 to 6 differently. The uncertainty analysis could also be improved by incorporating the "data limitation", such as increased data gaps, on and off interpolation flags etc. The t+6 uncertainties could also include precipitation forecast. After the objective of the model is clearly defined, the improvement would be easy, and I do not need to review this paper again before publication.

Response: We sincerely thank the reviewer for this very helpful suggestion. (1) In our model, Mode1 to Mode6 represent the intrinsic frequency components extracted via Variational Mode Decomposition (VMD), and they are used as input features alongside rainfall data. Each mode captures different temporal characteristics of the spring discharge signal, with Mode1 reflecting the highest frequency and Mode6 the lowest. We agree that evaluating the prediction accuracy for each individual mode would provide more detailed insight into model performance across different hydrological scales. In the current version, we focus on predicting the total discharge rather than each VMD mode separately. However, this suggestion highlights an important direction for future work. (2) Regarding data limitation, we note that approximately 6.9% of the spring discharge data were missing and subsequently interpolated. Our results show that forecast uncertainty tends to increase in periods with more missing data, which aligns with expectations. This reinforces the importance of data quality and completeness in uncertainty quantification. In future studies, we plan to explore more detailed analyses on the impact of increasing data gaps, missing value patterns, and their interactions with forecasting uncertainty.

Once again, we greatly appreciate the reviewer’s valuable suggestions.
