Open Access Article

Peer-Review Record

An Advanced Ensemble Machine Learning Framework for Estimating Long-Term Average Discharge at Hydrological Stations Using Global Metadata

Water 2025, 17(14), 2097; https://doi.org/10.3390/w17142097

by Alexandr Neftissov¹, Andrii Biloshchytskyi²,³, Ilyas Kazambayev¹, Serhii Dolhopolov³ and Tetyana Honcharenko³,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Grzegorz Pęczkowski

Submission received: 26 May 2025 / Revised: 7 July 2025 / Accepted: 10 July 2025 / Published: 14 July 2025

(This article belongs to the Section New Sensors, New Technologies and Machine Learning in Water Sciences)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The authors presented an article entitled “An Advanced Ensemble Machine Learning Framework for Predicting Long-Term Average Discharge at Hydrological Stations”. The article presents a multi-layered system (NN, XGBoost, LightGBM, CatBoost, Meta-Ensemble) that estimates the log-transformed long-term average discharge using GRDC station data. The system is technically sound and well-structured. However, it has some shortcomings regarding scientific depth, innovation claims, and application scenarios.
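As an illustration of the setup summarized above, the following is a minimal sketch of combining base-model predictions in log space and converting the result back to discharge. It rests on several assumptions that do not come from the manuscript: the `log1p`/`expm1` pair stands in for whatever logarithmic transform the authors used, the meta-ensemble is reduced to a fixed weighted average (the actual meta-learner may be a trained model), and the function names, weights, and discharge values are hypothetical.

```python
import math


def to_log(q_m3s):
    """Map discharge (m^3/s) to log space; log1p keeps zero flows finite."""
    return math.log1p(q_m3s)


def from_log(y):
    """Invert to_log, returning discharge in m^3/s."""
    return math.expm1(y)


def blend(log_predictions, weights):
    """Weighted average of base-model predictions made in log space."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, log_predictions)) / total


# Hypothetical log-space outputs of three base models for one station.
base_log_preds = [to_log(95.0), to_log(110.0), to_log(102.0)]
weights = [0.5, 0.3, 0.2]  # illustrative values, not taken from the paper

ensemble_log = blend(base_log_preds, weights)
ensemble_q = from_log(ensemble_log)  # back-transformed estimate in m^3/s
print(round(ensemble_q, 1))
```

Averaging in log space corresponds to a weighted geometric mean of the shifted discharges, which is less sensitive to a single very large base prediction than an arithmetic mean would be.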

I think the paper is not yet well organized or appropriate for Water, but it will be ready for publication after a major revision. Having reviewed the article “An Advanced Ensemble Machine Learning Framework for Predicting Long-Term Average Discharge at Hydrological Stations”, I offer the following recommendations for improvement:

GRDC data are station-based metadata, and time series are not used. Although the article promises to monitor dynamic hydrological systems, only a static discharge feature (long-term average discharge) is estimated. This contradicts the claims made about a real-time monitoring system. This conceptual ambiguity should be corrected, and the distinction between "static discharge estimation" and "real-time monitoring" should be made clear.

The model results are evaluated only on GRDC metadata. Still, the article presents it as a structure that can be integrated into real-time systems (IoT, sensor networks, SCADA systems, etc.). However, this integration is not supported by any example. Integration and usage scenarios should be presented through a case study taken from a real field (e.g., a river from Kazakhstan, data flow, real system).

System architecture and algorithm flows are multi-layered, but abstractly explained in a way disconnected from real data. Architectural diagrams should be simplified and supported with concrete examples. Otherwise, it becomes difficult for the reader to understand.

Why the ensemble model is so successful has not been analyzed in detail. Model explanation methods such as SHAP and LIME have not been mentioned at all. At least the effect of the most effective features (area, lat, sub_reg, etc.) on the model output should be shown with a technique such as SHAP.

The 2024–2025 literature is relatively incomplete, especially in the discharge prediction area, where there are many new articles on the use of deep learning.
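The feature-attribution request above can be made concrete with a small sketch. Note that this is not SHAP or LIME: it uses permutation importance, a simpler model-agnostic technique, so that the example stays self-contained. The toy model, the two feature columns, and all function names are hypothetical stand-ins, not the authors' model or data.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, n_features, seed=0):
    """Increase in MSE when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(r) for r in rows])
    scores = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        scores.append(mse(y, [predict(r) for r in shuffled]) - baseline)
    return scores


def toy_model(row):
    """Hypothetical fitted model that depends on feature 0 only."""
    return 2.0 * row[0]


# Two hypothetical feature columns; the second is ignored by the toy model.
rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [toy_model(r) for r in rows]

scores = permutation_importance(toy_model, rows, y, n_features=2)
print(scores)
```

A feature the model ignores scores zero, while shuffling a feature the model relies on raises the error. SHAP goes further by attributing each individual prediction to features, which is what the reviewer asks for.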

*** Authors must consider them properly before submitting the revised manuscript. A point-by-point reply is required when the revised files are submitted.

Comments on the Quality of English Language

The English could be improved to more clearly express the research.

Author Response

Response to Reviewer 1 Comments

We are deeply grateful to the reviewer for their meticulous analysis and highly valuable feedback. The reviewer’s comments have been instrumental in helping us identify and address several critical weaknesses in the manuscript, particularly the conceptual ambiguity between our stated aims and the actual research conducted, as well as the need for deeper scientific analysis.

In response to this invaluable feedback, we have undertaken a major revision of the manuscript. The core of this revision involves a fundamental reframing of the paper’s focus. We have shifted the narrative away from a conceptual “real-time monitoring system” to more accurately reflect our primary contribution: the development of a robust, high-accuracy machine learning framework for estimating the static, long-term average (LTA) discharge from station metadata.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper presents a novel framework for predicting long-term average discharge at hydrological stations by leveraging machine learning techniques and global datasets. Different model configurations are used and presented. Despite the importance of the topic, the manuscript in its current form lacks clarity and coherence. The reader is overwhelmed with dense data-related information, yet it is difficult to understand that the target variable is discharge, used here as a proxy for water resource availability. One of the gaps the authors aim to address is computational efficiency, but they fail to clearly explain the advantages and computational relevance of the proposed framework. The results, as presented, are difficult to interpret; key steps in the machine learning modeling pipeline are either missing or unclear. For these reasons, I recommend major revisions before the manuscript can be considered for publication.

Other minor suggestions:

The section describing the dataset is too long and lacks essential information: What exactly are the data about? Are they river flows, WDN (Water Distribution Network) flows, or something else? A table summarizing the types of data would help clarify the ML input data selection process.

The quality of the figures is very low — in particular, I cannot read the X and Y labels in the heatmaps.

Although LTA (Long-Term Average) is the target metric, it is never explicitly defined in the manuscript.

Style suggestions:

Line 37: Please also refer to “Detailed simulation of storage hydropower systems in large Alpine watersheds” and “A review of flood management: from flood control to flood resilience.”
The sentence “With increasing climate variability and extreme weather events” and “Climate variability can outweigh the influence of climate mean changes for extreme precipitation under global warming” should be supported with the reference: “Analysis of high streamflow extremes in climate change studies: How do we calibrate hydrological models?”
Line 53: I do not think a new paragraph is necessary here, as the sentence continues the idea developed in the previous paragraph.
Lines 92–93: The transition between the paragraph ending at line 92 and the one starting at line 93 should be improved. As it stands, there is an abrupt jump to the discussion of computational and machine learning methods used to support remote sensing infrastructure control.

Author Response

Response to Reviewer 2 Comments

We extend our sincere gratitude to the reviewer for their thorough evaluation and constructive feedback on our manuscript. We acknowledge that the initial version lacked clarity in its structure and focus, leading to difficulties in understanding the core objectives and contributions of our research. In response to the valuable comments, we have undertaken a major revision of the manuscript. The key changes include:

A fundamental restructuring of the paper – we have condensed the overly detailed data section and integrated it into a more logically structured “Materials and Methods” section. This has significantly improved the flow and readability.
Reframing the core contribution – we have shifted the focus from a conceptual “real-time monitoring system” to the paper’s actual achievement: the development and validation of a high-performance machine learning framework for estimating a crucial, static hydrological characteristic – the Long-Term Average (LTA) discharge.
Enhancing clarity and detail – we have explicitly defined key terms, provided clear justifications for methodological choices, and expanded the analysis to make the results more interpretable.
Improving all visuals – all figures and tables have been completely redone to ensure high resolution and legibility, and their number has been optimized for conciseness.

We believe these comprehensive revisions have significantly improved the manuscript’s scientific rigor, clarity, and overall quality. As a result of this restructuring and by making the text more concise, we have also successfully reduced the manuscript’s length to 29 pages to avoid overwhelming the reader, as rightly pointed out by the reviewer. Please find our detailed point-by-point responses below.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Dear Authors

In the work "An Advanced Ensemble Machine Learning Framework for Predicting Long-Term Average Discharge at Hydrological Stations", the authors present a hydrological model based on global data. It should be noted that the analysis is based on monitoring data from stations located on different continents and in different climatic zones, and the main source of data is the Global Runoff Data Centre (GRDC) station catalogue. The introduction gives an overview of the topic but does not specify the objectives of the work, so the content is too general. As it stands, the reader finds more detail only in the last chapter, the summary. This part should be improved. I believe that the content of the second chapter (L. 163-361) should be included in the Methodology part (Part 3 of the work). Part 2, L. 233-234, gives the percentage of missing data by data attribute (monthly and daily, from 44 to 19%). However, the description does not state precisely what was done about it. The methodology part lists the solutions and models used for the analyses in each layer. Although the goals of using the hybrid ensemble method can be understood, there is no information on why XGBoost, for example, was chosen. The same applies to the Isolation Forest anomaly detection algorithm. These choices should be explained.

To sum up, the manuscript should have the strong features of a scientific work; this is certainly a condition for future publication in the journal Water, and in many others. In its current form, the work reads as a research or grant report. I believe that the whole should be rewritten. The work is too extensive, so in this form it may not attract sufficient interest. Part 2 can certainly be included in the methodology, and the content reduced to the essentials. In addition to systematising the whole, clear goals should be stated in a way the reader can understand. Regarding the study as a whole, despite the hybrid method used, I have doubts about whether the results achieved are satisfactory, particularly for global data from different climatic zones. Furthermore, the measurement methods used at these stations may differ significantly, and the results may have been obtained with different measurement accuracy. Their reliability and quality as a basis for future models could therefore be in doubt.

Additional note: most of the figures in the work are not legible, in particular the axis descriptions, scales, etc. This is probably due to the loss of resolution when they were scaled. This is certainly the case in the formatted PDF file. This should be corrected.

Author Response

Response to Reviewer 3 Comments

We would like to express our sincere appreciation to the reviewer for their insightful comments and valuable suggestions for improving our manuscript. We agree with the assessment that the original version lacked the clear structure and focus expected of a scientific article, and that certain methodological choices and limitations were not sufficiently addressed.

In response, we have performed a major revision to transform the manuscript from what could be perceived as a “research report” into a focused, well-structured scientific paper. The main revisions include:

Following the reviewer’s excellent advice, we have integrated the extensive data description (former Section 2) into the “Materials and Methods” section. This has streamlined the manuscript and improved its logical flow.
The Introduction has been revised to conclude with a clear and concise list of the study’s objectives, ensuring the reader understands the purpose of the work from the outset.
We have added explicit justifications for our choice of machine learning models (e.g., XGBoost) in the Methodology section.
We have expanded the Discussion section to more thoroughly address the limitations and potential uncertainties arising from the use of a heterogeneous global dataset, as rightly pointed out by the reviewer.

All figures have been recreated in high resolution to ensure they are fully legible. As a result of these changes, the manuscript has also been significantly condensed to 29 pages.

We are confident that these revisions address all the points raised and have substantially improved the manuscript’s scientific quality, making it more suitable for publication in Water.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Thank you for your reply. I believe the paper is now of better quality.

Comments on the Quality of English Language

The English could be improved to more clearly express the research.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have addressed all my comments. The manuscript is now worthy of publication.

Reviewer 3 Report

Comments and Suggestions for Authors

Dear Authors,

Thank you for considering my comments and suggestions in the paper "An Advanced Ensemble Machine Learning Framework for Predicting Long-Term Average Discharge at Hydrological Stations." I have no other comments. In my opinion, the paper could be accepted for further publication in its current form.
