Review Reports - Scalability and Computational Performance of an Ecohydrological Model Using Machine Learning-Based Prediction

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript addresses an important and timely topic, “Scalability and Computational Performance of the TETIS Eco-hydrological Model Using Machine Learning-Based Prediction”, by systematically evaluating the computational scalability and performance of the TETIS distributed hydrological model and proposing a machine-learning-based execution-time predictor. The work is technically sound, well-documented, and based on an extensive experimental dataset. The integration of machine learning for execution-time prediction is innovative and practically relevant. However, the manuscript is overly long and at times excessively descriptive, which reduces clarity and readability. Several methodological assumptions require clearer justification, and the generalizability of the findings should be discussed more explicitly.

General Comments and Suggestions

The study is strongly model-specific (TETIS v9.1), yet the manuscript occasionally implies broader applicability. The authors should clearly distinguish which findings are specific to the TETIS implementation and which may be transferable to other distributed hydrological models.
The rationale for selecting spatial resolutions (200–5000 m), temporal configurations (50–15,000 time steps), and gauge-density scenarios is not sufficiently justified. Authors should clarify whether these ranges reflect realistic operational or research applications.
Only two mesoscale catchments were used in the experimental design. The representativeness of these basins in terms of climate, geomorphology, and drainage-network complexity should be discussed as a limitation.
The statement that temporal resolution is of secondary relevance because computational cost depends mainly on the number of time steps is debatable. Different temporal resolutions may affect internal process activation, I/O behavior, and numerical stability. This assumption should be better justified or reformulated more cautiously.
The machine-learning model does not consider several potentially relevant system-level variables (e.g., disk I/O, memory bandwidth, background system load). The authors should clarify that the predictive tool estimates configuration-level performance rather than full system-level performance.
Prediction errors for short execution times are extremely large (>900%). The manuscript should explicitly state that the tool is unreliable for very short runtimes and define a practical lower applicability threshold.

Specific comments & suggestions

The Results and Discussion sections contain repeated descriptions and could be substantially condensed.
Some figures (e.g., Figures 5 and 6) are difficult to read due to information density. Font sizes and legends should be improved for clarity.
Terminology such as “execution time,” “runtime,” “temporal resolution,” and “number of time steps” should be used consistently throughout the manuscript.
Execution-time units vary between seconds, minutes, hours, and days. Units should be standardized within sections and figures.
Several table captions are unclear or repetitive and should be revised for precision.
Minor grammatical and stylistic issues are present throughout the manuscript. A careful language revision is recommended.

Comments on the Quality of English Language

A careful language revision is recommended.

Author Response

Response: We thank the Reviewer for the careful evaluation of the manuscript and for the positive assessment of its relevance, technical soundness, and practical contribution. We particularly appreciate the recognition of the experimental design and the integration of machine-learning techniques for execution-time prediction. In response, we have substantially revised the manuscript to reduce excessive descriptive content, streamline the Results section, and relocate several detailed tables to the Appendices (Lines 199-279, 509-544). In addition, the Materials and Methods section was reorganized to foreground a general experimental and machine-learning framework applicable to distributed hydrological models, with TETIS v9.1 presented explicitly as a case study (Lines 83-198). The Discussion was expanded to clearly delimit model-specific findings and transferable methodological elements (Lines 346-355). We believe these revisions significantly improve readability, strengthen methodological clarity, and appropriately constrain the scope of inference.

General Comments and Suggestions:

Comments 1: The study is strongly model-specific (TETIS v9.1), yet the manuscript occasionally implies broader applicability. The authors should clearly distinguish which findings are specific to the TETIS implementation and which may be transferable to other distributed hydrological models.

Response 1: We thank the Reviewer for this important comment. We agree that the study is model-specific, as all experiments and the predictive tool were developed explicitly for TETIS v9.1. To avoid overstating the scope of inference, we have revised the manuscript to clearly distinguish between results specific to the TETIS implementation and elements that may be transferable. This distinction is now explicitly addressed in the Discussion (Pages 10-11, Lines 346-355), where we clarify that the numerical performance results apply exclusively to TETIS, while the experimental design and analysis framework are presented as a transferable starting point for scalability studies of other distributed hydrological models (e.g., SWAT, mHM). In addition, to address this concern, the Materials and Methods section (Pages 2-6, Lines 83-198) was reorganized to emphasize a general, model-agnostic methodological framework before introducing TETIS as a controlled case study. In particular, the workflow (new Figure 1) now refers to generic steps, such as model configuration and model run, and includes a generic “Runtime Prediction Model” component as an example applicable to other hydrological models (Page 6, Lines 192-198). This restructuring reinforces the replicability of the framework while clearly delimiting model-specific findings.

Comments 2: The rationale for selecting spatial resolutions (200-5000 m), temporal configurations (50-15,000 time steps), and gauge-density scenarios is not sufficiently justified. Authors should clarify whether these ranges reflect realistic operational or research applications.

Response 2: Thank you for your comment. The justification for the selected spatial resolutions, temporal configurations, and gauge-density scenarios has been added to Section 2.1 Experimental design (Page 2, Lines 83-118). Specifically, the spatial resolutions (200-5000 m) are based on previous hydrological applications [16–18,28,29] (e.g., Droppers et al. [14]; Cortés-Torres et al. [30,31]) and are also representative of the resolutions of widely used hydrological and Earth observation products employed in distributed hydrological modeling. The temporal configurations (50-15,000 time steps) reflect typical operational and research setups, ranging from event-based simulations to multi-decadal assessments [16–18,28], following Beneyto et al. [21] and Hernández-Sosa et al. [32]. The gauge-density scenarios are motivated by both sparsely instrumented basins and data-rich contexts enabled by satellite-derived inputs, as documented in García et al. [33], Gomis-Cebolla [34], Güiza-Villa et al. [25], and Droppers et al. [14].
These clarifications emphasize that the tested ranges are representative of realistic operational and research applications in distributed hydrological modeling.

Comments 3: Only two mesoscale catchments were used in the experimental design. The representativeness of these basins in terms of climate, geomorphology, and drainage-network complexity should be discussed as a limitation.

Response 3: Thank you for this comment. This limitation is now explicitly addressed in Section 4. Discussion (Page 11, Lines 380-389). We clarify that climatic conditions do not directly affect runtime, as climate variables are not part of the simulation algorithm. Regarding geomorphology and drainage-network complexity, we explain that in TETIS, horizontal transport is computed for all grid cells, independently of network complexity, with differences arising only from the mathematical operations applied to hillslopes, gullies, and channels. We acknowledge that potential second-order effects related to geomorphological complexity could influence runtime and note that assessing these effects lies beyond the scope of the present study.

Comments 4: The statement that temporal resolution is of secondary relevance because computational cost depends mainly on the number of time steps is debatable. Different temporal resolutions may affect internal process activation, I/O behavior, and numerical stability. This assumption should be better justified or reformulated more cautiously.

Response 4: Thank you for this comment. The statement has been reformulated and clarified in Section 2.3 Eco-Hydrological Model: TETIS (Page 5, Lines 178-184). The revised text now emphasizes that computational cost is primarily driven by the number of time steps, while also acknowledging that temporal resolution may influence internal process activation and numerical stability. The language has been made more cautious to avoid overgeneralization and to better reflect the model’s numerical behavior.

Comments 5: The machine-learning model does not consider several potentially relevant system-level variables (e.g., disk I/O, memory bandwidth, background system load). The authors should clarify that the predictive tool estimates configuration-level performance rather than full system-level performance.

Response 5: Thank you for this comment. This limitation is now explicitly stated in Section 4. Discussion (Pages 11-12, Lines 399-406). We clarify that the predictive tool provides configuration-level runtime estimates based on readily available hardware characteristics (e.g., RAM capacity, core count, and clock speed) and does not account for system-level factors such as disk I/O performance, memory bandwidth, or background system load. This distinction is clearly emphasized to avoid misinterpretation of the tool’s scope.

Comments 6: Prediction errors for short execution times are extremely large (>900%). The manuscript should explicitly state that the tool is unreliable for very short runtimes and define a practical lower applicability threshold.

Response 6: Thank you for this comment. This limitation is now explicitly addressed in Section 4. Discussion (Page 12, Lines 407-432). We clarify that the predictive tool is not reliable for very short execution times, where small absolute deviations translate into large relative errors due to system-level effects (e.g., disk access, operating-system scheduling, and timing resolution). A practical lower applicability threshold has now been defined, and the tool is recommended primarily for simulations that exceed it.

Specific comments & suggestions:

Comments 7: The Results and Discussion sections contain repeated descriptions and could be substantially condensed.

Response 7: Thank you for this suggestion. The Results section (Pages 6-10, Lines 199-333) has been substantially condensed to reduce redundancy in descriptions and terminology, and the Discussion section (Pages 10-12, Lines 334-449) was restructured and strengthened to focus on interpretation and implications. Several tables and detailed outputs were moved to the Appendices to improve readability and conciseness.

Comments 8: Some figures (e.g., Figures 5 and 6) are difficult to read due to information density. Font sizes and legends should be improved for clarity.

Response 8: Thank you for this comment. Figures 5 and 6 were revised to improve readability by increasing the font sizes of the axes and labels, enlarging marker sizes, and clarifying the legends. In addition, figure captions were revised to be more descriptive and to facilitate interpretation of the graphical information.

Comments 9: Terminology such as “execution time,” “runtime,” “temporal resolution,” and “number of time steps” should be used consistently throughout the manuscript.

Response 9: Thank you for this comment. Terminology has been standardized throughout the manuscript. The term “runtime” is now used consistently, and the distinction between “temporal resolution” and “number of time steps” is explicitly defined and applied uniformly across all sections.

Comments 10: Execution-time units vary between seconds, minutes, hours, and days. Units should be standardized within sections and figures.

Response 10: Thank you for this comment. Runtime units have been standardized throughout the manuscript and figures, with minutes used as the common unit for reporting runtime.

Comments 11: Several table captions are unclear or repetitive and should be revised for precision.

Response 11: Thank you for this comment. Table captions were revised throughout the manuscript to improve clarity, avoid repetition, and more precisely describe each table's content.

Comments 12: Minor grammatical and stylistic issues are present throughout the manuscript. A careful language revision is recommended.

Response 12: Thank you for this comment. The manuscript has been carefully revised to correct grammatical and stylistic issues and improve overall clarity and consistency.

Additional comments sent via Assistant Editor:

Comments 13: In particular, the authors are encouraged to include a concise summary table comparing the computational characteristics of the Topolco, Hantec, and hydrological simulation components.

Response 13: Thank you for this suggestion. A concise summary table comparing the computational characteristics of the Topolco, Hantec, and hydrological simulation components has been added to Section 2.3 Eco-Hydrological Model: TETIS (New Table 3, page 5). The table focuses on aspects known a priori, such as algorithmic structure and core usage.

Comments 14: Provide practical guidance on recommended model configurations for operational forecasting versus large ensemble experiments.

Response 14: Thank you for this comment. Practical guidance on recommended model configurations for operational forecasting and large ensemble experiments has been added to Section 4. Discussion (Page 12, Lines 424-432), where the results are translated into concrete recommendations based on runtime constraints and computational scalability.

Comments 15: Add a brief comparison between the Random Forest approach and simpler predictive models to justify the use of machine learning.

Response 15: Thank you for this suggestion. A brief clarification has been added to Section 2.1 (Page 3, Lines 119-126) to indicate that preliminary tests with simpler predictive models were conducted at an earlier stage of this research (Cortés-Torres et al. [38]). Based on this initial assessment, Random Forest was selected for its superior ability to capture nonlinear relationships. To maintain focus and conciseness, the detailed results of the preliminary comparison were not included in the manuscript.

Comments 16: Explicitly discuss the applicability of the proposed tool for real-time or early-warning hydrological applications.

Response 16: Thank you for this comment. The applicability of the proposed tool to real-time and early-warning hydrological applications is now explicitly discussed in Section 4. Discussion (Page 11, Lines 390-395) and Section 5. Conclusions (Page 13, Lines 453-465), where runtime constraints are linked to feasible spatiotemporal configurations and to operational decision-making under strict time windows.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper, entitled “Scalability and Computational Performance of the TETIS Eco-hydrological Model Using Machine Learning-Based Prediction,” addresses a clear and well-motivated gap in the hydrological modeling literature, where computational performance is rarely quantified in a comprehensive manner.

The paper would benefit from clearer theoretical justification, improved clarity and conciseness as emphasized in the following comments:

The Random Forest models show very high R² values, yet large mean percentage errors are reported. Please expand the discussion part by mentioning the reason behind this behavior.
The reviewer believes that the findings of this study can be valuable for future computational and hydrological modeling studies. Please clarify the extent to which the results and methodological framework are generalizable beyond the TETIS model.
The methodological workflow (Figure 2) would benefit from further clarification. Please either generate an additional figure or expand the “IV. Predictive Tool Development” component of Figure 2 to better illustrate the structure, inputs, and outputs of the predictive tool.
Lines 377–391 describe strong differences in prediction error across execution-time ranges, but the underlying reasons for this quantitative behavior are not explicitly discussed. Please provide a clearer explanation of why short and long simulations exhibit such different predictive performance.

The reviewer believes that the study is original, well-designed, and scientifically meaningful. However, addressing the issues above will significantly strengthen the paper’s credibility and impact.

Author Response

This paper, entitled, “Scalability and Computational Performance of the TETIS Eco-hydrological Model Using Machine Learning-Based Prediction,” addresses a clear and well-motivated gap in hydrological modeling literature, where computational performance is rarely quantified in a comprehensive manner. The paper would benefit from clearer theoretical justification, improved clarity and conciseness as emphasized in the following comments:

Response: We thank the Reviewer for the positive assessment of the study's relevance and motivation. In response to these general comments, the manuscript has been carefully revised to improve theoretical justification, clarity, and conciseness. Specifically, we have strengthened the methodological rationale in the Materials and Methods section (Pages 2-6, Lines 83-198), condensed the Results section (Pages 6-10, Lines 199-333) to reduce redundancy, and expanded and reorganized the Discussion (Pages 10-12, Lines 334-449) to better contextualize the findings and their implications. These revisions were made in response to the detailed comments below and aim to improve readability while preserving the study's technical rigor.

General Comments and Suggestions:

Comments 1: The Random Forest models show very high R² values, yet large mean percentage errors are reported. Please expand the discussion part by mentioning the reason behind this behavior.

Response 1: Thank you for this comment. This issue is now explicitly discussed in Section 4. Discussion (Page 12, Lines 407-420). We clarify that the coexistence of high R² values and large mean percentage errors is primarily associated with very short execution times, where small absolute deviations translate into large relative errors due to system-level effects such as disk access, operating system scheduling, and timing resolution. A practical lower applicability threshold has now been defined, and the predictive tool is recommended primarily for simulations exceeding this threshold, where prediction errors decrease substantially.
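As a purely illustrative sketch (the runtimes below are hypothetical values chosen for demonstration, not data from the manuscript, and the mean absolute percentage error is assumed here as the percentage-error metric), the following Python snippet shows how a predictor can reach an R² close to 1 while its mean percentage error is dominated by a few very short runs.

```python
"""Illustrative only: hypothetical runtimes (minutes), not results from the study."""

# Hypothetical observed and predicted runtimes in minutes: three very short
# runs followed by progressively longer ones.
OBSERVED_MIN = [0.05, 0.1, 0.2, 30.0, 120.0, 600.0, 2400.0]
PREDICTED_MIN = [0.5, 0.6, 0.9, 31.0, 118.0, 610.0, 2380.0]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_abs_pct_error(observed, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(o - p) / o * 100.0 for o, p in zip(observed, predicted)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Keep only runs of at least one minute to compare against the full set.
    long_pairs = [(o, p) for o, p in zip(OBSERVED_MIN, PREDICTED_MIN) if o >= 1.0]
    long_obs, long_pred = zip(*long_pairs)
    print(f"R^2 (all runs):       {r_squared(OBSERVED_MIN, PREDICTED_MIN):.5f}")
    print(f"MAPE (all runs):      {mean_abs_pct_error(OBSERVED_MIN, PREDICTED_MIN):.1f} %")
    print(f"MAPE (runs >= 1 min): {mean_abs_pct_error(long_obs, long_pred):.1f} %")
```

Squared residuals are dominated by the long runs, whose absolute errors are small compared with the total variance, so R² stays close to 1. The percentage error instead weights each run by the inverse of its own runtime, so a deviation of a fraction of a minute on a sub-minute run outweighs all the long runs combined.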

Comments 2: The reviewer believes that the findings of this study can be valuable for future computational and hydrological modeling studies. Please clarify the extent to which the results and methodological framework are generalizable beyond the TETIS model.

Response 2: Thank you for this comment. We have revised the manuscript to more clearly distinguish between results specific to the TETIS v9.1 implementation and those transferable beyond this model. To address this point structurally, the Materials and Methods section (Pages 2-6, Lines 83-198) has been reorganized so that the general methodological framework (experimental design and machine-learning-based performance prediction) is presented first, independently of any specific hydrological model. The description of TETIS is now introduced afterward as a controlled case study used to evaluate this framework. In addition, Section 4. Discussion (Pages 10-11, Lines 346-355) explicitly clarifies that the numerical performance results apply exclusively to TETIS, while the experimental design and analytical framework can be adapted to other distributed hydrological models with comparable data structures and configuration schemes (e.g., SWAT or mHM). These revisions reduce ambiguity in scope and emphasize the study's methodological contribution.

Comments 3: The methodological workflow (Figure 2) would benefit from further clarification. Please either generate an additional figure or expand the “IV. Predictive Tool Development” component of Figure 2 to better illustrate the structure, inputs, and outputs of the predictive tool.

Response 3: Thank you for this comment. Figure 2 (New Figure 1, Page 4) has been revised to improve clarity. The component “IV. Prediction tool” has been expanded to explicitly illustrate its structure, including its main inputs and outputs. In addition, the workflow has been reformulated in a more general manner, making it applicable to distributed hydrological models, with TETIS presented as a specific case study to evaluate the proposed framework.

Comments 4: Lines 377-391 describe strong differences in prediction error across execution-time ranges, but the underlying reasons for this quantitative behavior are not explicitly discussed. Please provide a clearer explanation of why short and long simulations exhibit such different predictive performance.

Response 4: Thank you for this comment. This issue is now explicitly addressed in Section 4. Discussion (Page 12, Lines 407-420). For short simulations, small absolute deviations, often influenced by system-level effects such as disk I/O, operating-system scheduling, and timing resolution, translate into large relative errors. In contrast, for longer simulations, these absolute deviations become negligible relative to the total runtime, leading to substantially lower percentage errors. Based on this analysis, a practical lower applicability threshold has been defined, and the predictive tool is recommended primarily for simulations exceeding this threshold.
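The mechanism can be written as a one-line relation: a fixed absolute deviation d produces a relative error of 100·d/T at runtime T, so requiring the error to stay at or below a target e gives T ≥ 100·d/e. The Python sketch below illustrates this under stated assumptions: the deviation of 0.5 min, the 10 % target, and all function names are hypothetical and are not the threshold or procedure used in the manuscript.

```python
"""Illustrative only: the deviation and target values below are hypothetical."""

# Hypothetical absolute deviation (minutes) attributed to system-level effects
# such as disk access, operating-system scheduling, and timer resolution.
SYSTEM_DEVIATION_MIN = 0.5


def relative_error_pct(abs_deviation_min, runtime_min):
    """Relative error (%) that a fixed absolute deviation produces at a given runtime."""
    return 100.0 * abs_deviation_min / runtime_min


def min_runtime_for_target_error(abs_deviation_min, target_error_pct):
    """Smallest runtime T (minutes) with 100 * d / T <= e, i.e. T >= 100 * d / e."""
    if target_error_pct <= 0:
        raise ValueError("target_error_pct must be positive")
    return 100.0 * abs_deviation_min / target_error_pct


def is_within_applicability(predicted_runtime_min, threshold_min):
    """True if a predicted runtime lies at or above the lower applicability threshold."""
    return predicted_runtime_min >= threshold_min


if __name__ == "__main__":
    # Relative error of the same absolute deviation across increasing runtimes.
    for runtime in (0.1, 1.0, 10.0, 100.0, 1000.0):
        err = relative_error_pct(SYSTEM_DEVIATION_MIN, runtime)
        print(f"runtime {runtime:>7.1f} min -> relative error {err:8.2f} %")

    # Derive a hypothetical threshold and flag two example predictions.
    threshold = min_runtime_for_target_error(SYSTEM_DEVIATION_MIN, target_error_pct=10.0)
    for predicted in (2.0, 45.0):
        status = "within" if is_within_applicability(predicted, threshold) else "below"
        print(f"predicted {predicted:.1f} min is {status} the {threshold:.1f} min threshold")
```

Because the relative error decays as 1/T, each tenfold increase in runtime reduces it tenfold, which is consistent with percentage errors concentrating in the shortest simulations.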

Comments 5: The reviewer believes that the study is original, well-designed, and scientifically meaningful. However, addressing the issues above will significantly strengthen the paper’s credibility and impact.

Response 5: We sincerely thank the Reviewer for this positive and encouraging assessment. All comments and suggestions have been carefully addressed in the revised manuscript. In particular, we have improved clarity and conciseness, strengthened the theoretical justification, clarified the scope of applicability, and expanded the Discussion to better contextualize the results and limitations. We believe these revisions have substantially enhanced the manuscript’s credibility and overall impact.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript is too long and overly detailed, even for a technical journal.

Large portions of Sections 2 and 3 read like software documentation or benchmark reports rather than a scientific synthesis.
Many results are repeated across figures, tables, and text (e.g., execution-time trends across hardware).
Appendix A is informative but reinforces the perception of information overload.

Although the study is framed as broadly informative for hydrological modelling, all experiments are conducted on a single model (TETIS v9.1).

Claims regarding “distributed hydrological models” in general should be more carefully qualified.
Differences between conceptual grid-based models (TETIS, mHM) and physics-based models (e.g., ParFlow, MIKE SHE) are not sufficiently acknowledged.

This is not a flaw per se, but the scope of inference is currently overstated.

The RF models achieve very high R² values (>0.99), but:

Prediction errors exceed 200% for short execution times, which undermines claims of uniform robustness.
The discussion acknowledges this but downplays its implications, especially for real-time or small-catchment applications.
Feature importance is reported but not deeply interpreted in terms of model architecture or algorithmic complexity.

Despite the strong technical work, the manuscript could better emphasize:

why computational scalability matters for hydrological science (not only operations),
how execution-time constraints shape scientific inference, uncertainty analysis, and ensemble design.

At present, the manuscript sometimes feels like a high-quality benchmark study rather than a theory-informed modelling contribution.

Some figures (e.g., Figures 5 and A1) are visually dense and difficult to interpret without extended captions.
Repetition of hardware descriptions across sections should be reduced.
Minor language polishing is required (long sentences, heavy nominalization).

Author Response

General Comments and Suggestions:

Comments 1: The manuscript is too long and overly detailed, even for a technical journal.

Response 1: We thank the Reviewer for this comment. In response, the manuscript has been substantially revised to improve clarity and conciseness. Excessive descriptive content has been reduced across all sections, key methodological assumptions have been clarified, and the Discussion has been streamlined. In addition, several detailed tables have been moved to the Appendices to improve the flow and readability of the main text.

Comments 2: Large portions of Sections 2 and 3 read like software documentation or benchmark reports rather than a scientific synthesis.

Response 2: We thank the Reviewer for this observation. Sections 2 and 3 have been revised to reduce software-style descriptions and benchmark-like narration. The text has been synthesized to emphasize the methodological rationale, experimental design choices, and interpretation of results, while purely technical details have been shortened or moved to the Appendices. In addition, the Materials and Methods section (Pages 2-6, Lines 83-198) has been reorganized to foreground the general methodological framework—designed to be applicable to distributed hydrological models in general—with the TETIS model presented as a controlled case study. This revision highlights the novelty and replicability of the proposed approach beyond a single model implementation.

Comments 3: Many results are repeated across figures, tables, and text (e.g., execution-time trends across hardware).

Response 3: Thank you for this comment. We have revised the Results section (Pages 6-10, Lines 199-333) to avoid redundancy between figures, tables, and textual descriptions. Repetitive explanations have been removed, figures have been made more self-explanatory, and the text now focuses on highlighting key trends rather than restating visual information.

Comments 4: Appendix A is informative but reinforces the perception of information overload.

Response 4: Thank you for this comment. Appendix A has been reorganized and renumbered (now Appendix B). Its content is now referenced selectively from Section 3.2, where it supports specific interpretations, while the main text has been streamlined to reduce information overload.

Comments 5: Although the study is framed as broadly informative for hydrological modelling, all experiments are conducted on a single model (TETIS v9.1).

Response 5: Thank you for this comment. The Discussion section has been expanded to explicitly clarify the study's scope. As stated in Section 4. Discussion (Pages 10-11, Lines 346-355), all numerical results and performance patterns are specific to the TETIS v9.1 implementation. However, the experimental design and methodological framework are presented as a transferable baseline that may motivate and support similar scalability analyses for other distributed hydrological models. This clarification narrows the scope of inference while highlighting the study's methodological contribution.

Comments 6: Claims regarding “distributed hydrological models” in general should be more carefully qualified.

Response 6: Thank you for this comment. Throughout the revised manuscript, statements referring to “distributed hydrological models” have been carefully qualified. General claims have been rephrased to clearly distinguish between findings that apply specifically to TETIS and broader methodological considerations that may be relevant to other models. These revisions are mainly reflected in the Introduction (Pages 1-2, Lines 32-81), Materials and Methods (Pages 2-6, Lines 83-198) and Discussion sections (Pages 10-12, Lines 334-449).

Comments 7: Differences between conceptual grid-based models (TETIS, mHM) and physics-based models (e.g., ParFlow, MIKE SHE) are not sufficiently acknowledged.

Response 7: This point is now explicitly addressed in Section 4. Discussion (Pages 10-11, Lines 346-355). We clarify that TETIS belongs to the class of conceptual, grid-based distributed models and that its computational behavior cannot be directly extrapolated to fully physics-based models, which typically involve more complex governing equations and higher computational costs. While these differences are acknowledged, a detailed comparison across model typologies is considered beyond the scope of this study.

Comments 8: This is not a flaw per se, but the scope of inference is currently overstated.

Response 8: We appreciate this clarification. In response, the manuscript has been revised to moderate the scope of inference and avoid overgeneralization. The Discussion (Pages 10-12, Lines 334-449) now explicitly states the limits of applicability of the results, emphasizing that the study provides a model-specific analysis complemented by a methodological framework that may serve as a starting point for future, model-specific scalability studies.

Comments 9: The RF models achieve very high R² values (>0.99), but: Prediction errors exceed 200% for short execution times, which undermines claims of uniform robustness. The discussion acknowledges this but downplays its implications, especially for real-time or small-catchment applications.

Response 9: Thank you for this comment. This limitation is now explicitly addressed in Section 4. Discussion (Page 12, Lines 407-420). We clarify that the predictive tool is not reliable for very short runtimes, where small absolute deviations cause large relative errors due to system-level effects such as disk I/O, operating-system scheduling, and timing resolution. The coexistence of very high R² values and large percentage errors is therefore a consequence of error-metric sensitivity at short runtimes rather than a lack of model robustness. To address this issue, a practical lower applicability threshold has been defined, and the tool is recommended primarily for simulations exceeding this threshold, where prediction errors decrease substantially and estimates become more stable.

Comments 10: Feature importance is reported but not deeply interpreted in terms of model architecture or algorithmic complexity.

Response 10: Thank you for this comment. A concise summary table comparing the computational characteristics of the Topolco, Hantec, and hydrological simulation components has been added to Section 2.3 (New Table 3, page 5). This table focuses on aspects known a priori, such as the algorithmic structure and core usage, which directly explain the observed patterns of feature importance. A deeper interpretation of feature importance in terms of internal algorithmic complexity was not pursued, as the primary objective of the study is to characterize computational scalability and runtime behavior rather than to analyze software architecture details. Quantitative performance differences are instead examined and discussed in the Results section.

Comments 11: Despite the strong technical work, the manuscript could better emphasize why computational scalability matters for hydrological science (not only operations).

Response 11: Thank you for this comment. The Discussion section has been revised to explicitly emphasize the scientific relevance of computational scalability beyond operational use. In Section 4 (Pages 10-11, Lines 346-398), we now discuss how runtime constraints affect experimental design, ensemble size, uncertainty analysis, and the balance between model complexity and inferential robustness.

Comments 12: how execution-time constraints shape scientific inference, uncertainty analysis, and ensemble design.

Response 12: This aspect has been explicitly addressed in the revised Discussion section (Pages 10-11, Lines 346-355), where we clarify how runtime constraints limit feasible ensemble sizes, sensitivity analyses, and multi-scenario experiments, thereby shaping scientific inference and uncertainty quantification.

Comments 13: At present, the manuscript sometimes feels like a high-quality benchmark study rather than a theory-informed modelling contribution.

Response 13: We appreciate this observation. While the study is partly motivated by benchmark-style analyses, the manuscript has been revised to strengthen its theory-informed and methodological contribution. In particular, the Materials and Methods section (Pages 2-6, Lines 83-198) has been reorganized to foreground a general experimental and analytical framework for assessing computational scalability in distributed hydrological models, with TETIS used as a controlled case study. As further discussed in Section 4 (Discussion, Pages 10-12, Lines 334-449), the contribution of this work lies in providing a reproducible framework that links model configuration, computational constraints, and scientific inference, rather than merely presenting a benchmark exercise.

Comments 14: Some figures (e.g., Figures 5 and A1) are visually dense and difficult to interpret without extended captions.

Response 14: Figures 5 and A1 (now B1) have been revised to improve readability by increasing font and marker sizes and adjusting contrast. Captions and legends were also expanded and clarified to better support figure interpretation.

Comments 15: Repetition of hardware descriptions across sections should be reduced.

Response 15: The manuscript has been revised to reduce repetitive hardware descriptions. Hardware characteristics are now introduced concisely and referenced consistently across sections, avoiding unnecessary repetition.

Comments 16: Minor language polishing is required (long sentences, heavy nominalization).

Response 16: Thank you for this comment. The manuscript has undergone careful language revision to reduce sentence length, limit heavy nominalization, and improve clarity, consistency, and readability.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Upon reviewing the authors’ responses to the previous comments, the reviewer believes that the concerns have been adequately addressed and recommends accepting the revised version.

Author Response

Comment: Upon reviewing the authors’ responses to the previous comments, the reviewer believes that the concerns have been adequately addressed and recommends accepting the revised version.

Response: We sincerely thank the Reviewer for the careful re-evaluation of the revised manuscript and for the positive assessment. We are pleased that the revisions have adequately addressed the previous concerns, and we appreciate the recommendation to accept the revised version.

Reviewer 3 Report

Comments and Suggestions for Authors

Version 2 represents a substantial improvement over Version 1 in terms of methodological clarity, experimental rigor, and reproducibility. The manuscript now presents a clearly articulated computational scalability framework, supported by a large, well-designed experiment set and a transparent machine-learning–based runtime prediction approach. However, some weaknesses remain, particularly in positioning the novelty beyond the TETIS case study and sharpening the take-home messages for a broader hydrological modeling audience. The framework logic could be more abstracted (e.g., summarized in a short “framework box” independent of TETIS).
Results section is very long and dense. Some plots (Figures 5–6, Appendix B) are information-rich but repetitive.
Discussion still feels defensive, repeatedly stating “this is not generalizable”.
Current title is strong but still method-heavy.
Recommended maximum resolution for real-time forecasting.

Author Response

Comments 1: Version 2 represents a substantial improvement over Version 1 in terms of methodological clarity, experimental rigor, and reproducibility. The manuscript now presents a clearly articulated computational scalability framework, supported by a large, well-designed experiment set and a transparent machine-learning–based runtime prediction approach.

Response 1: We sincerely thank Reviewer 3 for this positive and constructive assessment. We are pleased that Version 2 is considered a substantial improvement over the previous version, particularly in methodological clarity and experimental rigor.

Comments 2: However, some weaknesses remain, particularly in positioning the novelty beyond the TETIS case study and sharpening the take-home messages for a broader hydrological modeling audience. The framework logic could be more abstracted (e.g., summarized in a short “framework box” independent of TETIS).

Response 2: Thank you for your comment. In response to this concern, and in light of all reviewers' feedback, we reorganized Version 2 of the manuscript so that the methodological framework takes precedence over the specific application model. Accordingly, the Materials and Methods section (Pages 2–6, Lines 83–198) was restructured to first present a general, model-agnostic computational scalability framework (Sections 2.1 and 2.2), with TETIS subsequently introduced as a controlled case study demonstrating its application (Section 2.3). In particular, the revised workflow (Figure 1 in Version 2) groups generic steps, such as model configuration and model run, into the second phase, “Model Processes”, and introduces a generalized “Runtime Prediction Model” as an example that can be transferred to other hydrological models (Page 6, Lines 192–198). This restructuring raises the abstraction level of the framework, clarifies its applicability beyond TETIS, and reinforces its replicability across different hydrological modeling contexts.

[Figure 1. General methodological workflow for computational scalability analysis and machine-learning-based runtime prediction.]

Comments 3: Results section is very long and dense. Some plots (Figures 5–6, Appendix B) are information-rich but repetitive.

Response 3: We appreciate the reviewer’s observation regarding the length and density of the Results section, as well as the potential redundancy among some figures. In this regard, it is important to clarify that the figures in question serve complementary but distinct purposes within the manuscript. Figure 5 provides visual support for the exploratory data analysis (EDA) and highlights interactions among the main variables used in developing the predictive tool. Figure 6, in contrast, presents a more detailed view of selected results, allowing precise inspection of specific operational ranges that are later discussed in the Discussion section (Page 12, Lines 428–433). Appendix Figure B1 was originally included to ensure transparency and provide access to the complete set of results for interested readers. However, to reduce repetition and improve the manuscript's readability, Figure B1 has now been moved to the supplementary material hosted on the “TETIS Runtime Predictor” repository [45]. The corresponding references in the main text have been updated accordingly (Lines 237 and 250).

Comments 4: Discussion still feels defensive, repeatedly stating “this is not generalizable”.

Response 4: We thank the reviewer for this comment and for the opportunity to clarify the intent and scope of the Discussion section. Our intention was not to adopt a defensive tone, but rather to provide a careful, technically grounded interpretation of the results, given the wide diversity of hydrological models and hardware configurations to which the proposed framework could be applied.

Regarding generalizability, the manuscript does not repeatedly state that the framework itself is “not generalizable.” The reference to limited generalizability appears explicitly only once (Line 347) and specifically refers to the numerical results obtained from the TETIS-based application, not to the methodological framework as a whole. This distinction is intentionally made to avoid overextending quantitative conclusions derived from a controlled case study.

Furthermore, the Discussion explicitly addresses pathways for generalization. In particular, Line 421 discusses how the machine-learning components of the framework can be generalized and adapted to other hydrological models by retraining on model-specific runtime data, thereby extending applicability beyond the TETIS implementation.

Following the reviewers’ collective feedback, the Discussion was revised to place greater emphasis on interpretative insights, transferable methodological lessons, and conditions under which the framework can be extended, while clearly delimiting model- and data-specific findings. These revisions aim to strengthen the explanatory and forward-looking character of the discussion, rather than to defensively constrain the scope of the contribution.

Comments 5: Current title is strong but still method-heavy.

Response 5: We thank the reviewer for the constructive suggestion regarding the title. We agree that, while the original title was technically precise, it could be further improved to better engage the journal’s broader audience and to more clearly emphasize the replicability of the proposed methodological framework beyond a single application.

Accordingly, the title has been revised to: “Scalability and Computational Performance of an Ecohydrological Model Using Machine Learning-Based Prediction.”

Comments 6: Recommended maximum resolution for real-time forecasting.

Response 6: We thank the reviewer for this suggestion. In response, we refined the Discussion section to explicitly relate the computational scalability results to commonly used operational forecasting horizons, providing practical guidance without prescribing fixed limits.

Specifically, new text was introduced to clarify how the framework can be used during the design phase to assess feasible temporal resolutions based on modeling objectives, basin discretization, and available computational resources (Discussion, Page 12, Lines 432–439). This addition contextualizes the results in terms of typical real-time applications, such as short-term forecasts of 3 days to 1 week at hourly resolution and seasonally oriented predictions at daily resolution, while emphasizing that these ranges should be interpreted as indicative rather than prescriptive.
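As a rough, hypothetical illustration of how these forecasting horizons translate into numbers of time steps, which the manuscript identifies as the main driver of computational cost, the following Python sketch converts each horizon into a step count and checks it against the 50 to 15,000 time-step range explored in the experiments. The seasonal horizons of 90 and 180 days are assumptions chosen for illustration, and the sketch ignores any warm-up period a real forecasting run might require.

```python
# Hypothetical sketch: convert forecast horizons into numbers of model time steps.
STUDIED_MIN_STEPS = 50      # lower bound of the temporal configurations tested
STUDIED_MAX_STEPS = 15_000  # upper bound of the temporal configurations tested


def n_time_steps(horizon_hours: int, step_hours: int) -> int:
    """Number of time steps needed to cover `horizon_hours` at `step_hours` resolution."""
    if horizon_hours % step_hours != 0:
        raise ValueError("horizon must be a multiple of the time step")
    return horizon_hours // step_hours


# Illustrative configurations; the seasonal horizons (90 and 180 days) are assumptions.
configs = {
    "short-term, 3 days, hourly": n_time_steps(3 * 24, 1),
    "short-term, 1 week, hourly": n_time_steps(7 * 24, 1),
    "seasonal, 90 days, daily": n_time_steps(90 * 24, 24),
    "seasonal, 180 days, daily": n_time_steps(180 * 24, 24),
}

for name, steps in configs.items():
    inside = STUDIED_MIN_STEPS <= steps <= STUDIED_MAX_STEPS
    print(f"{name}: {steps} time steps, within studied range: {inside}")
```

Under these assumptions, all four configurations fall inside the tested range, so the runtime predictor would be interpolating rather than extrapolating along the time-step dimension; spatial resolution and gauge density would still need to be checked separately.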

In addition, this perspective is consistent with the design-oriented considerations introduced earlier in the manuscript (Materials and Methods, Lines 94–102), where the selection of model configurations is framed as covering a wide range of temporal setups commonly used in hydrological modeling.