Scalability and Computational Performance of an Ecohydrological Model Using Machine Learning-Based Prediction

Cortés-Torres, Nicolás; Salazar-Galán, Sergio; Francés, Félix

doi:10.3390/w18040466


by Nicolás Cortés-Torres ¹,*, Sergio Salazar-Galán ² and Félix Francés ¹

¹ Research Institute of Water and Environmental Engineering (IIAMA), Universitat Politècnica de València, 46022 Valencia, Spain

² Agroecosystems History Laboratory, Universidad Pablo de Olavide, 41013 Sevilla, Spain

* Author to whom correspondence should be addressed.

Water 2026, 18(4), 466; https://doi.org/10.3390/w18040466

Submission received: 31 December 2025 / Revised: 3 February 2026 / Accepted: 9 February 2026 / Published: 11 February 2026

(This article belongs to the Section Hydrology)


Abstract

Despite the widespread use of distributed hydrological models for operational forecasting, climate change impact assessment, and large ensemble experiments, their computational performance and scalability are rarely systematically and reproducibly quantified. This lack of explicit information limits researchers’ and practitioners’ ability to anticipate runtimes, allocate computational resources efficiently, and design feasible modeling experiments. To address this gap, a general methodological framework was developed to assess computational scalability by varying spatial and temporal resolutions, input/output gauge densities, and hardware configurations. This framework was evaluated using the TETIS v9.1 ecohydrological model as a case study. Runtimes were systematically recorded, and a Random Forest regression model was trained to predict computational performance based exclusively on user-defined configuration variables. Model robustness was further assessed through a Monte Carlo uncertainty analysis. The results reveal clear scaling patterns: spatial resolution and output-gauge density exert the strongest influence on runtime, while temporal resolution shows nonlinear effects that depend on catchment size. The predictive tool achieved high accuracy for large hydrological simulations, with increased uncertainty limited to extremely short runtimes on high-speed processors. This study introduces a transferable framework to support efficient experimental design and operational hydrological modeling and provides the first reproducible characterization of TETIS computational scalability.

Keywords: hydrological modeling; computational performance; scalability; TETIS model; random forest; machine learning; runtime prediction; rainfall-runoff model

1. Introduction

Hydrological modeling plays a central role in water resources management, flood forecasting, climate change impact assessment, early warning systems and operational decision-making [1]. The growing availability of hydrometeorological and geospatial data—driven by advances in remote sensing, in situ monitoring networks, and data assimilation—has accelerated the development of distributed hydrological models (HMs) capable of representing processes at increasingly fine spatial and temporal resolutions.

Since the late 1980s, research has emphasized the importance of selecting an appropriate spatial discretization and temporal resolution to adequately capture spatial heterogeneity and hydrological processes [2,3]. While finer resolutions generally enhance process representation, they also lead to substantial increases in computational demand, data volume, memory usage, and runtime, thereby limiting feasibility in operational forecasting, climate change assessments, and large-scale experimental designs [4].

As a result, assessing the computational efficiency and scalability of HMs has become essential to ensure their applicability across diverse basin sizes, modeling scenarios, and data availability conditions [5,6,7,8,9,10,11]. In parallel, integrating machine learning (ML) techniques into environmental modeling workflows offers new opportunities to predict and optimize computational performance by identifying nonlinear patterns and potential bottlenecks in simulation pipelines [12,13].

These improvements in spatial and temporal resolution underscore the need to systematically evaluate how distributed models manage computational complexity while maintaining physical consistency and predictive skill. Several distributed models have been developed to address these emerging challenges by integrating detailed process representations while remaining applicable across a wide range of hydrological conditions [11,14,15,16,17,18]. Within this context, the present study adopts the TETIS v9.1 ecohydrological model, under development since 1995 [19,20], as a representative case study. TETIS is a spatially distributed conceptual model that uses regular grid cells and finite-difference schemes to simulate hydrological and environmental processes. Over three decades of continuous refinement, its structure has been expanded with additional modules (including a snowmelt module, a dynamic vegetation and nitrogen cycle module, and reservoir and dam-operation components), which extend its use to flood forecasting, climate change impact assessment, and erosion and sediment transport modeling, and enhance its ability to represent key hydrological and biogeochemical dynamics across diverse climatic and physiographic conditions [20,21,22,23,24,25]. Despite its extensive adoption, few studies have explicitly examined TETIS from a computational performance perspective.

Understanding how runtime, efficiency, and memory usage respond to changes in spatial resolution, temporal discretization, catchment size, parameterization schemes, and hardware configurations is critical not only for optimizing operational applications but also for establishing a reproducible methodological framework that can be adapted to other distributed hydrological models and extended to multi-model or ensemble-based applications. To address this gap, this study conducts the first comprehensive evaluation of the computational scalability of TETIS v9.1, as a case study, through controlled experiments that vary spatial and temporal resolutions, model parameterizations, and computing environments. Furthermore, machine learning—specifically Random Forest regression—is employed to predict key performance indicators and uncover nonlinear relationships between model structure, computational settings, and run efficiency [26,27]. This hybrid experimental–data-driven framework provides actionable insights into the computational limits of distributed hydrological modeling and sets the foundation for future efforts in workflow optimization, uncertainty quantification, and ML-assisted ecohydrological simulation design.

2. Materials and Methods

2.1. Experimental Design

The methodological framework was conceived as a general experimental design to systematically assess the computational performance and scalability of a distributed hydrological model. It focuses on key variables that are known to directly influence model efficiency. Spatial resolution, expressed as the total number of grid cells within the basin, ranged from 1191 to 1,862,822. This range was obtained by considering two mesoscale catchments at five scales (200, 500, 1000, 2500, and 5000 m) and applying three distinct reconditioning schemes. This experimental setup follows established practices in distributed hydrological modeling and builds on previous applications [16,17,18,28,29], including those reported by Droppers et al. [14] and Cortés-Torres et al. [30,31], resulting in a total of 30 experimental basins.

Temporal configurations were defined using four representative numbers of time steps (50, 500, 1500, and 15,000) covering a wide range of typical temporal configurations used in hydrological modeling [16,17,18,28]. For instance, a 40-year simulation at daily resolution corresponds to approximately 15,000 time steps, whereas a monthly resolution requires about 500. A single-day simulation at minute resolution involves roughly 1500 time steps, while a 4 h rainfall event at 5 min resolution requires approximately 50 time steps. These configurations are also consistent with modeling frameworks that incorporate weather generators or climate change scenarios, as reported by Beneyto et al. [21] and Hernández-Sosa et al. [32].
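The correspondence between simulation length, temporal resolution, and number of time steps can be checked with a few lines of Python. This is a minimal sketch: the function name `n_time_steps`, the 365.25-day year, and the 12-steps-per-year treatment of monthly resolution are assumptions made for illustration and are not part of TETIS.

```python
from datetime import timedelta


def n_time_steps(duration: timedelta, resolution: timedelta) -> int:
    """Number of whole time steps needed to cover `duration` at `resolution`."""
    return duration // resolution


# Assumption: an average year of 365.25 days.
forty_years = timedelta(days=40 * 365.25)

daily_steps = n_time_steps(forty_years, timedelta(days=1))            # 40 years, daily
minute_steps = n_time_steps(timedelta(days=1), timedelta(minutes=1))  # 1 day, 1 min
event_steps = n_time_steps(timedelta(hours=4), timedelta(minutes=5))  # 4 h event, 5 min
monthly_steps = 40 * 12                                               # 40 years, monthly

print(daily_steps, monthly_steps, minute_steps, event_steps)
# 14610 480 1440 48
```

The exact counts (14,610, 480, 1440, and 48) round to the four representative values used in the experiments (15,000, 500, 1500, and 50).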

Gauge density was defined based on two main considerations: the limited availability of in situ instrumentation in many basins and the growing use of high-resolution Earth observation products, such as precipitation or potential evapotranspiration at approximately 1 km² resolution [33,34,35,36]. Accordingly, input gauge configurations included 2, 1000, 10,000, and 50,000 stations. For output gauges, three configurations were considered based on common hydrological modeling practices [14,18,25]: (i) a single gauge, (ii) 20 gauges, and (iii) a proportional configuration equivalent to 20% of the cells in the drainage network whose accumulated cell values exceed the mean of the accumulation map. Under this approach, the number of output gauges ranged from 1 to 372,564.

Combining temporal configurations and gauge setups yielded 48 simulation scenarios. Together with 30 experimental basins, this resulted in 1440 hydrological simulations. The primary goal of this study was to provide an experimental framework for evaluating the computational scalability of hydrological models. Consequently, all simulations were run under five hardware configurations, as summarized in Table 1.
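The size of the factorial design can be reproduced by enumerating the factor levels given above. In this sketch the catchment and reconditioning-scheme labels are placeholders (the text does not name them here), and the final count assumes that every simulation is repeated on each of the five machines.

```python
from itertools import product

# Placeholder labels; only the number of levels matters for the count.
catchments = ["catchment_A", "catchment_B"]
cell_sizes_m = [200, 500, 1000, 2500, 5000]
reconditioning = ["scheme_1", "scheme_2", "scheme_3"]

time_steps = [50, 500, 1500, 15_000]
input_gauges = [2, 1000, 10_000, 50_000]
output_gauges = ["single", "twenty", "proportional"]

basins = list(product(catchments, cell_sizes_m, reconditioning))
scenarios = list(product(time_steps, input_gauges, output_gauges))
simulations = list(product(basins, scenarios))

n_hardware = 5
total_runs = len(simulations) * n_hardware  # assumes a full repeat per machine

print(len(basins), len(scenarios), len(simulations), total_runs)
# 30 48 1440 7200
```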

2.2. Machine Learning Application

Machine learning (ML) techniques were applied to predict computational performance and analyze how model configurations affect runtime in distributed hydrological simulations. Random Forest (RF) regression was selected for its robustness and ability to capture nonlinear relationships and interactions among predictors that are difficult to represent with simpler approaches, such as linear or polynomial regression [26,27,37]. Preliminary tests with linear models showed limited explanatory power, particularly for large basins, long simulations, and heterogeneous hardware configurations, which supports the use of an ensemble-based ML method [38].

Runtime was predicted using key variables defined in the experimental design, including basin cells, time steps, gauge density, and hardware configuration. This data-driven approach identifies the main drivers of computational bottlenecks and provides users with a practical tool for estimating runtime prior to running simulations.

Following the collection of runtime data, an exploratory data analysis (EDA) was conducted to characterize variable behavior and assess their relative importance. Correlations among predictor variables were examined to guide model construction. Subsequently, four influential RF hyperparameters (Table 2) were evaluated using 80% of the dataset to determine optimal predictor-hyperparameter combinations [39,40,41,42,43].

Hyperparameter selection followed five criteria: (i) validation R² within 0.5% of the maximum achieved; (ii) ΔR² between calibration and validation below 1% to limit overfitting; (iii) shallow tree depth to promote model simplicity; (iv) constraints on the maximum number of nodes and leaves to avoid overly fine partitions; and (v) a minimum number of estimators to reduce computational cost. To quantify uncertainty associated with training-data availability, Monte Carlo (MC) simulations were conducted with 10,000 iterations, varying the training subset from 20% to 30%. Overall, the resulting predictive tool—based exclusively on user-defined configuration variables—enables runtime estimation across a wide range of hydrological modeling scenarios.
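The five selection criteria can be read as a filter followed by a tie-break, sketched below. Several points are assumptions rather than the authors' implementation: the hyperparameter names follow scikit-learn conventions (Table 2 may list different ones), the 0.5% and 1% tolerances are treated as absolute differences in R², criteria (iii) to (v) are applied in that order of priority, and the candidate scores are invented purely to exercise the logic.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    n_estimators: int     # hypothetical hyperparameter names
    max_depth: int
    max_leaf_nodes: int
    r2_cal: float         # R² on the calibration (training) data
    r2_val: float         # R² on the validation data


def select(cands: Sequence[Candidate],
           r2_tol: float = 0.005,
           gap_tol: float = 0.01) -> Optional[Candidate]:
    """Apply criteria (i)-(ii) as filters, then (iii)-(v) as a tie-break."""
    best = max(c.r2_val for c in cands)
    eligible = [
        c for c in cands
        if best - c.r2_val <= r2_tol                # (i) close to the best validation R²
        and abs(c.r2_cal - c.r2_val) < gap_tol      # (ii) small calibration/validation gap
    ]
    if not eligible:
        return None
    # (iii) shallowest trees, (iv) fewest leaves, (v) fewest estimators.
    return min(eligible, key=lambda c: (c.max_depth, c.max_leaf_nodes, c.n_estimators))


# Invented scores, for illustration only.
candidates = [
    Candidate(500, 20, 2000, 0.999, 0.994),
    Candidate(100, 8, 200, 0.995, 0.992),
    Candidate(50, 6, 100, 0.990, 0.980),     # fails (i)
    Candidate(300, 30, 5000, 1.000, 0.985),  # fails (i) and (ii)
]
chosen = select(candidates)
print(chosen)
```

With these invented scores the second candidate is selected: it is within tolerance of the best validation score while using much shallower trees and fewer estimators than the top scorer.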

Finally, Figure 1 summarizes the entire methodological workflow. The diagram organizes the procedure into four main phases: experiment configuration, model processes, data management, and prediction tool. All the experiment configurations and the predictive tool are available in the repositories [44,45].

2.3. Ecohydrological Model: TETIS

To apply and evaluate the proposed methodological framework under controlled, reproducible conditions, the TETIS v9.1 ecohydrological model was selected as a representative case study of a spatially distributed, conceptual hydrological model. The TETIS software was developed at the Universitat Politècnica de València by the Hydrological and Environmental Modeling Research Group (GIMHA) within the Research Institute of Water and Environmental Engineering [19,20]. TETIS simulates the hydrological response of a watershed to rainfall and snowmelt events. Its conceptual structure represents vertical water movement through a cascade of seven interconnected tanks, each representing distinct components and processes within the soil and subsoil (snow cover, interception, static storage, surface storage, gravitational storage, aquifer, and river channel). Water exchange among the tanks occurs through several mechanisms, including infiltration, percolation, non-connected underground flow, overland flow, interflow, and baseflow (Figure 2). Horizontal water transport across the basin is simulated using the Geomorphological Kinematic Wave (GKW) approach, which links runoff generation to the geomorphological characteristics of the drainage network and the catchment topography (hillslopes, gullies, and river channels), jointly controlling the transfer of water toward the surface drainage network [19,20].

The TETIS parameters are directly linked to the processes outlined above. Their values are modified by corrector factors (CFs), which are used to calibrate the model and enhance the reliability of simulated outputs. In this study, only the hydrological module of TETIS was applied, and the parameters employed are summarized in Table A1.

Implementing TETIS requires a set of parameter maps in ASCII format that define the model’s spatial resolution and the total number of grid cells. These maps determine the potential size of the datasets. A high cell count in the basin may exceed available RAM, thereby reducing computational efficiency. Hydrological simulations in TETIS also require an input configuration file that defines the main characteristics of the event. This file specifies the temporal setup through three variables: the start date, the number of time steps, and the temporal resolution (in minutes). In addition, it defines the number and type of gauges used as model inputs or outputs.

Within the TETIS framework, the number of time steps plays a dominant role in computational cost relative to the simulation’s nominal temporal resolution (e.g., minute, sub-daily, daily, or monthly). This is because the computational load is primarily driven by the total number of temporal iterations rather than by the physical time interval represented. At each time step, the simulation algorithm sequentially calculates vertical fluxes, horizontal flows, and water routing using the GKW across all basin cells. The temporal resolution is particularly relevant when evaluating the model’s numerical stability.

Before running a simulation, TETIS performs two preliminary procedures that are also evaluated as part of the computational case study. First, all topological maps are compiled into a single file named Topolco.sds (Topolco). Second, the initial moisture state for each tank in every grid cell is generated as a separate file, referred to as Hantec.sds (Hantec). The computational characteristics of Topolco and Hantec, together with the hydrological simulation, are summarized in Table 3.

3. Results

This section presents the key findings from the experiments described above. The results are structured into three components: (i) preliminary processes for hydrological simulations, (ii) hydrological simulation runs, and (iii) application of machine learning for performance prediction and analysis.

3.1. Analysis of Preliminary Processes

Due to its role in the workflow, Topolco generation is independent of gauge density and temporal configuration. As shown in Figure 3a, runtime exhibits a strong positive correlation with the number of basin cells, indicating that computational demand increases markedly as spatial discretization becomes finer over the same area.

Performance differences among hardware configurations become pronounced for basins exceeding one million cells. These effects are primarily determined by RAM capacity, core and thread availability, and processor clock speed. Larger memory facilitates the handling of extensive data structures, while the parallelized nature of the process benefits from increased core counts. In contrast, higher processor clock speeds can partially offset limitations in memory or parallel resources, as illustrated by the comparable performance of the multi-core/low-speed Intel Xeon E5-2697 v2 and the fewer-core/high-speed Intel Core i7-12700F (Figure 3b).

Despite its different algorithmic structure, the performance of the Hantec generation (Figure 4) follows patterns that are broadly comparable to those observed for Topolco. However, runtime dispersion becomes noticeable beyond approximately 500,000 cells, reflecting the process’s increasing sensitivity to hardware characteristics.

Because Hantec runs entirely in serial mode, its performance is primarily controlled by single-core processor speed rather than by memory capacity or core count. Accordingly, runtimes closely follow the ranking imposed by each processor’s maximum turbo frequency (Table 1). Notably, processors with larger RAM, higher core availability, and lower clock speeds do not exhibit improved performance in this task.

Overall, Hantec generation requires substantially less runtime than Topolco, confirming that it is a lightweight internal process within the TETIS workflow and contributes marginally to the computational cost.

3.2. Hydrological Simulation Runs

The complete set of hydrological simulation results is available in an open-access repository [45]. Figure 5 illustrates how key experimental variables affect hydrological simulation runtime. Outcomes range from seconds to days, depending on spatial discretization, temporal setup, and hardware configuration.

The number of basin cells strongly influences runtime (Figure 5a). Grouping simulations by this variable reveals a clear increase in runtime as spatial discretization becomes finer, in line with theoretical expectations for distributed models. Time steps also significantly affect performance (Figure 5b), with median runtimes rising sharply and variability increasing notably as the number of steps grows from 1500 to 15,000.

Gauge density has contrasting effects depending on its role in the model. Input gauge density shows only a marginal influence on runtime (Figure 5c), with stable distributions across configurations. In contrast, output gauge density has a measurable effect (Figure 5d). Although median values remain similar, simulations using the proportional configuration (20% of basin cells) produce markedly larger extreme runtimes, particularly when the number of output gauges exceeds about 1000 [45].

Hardware comparisons confirm that hydrological simulation runtime is primarily constrained by processor clock speed rather than by RAM capacity or core count. The Intel Xeon E5-2697 v2 consistently produces the longest runtimes, whereas the Xeon W7-2595X and Core i7-12700F yield the shortest. Even on high-frequency processors, the most demanding configuration scenarios still require tens of hours, highlighting the computational intensity of large-scale, high-resolution simulations.

To explore runtime behavior, the combined effects of gauge density, basin cells, and time steps were analyzed using two contrasting processors: a high-frequency Intel Xeon W7-2595X and a lower-frequency Intel Core i7-6700. Figure 6 illustrates representative configurations with low (50) and high (15,000) numbers of time steps.

For short simulations (50 time steps; Figure 6a,b), the runtime remains largely independent of gauge density for basins smaller than approximately 100,000 cells, with runtimes under one minute across all configurations. For larger basins, runtime increases sharply, particularly on lower-frequency processors, whereas high-frequency architectures maintain shorter runtimes.

In contrast, for long simulations (15,000 time steps; Figure 6c,d), gauge density becomes a key driver even for relatively small basins. In domains with fewer than 10,000 cells, increasing the input gauge density results in a clear increase in runtime. For large basins (>100,000 cells), runtime is primarily controlled by output gauge density, especially under the proportional configuration (20% of drainage network cells), where output handling dominates the computational cost.

3.3. Performance Prediction and Analysis

The exploratory data analysis (EDA) presented in Section 3.1 showed that Hantec generation incurs negligible computational cost. Consequently, the predictive analysis focused exclusively on Topolco generation and hydrological simulation.

For the Topolco process, the predictor set included Maximum Turbo Frequency (MTF), number of basin cells, RAM memory, core count, and thread count. For the hydrological simulation, the predictors included MTF, the number of basin cells, the number of time steps, the input gauge count, and the output gauge count. Correlation analysis confirmed the patterns identified earlier.

For Topolco, the number of basin cells was the dominant predictor (Pearson r = 0.79), whereas the remaining variables showed weak negative correlations (r between –0.12 and –0.20). Similarly, in hydrological simulation, the number of basin cells remained the most influential factor (Spearman ρ = 0.62), followed by the number of time steps (ρ = 0.47) and MTF (ρ = –0.44), whereas input and output gauge counts had comparatively minor influence.
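The two coefficients quoted above measure different things: Pearson's r quantifies linear association, whereas Spearman's ρ is Pearson's r computed on ranks and therefore captures any monotonic relationship. A minimal standard-library sketch follows; the four data points are invented for illustration and are not taken from the experiments.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented example: runtime grows monotonically but not linearly with cells.
cells = [1_000, 10_000, 100_000, 1_000_000]
runtime_min = [0.01, 0.2, 6.0, 400.0]

r = pearson(cells, runtime_min)
rho = spearman(cells, runtime_min)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because the invented runtimes increase strictly with the cell count, ρ equals 1 even though the relationship is not a straight line, which is why rank correlation suits strongly skewed runtime data.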

Hyperparameter optimization, conducted according to the criteria in Section 2.2, identified optimal RF configurations for most predictor combinations. In Topolco, the full five-variable model achieved an R² of 0.993, and several reduced-variable configurations produced comparable performance, offering flexibility when complete predictor information is unavailable (Table A2). For the hydrological simulation, the five-variable model achieved an R² of 0.994. However, only a limited subset of predictor combinations produced similarly strong results. The seven best-performing configurations and their optimized hyperparameters are reported in Table A3.

Once the optimal predictor and hyperparameter configurations were defined, Monte Carlo (MC) simulations were conducted to quantify the uncertainty associated with the training subset.

For Topolco, the mean prediction error was approximately 25%. Error stratification by observed runtime indicates that prediction uncertainty is highest for short runtimes, particularly under high-frequency processors. In several cases involving large basins and processor speeds above 4.8 GHz, prediction errors exceeded 50%, reflecting the RF model’s tendency to overestimate runtime when actual Topolco runtimes are close to 2 min. In hydrological simulation, the aggregated mean prediction error reached 203%. However, this value is dominated by short-duration simulations. When grouped by observed runtime, prediction errors decrease markedly with increasing runtime and converge to approximately 7.4% for long simulations (Table 4), indicating stable predictive performance under computationally demanding conditions.
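Stratifying the error by observed runtime, as described above, can be sketched as follows. The bin edges (1 and 100 min) and the sample values are hypothetical; they are not the classes or results of Table 4 and only illustrate why an aggregated mean is dominated by short runs.

```python
def mean_relative_error_by_bin(observed, predicted, edges):
    """Mean absolute percentage error per observed-runtime bin.

    `edges` are ascending bin boundaries in minutes; the last bin is open-ended.
    Returns one value per bin, or None for an empty bin.
    """
    sums = [0.0] * (len(edges) + 1)
    counts = [0] * (len(edges) + 1)
    for obs, pred in zip(observed, predicted):
        idx = sum(obs >= e for e in edges)  # index of the bin containing obs
        sums[idx] += abs(pred - obs) / obs * 100
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Invented runtimes in minutes, for illustration only.
observed = [0.01, 0.05, 2.0, 50.0, 1000.0]
predicted = [0.10, 0.06, 2.5, 55.0, 1074.0]

errors = mean_relative_error_by_bin(observed, predicted, edges=[1.0, 100.0])
print([round(e, 1) for e in errors])
```

In this invented sample the shortest-runtime bin averages several hundred percent while the longest stays below 10%, so a single pooled mean would mostly describe the sub-minute runs.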

Based on the optimal predictor–hyperparameter combinations, a user-oriented predictive tool was implemented to estimate runtime before a model run [45]. The tool was evaluated in two sub-basins of the Imperial River Basin (Quino and Muco, South-Central Chile) under climate-change scenarios [32]; the inputs tested with the tool are listed in Table A4.

For the Quino basin, the predicted Topolco runtime was 0.026 min (range: 0.023–0.030 min), compared with an observed value of 0.012 min. The predicted hydrological simulation runtime was 4.56 min (range: 3.33–6.50 min), compared with an observed 3.77 min. For the Muco basin, predicted runtimes were 0.051 min for Topolco and 13.38 min for the hydrological simulation (ranges: 0.036–0.076 and 6.80–18.85 min, respectively), compared with observed values of 0.026 and 8.24 min.
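The four comparisons above can be summarized by checking whether each observed runtime falls inside the predicted range and by computing the relative and absolute differences. The sketch below uses only the values reported in this paragraph; the variable names are illustrative.

```python
# (predicted, range_low, range_high, observed), all in minutes.
cases = {
    ("Quino", "Topolco"): (0.026, 0.023, 0.030, 0.012),
    ("Quino", "simulation"): (4.56, 3.33, 6.50, 3.77),
    ("Muco", "Topolco"): (0.051, 0.036, 0.076, 0.026),
    ("Muco", "simulation"): (13.38, 6.80, 18.85, 8.24),
}

summary = {}
for key, (pred, low, high, obs) in cases.items():
    inside = low <= obs <= high           # observed within the predicted range?
    abs_diff = pred - obs                 # minutes
    rel_diff = abs_diff / obs * 100       # percent of the observed runtime
    summary[key] = (inside, rel_diff, abs_diff)
    print(f"{key[0]:5s} {key[1]:10s} inside={inside!s:5s} "
          f"rel={rel_diff:6.1f}%  abs={abs_diff:.3f} min")
```

Both observed simulation runtimes fall inside the predicted ranges. The observed Topolco runtimes fall below their ranges, with relative overestimates of about 117% and 96%, but the absolute differences are only about 0.014 and 0.025 min (under two seconds).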

4. Discussion

The increasing complexity of distributed hydrological modeling has made computational performance a central concern in contemporary hydrological research [1,9,10]. Despite its importance, the computational burden is rarely documented systematically. Only a limited number of models have been examined with respect to runtimes, mainly in the context of calibration strategies, parallel implementations [46], or real-time applications, including AWRA-L [47], BreZo [48], JAMS [49], LIS [50], SWAT [29,51], and the Xinanjiang model [52]. Conversely, a much larger body of literature addresses the computational performance of specific algorithms that support hydrological modeling, such as drainage area delineation [53], flow-direction computation [54], pipe-network routing [55], longest-flow-path algorithms [56,57], time-series segmentation [58], and sink-filling procedures [59]. This literature reveals a persistent gap in model-level scalability assessments.

Although the present study is conducted on a single grid-based, conceptual distributed model (TETIS v9.1), its aim is not to generalize numerical results across all hydrological models. Rather, the experimental design and methodological framework proposed here provide a transferable starting point for scalability analyses in other distributed or semidistributed models with comparable data structures and configuration schemes, such as SWAT, W-flow, CLM, PCR-GLOBWB, or mHM. While conceptual tank-based models differ from fully physics-based frameworks—which typically involve higher computational demands due to equation complexity—addressing these differences lies beyond the scope of this study. Instead, the contribution of this work lies in providing a replicable methodological basis for future comparative and multi-model analyses of computational scalability in hydrological modeling.

The analysis of TETIS’s internal processes clarifies how computational cost is distributed across the modeling workflow. As mentioned in Section 3.1, Topolco generation is primarily controlled by the number of basin cells and benefits from its parallelized algorithm. This process effectively exploits computational resources, similar to other modeling systems, such as SEBS [60], AWRA-L [47], Xinanjiang [52], MGB [61,62], TOUGHREACT [63], LuKARS 3.0 [64], and Hydro-CAL [65], reaffirming that parallelization is a tool for minimizing runtimes. Beyond these general trends, the results reveal a hardware-related behavior that has received limited attention in previous studies. Higher processor clock speeds can partially offset lower core counts and limited memory capacity, yielding comparable runtimes across different hardware configurations. This trade-off between frequency and parallel resources was clearly observed during Topolco generation and suggests that balanced hardware setups—not necessarily those maximizing cores or RAM—may achieve competitive performance. As a specific TETIS characteristic, the Topolco file must be regenerated whenever parameter maps change (e.g., soil properties, aquifer characteristics, DEMs, and derived layers). Consequently, parameter-space explorations, sensitivity analyses, or scenario-based experiments can substantially increase the total computational cost if this step is repeated frequently.

In contrast, generating initial states (Hantec) imposes a negligible computational burden within the TETIS workflow. Given its fully serial implementation and short runtimes, repeated Hantec generation is generally unnecessary unless multiple initial-condition scenarios are explicitly required, such as in event-based simulations. Moreover, changes to parameter maps alone do not justify repeated runs. From an operational perspective, Topolco dominates the preprocessing cost when the model configuration varies, whereas Hantec is typically computationally marginal.

Therefore, hydrological simulation is the dominant computational bottleneck in the TETIS workflow. This behavior aligns with findings from other CPU-based models, such as PIHM [37] and HydroCAL [65]. From a computational standpoint, climatic conditions do not directly affect runtime because the behavior of climate variables is not part of the simulation algorithm itself. However, strong variations in geomorphology could influence runtime. In the case of TETIS, horizontal water transport is computed for all grid cells, regardless of the drainage network’s complexity. There is a small difference in the mathematical operations applied to cells predefined as hillslopes, gullies, or channels [20]. This geomorphological aspect would be a valuable direction for future work, particularly for assessing second-order effects.

Beyond improvements in computational efficiency, understanding runtimes has direct practical relevance for operational hydrological applications, particularly in real-time forecasting and early-warning systems coupled with meteorological forecasting [66]. In such contexts, runtime constraints directly influence the choice of optimal spatiotemporal resolution and the number of scenarios that can be evaluated within operational deadlines. More broadly, machine learning (ML) approaches have been increasingly adopted in hydrology for prediction and system analysis tasks, including flood forecasting [67] and water-system modeling [13], while advances in HPC continue to shape computational practices across environmental modeling [68].

In this study, Random Forest (RF) proved to be an effective and complementary approach for predicting runtime in both Topolco generation and hydrological simulations [27]. The resulting predictive tool provides configuration-level runtime estimates using readily available variables for standard computational users (e.g., RAM capacity, core and thread counts, and clock speed). However, it does not account for system-level factors such as disk I/O performance, memory bandwidth, background processes, or file-system behavior. Consequently, predictions should be interpreted as configuration-level estimates rather than full system-level performance metrics.

Prediction uncertainty strongly depends on runtime magnitude. For very short runtimes, relative errors increase markedly because small absolute deviations, often influenced by disk access, operating-system scheduling, timing resolution, or the Python (v3.11) procedure used to capture runtimes, translate into large percentage errors [69]. By contrast, for longer simulations, prediction errors decrease substantially, and RF performance remains stable. The coexistence of high R² and large percentage errors at short runtimes reflects the sensitivity of relative error metrics to small absolute times rather than a lack of model robustness. In absolute terms, a relative error of 900% on a 0.01 min runtime corresponds to a difference of only about 0.09 min (a predicted runtime of roughly 0.1 min), whereas a 7.4% error on a 1000 min runtime corresponds to a difference of about 74 min (a predicted runtime of roughly 1074 min). Moreover, the reported error rates are mean values across 10,000 MC iterations with varying training subsets, and thus also reflect uncertainty associated with non-measurable system-level variability. Expanding the training dataset to include additional hardware architectures, larger basins, and high-performance computing environments is expected to improve predictive accuracy and generalizability [41,70,71].

Beyond technical efficiency, computational scalability has direct implications for both hydrological science and operational practice. Runtime constraints shape feasible experimental design, uncertainty analysis, and ensemble size, thereby conditioning how model complexity and resolution can be balanced against robust inference. In this context, the proposed TETIS Runtime Predictor offers practical guidance for selecting model configurations that are computationally feasible under real-time and research-oriented constraints. Its application is most appropriate for simulations exceeding practical thresholds: for small basins (<100,000 cells), configurations with more than 1500 time steps, input gauge counts above 10,000, or output gauges exceeding 1000; and for larger basins (>100,000 cells), configurations with more than 500 time steps, input gauges above 1000, or output gauges exceeding 100. Under these conditions, the tool supports informed planning for operational real-time forecasting workflows or large-ensemble experiments, as it explicitly encompasses the temporal horizons commonly used in practice, such as short-term forecasts of 3 days to 1 week at hourly resolution (approximately 72–168 time steps) and seasonal-oriented predictions at daily resolution (on the order of 180 time steps). Consequently, the upper limit of feasible real-time configurations is not prescribed by the framework itself, but rather emerges from the interaction among basin discretization, temporal resolution, and available computational resources.
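The applicability thresholds stated above can be encoded as a simple check to run before consulting the predictor. This is a sketch only: the function name is hypothetical, the conditions within each basin class are combined with "or" as in the text, and a basin of exactly 100,000 cells (which the text leaves unassigned) is treated here as small.

```python
def predictor_recommended(cells: int, time_steps: int,
                          input_gauges: int, output_gauges: int) -> bool:
    """True if the configuration exceeds the practical thresholds in the text."""
    if cells > 100_000:  # larger basins
        return time_steps > 500 or input_gauges > 1000 or output_gauges > 100
    # Small basins (assumption: exactly 100,000 cells counts as small).
    return time_steps > 1500 or input_gauges > 10_000 or output_gauges > 1000


# One-week hourly forecast (7 * 24 = 168 steps) on a small basin with two gauges.
week_hourly = predictor_recommended(50_000, 7 * 24, 2, 1)

# Long daily simulation (15,000 steps) on the same basin.
long_daily = predictor_recommended(50_000, 15_000, 2, 1)

# Three-day hourly forecast (72 steps) on a large basin with 2000 input gauges.
large_gridded = predictor_recommended(500_000, 3 * 24, 2000, 1)

print(week_hourly, long_daily, large_gridded)
# False True True
```

Configurations below the thresholds still produce an estimate, but the earlier error analysis suggests that the relative uncertainty of such short runs is large.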

These results highlight several clear pathways for optimizing the TETIS framework in the future. At the model level, improvements may include code modularization and more efficient data structures, as implemented in mHM [72,73], along with a reduction in unnecessary I/O operations. At the computational level, previous studies demonstrate substantial performance gains through GPU acceleration [74,75,76], complementary parallelization frameworks such as Swift [77,78], and execution on high-performance clusters or cloud infrastructures, as reported for SWAT [79,80,81]. More advanced strategies, including MPI-based parallelization [82,83], Monte Carlo–oriented GPU/MP implementations [84,85], sub-basin–level parallelization [86,87], and GPU-accelerated surface–subsurface coupling [88], illustrate the range of options available to substantially reduce the runtime of large-scale hydrological applications of TETIS.

Comparative performance analyses, such as those conducted for ALFISH [89], together with cloud-integrated and HPC-oriented modeling infrastructures [90,91,92,93], provide a mature context for evaluating similar strategies for TETIS. In this sense, the present study establishes a baseline to guide future developments toward scalable implementations that better support both operational forecasting and computationally demanding ensemble-based research.

5. Conclusions

The results presented in this study contribute to closing a persistent gap in the literature by providing a comprehensive, systematic, and reproducible assessment of the computational performance of a hydrological model across a broad range of spatiotemporal resolutions and hardware configurations, using the TETIS model as a case study. Understanding how runtime responds to these factors has direct implications for both operational and research-oriented hydrological applications. For instance, the design of climate-change scenario experiments, the implementation of real-time flood forecasting systems, or the running of multi-objective calibration workflows all require reliable estimates of computational cost to avoid excessive delays, resource saturation, or operational failures. Likewise, early-warning systems operate under strict temporal constraints; therefore, prior knowledge of expected runtimes enhances operational robustness and allows for dynamic adjustments—such as modifying spatial resolution or the number of model inputs or outputs—while still ensuring timely forecasts.

Beyond computational considerations, the results also highlight a fundamental modeling insight: temporal resolution broadly governs the dynamics of hydrological simulation, whereas spatial resolution becomes critical for adequately representing nonlinear hydrological processes [22,24]. As temporal scales define the evolution and interaction of processes, spatial discretization becomes increasingly relevant for capturing heterogeneity, thresholds, and connectivity effects that are intrinsic to distributed ecohydrological modeling. This balance between temporal control and spatial representativeness has important implications for both model realism and computational efficiency.

In scientific and engineering contexts, the ability to anticipate computational demands is equally valuable. Large ensemble simulations, global sensitivity analyses, and multi-basin comparative studies often involve thousands of model runs, and the computational burden can become a limiting factor [14,32]. The predictive tool developed in this work helps mitigate these limitations by providing runtime estimates based on user-defined configuration variables, thereby supporting efficient planning and resource allocation at early stages of model design.

Overall, the findings demonstrate that the proposed tool is a practical aid for designing future experimental configurations and operational workflows. Rather than replacing detailed system-level performance analyses, it supports informed decision-making during model setup and experiment planning. Its performance further underscores the potential of data-driven approaches to support computational decision-making in distributed hydrological modeling, particularly as larger, more diverse training datasets become available.

Author Contributions

Conceptualization, N.C.-T., S.S.-G. and F.F.; methodology, N.C.-T.; software, N.C.-T.; validation, S.S.-G. and F.F.; formal analysis, N.C.-T.; investigation, N.C.-T.; resources, F.F.; data curation, N.C.-T.; writing—original draft preparation, N.C.-T.; writing—review and editing, N.C.-T., S.S.-G. and F.F.; visualization, N.C.-T.; supervision, S.S.-G. and F.F.; project administration, F.F.; funding acquisition, N.C.-T., S.S.-G. and F.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Valencian Regional Government through the WATER4CAST 2.0 research project (CIPROM/2023/5) and by the Spanish Ministry of Science and Innovation through the research project TETISPREDICT (PID2022-141631OB-I00), and was partially supported by the Spanish Government project TSI-100130-2024-14. N.C.-T. was supported by the “Programa Credito Beca” grant from COLFUTURO and the Colombian Ministry of Science, Technology and Innovation (PCB–2024), and S.S.-G. by the research talent recruitment program “EMERGIA”, Call 2021, Consejería de Universidad, Investigación e Innovación, Junta de Andalucía, Spain (EMC21_00413). Funding for open access charge: Universitat Politècnica de València.

Data Availability Statement

The data presented in this study are openly available in the Full Reproduction Package at URL https://doi.org/10.5281/zenodo.17569945 (created on 7 October 2025), reference number [44].

Acknowledgments

The authors wish to acknowledge Universitat Politècnica de València for access to the scientific databases used in the literature review. During the preparation of this manuscript, the authors used ChatGPT V.5.2 and Google Gemini V.3 to assist with clarity and style in English writing. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Correction Statement

This article has been republished with a minor correction to the Funding statement. This change does not affect the scientific content of the article.

Appendix A

This appendix compiles additional tables that support the methodological choices and results discussed in the main text. These tables provide detailed information on model parameterization, Random Forest hyperparameter evaluation, and the input configurations used to test the runtime prediction tool. The material is included for completeness and reproducibility, while only the key findings are discussed in the main sections of the manuscript.

Table A1. Parameter maps to implement the hydrological module of TETIS.

Parameter Map	Unit	Description	Correction Factor *
Digital elevation model (DEM)	m	Digital elevation model, processed with sink filling and hydrological reconditioning	-
Slope	m/m	Slope of each cell	-
Flow direction	-	Direction in which the flow is routed	-
Accumulated cells	-	Number of accumulated cells	-
Land use	-	Vegetation cover index used for evapotranspiration	-
Static storage (Hu)	mm	Maximum capacity of the static storage tank	CF1
Infiltration capacity (Ks)	mm/h	Saturated hydraulic conductivity of soil	CF3
Hillside flow velocity	m/s	Surface velocity of flow on the hillside	CF4
Percolation capacity (Kp)	mm/h	Hydraulic conductivity of subsoil	CF5
Interflow hydraulic conductivity (Kss)	mm/h	Saturated horizontal hydraulic conductivity of soil	CF6
Deep aquifer percolation capacity (Kps)	mm/h	Saturated hydraulic conductivity of the rock layer	CF7
Connected aquifer hydraulic conductivity (Ksa)	mm/h	Saturated horizontal hydraulic conductivity of the rock layer	CF8

Note: * CF2 and CF9 do not exert a direct influence on any of the parameter maps. These elements are associated with evapotranspiration production for each land use and flow velocity in the channel, respectively.

Table A2. Random Forest hyperparameter evaluation for Topolco runtime prediction.

Predictor Variables Combination *	R² Val	ΔR²	N Estimators	Max Depth	Min Samples Split	Min Samples Leaf
Max turbo frequency, Basin cells, RAM memory, Cores, Threads	0.993	−0.034	100	5	6	2
Max turbo frequency, Basin cells, RAM memory, Cores	0.992	−0.052	200	5	7	1
Basin cells, RAM memory, Cores, Threads	0.993	−0.043	150	5	7	1
Max turbo frequency, Basin cells, Cores, Threads	0.991	−0.060	100	5	7	1
Max turbo frequency, Basin cells, RAM memory, Threads	0.992	−0.055	100	5	7	1
Max turbo frequency, Basin cells, RAM memory	0.992	−0.063	100	5	7	2
Basin cells, RAM memory, Threads	0.993	−0.046	100	5	7	1
Max turbo frequency, Basin cells, Threads	0.992	−0.052	400	5	7	1
Basin cells, Cores, Threads	0.992	−0.013	100	5	6	1
Basin cells, RAM memory, Cores	0.991	−0.061	300	5	7	1
Max turbo frequency, Basin cells, Cores	0.991	−0.064	100	5	7	1
Max turbo frequency, Basin cells	0.993	−0.178	200	5	6	2
Basin cells, RAM memory	0.992	−0.067	100	5	7	1
Basin cells, Cores	0.991	−0.066	100	5	7	1
Basin cells, Threads	0.991	−0.057	100	5	7	1
Basin cells	0.898	−0.299	100	5	10	1

Note: * Evaluation of Random Forest hyperparameter configurations for predicting Topolco runtime across different predictor–variable combinations. The table reports validation performance (R² and ΔR²) together with the selected hyperparameters. Predictor–variable combinations not listed did not achieve satisfactory validation R² after applying the methodological filters mentioned in Section 2.2.

Table A3. Random Forest hyperparameter evaluation for hydrological simulation runtime prediction.

Predictor Variables Combination *	R² Val	ΔR²	N Estimators	Max Depth	Min Samples Split	Min Samples Leaf
Max turbo frequency, Basin cells, Time steps, Input gauges, Output gauges	0.994	−0.005	300	5	10	5
Max turbo frequency, Basin cells, Time steps, Output gauges	0.994	−0.005	400	5	10	5
Max turbo frequency, Time steps, Output gauges	0.990	−0.004	100	5	10	5
Max turbo frequency, Time steps, Input gauges, Output gauges	0.988	−0.003	100	5	10	5
Basin cells, Time steps	0.289	−0.009	150	5	10	3
Basin cells, Output gauges	0.212	0.010	100	5	10	5
Output gauges	0.212	0.009	100	5	10	5

Note: * Evaluation of Random Forest hyperparameter configurations for predicting hydrological simulation runtime across alternative predictor–variable combinations. Reported metrics include validation R², ΔR², and the optimized hyperparameters. Only combinations achieving satisfactory predictive performance are shown, after filtering according to the criteria in Section 2.2.

Table A4. Configuration of input variables used to test the runtime prediction tool in the Quino and Muco basins.

Predictor Variables *	Quino Basin	Muco Basin
Max turbo frequency (GHz)	4.9	4.9
Basin area (km²)	300	649
Cell size (m)	90	90
Initial date (dd/mm/yyyy hh:mm)	1/01/2061 0:00	1/01/2030 0:00
Final date (dd/mm/yyyy hh:mm)	31/12/2091 0:00	1/01/2060 0:00
Delta t (min)	1440	1440
Number input gauges	9	9
Number output gauges	12	1
RAM memory (GB)	128	128
Cores	12	12
Threads	20	20

Note: * The key predictor variables, including the number of basin cells and the number of time steps, are calculated internally by the prediction tool.
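
As a rough illustration of how the derived predictors mentioned in the note could be obtained from the Table A4 inputs, the sketch below computes the number of basin cells from basin area and cell size, and the number of time steps from the simulation dates and Δt. The function names, the rounding, and the end-exclusive counting of time steps are assumptions; the actual prediction tool may compute these quantities differently (for example, by counting cells in the basin mask).

```python
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y %H:%M"  # matches the dd/mm/yyyy hh:mm entries in Table A4


def basin_cells(area_km2: float, cell_size_m: float) -> int:
    """Approximate cell count: basin area divided by the area of one square cell."""
    return round(area_km2 * 1_000_000 / cell_size_m ** 2)


def time_steps(start: str, end: str, delta_t_min: int) -> int:
    """Number of whole time steps between two dates (end date not counted as an extra step)."""
    span = datetime.strptime(end, DATE_FORMAT) - datetime.strptime(start, DATE_FORMAT)
    return int(span.total_seconds() // (delta_t_min * 60))


# Inputs copied from Table A4.
basins = {
    "Quino": {"area_km2": 300, "start": "1/01/2061 0:00", "end": "31/12/2091 0:00"},
    "Muco": {"area_km2": 649, "start": "1/01/2030 0:00", "end": "1/01/2060 0:00"},
}

for name, cfg in basins.items():
    cells = basin_cells(cfg["area_km2"], cell_size_m=90)
    steps = time_steps(cfg["start"], cfg["end"], delta_t_min=1440)
    print(f"{name}: {cells} cells, {steps} daily time steps")
```

Under these assumptions, both test basins have fewer than 100,000 cells and more than 10,000 daily time steps.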

References

1. Beven, K. Rainfall-Runoff Modelling; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2012; ISBN 9780470714591.
2. Wood, E.F.; Sivapalan, M.; Beven, K.; Band, L. Effects of spatial variability and scale with implications to hydrologic modeling. J. Hydrol. 1988, 102, 29–47.
3. Mamillapalli, S.; Srinivasan, R.; Arnold, J.G.; Engel, B.A. Effect of spatial variability on basin scale modeling. In Proceedings of the Third International Conference on GIS and Environmental Modeling, Santa Fe, NM, USA, 21–25 January 1996; National Center for Geographic Information and Analysis: Santa Barbara, CA, USA, 1996; pp. 802–8013.
4. Viviroli, D.; Archer, D.R.; Buytaert, W.; Fowler, H.J.; Greenwood, G.B.; Hamlet, A.F.; Huang, Y.; Koboltschnig, G.; Litaor, M.I.; López-Moreno, J.I.; et al. Climate change and mountain water resources: Overview and recommendations for research, management and policy. Hydrol. Earth Syst. Sci. 2011, 15, 471–504.
5. Clark, M.P.; Nijssen, B.; Lundquist, J.D.; Kavetski, D.; Rupp, D.E.; Woods, R.A.; Freer, J.E.; Gutmann, E.D.; Wood, A.W.; Gochis, D.J.; et al. A unified approach for process-based hydrologic modeling: 2. Model implementation and case studies. Water Resour. Res. 2015, 51, 2515–2542.
6. García Bartual, R. Efecto de las variaciones temporales de la intensidad de la precipitación en el hidrograma de la crecida generada. Rev. De Obras Públicas 1991, 138, 13–21.
7. Ehret, U.; van Pruijssen, R.; Bortoli, M.; Loritz, R.; Azmi, E.; Zehe, E. Adaptive clustering: Reducing the computational costs of distributed (hydrological) modelling by exploiting time-variable similarity among model elements. Hydrol. Earth Syst. Sci. 2020, 24, 4389–4411.
8. Blöschl, G. Scaling and Regionalization in Hydrology; Vienna University of Technology: Vienna, Austria, 2011.
9. Blöschl, G.; Sivapalan, M. Scale issues in hydrological modelling: A review. Hydrol. Process. 1995, 9, 251–290.
10. Clark, M.P.; Bierkens, M.F.P.; Samaniego, L.; Woods, R.A.; Uijlenhoet, R.; Bennett, K.E.; Pauwels, V.R.N.; Cai, X.; Wood, A.W.; Peters-Lidard, C.D. The evolution of process-based hydrologic models: Historical challenges and the collective quest for physical realism. Hydrol. Earth Syst. Sci. 2017, 21, 3427–3440.
11. van Jaarsveld, B.; Wanders, N.; Sutanudjaja, E.H.; Hoch, J.; Droppers, B.; Janzing, J.; van Beek, R.L.P.H.; Bierkens, M.F.P. A first attempt to model global hydrology at hyper-resolution. Earth Syst. Dyn. 2025, 16, 29–54.
12. Nearing, G.S.; Kratzert, F.; Sampson, A.K.; Pelissier, C.S.; Klotz, D.; Frame, J.M.; Prieto, C.; Gupta, H.V. What Role Does Hydrological Science Play in the Age of Machine Learning? Water Resour. Res. 2021, 57, e2020WR028091.
13. Shen, C. A Transdisciplinary Review of Deep Learning Research and Its Relevance for Water Resources Scientists. Water Resour. Res. 2018, 54, 8558–8593.
14. Droppers, B.; Rakovec, O.; Avila, L.; Azimi, S.; Cortés-Torres, N.; De León Pérez, D.; Imhoff, R.; Francés, F.; Kollet, S.; Rigon, R.; et al. Multi-model hydrological reference dataset over continental Europe and an African basin. Sci. Data 2024, 11, 1009.
15. Dai, Y.; Zeng, X.; Dickinson, R.E.; Baker, I.; Bonan, G.B.; Bosilovich, M.G.; Denning, A.S.; Dirmeyer, P.A.; Houser, P.R.; Niu, G.; et al. The Common Land Model. Bull. Am. Meteorol. Soc. 2003, 84, 1013–1024.
16. Aerts, J.P.M.; Hut, R.W.; Van De Giesen, N.C.; Drost, N.; Van Verseveld, W.J.; Weerts, A.H.; Hazenberg, P. Large-sample assessment of varying spatial resolution on the streamflow estimates of the wflow_sbm hydrological model. Hydrol. Earth Syst. Sci. 2022, 26, 4407–4430.
17. Janzing, J.; Wanders, N.; Van Tiel, M.; Van Jaarsveld, B.; Karger, D.N.; Brunner, M.I. Hyper-resolution large-scale hydrological modelling benefits from improved process representation in mountain regions. Hydrol. Earth Syst. Sci. 2025, 29, 7041–7071.
18. Zink, M.; Kumar, R.; Cuntz, M.; Samaniego, L. A high-resolution dataset of water fluxes and states for Germany accounting for parametric uncertainty. Hydrol. Earth Syst. Sci. 2017, 21, 1769–1790.
19. Francés, F.; Vélez, J.I.; Vélez, J.J. Split-parameter structure for the automatic calibration of distributed hydrological models. J. Hydrol. 2007, 332, 226–240.
20. GIMHA—Grupo de Investigación en Modelación Hidrológica y Ambiental Distribuida. Descripción Del Modelo Conceptual Distribuido De Simulación Hidrológica TETIS v.9. 2021. Available online: https://zenodo.org/records/17780029 (accessed on 3 February 2026).
21. Beneyto, C.; Vignes, G.; Aranda, J.Á.; Francés, F. Sample Uncertainty Analysis of Daily Flood Quantiles Using a Weather Generator. Water 2023, 15, 3489.
22. Medici, C.; Butturini, A.; Bernal, S.; Vázquez, E.; Sabater, F.; Vélez, J.I.; Francés, F. Modelling the non-linear hydrological behaviour of a small Mediterranean forested catchment. Hydrol. Process. 2008, 22, 3814–3828.
23. Romero Hernández, C.P. Análisis del impacto del crecimiento de las megaciudades sobre el ciclo hidrológico bajo escenarios de cambio climático. Aplicación a la cuenca del río Bogotá (Colombia). Ph.D. Thesis, Universitat Politècnica de València, Valencia, Spain, 2022. Available online: https://riunet.upv.es/handle/10251/191025 (accessed on 3 February 2026).
24. Barrios, M.; Francés, F. Spatial scale effect on the upper soil effective parameters of a distributed hydrological model. Hydrol. Process. 2012, 26, 1022–1033.
25. Güiza-Villa, N.; Cortés-Torres, N.; Francés, F. Impacto del riego estimado por satélite en la modelación hidrológica: Análisis del balance hídrico y desempeño del modelo TETIS en la cuenca del río Po. Ing. Del Agua 2026, 30, 31–47.
26. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
27. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
28. Rakovec, O.; Kumar, R.; Attinger, S.; Samaniego, L. Improving the realism of hydrologic model functioning through multivariate parameter estimation. Water Resour. Res. 2016, 52, 7779–7792.
29. Yalew, S.; van Griensven, A.; Ray, N.; Kokoszkiewicz, L.; Betrie, G.D. Distributed computation of large scale SWAT models on the Grid. Environ. Model. Softw. 2013, 41, 223–230.
30. Cortes Torres, N.; Salazar-Galán, S.; Francés, F. Fractal Dimension and Multiscale Analysis in Geomorphological Parameter Assessment and Hydrological Modeling. In Proceedings of the EGU General Assembly 2025, Vienna, Austria, 27 April–2 May 2025.
31. Cortés-Torres, N.; Vignes, G.; De Leon Pérez, D.; Salazar, S.; Francés, F. Influencia Del Reacondicionamiento y Escalado Espacial De Parámetros Geomorfológicos En Modelación. In Proceedings of the Memorias Del XXXI Congreso Latinoamericano De Hidráulica, Medellín, Colombia, 1–4 October 2024; pp. 429–438.
32. Hernández-Sosa, M.; Aguayo, M.; Cortés-Torres, N.; Stehr, A.; Francés, F.; Llompart, O. Assessing hydrological responses to large-scale native forest restoration as a nature-based solution in South-Central Chile under climate change. Nat.-Based Solut. 2026, 9, 100298.
33. García-García, A.; Stradiotti, P.; Di Paolo, F.; Filippucci, P.; Fischer, M.; Orság, M.; Brocca, L.; Peng, J.; Dorigo, W.; Gruber, A.; et al. Intercomparison of Earth Observation products for hyper-resolution hydrological modelling over Europe. Remote Sens. Environ. 2026, 333, 115131.
34. Gomis-Cebolla, J.; Garcia-Arias, A.; Perpinyà-Vallès, M.; Francés, F. Evaluation of Sentinel-1, SMAP and SMOS surface soil moisture products for distributed eco-hydrological modelling in Mediterranean forest basins. J. Hydrol. 2022, 608, 127569.
35. Brocca, L.; Zhao, W.; Lu, H. High-resolution observations from space to address new applications in hydrology. Innovation 2023, 4, 100437.
36. Brombacher, J.; de Oliveira Silva, I.R.; Degen, J.; Pelgrum, H. A novel evapotranspiration based irrigation quantification method using the hydrological similar pixels algorithm. Agric. Water Manag. 2022, 267, 107602.
37. Leonard, L. Using machine learning models to predict and choose meshes reordered by graph algorithms to improve execution times for hydrological modeling. Environ. Model. Softw. 2019, 119, 84–98.
38. Cortés-Torres, N.; Salazar-Galán, S.; Francés, F. Análisis de la escalabilidad computacional del modelo TETIS ante variaciones en resolución espacial, temporal y parametrizaciones del sistema. In Proceedings of the Libro de Resúmenes, VIII Jornadas de Ingeniería del Agua 2025, Zaragoza, Spain, 22–23 October 2025; Volume 11, pp. 385–388.
39. Probst, P.; Wright, M.N.; Boulesteix, A. Hyperparameters and tuning strategies for random forest. WIREs Data Min. Knowl. Discov. 2019, 9, e1301.
40. Speiser, J.L.; Miller, M.E.; Tooze, J.; Ip, E. A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst. Appl. 2019, 134, 93–101.
41. Jamshidi, P.; Siegmund, N.; Velez, M.; Kästner, C.; Patel, A.; Agarwal, Y. Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis. arXiv 2017, arXiv:1709.02280.
42. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42.
43. Bergstra, J.; Bengio, Y. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res. 2012, 13, 281–305.
44. Cortés-Torres, N. Full Reproduction Package: Inputs, Configurations, and Monitoring Data for TETIS Computational Scalability Study [Data Set]. Zenodo. 2025. Available online: https://zenodo.org/records/17569945 (accessed on 3 February 2026).
45. Cortés-Torres, N. TETIS Runtime Predictor. Zenodo. 2025. Available online: https://zenodo.org/records/18458768 (accessed on 3 February 2026).
46. Mower, R.; Gutmann, E.D.; Liston, G.E.; Lundquist, J.; Rasmussen, S. Parallel SnowModel (v1.0): A parallel implementation of a distributed snow-evolution modeling system (SnowModel). Geosci. Model Dev. 2024, 17, 4135–4154.
47. Perraud, J.-M.; Collins, D.; Bowden, J.C.; Raupach, T.; Manser, P.A.; Stenson, M.P.; Renzullo, L.J. A balancing act in heterogeneous computing—Developing the AWRA-Landscape data assimilation system. In Proceedings of the MODSIM2013, 20th International Congress on Modelling and Simulation; Piantadosi, J., Anderssen, R.S., Boland, J., Eds.; Modelling and Simulation Society of Australia and New Zealand (MSSANZ), Inc.: Sydney, Australia, 2013.
48. Schubert, J.E.; Sanders, B.F. Building treatments for urban flood inundation models and implications for predictive skill and modeling efficiency. Adv. Water Resour. 2012, 41, 49–64.
49. Krause, P.; Kralisch, S. The hydrological modelling system J2000-knowledge core for JAMS. In Proceedings of the MODSIM 2005 International Congress on Modelling and Simulation, Melbourne, Australia, 12–15 December 2005.
50. Tian, Y.; Peters-Lidard, C.D.; Kumar, S.V.; Geiger, J.; Houser, P.R.; Eastman, J.L.; Dirmeyer, P.; Doty, B.; Adams, J. High-performance land surface modeling with a Linux cluster. Comput. Geosci. 2008, 34, 1492–1504.
51. Lin, Q.; Zhang, D. A scalable distributed parallel simulation tool for the SWAT model. Environ. Model. Softw. 2021, 144, 105133.
52. Kan, G.; He, X.; Ding, L.; Li, J.; Hong, Y.; Liang, K. Heterogeneous parallel computing accelerated generalized likelihood uncertainty estimation (GLUE) method for fast hydrological model uncertainty analysis purpose. Eng. Comput. 2020, 36, 75–96.
53. Richardson, A.; Hill, C.N.; Taylor Perron, J. IDA: An implicit, parallelizable method for calculating drainage area. Water Resour. Res. 2014, 50, 4110–4130.
54. Survila, K.; Yildirim, A.A.; Li, T.; Liu, Y.Y.; Tarboton, D.G.; Wang, S. A scalable high-performance topographic flow direction algorithm for hydrological information analysis. In Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale (XSEDE16), New York, NY, USA, 17–21 July 2016.
55. Creaco, E.; Franchini, M. Comparison of Newton-Raphson Global and Loop Algorithms for Water Distribution Network Resolution. J. Hydraul. Eng. 2014, 140, 313–321.
56. Kotyra, B.; Chabudziński, Ł. Fast parallel algorithms for finding the longest flow paths in flow direction grids. Environ. Model. Softw. 2023, 167, 105728.
57. Cho, H. Loop then task: Hybridizing OpenMP parallelism to improve load balancing and memory efficiency in continental-scale longest flow path computation. Environ. Model. Softw. 2025, 193, 106630.
58. Kehagias, A. A hidden Markov model segmentation procedure for hydrological and environmental time series. Stoch. Environ. Res. Risk Assess. 2004, 18, 117–130.
59. Senevirathne, N.; Willgoose, G. A comparison of the performance of digital elevation model pit filling algorithms for hydrology. In Proceedings of the MODSIM2013, 20th International Congress on Modelling and Simulation; Piantadosi, J., Anderssen, R.S., Boland, J., Eds.; Modelling and Simulation Society of Australia and New Zealand (MSSANZ), Inc.: Sydney, Australia, 2013.
60. Abouali, M.; Timmermans, J.; Castillo, J.E.; Su, B.Z. A high performance GPU implementation of Surface Energy Balance System (SEBS) based on CUDA-C. Environ. Model. Softw. 2013, 41, 134–138.
61. Freitas, H.R.A.; Mendes, C.L.; Ilic, A. Performance Optimization and Scalability Analysis of the MGB Hydrological Model. In Proceedings of the 2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics, HiPC 2020; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2020; pp. 31–40.
62. Freitas, H.R.A.; Mendes, C.L.; Ilic, A. Performance optimization of the MGB hydrological model for multi-core and GPU architectures. Environ. Model. Softw. 2022, 148, 105271.
63. Wei, X.; Li, W.; Tian, H.; Li, H.; Xu, H.; Xu, T. THC-MP: High performance numerical simulation of reactive transport and multiphase flow in porous media. Comput. Geosci. 2015, 80, 26–37.
64. Richieri, B.; Sivelle, V.; Hartmann, A.; Labat, D.; Muniruzzaman, M.; Chiogna, G. LuKARS 3.0: A high-performance computing software to model flow and transport processes in karst aquifers. Environ. Model. Softw. 2025, 193, 106642.
65. Furnari, L.; Senatore, A. The Effects of Different Mesh Sizes on a Cellular Automata-Based Hydrological Model. In Proceedings of the Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer Science and Business Media Deutschland GmbH: Berlin, Germany, 2025; Volume 14477 LNCS, pp. 268–274.
66. Momo, M.R.; Tachini, M.; Refosco, J.C.; Severo, D.L.; Dos Santos Silva, H.; Cordero, A. Architecture for integrating computational tools based on Grid services for system monitoring and alerting. In Proceedings of the 2012 6th International Conference on Complex, Intelligent, and Software Intensive Systems, CISIS 2012, Palermo, Italy, 4–6 July 2012; pp. 1013–1017.
67. Mosavi, A.; Ozturk, P.; Chau, K. Flood Prediction Using Machine Learning Models: Literature Review. Water 2018, 10, 1536.
68. Schulthess, T.C. Programming revisited. Nat. Phys. 2015, 11, 369–373.
69. Feitelson, D.G. Workload Modeling for Computer Systems Performance Evaluation; Cambridge University Press: Cambridge, UK, 2015; ISBN 9781107078239.
70. Yokelson, D.; Charest, M.R.J.; Li, Y.W. HPC Application Performance Prediction with Machine Learning on New Architectures. In Proceedings of the 2023 on Performance EngineeRing, Modelling, Analysis, and Visualization Strategy; ACM: New York, NY, USA, 2023; pp. 1–8.
71. Hou, Z.; Zhao, S.; Yin, C.; Wang, Y.; Gu, J.; Zhou, X. Machine Learning Based Performance Analysis and Prediction of Jobs on a HPC Cluster. In Proceedings of the 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT); IEEE: Piscataway, NJ, USA, 2019; pp. 247–252.
72. Samaniego, L.; Kumar, R.; Attinger, S. Multiscale parameter regionalization of a grid-based hydrologic model at the mesoscale. Water Resour. Res. 2010, 46, 1–25.
73. Kumar, R.; Samaniego, L.; Attinger, S. Implications of distributed hydrologic model parameterization on water fluxes at multiple scales and locations. Water Resour. Res. 2013, 49, 360–379.
74. Brædstrup, C.F.; Damsgaard, A.; Egholm, D.L. Ice-sheet modelling accelerated by graphics cards. Comput. Geosci. 2014, 72, 210–220.
75. Gourbesville, P.; Tallé, H.A.; Ghulami, M. AquaVar: High Performance Computing for Real Time Water Management. In Proceedings of the IAHR World Congress; International Association for Hydro-Environment Engineering and Research: Madrid, Spain, 2022; pp. 4952–4960.
76. Wei, J.; Luo, X.; Huang, H.; Liao, W.; Lei, X.; Zhao, J.; Wang, H. Enable high-resolution, real-time ensemble simulation and data assimilation of flood inundation using distributed GPU parallelization. J. Hydrol. 2023, 619, 129277.
77. Ozik, J.; Collier, N.; Murphy, J.T.; Altaweel, M.; Lammers, R.B.; Prusevich, A.A.; Kliskey, A.; Alessa, L. Simulating Water, Individuals, and Management using a coupled and distributed approach. In Proceedings of the Winter Simulation Conference 2014; IEEE: Savannah, GA, USA, 2014; pp. 1120–1131.
78. Perraud, J.-M.; Bridgart, R.; Bennett, J.C.; Robertson, D. SWIFT2: High performance software for short-medium term ensemble streamflow forecasting research and operations. In Proceedings of the MODSIM2015, 21st International Congress on Modelling and Simulation; Weber, T., McPhee, M.J., Anderssen, R.S., Eds.; Modelling and Simulation Society of Australia and New Zealand: Sydney, Australia, 2015.
79. Sloboda, M.; Swayne, D. Autocalibration of Environmental Process Models Using a PAC Learning Hypothesis. In International Symposium on Environmental Software Systems; Springer: Berlin/Heidelberg, Germany, 2011; pp. 528–534.
80. Bacu, V.; Gorgan, D. Grid application oriented computational resource allocation strategy. In Proceedings of the 2012 International Conference on High Performance Computing & Simulation (HPCS); IEEE: Piscataway, NJ, USA, 2012; pp. 581–587.
81. Gorgan, D.; Bacu, V.; Mihon, D.; Rodila, D.; Abbaspour, K.; Rouholahnejad, E. Grid based calibration of SWAT hydrological models. Nat. Hazards Earth Syst. Sci. 2012, 12, 2411–2423.
82. Wu, Y.; Li, T.; Sun, L.; Chen, J. Parallelization of a hydrological model using the message passing interface. Environ. Model. Softw. 2013, 43, 124–132.
83. Li, Q.; Nie, N.; Lu, Z.; Wang, Y. The study of parallelization of SWAT hydrology cycle. In Proceedings of the Communications in Computer and Information Science; Springer: Berlin/Heidelberg, Germany, 2019; Volume 913, pp. 174–185.
84. Yin, Z.; Liao, W.; Lei, X.; Wang, H. Parallel hydrological model parameter uncertainty analysis based on message-passing interface. Water 2020, 12, 2667.
85. Kan, G.; Li, C.; Zuo, D.; Fu, X.; Liang, K. Massively Parallel Monte Carlo Sampling for Xinanjiang Hydrological Model Parameter Optimization Using CPU-GPU Computer Cluster. Water 2023, 15, 2810.
86. Zhang, A.; Li, T.; Si, Y.; Liu, R.; Shi, H.; Li, X.; Li, J.; Wu, X. Double-layer parallelization for hydrological model calibration on HPC systems. J. Hydrol. 2016, 535, 737–747.
87. Kotyra, B. High-performance watershed delineation algorithm for GPU using CUDA and OpenMP. Environ. Model. Softw. 2023, 160, 105613.
88. Le, P.V.V.; Kumar, P.; Valocchi, A.J.; Dang, H.V. GPU-based high-performance computing for integrated surface-sub-surface flow modeling. Environ. Model. Softw. 2015, 73, 1–13.
89. Immanuel, A.; Berry, M.W.; Gross, L.J.; Palmer, M.; Wang, D. A parallel implementation of ALFISH: Simulating hydrological compartmentalization effects on fish dynamics in the Florida Everglades. Simul. Model. Pract. Theory 2005, 13, 55–76.
90. Klenk, K.; Spiteri, R.J. Improving resource utilization and fault tolerance in large simulations via actors. Cluster Comput. 2024, 27, 6323–6340.
91. Klenk, K.; Moayeri, M.M.; Guo, J.; Clark, M.P.; Spiteri, R.J. Mitigating synchronization bottlenecks in high-performance actor-model-based software. In Proceedings of SC 2024-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2024; pp. 1274–1287.
92. Sharma, V.; Swayne, D.; Lam, D.; Schertzer, W. Auto-Calibration of Hydrological Models Using High Performance Computing. In Proceedings of the International Congress on Environmental Modelling and Software, Burlington, VT, USA, 9–13 July 2006.
93. Maghami, I.; Van Beusekom, A.; Hay, L.; Li, Z.; Bennett, A.; Choi, Y.D.; Nijssen, B.; Wang, S.; Tarboton, D.; Goodall, J.L. Building cyberinfrastructure for the reuse and reproducibility of complex hydrologic modeling studies. Environ. Model. Softw. 2023, 164, 105689.

Figure 1. General methodological workflow for computational scalability analysis and machine-learning-based runtime prediction.

Figure 2. Conceptual schema of the TETIS model at the cell scale.

Figure 3. Runtime performance of the Topolco generation across hardware configurations. Colors are consistent across panels and correspond to the same hardware configurations. (a) Topolco runtime as a function of the number of basin cells. (b) Distribution of processor speed (GHz) during the Topolco generation.

Figure 4. Runtime performance of the Hantec generation as a function of the number of basin cells, across hardware configurations.

Figure 5. Runtime performance of the TETIS hydrological simulation as a function of key experimental factors across all hardware configurations: (a) Basin discretization, expressed as the number of basin cells. (b) Temporal setup, expressed as the number of time steps. (c) Input gauge density. (d) Output gauge density. Boxplots summarize the distribution of runtime values for each factor, highlighting central tendency and variability across processors.

Figure 6. Runtime performance of hydrological simulations as a function of basin cells, illustrating the effect of gauge density, for two time step configurations and two processors: (a) Intel Xeon W7-2595X with 50 time steps. (b) Intel Core i7-6700 with 50 time steps. (c) Intel Xeon W7-2595X with 15,000 time steps. (d) Intel Core i7-6700 with 15,000 time steps. Each marker represents an individual simulation; marker colors indicate the combined input–output gauge density.

Table 1. Hardware configurations used for computational scalability experiments.

Processor	RAM (GB)	Cores	Threads	Base Speed (GHz)	Max Turbo Frequency (GHz)
Intel Xeon w7-2595X	256	26	52	2.80	4.80
Intel Xeon E5-2697 v2	128	24	48	2.70	3.50
Intel Core i7-12700F	128	12	20	2.10	4.90
Intel Core i7-3930K	32	6	12	3.20	3.80
Intel Core i7-6700	32	4	8	3.40	4.00

Table 2. Hyperparameter space explored for Random Forest performance prediction.

Hyperparameter	Description	Minimum	Maximum	Step
n_estimators	The number of trees in the forest	100	500	50
max_depth	The maximum depth of each tree	5	20	5
min_samples_split	The minimum number of samples required to split an internal node	2	10	1
min_samples_leaf	The minimum number of samples required to be at a leaf node	1	5	1
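
The ranges in Table 2 can be expanded into explicit candidate values. The following is a minimal sketch, not the authors' code: it assumes each range includes both bounds and is sampled at the stated increment, and the names `SEARCH_SPACE` and `expand` are hypothetical. This section does not state which search strategy (exhaustive or randomized) was applied over the space.

```python
# Ranges copied from Table 2 as (minimum, maximum, increment).
# Assumption: both bounds are inclusive.
SEARCH_SPACE = {
    "n_estimators": (100, 500, 50),
    "max_depth": (5, 20, 5),
    "min_samples_split": (2, 10, 1),
    "min_samples_leaf": (1, 5, 1),
}


def expand(minimum, maximum, step):
    """Return every candidate value from minimum to maximum, inclusive."""
    return list(range(minimum, maximum + 1, step))


# One list of candidate values per hyperparameter.
param_grid = {name: expand(*bounds) for name, bounds in SEARCH_SPACE.items()}

# Size of the full Cartesian product of candidate values.
n_combinations = 1
for values in param_grid.values():
    n_combinations *= len(values)

for name, values in param_grid.items():
    print(name, values)
print("combinations in the full grid:", n_combinations)
```

A dictionary of this shape is what scikit-learn's `GridSearchCV` accepts as `param_grid` (or `RandomizedSearchCV` as `param_distributions`) when tuning a `RandomForestRegressor`. Under the inclusive-bounds assumption, the full grid is large enough that a randomized search may be the more practical choice.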

Table 3. Algorithmic structure and core usage of the main TETIS computational components.

Process	Algorithm Structure	Core Usage	Parallelization	Role in Workflow
Topolco	Parallel (OpenMP)	All available cores	Yes	Topology preprocessing
Hantec	Serial	Single core	No	Initial state generation
Hydrological simulation	Serial	Single core	No	Dynamic hydrological simulation

Table 4. Mean RF prediction error (%) by observed hydrological simulation runtime ranges.

Observed Time Range (min)	Mean Error (%)
<0.1	907.6
0.1 to 1	253.1
1 to 10	75.6
10 to 100	37.3
100 to 1000	19.5
1000 to 10,000	7.4
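
A summary like Table 4 can be produced by grouping percentage errors by observed runtime. The sketch below is illustrative only. It assumes the error is the absolute percentage error relative to the observed runtime and that each range includes its lower bound; neither detail is stated in this section. The function name and the sample values are hypothetical.

```python
import bisect
from collections import defaultdict

# Upper edges of the runtime ranges in Table 4, in minutes.
EDGES = [0.1, 1, 10, 100, 1000, 10000]
LABELS = ["<0.1", "0.1 to 1", "1 to 10", "10 to 100",
          "100 to 1000", "1000 to 10,000"]


def mean_error_by_range(observed, predicted):
    """Mean absolute percentage error grouped by observed-runtime range.

    Assumption: error = |predicted - observed| / observed * 100.
    Runtimes of 10,000 min or more fall outside Table 4 and are skipped.
    """
    errors = defaultdict(list)
    for obs, pred in zip(observed, predicted):
        idx = bisect.bisect_right(EDGES, obs)  # lower bound inclusive
        if idx >= len(LABELS):
            continue
        errors[LABELS[idx]].append(abs(pred - obs) / obs * 100.0)
    return {label: sum(vals) / len(vals) for label, vals in errors.items()}


# Hypothetical runtimes in minutes, for illustration only.
observed = [0.05, 0.5, 2000.0]
predicted = [0.5, 1.0, 2100.0]
summary = mean_error_by_range(observed, predicted)
for label in LABELS:
    if label in summary:
        print(f"{label}: {summary[label]:.1f}%")
```

In the hypothetical data, a miss of 0.45 min on a 0.05 min run is a 900% error, while a miss of 100 min on a 2000 min run is only 5%. A relative metric therefore inflates errors for very short runs even when the absolute miss is negligible.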
