1. Introduction
Continuous streamflow monitoring in river systems is fundamental for the integrated management of water resources and flood risk mitigation [1]. The standard practice for obtaining these records is based on the uninterrupted measurement of water levels, which are subsequently transformed into discharge using stage–discharge curves or rating curves [2,3,4]. Traditionally, these functional relationships are constructed by fitting empirical regressions (generally power or exponential types) based on in situ gauging campaigns [5]. However, the physical reality of river channels is rarely static, introducing significant degrees of uncertainty in daily estimates due to changes in roughness, erosion, or alterations in the cross-section [6,7,8]. Consequently, understanding and minimizing the error associated with these curves represents a persistent and unavoidable challenge in contemporary hydrometry.
In the specific context of high-mountain Andean basins, the intrinsic complexity of flow dynamics severely compromises the reliability of traditional gauging methods [9]. These river systems are characterized by steep slopes, high sediment transport rates, and a channel morphology that is highly variable during flash floods [10,11]. Under these extreme conditions, the theoretical assumptions of uniform flow and stable section control, required by classical hydraulic equations, are frequently violated [10]. Furthermore, during intense precipitation events, empirical extrapolation of the curve toward high levels becomes indispensable but carries high mathematical uncertainty due to the activation of floodplains and drastic changes in bed friction [7]. Therefore, the application of conventional techniques in the Andean region often generates systematic biases that hinder the correct estimation of maximum and minimum flows.
To overcome the limitations of traditional mathematical approaches, Artificial Intelligence (AI) techniques have recently emerged as robust alternatives in hydrological modeling [12]. Specifically, Artificial Neural Networks (ANN) have demonstrated a superior ability to map complex and strongly non-linear relationships without depending on a predefined physical control structure [13]. Algorithms such as the Multilayer Perceptron, especially when trained using advanced optimization methods like Levenberg–Marquardt [14], manage to implicitly capture hysteresis and geometric bed irregularities by learning from historical data [15]. Various studies document that machine learning models drastically reduce flow prediction error compared to standard power laws [16,17]. Despite these notable predictive statistical advantages, the physical interpretability of AI models remains a subject of debate, making their validation against fundamental hydraulic principles essential [18].
The rigorous implementation of any rating curve model requires, consequently, an exhaustive and simultaneous evaluation of its uncertainty and physical consistency [18]. Much of the existing literature evaluates AI algorithms solely through global statistical metrics during the training phase, omitting the quantification of error propagation via confidence bands [8]. Likewise, when direct measurements are unfeasible during floods, extrapolation depends on methodologies such as Manning or Stevens, whose physical sensitivity is rarely analytically contrasted with the trends projected by neural networks [19,20]. Additionally, spatial validation through continuous mass balances in nested basin systems is often ignored, despite constituting irrefutable proof of the hydrological coherence of the generated series [21]. This multidimensional evaluation approach is vital to ensure that abstract models translate into realistic hydrological estimates on the ground.
Although recent advances in computational hydrology have successfully integrated Machine Learning (ML) and Deep Learning techniques to estimate river flows and optimize rating curves [22,23], significant methodological gaps remain regarding their physical interpretability and spatial coherence in topographically rugged regions. Most data-driven approaches evaluate model performance primarily through global statistical metrics during the training phase, frequently overlooking the rigorous propagation of mathematical uncertainty via confidence bands, a critical requirement in mountainous watersheds where stage–discharge relationships are highly unstable [16]. Furthermore, while recent studies emphasize the necessity of correcting hydraulic biases independently of instantaneous discharge errors to capture complex geomorphic features [24,25], the literature rarely contrasts the physical sensitivity of ML-based curves against traditional extrapolations under extreme, ungauged flood conditions. A critical unresolved challenge is ensuring that these abstract AI algorithms do not violate fundamental hydraulic principles. Most existing studies have not addressed these gaps because instrumenting nested mountainous basins for continuous, high-frequency spatial mass balances, which are indispensable for geo-hydrological hazard assessment [26], is logistically complex and costly. Statistically quantifying these uncertainty bands while maintaining spatial hydrological coherence therefore constitutes a major research gap in the operational application of AI for high-mountain hydrometry. The novelty of this study lies in moving beyond global statistical metrics by introducing a validated, physics-based spatial framework.
In this context, the general objective of this study is to compare the performance of Artificial Intelligence techniques against traditional hydraulic methods for reducing uncertainty in rating curves in mountainous Andean basins. To carry out this research, hydrometric data from gauging campaigns were processed for three measurement stations installed in a nested basin scheme on the Zamora and Malacatos Rivers in the city of Loja, Ecuador. Discharge equations were determined through correlation analysis with exponential fitting and, in parallel, a neural network was implemented using the Neural Net Fitting algorithm optimized with Levenberg–Marquardt and subjected to cross-validation. Subsequently, extrapolation curves were defined using the Manning and Stevens methodologies [20], and confidence bands were calculated to evaluate the mathematical uncertainty of the models. Finally, a spatial mass balance was executed between the tributaries and the basin outlet, determining the accuracy of all approaches through the Root Mean Square Error (RMSE) and the Nash-Sutcliffe Efficiency (NSE).
The spatial and temporal scope of this research encompassed the detailed evaluation of the stage–discharge relationship in an anthropogenically influenced Andean river system, considering both low-flow accuracy and extrapolated stability during floods. From a scientific standpoint, this work provides rigorous quantitative evidence of the advantages and limitations of neural networks for modeling non-linear mountain flow dynamics without losing the physical sense of runoff. At a technical level, the study provides a validated integrated methodological framework that merges the efficacy of artificial intelligence with classical spatial mass conservation. These results will provide decision-makers and engineering designers with substantially more accurate tools for transforming levels into flows. In the near term, this comparative methodology is expected to serve as an operational technical standard for optimizing monitoring networks and early warning systems in similar Andean topographies.
2. Materials and Methods
2.1. Study Design and General Objective
The general objective of this study was to compare the performance of Artificial Intelligence (AI) techniques against traditional hydraulic methods for reducing mathematical and physical uncertainty in the estimation of stage–discharge rating curves in high-mountain Andean basins. To achieve this purpose, a comparative framework was established to evaluate predictive capacity, extrapolation of extreme flows, and spatial consistency through the principle of water mass conservation.
2.2. Study Area and Sample Description
The Zamora River basin (A = 227 km²) is located in the southern Andes of Ecuador and is formed by the confluence of the Zamora Huayco and Malacatos Rivers. It has an average elevation of 2400 m above sea level, an average basin slope of 30%, and an average slope of the main channel of 8.3% [27]. The basin is covered by vegetation in good condition, mainly composed of grasslands, scrublands, and forests. Its climate is temperate subhumid equatorial, with a mean annual precipitation of 909.1 mm. The Zamora River experiences dry periods between May and November and significant flows during the rainy season (from December to April) [28].
The city of Loja occupies the middle and lower portions of the basin. It has approximately 200,000 inhabitants and an area of 43 km², being the only urban settlement within the Zamora River basin [29].
The location of the study area is shown in Figure 1.
The analytical sample consisted of continuous time series of water levels and direct hydrometry (in situ streamflow measurements). A nested basin scheme was instrumented, comprising three strategic monitoring stations: two located on the main tributaries (LEON and DAB) and an outlet station on the main collecting channel (SAUCES). The gauging sample encompassed measurements captured under diverse flow conditions (low flow, transition, and moderate floods), which were subjected to a rigorous quality control process to identify and remove spurious outliers prior to mathematical modeling.
In situ streamflow measurements and water levels were recorded using mechanical current meters and radar sensors, respectively. Based on the manufacturer’s technical specifications and the standard hydrometric procedures employed in the field, the instrumental and methodological uncertainty is estimated at between −2.6% and +1.6% for discharge measurements at velocities above 0.22 m/s, and at ±1.5% for water level readings. These inherent field measurement errors are considered within acceptable operational limits for turbulent mountain flows and are implicitly incorporated into the overall uncertainty bands modeled in this study.
2.3. Empirical Modeling and Rating Curve Fitting
To transform hydrometric stages (h) into discharges (Q), two independent methodological approaches were executed and contrasted:
2.3.1. Traditional Hydraulic Method
This approach was based on fitting the classical power-exponential regression equation, widely recognized as the international operational standard for stage–discharge relationships [30,31]:
$$Q(t) = a\,\left[h(t) - h_0\right]^{b}$$
where Q(t) is the instantaneous flow rate and h(t) is the observed water level at time t; both variables are treated as continuous in time rather than being restricted to peak flows. The empirical coefficients are strictly influenced by the environmental and morphometric factors of the river cross-section: a is a scale parameter dependent on the channel’s roughness and longitudinal slope; b is a shape parameter dictated by the cross-sectional geometry and flow control type (e.g., b ≈ 1.5 for rectangular section controls, b ≈ 1.67 for wide rectangular channels under friction control, and b ≥ 2.0 for parabolic shapes); and h_0 represents the gauge height of zero flow, controlled by the physical elevation of the deepest point in the hydraulic control section [30,31]. While the operational application of this baseline formulation is standard practice globally, including in topographically rugged and mountainous watersheds [16,32], its rigidity often struggles to capture the highly variable hydrodynamics of steep mountain subcatchments. The inherent physical limitations of this equation in such complex environments precisely constitute the primary motivation for evaluating it as a baseline against Artificial Intelligence models in this study. The fitting of this non-linear function was optimized using the Levenberg–Marquardt algorithm to ensure the iterative minimization of residuals. For the extrapolation of the curve toward extreme levels (floods), where direct streamflow measurements were unavailable, the Manning and Stevens methodologies were applied in a complementary manner, assuming variable bed roughness parameters based on cross-sectional geometry. The dynamic uncertainty of this model was quantified by calculating 95% confidence bands through parametric error propagation.
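The fitting and uncertainty steps described above can be illustrated with a minimal Python sketch. It is not the authors' R workflow (which relied on minpack.lm and propagate). The gaugings are synthetic, the starting values are arbitrary, and the band is a first-order (delta-method) approximation of parametric error propagation; all names (`rating_curve`, `confidence_band`) are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import t as student_t


def rating_curve(h, a, b, h0):
    """Power-law rating curve Q = a * (h - h0)**b; (h - h0) is clipped above zero."""
    return a * np.clip(h - h0, 1e-9, None) ** b


# Synthetic gaugings (hypothetical values, not the study's data).
rng = np.random.default_rng(42)
h_obs = np.linspace(0.30, 1.50, 25)                          # stage, m
q_true = rating_curve(h_obs, 6.0, 1.8, 0.10)                 # noise-free discharge, m3/s
q_obs = q_true * (1.0 + rng.normal(0.0, 0.03, h_obs.size))   # 3 % multiplicative noise

# method="lm" selects Levenberg-Marquardt (only valid without parameter bounds).
popt, pcov = curve_fit(rating_curve, h_obs, q_obs, p0=[5.0, 1.5, 0.0], method="lm")


def confidence_band(h, popt, pcov, dof, level=0.95):
    """Delta-method band for the fitted curve: var(Q) = g^T pcov g, g = dQ/d(a, b, h0)."""
    a, b, h0 = popt
    d = np.clip(h - h0, 1e-9, None)
    grad = np.column_stack([
        d ** b,                    # dQ/da
        a * d ** b * np.log(d),    # dQ/db
        -a * b * d ** (b - 1.0),   # dQ/dh0
    ])
    se = np.sqrt(np.einsum("ij,jk,ik->i", grad, pcov, grad))
    tcrit = student_t.ppf(0.5 + level / 2.0, dof)
    q_hat = rating_curve(h, *popt)
    return q_hat - tcrit * se, q_hat, q_hat + tcrit * se


h_new = np.array([0.5, 1.0, 1.5, 2.0])   # 2.0 m lies beyond the gauged range
lower, q_hat, upper = confidence_band(h_new, popt, pcov, dof=h_obs.size - 3)
for h_i, lo_i, q_i, hi_i in zip(h_new, lower, q_hat, upper):
    print(f"h = {h_i:.2f} m  Q = {q_i:6.2f}  95% band = [{lo_i:6.2f}, {hi_i:6.2f}] m3/s")
```

The band describes uncertainty in the fitted curve itself, not in individual gaugings, and it widens beyond the gauged range. This matches the caution the text attaches to flood extrapolation.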
2.3.2. Artificial Intelligence Techniques
In parallel, an Artificial Neural Network (ANN) of the Multilayer Perceptron (MLP) type was implemented. The fundamental difference from the traditional approach is that the MLP is a non-parametric, data-driven approximator that does not require a priori physical assumptions. The model architecture used a 1-5-1 structure: an input layer (normalized water levels), a hidden layer with 5 neurons using a hyperbolic tangent activation function, and a linear output layer. Data were split into training (70%), validation (15%), and testing (15%) subsets. The network was trained with the Neural Net Fitting algorithm using Levenberg–Marquardt optimization, which was stopped when the validation error failed to decrease for six consecutive iterations. To mitigate the risk of overfitting in the high-mountain dynamics, the AI model was subjected to a cross-validation technique, evaluating its generalization capacity on unseen data.
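As an illustration of the 1-5-1 architecture, the following Python sketch trains a small tanh network with SciPy's `least_squares(method="lm")`, which wraps MINPACK's Levenberg–Marquardt routine. It is an assumption-laden stand-in, not the authors' implementation: the data are synthetic, the helper names are hypothetical, and the six-iteration validation stop is not reproduced, so the validation and test subsets are used only for reporting.

```python
import numpy as np
from scipy.optimize import least_squares

N_HIDDEN = 5  # hidden neurons in the 1-5-1 network


def mlp(theta, x):
    """1-5-1 perceptron: tanh hidden layer followed by a linear output neuron."""
    w1 = theta[0:N_HIDDEN]                   # input -> hidden weights
    b1 = theta[N_HIDDEN:2 * N_HIDDEN]        # hidden biases
    w2 = theta[2 * N_HIDDEN:3 * N_HIDDEN]    # hidden -> output weights
    b2 = theta[3 * N_HIDDEN]                 # output bias
    hidden = np.tanh(np.outer(x, w1) + b1)   # shape (n_samples, N_HIDDEN)
    return hidden @ w2 + b2


def residuals(theta, x, y):
    return mlp(theta, x) - y


def scale(v):
    """Min-max normalisation to [-1, 1]."""
    return 2.0 * (v - v.min()) / (v.max() - v.min()) - 1.0


def nse(obs, sim):
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


# Synthetic stage-discharge pairs (hypothetical, not the study's gaugings).
rng = np.random.default_rng(0)
h = rng.uniform(0.30, 1.50, 60)
q = 6.0 * (h - 0.10) ** 1.8 * (1.0 + rng.normal(0.0, 0.03, h.size))
x, y = scale(h), scale(q)

# 70 / 15 / 15 split into training, validation and test indices.
idx = rng.permutation(h.size)
n_train, n_val = round(0.70 * h.size), round(0.15 * h.size)
train_idx, val_idx, test_idx = np.split(idx, [n_train, n_train + n_val])

# Levenberg-Marquardt needs at least as many residuals as parameters (42 >= 16 here).
theta0 = rng.normal(0.0, 0.5, 3 * N_HIDDEN + 1)
fit = least_squares(residuals, theta0, args=(x[train_idx], y[train_idx]), method="lm")

for name, part in [("train", train_idx), ("validation", val_idx), ("test", test_idx)]:
    print(f"{name:>10}: NSE = {nse(y[part], mlp(fit.x, x[part])):.3f}")
```

With 16 free parameters and a few dozen gaugings, the gap between training and held-out NSE is the quantity to watch, since it is the symptom of overfitting.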
The implementation of a Multilayer Perceptron (MLP) neural network is justified by its proven ability as a universal approximator of nonlinear functions [33]. In high-altitude Andean river systems, the assumptions of stable cross-section and constant friction are frequently violated due to bed irregularity and floodplain activation. In contrast to the rigidity of monotonic exponential regressions, the MLP implicitly captures these morphodynamic alterations and hysteresis phenomena by adapting its synaptic weights to the actual variance of the flow measurements. Additionally, the model training was optimized using the Levenberg–Marquardt algorithm. This method was chosen for its hybrid nature: it blends the gradient descent direction with the Gauss–Newton convergence rate, approximating the Hessian from the Jacobian matrix. This algorithmic robustness is ideal for hydrometric regression problems, allowing efficient minimization of the squared error in moderately sized samples, characteristic of costly gauging campaigns in rugged terrain.
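For reference, the hybrid behaviour of Levenberg–Marquardt can be written in its standard textbook form. The notation is introduced here for illustration and is not taken from the study: θ collects the network weights, r(θ) is the residual vector with components Q_obs,i − Q_sim,i(θ), J is the Jacobian of Q_sim with respect to θ, and λ ≥ 0 is the damping factor.

```latex
\left(J^{\top} J + \lambda I\right)\,\delta = J^{\top} r(\theta_k),
\qquad
\theta_{k+1} = \theta_k + \delta
```

For large λ the step tends to δ ≈ (1/λ) Jᵀr, a short move along the negative gradient of the squared error. As λ → 0 it reduces to the Gauss–Newton step, in which JᵀJ stands in for the Hessian. Implementations typically decrease λ after a successful step and increase it otherwise.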
2.4. Spatial Validation and Water Mass Balance
To ensure that the mathematical models preserved the physical sense of runoff, a continuous spatial mass balance was executed. Using high-resolution (sub-hourly) time series, the sum of the tributary inflows (QLEON + QDAB) was calculated and contrasted with the total discharge recorded at the outlet station (QSAUCES). The intermediate water contribution was mathematically deduced to verify the hydrological coherence of the hydrographs generated by both approaches (traditional and AI), ensuring that the abstract extrapolation did not violate the principle of mass conservation.
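The balance can be sketched in a few lines of Python. The series below are hypothetical, the function names are illustrative, and travel-time lags between stations are ignored for simplicity, so this is only an outline of the check rather than the study's procedure.

```python
import numpy as np


def intermediate_contribution(q_leon, q_dab, q_sauces):
    """Residual of the nested balance: Q_int = Q_SAUCES - (Q_LEON + Q_DAB)."""
    return q_sauces - (q_leon + q_dab)


def closure_summary(q_leon, q_dab, q_sauces):
    """Simple diagnostics of how well the tributary sum closes against the outlet."""
    q_int = intermediate_contribution(q_leon, q_dab, q_sauces)
    return {
        "mean_q_int": float(q_int.mean()),                  # mean intermediate inflow, m3/s
        "fraction_negative": float(np.mean(q_int < 0.0)),   # share of steps violating mass conservation
        "volume_ratio": float((q_leon + q_dab).sum() / q_sauces.sum()),
    }


# Hypothetical sub-hourly discharges (m3/s) at the three stations.
q_leon = np.array([1.2, 1.5, 2.8, 4.1, 3.0, 2.0])
q_dab = np.array([0.8, 0.9, 1.9, 2.6, 2.1, 1.4])
q_sauces = np.array([2.3, 2.7, 5.2, 7.4, 5.6, 3.8])

summary = closure_summary(q_leon, q_dab, q_sauces)
print(summary)
```

A non-zero share of negative intermediate contributions would indicate time steps where the estimated tributary discharges exceed the outlet discharge, which is the kind of physical inconsistency this validation is meant to expose.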
2.5. Statistical Analysis in R
Data preprocessing, hydrological modeling, and statistical inference were carried out using the R programming language (version 4.3.2) [34] within the RStudio integrated development environment (version 2023.12.1) [35].
For structural manipulation and deep cleaning of the time series, the tidyverse (v. 2.0.0), lubridate (v. 1.9.3), and janitor (v. 2.2.0) packages were employed. The non-linear fitting of the traditional rating curves was executed with minpack.lm (v. 1.2-4), while uncertainty propagation and confidence band calculation were performed with the propagate package (v. 1.0.6). The artificial intelligence-based modeling and cross-validation were programmed using the nnet (v. 7.3-19) and caret (v. 6.0-94) libraries. Finally, high-quality visualization and the composition of analytical plots were achieved with ggplot2 (included in tidyverse) and cowplot (v. 1.1.3).
The accuracy and predictive performance of the models against the actual streamflow measurements were quantified and compared using two main objective functions: the Root Mean Square Error (RMSE) and the Nash-Sutcliffe Efficiency (NSE) coefficient, calculated using the following equations [36]:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q_{obs,i} - Q_{sim,i}\right)^{2}}$$
$$NSE = 1 - \frac{\sum_{i=1}^{n}\left(Q_{obs,i} - Q_{sim,i}\right)^{2}}{\sum_{i=1}^{n}\left(Q_{obs,i} - \bar{Q}_{obs}\right)^{2}}$$
where Q_{obs,i} is the in situ measured discharge, Q_{sim,i} is the discharge estimated by the models, \bar{Q}_{obs} is the mean of the observed discharges, and n is the total number of observations.
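Both metrics are short enough to state directly in code. The sketch below uses Python with made-up discharge values purely to show the arithmetic: RMSE is the square root of the mean squared difference between observed and simulated discharge, and NSE is one minus the ratio of the squared-error sum to the variance sum of the observations.

```python
import numpy as np


def rmse(q_obs, q_sim):
    """Root Mean Square Error, in the units of discharge."""
    q_obs, q_sim = np.asarray(q_obs, float), np.asarray(q_sim, float)
    return float(np.sqrt(np.mean((q_obs - q_sim) ** 2)))


def nse(q_obs, q_sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    q_obs, q_sim = np.asarray(q_obs, float), np.asarray(q_sim, float)
    return float(1.0 - np.sum((q_obs - q_sim) ** 2) / np.sum((q_obs - q_obs.mean()) ** 2))


# Made-up gaugings and model estimates (m3/s), for illustration only.
q_obs = np.array([1.0, 2.0, 3.0, 4.0])
q_sim = np.array([1.1, 1.9, 3.2, 3.8])

print(f"RMSE = {rmse(q_obs, q_sim):.4f} m3/s")   # squared errors sum to 0.10, so sqrt(0.025)
print(f"NSE  = {nse(q_obs, q_sim):.3f}")          # 1 - 0.10 / 5.0
```

Because NSE is normalised by the variance of the observations, a model that always predicts the observed mean scores exactly zero. This is the usual reference point when reading values such as 0.94 or 0.99.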
4. Discussion
4.1. Predictive Performance and the Challenge of Mathematical Generalization
The findings of this study confirm the theoretical premise that Artificial Intelligence (AI) techniques, specifically Artificial Neural Networks (ANN), possess a superior capability to model the stage–discharge relationship when compared to classical exponential regressions. During the final training phase, the neural network managed to capture the variance of the data almost perfectly across all three stations. This finding is consistent with those reported by [13], who document that machine learning algorithms outperform standard power laws by not relying on a predefined physical control structure.
However, the drastic contrast observed during cross-validation reveals a critical technical vulnerability: overfitting. The significant drop in predictive efficiency on unseen data (particularly evident at the LEON station) suggests that, although the Levenberg–Marquardt algorithm is highly effective at minimizing local errors, it tends to memorize the noise and specific anomalies of the calibration sample. In the context of Andean basins, where gauging data are often scarce and dispersed, this lack of generalization warns that the initial statistical superiority of AI does not guarantee infallible extrapolation [12], rendering the use of dynamic uncertainty bands indispensable.
It is important to note that while the AI model yielded higher NSE values during training, the traditional exponential model also demonstrated strong foundational performance (NSE > 0.94). Given the combined instrumental and methodological uncertainties inherent in the field data (with discharge measurement errors ranging between −2.6% and +1.6%, and water level reading accuracies of ±1.5%), the marginal increase in global NSE provided by the neural network is tightly constrained by this observational noise. Therefore, rather than claiming absolute statistical superiority based on incremental metric gains, the primary advantage of the AI approach lies in its structural flexibility to implicitly map local geometric irregularities and hysteresis effects—nuances that the rigid traditional power law inevitably smooths over or misrepresents.
4.2. Analytical Flexibility vs. High-Mountain Physical Dynamics
The graphical evaluation of the rating curves demonstrated that AI exhibits a flexibility that allows it to adapt to geometric bed irregularities and implicit phenomena such as hysteresis, as suggested by [37]. Conversely, the exponential model displayed a rigid trajectory that inevitably underestimates or overestimates discharges in intermediate sections. This rigid behavior of the traditional method corroborates the assertions of [32]. In steep-slope Andean rivers with highly variable morphology, the theoretical assumptions of uniform flow and stable section control are frequently violated, which invalidates the exclusive use of classical hydraulic equations without an in-depth uncertainty analysis.
When extrapolating the curves towards extreme water levels, AI proved to be a mathematically robust alternative to empirical extrapolation methods such as Manning or Stevens. Nevertheless, as cautioned by [18], the physical sensitivity of these extrapolations must not be ignored. The neural network constructs its trajectory based purely on historical data mining; therefore, if streamflow measurements captured during flood events (where floodplains are activated and friction changes drastically) are unavailable, the abstract model runs the risk of projecting mathematically precise but physically unrealistic trends.
4.3. The Importance of Spatial Cross-Validation
One of the most significant contributions of this research is the verification of the models’ hydrological coherence through continuous mass balancing within the nested basin system (Zamora and Malacatos Rivers). Previous literature frequently omits this step, evaluating AI algorithms solely through global statistical metrics during the training phase [17].
By achieving an adequate closure of the sub-hourly water balance between the tributary stations and the main outlet, this study directly addresses the call from [21], who posits that spatial validation constitutes irrefutable proof of the viability of the generated series. This outcome demonstrates that the neural network’s predictions, despite their “black box” nature, translate into physically realistic runoff estimates on the ground, overcoming the interpretability gap that is often criticized in AI models.
4.4. Study Limitations
Despite the promising results, this study presents limitations that must be taken into consideration:
Scarcity of measurements at extreme flows: The database utilized for training the models lacks direct measurements during peak flood events, which increases the mathematical uncertainty in the upper extrapolation bands for both models.
Sensitivity to sample size and overfitting: The notable performance decline during cross-validation highlights that the neural networks employed are highly sensitive to the volume of available data, limiting their operational robustness if the monitoring network is not constantly updated.
Inherent instrumental noise: The near-perfect training metrics (NSE > 0.99) indicate that the AI partially modeled instrumental noise (−2.6% to +1.6% error) rather than pure physical variance, so such values must be interpreted with caution. The drop in cross-validation NSE (e.g., to 0.563) is the primary metric indicating this overfitting. In mountain hydrology, a structurally robust model with lower precision (NSE 0.7–0.8) is often preferable for ungauged basin applications over a highly parameterized but overfitted AI. Additionally, extreme floods were not included because direct gauging in Andean flash-flood events is logistically unfeasible and poses extreme danger to equipment and personnel, marking a physical boundary for data-driven models.
Uninstrumented intermediate contributions: In the spatial water balance, the intermediate runoff contribution was mathematically deduced. The lack of instrumentation in contributing micro-basins between control points introduces a residual margin of error when validating mass conservation.
4.5. Future Work and Research Lines
To consolidate the transition toward modernized hydrometric protocols in the Andean region, the following future research lines are proposed:
Hybrid Modeling (Physical–Statistical): Develop Physics-Informed Neural Network (PINN) algorithms that penalize the neural network during its training phase if it violates fundamental hydraulic principles (such as the continuity equation or Manning’s friction limits).
Incorporation of Multidimensional Variables: Expand the AI architecture so that it does not solely depend on the water level (h), but also integrates dynamic topographic and climatic predictors (e.g., water surface slope, sediment transport) to mitigate overfitting.
Automated Monitoring Networks: Implement long-term validations using non-intrusive measurement techniques (such as radar velocimetry or drone imagery) during extreme events, which will enable feeding the algorithms with high-fidelity data under climate change scenarios.