A Global Benchmark of the Vector-Based Routing Model MizuRoute: Similarities and Divergent Patterns in Simulated River Discharge

Xu, Shuyuan; Sun, Haodong; Tang, Li; Sun, Xiaohui

doi:10.3390/w18040485

¹ Department of Geology and Surveying Engineering, Shanxi Institute of Energy, Jinzhong 030600, China

² Geological Environments and Disaster Prevention and Reduction Research Center, Shanxi Institute of Energy, Jinzhong 030600, China

³ Shanxi Provincial Geological Prospecting Bureau, Taiyuan 030001, China

⁴ Wuhan University of Science and Technology, Wuhan 430081, China

⁵ College of Geological and Surveying Engineering, Taiyuan University of Technology, Taiyuan 030024, China

⁶ Shanxi Institute of Geological Survey Co., Ltd., Taiyuan 030006, China

⁷ Shanxi Center of Technology Innovation for Mining Groundwater Pollution Prevention and Remediation in Karst Area, Taiyuan 030024, China

⁸ College of Construction Engineering, Jilin University, Changchun 130026, China

* Author to whom correspondence should be addressed.

Water 2026, 18(4), 485; https://doi.org/10.3390/w18040485

Submission received: 16 January 2026 / Revised: 9 February 2026 / Accepted: 11 February 2026 / Published: 13 February 2026

(This article belongs to the Section Hydrology)

Abstract

Large-scale river modeling has transitioned toward vector-based routing, yet the global fidelity of standalone frameworks like mizuRoute remains poorly characterized due to fragmented observation networks and unquantified systematic biases. This study addresses this gap by establishing a comprehensive global benchmark using a harmonized database of 12,115 in situ gauging stations integrated with multi-dimensional catchment attributes. Simulations utilize the 5 km MERIT-Hydro network driven by ERA5-Land runoff from 1980 to 2024. Our results reveal a robust global median Pearson correlation of 0.53, though simulation efficiency is highly bifurcated with a median Kling–Gupta Efficiency (KGE) of 0.17. High fidelity is concentrated in humid temperate and cold regions, whereas performance collapses in arid zones (median KGE = −0.15) due to the structural omission of channel transmission losses. Attribution analysis identifies the aridity–moisture gradient and vegetation density as primary drivers of model skill, while topographic complexity is well-preserved by the vector framework. Furthermore, anthropogenic regulation significantly degrades accuracy; in basins with high reservoir density, naturalized routing fails to capture regulated flow signatures, leading to a sharp decline in efficiency. This work provides the first global appraisal of the mizuRoute framework and highlights that integrating dryland-specific loss functions and reservoir modules is essential for the next generation of global hydrological reconstructions.

Keywords:

global river discharge; mizuRoute; vector-based routing; model benchmarking; hydroclimatic stratification; anthropogenic impact

1. Introduction

River discharge is a key component of the terrestrial water cycle, influencing regional to global climate, sustaining ecosystems, and determining freshwater availability [1,2,3]. Reliable estimation of river discharge is essential for water resources management, hydropower planning, hazard forecasting, and climate impact assessments [4,5,6]. Despite its importance, direct discharge observations are unevenly distributed across the globe, with many records remaining publicly inaccessible due to financial, logistical, or transboundary constraints [7]. As a result, large portions of the global river network remain poorly characterized, necessitating the use of river models to estimate discharge, particularly where in situ measurements are unavailable [8].

Large-scale river modeling can be broadly divided into grid-based and vector-based routing approaches [9]. Historically, grid-based methods have been the popular choice for estimating discharge [10]. A pathway pioneered in large-scale hydrology is to use these routing models in combination with land surface models, reanalysis, and observations to reconstruct global river discharge [11]. For example, the Global Flood Awareness System combines a land surface scheme, HTESSEL, with the LISFLOOD routing model, simulating global discharge daily at a 0.05-degree resolution [12,13]. Similarly, many global water resources models, such as PCR-GLOBWB 2, also utilize this routing approach [14,15]. However, the inherent grid-based routing component introduces significant limitations. Water movement is restricted to neighboring rectangular grid cells, and the river network topology is strictly derived from the underlying gridded digital elevation model. These assumptions hinder a sufficiently accurate representation of the river system, potentially leading to misallocation of terrestrial runoff, distorted drainage basins, mass balance errors, and complicating the realistic depiction of localized features such as reservoirs and lakes [16,17,18].

Robust evaluation is a prerequisite for using any river model, as this step establishes the fidelity with which a model captures flow processes across all scales, ensuring the reliability of historical analyses, impact attribution studies, and future projections [19]. Many efforts have used in situ discharge measurements to evaluate global and large-scale routing models. For example, a benchmark system has been established for the CaMa-Flood river model by integrating in situ river discharge observations and remote sensing data, such as water levels and areas [20]. More specifically, when driven by bias-corrected VIC runoff, CaMa-Flood achieved an NSE (Nash–Sutcliffe efficiency) exceeding 0.3 at 43% of the 291 evaluated in situ discharge gauges, with reservoir scheme activation slightly improving overall performance [21]. Similarly, the RAPID vector-based routing model has been evaluated; its global long-term daily discharge validation against 4510 gauges yielded a median KGE (Kling–Gupta efficiency) of 0.59, demonstrating how varying runoff inputs affect simulation accuracy [22]. Furthermore, the LISFLOOD model, used in the GloFAS-ERA5 reanalysis, produces a global gridded dataset available from 1979 until near real time, and its evaluation against a global network of 1801 daily river discharge observation stations found that 86% of catchments showed skill against a mean flow benchmark, achieving a global median Pearson correlation coefficient of 0.61 [13]. Despite these efforts, current model evaluations remain limited by the spatial coverage of in situ data (typically no more than about 4500 stations), which leaves large regions unverified and often masks systematic model biases.

Despite these extensive evaluations across river routing models, a comprehensive global benchmark focusing exclusively on the standalone mizuRoute vector-based river model remains a critical knowledge gap [23,24]. mizuRoute was selected for this benchmark due to its increasing adoption in Earth system models (e.g., Energy Exascale Earth System Model, Community Earth System Model) and its topology-agnostic capability to route runoff on any unstructured mesh. Earlier evaluations were limited in scope: initial studies of mizuRoute with and without the lake/reservoir module used only 1600 in situ discharge stations and showed that over 60% of stations exhibited a poor NSE (< 0) without the reservoir scheme [25]. Model simulations inevitably contain errors, yet a detailed characterization of the similarities and divergent patterns in global river discharge simulated by the model is largely absent. Recently, the availability of extensive, publicly accessible streamflow datasets has made it possible to address these research gaps. Leveraging these resources, we compiled daily discharge records for a substantial number of river gauges from multiple public repositories [26,27,28,29,30,31,32,33,34,35,36]. These sources include the United States Geological Survey (USGS) National Water Information System (NWIS), the Global Runoff Data Center (GRDC), and several Catchment Attributes and Meteorology for Large-Sample Studies (CAMELS) datasets. The resulting geographical distribution of these stations is highly heterogeneous; gauge densities are notably higher in regions like North America and Europe. Conversely, discharge records in vast areas of Africa and most of Asia rely primarily on GRDC gauges, which often focus on large basins, provide only monthly data, or lack records after the mid-1990s. This study capitalizes on this data compilation effort to perform the required comprehensive assessment.

To meet this challenge and address the identified gaps, we first generated a daily discharge database for millions of vector river reaches from 1980 to 2024 using the mizuRoute model and high-resolution 5 km MERIT-Hydro river networks driven by ERA5-Land runoff. Our study has two primary objectives:

(1) To establish a comprehensive, daily resolution global performance benchmark for the standalone mizuRoute vector-based routing model by validating its simulated discharge against the compiled network of 12,115 in situ discharge gauges, focusing on standard metrics and different flow regimes.

(2) To systematically characterize the similarities and divergent patterns in global river discharge simulated by mizuRoute across different hydroclimatic and geographical regions, and to identify specific catchment characteristics that correlate with model fidelity.

This study provides the first global benchmark for the mizuRoute vector framework, offering a systematic appraisal of its fidelity and providing a foundational reference for future global hydrological modeling and climate impact assessments.

2. Materials and Methods

The research framework, schematically illustrated in Figure 1, is structured into three primary phases to ensure a rigorous global assessment. The workflow begins with (i) the synthesis of a harmonized global discharge database and the extraction of multi-dimensional catchment attributes, integrating data from diverse repositories such as CAMELS and GRDC. The second phase, depicted in the central processing block of Figure 1, involves (ii) the configuration and execution of the mizuRoute model using the 5 km MERIT-Hydro river network driven by ERA5-Land runoff. Finally, the workflow culminates in (iii) a systematic evaluation and attribution analysis (Figure 1, bottom panel) to identify the hydroclimatic drivers of model performance through statistical metrics and stratification.

2.1. Synthesis of Multi-Source Streamflow Observations and Hydro-Physical Attributes

To establish a globally representative benchmark for river routing, this study developed a high-density streamflow database by integrating several premier hydrological repositories. The foundational datasets include the Catchment Attributes and Meteorology for Large-Sample Studies (CAMELS) series from the United States, Chile, Brazil, Great Britain, Australia, and Denmark [26,27,28,29,30,31,32,33]. These were supplemented by regional large-sample datasets, specifically HYSETS for North America [34] and the LamaH series for Central Europe and Iceland [35,36]. For broader global coverage, particularly in regions where CAMELS datasets are currently unavailable, records were harvested from the Global Runoff Data Center (GRDC) and the United States Geological Survey (USGS).

While many of these datasets provide pre-calculated catchment attributes, the sources and methodologies vary significantly across national boundaries. To ensure a consistent and high-quality global analysis, we compiled a unified suite of static physical and anthropogenic attributes for all 12,115 gauges. Discharge observations were not gap-filled; all evaluations were performed on available days with valid records to preserve observational integrity. The catchment boundaries (shapefiles) were obtained from the respective original studies or delineated using the MERIT-Hydro framework. We categorized these attributes into six distinct dimensions: reservoir and catchment morphology, hydroclimate, geology and soil, topography, land cover, and anthropogenic activities. Data processing involved harmonizing various formats, including netCDF, ESRI shapefiles, and high-resolution raster. For continuous variables such as mean annual precipitation from WorldClim V2.1, temperature, and elevation from GMTED2010, we calculated the catchment-averaged values using the area-weighted mean of the underlying raster grids. For categorical variables, such as the lithology geology classes [37] or GlobCover land use features, we calculated the fractional percentage of each class within the catchment boundaries and identified the dominant type. Vegetation dynamics, including NDVI climatology and VITO leaf area index (LAI), were similarly aggregated. Anthropogenic impacts were quantified by calculating the percentage of irrigated area and the reservoir impact index (RI). The RI was computed as the ratio of the cumulative storage capacity of all upstream reservoirs to the mean annual natural discharge at the gauge location, derived by intersecting the catchment polygons with the GRanD v1.3 reservoir locations. These spatial operations were implemented using the rasterstats package in Python 3.12 to ensure computational efficiency and precision across the millions of vector reaches. 
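As an illustration of the reservoir impact index described above, the following minimal sketch computes RI for a single gauge. The function name, the unit convention (storage in million m³, discharge in m³/s converted to an annual volume), and the sample numbers are assumptions made for this example; the text specifies only that RI is the ratio of cumulative upstream storage capacity to mean annual natural discharge, and the study's own unit convention may differ.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length for the conversion


def reservoir_impact_index(upstream_capacities_mcm, mean_annual_discharge_m3s):
    """Ratio of total upstream storage to mean annual flow volume.

    upstream_capacities_mcm: storage capacities (million m^3) of all
        reservoirs located inside the catchment polygon (hypothetical input).
    mean_annual_discharge_m3s: mean annual natural discharge at the gauge.
    """
    if mean_annual_discharge_m3s <= 0:
        raise ValueError("mean annual discharge must be positive")
    total_storage_m3 = sum(upstream_capacities_mcm) * 1e6
    annual_volume_m3 = mean_annual_discharge_m3s * SECONDS_PER_YEAR
    return total_storage_m3 / annual_volume_m3


# A catchment without reservoirs has RI = 0 (a near-natural basin).
print(reservoir_impact_index([], 10.0))
# Two hypothetical reservoirs upstream of a gauge with 10 m^3/s mean flow.
print(reservoir_impact_index([120.0, 80.0], 10.0))
```

Under this convention RI is expressed in years of mean flow that the upstream reservoirs could store.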
To ensure topographic consistency, each gauge’s drainage area was cross-referenced with the MERIT-Hydro network, and stations exhibiting a relative area discrepancy ϵ > 10% were excluded:

$$\epsilon = \frac{|A_{obs} - A_{sim}|}{A_{obs}} \times 100\% \tag{1}$$

where $A_{obs}$ represents the reported area, and $A_{sim}$ denotes the network-derived area.
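The area-consistency screen in Equation (1) can be sketched as follows. The function names and the tuple layout of the gauge records are hypothetical, and the sample areas are invented for illustration.

```python
def area_discrepancy_pct(a_obs, a_sim):
    """Relative drainage-area discrepancy (Equation 1), in percent."""
    if a_obs <= 0:
        raise ValueError("reported area must be positive")
    return abs(a_obs - a_sim) * 100.0 / a_obs


def filter_gauges(gauges, threshold_pct=10.0):
    """Keep gauges whose discrepancy does not exceed the threshold.

    gauges: iterable of (gauge_id, reported_area, network_area) tuples.
    Stations with a discrepancy strictly greater than 10% are excluded.
    """
    return [g for g in gauges
            if area_discrepancy_pct(g[1], g[2]) <= threshold_pct]


# Made-up example: gauge "B" differs by 20% and is dropped.
sample = [("A", 1000.0, 950.0), ("B", 1000.0, 1200.0), ("C", 500.0, 540.0)]
print([g[0] for g in filter_gauges(sample)])
```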

2.2. Configuration of the MizuRoute Vector-Based Framework

The global river discharge was simulated using mizuRoute, a vector-based routing model that represents river networks as a series of connected reaches rather than traditional rectangular grid cells. This framework is particularly advantageous for global studies as it preserves the physical length and slope of the river network, which is critical for accurate travel-time estimation. The model was configured using the MERIT-Hydro 5 km vector network, which provides a high-fidelity representation of the global hydrography.

The primary forcing for the model was daily total runoff from the ERA5-Land reanalysis at a 0.1 degree spatial resolution, covering the period from 1980 to 2024. To bridge the scale gap between the 0.1 degree runoff grids and the 5 km MERIT-Hydro catchments, we applied an area-weighted mapping approach. The runoff for each individual reach was calculated by summing the contributions of all overlapping ERA5-Land grid cells, weighted by their respective intersection areas. This ensures that the water balance is conserved during the transition from the land surface model grid to the routing network. The routing simulation then translates this runoff into streamflow using either a kinematic wave or impulse response function, producing a daily discharge database for the entire global network.
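The area-weighted mapping can be illustrated with a small sketch. It assumes that the intersection areas between each catchment and the overlapping runoff grid cells have already been computed (for example with a GIS overlay); the function names, the units (mm/day for runoff depth, km² for area), and the sample values are assumptions for this example rather than the study's actual implementation.

```python
def catchment_runoff_depth(overlaps):
    """Area-weighted mean runoff depth (mm/day) for one catchment.

    overlaps: list of (runoff_mm_per_day, intersection_area_km2) pairs,
        one per grid cell overlapping the catchment.
    """
    total_area = sum(area for _, area in overlaps)
    if total_area <= 0:
        raise ValueError("catchment has no overlapping grid-cell area")
    return sum(r * area for r, area in overlaps) / total_area


def catchment_runoff_volume_m3s(overlaps):
    """Total runoff delivered to the reach, in m^3/s.

    1 mm/day over 1 km^2 equals 1e3 m^3/day, and a day has 86400 s.
    """
    return sum(r * area for r, area in overlaps) * 1e3 / 86400.0


# Two hypothetical grid cells overlapping one catchment.
cells = [(2.0, 10.0), (4.0, 30.0)]
print(catchment_runoff_depth(cells))       # weighted depth in mm/day
print(catchment_runoff_volume_m3s(cells))  # volume flux in m^3/s
```

Because the weights are the intersection areas themselves, the depth multiplied by the catchment area reproduces the summed volume, which is the sense in which the mapping conserves the water balance.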

2.3. Statistical Evaluation and Performance Attribution Framework

The evaluation of mizuRoute was performed by comparing the simulated reach-level discharge against the 12,115 in situ observations. To provide a holistic view of model fidelity, we employed five standard statistical metrics. The Kling–Gupta efficiency (KGE) served as the primary benchmark for overall skill, as it simultaneously accounts for correlation, bias, and variability.

$$\mathrm{KGE} = 1 - \sqrt{(r - 1)^{2} + (\beta - 1)^{2} + (\gamma - 1)^{2}} \tag{2}$$

where r is the Pearson correlation coefficient representing the temporal synchronicity between simulated and observed series; β is the bias ratio calculated as $\mu_{s} / \mu_{o}$, which evaluates the volume error; and γ is the variability ratio, defined as $(\sigma_{s} / \mu_{s}) / (\sigma_{o} / \mu_{o})$, representing the model’s ability to capture the flow coefficient of variation. In these terms, μ and σ denote the mean and standard deviation of the discharge time series, respectively.
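A minimal sketch of Equation (2) in plain Python follows. It assumes the simulated and observed series have already been paired on days with valid observations; the function names are illustrative.

```python
import math
from statistics import fmean, pstdev


def pearson_r(sim, obs):
    """Pearson correlation coefficient between two equal-length series."""
    ms, mo = fmean(sim), fmean(obs)
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs))
    var_s = sum((s - ms) ** 2 for s in sim)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_s * var_o)


def kge(sim, obs):
    """Kling-Gupta efficiency with the variability term based on the CV ratio."""
    r = pearson_r(sim, obs)
    mu_s, mu_o = fmean(sim), fmean(obs)
    beta = mu_s / mu_o                                    # bias ratio
    gamma = (pstdev(sim) / mu_s) / (pstdev(obs) / mu_o)   # CV ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)


obs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(kge(obs, obs))                    # perfect simulation
print(kge([2 * q for q in obs], obs))   # doubled volume, same timing and CV
```

Doubling every simulated value leaves r and γ at 1 but gives β = 2, so the score is penalized by the bias term alone.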

To evaluate the model’s capacity to simulate high-flow events and seasonal peaks, the Nash–Sutcliffe efficiency (NSE) was calculated:

$$\mathrm{NSE} = 1 - \frac{\sum_{t = 1}^{T} (Q_{s, t} - Q_{o, t})^{2}}{\sum_{t = 1}^{T} (Q_{o, t} - \bar{Q}_{o})^{2}} \tag{3}$$

where $Q_{s, t}$ and $Q_{o, t}$ are the simulated and observed discharge at time t, T is the total number of days in the evaluation period, and $\bar{Q}_{o}$ is the mean of the observed discharge. An NSE value of 1 signifies a perfect match, while a value of 0 indicates that the model predictions are only as accurate as the mean of the observed data.
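Equation (3) translates directly into a few lines of Python. The sketch below assumes paired, gap-free input lists and uses an illustrative function name.

```python
from statistics import fmean


def nse(sim, obs):
    """Nash-Sutcliffe efficiency of simulated against observed discharge."""
    mean_obs = fmean(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    if den == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - num / den


obs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(nse(obs, obs))         # perfect match
print(nse([3.0] * 5, obs))   # predicting the observed mean every day
```

The second call reproduces the reference case in the text: a simulation equal to the observed mean scores exactly 0.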

Given that NSE is highly sensitive to extreme peak flows due to the squared error term, the log-transformed Nash–Sutcliffe efficiency (logNSE) was employed to focus on the model’s performance during low-flow regimes:

$$\mathrm{logNSE} = 1 - \frac{\sum_{t = 1}^{T} (\ln Q_{s, t} - \ln Q_{o, t})^{2}}{\sum_{t = 1}^{T} (\ln Q_{o, t} - \overline{\ln Q_{o}})^{2}} \tag{4}$$

where $\ln Q_{s, t}$ and $\ln Q_{o, t}$ represent the natural logarithms of simulated and observed discharge, and $\overline{\ln Q_{o}}$ is the mean of the log-transformed observations. By compressing the magnitude of peak flows, logNSE effectively highlights the routing model’s skill in capturing baseflow and recession characteristics.
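The log-transformed variant can be sketched in the same way. The text does not state how zero flows were treated, so the optional `eps` offset below is purely an assumption for illustration; with the default of 0 the function simply rejects non-positive discharge.

```python
import math
from statistics import fmean


def log_nse(sim, obs, eps=0.0):
    """NSE computed on natural logarithms of discharge (Equation 4).

    eps: optional small offset added before the log transform; this is a
        hypothetical convenience, not a treatment documented in the study.
    """
    if min(sim) + eps <= 0 or min(obs) + eps <= 0:
        raise ValueError("discharge plus eps must be strictly positive")
    log_sim = [math.log(s + eps) for s in sim]
    log_obs = [math.log(o + eps) for o in obs]
    mean_log_obs = fmean(log_obs)
    num = sum((ls - lo) ** 2 for ls, lo in zip(log_sim, log_obs))
    den = sum((lo - mean_log_obs) ** 2 for lo in log_obs)
    return 1.0 - num / den


obs = [1.0, 10.0, 100.0]
low_flow_error = [2.0, 10.0, 100.0]   # baseflow off by a factor of two
peak_error = [1.0, 10.0, 110.0]       # peak off by 10%
print(log_nse(low_flow_error, obs), log_nse(peak_error, obs))
```

In this made-up example the low-flow error is penalized far more than the peak error, even though its absolute size (1 unit) is much smaller than the peak error (10 units).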

Systematic errors were further analyzed using the relative bias (RB):

$$\mathrm{RB} = \frac{\sum_{t = 1}^{T} (Q_{s, t} - Q_{o, t})}{\sum_{t = 1}^{T} Q_{o, t}} \times 100\% \tag{5}$$

where a positive RB indicates a systematic overestimation of discharge volume, and a negative value indicates underestimation.
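A short sketch of Equation (5), again assuming paired series and an illustrative function name:

```python
def relative_bias_pct(sim, obs):
    """Relative bias (percent) of simulated against observed volume."""
    total_obs = sum(obs)
    if total_obs == 0:
        raise ValueError("observed discharge sums to zero")
    return 100.0 * sum(s - o for s, o in zip(sim, obs)) / total_obs


obs = [10.0, 20.0, 30.0]
print(relative_bias_pct([12.0, 24.0, 36.0], obs))  # positive: overestimation
print(relative_bias_pct([8.0, 16.0, 24.0], obs))   # negative: underestimation
```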

To systematically characterize the similarities and divergent patterns in global river discharge, we conducted a tiered stratification analysis. First, the 12,115 stations were categorized into five primary climate zones according to the Koeppen–Geiger classification: tropical, arid, temperate, cold, and polar. This allows us to identify how different precipitation and temperature regimes influence routing accuracy. Second, we grouped the stations into six classes based on their relative bias (RB) levels, ranging from less than −20% to greater than 20%, to distinguish between stations with high temporal correlation but systematic volume errors. Third, the impact of human intervention was assessed by stratifying stations according to their reservoir regulation intensity.
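The stratification step amounts to grouping stations by a class label and summarizing a metric within each group. The sketch below shows one way to do this; the helper names, the RB class edges, and the station values are all hypothetical, since the text does not list the exact boundaries of the six RB classes.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import median


def assign_bin(value, edges):
    """Index of the class a value falls into, given ascending class edges."""
    return bisect_right(edges, value)


def median_by_group(labels, values):
    """Median and sample size of `values` within each label."""
    groups = defaultdict(list)
    for label, value in zip(labels, values):
        groups[label].append(value)
    return {label: (median(v), len(v)) for label, v in groups.items()}


# Made-up stations: climate zone and KGE.
zones = ["arid", "arid", "cold", "cold", "cold"]
kge_values = [-0.3, -0.1, 0.2, 0.3, 0.4]
print(median_by_group(zones, kge_values))

# Assumed RB class edges (percent); class 2 is the -5% to 5% band.
rb_edges = [-20.0, -5.0, 5.0, 20.0]
print([assign_bin(rb, rb_edges) for rb in (-30.0, 0.0, 25.0)])
```

The same `median_by_group` call works for any of the three tiers (climate zone, RB class, or reservoir regulation class) once each station carries the corresponding label.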

The attribution of model performance was quantified using the Spearman rank correlation ρ between the KGE and the multi-dimensional basin attributes:

$$\rho = 1 - \frac{6 \sum d_{i}^{2}}{n (n^{2} - 1)} \tag{6}$$

where n is the number of station samples, and $d_{i}$ is the difference between the ranks of the performance metric (e.g., KGE) and a specific catchment attribute $X_{i}$ (e.g., aridity index, slope, or soil clay content). This non-parametric approach assesses the monotonic relationship between model fidelity and basin characteristics, making it more suitable for complex hydrological data where variables may be non-linearly related. To ensure the robustness of the attribution results and avoid the influence of extreme failures, we performed the correlation analysis on both the full sample and a sub-sample of stations where KGE was greater than −5. This dual-track approach enables us to identify the specific physiographic or climatic drivers (such as aridity, slope, or soil silt content) that most significantly correlate with model performance across different regions of the globe.
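The rank-difference form of the Spearman coefficient, ρ = 1 − 6Σd²/(n(n² − 1)), together with the KGE > −5 sub-sampling, can be sketched as below. This closed form is exact only when there are no tied values, so the sketch rejects ties rather than applying a tie correction; the function names and sample numbers are illustrative assumptions.

```python
def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation via the rank-difference formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("rank-difference formula assumes no tied values")
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))


def subsample(kge_values, attribute, kge_floor=-5.0):
    """Keep only stations whose KGE exceeds the floor."""
    pairs = [(k, a) for k, a in zip(kge_values, attribute) if k > kge_floor]
    return [k for k, _ in pairs], [a for _, a in pairs]


# Made-up stations: KGE and aridity index; one extreme failure (KGE = -7).
kge_values = [0.5, -7.0, 0.2, -0.1]
aridity = [0.5, 4.0, 1.0, 2.0]
print(spearman_rho(kge_values, aridity))
print(spearman_rho(*subsample(kge_values, aridity)))
```

For real station data with tied attribute values, a library routine that averages tied ranks would be the safer choice.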

3. Results

3.1. Global Distribution and Hydro-Physical Characteristics of Gauging Stations

The synthesized global streamflow database comprises 12,115 in situ stations, providing a high-density benchmark for evaluating the mizuRoute model across diverse environmental gradients (Figure 2). The geographical distribution of these gauges exhibits a significant hemispheric imbalance, with 8571 stations (70.7%) located in the Northern Hemisphere and 3544 stations (29.3%) in the Southern Hemisphere. Latitudinal analysis reveals that the network is primarily concentrated in the mid-latitudes, with a mean latitude of 21.4° N and a median of 38.1° N. The spatial density is highest in North America and Europe, while coverage remains comparatively sparse in tropical and high-latitude regions, as evidenced by the distribution of stations ranging from 54.5° S to 66.2° N. Temporal coverage for the 1980–2024 period is exceptionally robust, supporting long-term performance assessments. Approximately 40.6% of the stations (4914 gauges) possess highly complete records with over 80% data coverage. Conversely, only a small fraction (9.5%) of the dataset contains less than 20% coverage, typically associated with stations in remote or developing regions. This high degree of temporal continuity allows for a reliable evaluation of both seasonal cycles and inter-annual variability.

The database captures a wide spectrum of hydroclimatic regimes based on the Koeppen–Geiger classification. The temperate group constitutes the largest share of the dataset (47.4%, 5739 stations), followed by the cold (26.1%, 3163 stations) and arid (18.6%, 2250 stations) zones. Tropical stations account for 7.7% (927 gauges), while polar stations represent a critical but small niche. This distribution ensures that the routing model is tested against various runoff generation mechanisms, from snowmelt-dominated regimes in cold regions to convective-dominated systems in the tropics. The physical attributes of the gauged catchments further demonstrate this environmental diversity (Figure 3). The mean annual precipitation across the network averages 1019.2 mm/year, ranging from a hyper-arid 25.3 mm/year to a per-humid 6192.5 mm/year. The aridity index shows a median value of 0.96, with the 5th to 95th percentile range (0.4 to 3.7) capturing both water-limited and energy-limited conditions, providing a robust gradient to test the sensitivity of the mizuRoute model to runoff generation mechanisms across different water-balance regimes. Topographic complexity and anthropogenic influence also represent critical dimensions of the evaluation dataset. Topographic complexity is represented by the mean basin slope, which averages 4.98°. High-slope catchments (95th percentile > 15.27°) are primarily clustered in mountainous regions such as the Rockies, Andes, and the European Alps, whereas low-slope stations are dominant in the Great Plains and parts of Northern Europe. The fractional irrigated area (AEI) reaches a maximum of 91.3% in highly engineered basins, though the global median remains low (0.43%). High irrigation intensities are notably concentrated in the Central United States, Eastern China, and parts of Southern Europe. Furthermore, 20.6% of the stations are identified as being influenced by upstream reservoir regulation. 
While the median reservoir impact (RI) index is 0, the mean of 2.56 and extreme outliers (max > 6000) highlight the presence of highly regulated systems. These regulated stations, predominantly found in heavily dammed river fragments of North America and Eurasia, provide a unique opportunity to assess model performance degradation under human-induced flow alterations.

In summary, the compiled database of 12,115 stations covers a vast array of geographical, climatic, and anthropogenic conditions, successfully capturing the high spatial heterogeneity of the global river network. The combination of high-density mid-latitude observations and critical samples from diverse climate zones and regulated basins provides a rigorous foundation for benchmarking the mizuRoute vector-based routing model. The multi-dimensional nature of these attributes allows for a detailed attribution of simulation errors to specific physical and human-induced drivers in the subsequent analysis.

3.2. Global Patterns and Hydroclimatic Stratification of Model Performance

The global benchmarking of the mizuRoute model reveals distinct spatial heterogeneity in simulation fidelity across the 12,115 stations (Figure 4). Globally, the model achieves a median KGE of 0.17, with the interquartile range (p25–p75) spanning from −0.13 to 0.37. While the top 5% of stations (p95) exhibit excellent performance with KGE > 0.62, the mean KGE of −0.15 reflects the influence of left-censored, low-performing outliers. These outliers, visible in the boxplots (Figure 4), predominantly represent catchments subject to heavy regulation or severe forcing errors, where the model fails entirely to capture the flow regime. The NSE follows a similar pattern with a median of 0.1, though it is more sensitive to peak flow mismatches. Interestingly, the Pearson correlation coefficient (CC) remains robustly high across the network, with a global median of 0.53 and a 95th percentile reaching 0.81, indicating that the model captures the temporal seasonality of discharge even when volumetric biases exist. Spatially, high-performance clusters are prominently concentrated in humid temperate regions such as the Eastern United States, Central Europe, and Southeast Asia. In contrast, performance significantly degrades in arid subtropical regions and high-latitude basins, where KGE and NSE frequently drop toward the lower bound of −5.0.

The systematic volumetric errors, quantified through relative bias (RB) and mean bias, reveal clear regional tendencies of overestimation and underestimation (Figure 5). After clipping to a representative range of [−100%, 250%], the RB distribution shows that approximately 24% of stations exhibit near-target volumes, yet a substantial number of stations in semi-arid zones and continental interiors show extreme overestimation (RB > 100%), likely due to the underrepresentation of transmission losses or overestimation of runoff in the ERA5-Land forcing. Conversely, systematic underestimation is prevalent in mountainous and snow-dominated regions, potentially linked to precipitation undercatch. Stratifying these results by Koeppen–Geiger climate zones (Figure 6) confirms that the cold and temperate groups yield the highest median efficiencies, with KGE medians of 0.27 (n = 3163) and 0.19 (n = 5739), respectively. The arid zone (n = 2250), however, remains the most challenging, exhibiting a negative median KGE of −0.15. Tropical regions (n = 927) show the highest temporal synchronicity with a median CC of 0.62, yet their negative median NSE (−0.14) and logNSE (−0.5) suggest difficulties in capturing the precise magnitude of tropical high-flows and the persistence of baseflow.

The interplay between systematic bias, human intervention, and model skill further characterizes the divergent patterns in global discharge (Figure 7 and Figure 8). When stratified by RB classes, stations with minimal bias (−5% to 5%, n = 438) and low negative bias (−20% to −5%, n = 1788) achieve the highest median KGE values of 0.40 and 0.36, respectively. Performance collapses for stations with extreme volumetric errors (RB > 20%, n = 3080), where the median KGE drops to −0.19. Furthermore, the impact of anthropogenic regulation is a dominant driver of performance degradation. While near-natural basins (RI = 0, n = 9616) and lightly regulated basins (0 < RI < 1, n = 1793) maintain median KGE scores of 0.16 and 0.27, simulation fidelity plummets as the reservoir impact index increases. For heavily regulated reaches (RI > 500, n = 10), the median KGE falls to extreme negative values, and the CC drops to 0.23. This failure is particularly evident in the logNSE metric, which reflects the model’s inability to simulate the artificial baseflow stabilization and peak attenuation characteristic of managed river systems. To provide a clear quantitative baseline alongside the spatial patterns shown in Figure 4 and Figure 6, Table 1 synthesizes the median efficiency metrics. This tabular format allows for a precise comparison of model fidelity across different environmental and anthropogenic gradients.

In summary, the global performance of mizuRoute is strongly modulated by the intersection of hydroclimatic regimes and human-induced flow alterations. The model provides a reliable reconstruction of discharge timing and volume in temperate and cold natural basins, but requires significant structural or forcing enhancements to address the systematic biases observed in arid regions and the altered hydraulics of regulated river networks.

3.3. Multi-Dimensional Attribution of Model Performance

To move beyond descriptive mapping, we conducted a robust attribution analysis to identify the physical and climatic drivers that govern mizuRoute’s performance. By restricting the analysis to the functional sub-sample of stations where KGE > −5, we examine the monotonic relationships within a physically meaningful range. The binned median trends (Figure 9) reveal that model skill is most sensitive to the moisture gradient. A clear negative relationship is observed between KGE and the aridity index (ρ = −0.35), while a positive trend exists with precipitation occurrence (ρ = 0.33). These trends indicate that as the environment transitions from water-limited (arid) to energy-limited (humid), the model’s ability to represent the runoff–discharge relationship improves significantly. This is further supported by the positive correlation with mean annual precipitation (ρ = 0.29), suggesting that the reanalysis forcing is more accurate in regions with higher and more frequent rainfall events.
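A binned median trend of the kind summarized above can be computed by sorting stations into attribute bins and taking the median KGE in each. The sketch below is one possible implementation; the bin edges, the half-open bin convention, and the sample values are assumptions, as the text does not describe the binning used for Figure 9.

```python
from bisect import bisect_right
from statistics import median


def binned_medians(x, y, edges):
    """Median of y within consecutive half-open x-bins [edges[k], edges[k+1]).

    Returns a list of (lower_edge, upper_edge, median_or_None, count).
    Values outside the outermost edges are ignored.
    """
    n_bins = len(edges) - 1
    buckets = [[] for _ in range(n_bins)]
    for xi, yi in zip(x, y):
        k = bisect_right(edges, xi) - 1
        if 0 <= k < n_bins:
            buckets[k].append(yi)
    return [(edges[k], edges[k + 1], median(b) if b else None, len(b))
            for k, b in enumerate(buckets)]


# Made-up stations: aridity index and KGE.
aridity = [0.2, 0.4, 1.2, 1.5, 3.0]
kge_values = [0.5, 0.4, 0.1, 0.2, -0.3]
for row in binned_medians(aridity, kge_values, [0.0, 1.0, 2.0, 4.0]):
    print(row)
```

In this invented sample the bin medians fall as aridity rises, which is the shape of relationship a negative ρ summarizes.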

The Spearman rank correlation comparison (Figure 10) provides a quantitative hierarchy of these drivers, illustrating the stability of the attribution across the full sample and the KGE > −5 sub-sample. Among the hydroclimatic variables, the aridity index and precipitation occurrence remain the most influential factors, followed closely by vegetation-related attributes. Specifically, tree cover percentage (ρ = 0.31) and leaf area index (LAI, ρ = 0.3) show strong positive correlations with KGE. This suggests that mizuRoute’s performance is optimized in densely vegetated temperate and tropical forests, where subsurface flow and baseflow processes are more stable and well-characterized by the kinematic wave approximation. Conversely, potential evapotranspiration (PET) exhibits a consistent negative correlation (ρ = −0.28), reinforcing the findings that high evaporative demand complicates the accurate simulation of river discharge in global routing frameworks.

The analysis also highlights the secondary but significant role of topographic and land surface features. Basin slope and snowfall fraction (ρ = 0.21) exhibit positive correlations, indicating that the model performs well in mountainous regions and snow-melt dominated catchments, where topographic gradients provide a strong forcing for the vector-based routing network. Notably, the correlation rankings are remarkably consistent between the two station samples, with only minor shifts in absolute values (e.g., P_mean shifting from 0.31 to 0.29). In summary, the attribution analysis confirms that the fidelity of the mizuRoute framework is primarily governed by the aridity–moisture gradient and vegetation density. The reliability of global discharge simulations is inherently higher in humid, topographically well-defined, and naturally vegetated basins, while further research is needed to resolve the complex runoff-retention processes in arid and human-impacted landscapes.

4. Discussion

4.1. Hydroclimatic Gradients and Physical Controls on Routing Fidelity

The dominant role of the aridity–moisture gradient in governing the performance of the mizuRoute framework represents a fundamental finding of this global evaluation. The high fidelity observed in temperate and cold regions (median KGE > 0.18) contrasts sharply with the persistent performance degradation in arid zones (median KGE = −0.15). This divergence is physically rooted in the differing runoff-generation mechanisms across climate zones. In humid and cold-climate catchments, river discharge is typically driven by saturation-excess overland flow and consistent subsurface contributions, which are relatively well-represented by the ERA5-Land reanalysis and the kinematic wave approximation. The positive correlation between KGE and precipitation occurrence (ρ = 0.33) further suggests that a higher frequency of hydrologic “events” allows the routing model to better resolve the basin’s response, whereas infrequent, high-intensity pulses in arid lands are easily lost in temporal or spatial averaging.

The performance collapse in arid regions (RB > 100% in many cases) highlights a critical structural gap in global routing: the omission of transmission losses. In dryland rivers, channel flow is significantly depleted by infiltration into dry riverbeds and by riparian evapotranspiration, processes currently neglected in the standalone mizuRoute configuration. Furthermore, the role of vegetation as a primary control is evidenced by the positive relationship between KGE and LAI (ρ = 0.3). Forests and well-vegetated landscapes act as natural buffers, regulating runoff timing and creating smoother, more predictable hydrographs. In contrast, sparsely vegetated or barren landscapes exhibit rapid, highly non-linear responses that challenge the kinematic wave’s travel-time assumptions. Topographically, the model’s robustness in mountainous regions (ρ = 0.16 for basin slope) supports the transition to vector-based frameworks. Unlike grid-based methods, mizuRoute preserves the physical length and longitudinal slope of reaches, ensuring that gravitational forcing is accurately depicted in complex terrain.
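The link between a large volumetric bias and a negative KGE can be made explicit with a short derivation. The sketch below assumes the widely used 2009 form of KGE and a relative bias defined as a fraction of the observed mean; whether the study uses exactly these variants is an assumption here, not something stated in this section.

```latex
\begin{align}
\mathrm{KGE} &= 1 - \sqrt{(r-1)^2 + (\alpha-1)^2 + (\beta-1)^2}, \\
\alpha &= \frac{\sigma_{\mathrm{sim}}}{\sigma_{\mathrm{obs}}}, \qquad
\beta = \frac{\mu_{\mathrm{sim}}}{\mu_{\mathrm{obs}}} = 1 + \mathrm{RB}, \\
\mathrm{KGE} &\le 1 - \lvert \beta - 1 \rvert = 1 - \lvert \mathrm{RB} \rvert .
\end{align}
```

Under these assumptions, a station with RB > 100% (RB > 1 as a fraction) has KGE < 0 even if timing and variability were reproduced perfectly (r = 1, α = 1), so unmodeled transmission losses alone can account for negative efficiency in arid basins.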

4.2. Anthropogenic Fingerprints and the Limitations of Naturalized Routing

A significant finding of this study is the quantified degradation of simulation skill under human intervention, demonstrating that naturalized routing is increasingly insufficient in the Anthropocene. The sharp decline in KGE and CC for basins with a reservoir impact index (RI) above 10 highlights a structural inability to capture managed flow signatures. Reservoirs attenuate peak flows and augment baseflows for hydropower or irrigation; by omitting these storage-release operations, mizuRoute produces hydrographs that are “too flashy” and temporally misaligned with observed regulated flows. This is most evident in the logNSE metric, which emphasizes low flows and therefore exposes the model’s failure to represent artificial baseflow stabilization.
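The attenuation effect described above can be illustrated with the simplest possible storage element. The sketch below is not mizuRoute code and not a reservoir operation scheme: it routes a synthetic, flashy inflow through a single linear store whose outflow is proportional to storage, and the function name, the residence time `k_days`, and all flow values are hypothetical.

```python
def route_linear_reservoir(inflow, k_days, dt=1.0):
    """Route inflow through a linear store (outflow = storage / k_days).

    Uses an explicit daily update, which is stable as long as dt < k_days.
    Storage starts in equilibrium with the first inflow value.
    """
    storage = k_days * inflow[0]
    outflow = []
    for q_in in inflow:
        q_out = storage / k_days
        storage += (q_in - q_out) * dt
        outflow.append(q_out)
    return outflow


# Synthetic "naturalized" hydrograph: low baseflow with two sharp pulses.
inflow = [5.0] * 120
for day in (20, 21, 60, 61, 62):
    inflow[day] = 100.0

regulated = route_linear_reservoir(inflow, k_days=20.0)

print(f"peak:    natural {max(inflow):.1f}, regulated {max(regulated):.1f}")
print(f"day 30:  natural {inflow[30]:.1f}, regulated {regulated[30]:.1f}")
```

Even this crude store lowers the peak and keeps flow elevated for weeks after each pulse. A naturalized simulation that skips such storage would therefore look too flashy against a regulated gauge record, with the largest relative errors at low flow.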

Beyond regulation, the high positive bias in regions like the Central United States and Eastern China is intrinsically linked to unmodeled irrigation withdrawals. Our analysis shows that as the area equipped for irrigation (AEI) increases, the model systematically overestimates the volume of water reaching the gauge. This confirms that the error in simulated discharge is not merely a “routing error” or a “forcing error,” but a “structural omission” of the human water cycle. Even within the robust sub-sample (KGE > −5), anthropogenic attributes remain significantly negatively correlated with performance. This suggests that even when the model appears skillful by capturing seasonality, the underlying water balance remains distorted by unmodeled human activities. Future iterations must integrate dynamic reservoir operation modules and irrigation withdrawal schemes to close the water balance in these heavily modified landscapes.

4.3. Forcing Uncertainties, Regional Biases, and Future Benchmarking

The global fidelity of mizuRoute is inextricably linked to the quality of the ERA5-Land runoff forcing. Previous studies have noted that ERA5-Land tends to underestimate snow water equivalent in high latitudes, leading to the delayed or dampened spring pulse observed in our cold-zone results [38]. The systematic underestimation (negative RB) observed in high-latitude regions likely stems from the challenges land surface models face in representing snowpack dynamics, frozen soil, and the complex timing of spring snowmelt. If the input runoff misses the magnitude of these seasonal pulses, the routing framework cannot compensate for the missing mass. Similarly, the high temporal correlation (CC = 0.62) but poor efficiency (NSE = −0.14) in tropical zones indicates that while the reanalysis captures the monsoon timing, the volumetric estimation in dense tropical rainforests remains highly uncertain due to sparse gauge networks and complex canopy-interception processes.
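The combination of high correlation and negative efficiency noted above follows directly from how the two metrics are built. The following sketch uses synthetic data only: a seasonal "observed" series and a "simulated" series with identical timing but 50% too much volume. The series and the scaling factor are assumptions chosen for illustration.

```python
import math
from statistics import mean


def pearson_cc(sim, obs):
    """Pearson correlation coefficient between simulated and observed flow."""
    ms, mo = mean(sim), mean(obs)
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs))
    var_s = sum((s - ms) ** 2 for s in sim)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_s * var_o)


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - squared error / observed variance."""
    mo = mean(obs)
    err = sum((s - o) ** 2 for s, o in zip(sim, obs))
    var_o = sum((o - mo) ** 2 for o in obs)
    return 1.0 - err / var_o


# One year of synthetic daily flow with a single seasonal cycle.
obs = [10.0 + 5.0 * math.sin(2.0 * math.pi * t / 365.0) for t in range(365)]
# Same timing, but every value inflated by 50% (a pure volume error).
sim = [1.5 * q for q in obs]

cc = pearson_cc(sim, obs)
eff = nse(sim, obs)
print(f"CC = {cc:.2f}, NSE = {eff:.2f}")
```

Correlation is insensitive to scaling, so the timing signal scores perfectly, while NSE penalizes the squared volume error relative to the observed variance and drops below zero. A routing model cannot repair this kind of error because it conserves whatever mass the runoff forcing supplies.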

Despite compiling 12,115 gauges—a major advancement over previous benchmarks—the geographical imbalance (70.7% in the Northern Hemisphere) remains a persistent limitation. The sparse coverage in Central Africa, the Amazon, and Northern Asia means global metrics remain weighted toward mid-latitude, data-rich regions. Future work should leverage satellite-based discharge estimates, such as those from the Surface Water and Ocean Topography (SWOT) mission, to validate model performance in these ungauged tropical and high-latitude basins [39]. The use of the KGE > −5 sub-sample for attribution analysis proved to be a robust methodological choice, allowing us to distinguish between “systematic model behavior” and “catastrophic failure” (often caused by gauge-reach mismatches or severe forcing errors). The consistency of Spearman rankings across both samples confirms that aridity, precipitation frequency, and vegetation are fundamental, physically meaningful predictors of model fidelity. Moving forward, the “spatial heterogeneity of skill” identified here argues against a “one-size-fits-all” parameterization. Regionalized calibration of channel roughness and the inclusion of dryland-specific loss functions will be essential for the next generation of global discharge products and climate impact assessments.

5. Conclusions

This study establishes the first comprehensive global performance benchmark for the mizuRoute vector-based routing model by validating it against a synthesized high-density database of 12,115 in situ gauging stations. By integrating multi-dimensional catchment attributes—ranging from hydroclimatic indices to anthropogenic regulation metrics—we provide a systematic appraisal of simulation fidelity across diverse environmental gradients. The results demonstrate that while the model achieves a robust global median Pearson correlation (CC) of 0.53, the overall efficiency remains spatially heterogeneous (median KGE = 0.17), highlighting the distinct regional capabilities and limitations of standalone vector routing at the global scale.

The findings reveal that hydroclimatic gradients are the primary determinants of model skill. Higher fidelity is consistently observed in temperate and cold regions where runoff generation is more predictable, whereas a significant performance collapse occurs in arid zones (median KGE = −0.15) due to the omission of channel transmission losses and excessive volumetric bias (RB > 100%). Furthermore, the study quantifies the “anthropogenic footprint” on model accuracy, showing that simulation skill plummets in heavily regulated basins (RI > 10). This underscores that in the Anthropocene, naturalized routing frameworks are increasingly insufficient for characterizing the water cycle in engineered watersheds, particularly in capturing the attenuated hydrographs downstream of major reservoirs.

In summary, while the transition to vector-based routing enhances the representation of river network topology, resolving physical structural gaps and human interventions remains a critical frontier. Future model development must prioritize the integration of explicit channel transmission loss functions for dryland rivers, regionalized reservoir operation schemes (e.g., integrating GRanD attributes), and the correction of forcing biases in data-sparse tropical and high-latitude regions. This benchmarking framework not only provides a foundational reference for the mizuRoute community but also offers a scientific roadmap for improving the reliability of global hydrological reconstructions and climate impact assessments.

Author Contributions

Conceptualization, S.X. and H.S.; methodology, L.T.; software, X.S.; validation, S.X., L.T., and X.S.; formal analysis, L.T.; investigation, L.T.; resources, L.T.; data curation, S.X.; writing—original draft preparation, S.X.; writing—review and editing, L.T.; funding acquisition, H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Fundamental Research Program of Shanxi Province (Grant No. 202303021211196) and the Teaching Reform and Innovation Project of Higher Education Institutions in Shanxi Province in 2024 (No. J20241550).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Xiaohui Sun was employed by Shanxi Institute of Geological Survey Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Clark, E.A.; Sheffield, J.; van Vliet, M.T.H.; Nijssen, B.; Lettenmaier, D.P. Continental runoff into the oceans (1950–2008). J. Hydrometeorol. 2015, 16, 1502–1520.
2. Dai, A.; Qian, T.; Trenberth, K.E.; Milliman, J.D. Changes in continental freshwater discharge from 1948 to 2004. J. Clim. 2009, 22, 2773–2792.
3. Wada, Y.; Reager, J.T.; Chao, B.F.; Wang, J.; Lo, M.-H.; Song, C.; Li, Y.; Gardner, A.S. Recent changes in land water storage and its contribution to sea level variations. Surv. Geophys. 2017, 38, 131–152.
4. Liu, S.; Shi, H.; Sivakumar, B. Socioeconomic drought under growing population and changing climate: A new index considering the resilience of a regional water resources system. J. Geophys. Res. Atmos. 2020, 125, e2020JD033005.
5. Milly, P.C.; Dunne, K.A. Colorado River flow dwindles as warming-driven loss of reflective snow energizes evaporation. Science 2020, 367, 1252–1255.
6. Ficke, A.D.; Myrick, C.A.; Hansen, L.J. Potential impacts of global climate change on freshwater fisheries. Rev. Fish Biol. Fish. 2007, 17, 581–613.
7. Durand, M.; Gleason, C.J.; Pavelsky, T.M.; de Moraes Frasson, R.P.; Turmon, M.; David, C.H.; Altenau, E.H.; Tebaldi, N.; Larnier, K.; Monnier, J.; et al. A framework for estimating global river discharge from the Surface Water and Ocean Topography satellite mission. Water Resour. Res. 2023, 59, e2021WR031614.
8. Tuozzolo, S.; Lind, G.; Overstreet, B.; Mangano, J.; Fonstad, M.; Hagemann, M.; Frasson, R.P.M.; Larnier, K.; Garambois, P.; Monnier, J.; et al. Estimating river discharge with swath altimetry: A proof of concept using AirSWOT observations. Geophys. Res. Lett. 2019, 46, 1459–1466.
9. Gochis, D.J.; Barlage, M.; Dugger, A.; FitzGerald, K.; Karsten, L.; McAllister, M.; McCreight, J.; Mills, J.; RafieeiNasab, A.; Read, L.; et al. The WRF-Hydro Modeling System Technical Description, Version 5.0; NCAR Technical Note; UCAR: Boulder, CO, USA, 2018; 107p.
10. Thober, S.; Cuntz, M.; Kelbling, M.; Kumar, R.; Mai, J.; Samaniego, L. The multiscale routing model mRM v1.0: Simple river routing at resolutions from 1 to 50 km. Geosci. Model Dev. 2019, 12, 2501–2521.
11. Siqueira, V.A.; Paiva, R.C.D.; Fleischmann, A.S.; Fan, F.M.; Ruhoff, A.L.; Pontes, P.R.M.; Paris, A.; Calmant, S.; Collischonn, W. Toward continental hydrologic-hydrodynamic modeling in South America. Hydrol. Earth Syst. Sci. 2018, 22, 4815–4842.
12. Alfieri, L.; Lorini, V.; Hirpa, F.A.; Harrigan, S.; Zsoter, E.; Prudhomme, C.; Salamon, P. A global streamflow reanalysis for 1980–2018. J. Hydrol. X 2020, 6, 100049.
13. Harrigan, S.; Zsoter, E.; Alfieri, L.; Prudhomme, C.; Salamon, P.; Wetterhall, F.; Barnard, C.; Cloke, H.; Pappenberger, F. GloFAS-ERA5 operational global river discharge reanalysis 1979–present. Earth Syst. Sci. Data 2020, 12, 2043–2060.
14. Sutanudjaja, E.H.; van Beek, R.; Wanders, N.; Wada, Y.; Bosmans, J.H.C.; Drost, N.; van der Ent, R.J.; de Graaf, I.E.M.; Hoch, J.M.; de Jong, K.; et al. PCR-GLOBWB 2: A 5 arcmin global hydrological and water resources model. Geosci. Model Dev. 2018, 11, 2429–2453.
15. van Beek, L.P.H.; Wada, Y.; Bierkens, M.F.P. Global monthly water stress: 1. Water balance and water availability. Water Resour. Res. 2011, 47, W07517.
16. Li, H.-Y.; Wigmosta, M.S.; Wu, H.; Huang, M.; Ke, Y.; Coleman, A.M.; Leung, L.R. A physically based runoff routing model for land surface and earth system models. J. Hydrometeorol. 2013, 14, 808–828.
17. Wu, H.; Kimball, J.S.; Li, H.; Huang, M.; Leung, L.R.; Adler, R.F. A new global river network database for macroscale hydrologic modeling. Water Resour. Res. 2012, 48, W09701.
18. Yamazaki, D.; Oki, T.; Kanae, S. Deriving a global river network map and its sub-grid topographic characteristics from a fine-resolution flow direction map. Hydrol. Earth Syst. Sci. 2009, 13, 2241–2251.
19. Do, H.X.; Gudmundsson, L.; Leonard, M.; Westra, S. The global streamflow indices and metadata archive (GSIM)—Part 1: The production of a daily streamflow archive and metadata. Earth Syst. Sci. Data 2018, 10, 765–785.
20. Zhou, X.; Yamazaki, D.; Revel, M.; Zhao, G.; Modi, P. Benchmark framework for global river models. J. Adv. Model. Earth Syst. 2025, 17, e2024MS004379.
21. Shen, Y.; Yamazaki, D.; Pokhrel, Y.; Zhao, G. Improving global reservoir parameterizations by incorporating flood storage capacity data and satellite observations. Water Resour. Res. 2025, 61, e2024WR037620.
22. Yang, Y.; Feng, D.; Beck, H.E.; Hu, W.; Abbas, A.; Sengupta, A.; Monache, L.D.; Hartman, R.; Lin, P.; Shen, C.; et al. Global daily discharge estimation based on grid long short-term memory (LSTM) model and river routing. Water Resour. Res. 2025, 61, e2024WR039764.
23. Mizukami, N.; Clark, M.P.; Gharari, S.; Kluzek, E.; Pan, M.; Lin, P.; Beck, H.E.; Yamazaki, D. A vector-based river routing model for Earth system models: Parallelization and global applications. J. Adv. Model. Earth Syst. 2021, 13, e2020MS002434.
24. Mizukami, N.; Clark, M.P.; Sampson, K.; Nijssen, B.; Mao, Y.; McMillan, H.; Viger, R.J.; Markstrom, S.L.; Hay, L.E.; Woods, R.; et al. mizuRoute version 1: A river network routing tool for a continental domain water resources applications. Geosci. Model Dev. 2016, 9, 2223–2238.
25. Gharari, S.; Vanderkelen, I.; Tefs, A.; Mizukami, N.; Kluzek, E.; Stadnyk, T.; Lawrence, D.; Clark, M.P. A flexible framework for simulating the water balance of lakes and reservoirs from local to global scales: mizuRoute-Lake. Water Resour. Res. 2024, 60, e2022WR032400.
26. Addor, N.; Newman, A.J.; Mizukami, N.; Clark, M.P. The CAMELS data set: Catchment attributes and meteorology for large-sample studies. Hydrol. Earth Syst. Sci. 2017, 21, 5293–5313.
27. Alvarez-Garreton, C.; Mendoza, P.A.; Boisier, J.P.; Addor, N.; Galleguillos, M.; Zambrano-Bigiarini, M.; Lara, A.; Puelma, C.; Cortes, G.; Garreaud, R.; et al. The CAMELS-CL dataset: Catchment attributes and meteorology for large sample studies—Chile dataset. Hydrol. Earth Syst. Sci. 2018, 22, 5817–5846.
28. Addor, N.; Do, H.X.; Alvarez-Garreton, C.; Coxon, G.; Fowler, K.; Mendoza, P.A. Large-sample hydrology: Recent progress, guidelines for new datasets and grand challenges. Hydrol. Sci. J. 2020, 65, 712–725.
29. Chagas, V.B.P.; Chaffe, P.L.B.; Addor, N.; Fan, F.M.; Fleischmann, A.S.; Paiva, R.C.D.; Siqueira, V.A. CAMELS-BR: Hydrometeorological time series and landscape attributes for 897 catchments in Brazil. Earth Syst. Sci. Data 2020, 12, 2075–2096.
30. Coxon, G.; Addor, N.; Bloomfield, J.P.; Freer, J.; Fry, M.; Hannaford, J.; Howden, N.J.K.; Lane, R.; Lewis, M.; Robinson, E.L.; et al. CAMELS-GB: Hydrometeorological time series and landscape attributes for 671 catchments in Great Britain. Earth Syst. Sci. Data 2020, 12, 2459–2483.
31. Fowler, K.J.A.; Acharya, S.C.; Addor, N.; Chou, C.; Peel, M.C. CAMELS-AUS: Hydrometeorological time series and landscape attributes for 222 catchments in Australia. Earth Syst. Sci. Data 2021, 13, 3847–3867.
32. Höge, M.; Kauzlaric, M.; Siber, R.; Schönenberger, U.; Horton, P.; Schwanbeck, J.; Floriancic, M.G.; Viviroli, D.; Wilhelm, S.; Sikorska-Senoner, A.E.; et al. CAMELS-CH: Hydro-meteorological time series and landscape attributes for 331 catchments in hydrologic Switzerland. Earth Syst. Sci. Data 2023, 15, 5755–5784.
33. Liu, J.; Koch, J.; Stisen, S.; Troldborg, L.; Højberg, A.L.; Thodsen, H.; Hansen, M.F.T.; Schneider, R.J.M. CAMELS-DK: Hydrometeorological time series and landscape attributes for 3330 Danish catchments with streamflow observations from 304 gauged stations. Earth Syst. Sci. Data 2025, 17, 1551–1572.
34. Arsenault, R.; Brissette, F.; Martel, J.L.; Troin, M.; Lévesque, G.; Davidson-Chaput, J.; Gonzalez, M.C.; Ameli, A.; Poulin, A. A comprehensive, multisource database for hydrometeorological modeling of 14,425 North American watersheds. Sci. Data 2020, 7, 243.
35. Helgason, H.B.; Nijssen, B. LamaH-Ice: LArge-SaMple DAta for Hydrology and Environmental Sciences for Iceland. Earth Syst. Sci. Data 2024, 16, 2741–2771.
36. Klingler, C.; Schulz, K.; Herrnegger, M. LamaH-CE: LArge-SaMple DAta for Hydrology and Environmental Sciences for Central Europe. Earth Syst. Sci. Data 2021, 13, 4529–4565.
37. Dürr, H.H.; Meybeck, M.; Dürr, S.H. Lithologic composition of the Earth’s continental surfaces derived from a new digital map emphasizing riverine material transfer. Glob. Biogeochem. Cycles 2005, 19, 49–53.
38. Shen, Y.; Yamazaki, D.; Pokhrel, Y.; Zhao, G. Two recent mega dams reshape Yangtze river hydrology with comparable impact to Three Gorges Dam. J. Hydrol. Reg. Stud. 2025, 62, 103017.
39. Riggs, R.M.; Allen, G.H.; Wang, J.; Pavelsky, T.M.; Gleason, C.J.; David, C.H.; Durand, M. Extending global river gauge records using satellite observations. Environ. Res. Lett. 2023, 18, 064027.

Figure 1. Methodological workflow of the mizuRoute global evaluation framework. The diagram outlines the data flow from multi-source observation acquisition (Panel 1) to the vector-based routing simulation (Panel 2) and the final multi-metric statistical benchmarking (Panel 3).

Figure 2. Global spatial distribution of the in situ streamflow network. Panel (A) presents the geographical locations of the 12,115 gauging stations, where the circle color indicates the percentage of data coverage during the 1980–2024 study period, and circle size is proportional to the unit area discharge (mean discharge divided by catchment area). Panel (B) illustrates the histograms of data coverage, as well as longitudinal and latitudinal gradients.

Figure 3. Spatial distribution of key catchment-level attributes across the 12,115 gauging stations. The panels illustrate the geographical heterogeneity of four representative attributes used for performance attribution: (A) aridity index derived from WorldClim V2.1; (B) mean annual precipitation; (C) mean basin slope; (D) area equipped for irrigation (AEI, %).

Figure 4. Global spatial distribution and statistical summary of model performance metrics. Panels display (A) KGE, (B) NSE, (C) logNSE, and (D) CC. All metrics are left-censored at −5 for visualization. Circle colors represent the metric values, while circle size is proportional to the unit area discharge.

Figure 5. Global patterns of systematic model bias and volume errors: (A) relative bias (RB, %) distribution clipped to the [−100, 250] range to highlight regional patterns of overestimation and underestimation; (B) mean absolute bias clipped to [−50, 50]. Circle sizes represent the mean unit area discharge across the global network.

Figure 6. Model performance stratification across five Köppen–Geiger climate zones (A). Boxplots illustrate the distribution of (B) KGE, (C) NSE, (D) logNSE, and (E) CC for tropical, arid, temperate, cold, and polar regions. Median values and station counts are provided for each category.

Figure 7. Distribution of efficiency metrics across relative bias (RB) classes. Boxplots show the variation in (A) KGE, (B) NSE, (C) logNSE, and (D) CC for stations categorized into six RB groups.

Figure 8. Impact of reservoir regulation on simulation fidelity. (A) Spatial distribution of reservoir regulation intensity; (B) Boxplots show the variation in KGE, NSE, logNSE, and CC for stations grouped across six levels of reservoir regulation intensity.

Figure 9. Binned median trends between catchment attributes and KGE for the robust sub-sample (KGE > −5). Scatter plots and trend lines represent the sensitivity of model skill to (A) aridity index, (B) precipitation occurrence, (C) mean annual precipitation, and (D) leaf area index, with Spearman ρ values indicating the strength of the monotonic relationship.

Figure 10. Quantitative attribution of model performance drivers. Spearman rank correlation coefficients (ρ) between multi-dimensional basin attributes and KGE for the full sample (grey) and the functional sub-sample (KGE > −5). The ranking identifies aridity and precipitation frequency as the primary global predictors of model fidelity.

Table 1. Key evaluation metrics for different stratification groups. Median values are given for each group.

Stratification Group	KGE	NSE	logNSE	CC
Global (n = 12,115)	0.17	0.1	−0.03	0.53
Climate Zones
Tropical (n = 927)	0.11	−0.14	−0.5	0.62
Arid (n = 2250)	−0.15	−0.1	−0.53	0.43
Temperate (n = 5739)	0.19	0.15	0.01	0.54
Cold (n = 3163)	0.27	0.14	0.14	0.54
Polar (n = 31)	0.11	−0.7	−1.73	0.49
Reservoir Impact Classes
RI = 0 (n = 9616)	0.16	0.1	−0.01	0.52
0 < RI ≤ 1 (n = 1793)	0.27	0.19	0.04	0.62
1 < RI ≤ 10 (n = 565)	−0.07	−0.13	−0.65	0.47
10 < RI ≤ 100 (n = 113)	−0.36	−0.29	−0.69	0.43
100 < RI ≤ 500 (n = 18)	−1.18	−0.83	−0.60	0.39
RI > 500 (n = 10)	−659.18	−239,252.55	−14.73	0.23
