Evaluation and Comparison of Reanalysis Data for Runoff Simulation in the Data-Scarce Watersheds of Alpine Regions

Abstract: Reanalysis datasets provide a reliable source of climate input data for hydrological models in regions with limited weather station coverage. In this paper, the accuracy of precipitation and of maximum and minimum temperatures in four reanalysis datasets, the China Meteorological Assimilation Driving Datasets for the SWAT model (CMADS), the time-expanded Climate Forecast System Reanalysis (CFSR+), the European Centre for Medium-Range Weather Forecasts Reanalysis (ERA5), and the China Meteorological Forcing Dataset (CMFD), was evaluated against data from 28 ground-based observation stations (OBS) in the Source of the Yangtze and Yellow Rivers (SYYR) region. Each dataset was then used as input to the SWAT model for runoff simulation and performance evaluation. Finally, CMADS was optimized using Integrated Calibrated Multi-Satellite Retrievals for Global Precipitation Measurement (AIMERG) data. The results show that CMFD is the most representative of the four reanalysis datasets for precipitation characteristics in the SYYR region, followed by ERA5 and CFSR, while CMADS performs satisfactorily for temperature in this region but underestimates precipitation. We contend that the accuracy of runoff simulations depends notably on the accuracy of daily precipitation in the reanalysis dataset. The runoff simulations in this region do not effectively capture the extreme runoff characteristics of the Yellow River and Yangtze River sources. Refining CMADS with AIMERG satellite precipitation data emerges as an effective strategy for improving the accuracy of runoff simulations. This research can provide a reference for selecting meteorological data products and optimization methods for hydrological process simulation in areas with few meteorological stations.


Introduction
Located in Qinghai Province in Northwest China, the Source of the Yangtze and Yellow Rivers (SYYR) region is home to the headstreams of China's two great rivers, the Yangtze and the Yellow River, which provide water resources for the country's natural ecosystems and for a population of hundreds of millions [1]. The region plays an indispensable role in water conservation, runoff regulation, and the sustenance of ecological diversity [2][3][4]. Runoff, a pivotal component of the hydrological cycle, affects regional hydrological dynamics and has substantial implications for water security and ecological well-being in downstream regions [5,6]. Therefore, large-scale simulation or prediction of runoff within the SYYR region is important for advancing our understanding of surface water availability and management [7].
Watershed hydrological models are widely used to simulate and predict runoff processes and are mainly classified into lumped, semi-distributed, and distributed models. Sun et al. [8] applied the calibrated lumped Xin'anjiang model for flood forecasting in the Shaowu basin. Dal et al. [9] used a semi-distributed hydrological model to investigate how the main river channel controls the spatial variability of runoff in the Thur basin, Switzerland. Liu et al. [10] used a distributed hydrological model based on hydrothermal coupling, which accounts for glaciers and permafrost, to model the hydrological processes associated with the water cycle in the Niyang River basin on the Qinghai-Tibet Plateau. Compared with these models, the Soil and Water Assessment Tool (SWAT) model has the advantage of a strong physical basis, which allows it to integrate geographic information systems (GISs), remote sensing (RS), and digital elevation models (DEMs), together with a good user interface and strong capabilities for spatial data management, organization, analysis, and presentation. The SWAT model has been increasingly adopted by hydrologists and water resource managers for evaluating the influence of anthropogenic activities and climatic variations on water resources [11]. However, running the SWAT model requires a large amount of basic data in four main categories: DEM data, land use data, soil data, and meteorological data. Over the years, the quality of DEM, land use, and soil datasets has improved substantially owing to the rapid evolution of remote sensing techniques and advances in data classification methodologies [12]. The availability and quality of meteorological data can be critical to simulation results, but dense networks of meteorological stations are expensive to maintain and their data are not easily accessible [13]. The SYYR region is a case in point: weather stations are sparse, and comprehensive long-term time series of meteorological data are lacking [14].
With the development of satellite and ground-based observation technologies, an increasing number of meteorological data products are available for researchers to share and use. Precipitation data in particular, the most important input driving hydrological models, are available from a wide range of sources. In a review published in Reviews of Geophysics, Sun et al. [15] described in detail the data sources and estimation methodologies of approximately 30 existing global precipitation datasets, derived from gauge, satellite-related, and reanalysis data. Notably, reanalysis datasets exhibit greater variability than the other dataset categories, and the extent of variability in precipitation estimates differs across geographical regions. As an important part of the "China Water Tower", the SYYR region is mainly recharged by natural precipitation and glacial meltwater. In the context of continued global warming, the increasing share of glacial meltwater in water recharge means that the impact of temperature on runoff in the SYYR region must be considered alongside natural precipitation. Reanalysis datasets are significant here because they comprehensively incorporate detailed precipitation data and many other climatic variables, including temperature, solar radiation, and wind speed; of these, precipitation and temperature are particularly relevant to the SYYR region. Compared with alternative climate products, these datasets fully meet our requirements for runoff simulation with the SWAT model.
Reanalysis datasets are gridded datasets with spatial and temporal continuity, generated by weather simulation models that extrapolate from observed data [13]. They combine satellite data with observational data to represent observed weather as accurately as possible on continuous time and space scales. Numerous meteorological reanalysis products have been used in hydrological modeling, including CFSR, one of the most widely used reanalysis climate products in SWAT modeling, which accounts for about 40% of publications, and CMADS. Although CMADS covers only the East Asian region and its freely accessible data are limited to the period from 2008 to 2018, it still accounts for about 13% of publications [16]. Other global reanalysis products used with SWAT include the European Centre for Medium-Range Weather Forecasts Reanalysis (ERA) and the Climatic Research Unit (CRU) dataset. Reanalysis datasets usually provide a more accurate representation of weather conditions at the basin or regional scale, but because a reanalysis system consists of a background forecast model and a data assimilation process, data quality depends to some extent on the observed data fed into the system [17]. Consequently, in some cases a reanalysis dataset does not match the actual weather conditions in a region. Eini et al. [13] evaluated precipitation data from five reanalysis datasets, including CRU and CFSR, in a semi-arid basin, simulated the local hydrology with each of them, and concluded that reanalysis precipitation is a valid alternative for hydrological modeling in semi-arid basins. Ndhlovu et al. [18] used gridded climate data for hydrological modeling in the Zambezi River Basin in Southern Africa and showed that CFSR yields satisfactory hydrological modeling results for this region. However, previous studies have frequently evaluated only the accuracy of monthly precipitation, overlooking the accuracy of the daily precipitation data that the SWAT model requires. Temperature data also need to be assessed accurately, because temperature strongly affects hydrology in alpine regions. Therefore, a comprehensive evaluation of the regional accuracy of reanalysis data should precede their use in hydrological modeling [13].
Considering the suitability of meteorological reanalysis products across temporal and spatial dimensions, as well as data availability, this paper selects CFSR and CMADS, which are widely used in hydrological modeling; ERA5, a regularly updated global product; and the China Meteorological Forcing Dataset (CMFD), a mainstream gridded dataset for China. The precipitation and temperature data of CFSR, CMADS, CMFD, and ERA5 are first evaluated within the SYYR region. The datasets are then used to drive the SWAT model for hydrological simulations in the region, in order to identify the reanalysis dataset that performs best in hydrological modeling. Section 2 provides an overview of the study area, data sources, statistical methods, and hydrological model used in the research. Section 3 analyzes the performance of the reanalysis datasets through statistical evaluation and hydrological modeling. Section 4 discusses the factors behind the varied applicability of reanalysis data within the SYYR region and describes the optimization method applied to CMADS. Section 5 presents the key findings and conclusions.

Study Area
"Situated on the Tibetan Plateau (TP), the geographic expanse spans approximately 30.53 × 10⁴ km² within the southern confines of Qinghai province. This area encompasses both the Yangtze River source region and the Yellow River source region, as shown in Figure 1. The Yangtze River source is bounded by the confluence of the Batang River, located between the Kunlun and Tanggula ranges in the heart of the TP covering a watershed area of about 16.11 × 10⁴ km². With its vast expanse of glaciers, snow-capped peaks, rivers, lakes, and wetlands, this region plays a crucial role as a primary hydrological reservoir. Serving as a vital water source for the Yangtze River basin, the runoff from the Yangtze River source region constitutes approximately 25% of the entire water volume of the Yangtze River" [19]. The term "Yellow River source" designates the part of the Yellow River basin upstream of the Longyangxia Reservoir, located in the northeastern segment of the Tibetan Plateau, with a watershed area of approximately 14.42 × 10⁴ km². Despite its modest physical footprint, the Yellow River source has an intricate network of water systems and serves as a vital water catchment in China. Notably, the runoff originating from this source contributes over 40% of the runoff of the entire Yellow River basin, making it a substantial component of China's water tower [20]. In this research, the SYYR region is divided into 54 sub-basins by the SWAT model, as shown in Figure 1b.

Data Sources and Processing
CFSR, CMADS V1.0 (2008-2018) [21], CMFD, and ERA5 were selected as the reanalysis datasets. Their precipitation and temperature (both maximum and minimum) were first evaluated against observations, and they were then used as input data for the hydrological simulations. Note that the CFSR(SWAT) [22] dataset available from the SWAT website was updated only until July 2014, which makes it inadequate for current research needs. In this study, MATLAB R2022b and Python 3.10 were used to append the 6 h product of the NCEP Climate Forecast System version 2 (CFSv2) to the CFSR dataset, extending it to 2018, the same period as CMADS V1.0. The ground-based observational meteorological data (OBS) were used to assess the accuracy of the reanalysis data. The study also uses the DEM, land use, and soil data required for runoff simulation with the SWAT model, along with the hydrological observations required for calibration, and AIMERG [23], the satellite precipitation data used to optimize CMADS. The sources and other details of these data are listed in Table 1.


SWAT Model
The SWAT model, derived from the Simulator for Water Resources in Rural Basins (SWRRB) model, is a semi-distributed hydrological model encompassing three primary sub-modules designed for applications in hydrology, soil erosion, and pollution load assessment [24,25].Since its development, the model has undergone ongoing refinement, and its efficacy in simulating and forecasting runoff and sediment dynamics has garnered global recognition through widespread utilization and validation.

Hydrological Cycle Processes in SWAT Model
The hydrological cycle in SWAT follows the principle of water balance. A fraction of the precipitation is intercepted and retained by the vegetation canopy, while the remainder reaches the soil surface. Part of the water at the soil surface infiltrates into the soil profile, and the rest generates surface runoff, which then converges into the river system. Part of the infiltrated water remains in the soil and evaporates, and the rest flows into the surface water system via underground pathways [26,27]. The water balance is expressed as follows:

$$SW_t = SW_0 + \sum_{i=1}^{t} \left( R_{day} + W_y - Q_{surf} - E_a - W_{seep} - Q_{gw} \right)$$

where $SW_t$ denotes the final soil water content, $SW_0$ represents the initial soil water content, and $t$ denotes time in days. The variables $R_{day}$, $W_y$, $Q_{surf}$, $E_a$, $W_{seep}$, and $Q_{gw}$ denote the amounts of precipitation, snowmelt, surface runoff, evapotranspiration, percolation, and return flow on day $i$, respectively, all expressed in millimeters.
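As a minimal sketch of the water balance described above, the following Python function accumulates the daily terms into a final soil water content. The dictionary keys mirror the variable names in the text; the sign convention (precipitation and snowmelt as inputs, the remaining terms as losses) is an assumption based on the definitions given, and the input values are purely illustrative.

```python
def final_soil_water(sw0, daily_terms):
    """Accumulate a daily water balance (all quantities in mm).

    sw0         -- initial soil water content
    daily_terms -- one dict per day with keys R_day, W_y, Q_surf,
                   E_a, W_seep, Q_gw (names taken from the text)
    Assumed signs: R_day and W_y add water, the other terms remove it.
    """
    sw = sw0
    for day in daily_terms:
        sw += (day["R_day"] + day["W_y"]
               - day["Q_surf"] - day["E_a"]
               - day["W_seep"] - day["Q_gw"])
    return sw


# Illustrative values only, not data from the study.
example_day = {"R_day": 10.0, "W_y": 2.0, "Q_surf": 3.0,
               "E_a": 2.0, "W_seep": 1.0, "Q_gw": 1.0}
sw_after_two_days = final_soil_water(100.0, [example_day, example_day])
```

Each day contributes a net change of 5 mm in this example, so the balance grows linearly with the number of identical days.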

Snowmelt Processes in SWAT Model
The SWAT model employs a temperature-index method to estimate snowmelt. Snowmelt is controlled by the air temperature, the snowpack temperature, the melting rate, and the areal extent of snow cover. The model treats melted snow as equivalent to rainfall when computing runoff and percolation. Snowmelt is calculated as a linear function of the difference between the average of the snowpack temperature and the maximum air temperature and a user-defined threshold temperature for snowmelt:

$$SNOW_{melt} = f_{melt} \cdot SNOW_{cov} \cdot \left[ \frac{T_{snow} + T_{max}}{2} - T_{melt} \right]$$

where $SNOW_{melt}$ represents the daily amount of snowmelt (mm H2O), $f_{melt}$ denotes the snowmelt factor (mm H2O/°C-day), $SNOW_{cov}$ is the fraction of the Hydrologic Response Unit (HRU) area covered by snow, $T_{snow}$ represents the temperature of the snowpack (°C), $T_{max}$ is the maximum air temperature (°C), and $T_{melt}$ is the threshold temperature above which snowmelt occurs (°C).
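The linear temperature-index relation described above can be sketched in a few lines of Python. The functional form, melt = f_melt × SNOW_cov × ((T_snow + T_max)/2 − T_melt), follows the variable definitions in the text; clamping negative results to zero is an assumption made here so that no melt is produced below the threshold, and the input values are illustrative rather than calibrated.

```python
def daily_snowmelt(f_melt, snow_cov, t_snow, t_max, t_melt):
    """Temperature-index snowmelt for one day (mm H2O).

    f_melt   -- melt factor (mm H2O / degC-day)
    snow_cov -- fraction of the HRU area covered by snow (0 to 1)
    t_snow   -- snowpack temperature (degC)
    t_max    -- maximum air temperature (degC)
    t_melt   -- threshold temperature for melt (degC)
    """
    excess = (t_snow + t_max) / 2.0 - t_melt
    # Assumption: no melt when the mean temperature is below the threshold.
    return max(0.0, f_melt * snow_cov * excess)


# Illustrative inputs only: mean of -1 and 5 degC is 2 degC, 1.5 degC above threshold.
melt = daily_snowmelt(f_melt=4.5, snow_cov=0.8, t_snow=-1.0, t_max=5.0, t_melt=0.5)
```

Because the relation is linear, doubling either the melt factor or the snow-covered fraction doubles the melt for a given temperature excess.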

Model Setup
The source basin was simulated using the ArcSWAT interface for SWAT2009. In the source regions of the Yellow River and the Yangtze River, a single basin outlet was concurrently identified, and the entire basin was delineated into 54 subbasins using a DEM. To balance model accuracy and computational efficiency, a total of 695 Hydrologic Response Units (HRUs) were generated. The HRUs were delineated according to the predominant land use, soil characteristics, and slope, with thresholds of 10%, 15%, and 10%, respectively.

Statistical Measures

Validation Strategies
The evaluation of the reanalysis datasets was divided into two main aspects: (1) Climatic aspects: the accuracy of precipitation and of maximum and minimum temperatures in the four reanalysis datasets was evaluated against OBS data in time (daily/monthly/annual) and in space. (2) Hydrological aspects: runoff simulated by the SWAT model driven by each of the four reanalysis datasets and by the OBS data was compared to analyze their performance in runoff simulation. Because reanalysis datasets are gridded products while OBS data are point data, spatial matching between the two kinds of data is critical. According to Pombo et al. [28] and Tan et al. [29], there are two main approaches to the spatial matching problem. One is grid-to-point conversion, in which the meteorological variables at the location of each meteorological station are calculated from the reanalysis dataset by simple averaging; the other is point-to-grid conversion, in which the station data are interpolated onto a grid. After comparing the results obtained by interpolation and by simple averaging, we chose to calculate the precipitation, maximum temperature, and minimum temperature of the four reanalysis datasets at the OBS station locations using the inverse distance weighting method. Only grid cells covering at least one OBS site were considered in the evaluation; all others were excluded.
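The grid-to-point step can be illustrated with a minimal inverse distance weighting sketch. The function name, the power of 2, and the use of plain Euclidean distance in coordinate units are assumptions for illustration; the text does not state the exponent or distance measure used in the study, and a real application over a large region would likely use great-circle distances.

```python
from math import hypot


def idw_at_station(station_xy, grid_xy, grid_values, power=2.0):
    """Estimate a value at a station from surrounding grid points.

    station_xy  -- (x, y) of the station
    grid_xy     -- list of (x, y) grid-point coordinates
    grid_values -- reanalysis values at those grid points
    power       -- distance exponent (assumed 2 here)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (gx, gy), value in zip(grid_xy, grid_values):
        dist = hypot(gx - station_xy[0], gy - station_xy[1])
        if dist == 0.0:
            # Station coincides with a grid point: use that value directly.
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Illustrative example: two grid points at equal distance from the station.
estimate = idw_at_station((0.0, 0.0), [(1.0, 0.0), (-1.0, 0.0)], [2.0, 4.0])
```

With equal distances the weights are equal and the estimate reduces to the simple average, which is why the two matching approaches agree when stations sit near cell centers.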

Validation Metrics
Taylor diagrams were used to evaluate the four reanalysis datasets against the observational data. In a Taylor diagram, the correlation coefficient (CC), standard deviation (STD), and centered root mean square difference (RMSD) of the reanalysis data and the observations are related by the law of cosines. The diagrams were used to evaluate the accuracy of precipitation, maximum temperature, and minimum temperature in the reanalysis datasets relative to OBS data, and they have been widely used in previous research to assess gridded climate products [30][31][32]. In addition, three statistical indicators, the Probability of Detection (POD), False Alarm Rate (FAR), and Critical Success Index (CSI), were used to assess how consistently the daily precipitation in the reanalysis data matches observed precipitation events; these can be displayed in a performance diagram. POD is the fraction of observed precipitation days that the reanalysis data detect correctly, and FAR is the fraction of precipitation days reported by the reanalysis data that were not observed. CSI gives an overall measure of detection skill that accounts for hits, misses, and false alarms. In the Taylor diagram, proximity to the "observed" point on the x-axis indicates higher accuracy, whereas in the performance diagram, closeness to the upper right corner indicates better detection of precipitation occurrence [33]. The coefficient of determination (R²) and the Nash-Sutcliffe efficiency coefficient (NSE) were used to evaluate the accuracy of the runoff simulated by the SWAT model, using the evaluation criteria in Table 2, which are commonly applied by previous researchers [34]. The calculation methods of the above evaluation indicators are shown in Table 3.
where $O$ denotes the observed data, $G$ signifies the reanalysis data, $n$ represents the sample size, $H$ represents the count of precipitation days correctly identified by the reanalysis data, $F$ indicates the count of observed non-precipitation days erroneously identified as precipitation days by the reanalysis data, and $M$ denotes the count of observed precipitation days not detected by the reanalysis data. $Q_{S,i}$ and $Q_{O,i}$ denote the simulated and observed values for event $i$, while $\bar{Q}_S$ and $\bar{Q}_O$ represent the respective averages of the simulated and observed events.
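Using the counts H, M, and F and the runoff series defined above, the detection and runoff indicators can be sketched as follows. The formulas are the standard definitions (POD = H/(H+M), FAR = F/(H+F), CSI = H/(H+M+F), and the usual Nash-Sutcliffe form) and are assumed to match Table 3; the 0.1 mm/day wet-day threshold is a hypothetical choice, and R² is computed here as the squared Pearson correlation, which is also an assumption.

```python
def detection_scores(obs, rea, threshold=0.1):
    """Return (POD, FAR, CSI) for paired daily precipitation series.

    threshold -- hypothetical wet-day threshold in mm/day.
    """
    hits = misses = false_alarms = 0
    for o, g in zip(obs, rea):
        wet_obs, wet_rea = o >= threshold, g >= threshold
        if wet_obs and wet_rea:
            hits += 1          # H: observed and detected
        elif wet_obs:
            misses += 1        # M: observed but not detected
        elif wet_rea:
            false_alarms += 1  # F: detected but not observed
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    csi = hits / (hits + misses + false_alarms)
    return pod, far, csi


def nse(q_obs, q_sim):
    """Nash-Sutcliffe efficiency of simulated against observed runoff."""
    mean_obs = sum(q_obs) / len(q_obs)
    residual = sum((o - s) ** 2 for o, s in zip(q_obs, q_sim))
    variance = sum((o - mean_obs) ** 2 for o in q_obs)
    return 1.0 - residual / variance


def r_squared(q_obs, q_sim):
    """Squared Pearson correlation between observed and simulated runoff."""
    n = len(q_obs)
    mo, ms = sum(q_obs) / n, sum(q_sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(q_obs, q_sim))
    var_o = sum((o - mo) ** 2 for o in q_obs)
    var_s = sum((s - ms) ** 2 for s in q_sim)
    return cov * cov / (var_o * var_s)


# Illustrative series only: two hits, one miss, one false alarm.
pod, far, csi = detection_scores([0, 1, 2, 0, 3], [0.5, 1, 0, 0, 2])
```

An NSE of 1 indicates a perfect match, while an NSE of 0 means the simulation is no better than using the observed mean, which is why NSE is stricter than R² for biased simulations.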

Climate Aspect
A total of 903 meteorological points in the SYYR region and its surrounding areas were extracted from each of the four reanalysis datasets, covering three meteorological variables: precipitation, maximum temperature, and minimum temperature. The observation data of 28 meteorological stations were then used to evaluate the accuracy of the reanalysis datasets at different time scales (daily/monthly/yearly) and in space.

Evaluation of the Precipitation of Reanalysis Datasets by Time Scale
A quantitative understanding of how the reanalysis data compare with observations at different time scales can be obtained from the correlation coefficient (CC), the centered root mean square difference (RMSD), and the standard deviation (STD), which are summarized in the Taylor diagram in Figure 2. At the daily scale, CMFD performed best, followed by ERA5 and CFSR, while CMADS underperformed. CMFD precipitation has the highest correlation with observed precipitation (over 0.95) and the smallest RMSD of the four reanalysis datasets (less than 1 mm/day), showing that CMFD is in excellent agreement with the observed data, although the STD of CMFD is not the smallest. The correlations of ERA5 and CFSR with observations, although not as high as that of CMFD, are also satisfactory (0.85-0.91), showing that ERA5 and CFSR are also representative of the precipitation characteristics of the SYYR region to some extent, whereas the correlation between CMADS and observations is only 0.74. Additionally, the standard deviation of CMFD is 2.01 mm/day, only 0.01 mm/day above the 2.00 mm/day of the observations, showing that CMFD captures the variability of precipitation in the SYYR region well. Standard deviations greater than 2.00 mm/day for ERA5 and CFSR indicate a slight tendency to overestimate precipitation, while CMADS tends to underestimate it (standard deviation of 1.81 mm/day). Meanwhile, CMFD has the highest POD and CSI and the smallest FAR of the four reanalysis datasets according to the performance diagrams in Figure 3, showing that CMFD performs well in detecting precipitation. The other three reanalysis products show similar performance, in most cases second only to CMFD, suggesting a robust occurrence detection mechanism. At the monthly scale, the performance of CMADS and CFSR is satisfactory (both correlation coefficients are greater than 0.95), while CMFD and ERA5 do not perform as well as at the daily scale (both correlation coefficients are less than 0.7). At the interannual scale, the four reanalysis datasets performed poorly, with correlation coefficients lower than 0.5 and standard deviations and root mean square differences greater than 40 mm/year.
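The three Taylor statistics discussed above can be computed from paired series as in the following sketch, which also illustrates why they fit on a single diagram: the centered RMSD satisfies RMSD² = STD_G² + STD_O² − 2·STD_G·STD_O·CC. The function name is hypothetical, population standard deviations (dividing by n) are assumed, and the sample values are illustrative, not data from the study.

```python
from math import sqrt


def taylor_statistics(obs, rea):
    """Return (CC, STD of obs, STD of reanalysis, centered RMSD)."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_g = sum(rea) / n
    dev_o = [o - mean_o for o in obs]   # anomalies of the observations
    dev_g = [g - mean_g for g in rea]   # anomalies of the reanalysis data
    std_o = sqrt(sum(d * d for d in dev_o) / n)
    std_g = sqrt(sum(d * d for d in dev_g) / n)
    cc = sum(a * b for a, b in zip(dev_o, dev_g)) / (n * std_o * std_g)
    # Centered RMSD: differences are taken after removing each mean.
    rmsd = sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_g)) / n)
    return cc, std_o, std_g, rmsd


# Illustrative daily precipitation values (mm/day).
cc, std_o, std_g, rmsd = taylor_statistics(
    [0.0, 2.0, 4.0, 1.0, 3.0], [0.5, 1.5, 5.0, 1.0, 2.0]
)
# Law-of-cosines relation that underlies the Taylor diagram geometry.
cosine_rhs = std_g ** 2 + std_o ** 2 - 2.0 * std_g * std_o * cc
```

Because the means are removed before differencing, a constant bias in a dataset does not appear in the Taylor diagram, so systematic under- or overestimation has to be checked separately.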

Evaluation of the Temperature of Reanalysis Datasets by Time Scale
At the daily scale, the four reanalysis datasets performed satisfactorily for maximum and minimum temperature, with CMADS performing best, followed by CMFD, and finally ERA5 and CFSR; all had correlation coefficients greater than 0.95. The STD and RMSD of the four reanalysis datasets are also close to one another and not significantly different from those of the observed data, so the datasets cluster together in the Taylor diagram, showing that the reanalysis datasets estimate temperature with high accuracy. At the monthly scale, the four reanalysis datasets performed similarly to the daily scale, all showing satisfactory accuracy for both maximum and minimum temperatures. At the interannual scale, the accuracy of the temperature estimates varies considerably among the four reanalysis datasets and between maximum and minimum temperatures. For example, ERA5 captures the mean annual maximum temperature with the highest accuracy among the datasets considered (correlation coefficient of 0.92), but its accuracy for the mean annual minimum temperature is very low (correlation coefficient of only 0.32). The other three reanalysis datasets behave similarly to ERA5, which may be caused by the small number of samples available for assessment at the annual scale.

Spatial Annual Averages: Reanalysis Datasets vs. OBS Data
To compare the reanalysis datasets and OBS data spatially, this study uses the ANUSPLIN interpolation method to interpolate annual mean precipitation and maximum and minimum temperatures within the SYYR region (Figure 4). Figure 4a shows that precipitation decreases from southeast to northwest in the SYYR region. Because of topographic obstruction and airflow uplift, water vapor from the ocean decreases as it moves from southeast to northwest, resulting in abundant precipitation in the southeastern part of the SYYR region, with annual precipitation ranging from 800 to 1200 mm, while precipitation is scarce in the northwest, with annual precipitation of less than 300 mm. A comparison of Figure 4a,d,g,j,m shows that annual precipitation in CMADS, CFSR, CMFD, ERA5, and the observational (OBS) data consistently displays a geographical pattern of higher values in the southeast and lower values in the northwest. In particular, the CMFD data are in excellent agreement with the spatial distribution of the observations. Furthermore, CMADS systematically underestimates annual precipitation within the SYYR region, while both CFSR and ERA5 tend to overestimate it. This is consistent with the results of the quantitative analysis presented in the preceding section.
Figure 4b,c shows a discernible east-to-west decline in both the annual mean maximum and minimum temperatures across the SYYR region. This temperature variation is attributed to the elevation gradient, which is particularly evident in the transition from the relatively lower-elevation Yellow River source area in the east to the higher-elevation Yangtze River source area in the west. The temperature pattern is further modulated by latitude, which contributes to relatively high temperatures in the southeastern corner of the Yangtze River source region. This indicates that the elevation gradient has a greater influence on temperature in the SYYR region than latitude does. A comparison of Figure 4b,e,h,k,n shows that the spatial distribution of the annual mean maximum temperature is consistent across CMADS, CFSR, CMFD, and the observational (OBS) data within the SYYR region, with higher values in the east and lower values in the west. However, ERA5 performed poorly and significantly overestimated temperatures in the southwestern Yangtze River source area. The spatial patterns in Figure 4c,f,i,l,o indicate that the spatial distribution of the mean annual minimum temperature is generally coherent with that of the mean annual maximum temperature.

Calibration and Sensitivity Analysis of Parameters
The SWAT model is calibrated using the Sequential Uncertainty Fitting algorithm (SUFI-2) within the SWAT Calibration and Uncertainty Programs (SWAT-CUP). This method offers a versatile algorithm for calibrating SWAT models with a large number of input parameters [35]. A global sensitivity analysis within SWAT-CUP was used to identify and select a total of 10 parameters for calibration. Table 4 lists the global sensitivity ranking of the model parameters and their best-fit values for runoff simulations in the SYYR region. In all simulations, the year 2008 was used as the warm-up period, the years 2009 to 2014 as the calibration period, and the years 2015 to 2018 as the validation period. Calibration and validation were conducted at both the daily and monthly scales. In the parameter notation, v_ means that the default parameter value is replaced by the given value, r_ means that the existing parameter value is multiplied by (1 + the given value), and a_ means that the given value is added to the existing parameter value.
Note: The "Fitted Value" in the last column of Table 4 is the value entered during parameter modification in ArcSWAT and does not represent the true value of the parameter.
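The three change types described above can be illustrated with a minimal Python sketch. The function name `apply_change` and the example parameter values are hypothetical and are not taken from Table 4; the sketch only restates the v_, r_, and a_ rules as arithmetic.

```python
def apply_change(method: str, current: float, value: float) -> float:
    """Return the parameter value after a SWAT-CUP style change.

    method  -- "v" (replace), "r" (relative change) or "a" (absolute change)
    current -- existing parameter value in the model (hypothetical here)
    value   -- fitted value reported by the calibration
    """
    if method == "v":
        # v_: the existing value is replaced by the given value
        return value
    if method == "r":
        # r_: the existing value is multiplied by (1 + given value)
        return current * (1.0 + value)
    if method == "a":
        # a_: the given value is added to the existing value
        return current + value
    raise ValueError(f"unknown change type: {method!r}")


# Illustrative values only (not the fitted values of this study).
replaced = apply_change("v", current=0.048, value=0.30)
scaled = apply_change("r", current=75.0, value=-0.10)
shifted = apply_change("a", current=1000.0, value=250.0)
```

This also shows why a fitted value entered for an r_ or a_ parameter is not the true parameter value: the final value depends on the existing value in each subbasin or hydrological response unit.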

Flow Simulation in the SYYR
To reduce uncertainty in the hydrological model, the model parameters must be kept uniform across simulations. Therefore, we first calibrate the model using observation-based data to obtain the best parameters for SWAT, then hold these parameters constant and drive the model with the different meteorological forcing datasets, so that the resulting hydrological processes are comparable.
After the parameters were calibrated using observational data, separate runoff simulations were generated by running the SWAT model with each meteorological dataset. Figure 5 presents the daily and monthly runoff simulations for the two hydrological stations, and Table 5 details the corresponding performance indicators. Comparing the simulations shows that the OBS-driven simulations perform best, with CMFD the closest to them, followed by ERA5 and CFSR, whereas CMADS performs comparatively poorly. In summary, the SWAT runoff simulations are generally more accurate at the monthly scale than at the daily scale. Most reanalysis datasets adequately meet the data requirements for runoff simulation, with the notable exception of CMADS, which performs poorly during the validation period. Taken together with the assessment of the reanalysis data in Section 3.1, the runoff simulation results correspond closely to the evaluation of daily scale precipitation: the accuracy of the runoff simulation tracks the accuracy of daily scale precipitation. It can be inferred that the accuracy of runoff simulation depends heavily on the accuracy of daily scale precipitation within reanalysis datasets. This is consistent with the tendency of related research to focus primarily on assessing precipitation within reanalysis or satellite datasets. On the other hand, the accuracy of the reanalysis data for runoff simulation differs between the two basins. For example, although CMADS is less accurate in estimating daily precipitation, its simulation accuracy for monthly runoff in the Yellow River source area is better than that of CFSR and ERA5, while it performs poorly in the Yangtze River source area, most likely because of spatial differences in the accuracy of the reanalysis data.
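As an illustration of the two performance indicators quoted for the runoff simulations, the following Python sketch computes the Nash-Sutcliffe efficiency (NSE) and R². It assumes that R² is the squared Pearson correlation between observed and simulated flow; the exact definitions used in this study are those of Table 3, and the function names and sample values here are hypothetical.

```python
from statistics import mean


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared
    simulation error to the variance of the observations about their mean."""
    obs_mean = mean(obs)
    err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    var = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - err / var


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated series
    (an assumed definition of R2 for this sketch)."""
    obs_mean, sim_mean = mean(obs), mean(sim)
    cov = sum((o - obs_mean) * (s - sim_mean) for o, s in zip(obs, sim))
    var_o = sum((o - obs_mean) ** 2 for o in obs)
    var_s = sum((s - sim_mean) ** 2 for s in sim)
    return cov * cov / (var_o * var_s)


# Hypothetical flows (m3/s): the simulation has a constant bias of +1.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
print(r_squared(observed, simulated), nse(observed, simulated))
```

In this toy case R² stays at 1 while NSE drops well below 1, which is why the two indicators are reported together: R² measures how well the timing and shape of the hydrograph are reproduced, whereas NSE also penalizes systematic bias such as that caused by under- or overestimated precipitation.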
As can be seen from Figure 5, in both the Yellow River and Yangtze River basins, the annual peak flow predominantly occurs from May to September. During the calibration period (2009-2014), the highest daily average flow reached 3350.00 m³/s at the source of the Yellow River on 24 July 2012, and 3320.00 m³/s at the source of the Yangtze River on 24 July 2009. Using the 95% quantile [36] of runoff values for the study period as the extreme runoff threshold (as illustrated in Figure 6), we observed that, in most instances, simulations driven by both the reanalysis datasets and the observed data tend to underestimate peak flows at the sources of both the Yellow River and the Yangtze River. This discrepancy may arise from the SWAT model's limited ability to represent glacier and soil thawing processes in the SYYR region during the thaw period. In addition, the model does not fully integrate glacier meltwater and thawed soil water into the simulated runoff.
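The extreme runoff comparison can be sketched in Python as follows. The sketch assumes a linear-interpolation quantile and a simple mean relative bias as the summary measure; neither choice is specified in the text, and the function names and data are hypothetical.

```python
def quantile(values, q):
    """Quantile by linear interpolation between order statistics
    (an assumed convention; other quantile definitions exist)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def extreme_pairs(obs, sim, q=0.95):
    """Return the threshold and the (observed, simulated) pairs on days
    when observed runoff exceeds the q-quantile of the observations."""
    threshold = quantile(obs, q)
    pairs = [(o, s) for o, s in zip(obs, sim) if o > threshold]
    return threshold, pairs


def mean_relative_bias(pairs):
    """Average of (simulated - observed) / observed; negative values
    mean that peak flows are underestimated."""
    return sum((s - o) / o for o, s in pairs) / len(pairs)


# Hypothetical series: simulated flow is 80% of observed flow.
observed = [float(v) for v in range(1, 101)]
simulated = [0.8 * v for v in observed]
threshold, pairs = extreme_pairs(observed, simulated)
bias = mean_relative_bias(pairs)
```

A negative `bias` on the days above the threshold corresponds to the underestimation of peak flows described above.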

Uncertainty Analysis of Reanalysis Data
In general, the reanalysis data have good applicability in China and can yield relatively accurate runoff simulations when used to run the SWAT model. However, the study area in this paper is located on the Tibetan Plateau, which has a unique plateau climate that differs from the climate of other regions in China [14]. Moreover, the spatial and temporal resolution, as well as the quality of the components, vary across reanalysis datasets, depending on factors such as the origin of the observed data, the forecast model/land surface model [37], the assimilation method [38], and other contributing parameters [39]. These factors could contribute to notable disparities in the precision and suitability of the four reanalysis datasets within the SYYR region.
The findings of this study indicate that, within the SYYR region, CMFD has the highest applicability, followed by ERA5 and CFSR, whereas CMADS gives suboptimal runoff simulations because of its weaker performance in daily scale precipitation. Notably, CMFD and CMADS assimilate ground-based observations with multiple gridded datasets sourced from remote sensing and reanalysis [40]. Given the constraints of ground observations, particularly in regions of complex terrain, direct interpolation of variable values may yield greater errors than interpolation of the differences or ratios between station data and background data [41]. CMFD uses distinct data generation algorithms tailored to individual variables [42]. In particular, its precipitation algorithm uses a positive bias suppression method, in which precipitation is interpolated separately at the sub-daily and monthly scales, and the sub-daily interpolated values are then adjusted using the corresponding monthly interpolated values [43], which greatly improves data accuracy.
ERA5 is a comprehensive reanalysis dataset that assimilates as many observations as possible in the upper atmosphere and near the surface [44,45]. It is generated using 4D-Var data assimilation and model forecasts within the CY41R2 version of the ECMWF Integrated Forecast System (IFS) [46]. In regions with limited observational coverage, ERA5 relies on the weather forecasting model to provide spatially and temporally continuous data, so the resulting dataset, which resembles a weather forecast, inherently carries a degree of uncertainty. CFSR, in turn, is a comprehensive, long-term reanalysis dataset produced by the U.S. National Centers for Environmental Prediction. It encompasses both the coupled land-atmosphere-ocean model and the Global Land Data Assimilation System (GLDAS) [47]. Although its data quality improved after the upgrade from CFSv1 to CFSv2, the resulting data necessarily contain some uncertainty, as they seem not to assimilate ground-based observations [48] but are instead produced in a forecast manner.
CMADS was constructed using techniques including data loop nesting, resampling, and bilinear interpolation, with field elements from the China Meteorological Administration Land Data Assimilation System (CLDAS) as the foundational dataset [49]. This may be one of the reasons for the lower accuracy of CMADS in daily scale precipitation, which is also consistent with the findings of Zhang et al. [49] and Dao et al. [50]. Nevertheless, CMADS provides commendably precise temperature estimates, with satisfactory results at the annual, monthly, and daily scales. Consequently, the accuracy of climate data sources should be evaluated when reanalysis datasets are used at a regional scale, particularly in areas with limited meteorological station coverage.

Optimization of the CMADS Dataset
The SYYR region is one of China's important nature reserves, where human activities have relatively little impact on the natural environment [51]. Moreover, the rivers in the region are supplied primarily by natural precipitation and glacial meltwater. Consequently, regional runoff is shaped primarily by natural factors [5,52]. In this investigation, the influence of precipitation, temperature, and solar radiation on runoff in the SYYR region was first explored at a monthly scale using path analysis. This analysis was based on a synthesis of regional characteristics and the data essential for runoff simulation with the SWAT model, as illustrated in Figure 7. The findings reveal that temperature has the largest direct impact on runoff in the SYYR region, followed by solar radiation, with precipitation having the smallest direct influence. Li et al. [53] argued that with global warming and increasing temperatures, on the one hand, the depth of permafrost freezing decreases, glacial snowmelt increases, and evaporation increases in the basin, leading to a greater effect of temperature on runoff. On the other hand, the rise in temperature alters the characteristics of the basin's underlying surface, leading to an increase in soil layer thickness. This, in turn, raises the proportion of precipitation contributing to groundwater recharge while reducing direct surface runoff. Consequently, the impact of temperature on runoff in the SYYR region becomes more pronounced.
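The direct effects in a path analysis can be sketched as follows. The text does not give the formulation used, so this Python sketch assumes the classical one, in which the direct path coefficients are the standardized partial regression coefficients obtained by solving the normal equations R p = r, where R is the correlation matrix of the predictors and r holds their correlations with runoff. All function names and data are hypothetical.

```python
from statistics import mean, pstdev


def corr(a, b):
    """Pearson correlation of two equally long series."""
    ma, mb, sa, sb = mean(a), mean(b), pstdev(a), pstdev(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) * sa * sb)


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def direct_path_coefficients(predictors, response):
    """Direct effect of each predictor on the response (assumed definition:
    standardized partial regression coefficients)."""
    names = list(predictors)
    r_xx = [[corr(predictors[a], predictors[b]) for b in names] for a in names]
    r_xy = [corr(predictors[a], response) for a in names]
    return dict(zip(names, solve(r_xx, r_xy)))


# Hypothetical monthly series, for illustration only.
temperature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
precipitation = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
runoff = [3.0 * t + 1.0 * p for t, p in zip(temperature, precipitation)]
effects = direct_path_coefficients(
    {"temperature": temperature, "precipitation": precipitation}, runoff
)
```

In this formulation, the indirect effect of one predictor through another is the product of their correlation and the other predictor's direct coefficient, so the direct and indirect effects of a predictor sum to its total correlation with runoff.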

The results of this paper show that CMADS performs poorly for precipitation but well for temperature. We therefore recommend using meteorological data products with more accurate precipitation estimates, or corresponding satellite data products, to compensate for the precipitation deficiencies of CMADS. Notably, prior studies have consistently demonstrated a strong correlation between daily precipitation data from IMERG and surface precipitation across China [54][55][56].
In this study, we selected AIMERG, which is obtained by applying a spatiotemporal correction algorithm that uses a high-quality ground-based observation product, APHRODITE, to correct the GPM IMERG satellite precipitation product [23]. The AIMERG precipitation dataset combines the respective merits of satellite estimation and ground-based observations, and it performs better in terms of systematic bias and random errors across diverse spatial and temporal scales in China, thereby providing a more dependable foundational dataset for scientific research and applications in related fields across Asia [57,58]. Using Python and Matlab tools, precipitation data from AIMERG were extracted and substituted for the corresponding points in CMADS, with the other input data left unchanged. The SWAT model was then recalibrated to simulate runoff in the SYYR region. The results (Table 6) show that the CMADS data optimized with AIMERG are more suitable for runoff simulation in the SYYR area (Figure 8), especially in the Yangtze River source area, where R² and NSE increased from 0.55 and 0.54 to 0.86 and 0.86, respectively, at the monthly scale, and from 0.53 and 0.50 to 0.82 and 0.79 at the daily scale. Refining the CMADS dataset by substitution thus significantly improved its accuracy. This approach is simple and accessible compared with the more intricate methods used by other researchers to optimize and fuse reanalysis data, which include artificial neural networks [59], wavelet transform methods [60], genetic algorithms [61,62], and machine learning [63,64]. However, the approach is effective only in situations like that of CMADS, where some variables perform markedly worse than others.
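The substitution step can be illustrated with a minimal Python sketch. It assumes a simplified, hypothetical record layout (one dictionary per day with `date`, `pcp`, `tmax`, and `tmin` keys) rather than the actual CMADS or SWAT input file formats, and it assumes that days without an AIMERG value keep the CMADS precipitation; neither detail is stated in the text.

```python
def substitute_precipitation(cmads_rows, aimerg_pcp):
    """Return a copy of the CMADS records in which precipitation is replaced
    by the AIMERG value for the same date; all other variables are kept.

    cmads_rows -- list of dicts with keys "date", "pcp", "tmax", "tmin"
                  (hypothetical layout)
    aimerg_pcp -- dict mapping date string to AIMERG precipitation (mm/day)
    """
    merged = []
    for row in cmads_rows:
        new_row = dict(row)  # copy so the original CMADS record is untouched
        if row["date"] in aimerg_pcp:
            new_row["pcp"] = aimerg_pcp[row["date"]]
        # Assumption: keep the CMADS value when AIMERG has no data that day.
        merged.append(new_row)
    return merged


# Illustrative values only.
cmads = [
    {"date": "2009-07-01", "pcp": 0.4, "tmax": 15.2, "tmin": 3.1},
    {"date": "2009-07-02", "pcp": 0.0, "tmax": 16.0, "tmin": 4.0},
]
aimerg = {"2009-07-01": 2.6}
optimized = substitute_precipitation(cmads, aimerg)
```

Because only the precipitation field changes, any difference in the subsequent runoff simulation can be attributed to the precipitation input rather than to temperature or the other forcing variables.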

Limitations
Although the SWAT model includes snowmelt-related parameters, it omits key freeze-thaw processes in soils and glaciers, including the influence of seasonally frozen ground, permafrost, and glacial melt on runoff. Wang et al. [65] and Xin et al. [66] have recognized the substantial effects of these processes on streamflow. In addition, the depth of perennial permafrost on the Tibetan Plateau has decreased in recent years [67]. The concurrent presence of permafrost and seasonally frozen ground limits water fluxes between the land surface, the cryosphere, and the unfrozen ground beneath [68]. This constraint reduces the infiltration capacity of the soil surface and the hydraulic conductivity of the soil, altering surface hydrodynamics and subsurface groundwater transport and subsequently disrupting ecosystem functioning [69,70]. Moreover, the growing influence of glacial meltwater on runoff in the SYYR region calls for an algorithm or module within the SWAT model that simulates soil and glacier freeze-thaw processes. Previous efforts, such as those of Omani et al. [71] and Qi et al. [72], sought to develop methods incorporating soil temperature and glacier mass balance simulations. Nevertheless, existing results still fall short of these objectives, and further research is needed to close the gaps.
Furthermore, groundwater plays an important role in SWAT runoff simulation in alpine regions. Groundwater strongly affects soil moisture dynamics, particularly in cooler climates, and its role in mitigating runoff constraints arising from soil freezing cannot be disregarded [70]. However, the somewhat simplified representation of groundwater processes in the SWAT model may not fully capture the interplay between groundwater dynamics and soil moisture dynamics [73]. The groundwater-surface water linkage within the SWAT model can be improved by incorporating a more detailed description of the interaction mechanisms between groundwater and surface water [74]. Improving the parameterization of groundwater-surface water interaction in the model is also needed to resolve groundwater dynamics in greater detail in future research.

Conclusions
In this study, three meteorological elements (precipitation, maximum temperature, and minimum temperature) from the reanalysis datasets CMADS, CFSR, CMFD, and ERA5 were first evaluated at daily, monthly, and annual time scales using observations from 28 meteorological stations in the SYYR region from 2008 to 2018. The five meteorological datasets (OBS, CMADS, CFSR, CMFD, and ERA5) were then used to drive the SWAT model to compare their performance in runoff simulation. Finally, the CMADS dataset was optimized using AIMERG satellite precipitation data, based on the assessment of the meteorological elements and the hydroclimatic characteristics of the SYYR region, and satisfactory results were obtained. The principal findings of this study are as follows:
(1) On the daily scale, CMFD performs best for precipitation (i.e., correlation over 0.95 and RMSD less than 1 mm/day), followed by ERA5 and CFSR, while CMADS performs poorly for daily precipitation (i.e., correlation of only 0.74). The four reanalysis datasets performed well for temperature at the daily and monthly scales (i.e., correlation over 0.95), but not satisfactorily at the annual scale.
(2) In runoff simulations, SWAT is more accurate at the monthly scale than at the daily scale. The best runoff simulation was based on OBS data (with R² > 0.80 and NSE > 0.80 during the calibration period); the closest to the OBS simulation results was CMFD, followed by ERA5 and CFSR, while CMADS was the worst performer.
(3) In most cases, the simulations driven by the reanalysis data and the observed data underestimated peak flows to varying degrees and failed to effectively capture the extreme runoff characteristics of the basin, both at the source of the Yellow River and at the source of the Yangtze River.
(4) After CMADS was optimized with the AIMERG precipitation dataset, the runoff simulation performance improved greatly, with R² and NSE for the source of the Yangtze River increasing from 0.55 and 0.54 to 0.86 and 0.86 at the monthly scale and from 0.53 and 0.50 to 0.82 and 0.79 at the daily scale, respectively.
Ultimately, we argue that the accuracy of reanalysis data should be carefully evaluated when such datasets are used at a regional scale, particularly in regions with sparse weather station coverage. For the SYYR region examined in this study, the CMFD dataset and the AIMERG-optimized CMADS are the most suitable choices.

Figure 3. Daily scale performance diagrams for the years 2008 to 2018 feature bias scores indicated by blue lines and the Comprehensive Similarity Index (CSI) represented by green lines (a-ERA5, b-CFSR, c-CMFD, d-CMADS).

Figure 6. Comparison of observed extreme runoff at >95% quantile with simulated runoff from SWAT.

Figure 7. Path analysis of runoff changes in the SYYR region, (a) the Yellow River; (b) the Yangtze River. Green color indicates a positive effect, red color indicates a negative effect, and the thickness of the line represents the magnitude of the effect.

Figure 8. Simulation results of observed monthly runoff and the AIMERG-optimized CMADS-driven SWAT model at hydrological stations during the calibration period (2009-2014).

Table 1. Foundational details pertaining to the datasets employed in this study.

Table 2. Evaluation of runoff simulation performance of SWAT model.

Table 3. Statistical indexes for the evaluation of precipitation, temperature, and runoff in this study.

Table 4. Initial and optimized parameter values for the runoff simulation.

Table 5. Performance metrics for daily and monthly runoff simulation.

Table 6. Evaluation Indicators for Daily and Monthly Runoff Simulation for CMADS and AIMERG + CMADS.