Predicting Tropical Monsoon Hydrology Using CFSR and CMADS Data over the Cau River Basin in Vietnam

Abstract: To improve knowledge of this matter, the potential applications of two gridded meteorological products (GMPs), the China Meteorological Assimilation Driving Datasets for the SWAT model (CMADS) and the Climate Forecast System Reanalysis (CFSR), are compared for the first time with data from ground-based meteorological stations over 6 years, from 2008 to 2013, over the Cau River basin (CRB), northern Vietnam. Statistical indicators and the Soil and Water Assessment Tool (SWAT) model are employed to investigate the hydrological performances of the GMPs against the data of 17 rain gauges distributed across the CRB. The results show strong correlations between the temperature reanalysis products in both CMADS and CFSR and the ground-based observations (the correlation coefficients range from 0.92 to 0.97). The CFSR data overestimate precipitation (percentage bias of approximately 99%) at both daily and monthly scales, whereas the CMADS product performs better, with obvious differences (compared to the ground-based observations) in high-terrain areas. Regarding the simulated river flows, CFSR-SWAT produced "unsatisfactory" results, while CMADS-SWAT (R² > 0.76 and NSE > 0.78) performs better than CFSR-SWAT on the monthly scale. This assessment of the applicative potential of GMPs, especially CMADS, may provide an additional rapid alternative for water resource research and management in basins with similar hydro-meteorological conditions.


Introduction
Accurate and complete weather information provides important inputs into hydrological models, supporting flood forecasting and climate change impact assessments and serving as scientific guidance for water resource management [1][2][3]. Normally, data collected from meteorological stations are the most reliable and accurate data [4,5]; however, these data are insufficient to represent the actual weather conditions occurring in river basins because of their low spatial coverage [6,7] and because they are affected by signal distortions [8][9][10]. Furthermore, the acquisition of reliable temperature and precipitation data is a difficult task because of dynamic climatic conditions, altitudes, and surface properties [11][12][13][14]. These data are especially limited in developing countries due to technical and financial limitations [3,11,12,13,15]. [...] and CFSR data for hydro-meteorological studies has not yet been adequately investigated outside of China, including in Southeast Asia.
The Cau River basin (CRB) in the Thai Binh River network is located in northern Vietnam and plays an important social and economic role. The Cau River flows through six administrative provinces, including Hanoi city. Bui et al. [33] integrated the SWAT model with QUAL2K to simulate the water quality (mainly the organic and nutrient contaminant contents) in the CRB using limited data. Tran et al. [34] used the SWAT model to estimate the nitrogen (N) load with multiple polluting sources along the CRB, one of the three most polluted river basins in Vietnam. To date, assessments of the potential of CFSR data and especially CMADS data in hydrological applications have not been given sufficient attention. Furthermore, global climate change associated with extreme weather events adds complexity to water resource management issues in the CRB.
In this paper, CFSR and CMADS data are used for the first time in hydro-meteorological research over a specific river basin in Vietnam. To assess the reliability and capability of these datasets in hydrological modeling, we carried out the following tasks: (i) use data from ground-based meteorological stations (GMS) to validate the CFSR and CMADS precipitation/temperature products on temporal and spatial scales using various statistical indicators; (ii) assess the abilities of these datasets in hydrological simulations from 2008 to 2013 using the SWAT model; and (iii) investigate the abilities of these GMPs to capture extreme weather (including rain events, extreme heat events, cold events, etc.) and streamflow events occurring in the CRB.

Study Area
The CRB (21.07°-22.18° N and 105.28°-106.08° E) contains the Cau River, the main tributary of the Thai Binh River, which is the second-largest river system in northern Vietnam. The river flows in a north-south direction, originating from the high mountains in northwest Bac Kan province. It flows through six administrative provinces and cities (including northern Hanoi city in the downstream reach). The river basin covers an area of nearly 6300 km² with a total length of ~1603 km; the mainstream length is 290 km [34].
The topography in the basin changes from an altitude of approximately 1000 m in the surrounding mountains in the west, north, and northeast regions to the plains in the central and south regions with elevations below 10 m, alternating with hills with elevations ranging between 200 and 400 m above sea level. The CRB is characterized by a tropical monsoon climate with an annual mean temperature of approximately 23 °C. However, due to the influence of the northeast monsoon, the monthly average temperature declines below 18 °C from December to February. The annual mean precipitation is ~1600 mm, of which approximately 80% occurs in the wet season (May to October) and the rest occurs in the dry season (November to April). Land use in the region (Figure 1b) is divided into 9 major categories; approximately 49% of the region is natural forestland (the main upstream area, which has a complex mountainous topography), grassland, and shrubland (distributed in the midland areas of Bac Kan and Thai Nguyen provinces). Moreover, agricultural land (36%) and built-up areas (8%) are located in the downstream region of the river, where the population is dense and economic development is dynamic. Notably, the irrigation canal system serving agricultural activities in the basin is quite complete. The Gia Bay hydrological station (Thai Nguyen province) measures the discharge in the center of the river basin.

Model Input Data
The spatial data required as inputs for the SWAT model include the digital elevation model (DEM), soil type map, and land use map. DEM data with a 30 m resolution were extracted from the United States Geological Survey (USGS). The land use data representing 2005 were collected from the Ministry of Natural Resources and Environment (MONRE) and were classified based on the SWAT format. Soil maps were obtained from the Food and Agriculture Organization of the United Nations (FAO), and the soil properties were determined based on the soil characteristic data of Vietnam.

NCEP-CFSR
CFSR is a reanalysis product developed by the US National Centers for Environmental Prediction (NCEP). The dataset was produced by NCEP for the period from 1979 to July 2014 with a resolution of 0.31° (~38 km) based on the assimilation of atmosphere-ocean-land surface-sea ice system data at a global, coupled scale [23,25,35]. For the CRB, the daily meteorological data of 15 grid points were downloaded from the official website (https://globalweather.tamu.edu/ (accessed on 31 March 2021)).

CMADS Data
The CMADS dataset was developed by the Agricultural University of China based on the integration of the Local Analysis and Prediction System/Space-Time Multiscale Analysis System (LAPS/STMAS), the Climate Prediction Center Morphing (CMORPH) global precipitation data, and China's National Meteorological Information Center. The data sources used for the CMADS series, available from 2008 to 2016 and covering all of East Asia (0°-65° N; 60°-160° E), were collected from nearly 40,000 regional automated stations belonging to 2421 national stations; this ensures that the CMADS data are widely available and increases the accuracy of the dataset [24,30,36]. In this paper, we use CMADS v1.1, with a spatial resolution of 0.25° and 24 available grid points (this product is available from the following website: http://www.cmads.org/ (accessed on 10 April 2021)).

GMS Data
GMS data from 13 rain gauges and four daily meteorological stations in the CRB were obtained from the Meteorological and Hydrological Administration of the Ministry of Natural Resources and Environment (MONRE). In the CRB, only the Gia Bay hydrological station in Thai Nguyen province (which has been recording data from 1997 to the present) is used in this study, as other stations have stopped working or have collected inadequate data. No information is available at the river basin outlet.
The GMP and GMS data necessary for the SWAT model include the daily maximum/minimum temperature and precipitation data collected and processed from 1 January 2008 to 31 December 2013 to ensure consistency in the evaluation and comparison of the performances of the input data. Detailed information about the sources and the spatial/temporal resolutions of the products is shown in Table 1.

Hydrological Modeling Method
The SWAT model is a semi-distributed hydrological model developed by the Agricultural Research Service of the United States Department of Agriculture [22]. The hydrologic cycle as simulated by SWAT is based on the water balance equation, which considers precipitation, irrigation, evapotranspiration, surface runoff, lateral flow, and percolation to shallow and/or deep aquifers:

$$SW_t = SW_0 + \sum_{i=1}^{t} \left( R_{day} - Q_{surf} - ET_a - P_i - QR_i \right) \tag{1}$$

where $SW_t$ is the final soil water content; $SW_0$ is the initial soil water content on day $i$; $t$ is the time (days); and $R_{day}$, $Q_{surf}$, $ET_a$, $P_i$, and $QR_i$ are the daily amounts of precipitation, surface runoff, evapotranspiration, percolation, and return flow on day $i$, respectively (all in mm H2O) [37]. SWAT has been widely used to simulate hydrological processes and to assess the impacts of land management methods on water and point-/non-point-source pollution in river basins and watersheds [21,37]. Inputs of SWAT model simulations include weather data (e.g., precipitation, maximum/minimum temperature), soil properties, topography, vegetation, and land management practices. The 2012 ArcSWAT version, an interface in ArcGIS 10.2, was used to build the hydrological model of the CRB. Installation guidelines and research papers related to the SWAT model are available at https://swat.tamu.edu/ (accessed on 5 April 2021), as well as in an online open-source document [38]. Using the 30 m DEM data and the river network, the watershed delineator in ArcSWAT creates 42 sub-basins in the CRB. These sub-basins are further subdivided into 405 hydrologic response units (HRUs) with different land use, topographic, and soil characteristics. The GMS, CFSR, and CMADS weather data are then provided as inputs for the flow simulations (hereinafter, the SWAT models driven by these inputs are referred to as GMS-SWAT, CFSR-SWAT, and CMADS-SWAT), and the results are verified against the hydrological observations.
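As a rough illustration of the water balance described above, the following minimal Python sketch accumulates the daily terms into a final soil water content. The class and function names (`DailyFluxes`, `final_soil_water`) and the sample values are hypothetical and chosen only for illustration; this is not SWAT source code, and SWAT itself computes each flux with far more detailed process routines.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DailyFluxes:
    """Daily water fluxes for one day i, all in mm H2O (illustrative container)."""
    precipitation: float       # R_day
    surface_runoff: float      # Q_surf
    evapotranspiration: float  # ET_a
    percolation: float         # P_i
    return_flow: float         # QR_i


def final_soil_water(sw0: float, fluxes: Iterable[DailyFluxes]) -> float:
    """Return SW_t = SW_0 + sum_i (R_day - Q_surf - ET_a - P_i - QR_i)."""
    sw = sw0
    for f in fluxes:
        sw += (f.precipitation - f.surface_runoff - f.evapotranspiration
               - f.percolation - f.return_flow)
    return sw


if __name__ == "__main__":
    # Two made-up days: one wet, one dry.
    days = [DailyFluxes(10.0, 2.0, 3.0, 1.0, 0.5),
            DailyFluxes(0.0, 0.0, 2.0, 0.5, 0.5)]
    print(final_soil_water(100.0, days))
```

The sketch treats every term as a given input; its only purpose is to make the sign convention explicit: precipitation adds to storage, and all other terms remove water from it.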

Evaluation Indices for Temperature and Precipitation
Over the CRB, only four meteorological stations record maximum/minimum temperature data, while the number of CMADS/CFSR grid points is relatively high. Therefore, validation of the temperature data was conducted only at the meteorological stations. Moreover, precipitation data were collected from 13 stations covering the period from 2008 to 2013 at the catchment scale. A point-to-pixel assessment was applied by selecting the closest GMP grid points as references for validation against the GMS data.
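The point-to-pixel matching described above can be sketched as a nearest-neighbour search over the grid-point coordinates. The example below is a minimal sketch that assumes great-circle (haversine) distance as the closeness measure, which the text does not specify; the function names and the coordinates are hypothetical and do not correspond to the actual station or grid locations.

```python
import math
from typing import List, Tuple

LatLon = Tuple[float, float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_grid_point(station: LatLon, grid_points: List[LatLon]) -> LatLon:
    """Return the grid point closest to the station (assumed matching rule)."""
    return min(grid_points,
               key=lambda g: haversine_km(station[0], station[1], g[0], g[1]))


if __name__ == "__main__":
    station = (21.60, 105.80)                                   # made-up station
    grid = [(21.50, 105.75), (21.75, 105.75), (21.50, 106.00)]  # made-up 0.25° grid
    print(nearest_grid_point(station, grid))
```

For a basin of this size, a simple Euclidean distance in degrees would usually select the same grid point; the haversine form is used here only to avoid depending on that approximation.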
To assess the quality of the GMPs in representing temperature and precipitation, the following indicators were used.
(i) Four basic statistical indicators: the correlation coefficient (CC), mean absolute error (MAE), root-mean-square error (RMSE), and percentage bias (PBIAS) [8,17,39]. These indicators are calculated as follows:

$$CC = \frac{\sum_{i=1}^{n} (G_i - \bar{G})(O_i - \bar{O})}{\sqrt{\sum_{i=1}^{n} (G_i - \bar{G})^2} \sqrt{\sum_{i=1}^{n} (O_i - \bar{O})^2}} \tag{2}$$

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| G_i - O_i \right| \tag{3}$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (G_i - O_i)^2} \tag{4}$$

$$PBIAS = \frac{\sum_{i=1}^{n} (G_i - O_i)}{\sum_{i=1}^{n} O_i} \times 100 \tag{5}$$

where $G_i$ and $O_i$ are the gridded and observed temperature (or precipitation), respectively; $\bar{G}$ and $\bar{O}$ are the average gridded and observed temperature (or precipitation), respectively; $i$ indexes the individual measurements; and $n$ is the number of measurements. A high correlation coefficient and low MAE, PBIAS, and RMSE values indicate reliable performance of a GMP relative to the GMS data [2,4,40].
(ii) Three statistical-categorical indicators were used to evaluate precipitation events: the probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI) [39]. Their formulas are as follows:

$$POD = \frac{a}{a + c} \tag{6}$$

$$FAR = \frac{b}{a + b} \tag{7}$$

$$CSI = \frac{a}{a + b + c} \tag{8}$$

where $a$ represents the correct detection of precipitation (in both CFSR/CMADS and GMS); $b$ represents a false alarm (precipitation is detected in the CFSR/CMADS products but is not observed in GMS); and $c$ is a miss (precipitation is observed at the rain gauges but not estimated by CFSR/CMADS). These values range from 0 to 1. A low FAR value and high POD and CSI values imply a more accurate detection [41,42].
(iii) To evaluate the effectiveness of GMPs in capturing extreme weather events, this study selected indicators proposed by the "Climate Change Detection and Indicator Experts Group" [43] and the "Circular on technical regulations and the process of dangerous hydro-meteorological forecasting of the MONRE, Vietnam" (2016).
Accordingly, the extreme events related to temperature include: (1) very cold events in which the average temperature is between 13 °C and 15 °C (Tav13-15 °C/day), (2) damaging cold events in which the average temperature during the day is below 13 °C (Tav < 13 °C/day), (3) strong sun events in which the maximum temperature is in the range of 37-39 °C (Tmx37-39 °C/day), and (4) scorching hot days in which the maximum temperature is recorded above 39 °C (Tmx > 39 °C/day). For rainfall, in addition to the indicators R10 mm/day and R50 mm/day, we propose to test a very heavy rainfall threshold of R100 mm/day to further assess the impact of this heavy rain layer on river flow.
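The indicators listed in (i) and (ii) can be sketched in plain Python as follows. This is a minimal illustration rather than the processing code used in the study: the function names are hypothetical, PBIAS is taken as the sum of (gridded minus observed) over the sum of observed values, and a day is counted as rainy when its total is at least the threshold (the inclusive boundary is an assumption).

```python
import math
from typing import Sequence, Tuple


def cc(g: Sequence[float], o: Sequence[float]) -> float:
    """Pearson correlation coefficient between gridded (g) and observed (o) series."""
    g_bar, o_bar = sum(g) / len(g), sum(o) / len(o)
    cov = sum((gi - g_bar) * (oi - o_bar) for gi, oi in zip(g, o))
    var_g = sum((gi - g_bar) ** 2 for gi in g)
    var_o = sum((oi - o_bar) ** 2 for oi in o)
    return cov / math.sqrt(var_g * var_o)


def mae(g: Sequence[float], o: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(gi - oi) for gi, oi in zip(g, o)) / len(g)


def rmse(g: Sequence[float], o: Sequence[float]) -> float:
    """Root-mean-square error."""
    return math.sqrt(sum((gi - oi) ** 2 for gi, oi in zip(g, o)) / len(g))


def pbias(g: Sequence[float], o: Sequence[float]) -> float:
    """Percentage bias; positive values mean the gridded product overestimates."""
    return 100.0 * sum(gi - oi for gi, oi in zip(g, o)) / sum(o)


def detection_scores(g: Sequence[float], o: Sequence[float],
                     threshold: float = 0.1) -> Tuple[float, float, float]:
    """Return (POD, FAR, CSI) for rain/no-rain detection at the given threshold."""
    a = b = c = 0  # hits, false alarms, misses
    for gi, oi in zip(g, o):
        g_rain, o_rain = gi >= threshold, oi >= threshold
        if g_rain and o_rain:
            a += 1
        elif g_rain:
            b += 1
        elif o_rain:
            c += 1
    pod = a / (a + c) if a + c else math.nan
    far = b / (a + b) if a + b else math.nan
    csi = a / (a + b + c) if a + b + c else math.nan
    return pod, far, csi


if __name__ == "__main__":
    gridded = [0.0, 5.0, 0.2, 12.0, 0.0]   # made-up daily rainfall, mm
    observed = [0.0, 4.0, 0.0, 10.0, 1.0]
    print(mae(gridded, observed), rmse(gridded, observed), pbias(gridded, observed))
    print(detection_scores(gridded, observed))
```

Days on which neither dataset reports rain do not enter any of the three categorical scores, which is why a product that reports light rain almost every day can reach a very high POD while still having a high FAR.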

Flow Indicators
The performance of the SWAT model in simulating the river flow in the CRB was evaluated using the NSE (Nash-Sutcliffe efficiency coefficient), R² (coefficient of determination), and PBIAS (percent bias) statistics [4,8,41]. These indicators (Equations (9)-(11)), which are rated according to the criteria of Moriasi et al. [44] as "very good", "good", "satisfactory", or "unsatisfactory" at the monthly scale (see Table 2), are calculated as follows:

$$NSE = 1 - \frac{\sum_{i=1}^{n} (O_i - S_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2} \tag{9}$$

$$R^2 = \frac{\left[ \sum_{i=1}^{n} (O_i - \bar{O})(S_i - \bar{S}) \right]^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2 \sum_{i=1}^{n} (S_i - \bar{S})^2} \tag{10}$$

$$PBIAS = \frac{\sum_{i=1}^{n} (S_i - O_i)}{\sum_{i=1}^{n} O_i} \times 100 \tag{11}$$

where $S_i$ and $O_i$ are the simulated and observed values at event $i$, respectively; $\bar{S}$ and $\bar{O}$ are the averages of the simulations and observations, respectively; and $n$ is the number of events. The R² value (ranging from 0 to 1) represents the strength of the linear relationship between the observed and simulated flows, with an ideal value of 1 [32]. The PBIAS value shows the tendency of the average simulated flow to be larger or smaller than the observed data (overestimated or underestimated) [4,13,41]. For example, a PBIAS value of −15% means that the simulated flow from CMADS/CFSR is 15% smaller than the observed flow, i.e., the data product tends to underestimate the flow. On the other hand, a PBIAS value of 200% means that the flow simulated with the GMP is overestimated, exceeding the observed flow by twice the observed amount. The NSE is a dimensionless statistic that expresses the relative magnitude of the residual variance compared with the variance of the observed data. The NSE value ranges from −∞ to 1, and the model prediction is more accurate the closer the NSE is to 1 [45].
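The three flow statistics can be computed from paired simulated and observed discharge series as in the minimal sketch below. The function names and sample values are hypothetical; PBIAS follows the sign convention described above, where negative values indicate that the simulation underestimates the observed flow.

```python
from typing import Sequence


def nse(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    obs_bar = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for s, o in zip(sim, obs))
    spread = sum((o - obs_bar) ** 2 for o in obs)
    return 1.0 - residual / spread


def r_squared(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Coefficient of determination (squared Pearson correlation)."""
    sim_bar, obs_bar = sum(sim) / len(sim), sum(obs) / len(obs)
    cov = sum((o - obs_bar) * (s - sim_bar) for s, o in zip(sim, obs))
    var_obs = sum((o - obs_bar) ** 2 for o in obs)
    var_sim = sum((s - sim_bar) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


def flow_pbias(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Percent bias of simulated flow; negative means underestimation."""
    return 100.0 * sum(s - o for s, o in zip(sim, obs)) / sum(obs)


if __name__ == "__main__":
    observed = [10.0, 20.0, 30.0, 40.0]    # made-up monthly flows, m3/s
    simulated = [12.0, 18.0, 33.0, 37.0]
    print(nse(simulated, observed),
          r_squared(simulated, observed),
          flow_pbias(simulated, observed))
```

Note that R² can remain high even when the simulation is systematically biased, which is why it is read together with NSE and PBIAS rather than on its own.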

Compare CFSR and CMADS Temperatures using GMS Data
Within the CRB, only 4 meteorological stations record temperature data, while the CMADS and CFSR grid points are much denser; thus, each station is compared with the nearest CFSR and CMADS grid points. The average temperatures and the CC, MAE, RMSE, and MBE values used to evaluate the accuracy of the maximum and minimum temperatures in the CMADS and CFSR datasets on daily and monthly scales are shown in Table 3. In general, Tmax and Tmin tend to increase gradually from north to south, i.e., from higher latitudes and hills (e.g., Bac Kan and Dinh Hoa stations) to lower latitudes (e.g., Bac Ninh station). Both the CFSR and CMADS temperature data show strong correlations with the GMS data collected at the observational stations. The average CC values obtained from CFSR are 0.92 for Tmax and 0.97 for Tmin, while those of CMADS are 0.96 and 0.97, respectively. The average MAE is 1.7 °C (ranging from 0.95 to 2.47 °C), and the average RMSE is 1.8 °C (varying from 1.27 to 2.85 °C); these values show that the GMP data are in good agreement with the temperatures observed at the stations. Figure 2 shows box plots of the maximum and minimum temperatures of CFSR, CMADS, and the ground station data in the CRB. The boxes show that the CFSR temperatures have the largest range. The mean values and interquartile ranges of temperature are more consistent in CMADS than in CFSR. Although the datasets are slightly different, the CFSR and CMADS temperature data can be used in place of GMS station data in hydrometeorological studies over the CRB.
The PBIAS values are negative at most stations (except for Tmax at Bac Ninh station), indicating that both the CFSR and CMADS data tend to underestimate the maximum and minimum temperatures compared with the GMS data. Specifically, at Bac Ninh station, the PBIAS value is much smaller than that at the other stations, showing that the CFSR and CMADS data have higher accuracies in low-terrain areas. Conversely, the increasing PBIAS values (especially with CFSR data) observed in areas with higher terrain may be related to temperature estimation errors caused by the effects of clouds in mountainous areas, and the minimum temperature typically causes high PBIAS values in winter.
[...] Table 4). The two products differ in that CMADS underestimated the actual precipitation, with a PBIAS value of −16.64%, while CFSR overestimated it, with a PBIAS of 99.2%. Accordingly, the MAE of CFSR is also much higher than that of CMADS (8.01 and 5.7 mm/day, respectively). As expected, the trend described above is also seen on the monthly scale. The CC values of the CFSR and CMADS data ranged from 0.82 to 0.84, showing a good correlation with the GMS data. Moreover, the MAE and RMSE values of CFSR are many times higher than those of CMADS. Errors at the daily scale were smoothed out by the aggregation to the monthly scale, bringing the CC values of the two products closer together; however, this does not explain the large difference observed in the evaluation trends between CFSR and CMADS. CFSR consistently overestimates precipitation across the basin, and the largest bias values were also reported for this dataset in the evaluations by Mou Tan et al. [4] and Roth Lemann et al. [46]. Generally, the CMADS precipitation data are slightly more accurate and agree better with the observations on the monthly scale than the CFSR data.

Statistical Evaluation of GMP Precipitation
As shown in Table 4, the seasonal statistical indicators obtained from the CFSR data show the largest mean errors, with very large MAE and RMSE values. At the same time, the PBIAS value of CMADS in the dry season is −40.9%, nearly five times the rainy season value of −8.5% in magnitude. This is related to the very low rainfall that occurs in the dry season; the lower rainfall value in the denominator of Equation (5) causes the magnitude of PBIAS to be higher. Because CMADS underestimates rainfall in the dry season, the difference between the two seasons in the CMADS data is much larger than that in the observed data. The shares of annual rainfall in the dry and rainy seasons were 11% and 89%, respectively, for CMADS, and 18% and 82%, respectively, for GMS.
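The seasonal shares quoted above can be obtained by splitting a daily rainfall series by month. The sketch below assumes the season definition used in this study (wet season from May to October, dry season from November to April); the function name and the sample records are hypothetical.

```python
from datetime import date
from typing import Iterable, Tuple

WET_MONTHS = {5, 6, 7, 8, 9, 10}  # May to October, as defined for the CRB


def seasonal_shares(daily: Iterable[Tuple[date, float]]) -> Tuple[float, float]:
    """Return (dry-season share, wet-season share) of total rainfall in percent."""
    wet = 0.0
    total = 0.0
    for day, mm in daily:
        total += mm
        if day.month in WET_MONTHS:
            wet += mm
    dry = total - wet
    return 100.0 * dry / total, 100.0 * wet / total


if __name__ == "__main__":
    # Made-up records: (date, rainfall in mm).
    records = [(date(2008, 1, 15), 18.0), (date(2008, 7, 15), 82.0)]
    print(seasonal_shares(records))
```

Applying the same split to the gridded and gauge series makes the seasonal imbalance directly comparable, independent of the overall bias of each product.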

Comparison in the Spatial Scale (Pixels)
The pixel-scale distributions of the monthly CC, MAE, and PBIAS values over the CRB are shown in Figure 3. Compared with the CC values obtained on the daily scale (ranging from 0.2 to 0.6), the values obtained on the monthly scale were significantly improved, with CC values of 0.82 for CFSR and 0.84 for CMADS (Figure 3a-d).
The spatial distribution of MAE shows the difference between CFSR and CMADS in the basin more clearly than the other indicators. As shown in Figure 3e, the largest MAE values for CFSR were found in the north and northwest (at altitudes ranging from 160 m to 300 m); the values then tended to decrease gradually to the south. Smaller MAE values were found at the western edge of the study region, such as at Tam Dao and Diem Mac stations; these are the locations with the highest rainfall rates, and as such their recorded data are closer to the CFSR data. In the south of the study area, such as at Phuc Loc Phuong station, the MAE values may be related to the decline in rainfall estimated by CFSR in the delta. In contrast, the CMADS-derived rainfall data had their highest MAE values in the western part of the river basin (ranging from 60 to 100 mm/month) (Figure 3f). Overall, the CMADS precipitation deviated less from the GMS data, with an average MAE of 58.44 mm (ranging from 40.58 to 93.26 mm), because the CMADS rainfall estimates agree better with the observed data on the daily scale.
The PBIAS values obtained at the pixel scale in the basin, as shown in Figure 3g,h, provide more insight into the spatial variation in the GMPs. The CFSR rainfall data were overestimated over most of the basin, with prevailing values between 60% and 150%. The PBIAS value of −2.3% obtained at Tam Dao station (at an elevation of ~900 m) shows that the CFSR precipitation data perform best in the high mountainous terrain of the CRB (Figure 3g). In contrast, the CMADS rainfall data tend to underestimate rainfall at the catchment scale, with an average PBIAS of −16%, but the bias varies spatially: rainfall is underestimated near the western edge of the study region (ranging from −23% to −38%), while it is slightly overestimated in the south of the region (~2% at Da Phuc and Phuc Loc Phuong stations) relative to the rain gauge data.
The analysis of the statistical indicators in the Cau River basin revealed a contrast between the CMADS and CFSR rainfall data at the pixel scale, especially in the west. First, the amplification of precipitation in sheltered, mountainous terrain significantly increased the precipitation recorded at the GMS stations, so the statistics were more balanced for the CMADS data than for the CFSR data, for which the infrared sensors misjudged the effects of moist clouds on the mountaintops [7]. Second, for medium-sized river basins such as the CRB, the CC values obtained on the pixel scale at the daily and monthly temporal scales do not represent the characteristics of or variability in the gridded precipitation products. Moreover, the MAE and PBIAS values show the same trend, but these trends vary across the spatial domain according to terrain factors rather than latitudinal factors. This also shows that local knowledge and information are very useful in hydro-meteorological research to avoid misinterpretations arising from the characteristics of the utilized GMPs.

Evaluate the Precipitation Event Detection Accuracy
Using the value of 0.1 mm/day as the threshold for detecting rain/no rain [8,47], the POD, FAR, and CSI values were used to investigate the ability of the GMPs to detect rain. For the CFSR data, the mean POD value of 0.98 (ranging from 0.94 to 0.99) shows that the data capture nearly all daily precipitation events. At the same time, the average FAR value of 0.72 (ranging from 0.56 to 0.74) indicates that fewer than 30% of the rain events reported by CFSR actually occurred. In contrast, the CMADS data were more balanced, with POD and FAR values of 0.61 and 0.2, respectively, consistent with a successful rain detection rate of 43%. Overall, the CMADS precipitation data are more accurate when estimating precipitation events than the CFSR data, while the CFSR data excel at detection but require reliable validation with rain gauges. Figure 4 plots the occurrence frequencies and the contribution rates of the rainfall thresholds to the cumulative daily-scale rainfall from the CFSR, CMADS, and GMS data. Visually, the frequencies of the rain layers obtained from CMADS and GMS are similar. Specifically, the highest rate is observed at the threshold of 0.1 mm, accounting for over 66% (Figure 4a), indicating that the CMADS data may be suitable for detecting rain/no rain events. At the other rainfall thresholds, the differences between CMADS and GMS are small and remain the smallest among the products. In contrast, CFSR departs clearly from the observations at the 0.1 mm rainfall threshold, which is associated with its excessively high POD and FAR values and leads to the misclassification of rain/no rain events. Compared with CMADS and GMS, CFSR tends to overestimate the light rain (1-20 mm), 0.1-1 mm, and moderate rain (20-50 mm) layers, at rates of 49.46%, 15.65%, and 13.62%, respectively.
Despite their high frequency of occurrence, events below the 0.1 mm threshold contribute little to the total cumulative precipitation because their rainfall amounts are insignificant (Figure 4b). The light rainfall (1-20 mm) and moderate rainfall (20-50 mm) layers of the CMADS and CFSR data are the most important contributors to the rainfall totals, accounting for approximately 68% to 71% of the total rainfall. In both GMPs, the contributions of these rain layers are higher than those in the GMS data because of their high frequency of occurrence. A different pattern is found for heavy rainfall (>50 mm), for which CMADS, and particularly CFSR, tend to underestimate the actual rainfall. This precipitation layer has a small occurrence frequency (2.24%), but heavy rains contribute the greatest amount to the proportion of rain; approximately 37% of the rainfall over the CRB is related to typical summer showers as well as tropical storms brought to the area.
Thus, the analysis of the differences between the rain layers of the GMPs and the GMS data shows that the CMADS data have frequencies similar to those of the rain gauge data at all thresholds. CFSR, in contrast, is at a disadvantage in capturing rain/no rain events and overestimates the occurrence of light rain (1-20 mm) and moderate rain (20-50 mm). At the same time, local calibration with on-site observations (rain gauge data or terrestrial radar) is needed to improve the performance of the GMPs for heavy rainfall (>50 mm), as heavy rains account for the largest share of precipitation in the basin.
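The two quantities compared in Figure 4, occurrence frequency and contribution to total rainfall per rain layer, can be sketched as follows. This is an illustrative example only: the class edges follow the layers named in the text, the handling of values that fall exactly on an edge is an assumption, and the function name `rain_layer_stats` and the sample series are hypothetical.

```python
import numpy as np

# Class edges in mm/day, following the rain layers named in the text.
EDGES = [0.1, 1.0, 20.0, 50.0]
LABELS = ["<0.1", "0.1-1", "1-20", "20-50", ">50"]

def rain_layer_stats(daily_mm):
    """Return {label: (frequency %, contribution %)} for each rain layer.

    Assumption: a value equal to an edge belongs to the upper class,
    so exactly 50 mm is counted in the '>50' layer.
    """
    daily = np.asarray(daily_mm, dtype=float)
    idx = np.digitize(daily, EDGES)  # 0 for <0.1, ..., 4 for >=50
    total = daily.sum()
    stats = {}
    for k, label in enumerate(LABELS):
        in_class = idx == k
        freq = 100.0 * in_class.sum() / daily.size
        contrib = 100.0 * daily[in_class].sum() / total if total > 0 else 0.0
        stats[label] = (float(freq), float(contrib))
    return stats

# Illustrative 10-day series (mm/day), not data from the study.
series = [0.0, 0.0, 0.5, 4.5, 15.0, 30.0, 0.0, 50.0, 0.0, 0.0]
stats = rain_layer_stats(series)
for label in LABELS:
    print(label, stats[label])
```

In this toy series the heaviest layer occurs on only one day in ten yet supplies half of the total rainfall, mirroring the contrast between frequency and contribution described for heavy rain over the CRB.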

Evaluate the Ability to Capture Extreme Weather Events
In this section, extreme weather events in the CRB, covering both temperature and precipitation, are examined using the aggregated statistical results of the GMP data at meteorological stations at a daily scale over the period from 2008 to 2013. Specifically, the extreme temperature events studied in the CRB included: (1) very cold events (Tav13-15 °C/day); (2) damaging cold events (Tav13 °C/day); (3) strong sun events (Tmx37-39 °C/day); and (4) scorching hot events (Tmx39 °C/day). The specific results are listed in Table 5. For rainfall, the indicators R10 mm/day, R50 mm/day, and R100 mm/day were selected in this study.
Table 5. Statistics on the total number of cold and damaging days at meteorological stations in the Cau River basin in the period 2008-2013.
Stations: Bac Kan, Dinh Hoa, Thai Nguyen, Bac Ninh
GMS: 153 108 147 108 114 91 112 94
CMADS: 125 108 114 103 96 94 96 89
The ability of the datasets to capture extreme rain layers was assessed using the average value of the corresponding points/stations in the period 2008-2013. The CFSR rainfall data contained 560 days of R10 mm, which is higher than that of the GMS data (216 days) and the CMADS data (190 days) aggregated over the stations. However, for R50 mm events, the GMS and CFSR precipitation data show 40 and 44 days, respectively, both of which are significantly more than the 29 days observed in the CMADS data. The total R100 days observed in the GMS precipitation data was 9 days, while the totals were 5 days for the CFSR and CMADS data. Thus, the CMADS rain data have a much lower detection rate than the other datasets for the heavy rain layer (>50 mm/day), while both CMADS and CFSR have much lower detection probabilities than the GMS for extreme rain events (>100 mm/day).
Regarding Tav13-15 °C/day and Tav13 °C/day, these data are not available for CFSR, so only CMADS is compared with GMS. The statistics listed in Table 5 show that both datasets follow a similar pattern, with the number of days decreasing from the higher-latitude and hilly areas (Bac Kan and Dinh Hoa stations) to the regions with lower latitude and flat terrain (Thai Nguyen and Bac Ninh stations). The CMADS data slightly underestimate the number of cold days compared to the GMS data (except for Tav13-15 °C at Thai Nguyen station), with the difference being over 11%. Notably, the total number of days with temperatures below 15 °C in the CMADS data is quite large (17.2 days/year), showing the strong influence of the northeast monsoon on the CRB. The arrival of cold air waves not only lowers the temperature of the area (from December to February, the average temperature is below 20 °C) but also drastically decreases the humidity and precipitation during this period. This satisfactorily explains the surge observed in the PBIAS values in the dry season, as well as the difference in the rainfall contribution between the two seasons in the CMADS data.
The statistics compiled from the temperature stations also show that the CFSR and CMADS data record more Tmx37-39 °C/day and Tmx39 °C/day events than the GMS data. The largest difference was observed in the CFSR temperature data, which show the highest peak values compared to the GMS and CMADS data; this is consistent with the assessment results for temperature obtained from the box plots (see Figure 2). This discrepancy may be related to how the GMP datasets represent the underlying surface. For example, at Dinh Hoa station, the number of days with Tmx37-39 °C and Tmx39 °C is very high, possibly because the CFSR data cover a wide area of low mountains during the windless part of summer. Moreover, Bac Ninh station is located in the delta area, with many industrial and construction activities, so its maximum temperatures may be incorrectly calculated for the underlying surface.
The distributions of the maximum temperatures recorded over the studied years, from 2008 to 2013 (see Figure 5), also show that the CFSR data stand out in the number of days on which hot weather occurs (Tmx > 37 °C). Notably, an unusual increase in hot days was found in all three datasets in 2010 (except in the GMS data at Tmx39 °C/day). According to the information we collected, from May to July of 2010, northern Vietnam experienced its longest heatwave in 27 years, with many extremely hot days. This shows that, although the GMPs share similar error characteristics, once they are "calibrated with the observed ground temperature" they can provide an additional viable alternative for predicting and capturing extreme events in the CRB at various temporal and spatial scales.
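The day counts used in this section (for example, R10 mm/day or the very cold class) amount to counting days whose value falls within a given range. A minimal sketch is given below, assuming that lower bounds are inclusive and upper bounds exclusive, which the text does not specify; the helper `count_days` and the sample values are hypothetical.

```python
import numpy as np

def count_days(values, lower=None, upper=None):
    """Count days with lower <= value < upper; either bound may be omitted.

    Assumption: inclusive lower bound and exclusive upper bound.
    """
    v = np.asarray(values, dtype=float)
    mask = np.ones(v.shape, dtype=bool)
    if lower is not None:
        mask &= v >= lower
    if upper is not None:
        mask &= v < upper
    return int(mask.sum())

# Illustrative daily rainfall (mm/day), not data from the study.
rain = [0.0, 12.0, 55.0, 101.0, 9.9, 50.0]
r10 = count_days(rain, lower=10)
r50 = count_days(rain, lower=50)
r100 = count_days(rain, lower=100)

# Illustrative daily mean temperature (degrees C).
tav = [12.5, 13.0, 14.9, 15.0, 20.1]
very_cold = count_days(tav, lower=13, upper=15)  # Tav13-15 class
damaging_cold = count_days(tav, upper=13)        # Tav13 class

print(r10, r50, r100, very_cold, damaging_cold)
```

Applying the same function to each dataset at each station, and summing over the study period, would give counts comparable in form to those reported above.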

Calibration and Sensitivity Analysis of Parameters
The parameters used in the sensitivity analysis and calibration were partly inherited from previous studies on the CRB [33,34]. The SWAT-CUP software and the SUFI-2 algorithm were used to calibrate the parameters in the CFSR, CMADS, and GMS simulations [22]. Calibrations were performed both in SWAT-CUP and manually, with 1000 model runs performed for each iteration to obtain optimal parameter values. At the same time, due to the uncertainty of the meteorological data (especially the precipitation data), the parameter values and sensitivities may vary between models [22]. Therefore, in this study, parameterization was performed in the SWAT model on the CRB separately for the CMADS, CFSR, and GMS data, and the results are listed in Table 6. In all simulations, the CFSR, CMADS, and GMS data from 2008 were used to warm up the model, the calibration period was 2009-2011, and the validation period was 2012-2013. The calibration and validation of each simulation were conducted independently at daily and monthly time steps, and the simulated flows were then compared with the measured flow results from the Gia Bay hydrological station (Thai Nguyen).
Notes to Table 6: r: the parameter value is multiplied by (1 + given value); v: the parameter is replaced by the given value.
Table 7 summarizes the statistical indicators (R², NSE, and PBIAS) used for the SWAT simulations based on data from GMS, CFSR, and CMADS for the 2009-2013 period. Overall, the SWAT model based on the GMS data performs best during the calibration and validation periods at both daily and monthly scales. The simulated flow reproduced by the GMS data at Gia Bay station is "good", with NSE > 0.79 and R² > 0.68. The simulations performed using the CMADS-driven model tended to underestimate the observed flow, with PBIAS values varying from −16.19 to −19.35%, but with R² > 0.76 and NSE > 0.78; thus, this dataset was also identified as "satisfactory" on a monthly scale. Finally, the CFSR data led to a substantial overestimation of the observed flows during the simulation period (as indicated by a high PBIAS value of >41.81%), although with a tendency to capture peak flows (Figures 6 and 7). Generally, the CFSR-driven model is not suitable for flow simulations over the CRB, with R² and NSE values that are "unsatisfactory" based on the given criteria [44].
In the CRB, rainfall is the major source of streamflow, so the accuracy of these data greatly affects the runoff simulation results. The above results showed that the rainfall data obtained from GMS and CMADS agree better than those from CFSR and GMS; thus, the flow simulations performed by the GMS- and CMADS-driven models also showed better performance, with high R² and NSE values and absolute PBIAS values smaller than that of the CFSR-driven simulation. Research by Mou Tan et al. showed that integrating temperature data from CFSR with the precipitation data of the other GMPs did not cause any difference compared to conventional simulations [34]. At the same time, the published results of Mou Tan et al. [4], D. N. Khoi et al. [48], Roth et al. [46], and Bressiani et al. [31] showed that the CFSR data had much lower, or even unacceptable, performances compared to other GMP products, mainly because these data overestimate actual precipitation values.
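For reference, the three indicators used to judge the simulations can be sketched as below. This is an illustrative implementation, not the code used in the study. The PBIAS sign convention is an assumption chosen to match the usage in this paper, where positive values accompany overestimation; some SWAT literature defines PBIAS with the opposite sign. The function names and sample flows are hypothetical.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    r = np.corrcoef(np.asarray(obs, dtype=float), np.asarray(sim, dtype=float))[0, 1]
    return float(r ** 2)

def pbias(obs, sim):
    """Percentage bias; assumed convention: positive means overestimation."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(100.0 * np.sum(sim - obs) / np.sum(obs))

# Illustrative monthly flows (m3/s), not data from the study.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [12.0, 18.0, 33.0, 45.0]

nse_value = nse(observed, simulated)
r2_value = r_squared(observed, simulated)
pbias_value = pbias(observed, simulated)
print(nse_value, r2_value, pbias_value)
```

Because R² measures only linear association, a simulation can score a high R² while still being strongly biased, which is why it is read together with NSE and PBIAS.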

Flow Simulation in the CRB
It is possible that the CFSR precipitation data are better suited to, and have more uniform and denser grid points in, temperate and subtropical climates, such as those of the United States and China, as reported by Fuka et al. [23], Meng et al. [30], and Lu et al. [2], while these data have not been used extensively in Southeast Asia or Vietnam. It is quite difficult for estimated products such as CFSR to accurately capture climatic conditions in areas with very complex climates such as northern Vietnam (a tropical monsoon climate with cold winters). Furthermore, differences in catchment areas and topographies (including elevations and directions of ridges) also lead to changes in the model algorithm, interpolation, and parameters [49,50]. The CMADS dataset integrates CMORPH data with data collected from automatic measuring stations in the region for reverse interpolation, which increases its accuracy and allows it to be widely used in Chinese territories [2,3,30,51]. Compared to published studies, we find that the performance of these data needs to be confirmed in areas within the region of coverage. In general, the analytical results show that the CMADS-driven model will perform well if the input data are confirmed against observation gauges.

Conclusions
This study is the first to consider the role of temperature in extreme events based on regulations in Vietnam. The temperature verification in the CRB shows that CFSR and CMADS can stand in for ground temperature measurement stations in meteorological and hydrological studies. Temperature events such as very cold, damaging cold, strong sun, and scorching hot events affect the rainfall distribution and the inputs to the flow simulations. Moreover, the proposed study of the R100 mm/day rain layer is suitable for humid climate conditions in the tropics, such as the climatic conditions in the study area, and can be reliably used in other basins with similar conditions. The usefulness and suitability of the climate reanalysis products were evaluated in this study. Both the CMADS and CFSR temperature datasets performed well in comparison to the GMS data. Therefore, the CMADS and CFSR temperature data can be reliably used in areas with few observation stations. The verification of rainfall in the GMPs, as well as the flow simulation results of the SWAT model on the CRB, shows that the CMADS data yield more suitable results than the CFSR data; moreover, it is recommended that the CFSR data be evaluated before they are applied in other hydrological research under similar conditions. The advantages and disadvantages of the CFSR and CMADS data suggest that local knowledge/information is also very useful in hydro-meteorological research to avoid misinterpreting gridded climate products.
In our opinion, the results obtained in this study should not be regarded as a general conclusion but rather as an attempt to explore the potential of reanalysis data, whose performance is as yet unproven because the observational data are limited, short in duration, and heterogeneous. Our tentative studies will be further expanded with other gridded climate products already recognized in Vietnam, and the spatial variations in the water balance components and the effects of climate change on flow changes in the CRB will be calculated.