Historical Trends in Mean and Extreme Runoff and Streamflow Based on Observations and Climate Models

To understand changes in global mean and extreme streamflow volumes over recent decades, we statistically analyzed runoff and streamflow simulated by the WBM-plus hydrological model using either observational-based meteorological inputs from WATCH Forcing Data (WFD), or bias-corrected inputs from five global climate models (GCMs) provided by the Inter-Sectoral Impact Model Intercomparison Project (ISI-MIP). Results show that the bias-corrected GCM inputs yield very good agreement with the observation-based inputs in average magnitude of runoff and streamflow. On global average, the observation-based simulated mean runoff and streamflow both decreased about 1.3% from 1971 to 2001. However, GCM-based simulations yield increasing trends over that period, with an inter-model global average of 1% for mean runoff and 0.9% for mean streamflow. In the GCM-based simulations, relative changes in extreme runoff and extreme streamflow (annual maximum daily values and annual-maximum seven-day streamflow) are slightly greater than those of mean runoff and streamflow, in terms of global and continental averages. Observation-based simulations show increasing trend in mean runoff and streamflow for about one-half of the land areas and decreasing trend for the other half. However, mean and extreme runoff and streamflow based on the GCMs show increasing trend for approximately two-thirds of the global land area and decreasing trend for the other one-third. Further work is needed to understand why GCM simulations appear to indicate trends in streamflow that are more positive than those suggested by climate observations, even where, as in ISI-MIP, bias correction has been applied so that their streamflow climatology is realistic.


Introduction
Anthropogenic changes in global climate and alteration of Earth's hydrological cycle [1][2][3] have resulted in increased heavy precipitation with consequent increased surface runoff and flooding risk [4,5], which is likely to worsen in the future [6]. Climate change is expected to change the pattern, frequency and intensity of precipitation and result in increased intensity and frequency of floods and droughts, with damaging effects on environment and society [5][6][7][8][9][10][11][12].
As a result of greenhouse gas (GHG) build-up in the atmosphere, global mean near-surface temperature shows an increasing trend since the beginning of the 20th century [11,13,14]. The Fifth Assessment Report of Inter-Governmental Panel on Climate Change (IPCC) indicates that global land and ocean near-surface air temperature has increased by approximately 0.78˝C, over the 20th century, with greater trend slope in recent decades [15]. As a result of global warming, global climate models (GCMs) and satellite observations both indicate that atmospheric water vapor content has increased relative humidity, a hydrological model can generate corresponding streamflow and/or runoff outputs. Utilizing a global hydrological model, particularly if it can be validated against a large number of past streamflow observations, provides us with a robust tool to study the historical and future changes in streamflow, and consequently fresh-water resources, based on either observed or modeled climate changes.
In the present study, runoff and streamflow generated by the WBM-plus global hydrological model [52], which will simply be called WBM in the remainder of the paper, based on the observation-based meteorological inputs from WATCH Forcing Data (WFD) [53] are statistically analyzed at global and continental scales, for the time period of 1971-2001. Runoff and streamflow generated by the WBM model based on bias-corrected outputs of GCMs, provided by the ISI-MIP, are also analyzed to compare the modeled and observational meteorology based trends in runoff and streamflow for the time period of 1971-2001, at global and continental scales. The WFD datasets are used in ISI-MIP project for bias correction of GCMs [41] and provide a fair basis for comparison of GCMs with observation-based simulations.
The analyses presented in this paper are therefore as follows. The WBM model performance in simulation of streamflow is first evaluated based on observation-based forcings and observational streamflow data for the coterminous United States (US). Annual-mean daily runoff and streamflow for the WFD and ISI-MIP data are then analyzed to study the changes in mean flow based on GCMs versus observations. Annual-maximum daily runoff and streamflow are also analyzed with the GCM outputs to study the changes in extreme flows and in comparison with changes in mean flow. Trends in annual-maximum seven-day streamflow are additionally analyzed to study the changes in lengthier extreme flows in comparison with shorter one-day extremes.

Materials and Methods
We study changes in mean and extreme runoff and streamflow. Mean runoff is defined as the annual-mean daily runoff, which is the annual average of daily runoff values. Extreme runoff is defined as the annual-maximum daily runoff, in which the maximum daily runoff is selected for each year. Similar to runoff, mean streamflow is defined as the annual-mean daily streamflow, in which the daily flow rates are averaged for each year. Extreme streamflow is defined as the annual-maximum daily streamflow, in which the maximum daily flow rate value is selected for each year. Annual-maximum 7-day streamflow is also calculated, for which average flow rate is calculated for each consecutive 7-day period (moving average) and then the maximum value is selected for each year.
ISI-MIP presents bias-corrected daily meteorological fields, at a uniform 0.5 degree spatial resolution, from 5 selected GCMs from the fifth phase of the Coupled Model Intercomparison Project (CMIP5), which provides the opportunity to investigate the impacts of climate change from a range of GCMs on water resource system design after bias correction [6]. The first fast-track phase of the ISI-MIP project presents outputs from the following 5 GCMs: GFDL-ESM2M, HadGEM2-ES, IPSL-CM5A-LR, MIROC-ESM-CHEM and NorESM1-M [45].
One of the main obstacles in study of streamflow is that streamflow data is not publicly available for most land areas [51]. Runoff and streamflow trends in global scale can be studied through the analysis of the outputs of global hydrological models. ISI-MIP also provides runoff and streamflow simulations, which have been simulated by various global hydrological models based on bias-corrected outputs of the 5 CMIP5 global climate models. For the purposes of the present study, the runoff and streamflow simulations of the WBM hydrological model based on bias-corrected GCM inputs are obtained from the ISI-MIP.
ISI-MIP project uses the WATCH Forcing Data (WFD) [53] observation-based dataset for bias correction. Hence, for sake of fair comparison, runoff and streamflow generated by the WBM model based on WFD observation-based inputs are studied here as the observational-based runoff and streamflow data sets. The Water Balance Model (WBM), maintained at the City College of the City University of New York, is a macro-scale hydrological model that simulates water exchange between the land surface and the atmosphere as well as water transport along a prescribed river network. The model runs on spatial resolutions as high as of 0.1 degree and daily temporal resolution. The model can account for human activities such as irrigation water abstraction and reservoir operation [52]. The runoff and streamflow data simulated by the WBM model using WFD forcing are also at 0.5 degree spatial resolution and comparable with those of the ISI-MIP project.
Tests for the determination of significant trends in climatologic time series can be classified as parametric or non-parametric. Parametric trend tests are derived on the assumption of independence and normal distribution in the data, while non-parametric trend tests require only that the data be independent. The trend slope (b) obtained from the linear regression method, which assumes that the data variability follows a normal distribution, is utilized for parametric trend strength analysis and comparison of the datasets. The relative change in runoff (R) and streamflow (Q) is defined as the trend slope divided by the average value of the grid cell (b/R and b/Q, respectively). The Z-score (Z) obtained from the Mann-Kendall test (Appendix A) [54,55] and Qmed from the Sen's slope estimator (Appendix B) [56] are also applied in order to confirm the results of linear regression using non-parametric trend detection approaches. It should be noted that Qmed denotes an index calculated in Sen's slope estimator method (Appendix B) for any time series (such as runoff and streamflow), and is not related to the connotation of Q as discharge or streamflow.
Dry grid cells with average discharge of less than 0.01 mm/day [57] for the period of 1971-2001 based on WFD observation-based data, as well as Greenland and Antarctica, are excluded from the calculations. The remaining grid cells cover about 76% of global land area. The trend tests are applied for each grid cell's runoff and streamflow time series. The obtained values are also averaged globally as well as by continent in order to present the general trend of runoff and streamflow in different regions. The averaging is weighted by grid cell area, meaning that the larger cells in tropics have higher impact on the average than the smaller cells in high latitudes. The continents studied comprise Africa, Asia, Europe, North America, South America and Oceania. The subcontinent of India has results shown separately and is also included in Asia.

Hydrological Model (WBM) Performance Evaluation
The performance of the WBM model in simulation of river discharge based on the WFD observation-based meteorological inputs is evaluated. The Coterminous US, which is generally considered to have more complete and reliable observational meteorological data, compared to most other regions, is chosen as the evaluation area. Limiting the performance evaluation of the hydrological model to a region with more reliable data will help in constraining the accuracy of the results mostly to the performance of the model rather than inaccuracy of the input data. Figure 1 depicts the maps of average streamflow and its trend from 1971 to 2001 for the US region, simulated by WBM and compared to USGS station observations. Due to high variability of the streamflow values at different locations, the streamflow magnitude maps are plotted in logarithmic scale. Figure 1c illustrates that the eastern and north-western parts of the US have had a decreasing trend in streamflow for 1971-2001, while the central parts experienced increases in flow. Figure 2 shows the magnitude and change in streamflow of the United States from 1971 to 2001 in the model outputs as well as observations. It should be noted that the observational streamflow data are station-based, and show the discharge at the point of measurement, whereas the modeled data are grid-based and show the discharge rate per grid cell. Hence, the magnitude of discharge may not be comparable between observations and models. However, the relative change in the discharge, which is the ration of the slope of change in discharge and the mean magnitude of discharge (b/Q), has units of % per year and is more comparable. As seen in Figure 2, as well as Table 1, the boxplot of relative change in discharge (b/Q) for the model results show a good agreement with the observational results        Table 2 presents the global and continental averaged trends in runoff generated by WBM, based on observation-based inputs from WFD. Results show that globally, the runoff has decreased at a rate of 0.042% per year from 1971 to 2001. Linear regression indicates that 44.8% of the grid cells show positive trend in runoff, including 5.6% that is statistically significant at 95% confidence level. The remaining 55.2% of the grid cells show negative trend, including 7.9% that is statistically significant at 95% confidence level. This implies that the trend in 86.5% of the grid cells is not significant at 95% confidence level. Table 3 presents the global average of results based on the 5 GCMs from ISI-MIP as well as observation-based simulations, for annual-mean runoff and annual-maximum daily runoff. Table 3 indicates that GCMs show good agreement with observations for the simulated average magnitude of mean runoff, which is perhaps expected given the bias-correction procedure applied to the modeled data in ISI-MIP. However, the GCM-based average shows increasing trend in mean runoff between 1971 and 2001, unlike the decreasing trend in observations. Table 3 shows that maximum runoff has also increased in GCMs, with a faster rate compared to mean runoff.

Runoff and Streamflow Trends 1971-2001 (GCMs versus Observations)
Similar to the linear regression slope (b), Qmed from Sen's slope estimator test shows the direction and magnitude of the trend in a time series, but has the advantage of being a non-parametric method (Appendix B). The global average of Qmed is close to the average value of b obtained from the linear regression, as seen in the Tables 2 and 3 which further supports the analysis above of the average trend magnitudes with GCMs as opposed to observation-based inputs. Values of Z-score index obtained from the Mann-Kendall method shows the non-parametric confidence level of statistical significance in the identified trends in the data. Higher values of Z correspond to higher confidence in the detected trend (Appendix A). As seen in Table 2, the global average of Z for observation-based trends is very small, further confirming the weak negative global mean trend in observation-based historical runoff, while the ISI-MIP outputs give a positive average trend (Table 3). Table 2. Global and continental-averaged trend results of the WBM runoff simulations based on the WFD observation-based inputs, from 1971 to 2001. The Qmed and Z-score indices are obtained from the Sen's slope estimator and Mann-Kendall methods, respectively, for the runoff.   Table 4 presents the global and continental averaged trends in observation-based streamflow (i.e., generated by WBM model, using observation-based inputs from WFD). The discharge magnitude (Q) values are per grid cell. Since both WFD and ISI-MIP dataset have the same spatial resolutions, the numbers are comparable between those two. Results show that global mean streamflow has decreased at a rate of 0.041% per year from 1971 to 2001, which is very similar to the runoff trend. Similar to those obtained from runoff data analysis, linear regression indicates that 43.4% of the grid cells show positive trend in streamflow, including 5.3% that are statistically significant at 95% confidence level. The remaining 56.6% of the grid cells show negative trend, including 8.3% that are statistically significant at 95% confidence level. This implies that trend in 86.4% of the grid cells is not significant at 95% confidence level. According to Table 4, Europe has had average streamflow increase by 4.9% over 31 years. South America, Oceania, North America, Asia and Africa have experienced decreasing trends of 11.5%, 9.4%, 8.7%, 4.7% and 3.5%, respectively.  models, respectively. All the maps presented for the ISI-MIP models in this study represent the average of 5 models. The two maps show signs of general agreement between the results of the GCMs (Figure 3d) and observation-based simulations (Figure 3a) in terms of the spatial distribution of annual-average runoff. Figure 3 also shows the global maps of relative change in annual-average runoff from 1971 to 2001 simulated by the WBM model based on either the inputs from the 5 ISI-MIP models (Figure 3f) or WFD observation-based data (Figure 3c).  Table 4 presents the global and continental averaged trends in observation-based streamflow (i.e., generated by WBM model, using observation-based inputs from WFD). The discharge magnitude (Q) values are per grid cell. Since both WFD and ISI-MIP dataset have the same spatial resolutions, the numbers are comparable between those two. Results show that global mean streamflow has decreased at a rate of 0.041% per year from 1971 to 2001, which is very similar to the runoff trend. Similar to those obtained from runoff data analysis, linear regression indicates that 43.4% of the grid cells show positive trend in streamflow, including 5.3% that are statistically significant at 95% confidence level. The remaining 56.6% of the grid cells show negative trend, including 8.3% that are statistically significant at 95% confidence level. This implies that trend in 86.4% of the grid cells is not significant at 95% confidence level. According to Table 4, Europe has had average streamflow increase by 4.9% over 31 years. South America, Oceania, North America, Asia and Africa have experienced decreasing trends of 11.5%, 9.4%, 8.7%, 4.7% and 3.5%, respectively. Table 5 presents the global average of results with the 5 GCMs from ISI-MIP, as well as the global average of observation-based simulations, for annual-mean streamflow and annual-maximum 1-day streamflow. Table 5 shows that GCMs show fair agreement with observations for the simulated  Table 5 presents the global average of results with the 5 GCMs from ISI-MIP, as well as the global average of observation-based simulations, for annual-mean streamflow and annual-maximum 1-day streamflow. Table 5 shows that GCMs show fair agreement with observations for the simulated average magnitude of mean streamflow, as seen for average magnitude of mean runoff. However, GCMs show an average increasing trend of 0.028% per year for mean discharge, despite the decreasing trend seen for the observation-based simulations. Table 5 shows that based on the ISI-MIP output, maximum daily discharge has also increased, at a faster rate of 0.032% per year. GCMs on average show increasing trend in mean streamflow for 62.4% of the grid cells and decreasing trend for the other 37.6%. Similar to mean streamflow, extreme (annual maximum 1-day) streamflow shows increasing trend for 64.8% and decreasing trend for 35.2% of the grid cells. An assessment of the trend in annual-maximum 7-day streamflow also resulted in similar numbers (63.2% increase and 36.8% decrease). This finding is in line with the earlier study that shows increases in the 30-y return level of 5-d average peak flows (Q30) over approximately two-thirds of the global land area, in data provided by ISI-MIP from 1971 to 2000 [6].   (Figure 4f), as well as observation-based WFD data (Figure 4c). Figure 5 shows the boxplots of the results presented in Tables 3 and 5 for mean runoff and mean streamflow. The ISI-MIP models show the opposite direction of the global average trend in runoff and streamflow, compared to observation-based simulations, as stated before. However, the disagreement is largest for the continents of North America, South America and Asia, while the observation-based average trend in Europe is closes to the median of the models (Figure 5c,f).    Tables 3 and 5, for mean runoff and mean streamflow. The ISI-MIP models show the opposite direction of the global average trend in runoff and streamflow, compared to observation-based simulations, as stated before. However, the disagreement is largest for the continents of North America, South America and Asia, while the observation-based average trend in Europe is closes to the median of the models (Figure 5c,f).
Comparing Tables 3 and 5 reveals that value of relative change in streamflow for GCMs and observation-based simulations are very similar to those of runoff, which means that globally, changes in runoff are well translated to corresponding changes in streamflow. Comparing Figure 5c,f also shows such agreement in continental scale as well. Table 4 shows an average 0.30% per year decrease in observation-based streamflow simulations in Oceania. Figure 4c shows that Australia has very mixed trends in streamflow though, as western Australia shows a significant increase in discharge while the eastern part shows significant decrease. By contrast, GCM inputs yield increase in discharge throughout Australia (Figure 4f), although uncertainty of the models in trend direction is very high for Oceania compared to the other continents ( Figure 5f). As seen in Figure 4a,d, those regions have very low flow rate and hence are very sensitive, as small absolute changes in streamflow may result in high relative changes.
GCMs perform poorly in capturing the trends in regions with mixed changes in runoff and streamflow (and often low absolute runoff rates), for instance northern Africa and Australia (compare Figure 4c,f). Additionally, GCMs show higher rates of disagreement among each other in such regions, as illustrated by the wider range in the boxplots, compared to other continents (Figure 5c,f). Comparing Tables 3 and 5 reveals that value of relative change in streamflow for GCMs and observation-based simulations are very similar to those of runoff, which means that globally, changes in runoff are well translated to corresponding changes in streamflow. Comparing Figure 5c,f also shows such agreement in continental scale as well. Table 4 shows an average 0.30% per year decrease in observation-based streamflow simulations in Oceania. Figure 4c shows that Australia has very mixed trends in streamflow though, as western Australia shows a significant increase in discharge while the eastern part shows significant decrease. By contrast, GCM inputs yield increase in discharge throughout Australia (Figure 4f), although uncertainty of the models in trend direction is very high for Oceania compared to the other continents ( Figure 5f). As seen in Figure 4a,d, those regions have very low flow rate and hence are very sensitive, as small absolute changes in streamflow may result in high relative changes.
GCMs perform poorly in capturing the trends in regions with mixed changes in runoff and streamflow (and often low absolute runoff rates), for instance northern Africa and Australia (compare Figure 4c,f). Additionally, GCMs show higher rates of disagreement among each other in such regions, as illustrated by the wider range in the boxplots, compared to other continents (Figure 5c,f).  Figure 6 shows the maps and boxplots for extreme runoff and streamflow. Changes in extreme runoff and streamflow are very similar to those of mean runoff and streamflow, in terms of global and continental averages, although the maximum parameters show relatively faster rate of change compared to mean parameters (Tables 3 and 5). This is similar to what is seen for precipitation in many studies, where extreme (e.g., 1-day maximum) precipitation is increasing faster than mean precipitation. Figure 6c shows relative change in annual-maximum 7-day streamflow for the ISI-MIP models, which is very similar to those in maximum 1-day streamflow (extreme streamflow) and mean streamflow.  Figure 6 shows the maps and boxplots for extreme runoff and streamflow. Changes in extreme runoff and streamflow are very similar to those of mean runoff and streamflow, in terms of global and continental averages, although the maximum parameters show relatively faster rate of change compared to mean parameters (Tables 3 and 5). This is similar to what is seen for precipitation in many studies, where extreme (e.g., 1-day maximum) precipitation is increasing faster than mean precipitation. Figure 6c shows relative change in annual-maximum 7-day streamflow for the ISI-MIP models, which is very similar to those in maximum 1-day streamflow (extreme streamflow) and mean streamflow. Boxplots of trends simulated under climate forcing from ISI-MIP models (minimum, 25th percentile, median, 75th percentile and maximum of the 5 model runs) for relative change in annual-maximum 1-day runoff (%·year −1 ) (b), annual-maximum 1-day streamflow (%·year −1 ) (e) and annual-maximum 7-day streamflow (%·year −1 ) (c).

Conclusions
Runoff and streamflow simulated by the WBM hydrological model based on either observationbased meteorological inputs from WATCH Forcing Data (WFD) or the bias-corrected outputs of GCMs, provided by the ISI-MIP, have been statistically analyzed. Initial comparison of the WBM streamflow with station observation streamflow data over the coterminous US shows that WBM can predict relative change in streamflow with suitable accuracy.
On global average, the mean runoff simulated based on meteorological observation-based data has decreased from 1971 to 2001. GCMs show very good agreement with observation-based estimates in case of average magnitude of mean runoff, which is expected given the bias correction procedure applied on the GCM outputs. The GCMs on average show increasing trend in mean runoff between 1971 and 2001, with an inter-model global average of 1% over that 31-year period, despite the 1.3% decrease seen in observation-based simulations. The observation-based mean streamflow has decreased by an average of 1.3% from 1971 to 2001. The GCM average shows a global averaged increase of 0.9%. In fact, each of the 5 GCMs considered yields larger global average trend for mean runoff and streamflow, compared to observation-based estimates. Results show that values of relative change in mean streamflow for GCMs are very similar to those of runoff, on global and continental scale, which means that changes in runoff have been well translated to corresponding changes in streamflow. Boxplots of trends simulated under climate forcing from ISI-MIP models (minimum, 25th percentile, median, 75th percentile and maximum of the 5 model runs) for relative change in annual-maximum 1-day runoff (%¨year´1) (b), annual-maximum 1-day streamflow (%¨year´1) (e) and annual-maximum 7-day streamflow (%¨year´1) (c).

Conclusions
Runoff and streamflow simulated by the WBM hydrological model based on either observation-based meteorological inputs from WATCH Forcing Data (WFD) or the bias-corrected outputs of GCMs, provided by the ISI-MIP, have been statistically analyzed. Initial comparison of the WBM streamflow with station observation streamflow data over the coterminous US shows that WBM can predict relative change in streamflow with suitable accuracy.
On global average, the mean runoff simulated based on meteorological observation-based data has decreased from 1971 to 2001. GCMs show very good agreement with observation-based estimates in case of average magnitude of mean runoff, which is expected given the bias correction procedure applied on the GCM outputs. The GCMs on average show increasing trend in mean runoff between 1971 and 2001, with an inter-model global average of 1% over that 31-year period, despite the 1.3% decrease seen in observation-based simulations. The observation-based mean streamflow has decreased by an average of 1.3% from 1971 to 2001. The GCM average shows a global averaged increase of 0.9%. In fact, each of the 5 GCMs considered yields larger global average trend for mean runoff and streamflow, compared to observation-based estimates. Results show that values of relative change in mean streamflow for GCMs are very similar to those of runoff, on global and continental scale, which means that changes in runoff have been well translated to corresponding changes in streamflow.
Relative changes in extreme runoff (and streamflow) are very similar to those of mean runoff (and streamflow), in terms of global and continental averages. However, extreme runoff (and streamflow) show relatively faster rate of change compared to mean runoff (and streamflow). Relative change in annual-maximum 7-day streamflow for the GCMs is also very similar to those of annual-maximum 1-day streamflow (extreme streamflow) and of mean streamflow.
Observation-based estimates show increasing trend in mean runoff and streamflow for about one-half of the land areas and decreasing trend for the other half (Only a minority of these gridcell-level trends are statistically significant at 95% confidence level). However, on average, mean and extreme runoff and streamflow based on GCMs show increasing trend for approximately two-thirds of the global land area and decreasing trend for the other one-third.
Results show that despite the bias-correction procedure applied on the GCM outputs in ISI-MIP, streamflow simulated based on corrected GCM outputs shows different trends compared to observation-based simulations. These differences in streamflow trends simulated over recent decades, depending on whether observation-based or bias-corrected GCM-based climate forcing is used to drive a hydrological model, deserve further investigation to understand the main contributing factors and to refine predictions for changes in water resource availability and flood hazards over the coming decades.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Mann-Kendall Trend Test
The MK test is a non-parametric rank based test [54,55]. The Mann-Kendall test statistic S is calculated as: where n is the number of data points, x i and x j are the data values in time series i and j (j > i), respectively, and sgn(x j´xi ) is the sign function: sgn`x j´xi˘" The variance is computed using the equation: Var pSq " n pn´1q p2n`5q´ř m i"1 t i pt i´1 qp2t i`5 q 18 (A3) where n is the number of data points, m is the number of tied groups and t i is the number of ties of extent i. A tied group is a set of sample data having the same value. In cases where the sample size n > 10, the standard normal test statistic Z S is computed as: The sign of Z S indicates the trend in the data series, where positive values of Z S means increasing trend, while negative Z S values show decreasing trends. For the tests at a specific α significance level, if |Z S | ą Z 1´α{2 , the null hypothesis is rejected and the time series has a statistically significant trend. Z 1´α{2 is obtained from the standard normal distribution table, where at the 5% significance level (α = 0.05), trend is statistically significant if |Z S | ą 1.96 and at the 1% significance level (α = 0.01), trend is statistically significant if |Z S | ą 2.576.