Hydrologic Model Evaluation and Assessment of Projected Climate Change Impacts Using Bias-Corrected Stream Flows

Abstract: Hydrologic models driven by downscaled meteorologic data from general circulation models (GCM) should be evaluated using long-term simulations over a historical period. However, simulations driven by GCM data cannot be directly evaluated using observed flows, and the confidence in the results can be relatively low. The objectives of this paper were to bias correct simulated stream flows from calibrated hydrologic models for two basins in New Jersey, USA, and evaluate model performance in comparison to uncorrected simulations. Then, we used stream flow bias correction and flow duration curves (FDCs) to evaluate and assess simulations driven by statistically downscaled GCMs for the historical period and the future time slices 2041–2070 and 2071–2099. Bias correction of stream flow from simulations increased confidence in the performance of two previously calibrated hydrologic models. Results indicated there was no difference in projected FDCs for uncorrected and bias-corrected flows in one basin, while this was not the case in the second basin. This result provided greater confidence in projected stream flow changes in the former basin and implied more uncertainty in projected stream flows in the latter. Applications in water resources can use the methods described to evaluate the performance of GCM-driven simulations and assess the potential impacts of climate change with an appropriate level of confidence in the model results.


Introduction
Estimation and simulation of the flow regime and hydrologic indices require longer time periods, especially for annual time series, which can ensure natural climate variability is included in model calibration and evaluation [1,2]. Calibration of hydrologic models for long-term simulations is done with objective functions on observed daily mean flows, monthly mean flow, and/or peak flows across multiple time resolutions [3,4]. In a non-stationary context, there are limitations to applying calibrated models [5]. Hydrologic simulations for climate change impact analysis are driven by downscaled general circulation models (GCM) [6][7][8], and such simulations cannot be directly calibrated with observed flows since the inputs differ for historical simulations. Additionally, there are no future observed data; therefore, it is not possible to calibrate or evaluate the model performance of hydrologic simulations for potential future climates. The absence of such data creates a situation similar to the development of models for ungauged basins [9][10][11], where regionally applicable measures of stream flows are required for model calibration and evaluation. Some questions remain as to the level of confidence for applications of hydrologic models driven by GCMs to local scales [12]. A potential solution to this problem is to use flow duration curves (FDC) and bias correction of stream flows for the evaluation of hydrologic models driven by statistically downscaled meteorological data.
FDCs have been used as objective functions for model calibration [1,13,14] and have a wide range of uses in water resource applications because they are able to simply convey relatively complex information about the hydrology of a system to decision makers [11,15]. Westerberg et al. [1] suggested using FDCs for calibration when the range of observed flows is outside the range of model input data and that important aspects of the flow regime ("hydrological signatures") and simulated uncertainty have a more direct interpretation than other efficiency measures. A potential additional step to evaluate model performance involves bias correction of simulated stream flows with observed stream flows using FDCs. This is distinct from bias correction techniques that are required for both statistically and dynamically downscaled meteorological data from general circulation models (GCM) used to drive climate change impact assessment in hydrology [16,17], which is not discussed in this paper.
Daraio [18] used the precipitation-runoff modeling system (PRMS), a distributed hydrologic model [19] that has been widely used for climate change impact assessment [20][21][22], to simulate the hydrologic response of two watersheds on the Coastal Plain in New Jersey (NJ) under selected climate change scenarios. The region is expected to see increases in temperature of around 4 °C and a 6-10% increase in precipitation for high emission scenarios [23]. Model performance was deemed satisfactory, and simulations were able to provide some information on potential seasonal changes in flow and groundwater recharge. However, the model output was biased toward overestimation of low flows and underestimation of high flows. Given that this indicated a systematic bias in simulations, it was conjectured that bias correction of flow output using FDCs could be used to both more fully evaluate model performance and improve the simulation results. Additionally, it was recognized that a more thorough assessment of the models' abilities to simulate long-term flow regimes was required to increase confidence in the results of climate change projections of stream flows in these basins [24].
Stream flows are well represented by a lognormal distribution, and the FDC is equivalent to the empirical cumulative distribution function for flows, which makes it ideal for parametric bias correction techniques. Bias correction and other correction methods have been applied to hydrologic model output of flows for hydrologic forecasting with success [25][26][27][28]. Bias-corrected flows increased the accuracy of ensemble forecasts [29] and improved forecasts using a coupled GCM [30]. Bias correction of flows has the advantage of being applied after parameter estimation and could help address the problem of equifinality [31]. Bias correction of simulated flows from hydrologic models could be used to both more fully evaluate model performance and improve simulation results, in particular for applications to ungauged basins and for future projected flows in a single basin.
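As a minimal sketch of the relationship described above, an empirical FDC can be built by ranking flows from highest to lowest and assigning each an exceedance probability, which is one minus the empirical cumulative probability. The Weibull plotting position i/(n + 1), the function names, and the flow values are illustrative assumptions, not the procedure or data used in this study.

```python
def flow_duration_curve(flows):
    """Return (exceedance probabilities, flows ranked high to low).

    Assumption: Weibull plotting position p = rank / (n + 1).
    """
    ranked = sorted(flows, reverse=True)  # highest flow first
    n = len(ranked)
    p_exceed = [(i + 1) / (n + 1) for i in range(n)]
    return p_exceed, ranked


def flow_at_exceedance(flows, p):
    """Linearly interpolate the flow that is exceeded with probability p."""
    p_exceed, ranked = flow_duration_curve(flows)
    if p <= p_exceed[0]:
        return ranked[0]
    if p >= p_exceed[-1]:
        return ranked[-1]
    for k in range(1, len(ranked)):
        if p <= p_exceed[k]:
            w = (p - p_exceed[k - 1]) / (p_exceed[k] - p_exceed[k - 1])
            return ranked[k - 1] + w * (ranked[k] - ranked[k - 1])
    return ranked[-1]


# Hypothetical daily mean flows (arbitrary units), for illustration only.
daily_flows = [3.2, 1.1, 0.8, 5.6, 2.4, 0.9, 1.7, 4.1, 2.0]
p_exceed, ranked_flows = flow_duration_curve(daily_flows)
median_flow = flow_at_exceedance(daily_flows, 0.5)
```

High flows sit at low exceedance probabilities and low flows at high exceedance probabilities, which is the orientation used throughout the results.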
The FDC is a representation of the hydrologic regime that is a function of both climatic drivers and watershed properties. Assuming watershed properties are well represented, simulations using climatic drivers from statistically downscaled GCMs should capture the overall long-term hydrologic regime, represented by the FDC. It is hypothesized that bias correction of stream flows and hydrologic analysis using FDCs can be used to evaluate and assess confidence in hydrologic simulations driven by statistically downscaled GCMs. The objectives of this paper were to use FDCs (1) to bias correct simulated stream flows from calibrated PRMS models for two basins in NJ and evaluate model performance in comparison to uncorrected simulations, (2) to use bias correction and FDCs to evaluate the performance of rainfall-runoff simulations driven by statistically downscaled GCMs over the historical period 1956-2005, and (3) to assess potential changes to FDCs using uncorrected and bias-corrected stream flows from climate change projections of the future time slices 2041-2070 and 2071-2099.

Overview
Previously calibrated PRMS models [18] were used to simulate stream flows in two watersheds in New Jersey, USA, for the historical period (1956-2005) and for the future period 2041-2099. The simulated stream flows were used in the following manner. (1) Simulated stream flows driven by observed meteorological data from these models were bias corrected using FDCs developed from observed stream flows in each basin. (2) The PRMS simulations driven by statistically downscaled GCM meteorological data over the historical period 1956-2005 for both uncorrected and bias-corrected stream flows were evaluated using FDCs from observed and simulated stream flows. (3) Uncorrected and bias-corrected simulated stream flows driven by statistically downscaled GCM data for climate change projections of the future time slices 2041-2070 and 2071-2099 were assessed using FDCs.
Model performances for simulations driven by observed data for the historical period of water years 1956-2005 were assessed using several goodness-of-fit (GoF) measures and compared with GoF after bias correction of simulated stream flows. Meteorological data (bias corrected with constructed analogs, BCCA, maximum and minimum daily temperature and daily precipitation) from 15 different GCMs from the CMIP5 multi-model ensemble (Table 1) were used to drive calibrated PRMS models for the two watersheds. The GCM-driven stream flow simulations were evaluated and bias corrected for model error, and results from these simulations were used in the analyses described below.

Site
The Batsto and the upper Maurice basins are located on the coastal plain of NJ, which ranges in elevation from 119 m to sea level and has a surficial geology that consists of unconsolidated to semi-consolidated material. The Batsto watershed is located entirely within the Pinelands region, which extends from central NJ to just north of Delaware Bay (Figure 1), and the majority of the basin is within the Pinelands National Reserve. The area primarily consists of sandy, acidic soil that sits atop the Kirkwood-Cohansey Aquifer. Coarse sands within the Pinelands are porous in nature, which allows for rapid infiltration. The upper Maurice watershed is adjacent to this region and includes a small part of the western border of the Pinelands, and the soils in the upper Maurice basin are similar to those in Batsto. Soils in the Batsto basin are 85% sand, and soils in the Maurice basin are 78% sand. The Maurice River becomes an estuary downstream of Union Lake, in Millville, NJ, and the extent of the area of the upper Maurice begins upstream of the entrance to the lake. The Batsto watershed is dominated by forest (60%) and wetlands (25%), and the upper Maurice watershed is mixed urban (28%), forest (30%), and agriculture (22%) lands. These two watersheds are near enough to share a very similar climatic regime with important differences in land use and degree of urbanization.

Hydrologic Simulations and Bias Correction
Simulations for the historical period were run for water years 1956-2005 using observed meteorologic data and meteorologic data derived from 15 GCMs (Table 1). Simulations for the future time slices 2041-2070 and 2071-2099 were run using GCM-derived data.

Simulations Using Observed Data
The PRMS models were previously calibrated over the years 1989-1995 and validated over 1996-2003 by Daraio [18]. In this study, simulations were evaluated against observed flows for the historical period of water years 1956-2005 using several GoF measures (Table 2). Uncorrected simulated flows were bias corrected using quantile mapping based on the full record of flows. Bias correction was done using the flow duration curves from the entire period (1956-2005) instead of using part of the period for calibration and part for evaluation. This choice was made because a significant part of any bias is random, and a long time period is recommended for calibration of the bias correction [17].
Lognormal distributions were fit to FDCs, and estimates of the parameters µ and σ were obtained for all simulations using maximum likelihood estimation (MLE) with the R package "fitdistrplus" [32]. FDCs for observed and simulated flows were developed, and PRMS model bias was estimated and corrected in each basin using quantile mapping in R [33] with the R package "qmap" [34,35]. The general approach (see Gudmundsson et al. [34] for details) transforms the simulated data, $P_m$, using the following:

$$P_o = F_o^{-1}\left(F_m\left(P_m\right)\right)$$

where $P_o$ is the observed data, $F_o^{-1}$ is the inverse CDF for $P_o$, and $F_m$ is the CDF for $P_m$. The CDFs were estimated empirically from the data as quantile functions with a quantile step of 0.01. Bias correction was applied for daily mean flows over the full period 1956-2005 based on the composite FDC. As with uncorrected simulations, performance for bias-corrected simulations was evaluated using GoF measures.
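The transformation above can be sketched as empirical quantile mapping: quantiles of simulated and observed flows are tabulated at a step of 0.01, and each simulated value is mapped to the observed quantile at the same cumulative probability. This is a simplified illustration in Python, not the "qmap" implementation; the linear interpolation between quantiles, the clamping of values outside the fitted range, the function names, and the synthetic flows are all assumptions.

```python
def empirical_quantiles(values, probs):
    """Quantiles by linear interpolation between order statistics."""
    ordered = sorted(values)
    n = len(ordered)
    result = []
    for p in probs:
        h = p * (n - 1)
        lower = min(int(h), n - 1)
        upper = min(lower + 1, n - 1)
        result.append(ordered[lower] + (h - lower) * (ordered[upper] - ordered[lower]))
    return result


def fit_quantile_map(observed, simulated, step=0.01):
    """Tabulate simulated and observed quantiles on a common probability grid."""
    n_steps = int(round(1.0 / step))
    probs = [k / n_steps for k in range(n_steps + 1)]
    return empirical_quantiles(simulated, probs), empirical_quantiles(observed, probs)


def apply_quantile_map(value, sim_q, obs_q):
    """Map one simulated value to the observed distribution: F_o^{-1}(F_m(value))."""
    if value <= sim_q[0]:   # assumption: clamp below the fitted range
        return obs_q[0]
    if value >= sim_q[-1]:  # assumption: clamp above the fitted range
        return obs_q[-1]
    for k in range(1, len(sim_q)):
        if value <= sim_q[k]:
            span = sim_q[k] - sim_q[k - 1]
            weight = 0.0 if span == 0 else (value - sim_q[k - 1]) / span
            return obs_q[k - 1] + weight * (obs_q[k] - obs_q[k - 1])
    return obs_q[-1]


# Synthetic example: the "model" overestimates low flows and underestimates high flows.
observed = [float(k + 1) for k in range(101)]
simulated = [2.0 + q ** 0.5 for q in observed]
sim_q, obs_q = fit_quantile_map(observed, simulated)
corrected = [apply_quantile_map(q, sim_q, obs_q) for q in simulated]
```

Because the synthetic bias is a monotonic distortion of the observed flows, the mapping recovers the observed values almost exactly. With real simulations the timing of flows also differs from observations, so day-by-day GoF measures need not improve even when the overall FDC does.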
Confidence intervals for FDCs were estimated at the 10% and 90% exceedance levels. The 10% and 90% FDCs and a median FDC were estimated by sorting the flows in each year in decreasing order, high flows to low flows, then taking the 10%, 50%, and 90% quantiles across years for each associated probability of exceedance $p_{e,i}$, where $i$ indexes the ranked flows from the annual maximum flow to the lowest annual flow. The 10% FDC represents the FDC for flows that were in the 90th percentile (greater than 90% of all values) of flows for each $p_{e,i}$. Likewise, the 90% FDC represents the FDC for flows that were in the 10th percentile of flows for each $p_{e,i}$.
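The construction of the 10%, median, and 90% FDCs described above can be sketched as follows. The plotting position i/(n + 1), the truncation of every ranked year to the length of the shortest year, the interpolation rule, and the function names are assumptions made for illustration; the flows are hypothetical.

```python
def percentile(values, fraction):
    """Quantile by linear interpolation between order statistics."""
    ordered = sorted(values)
    h = fraction * (len(ordered) - 1)
    lower = int(h)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (h - lower) * (ordered[upper] - ordered[lower])


def annual_fdc_bands(flows_by_year):
    """Return (p_e, fdc_10, fdc_median, fdc_90) from {year: [daily flows]}."""
    n = min(len(v) for v in flows_by_year.values())
    # Rank each year from its maximum flow to its lowest flow.
    ranked = [sorted(v, reverse=True)[:n] for v in flows_by_year.values()]
    p_e = [(i + 1) / (n + 1) for i in range(n)]
    fdc_10, fdc_median, fdc_90 = [], [], []
    for i in range(n):
        across_years = [year[i] for year in ranked]
        fdc_10.append(percentile(across_years, 0.90))      # exceeded in 10% of years
        fdc_median.append(percentile(across_years, 0.50))
        fdc_90.append(percentile(across_years, 0.10))      # exceeded in 90% of years
    return p_e, fdc_10, fdc_median, fdc_90


# Hypothetical flows for three very short "years", for illustration only.
flows_by_year = {
    2001: [1.0, 4.0, 2.0, 3.0],
    2002: [8.0, 2.0, 6.0, 4.0],
    2003: [3.0, 6.0, 1.5, 4.5],
}
p_e, fdc_10, fdc_median, fdc_90 = annual_fdc_bands(flows_by_year)
```

At every exceedance probability the 10% FDC lies at or above the median FDC, which lies at or above the 90% FDC, so the spread between the outer curves indicates the year-to-year variation in the flow regime.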

Goodness-of-Fit
Goodness-of-fit for simulations driven by observed historical climate data was evaluated using the R package "hydroGOF" [36] (Table 2). Note that the seasonal definition used in the analysis of flows does not follow the traditional definition that was used in the GoF evaluations. The season definition for the analysis was based on the exceedance hydrograph for observed flows (Figure 2 and Daraio [18]): winter = JFM and part of April, spring = AMJJ, summer = JASO, and autumn = OND. Seasonal definitions for the GoF measures were winter = DJF, spring = MAM, summer = JJA, and autumn = SON. Therefore, the GoF measures for seasonal flows provided a robust measure of model performance at these time scales.

Table 2. Goodness-of-fit measures.

ME: Mean Error
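As an illustration of how several of the measures reported in the results are computed, the sketch below implements the mean error (ME), percent bias (PBIAS), Nash-Sutcliffe Efficiency (NSE), and Kling-Gupta Efficiency (KGE) from their commonly used definitions. It is not the "hydroGOF" code. The sign convention (simulated minus observed, so that positive values indicate overestimation), the 2009 form of the KGE, and the example values are assumptions.

```python
import math


def mean_error(obs, sim):
    """Mean of (simulated - observed)."""
    return sum(s - o for o, s in zip(obs, sim)) / len(obs)


def pbias(obs, sim):
    """Percent bias; positive values indicate overestimation (assumed convention)."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


def kge(obs, sim):
    """Kling-Gupta Efficiency from correlation, variability ratio, and bias ratio."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    sd_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / n)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / n
    r = cov / (sd_obs * sd_sim)
    alpha = sd_sim / sd_obs      # variability ratio
    beta = mean_sim / mean_obs   # bias ratio
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Hypothetical flows: the simulation is offset upward by a constant amount.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [2.0, 3.0, 4.0, 5.0, 6.0]
scores = {"ME": mean_error(obs, sim), "PBIAS": pbias(obs, sim),
          "NSE": nse(obs, sim), "KGE": kge(obs, sim)}
```

In this example the constant offset leaves the correlation and variability unchanged, so the KGE is penalized only through its bias term, while the NSE drops further because it is based on squared residuals.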

Simulations Using GCM Data
Statistically downscaled bias-corrected constructed analogs (BCCA) V2 daily climate projections (1/8° × 1/8°) from the CMIP5 multimodel ensemble [37][38][39] were used to derive meteorological data (i.e., precipitation, maximum, and minimum air temperature) to drive PRMS models. The CMIP5 general circulation models used in this analysis are listed in Table 1. Shapefiles of both basins with delineated HRUs were uploaded to the USGS Geo Data Portal, and the area-weighted grid (12 km) statistics algorithm was used to obtain daily mean precipitation, maximum air temperature, and minimum air temperature for each HRU. These data were evaluated and validated prior to being made publicly available on the data portal by Bracken [40]. Data for the historical period of 1955-2005 and future projections for the period 2020-2100 were downloaded for each CMIP5 model. A total of 132 climate projections were used in each basin to drive PRMS simulations and obtain stream flow under two different representative concentration pathways (RCP) for emissions leading to 4.5 and 8.5 additional W m$^{-2}$ by the end of the 21st century. The selection of CMIP5 climate projections was based on data availability from the USGS Geo Data Portal. PRMS simulations for the historical period (water years 1955-2000) provided baseline data from which climate change measures were calculated. The baseline values were calculated from either one historical run, for GCMs with only one such run, or the ensemble mean from all historical runs combined for GCMs with multiple historical runs. Simulations driven by statistically downscaled meteorological data were corrected for PRMS model bias, and results from uncorrected and model-corrected simulations were compared for the historical period and for the future periods 2041-2070 and 2071-2099.
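The baseline rule described above (a single historical run used as is, or the ensemble mean when a GCM has several historical runs) can be sketched as below. Treating each run as a vector of values aligned on a common index, such as flows on a common exceedance grid, is an assumption; the function name and the values are hypothetical.

```python
def baseline_values(historical_runs):
    """Element-wise mean across historical runs of one GCM.

    With a single run, the result equals that run.
    """
    if not historical_runs:
        raise ValueError("at least one historical run is required")
    length = len(historical_runs[0])
    if any(len(run) != length for run in historical_runs):
        raise ValueError("all runs must be aligned on the same index")
    n_runs = len(historical_runs)
    return [sum(values) / n_runs for values in zip(*historical_runs)]


# Hypothetical values for a GCM with one run and a GCM with two runs.
single_run_baseline = baseline_values([[1.0, 2.0, 4.0]])
multi_run_baseline = baseline_values([[1.0, 3.0, 5.0], [3.0, 5.0, 7.0]])
```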
Bias correction of stream flows was also done directly on uncorrected GCM-driven stream flow simulations with observed data. Changes in FDCs were analyzed qualitatively by plotting the projected median FDC, 10% exceedance FDC, and 90% exceedance FDC curves and baseline FDC curves at the same exceedances. Quantitative estimates of the proportional change $PC$ (percent change expressed as a decimal) were obtained using GCM-driven simulations for the historical period as a baseline. Changes in stream flows from the projected median FDC, 10% exceedance FDC, and 90% exceedance FDC curves were estimated using:

$$PC_{p_{e,i}} = \frac{Q^{f}_{p_{e,i}} - Q^{h}_{p_{e,i}}}{Q^{h}_{p_{e,i}}}$$

where $Q^{f}_{p_{e,i}}$ is the GCM-driven projected stream flow for exceedance probability $p_{e,i}$ in the 10%, median, or 90% FDC, $Q^{h}_{p_{e,i}}$ is the GCM-driven simulated historical flow in the 10%, median, or 90% FDC for exceedance probability $p_{e,i}$, and $PC_{p_{e,i}}$ is the proportional change in stream flow in the 10%, median, or 90% FDC for exceedance probability $p_{e,i}$. Positive values of $PC$ indicate an increase in flow in the future, and negative values represent a decrease of flow in the future. Future uncorrected simulations were compared with baseline uncorrected simulations, and future model-corrected simulations were compared with model-corrected baseline simulations in all cases.
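A minimal sketch of the proportional change calculation follows, assuming the projected and historical FDCs have already been evaluated at the same exceedance probabilities. The function name and the flow values are hypothetical.

```python
def proportional_change(q_future, q_historical):
    """Proportional change in flow at each exceedance probability.

    Positive values indicate an increase relative to the historical baseline.
    """
    if len(q_future) != len(q_historical):
        raise ValueError("FDCs must share the same exceedance probabilities")
    changes = []
    for q_f, q_h in zip(q_future, q_historical):
        if q_h == 0:
            raise ValueError("baseline flow of zero gives an undefined change")
        changes.append((q_f - q_h) / q_h)
    return changes


# Hypothetical median-FDC flows at three exceedance probabilities.
historical_fdc = [10.0, 4.0, 1.0]
projected_fdc = [11.0, 4.0, 0.9]
pc = proportional_change(projected_fdc, historical_fdc)
```

Here the first value is a 10% increase in a high flow, the second is no change, and the third is a 10% decrease in a low flow.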

Goodness-of-Fit
Uncorrected simulations of stream flow for water years 1956-2005 performed relatively well in both basins based on several GoF measures used to evaluate the models (Figures 3 and 4). Overall, based on GoF measures, the model performed better in the Batsto basin than in the Maurice basin for daily flows by most measures with less of a difference for monthly flows. This was also the case for GoF measures for uncorrected annual and seasonal flows (Table 3). In the Maurice basin, the monthly $R^2$ and Nash-Sutcliffe Efficiency (NSE) measures for uncorrected flows improved from ≈0.4 to 0.7 compared with daily flows, and most measures indicated a better fit of model simulations for monthly mean flows over daily flows in both basins.
Uncorrected simulations showed a wide range in GoF measures for seasonally averaged flows. For both basins, absolute PBIAS was greater for seasonally averaged flows than for daily, monthly, and annual mean flows. In Batsto, seasonal bias ranged from −17% (winter) to 30% (summer), while the daily, monthly, and annual PBIASs were −1.4%, −1.4%, and −2.8%, respectively (Table 3). The same trend was apparent in Maurice.
Bias correction of flows using the FDC for the full record in each basin improved the fit of the overall FDC as expected. However, GoF measures did not improve consistently for bias-corrected simulations (Figure 4 and Table 3). For bias-corrected simulations of Batsto River flows, there were improvements in fit according to the measures of mean error (−0.09 to −0.03) and PBIAS (−2.8% to ≈−0.9%) for daily and monthly mean flows and Kling-Gupta Efficiency (KGE) from 0.62 to 0.75 and 0.71 to 0.77 for daily and monthly mean flows, respectively. For bias-corrected simulations of Maurice River flows, there were improvements in fit according to the measures of mean error (≈0.32 to ≈−0.08) and PBIAS (≈7% to −1.7%) for daily and monthly mean flows and KGE from 0.63 to 0.87 and 0.72 to 0.80 for daily and monthly mean flows, respectively.
Bias correction of flows did not improve the overall performance of the model on simulations of annual mean flows in either basin, with the exception of PBIAS, which was expected from the bias correction. Goodness-of-fit for seasonally averaged bias-corrected flows was variable in a similar manner to the uncorrected simulated flows. Model PBIAS increased in some seasons and decreased in others for bias-corrected simulations compared to uncorrected simulations. Overall, for both basins, PRMS simulations over the time period 1956-2005 simulated here, compared with the shorter evaluation period (1997-2003) from Daraio [18], performed as well or better for some measures and worse for others.

Flow Duration Curves
Each basin showed a different pattern of variation in annual FDCs (Figure 5). Observed annual FDCs varied to about the same magnitude across all flow exceedances in the Maurice basin. Observed annual FDCs showed less variability at low flows (high exceedance) in the Batsto basin and greater variability for high flows. Simulated FDCs did not fully capture the overall FDC or the variation of annual FDCs in either basin. Simulations in both basins tended to underestimate high flows and overestimate low flows (Figure 6). Variation at low exceedance levels was underestimated in the Maurice River, and variation at high exceedance levels was overestimated in the Batsto River.
In the Maurice basin, there was a greater tendency for uncorrected simulations to underestimate high flows than to overestimate low flows. Simulations for the Batsto River, on the other hand, underestimated high and overestimated low flows to a similar degree (Figure 7). The FDC in the Maurice basin was smoother than that for the Batsto for flows with $p_e > 0.95$ (low flows). Simulated flows were able to capture this sharper drop in low flows for the Batsto; however, this part of the FDC was not fit well by a lognormal distribution based on the entire range of flows.
The variations in observed annual FDCs (Figure 6) were relatively well represented by uncorrected simulated flow duration curves in both basins. Differences in uncorrected simulated flow bias for each basin were more apparent when comparing the annual FDCs and the 10% and 90% FDCs. For instance, the consistent underestimation of high flows and overestimation of low flows was clearer in Batsto. The middle of the FDC ($0.25 \le p_e \le 0.75$) was well represented by uncorrected simulations, as well as the variance in annual FDCs. In Maurice, for $p_e \le 50\%$, the tendency of uncorrected simulations to underestimate flows can be seen clearly. Uncorrected simulations in Maurice also showed slightly lower variation than observed flows in the annual FDCs. Bias correction of stream flow for the calibrated PRMS model was able to correct the systematic model error of the overestimation of low flows and underestimation of high flows in both basins (Figure 6). Bias-corrected simulations for both basins tended to increase the spread of the 10% and 90% FDCs over the full range of $p_e$, with the exception of low $p_e$ in Maurice, which indicated greater uncertainty with bias correction. Overall, FDCs representing bias-corrected simulations indicated that the model performed better in the Maurice River than in the Batsto River for both high and low flows and with respect to the variation in flows.

Simulations Using GCM Data
Hydrologic simulations driven by GCM data could only be evaluated using FDCs. Estimated parameters for the fitted lognormal distributions for uncorrected and model-corrected simulations driven by GCM data showed greater variation than parameters bias corrected directly with observed data (Figure 8). GCM-driven simulated stream flows showed relatively low overall variation for estimated parameters over the historical period and increased variation in parameters for climate change projections, which indicated over-correction of stream flows. It is likely that bias correction of GCM-driven stream flow simulations with observed flow data, in this case, reduced the simulation of the natural variation of the hydrologic model. Therefore, the analysis of climate change impacts on flow duration curves was done using only model-corrected GCM simulated stream flows.
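The lognormal parameters compared above have closed-form maximum likelihood estimates: µ is the mean of the log flows and σ is the square root of their mean squared deviation. The sketch below uses these closed forms rather than the "fitdistrplus" routine applied in the study and also evaluates the fitted FDC at a chosen exceedance probability. The function names and flow values are hypothetical, and all flows are assumed to be strictly positive.

```python
import math
from statistics import NormalDist


def fit_lognormal(flows):
    """Closed-form maximum likelihood estimates (mu, sigma) for a lognormal."""
    logs = [math.log(q) for q in flows]  # requires strictly positive flows
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return mu, sigma


def fitted_flow_at_exceedance(mu, sigma, p_e):
    """Flow on the fitted lognormal FDC at exceedance probability p_e (0 < p_e < 1)."""
    z = NormalDist().inv_cdf(1.0 - p_e)  # exceedance is one minus the CDF
    return math.exp(mu + sigma * z)


# Hypothetical flows chosen so that the log flows are -1, 0, and 1.
flows = [math.exp(-1.0), 1.0, math.exp(1.0)]
mu, sigma = fit_lognormal(flows)
q_high = fitted_flow_at_exceedance(mu, sigma, 0.05)
q_median = fitted_flow_at_exceedance(mu, sigma, 0.50)
q_low = fitted_flow_at_exceedance(mu, sigma, 0.95)
```

Because the fitted curve is controlled by only two parameters, differences between simulations show up as shifts in µ (overall flow level) or σ (spread between high and low flows), which makes it a compact summary of how GCM-driven FDCs vary.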

Historical Simulations
Overall, the annual FDCs from GCM-driven stream flow simulations for the historical period for uncorrected and model-corrected stream flows fell within the range of observed annual FDCs (Figure 9). Annual FDCs from GCM-driven simulated stream flows for the historical period showed a wide range of variability dependent on the GCM (supplemental figures), and some models showed greater variation than others. FDCs based on the ensemble of uncorrected GCM simulations for the historical period tended to overestimate low flows and underestimate high flows (Figure 9) to an even greater degree than uncorrected simulations driven by observed meteorological data. Application of the model correction to GCM-driven simulations greatly reduced this bias, but did not fully eliminate it. Model correction of GCM-driven simulations increased the variance of annual FDCs in both basins.
The 90% FDC for high flows from uncorrected GCM-driven simulations of the Batsto River was aligned with the median FDC of observed high flows, $p_e \lesssim 0.05$, whereas the opposite was the case for low flows with $p_e \gtrsim 0.95$. Implementing the model correction for GCM simulated stream flows in Batsto was able to correct for bias in most of the high flows and greatly improved the estimates of low flow frequencies. A similar trend was apparent in the Maurice River, though to a lesser extent at both high and low flows. The model correction of GCM-driven simulations in Maurice improved estimates of the frequency distribution, but not as much as in Batsto.

Projected Simulations
The projected changes in parameter values for the full FDCs indicated a much greater uncertainty in estimates of FDC than can be seen in the qualitative assessment of annual FDCs. Overall estimates of σ did not change much for projected distributions; however, there was much greater variation in estimates of the mean, µ, for projected distributions of stream flows, or FDCs (Figure 8; Figures 10 and 11). As with historical simulations, FDCs from GCM simulated stream flows for the historical period showed a wide range of variability dependent on the GCM (supplemental figures). The FDCs showed an increase in variation (wider spread in 10% and 90% FDCs) with climate change across the full range of $p_e$ in both basins. The projected impacts of climate change on the FDCs were well quantified by measures of the proportional change in flows by quantile, $PC_{p_e}$ (Figures 12 and 13).
For the Batsto River, the median FDC and 10% FDC were projected to increase across all probabilities under both RCPs in both time slices (2041-2070 and 2071-2099). Model-corrected simulations indicated a slightly greater increase in flow levels than uncorrected projected stream flows. The 10% FDC for both model-corrected and uncorrected projected flows increased for higher flows (low $p_e$) and decreased for low flows (high $p_e$). There was a greater increase in the projected median FDC for high flows than for low flows as indicated by a slight negative slope of the median FDC difference curve. Under RCP 4.5 for 2071-2099, both median and 90% FDCs changed about the same amount, and under RCP 8.5, the median FDC seemed to return closer to the baseline FDC despite a relatively large increase in the 90% FDC. The magnitude of low flows showed an increase under RCP 4.5 of around 5% (Figure 12), but under RCP 8.5, projections indicated little or no increased flow at these high $p_e$ levels. At $p_e < 0.1$, stream flows were projected to increase by a greater amount than for low flows ($p_e > 0.9$) under both RCPs, and a greater increase of high flows ($p_e \ge 0.2$) was projected under RCP 8.5.
For the Maurice River, there was little or no difference between uncorrected and model-corrected simulations of projected stream flows (Figure 13). Under RCP 4.5 for both time slices, the change in median FDCs indicated that flows were not projected to increase by more than around 2.5% at most $p_e$, with slightly greater increases in flow indicated at $p_e \lesssim 0.02$. Under RCP 8.5, projections for 2041-2070 indicated that median flows will not change except at the extremes, $p_e > 0.9$ and $p_e < 0.1$. Projections also indicated an increase of 5-10% in the 10% FDC for $p_e > 0.75$ and $p_e < 0.1$, suggesting that higher low flows and greater high flows may occur in the future. Projections in FDCs for 2071-2099 under RCP 8.5 showed a decrease in the median FDC in Maurice for all $p_e \gtrsim 0.07$, but with an increase in flows with $p_e \lesssim 0.05$, and extreme high flows could increase by more than 10%. Furthermore, projections suggested that flows ranging from $0.1 \le p_e \le 0.9$ could decrease slightly.

Discussion
An important consideration when evaluating hydrologic model performance is whether or not the model can be considered an appropriate representation, or hypothesis, of the relevant processes in a simulated basin [41,42], which can be difficult due to uncertainty in the modeling chain. For climate change impact assessments, top-down approaches propagate uncertainty from emission scenarios, global climate models, regional climate models, through downscaling and bias correction (of meteorological data) methods [17] to data that are used to drive hydrologic models. Additionally, there is error in observed data used to estimate model parameters (e.g., land use and soils), meteorological data used as input (temperature and precipitation), and flows used for calibration/evaluation. An important question remains, "Are we getting the right answers for the right reasons?" [1]. This limits the application of hydrologic models to assess the potential impacts of climate change for local and regional decision making due to the uncertainty of model outputs.
Results were reported by Daraio [18] with relatively low overall confidence in large part because of the limited evaluation period and existing bias in stream flow simulations. The extended evaluation of model simulations done here provided an assessment of the performance over a long-term historical time period that extended beyond the calibration period, which should be done before its use to project potential future stream flows [24]. The analysis using bias correction of stream flows from PRMS simulations allowed for more confidence in projected climate change impacts in both basins, and in the Maurice River in particular.

Performance of the Hydrologic Model
Goodness-of-fit (GoF) measures are the traditional way to assess the performance of hydrologic models. Moriasi et al. [43] suggested that for watershed models, GoF measures for daily, monthly, or annual simulations should have values of $R^2 > 0.60$, NSE > 0.50, and PBIAS within ±15%. Evaluation of model performances using these metrics suggested that the model in the Batsto River performed adequately on daily mean, monthly mean, annual mean, and (most) seasonal mean flows, but only monthly mean flows were adequately simulated in the Maurice River. Ritter and Muñoz-Carpena [44] suggested that an acceptable model would have an NSE ≥ 0.65, in which case only annual flow simulations in the Batsto River and annual and monthly mean flows in the Maurice River would be considered adequate. The use of the coefficient of determination ($R^2$) and other correlation measures for model calibration and evaluation has been questioned [45], but it was included here primarily for model comparison. Additionally, several variants of the NSE were used to supplement this efficiency measure. The Kling-Gupta efficiency improves on the NSE because it includes more information on model performance including correlations, bias, and variability [36,46]. The index of agreement d provides a measure of differences in the means for observed and simulated flows and does not include a correlation measure, though the measure is sensitive to the influence of extreme values due to the use of squared differences [45]. The Volumetric Efficiency (VE) represents the fraction of water delivered at the proper time and in itself provides some measure of the skill of the model to capture an important aspect of the flow regime. Including these GoF measures for models in both basins indicated that PRMS simulations over the time period 1956-2005, compared with the shorter evaluation period (1997-2003) from Daraio [18], performed as well or better for some measures and worse on others. Overall, this increased the confidence in model performance for both basins.
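The criteria and two of the supplementary measures discussed above can be sketched as follows. The thresholds are those quoted from Moriasi et al. [43] and Ritter and Muñoz-Carpena [44]; the index of agreement and Volumetric Efficiency follow their commonly used definitions and are not taken from the "hydroGOF" source. Function names and example values are hypothetical.

```python
def meets_moriasi_criteria(r2, nse_value, pbias_percent):
    """R^2 > 0.60, NSE > 0.50, and PBIAS within +/-15%."""
    return r2 > 0.60 and nse_value > 0.50 and abs(pbias_percent) <= 15.0


def meets_ritter_criterion(nse_value):
    """NSE >= 0.65."""
    return nse_value >= 0.65


def index_of_agreement(obs, sim):
    """Index of agreement d, based on squared differences."""
    mean_obs = sum(obs) / len(obs)
    numerator = sum((o - s) ** 2 for o, s in zip(obs, sim))
    denominator = sum((abs(s - mean_obs) + abs(o - mean_obs)) ** 2
                      for o, s in zip(obs, sim))
    return 1.0 - numerator / denominator


def volumetric_efficiency(obs, sim):
    """VE: one minus the fraction of observed volume that is mismatched."""
    return 1.0 - sum(abs(s - o) for o, s in zip(obs, sim)) / sum(obs)


# Hypothetical flows and scores, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [2.0, 3.0, 4.0, 5.0, 6.0]
d_value = index_of_agreement(obs, sim)
ve_value = volumetric_efficiency(obs, sim)
passes_both = meets_moriasi_criteria(0.70, 0.66, -2.8) and meets_ritter_criterion(0.66)
passes_moriasi_only = meets_moriasi_criteria(0.70, 0.55, 7.0) and not meets_ritter_criterion(0.55)
```

A model with an NSE between 0.50 and 0.65 passes the first set of criteria but not the second, which is why the two sources lead to different conclusions about which time scales are adequately simulated.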
All measures of GoF for daily mean and monthly mean flows in Batsto were slightly better or about the same over 1956-2005 compared with the shorter evaluation period. This was not the case for daily mean flows in the Maurice River where GoFs were better for the shorter evaluation period, but monthly mean flow GoFs for 1956-2005 were about the same for most measures. On a seasonal basis, there were no consistent patterns in the differences in GoF measures between the longer and shorter evaluation periods in each basin. Some measures improved over the longer period, and others indicated poorer performance. Both models performed better for monthly mean flows, which may be more relevant for use in long-term climate change projections. Bias correction of simulated flows using composite FDCs from previously calibrated PRMS models was able to improve model performance on some measures of GoF, and overall, they were as good as uncorrected simulations. The most important improvement was the reduction in the degree of overestimation (underestimation) of low (high) flows in both basins for bias-corrected simulated stream flows.
While it is difficult to objectively interpret the GoF measures, in the context of each model being a hypothesis for the relevant processes in the basins [42], there is not enough evidence to reject them. Overall, the bias-corrected calibrated models of the Batsto and Maurice Rivers are good models, and GoF measures indicated better model performance in the Batsto. Assessment with FDCs, on the other hand, indicated better simulations of flow regime in the Maurice. Models in both basins do have important limitations, and the variation in GoF scores provided an indication of the strengths and weaknesses of the models. In particular, bias correction of stream flows was able to identify areas where a model may not simulate physical processes well.
The primary weakness of the simulations lies in the performance of the model at both high and low p_e levels, or low and high flows, respectively. The observed FDC for the Batsto River shows reduced variation at low flows and a sharper curve in the FDC at the highest p_e level. This is most likely due to the strong groundwater component of the Batsto, especially at low flows [47]. The lognormal distribution fit to the data was not able to capture low flows at high p_e, or the tail of the distribution, so the fitted FDC is more smoothly curved than the FDC from observed flows in Batsto. This is an indication that there may be a structural error in the model that does not correctly simulate the physical processes that dominate at low flows.
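The difference between an observed FDC and a lognormal fit can be examined with a short Python sketch such as the one below. This is an illustrative assumption, not the fitting procedure used in this paper: the function names are hypothetical, exceedance probabilities use the Weibull plotting position, and the lognormal parameters are estimated from the mean and standard deviation of the log-transformed flows (which must all be positive).

```python
import numpy as np
from statistics import NormalDist

def empirical_fdc(flows):
    """Return exceedance probabilities p_e and flows sorted from high to low."""
    q = np.sort(np.asarray(flows, dtype=float))[::-1]
    n = q.size
    # Weibull plotting position (an assumed choice): rank / (n + 1).
    p_e = np.arange(1, n + 1) / (n + 1)
    return p_e, q

def lognormal_fdc(flows, p_e):
    """Return flows from a fitted lognormal at the given exceedance levels."""
    log_q = np.log(np.asarray(flows, dtype=float))
    mu, sigma = log_q.mean(), log_q.std(ddof=1)
    # A flow exceeded with probability p_e is the (1 - p_e) quantile.
    z = np.array([NormalDist().inv_cdf(float(1.0 - p)) for p in np.atleast_1d(p_e)])
    return np.exp(mu + sigma * z)
```

Plotting `lognormal_fdc(flows, p_e)` against the output of `empirical_fdc(flows)` on a log axis shows where the fitted curve departs from the observed one; a departure at high p_e of the kind described for Batsto would appear in the low-flow tail.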
The PRMS simulations driven by observed meteorological data did not capture high flows (low p_e levels) well either. Bias correction of model simulations improved performance at high flows, but again, this may indicate a problem with the PRMS model. The error in the PRMS model at high flows seems to be greater in the Maurice River than in the Batsto, and it may be due to some misrepresentation of runoff processes caused by the greater area of mixed urban development in the Maurice watershed. Additionally, the relative performance of the models, both uncorrected and bias corrected, at the seasonal scale indicates the need to test bias correction of stream flows using seasonally based FDCs. This work is in progress by the author's research group.

Bias Correction of Stream Flows
Statistically downscaled GCM-driven stream flow simulations showed bias at both extremes of the FDC, which were not simulated as well as the middle of the FDC (p_e ≈ 0.5); this bias can be primarily attributed to the performance of the PRMS models. For uncorrected simulations, use of GCM data increased the PRMS model bias at both ends of the FDC, and correcting for model bias improved the representation of GCM-driven simulations over the historical period.
Direct bias correction of GCM-driven stream flow simulations with observed flows, which includes both PRMS model bias and any bias that can be attributed to the meteorological input data, was able to improve the overall fit of the FDC over model-corrected GCM-driven simulations in both basins. However, the direct bias correction of GCM stream flows seems to reduce the representation of the natural variation of simulated flows over the historical period while increasing variation over climate change projected stream flows. This would occur if the timing of, for instance, high flows in simulations did not coincide with the timing of observed high flows. High flows tend to occur in summer in both basins. If high flows were simulated to occur in winter, then the mismatch in timing would lead to correcting for flows generated by different physical processes in the different seasons, e.g., rain on snow in winter versus a large tropical event in summer. Bias correction of stream flows without model calibration has been shown to reduce seasonal forecast error for forecasts that focused on the volume, but not timing, of flows [48]. Improper timing of bias corrections may improve the overall fit of the FDC, but would lead to poor climate change projections. Therefore, no GCM-driven simulations were bias corrected directly with observed flows, and bias correction of stream flows would need to account for timing to ensure better results.
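The general idea of FDC-based bias correction discussed in this section can be sketched as empirical quantile mapping. The Python example below is a simplified illustration under stated assumptions, not the composite-FDC procedure applied in this paper: the function name `fdc_quantile_map` is hypothetical, the FDCs are purely empirical, and flows outside the range of the reference simulation are clamped to the ends of the observed curve.

```python
import numpy as np

def fdc_quantile_map(sim_ref, obs_ref, sim_target):
    """Map simulated flows onto the observed FDC at matching probabilities."""
    sim_sorted = np.sort(np.asarray(sim_ref, dtype=float))
    obs_sorted = np.sort(np.asarray(obs_ref, dtype=float))

    # Non-exceedance probabilities for the sorted reference flows
    # (Weibull plotting position, an assumed choice).
    p_sim = np.arange(1, sim_sorted.size + 1) / (sim_sorted.size + 1)
    p_obs = np.arange(1, obs_sorted.size + 1) / (obs_sorted.size + 1)

    # Step 1: locate each target flow on the simulated reference curve.
    p = np.interp(np.asarray(sim_target, dtype=float), sim_sorted, p_sim)

    # Step 2: read the observed flow at the same probability.
    # np.interp clamps values outside the reference range to the end points.
    return np.interp(p, p_obs, obs_sorted)
```

Note that this mapping depends only on flow magnitude. A simulated winter high flow and an observed summer high flow at the same probability are treated as equivalent, which is the timing problem described above.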

Climate Change Projections
Overall, the correction of model bias was effective for improving the fit of the FDCs, but projected changes in FDCs based on uncorrected and model-corrected GCM-driven simulations were not consistent in the two basins. In Maurice, projected changes in FDCs were almost identical for uncorrected and model-corrected GCM-simulated stream flows. In Batsto, model-corrected GCM-driven simulations showed a greater change in flows at the median FDC level compared to uncorrected simulated stream flow and an even greater increase in flows at the 10% FDC level. It seems likely that the lack of difference in projected change in the FDCs between uncorrected and model-corrected stream flow simulations in Maurice indicates higher confidence in climate projections in this basin. The FDC was better simulated in Maurice than in Batsto; therefore, it is likely that correction was done with proper timing, as discussed above, which led to more consistent correction of flows and overall better model fit. Consistent with this interpretation, the analysis for simulations in Batsto provided a better understanding of the limitations of the model and allowed for a clearer interpretation of results. Most importantly, it seems to be an indication of greater uncertainty for projections in the Batsto River compared with projections in Maurice.
Projected trends in both basins included an increase in flow variability (spread of the 10% and 90% FDCs), an increase in median flows through 2070 under both RCPs, and reduced changes in the median FDCs by 2099 under RCP 8.5. The first of these indicates an increase in the variation of flows across all probabilities of exceedance, in particular at high flows. Both rivers are projected to have greater high and low flows, and the projections indicate more variation at high flows in both basins. Model-corrected simulations had greater variation for both projected FDCs and for the baseline (historical) GCM simulations. However, some bias correction methods are known to lead to variance inflation [17], which could increase the uncertainty of simulated flows. Projected high flows were larger, with a greater likelihood that annual maximum flows will be more extreme. Notably, a well-known weakness of GCMs is an inability to capture extremes [49,50]. While both the PRMS model bias and model-corrected GCM-driven bias in simulated stream flows indicated an underestimation of high flows and overestimation of low flows, the consistency of the error provides a level of confidence in these projections despite the bias. That is, based on these projections, it seems likely that extreme floods at the end of this century will be 5 to 30% higher in Batsto and up to 15% higher in Maurice than at the end of the last century. However, more detailed analysis of projected changes in extremes, both high and low, is necessary to reduce the uncertainty in the interpretation of these projections.
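One way to summarize the spread and change in FDCs described above is sketched below in Python. This is an assumption-laden illustration rather than the analysis code used in this paper: the function names are hypothetical, and the 10%, median, and 90% FDCs are interpreted here as percentiles taken across annual FDCs at each exceedance level, which is only one possible reading.

```python
import numpy as np

def annual_fdc_envelope(flows_by_year, p_e, levels=(10, 50, 90)):
    """Return percentile curves across annual FDCs at exceedance levels p_e."""
    quantiles = 1.0 - np.asarray(p_e, dtype=float)
    # One FDC per year: the flow exceeded with probability p_e
    # is the (1 - p_e) quantile of that year's flows.
    curves = np.array(
        [np.quantile(np.asarray(q, dtype=float), quantiles) for q in flows_by_year]
    )
    # Percentiles across years (axis 0) at each exceedance level.
    return {lvl: np.percentile(curves, lvl, axis=0) for lvl in levels}

def percent_change(baseline_fdc, future_fdc):
    """Return the percent change of a future FDC relative to a baseline FDC."""
    baseline = np.asarray(baseline_fdc, dtype=float)
    future = np.asarray(future_fdc, dtype=float)
    return 100.0 * (future - baseline) / baseline
```

Applying `percent_change` to the median curves of a baseline period and a future time slice gives the projected change at each exceedance level, and the gap between the 10 and 90 curves gives a simple measure of flow variability.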
There were important differences in projections in each basin that are clear based on the changes in the FDCs. For instance, in the Batsto River, climate change projections indicated an increase in mean flows under RCP 4.5, with slightly greater changes indicated by model-corrected compared to uncorrected projected stream flows (see above). Projected stream flows were not greater under the RCP 8.5 emission scenario than for RCP 4.5 in Batsto for 2041-2070. By contrast, variation in flows increased in the Maurice River under both RCPs; however, the overall mean flow was not projected to increase by much, and the median FDC indicated that flows across most probabilities of exceedance will be lower by the end of the century. In Batsto, the range of flow variation was projected to increase while the low flow regime returns to baseline by the end of the century for RCP 8.5.

Impacts of Watershed Characteristics
The Batsto basin lies within the protected Pinelands National Reserve in New Jersey, and there is a relatively small amount of agriculture in the basin (≈10% of the basin). Land use change and population growth will be minimal in this basin because of its location in the Reserve. The projected changes in stream flow will primarily affect the ecology of the river. The results here indicate some change in the flow regime, but do not provide enough information to fully assess the potential ecological impacts. Such an assessment requires a more detailed analysis of the natural flow regime relevant to ecological characteristics of the river. Poff et al. [51] suggested that the natural flow regime can be defined using flow magnitudes, the frequency of occurrence, the duration of specified flow conditions, the timing and/or predictability of flows, and the rate of change or flashiness of flow. These indicators of the flow regime can be quantified using a number of measures including the autocorrelation structure of the time series, rising and descending limb density, peak distribution, low flows, and the FDC [52]. On their own, the FDCs analyzed in this paper represent an incomplete picture of the natural flow regime. The author's research group is continuing the current analysis to include ecologically relevant hydrologic indices that can describe most of the natural variation in flows, which is one of the key goals of hydro-ecological classification of rivers [53,54].
The upper Maurice basin is much more developed, and urbanization is expected to continue in the future as the population continues to increase. The climate change projections do not include potential changes to land use and the expected growth of urban area in the basin; therefore, the results must be interpreted in light of this limitation. For instance, the potential increase in high flows due to climate change is likely to be exacerbated by increased urbanization. Additionally, projected decreases in low flows have potential implications for water resources in Maurice since they signify a reduction in base flow and thus groundwater recharge, and the majority of drinking water is obtained from groundwater sources in the watershed [18]. However, knowing the potential changes due to climate change can help water resource managers and engineers adapt water infrastructure systems and manage those factors that can be controlled. The use of FDCs to show these potential changes could be an effective means to convey the impacts of climate change on water resources and guide such decisions.

Conclusions
Evaluation of GCM-driven stream flow simulations requires techniques that provide a measure of a regional or long-term flow regime. This entails the use of methods that are commonly used for hydrologic simulations in ungauged basins. This is particularly important for the use of models for climate change projections since future stream flows, even in a gauged basin, cannot be observed. A model that performs well over longer periods that include natural climate variation presumably can be used with confidence for climate change projections. Overall, PRMS simulations of both the Batsto and upper Maurice watersheds were able to simulate the FDC and the annual variation of FDCs within each basin. The application of a bias correction for model error on simulated stream flows improved the fit of the FDC over the historical period. While correcting for PRMS model bias seemed to increase the uncertainty of FDC estimates representing the flow regime, bias correction can help identify potential model uncertainty and distinguish sources of uncertainty. Bias correction of stream flows indicated the presence of some potential structural errors in the model for the Batsto River that were not apparent from GoF measures alone. The implementation of the model correction on GCM-driven simulated stream flows also indicated greater uncertainty in projected stream flows for Batsto in comparison with projections for Maurice.
The extended analysis of the FDCs for GCM-driven PRMS simulations indicated that climate model-driven hydrologic simulations can be used with some confidence to assess potential changes in FDCs under climate change. In both basins, the climate change projections can be assessed with higher confidence than those from Daraio [18]. Applications in water resources and hydrology that utilize flow duration curves can use projected changes to assess the potential impacts of climate change relatively easily, including a relatively clear assessment of uncertainty. The results leave open the question as to the possibility of using FDCs based on seasonal or monthly flows to bias correct GCM-driven simulations with observed data and apply corrections to climate change projections. The use of FDCs at these time scales may allow for the timing of flows to be included in the corrections, reduce uncertainty, and improve confidence in climate change projections of stream flows using GCM-driven hydrologic simulations.
Acknowledgments: I would like to thank my graduate student Diana Sankar for her assistance in the literature review. The simulated stream flow used is available on HydroShare, http://www.hydroshare.org/resource/e286de0f957941a2b3dc9c230ccdd9d4. The R source code used for analysis is available from the author upon request. I acknowledge the World Climate Research Programme's Working Group on Coupled Modelling, which is responsible for CMIP, and I thank the climate modeling groups (listed in Table 1 of this paper) for producing and making available their model output. For CMIP, the U.S. Department of Energy's Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led the development of the software infrastructure in partnership with the Global Organization for Earth System Science Portals.

Conflicts of Interest:
The author declares no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: