Use of WRF-Hydro over the Northeast of the US to Estimate Water Budget Tendencies in Small Watersheds

: In the Northeast of the US, climate change will bring a series of impacts on the terrestrial hydrology. Observations indicate that temperature has steadily increased during the last century, including changes in precipitation. This study implements the Weather Research and Forecasting (WRF)-Hydro framework with the Noah-Multiparameterization (Noah-MP) model that is currently used in the National Water Model to estimate the tendencies of the different variables that compounded the water budget in the Northeast of the US from 1980 to 2016. We use North American Land Data Assimilation System-2 (NLDAS-2) climate data as forcing, and we calibrated the model using 192 US Geological Survey (USGS) Geospatial Attributes of Gages for Evaluating Streamﬂow II (Gages II) reference stations. We study the tendencies determining the Kendall-Theil slope of streamﬂow using the maximum three-day average, seven-day minimum ﬂow, and the monotonic ﬁve-day mean times series. For the water budget, we determine the Kendall-Theil slope for changes in monthly values of precipitation, surface and subsurface runoff, evapotranspiration, transpiration, soil moisture, and snow accumulation. The results indicate that the changes in precipitation are not being distributed evenly in the components of the water budget. Precipitation is decreasing during winter and increasing during the summer, with the direct impacts being a decrease in snow accumulation and an increase in evapotranspiration. The soil tends to be drier, which does not translate to a rise in inﬁltration since the surface runoff aggregated tendencies are positive, and the underground runoff aggregated tendencies are negative. The effects of climate change on streamﬂows are buffered by larger areas, indicating that more attention needs to be given to small catchments to adapt to climate change. The objectives of this work are to evaluate in the performance of the Noah- Multiparameterization (Noah-MP) model within Weather Research and Forecasting (WRF)-Hydro framework in the Northeast of the US, to provide a consistent terrestrial hydrology dataset for 37 years, and to evaluate changes in the water cycle components at the terrestrial surface that are driving streamflow changes.


Background
In the Northeastern US, temperatures are expected to continue increasing [1] by about 1.4-6.7/0.8-7.8 • C for the winter/summer seasons, respectively, through the twenty-first century [2]. However, these changes are not new since temperatures have increased 1.1 • C in the period 1895-2011 in this region. Changes in temperature influences precipitation quantity and timing, as well as the phase of the precipitation, which translates to a shift in the partition between solid and liquid rainfall, decreasing/increasing solid/liquid precipitation [2,3], with an earlier winter-spring center volume date of 8.6 days (1.6 days/decade) for the period 1960-2014 [4]. Precipitation records also show that study area there was a positive trend in Q7. This may have been enhanced by increases in potential evapotranspiration and the multidecadal relationship between North Atlantic Oscillation (NAO) and Pacific North America (PNA) teleconnection patterns.
The objectives of this work are to evaluate in the performance of the Noah-Multiparameterization (Noah-MP) model within Weather Research and Forecasting (WRF)-Hydro framework in the Northeast of the US, to provide a consistent terrestrial hydrology dataset for 37 years, and to evaluate changes in the water cycle components at the terrestrial surface that are driving streamflow changes.

Study Area
The study area ( Figure 1) corresponds to the northeastern portion of the US, with the southern border extent at approximately 34.59 • latitude, and the western border at approximately −87.53 • longitude. More than sixty-four million people live in the Northeast of the US near to the east coast, one of the most developed areas in the world. According to the Third National Climate Assessment report [6], the built and natural environment has shown considerably vulnerability to climate hazards and other stresses, with aging infrastructure that has already suffered from climate events related to heat waves and coastal and riverine floods.
Water 2018, 10, x FOR PEER REVIEW 3 of 17 increases in potential evapotranspiration and the multidecadal relationship between North Atlantic Oscillation (NAO) and Pacific North America (PNA) teleconnection patterns. The objectives of this work are to evaluate in the performance of the Noah-Multiparameterization (Noah-MP) model within Weather Research and Forecasting (WRF)-Hydro framework in the Northeast of the US, to provide a consistent terrestrial hydrology dataset for 37 years, and to evaluate changes in the water cycle components at the terrestrial surface that are driving streamflow changes.

Study Area
The study area ( Figure 1) corresponds to the northeastern portion of the US, with the southern border extent at approximately 34.59° latitude, and the western border at approximately −87.53° longitude. More than sixty-four million people live in the Northeast of the US near to the east coast, one of the most developed areas in the world. According to the Third National Climate Assessment report [6], the built and natural environment has shown considerably vulnerability to climate hazards and other stresses, with aging infrastructure that has already suffered from climate events related to heat waves and coastal and riverine floods.

Methods and Data
We used the WRF-Hydro framework as our hydrological model to estimate changes in the terrestrial hydrology from 1980 to 2015 using settings similar to the experimental version

Methods and Data
We used the WRF-Hydro framework as our hydrological model to estimate changes in the terrestrial hydrology from 1980 to 2015 using settings similar to the experimental version implemented in the National Flood Interoperability Experiment [19][20][21]. We evaluated the performance of the model by comparing streamflow calculations to observations by the US Geological Survey (USGS). Moreover, we studied how precipitation gets distributed between the storage (snow accumulation, soil moisture, liquid and frozen water at the canopy) and the outputs of the system (evaporation, transpiration, surface and subsurface runoff), or Input = Accumulation + Outputs. The North American Land Data Assimilation System (NLDAS-2) climate forcing data was used to feed the hydrological model, and the river routing was done using the Routing Application for Parallel Computation of Discharge (RAPID) model. For the calibration of the model and basin selection for the water budget tendency study, we used the USGS Geospatial Attributes of Gages for Evaluating Streamflow II (Gages II) streamflow observations.

North American Land Data Assimilation System (NLDAS-2)
The North American Land Data Assimilation System (NLDAS-2) was used to force the WRF-Hydro model. NLDAS-2 has a resolution of 1/8th-degree grid spacing, hourly timing, and ranges from 1 January 1979 to present (accessed on 29 January 2018: https://ldas.gsfc.nasa.gov/ nldas/NLDAS2forcing.php). Of the 16 fields in each forcing file, Noah-MP requires nine fields: downward longwave and shortwave radiation (bias corrected), U/V 10-m wind components, 2-m air temperature, and specific humidity, surface pressure, and total precipitation. The non-precipitation forcing inputs were primarily derived from the National Centers for Environmental Prediction (NCEP) North American Regional Reanalysis (NARR) (accessed on 29 January 2018: http://www.emc.ncep. noaa.gov/mmb/rreanl/). Precipitation is a product of a temporal disaggregation of a gauge-only Climate Prediction Center (CPC) analysis of daily rainfall performed directly on the NLDAS grid, including an orographic adjustment based on the widely-applied Parameter elevation Regression on Independent Slopes Model (PRISM) climatology. We used an National Center for Atmospheric Research (NCAR) common language (NCL) code provided by the WRF-Hydro research group to further interpolate the forcing data to the hydrology model resolution of 3 km (accessed on 29 January 2018: https://ral.ucar.edu/projects/wrf_hydro/regridding-scripts).

WRF-Hydro Model
We used the Weather Research Forecast Hydro extension version 3 (WRF-Hydro V3) [22] to calculate runoff. The WRF-Hydro modeling system allows the coupling of hydrological model components to an atmospheric model. The WRF-Hydro configuration (Figure 2), at the highest level, can work in two modes: coupled and uncoupled to an atmospheric model. It then provides the architecture to select among different Land Surface Models (LSM), such as Noah and Noah with multi-parameterization (MP) (hereafter Noah-MP), to solve the energy and mass balance in one-dimensional vertical land surface parameterization. Noah LSM, which has a long history of development in coupled and uncoupled to climate models, is the base for Noah-MP, which allows the user to select among multiple physics option that enables further experimentation, or to generate ensemble results without altering the code [23]. In WRF-Hydro, on the other hand, the user can select from several physic options, such as surface overland flow, saturate subsurface flow, channel routing, and reservoir routing, that provide lateral water transport that allows leaving the one-dimensional column setting. We used the WRF-Hydro uncoupled from the WRF model configuration. For the LSM, we used the Noah-MP [23,24] model. In our case, we selected the physics indicated in Table 1 For the hydronamelist, which controls the horizontal movement of the water, we only kept active the bucket groundwater module with exponential parameters that we used for calibration [22].

National Hydrography Dataset Plus version 2 (NHDplusV2) and RAPID
Although WRF-Hydro V3 is capable of routing the water through a river network, it is computationally expensive. Therefore, once we had the runoff (surface and subsurface) from WRF-Hydro, we used the RAPID model to route the water through the National Hydrological Dataset river network [25]. Similar work is described in David et al. [25,26], Salas et al. [19], and Lin et al. [27]. The RAPID model, which is very efficient and able to run on a personal computer, uses the welldocumented Muskingham method, which relies on two coefficients that are used to represent the travel time and attenuation of flood waves, K, and X, respectively. The study area corresponds to the hydrology regions Northeast, Mid-Atlantic, the northern part of the South Atlantic region, and the eastern part of the Ohio region (Visited 2 January 2018: http://www.horizonsystems.com/NHDPlus/NHDPlusV2_data.php). RAPID needs two datasets: static and dynamic. The static dataset corresponds to the network topology described in a series of comma separated values (CSV) files that indicates how the streamlines in the river network are connected, as well as the X and

National Hydrography Dataset Plus Version 2 (NHDplusV2) and RAPID
Although WRF-Hydro V3 is capable of routing the water through a river network, it is computationally expensive. Therefore, once we had the runoff (surface and subsurface) from WRF-Hydro, we used the RAPID model to route the water through the National Hydrological Dataset river network [25]. Similar work is described in David et al. [25,26], Salas et al. [19], and Lin et al. [27]. The RAPID model, which is very efficient and able to run on a personal computer, uses the well-documented Muskingham method, which relies on two coefficients that are used to represent the travel time and attenuation of flood waves, K, and X, respectively. The study area corresponds to the hydrology regions Northeast, Mid-Atlantic, the northern part of the South Atlantic region, and the eastern part of the Ohio region (Visited 2 January 2018: http://www.horizon-systems.com/NHDPlus/NHDPlusV2_data.php). RAPID needs two datasets: static and dynamic. The static dataset corresponds to the network topology described in a series of comma separated values (CSV) files that indicates how the streamlines in the river network are connected, as well as the X and K parameters for each line [26]. In the domain study, we used 324,746 flowlines with a mean length of 1.589 km and a standard deviation of 1.45 km; therefore, we can calculate streamflow in 324,746 points. The second dataset corresponds to the runoff derived, in our case, from WRF-Hydro. The land surface model operates in a gridded domain, while RAPID runs in a lumped domain. To transform the data from one domain to the next, we used a simple weighting process that allowed us to calculate the contribution of runoff to each catchment unit. This process and the tools are described by Lin et al. [28] and Salas et al. [19]. We used 318,026 catchments, with an average area of 2.13 km 2 and a standard deviation of 3.07 km 2 .

Streamflow Data, Calibration, and Spin-Up
To evaluate our streamflow calculation, we used the Geospatial Attributes of Gages for Evaluating Streamflow, version II (Gages II). The U.S. Geological Survey (USGS) maintains this dataset that consists of at least 20 complete years of streamflow (not necessarily consecutive) from 1950 to 2009. Within the Gages II, there is a group of 742 gages denominated as a reference that can be used for hydroclimatic studies since they belong to watersheds with minimal disturbance, having less than 5 percent imperviousness as measured from the National Land Cover Database (NLCD) 2006. (Visited 2 January 2018: https://water.usgs.gov/GIS/metadata/usgswrd/XML/gagesII_Sept2011.xml). From this national dataset, in the area of study, there are 1717 USGS streamflow Gages II, from this number 402/1315 correspond to reference/no-reference watersheds, respectively. Since our analysis runs from 1979 to 2015, we selected 1093 Gages II that have at least 15 years of data for the period of our work. From these 1093 gages, 191 are reference gages, and 902 are non-reference gages.
The spin-up process was carried out following Cai et al. [29] for initializing Noah-MP by running the model 50 times through the entire year of 1979. WRF-hydro, compared to other hydrology models, is computationally expensive; therefore, the number of runs to calibrate the model were limited. Additionally, manual calibration was not desirable because of the number of parameters that need to be adjusted in the model made the manual procedure impractical [30,31]. For these reasons, we calibrated the model using just one year of results from 1980. We ran the model 500 times using a Monte Carlo Markov Chain data parameter estimation as part of the Statistical Parameter Optimization Tool in Python l (SPOTPY), which is a ready-made python package [32], similar to the Model-Independent Parameter Estimation (PETS) that was used in Senatore et al. [33] to calibrate WRF-Hydro. The procedure followed is identical to previous work by References [30,31,33]. The variables used in this calibration were infiltration factor (REFTKD) between 0.5-5 [34], and SLOPE, which is a coefficient between 0.1-1.0 that modifies the drainage out of the bottom of the last soil layer. A larger surface slope implies more considerable drainage [35], saturated soil hydraulic conductivity (DKSAT) [35], and an exponential coefficient within the bucket model within WRF-Hydro/Noah-MP. For the routing using the RAPID model, we included in the calibration the K and X parameters. To reduce the influence of false and unusually extreme values, we smoothed the streamflow using a 5-day rolling average [16,36,37]. In the calibration process, we compared the streamflow results simultaneously at 191 USGS stream Gages II reference locations, minimizing the square distance sum between the Kling-Gupta efficiency (KGE) (Equation (1)) [38] and the perfect value of 1. where: There are multiple indicators to evaluate the quality of the results in the calibration/evaluation period, depending on the application of the results (i.e., calculation of high or low flows) [39]. Nash Sutcliffe efficiency (NSE) (Equation (2)), percentage bias (PCbias) (Equation (3)), and the ratio of the root mean square error to the standard deviation of measured data (RSR) (Equation (4)) are the most used. Moriasi et al. [40] concluded that, in general, models with monthly NSE > 0.5, RSR < 0.70, and PBIAS ± 25% represent streamflow satisfactorily.

Changes in Streamflow
Multiple methodologies are available to determine changes in streamflow, i.e., the center of mass that indicates the timing for when half of the water passes, the time of the peak flow, 3-day peak flow for large events analysis [37,[41][42][43] or 7-day low flow to characterize water quality, and ecosystems impacts [37,44,45]. A more robust technic that can apply to different hydrological regimes is presented in Reference [36]. This method assessed monotonic trends in time series of 5-day means. Using a 5-day period allows users to obtain similar hydrological responses in both small and large watersheds [36], and to remove the effect of short-term anomalous oscillations [37,46]. This approach reveals runoff timing changes throughout the entire year, whereas other methods focus on a single annual event, or a specific month or season [36]. In this work, we used three metrics, i.e., monthly 3-day peak flow (Q3), monthly 7-day low flow (Q7), and monotonic 5-day means (5dQ) trends, to determine the effects of climate change in streamflows. The first two metrics are straightforward: For the Q3, we used a 3-day moving average and selected the maximum value for each year; for the Q7, we used a 7-day moving average and selected the minimum value for each year. For the 5dQ, we normalized streamflow by subtracting their annual mean and dividing by their standard deviation; then we calculated 73 5dQ each year, and for each of the 73 time series, the slope of the Kendall-Theil robust line [36].

Water Budget
The water budget is the relationship between the water inputs, outputs, and accumulation in a hydrological system. Understanding the tendencies in water budget provide information on how changes in temperature and precipitation change the timing of accumulation, evaporation, and streamflow in a specific location. Several studies have focused on the changes on snow/precipitation partition [3], changes in the timing of streamflow [37], changes on snowpack [15], among others.
We calculated the tendencies in the water budget on an annual basis, as well as on a monthly basis, to understand the redistribution of water outputs and accumulation within the year. Precipitation = Output + Accumulation where: Precipitation = solid + liquid rainfall Output = evaporation + transpiration + surface + subsurface runoff Accumulation = snow accumulation + soil moisture + liquid and frozen water at the canopy.

Results
We ran WRF-Hydro from 1979 to 2015 with three goals in mind: (1) to provide a consistent hydrology model dataset for 37 years across the area of study; (2) to evaluate in the long term the performance of the model in this region; and (3) to learn about the tendencies in the water budget component during the last four decades in the Northeast of the US.

Performance of the Model
After running the model 50 times to spin-up the system, we used the restart files to calibrate WRF-Hydro and RAPID for the year 1980. We compared the daily streamflow time series for 191 USGS Gages II references for the area of study. The calibration results indicated that the model results obtained NSE > 0.5 for 136 stations and NSE > 0.4 for 160 stations. One hundred forty-six stations had a |PBias| < 25%, and 96 had a |PBias| < 15%. For RSR, 133 stations had an RSR < 0.7 ( Figure 3).
In Figure 5, we also show the performance of the model for monthly values for the period 1980 to 2015, using the nomenclature of Reference [40]. For NSE, 30 stations were very good, 77 stations were good, 55 were satisfactory, and 27 stations were unsatisfactory. For RSR, 30 stations were very good, 83 stations were good, 50 were satisfactory, and 26 stations were unsatisfactory. Finally, for PBias, 61 stations were very good, 34 stations were good, 61 were satisfactory, and 33 stations were unsatisfactory. performance of the model in this region; and (3) to learn about the tendencies in the water budget component during the last four decades in the Northeast of the US.

4.1.Performance of the Model
After running the model 50 times to spin-up the system, we used the restart files to calibrate WRF-Hydro and RAPID for the year 1980. We compared the daily streamflow time series for 191 USGS Gages II references for the area of study. The calibration results indicated that the model results obtained NSE > 0.5 for 136 stations and NSE > 0.4 for 160 stations. One hundred forty-six stations had a |PBias| < 25%, and 96 had a |PBias| < 15%. For RSR, 133 stations had an RSR < 0.7 (Figure 3).  In Figure 5, we also show the performance of the model for monthly values for the period 1980 to 2015, using the nomenclature of Reference [40]. For NSE, 30 stations were very good, 77 stations were good, 55 were satisfactory, and 27 stations were unsatisfactory. For RSR, 30 stations were very good, 83 stations were good, 50 were satisfactory, and 26 stations were unsatisfactory. Finally, for PBias, 61 stations were very good, 34 stations were good, 61 were satisfactory, and 33 stations were unsatisfactory. obtained NSE > 0.5 for 136 stations and NSE > 0.4 for 160 stations. One hundred forty-six stations had a |PBias| < 25%, and 96 had a |PBias| < 15%. For RSR, 133 stations had an RSR < 0.7 (Figure 3).  In Figure 5, we also show the performance of the model for monthly values for the period 1980 to 2015, using the nomenclature of Reference [40]. For NSE, 30 stations were very good, 77 stations were good, 55 were satisfactory, and 27 stations were unsatisfactory. For RSR, 30 stations were very good, 83 stations were good, 50 were satisfactory, and 26 stations were unsatisfactory. Finally, for PBias, 61 stations were very good, 34 stations were good, 61 were satisfactory, and 33 stations were unsatisfactory.  Table 4 from Reference [40].

Changes in Streamflow and Water Budget Components
Considering the extent of our model, we increased the number of basins in which we did our analysis by comparing our results to 889 USGS Gage II references and non-references. From these 889 points, we selected the 242 points where our model's results have a KGE above 0.65 for 35+ year time series; Figure 6 shows their geographical distribution. We estimated tendencies for Q3, Q7, and 5 , and the water budget component (input, output, and accumulation) tendency. For the streamflow, we aggregated the result in three categories by area and three categories by latitude. With the former, we wanted to know if larger watersheds can reduce the effects of climate change, while the latter was  Table 4 from Reference [40].

Changes in Streamflow and Water Budget Components
Considering the extent of our model, we increased the number of basins in which we did our analysis by comparing our results to 889 USGS Gage II references and non-references. From these 889 points, we selected the 242 points where our model's results have a KGE above 0.65 for 35+ year time  Figure 6 shows their geographical distribution. We estimated tendencies for Q3, Q7, and 5dQ, and the water budget component (input, output, and accumulation) tendency. For the streamflow, we aggregated the result in three categories by area and three categories by latitude. With the former, we wanted to know if larger watersheds can reduce the effects of climate change, while the latter was designed to estimate if there is a south-north tendency gradient in the streamflow. We did this for the model results, as well as for the USGS observations, to show the ability of the model to capture such trends. Figure 5. Nash Sutcliffe efficiency, root mean square error (RMSE) -observations standard deviation ratio, and percentage bias monthly streamflow for the period of 1980-2015, according to Table 4 from Reference [40].

Changes in Streamflow and Water Budget Components
Considering the extent of our model, we increased the number of basins in which we did our analysis by comparing our results to 889 USGS Gage II references and non-references. From these 889 points, we selected the 242 points where our model's results have a KGE above 0.65 for 35+ year time series; Figure 6 shows their geographical distribution. We estimated tendencies for Q3, Q7, and 5 , and the water budget component (input, output, and accumulation) tendency. For the streamflow, we aggregated the result in three categories by area and three categories by latitude. With the former, we wanted to know if larger watersheds can reduce the effects of climate change, while the latter was designed to estimate if there is a south-north tendency gradient in the streamflow. We did this for the model results, as well as for the USGS observations, to show the ability of the model to capture such trends.

Three-Day Peak Flow (Q3)
The model overestimated the tendencies for Q3 ( Figure 7); however, it could capture the general behavior of the changes. When we separated the results by latitude, the Q3 behaved similarly from south to north; however, when we compared by area, we could see that we had a result similar to

Three-Day Peak Flow (Q3)
The model overestimated the tendencies for Q3 ( Figure 7); however, it could capture the general behavior of the changes. When we separated the results by latitude, the Q3 behaved similarly from south to north; however, when we compared by area, we could see that we had a result similar to Chezik et al. [16] since larger surfaces can buffer or reduce the effects of climate variability and do not show extreme changes (on average) compared to small catchments. The Q3 flows increased more during the late summer and early autumn for basins with areas lower than 250 km 2 . Chezik et al. [16] since larger surfaces can buffer or reduce the effects of climate variability and do not show extreme changes (on average) compared to small catchments. The Q3 flows increased more during the late summer and early autumn for basins with areas lower than 250 km 2 .

Seven-Day Low Flow (Q7)
Trends in the observed and modelled results indicate there were more substantial changes in the Q7 for winter, especially for small catchments (Figure 8). We observed that larger basins had lower trends; although, in general, the trends indicated that, on average, there was an increase in low flows, with a favorable north-south gradient that coincided with the annual Q7 tendencies reported by Kam and Sheffield [18]. April was the only month that presented negative trends, which were captured in the model results. However, the model also estimated negative trends for February in middle and

Seven-Day Low Flow (Q7)
Trends in the observed and modelled results indicate there were more substantial changes in the Q7 for winter, especially for small catchments (Figure 8). We observed that larger basins had lower trends; although, in general, the trends indicated that, on average, there was an increase in low flows, with a favorable north-south gradient that coincided with the annual Q7 tendencies reported by Kam and Sheffield [18]. April was the only month that presented negative trends, which were captured in the model results. However, the model also estimated negative trends for February in middle and southern latitudes, which did not match the observations.

Seven-Day Low Flow (Q7)
Trends in the observed and modelled results indicate there were more substantial changes in the Q7 for winter, especially for small catchments (Figure 8). We observed that larger basins had lower trends; although, in general, the trends indicated that, on average, there was an increase in low flows, with a favorable north-south gradient that coincided with the annual Q7 tendencies reported by Kam and Sheffield [18]. April was the only month that presented negative trends, which were captured in the model results. However, the model also estimated negative trends for February in middle and southern latitudes, which did not match the observations.

Five-Day Means (5dQ)
The model compared very well with the USGS observations for the 5dQ (Figure 9). We identified two major regions of change in streamflow: The first occurs during the months of November, December, and January, when streamflow volumes have increased, with an abrupt drop around February. These increases were larger in the northern basins, with the opposite happening during the February drop. We then identified that smaller basins had larger fluctuations in trends during the spring which were attenuated as the basin areas increased. Finally, basins located further north had more available water during the summer, but in general, the analysis indicated that more water was available through the year, except for the middle of the spring. The model also showed that there was a negative trend around April for middle and northern latitudes, which was not detected in the observations; however, this seemed to be due to a large negative trend that the NLDAS-2 precipitation had in the same month (Figure 10a). during the spring which were attenuated as the basin areas increased. Finally, basins located further north had more available water during the summer, but in general, the analysis indicated that more water was available through the year, except for the middle of the spring. The model also showed that there was a negative trend around April for middle and northern latitudes, which was not detected in the observations; however, this seemed to be due to a large negative trend that the NLDAS-2 precipitation had in the same month (Figure 10a).

Water Budget
WRF-Hydro results were cumulative for the component of the water budget. Therefore, to obtain changes in a month, we had to subtract from the last day of the month when analyzing the results for the last day of the previous month. The water budget calculation used is as follows: the balance inputs = outputs + accumulation. The input in this study was precipitation (liquid and solid) corresponding to NLDAS-2 total precipitation, which originally drove the hydrological model (Figure 10a). The outputs were evapotranspiration (ET) (Figure 10e), surface runoff (Figure 10f), underground runoff (Figure 10b), and bare soil evaporation (Figure 10c). Finally, the accumulation was represented by canopy water content (liquid and frozen), which was not shown here because was very small, soil moisture (Figure 10g), snow water equivalent (SWE) monthly accumulations (Figure 10d), and SWE total accumulation (Figure 10h). When we calculated the annual water balance, the accumulation tended toward zero, which was expected and desired from the model. Precipitation tendencies were mixed, with positive larger tendencies occurring during the summer and fall, except for November, and negative tendencies occurring during the middle of the winter to the middle of the spring, showing a clear change in the distribution of the precipitation toward the warmer seasons. Because of this, more water was available for evapotranspiration for the

Water Budget
WRF-Hydro results were cumulative for the component of the water budget. Therefore, to obtain changes in a month, we had to subtract from the last day of the month when analyzing the results for the last day of the previous month. The water budget calculation used is as follows: the balance inputs = outputs + accumulation. The input in this study was precipitation (liquid and solid) corresponding to NLDAS-2 total precipitation, which originally drove the hydrological model (Figure 10a). The outputs were evapotranspiration (ET) (Figure 10e), surface runoff (Figure 10f), underground runoff (Figure 10b), and bare soil evaporation (Figure 10c). Finally, the accumulation was represented by canopy water content (liquid and frozen), which was not shown here because was very small, soil moisture (Figure 10g), snow water equivalent (SWE) monthly accumulations (Figure 10d), and SWE total accumulation (Figure 10h). When we calculated the annual water balance, the accumulation tended toward zero, which was expected and desired from the model. Precipitation tendencies were mixed, with positive larger tendencies occurring during the summer and fall, except for November, and negative tendencies occurring during the middle of the winter to the middle of the spring, showing a clear change in the distribution of the precipitation toward the warmer seasons. Because of this, more water was available for evapotranspiration for the warmer months, which showed that there was a positive trend in most of the catchments to produce more evapotranspiration. For the runoff, there was a change in the fraction of runoff that went underground or stayed at the surface. Underground runoff decreased almost the entire year, except during January when less snow was accumulated and there was probably more water available to infiltrate. On the other hand, surface runoff increased, except for the early spring in mid-latitudes, which is explained by a reduction in precipitation, but the tendencies suggested that the increase in precipitation during the warm months went mostly to surface runoff. This change was also suggested by the tendencies in soil moisture that decreased for most of the year, which was a consequence of larger canopy evapotranspiration and soil evaporation and lower infiltration. For the SWE, we present two plots in Figure 10 (bottom row), the plot at the left showing the tendency for the total accumulated snow, which decreases for the entire season. The plot at the right shows the tendencies in the monthly accumulation for what we estimate to be the changes in snow accumulation during the month. The months where the trends are negative are straightforward for interpretation as a reduction in the snow accumulated during that month. However, in the months with positive trends, there is a mix of processes that explain the trend. First, the snow falls also decreased (less total precipitation and higher temperatures); however, what made it positive was that in those months when the snow started melting, there was less snow to melt due to the accumulation deficit from the previous month. Therefore, the normal negative changes in accumulation were reduced. present two plots in Figure 10 (bottom row), the plot at the left showing the tendency for the total accumulated snow, which decreases for the entire season. The plot at the right shows the tendencies in the monthly accumulation for what we estimate to be the changes in snow accumulation during the month. The months where the trends are negative are straightforward for interpretation as a reduction in the snow accumulated during that month. However, in the months with positive trends, there is a mix of processes that explain the trend. First, the snow falls also decreased (less total precipitation and higher temperatures); however, what made it positive was that in those months when the snow started melting, there was less snow to melt due to the accumulation deficit from the previous month. Therefore, the normal negative changes in accumulation were reduced.

Discussion
(a)

Model Performance
The model has good results across the domain of study, with better statistic scores for monthly streamflow than smoothed daily streamflow. The model performance, according to traditional standards [40], for 150 out of 191 stations is considered at least satisfactory for the validation period for smoothed daily values, and 160 stations are at least satisfactory for monthly results. However, when it comes to determining streamflow trends, the traditional performance evaluation criteria do not ensure that the model will capture well the slope of the trends. The dispersions around the perfect line in Figure 11 are not described by NSE, PCBias, correlation, or KGE, which are also dependent on the extent of the dataset, in our case 36 years. Therefore, there is a need to come up with a calibration metric that includes the estimation of the trends which are a common indicator to describe climate change. For example, in Figure 11, we plot the results filtered by KGE. The left graph shows the tendency values for KGE > 0.65, the middle KGE > 0.5, and the right KGE > 0.8. However, the correlation between the tendency for observation and the model are 0.66, 0.6, and 0.6, which indicates that having a better KGE does not imply that we will have a better estimation of the trend. What it is consistent is that positive larger trends are more often statistically significant (p < 0.05) than trends near zero, which are the majority of the trends. line in Figure 11 are not described by NSE, PCBias, correlation, or KGE, which are also dependent on the extent of the dataset, in our case 36 years. Therefore, there is a need to come up with a calibration metric that includes the estimation of the trends which are a common indicator to describe climate change. For example, in Figure 11, we plot the results filtered by KGE. The left graph shows the tendency values for KGE >0.65, the middle KGE >0.5, and the right KGE >0.8. However, the correlation between the tendency for observation and the model are 0.66, 0.6, and 0.6, which indicates that having a better KGE does not imply that we will have a better estimation of the trend. What it is consistent is that positive larger trends are more often statistically significant (p < 0.05) than trends near zero, which are the majority of the trends.

Streamflow Tendency
To evaluate the tendencies of streamflow, we use three metrics: monthly three-day peak flow (Q3), monthly seven-day low flow (Q7), and five-day means (5 ). Q3 is a good indicator for peak flow and therefore useful for flood assessments; Q7 is a good indicator of low flows tendencies that can be useful in water management and ecological applications; and 5 is a good indicator of the changes, shifting in time and amplitude, of streamflow during a year. The three indicators increase during the winter months, which is a consequence of less solid precipitation, despite total precipitation increasing for that period. Our model results capture well the direction of changes during the last 40 years; however, it has a positive bias in the trends calculated for the summer Q3. Q3 also increases for the whole region in the month of October, which may be a consequence of tropical storms moving further north as a consequence of climate change during that period. The Northeast has experienced serious catastrophes in the late summer to early fall, with tropical storms and even hurricanes crossing the region. Small basins also experience an increase in high flows at the beginning of spring. Q7, on the other hand, has slightly increased, with a more pronounced augmentation in the winter. Q7 also has a major increase in March for small basins, which is not the case for larger basing, which suggests that, in the Northeast, larger basins can circumvent the effects of short term rainfall associated with climate change, which is similar to what Reference [16] found on the West Coast. For 5 , the most significant positive trends are found in the winter, which can be observed in a reconstruction of the hydrographs for the years 1981 and 2015 ( Figure S1). We use the Kendall-Theil slope and intercept to create the hydrographs and see what the differences are in streamflow between 1981 and 2015. In the hydrographs for 1981, we can clearly see that after mid-

Streamflow Tendency
To evaluate the tendencies of streamflow, we use three metrics: monthly three-day peak flow (Q3), monthly seven-day low flow (Q7), and five-day means (5dQ). Q3 is a good indicator for peak flow and therefore useful for flood assessments; Q7 is a good indicator of low flows tendencies that can be useful in water management and ecological applications; and 5dQ is a good indicator of the changes, shifting in time and amplitude, of streamflow during a year. The three indicators increase during the winter months, which is a consequence of less solid precipitation, despite total precipitation increasing for that period. Our model results capture well the direction of changes during the last 40 years; however, it has a positive bias in the trends calculated for the summer Q3. Q3 also increases for the whole region in the month of October, which may be a consequence of tropical storms moving further north as a consequence of climate change during that period. The Northeast has experienced serious catastrophes in the late summer to early fall, with tropical storms and even hurricanes crossing the region. Small basins also experience an increase in high flows at the beginning of spring. Q7, on the other hand, has slightly increased, with a more pronounced augmentation in the winter. Q7 also has a major increase in March for small basins, which is not the case for larger basing, which suggests that, in the Northeast, larger basins can circumvent the effects of short term rainfall associated with climate change, which is similar to what Reference [16] found on the West Coast. For 5dQ, the most significant positive trends are found in the winter, which can be observed in a reconstruction of the hydrographs for the years 1981 and 2015 ( Figure S1). We use the Kendall-Theil slope and intercept to create the hydrographs and see what the differences are in streamflow between 1981 and 2015. In the hydrographs for 1981, we can clearly see that after mid-November, the streamflow decreases, which is due to the fact that, at the time, the solid precipitation fraction was larger; however, if we plot what would happen 34 years later, following the Kendall-Theil slope, the streamflow continues to increase until January, and then drops dramatically in February. Also, for 5dQ, abrupt oscillations in tendencies decrease as the basins get larger.

Water Budget Tendencies
NLDAS-2 precipitation indicates that there is a positive tendency, which means that there is more water input in the Northeast region of the US. However, the extra water entering the system is not being distributed evenly among the output and accumulation component of the water budget. Streamflow volumes are increasing for most of the year, except for the end of winter, which is a consequence of less snow accumulation during the winter, and in November, which is the result of a decrease in precipitation in late fall. Therefore, in general, there is more water running in the rivers; however, the partition between underground runoff and surface runoff is altered. It seems that underground flow tends to decrease while surface runoff increases for most of the year. This translates to faster responses from the basins, which means that the concentration-time of the water reduces. This is the result of more intense rain that does not have enough time to infiltrate, which is suggested by the fact that soil moisture also decreases for most of the year. Hence, the soils can hold more water. The decrease in underground runoff is also enhanced by more canopy evapotranspiration and soil evaporation due to the increase in temperatures, which is more dramatic in southern latitudes. SWE volume decreases, while the accumulation season gets shorter. This translates into a decrease in spring runoff in the north portion of the study area where basins rely on snow accumulation to supply fresh water in spring.
In the construction of WRF-Hydro model, we did not include changes in land cover, which could be significant in 40 years of natural [47] and anthropogenic disturbances. Therefore, the trends reported here are only based on changes in climate from 1979 to 2016.

Conclusions
In this study, we present the application of the WRF-Hydro framework to estimate changes in the water budget for the last 40 years in the Northeast of the US. The model, when compared to streamflow, performs at least satisfactorily for most of the region, capturing aggregated tendencies and comparing well when we use traditional criteria to compare model and observations. Changes in climate during the last 40 years has brought a redistribution of the water from precipitation within the different components of the water budget at the surface in the Northeast of the US. High, low and medium flow tendencies are mostly positive. Therefore, there is more water available. However, these changes are not evenly distributed during the year, nor between basins of different latitudes. More water is available during winter, but less during the spring. This is a consequence of the decrease in snow accumulation due to temperature increases in the north, and a decrease in precipitation in the south during the second half of the winter.
Although soils are becoming drier, infiltration is decreasing, so the surface runoff is increasing, and therefore, the time of responses of the watersheds are decreasing. Evapotranspiration is also increasing, which will produce more losses from the soil moisture, and less water in the subsurface is available for subsurface runoff or depth infiltration, affecting the storage during the year.
Finally, the effects of climate change in high and low flows seem to be smaller in larger basins. Therefore, efforts to mitigate the impacts of extreme events should be emphasized in smaller watersheds. Funding: The project described in this publication was supported by Grant or Cooperative Agreement No. G12AC00001 from the United States Geological Survey. Its contents are solely the responsibility of the authors and do not necessarily represent the views of the Northeast Climate Adaptation Science Center (NECASC) or the USGS. This manuscript is submitted for publication with the understanding that the United States Government is authorized to reproduce and distribute reprints for Governmental purposes.