Watershed-Scale, Probabilistic Risk Assessment of Water Resources Impacts from Climate Change

A framework for the assessment of relative risk to watershed-scale water resources from systemic changes is presented. It is composed of two experiments, or pathways, within a Monte Carlo structure and provides quantification of prediction uncertainty. One simulation pathway is the no change, or null hypothesis, experiment, and the other provides simulation of the hypothesized system change. Each pathway uses a stochastic weather generator and a deterministic water balance model. For climate change impact analysis, the framework is calibrated so that the differences between thirty-year average precipitation and temperature pathway values reproduce climate trends. Simulated weather provides forcing for identical water balance models. Probabilistic time histories of differences in actual evapotranspiration, runoff, and recharge provide likelihood per magnitude change to water resources availability. The framework is applied to a semi-arid watershed in Texas. Projected climate trends for the site are a 3 °C increase in average temperature and corresponding increase in potential evapotranspiration, no significant change in average annual precipitation, and a semi-arid classification from 2011–2100. Two types of water balance model are used in separate applications: (1) monthly water balance and (2) daily distributed parameter. Both implementations predict no significant change, on average, to actual evapotranspiration, runoff, or recharge from 2011–2100 because precipitation is unchanged on average. Increases in extreme event intensity are represented for future conditions producing increased water availability during infrequent events.


Introduction
Climate is the weather of a place averaged over an interval [1]. Weather includes the daily events that occur in the atmosphere, and it changes across a much shorter period like minutes to weeks [2]. Three-decade averages of weather measures, called Climate Normals, provide a place-and period-specific climate description [3].
Projections of future climate have heightened importance because of concern with anthropomorphic drivers pushing climate variations beyond the bounds of historical observations. Across the planet, temperatures are expected to increase by 1.5 • C between 2030 and 2052 relative to pre-industrial levels. Changes to precipitation are projected to be region dependent with increased precipitation intensity in some areas and increased probability of drought in several regions [4].
These future climate projections are based on the global climate model (GCM) simulation results. GCMs are coupled atmosphere-ocean, general circulation models that simulate global weather with a sub-daily time step. GCMs simulate future weather to generate a description of future climate. GCM results have provided climate change projections for analysis of water resources impacts in diverse geographic regions across varying temporal and spatial scales [5][6][7][8][9][10][11][12][13].
GCM-simulated weather can be used for analysis of future water resources because weather conditions drive the availability of water resources. Precipitation (P) is the water source, and evapotranspiration (ET) is the primary subtraction of water. ET includes water lost from the leaves of plants, or transpiration, and water that evaporates from the soil surface, puddles, ponds, lakes, and rivers [14]. It is dependent on vegetation, soil type, and weather; ET tends to increase with increasing temperature.
The first-order expectation is that increases in precipitation amount and intensity will increase the amount of water in the water budget and increases in temperature will remove more water from the water budget through increased ET. However, decadal projections of climate are insufficient to predict impacts to water resources. If precipitation intensity increases during colder months when ET is lower, water availability could increase even though average precipitation is constant and average temperature increases by several degrees.
Water availability is defined as P − ET. Only when P exceeds ET (P − ET > 0) is water available to contribute to runoff and recharge. To predict the future impacts to water availability, a water balance calculation is used with projected climate to capture the interplay of precipitation and temperature forcing and the influence of vegetation, land cover, and soils on the water budget. Water balance models provide one form of water budget calculation and are procedural models that estimate the balance between incoming water to the watershed from precipitation and outflowing water from the watershed related to the processes of ET, stream flow, and groundwater recharge [14,15].
Future weather, climate, and water availability are uncertain. Unknown details of future greenhouse gas emissions, incomplete understanding of some processes that control the evolution of climate and limited physical process representation due to computational constraints generate weather and climate uncertainty [16]. Because future water availability is determined by future weather, climate uncertainty is propagated through deterministic water balance calculations to produce a range of possible water resources futures.
Uncertainty in future simulation results can be partitioned into scenario, response, and structural components. Scenario uncertainty is the lack of knowledge about what future levels of boundary forcing should drive model simulations. Response, or epistemic, uncertainty is lack of knowledge about how the current and future system will respond to a particular scenario or estimated set of future forcing conditions. Structural uncertainty is lack of knowledge of the optimal form of equations describing complex physics and numerical solution methods [16].
Ensemble modeling approaches are employed to describe and quantify future result uncertainty. An ensemble is a collection of simulation results from multiple models and perhaps multiple model versions that include different physical forcing like different future emissions scenarios in GCMs [16,17]. The goal of an ensemble is to capture a range of futures that brackets the possible within a cone of uncertainty. In analysis of ensembles, prediction of the same result by multiple models is often assumed to confirm an increased likelihood for that future, and the likelihood for a future is calculated as the proportion of results that estimate a future [16].
Ensemble GCM results have been used in conjunction with various types of water balance calculations to predict impacts to water availability at global, regional, and watershed scales [18][19][20][21][22][23]. These studies identify water resources impacts through comparison of water budgets simulated using historical weather and ensembles of GCM results composed of less than 100 members. In the comparison, each GCM ensemble member, a model and forcing scenario combination, drives a future water budget calculation to produce a result ensemble composed of the same number of future realizations as GCM ensemble members. Uncertainty is characterized by the spread of water budget results in the ensemble, and climate uncertainty generates most of the uncertainty in future water budget results. GCM simulated weather is directly used as water budget calculation inputs, which discretizes a continuous future climate cone of uncertainty into a limited number of weather-forcing time series.
It is not expected that GCM simulations will accurately predict the weather for a particular day or month several years in the future. The goal of GCM simulations is to project future Climate Normals pertaining to a hypothesized trend in future greenhouse Water 2021, 13, 40 3 of 40 gas emissions. Consequently, improved quantification and description of future water resources uncertainty can be obtained using a continuous representation of the future climate cone of uncertainty generated from an ensemble of thousands of realizations of future weather. A more complete representation of future climate uncertainty will propagate through deterministic water balance calculations to generate a more robust depiction of future water resources uncertainty.
A probabilistic simulation framework, or algorithm, is presented and implemented to isolate watershed-scale impacts to water resources from systemic changes to the terrestrial water cycle. Impacts to water availability from climate change are analyzed for a small (less than 519 km 2 or 200 mi 2 ) watershed. Within the framework, two weather generators create weather forcing for independent water balance models. A weather generator is a statistical model of daily weather sequences designed to reproduce statistical properties of observed weather. One weather generator portrays historical observations, and one represents future climate from ensemble GCM results. Comparison of water budgets calculated using historical weather with those determined from GCM-derived future climate provide impacts to water resources from climate change.
Actionable outputs are probabilistic time histories of differences in water availability, split into recharge and runoff components, between historical and future conditions. The combination of projected changes with an associated likelihood provides a risk assessment, planning, and forecasting resource for water resources managers. Framework-based analysis of risk includes probability and consequences. Consequences are changes to water availability, and likelihoods for each consequence are calculated as part of the representation.
Key strengths of this framework implementation relative to previous work are generated through using two weather generators in place of historical observations and ensemble GCM results. The comparative weather generator formulation provides for: (1) Thousands of realizations of the system rather than just one realization for each ensemble member which provides detailed depiction of the future climate cone of uncertainty and results in enhanced projection of the future water availability cone of uncertainty; (2) explicit isolation of water availability impacts from climate change through probabilistic comparison of two independent and identical water balance models with different weather forcing; and (3) comparative statistical analysis of historical observations and ensemble GCM results that allows identification of structural uncertainty issues in downscaled GCM results and elimination of these issues from framework results.
In the case study implementation, identified climate trends are a 3 • C increase in average temperature with a corresponding increase in potential evapotranspiration, no significant change in average annual precipitation, and a semi-arid classification from 2011-2100. Structural uncertainty issues of synthetic drizzle and underprediction of extreme event intensity were discovered in downscaled, GCM results and eliminated from the weather generator representation. In terms of water availability, no significant change, on average, is projected for actual evapotranspiration, runoff, or recharge from 2011-2100 because precipitation is unchanged on average. However, increases in extreme event intensity are represented for future conditions resulting in estimates for increased water availability and ET during infrequent events.

Methods and Data
Methods include the implementation of weather generator and water balance models within the framework and techniques used for result presentation. The impact assessment framework is applied to a site in west-central Texas. Data employed in the study include physical characteristics, historical weather observations, and GCM simulated, future weather for the study site (Additional details concerning methods and the study site are available in Appendix A).
framework is applied to a site in west-central Texas. Data employed in the study include physical characteristics, historical weather observations, and GCM simulated, future weather for the study site (Additional details concerning methods and the study site are available in Appendices A and B).

Study Site Data
The 471 km 2 Dolan Creek Watershed in Val Verde County, TX, USA was selected as a study site for framework application, see Figures 1 and S1. Dolan Creek is the primary stream in the watershed and provides the surface outlet. It is ephemeral with the exception of the downstream-most reach where a number of springs, including Dolan Springs, provide flow to the stream throughout the year (see Figures S1 and S2). USGS Gauge 08449100 is located on Dolan Creek near the watershed outlet and the confluence of Dolan Creek and Devils River. This stream gauging station has been in operation since November 2011. A rain gauge is collocated with the stream gauge, was installed at the same time, and is the only rain gauge in the watershed.  The label for each grid cell, used in this study, is displayed in the center of the cell. PRISM grid and labels are in blue; "LOCA Archive" grid and labels are in orange. Grid index numbering for LOCA and PRISM is the numbering used in this study and is not a reference identifier. The site watershed is northwest of San Antonio, TX, USA.
The site is in karst terrain. Dolan Creek watershed is at the southwestern margin of the Edwards Plateau, a resistant carbonate upland of nearly flat-lying limestone and dolostone with thin soils, caprock mesas, and dry arroyos [24]. In terms of hydrologic soil properties, ridgetops and valley/mesa sides (92-93% of total watershed area) have low water storage capacity, shallow bedrock, and high runoff potential while the streambeds and valleys (7-8% of total watershed area) provide high infiltration rates, relatively deep bedrock, and near surface water storage capacity [25]. Most of the watershed, 92%, is naturally impervious with shallow to no soil because of the rocky nature of the mesadominated landscape. However, the site watershed is in karst terrain and valley bottomareas exhibit enhanced secondary porosity from limestone dissolution. This secondary porosity creates the relatively large storage capacity and elevated permeability which produce rapid infiltration in valleys and dry stream beds.
Beck et al. [26] provide global maps of Köppen-Geiger climate classifications at 1-km resolution for current conditions and for future conditions projected through 2100 from GCM simulations. The study area is mapped in the hot, semi-arid category (BSh) under both current and future conditions. The Köppen-Geiger climate classification system is a widely used vegetation-based empirical system which is based on the idea that climate is best defined by native vegetation [27].
2.1.1. Historical Weather Data PRISM (parameter-elevation relationships on independent slopes model) gridded climate data [28] provide weather parameter observations for the study. PRISM daily time series for precipitation, maximum temperature, and minimum temperature were acquired for 1 January 1981 through 4 May 2019.
PRISM climate variables are spatially interpolated weather elements and are not technically direct observations or point measurements. It is an interpolation method that seeks to produce gridded climate elements from a network of observation stations that reflects the known spatial climate patterns across the continental United States (CONUS). PRISM has been shown to provide improved climate grids relative to WorldClim and Dayment, and PRISM gridded climate estimates display the largest relative improvement in mountainous and coastal areas of the western United States [29]. Figure 1 displays the PRISM grid in the vicinity of the site; PRISM grid cells are approximately 18.6 km 2 in area. Multiple grid cells in the PRISM data set are completely or partially within the watershed. Each grid cell has a time series of observations for daily precipitation depth, maximum temperature, and minimum temperature. Figure 2 presents the 1981-2010 Climate Normals calculated from area-weighted PRISM data for the watershed. Area-weighting involves weighting each grid cell value by the proportional area of the grid cell, within the watershed, to the total watershed area to produce a single value characteristic of the entire watershed from multiple grid cells (Additional details of this process and weight definitions are available in Appendix A). Average monthly potential evapotranspiration for a reference crop, or monthly ET o , is also plotted on Figure 2. Average monthly potential evapotranspiration exceeds precipitation for every month confirming that the site is an arid environment.

Downscaled GCM Simulation Results
Ensemble GCM simulation results provide future weather and climate for this study. Because the study site is a small watershed, global-scale model results need to be downscaled prior to use in framework formulation. Spatial downscaling imparts enhanced resolution spatial structure to GCM results. "There are two predominant types of spatial downscaling-dynamical and statistical. Dynamical methods use GCM output as boundary conditions for finer resolution regional climate models, linking process-based physical relationships between small-and large-scale behavior. While these methods may capture (possibly unobserved) nonstationary behaviors, they are computationally expensive and are not practical for downscaling of the large ensembles of century long GCM runs that are of interest in many modern studies, on the continental scale. Statistical downscaling methods rely on historically observed statistical relationships between coarse-and fine-spatial scale patterns [31] (p.4)." Statistically downscaled, ensemble GCM simulation results from Coupled Model Intercomparison Project Phase 5 (CMIP5) are publicly available for the study site from the "Downscaled CMIP3 and CMIP5 Climate and Hydrology Projections" archive [31,32]. Three different statistical downscaling methodologies are available in this archive: (1) BCSD: Bias correction with spatial disaggregation [33]; (2) BCCA: Bias correction with constructed analogs [5]; and, (3) LOCA: Localized Constructed Analogs [34]. Only CMIP5, LOCA downscaled ensembles from "Downscaled CMIP3 and CMIP5 Climate and Hydrology Projections" archive, hereafter the "LOCA Archive", are used in this study.
LOCA is an example of a constructed analog method. Constructed analogs are created by finding a set of observed days that are like the GCM day when coarsened to the GCM grid. LOCA improves upon other constructed analog methods through selection of analog days in a synoptic-scale region around the downscaling point rather than across the entire domain, which results in different model points using different analog days. These relative improvements reduce issues of spurious drizzle generation and damping of local precipitation extremes that are known to occur in downscaled, GCM simulation results [31,34].
The "LOCA Archive" provides simulated values for daily precipitation depth, maximum temperature, and minimum temperature for 1950-2099. Results are provided on a Error bars shown on precipitation depth columns are plus and minus one standard deviation. Historic average monthly ET o depth was obtained from the Texas A&M Agrilife Extension [30] for Del Rio, Texas which is south of the study site. ET o is the potential evapotranspiration for a reference crop, typically a short grass. For average conditions, excess evaporative capacity exists for every month of the year.

Downscaled GCM Simulation Results
Ensemble GCM simulation results provide future weather and climate for this study. Because the study site is a small watershed, global-scale model results need to be downscaled prior to use in framework formulation. Spatial downscaling imparts enhanced resolution spatial structure to GCM results. "There are two predominant types of spatial downscaling-dynamical and statistical. Dynamical methods use GCM output as boundary conditions for finer resolution regional climate models, linking process-based physical relationships between small-and large-scale behavior. While these methods may capture (possibly unobserved) nonstationary behaviors, they are computationally expensive and are not practical for downscaling of the large ensembles of century long GCM runs that are of interest in many modern studies, on the continental scale. Statistical downscaling methods rely on historically observed statistical relationships between coarse-and fine-spatial scale patterns [31] (p. 4)." Statistically downscaled, ensemble GCM simulation results from Coupled Model Intercomparison Project Phase 5 (CMIP5) are publicly available for the study site from the "Downscaled CMIP3 and CMIP5 Climate and Hydrology Projections" archive [31,32]. Three different statistical downscaling methodologies are available in this archive: (1) BCSD: Bias correction with spatial disaggregation [33]; (2) BCCA: Bias correction with constructed analogs [5]; and, (3) LOCA: Localized Constructed Analogs [34]. Only CMIP5, LOCA downscaled ensembles from "Downscaled CMIP3 and CMIP5 Climate and Hydrology Projections" archive, hereafter the "LOCA Archive", are used in this study.
LOCA is an example of a constructed analog method. Constructed analogs are created by finding a set of observed days that are like the GCM day when coarsened to the GCM grid. LOCA improves upon other constructed analog methods through selection of analog days in a synoptic-scale region around the downscaling point rather than across the entire domain, which results in different model points using different analog days. These relative improvements reduce issues of spurious drizzle generation and damping of local precipitation extremes that are known to occur in downscaled, GCM simulation results [31,34].
The "LOCA Archive" provides simulated values for daily precipitation depth, maximum temperature, and minimum temperature for 1950-2099. Results are provided on a 1/16 • latitude-longitude grid, displayed on Figure 1; each grid cell is approximately 42 km 2 in area. Multiple grid cells are completely or partially within the study area. These downscaled climate projections are an ensemble of results from 32 models and two emissions scenarios (Table S1 lists the GCMs by organization and the emissions scenario that are in the "LOCA Archive.") that provides 64 simulated time series of daily precipitation, minimum air temperature, and maximum air temperature per grid cell. Each of the 64 realizations is assumed to be equally likely.
Representative concentration pathway (RCP) 4.5 and 8.5 emissions scenarios were available in the "LOCA Archive" at the time of this study. RCP4.5 and RCP8.5 are the two, future projection (2006-2300) simulations in CMIP5 core, long-term experiments. RCP8.5 is a high emissions scenario and RCP4.5 is a midrange mitigation emissions scenario [35]. RCP4.5 stabilizes radiative forcing at the 4.5 W/m 2 level in the year 2100 and the 4.5 W/m 2 forcing threshold is not exceeded during the scenario [36]. RCP8.5 assumes no global climate change impact limitation policy framework along with high energy demand and corresponding emissions. In RCP8.5, emissions increase across the simulation time frame to 8.5 W/m 2 at 2100 [37].

Technical Approach
The goal of the framework, presented on Figure 3, is to analyze relative impact of systemic changes, including climate change, on watershed-scale water resources and to explicitly identify the uncertainty associated with projected impacts. It is composed of two separate experiments, or pathways, that are executed jointly within a probabilistic simulation framework. One internal pathway is the null hypothesis experiment (H0) that represents "no change" or historical conditions. The other pathway is the alternative hypothesis experiment (H1), representing systemic change.
Each pathway contains two models linked in series; a weather generator model generates synthetic input weather forcing for a watershed water balance model. The framework contains four models total (two simulation experiments, each with two models). To isolate projected impacts resulting from systemic variations, one model type, either the weather generator or the water balance model, should be different between the pathways. The other model type should be identical in both pathways. To assess changes to runoff and recharge resulting from future climate trends, identical water balance models are used in both experiments, and different weather generator models are employed in each pathway.
Probabilistic simulation is the process of explicitly representing uncertainty by describing model inputs as probability distributions. A probability distribution is a mathematical representation of the relative likelihood of an unknown variable having a specific value. It provides a means to quantify uncertainty related to the possible values of an unknown variable and a summary description of a sample of a larger population.
The framework uses Monte Carlo techniques to implement probabilistic simulation and analysis. Monte Carlo simulation means that the entire framework is simulated many times. Each simulation is a realization of the system; using multiple realizations provides an ensemble modeling approach. For each realization, uncertain input parameters are randomly sampled from probability distributions.

Stochastic Weather Generators
A weather generator is a statistical model of daily weather sequences designed to simulate key statistical properties of meteorological records like means, variances, frequencies, and extreme occurrences. Weather generators are commonly used in water resources planning and design, and in agricultural, ecosystem, or climate change analysis applications because meteorological data may lack temporal or spatial coverage for the area of interest [38]. Several weather generators exist as generally applicable computer programs that are available for creation of synthetic, site-specific, weather sequences. Examples of available weather generators include WGEN [39,40], LARS-WG [41,42], WeaGETS [43], and MulGETS [44]. Weather generators have been used to generate climate change scenarios [45]. Ensemble climate change projections obtained from GCM simulation results can be used, in place of measurements, to derive the key statistical properties that underlie the stochastic formulation inherent in a weather generator [46,47].
The purpose of weather generator models within the framework is to provide weather forcing to water balance model calculations. Custom weather generators are created, following the schematic provided on Figure 4, and implemented for each pathway to facilitate tailoring of the representation to identified climate trends (Additional details of weather generator formulation and implementation are provided in Appendix A).
Water 2021, 13, x FOR PEER REVIEW 8 o programs that are available for creation of synthetic, site-specific, weather sequences. amples of available weather generators include WGEN [39,40], LARS-WG [41,WeaGETS [43], and MulGETS [44]. Weather generators have been used to generate clim change scenarios [45]. Ensemble climate change projections obtained from GCM simu tion results can be used, in place of measurements, to derive the key statistical proper that underlie the stochastic formulation inherent in a weather generator [46,47]. The purpose of weather generator models within the framework is to prov weather forcing to water balance model calculations. Custom weather generators are ated, following the schematic provided on Figure 4, and implemented for each pathw to facilitate tailoring of the representation to identified climate trends (Additional det of weather generator formulation and implementation are provided in Appendix A).  Within the weather generator representation, precipitation is represented with occurrence and intensity processes. Daily minimum and maximum temperature are simulated using a first-order, state-conditional vector autoregression. These processes are simulated stochastically by selection of values from probability distributions. Values selected from probability distributions provide the uncertain inputs that produce probabilistic simulations.
Water 2021, 13, x FOR PEER REVIEW 9 of 41 Within the weather generator representation, precipitation is represented with occurrence and intensity processes. Daily minimum and maximum temperature are simulated using a first-order, state-conditional vector autoregression. These processes are simulated stochastically by selection of values from probability distributions. Values selected from probability distributions provide the uncertain inputs that produce probabilistic simulations. Figure 4. Schematic of weather generator implementation. Weather generators provide for stochastic simulation of precipitation occurrence and intensity and maximum and minimum daily temperature. An alternating state, spell length approach based on negative binomial distributions provides the precipitation occurrence process representation. Precipitation intensity is represented with daily, wet state precipitation depth sampled from mixed exponential distributions. Simulated air temperature is conditioned on state and calculated with a first-order vector autoregression.
To isolate impacts to water resources from climate trends, a different weather generator model is employed in each pathway. The H0 pathway uses a weather generator designed to reproduce the statistics of historical observations; the PRISM data set provides the historical observations for this study. The H1 pathway employs a weather generator customized to simulate the statistical properties of projected future climatic conditions that are determined through analysis of downscaled, ensemble GCM simulation results in the "LOCA Archive." Table 1 lists the probability distributions type employed for each precipitation process. Precipitation occurrence is represented with two states: (1) wet state and (2) dry state. Transition between states is modeled using a spell length approach. The spell length approach has been used in other weather generators, including LARS-WG [41,42]. Spelllength models for precipitation occurrence work by sampling wet and dry spell lengths from separate probability distributions [38]. This approach may also be referred to as an "alternating renewal process" [38,48]. . Schematic of weather generator implementation. Weather generators provide for stochastic simulation of precipitation occurrence and intensity and maximum and minimum daily temperature. An alternating state, spell length approach based on negative binomial distributions provides the precipitation occurrence process representation. Precipitation intensity is represented with daily, wet state precipitation depth sampled from mixed exponential distributions. Simulated air temperature is conditioned on state and calculated with a first-order vector autoregression.
To isolate impacts to water resources from climate trends, a different weather generator model is employed in each pathway. The H0 pathway uses a weather generator designed to reproduce the statistics of historical observations; the PRISM data set provides the historical observations for this study. The H1 pathway employs a weather generator customized to simulate the statistical properties of projected future climatic conditions that are determined through analysis of downscaled, ensemble GCM simulation results in the "LOCA Archive." Table 1 lists the probability distributions type employed for each precipitation process. Precipitation occurrence is represented with two states: (1) wet state and (2) dry state. Transition between states is modeled using a spell length approach. The spell length approach has been used in other weather generators, including LARS-WG [41,42]. Spelllength models for precipitation occurrence work by sampling wet and dry spell lengths from separate probability distributions [38]. This approach may also be referred to as an "alternating renewal process" [38,48]. The mixed exponential distribution is a three-parameter distribution. In this case, the fourth parameter is the maximum possible precipitation depth truncation value. An untruncated mixed exponential distribution will provide infrequent non-physical, precipitation depths exceeding 2000 mm.
Negative binomial distributions were selected to represent spell lengths because of their statistical properties. A negative binomial distribution is useful for discrete data (i.e., counts of contiguous days are discrete) when the sample variance exceeds the sample mean and have been found to produce realistic results in weather generators for representation of low frequency occurrence of long dry spells [38].
A negative binomial, wet spell length and dry spell length distribution is derived for each calendar month by fitting this distribution type, using the R package fitdistrplus [49], to samples of observed spell length extracted from the PRISM data set and the "LOCA Archive." Spell length is allocated to the month when the streak of contiguous state days begins. This permits spell lengths to be longer than the duration of a month or even longer than a year in duration. For spell length, each grid cell is considered to provide an equally likely realization of precipitation occurrence for the watershed, and spell length samples include spell lengths from all grid cells within, or partially within, watershed boundaries.
Precipitation intensity is the total depth of precipitation for each day, or daily precipitation amount. Mixed exponential distributions were selected for this process because of their statistical properties. The mixed exponential distribution previously provided better overall representation of daily precipitation depth than the gamma distribution [50][51][52] and better representation of the frequencies of extreme precipitation amounts [51].
A mixed exponential distribution for each calendar month and grid cell is fit, using the R package mixtools [53], to samples of observed daily precipitation depth extracted from the PRISM data set and the "LOCA Archive." Use of a different mixed exponential distribution for each month provides a description of seasonal variation in precipitation. Different distributions for each grid cell provide a description of spatial variation in precipitation intensity.
The mixed exponential distributions created for each grid cell require specification of a maximum truncation value to avoid infrequent sampling of non-physical, extremely large (exceeding 2000 mm), daily precipitation values. The requirement for truncation depth specification is a limitation to this approach because a fourth parameter needs to be derived to effectively define these distributions. However, specification of truncation depth provides a means to enforce larger magnitude extreme events in the future through use of a larger truncation depth for future conditions relative to historical conditions. Maximum and minimum daily temperatures are also simulated by the weather generators as shown on Figure 4. Maximum and minimum daily temperatures are the two weather parameters, in addition to precipitation, that are available in downscaled GCM simulation results. Temperature simulation is conditioned on precipitation occurrence to provide a proxy for unrepresented processes like cloud cover [54], and temperature is simulated using the first-order vector autoregression algorithm employed in WGEN [39,40,55]. First-order requires that the statistics of the current day's values are fully defined by the values on the previous day [38]. surface flows and storages is precipitation. ET dominates the water balance and controls the availability of water for soil moisture storage, groundwater recharge, and runoff. The ET subprocesses of potential evapotranspiration (PET) and actual evapotranspiration (AET) are normally the first considerations in water budget calculation because of the control of ET on the water balance. PET is the rate of water loss to the atmosphere in the absence of water supply limitations. AET is the actual rate of water loss to the atmosphere given water supply restrictions [14].

Water Balance Models
Water 2021, 13, x FOR PEER REVIEW 11 of 41 Figure 5 provides a highly conceptualized, schematic depiction of the watershed water balance. The ultimate source of freshwater supply for surface, near-surface, and subsurface flows and storages is precipitation. ET dominates the water balance and controls the availability of water for soil moisture storage, groundwater recharge, and runoff. The ET subprocesses of potential evapotranspiration (PET) and actual evapotranspiration (AET) are normally the first considerations in water budget calculation because of the control of ET on the water balance. PET is the rate of water loss to the atmosphere in the absence of water supply limitations. AET is the actual rate of water loss to the atmosphere given water supply restrictions [14]. Figure 5. Porous media, terrestrial hydrologic cycle schematic. The watershed water budget, or land-surface processes balance, and the groundwater water budget combine to provide a full terrestrial hydrologic cycle representation. Precipitation is the ultimate source of natural watershed water and groundwater. Actual evapotranspiration (AET) returns water vapor to the atmosphere. Recharge is water that flows downwards across the water table from the vadose zone. Runoff is composed mainly of overland flow and is the water that goes directly to generate stream flow without crossing the water table of an aquifer.
Precipitation that is not lost to AET may infiltrate into the soil or move across the ground surface as runoff and travel to a stream. Infiltration is the movement of water from the land surface into the soil. Soil moisture is the water stored in pore spaces and channels in the soil beneath the ground surface. Percolation is the movement of water downward through the soil channels, and water may percolate through the soil and move out of the bottom of the soil column [14].
Water that percolates through the soil and continues downwards through the unsaturated zone may eventually reach a water table. Deep percolation water may also be stored within pore spaces and other storages within the unsaturated zone. The water table denotes the interface of the zone of saturation and the unsaturated zone. Groundwater is water that is in the zone of saturation [56]. Recharge is the water that flows downward Figure 5. Porous media, terrestrial hydrologic cycle schematic. The watershed water budget, or land-surface processes balance, and the groundwater water budget combine to provide a full terrestrial hydrologic cycle representation. Precipitation is the ultimate source of natural watershed water and groundwater. Actual evapotranspiration (AET) returns water vapor to the atmosphere. Recharge is water that flows downwards across the water table from the vadose zone. Runoff is composed mainly of overland flow and is the water that goes directly to generate stream flow without crossing the water table of an aquifer.
Precipitation that is not lost to AET may infiltrate into the soil or move across the ground surface as runoff and travel to a stream. Infiltration is the movement of water from the land surface into the soil. Soil moisture is the water stored in pore spaces and channels in the soil beneath the ground surface. Percolation is the movement of water downward through the soil channels, and water may percolate through the soil and move out of the bottom of the soil column [14].
Water that percolates through the soil and continues downwards through the unsaturated zone may eventually reach a water table. Deep percolation water may also be stored within pore spaces and other storages within the unsaturated zone. The water table denotes the interface of the zone of saturation and the unsaturated zone. Groundwater is water that is in the zone of saturation [56]. Recharge is the water that flows downward through the unsaturated zone and across the water table to join the waters in the zone of saturation [57].
The framework, see Figure 3, employs two water balance models, one in each calculation pathway. Any water budget calculation that uses precipitation and PET, calculated from location and air temperature, as inputs and produces AET, runoff, and recharge as outputs can be used within the framework. Two types of water-budget calculations are used in this study: (1) monthly water balance models and (2) daily continuous simulation, distributed parameter models.
A Thornthwaite and Mather-style water balance calculation with a monthly time step [14,15,[58][59][60] is used for the monthly water balance modeling approach (A schematic of this calculation is provided in Figure S3 for reference). This calculation is a single soil column water balance that provides homogeneous representation of the watershed. The method provides an estimate of runoff and deep infiltration out of the bottom of the soil column for each month. Runoff on a monthly interval represents surface runoff, interflow, and sub-surface runoff, and it is assumed to be equivalent to the monthly volume of stream flow generated from the watershed. However, the physical processes of stream flow are not simulated. Similarly, aquifer recharge in this calculation is attributed to water that leaves the soil column water as deep percolation. A water table is not simulated or involved in the monthly water budget calculations, and aquifer recharge is not explicitly simulated.
The Hydrological Simulation Program-FORTRAN (HSPF) is the continuous simulation, distributed parameter model used in this study. HSPF can simulate hydrologic, and associated water quality, processes on pervious and impervious land surfaces and in streams and well-mixed impoundments. It is a comprehensive package for simulation of watershed hydrology and surface water-related considerations at the watershed scale [61].
HSPF provides a heterogeneous representation of the watershed using the concepts of hydrologic response units (HRUs) and stream segments. Runoff is simulated from HRUs and includes surface runoff and subsurface interflow; runoff is routed from HRUs to stream segments and then downstream across stream segments to the basin outlet. Recharge is attributed to deep percolation which leaves the bottom of the soil column; this quantity is referred to as inactive groundwater inflow (IGWI). Recharge can also be simulated in HSPF as water that seeps out of the bottom of a stream, river, or lake and leaves the model. It should be noted that a water table is not explicitly simulated in HSPF; consequently, recharge is not explicitly simulated.

Potential Evapotranspiration Calculations
In Figure 3, the outputs from the weather generator models are daily precipitation, minimum temperature, and maximum temperature. The inputs to the water balance models are daily precipitation and PET. To complete the linkage between the weather generator models and the water balance models, PET is calculated from temperature and study site location.
The Hargreaves equation [62,63] is used to estimate the PET from simulated temperature and site location for all water balance calculations in this study (Additional details of the implementation of the Hargreaves equation are provided in Appendix A). Using temperature alone to estimate PET for historical conditions is generally not recommended [64]; however, temperature is the available weather parameter in GCM simulation results for PET calculation. The Hargreaves equation provides a reasonable estimator for future PET given the available information.

Framework Results-Relative Probabilistic Time Histories
Monte Carlo techniques provide a probabilistic simulation structure to the framework where two independent solution pathways are simulated thousands of times. For each realization, the differences between simulated weather parameters and watershed water budget components provide results that are aggregated across all realizations to generate probabilistic time histories of relative impacts. These relative differences and associated likelihoods are the primary outputs and are labeled as ∆ values in Figure 3; ∆ denotes the difference, ∆ = H1 − H0, between pathway simulated parameter values. ∆ values are calculated for each simulated day for weather parameters and for each simulated month for water balance parameters.
A probabilistic time history is a time series of probability distributions. As an example, a monthly water balance model provides results for calculated recharge for every simulated month and every realization in each pathway. An empirical probability distribution, which is just a normalized cumulative histogram, is created from the sample of simulated recharge ∆ values for each month. Each realization contributes one recharge ∆ value to the sample, and the size of the sample is equal to the number of realizations. With a probability distribution created for each output time, a selected percentile, like the 50th percentile, can be plotted against simulation time to create a probabilistic time history.

Weak Stationarity-Identification of Climate from Weather
Weak stationarity denotes a constant mean for a time series, suggesting that there is not a trend across the analysis period [65]. The assumption of weak stationarity is inherent in calculating Climate Normals. Weak stationarity must be assumed to derive probability distributions representing precipitation occurrence and intensity from samples of contiguous days and daily precipitation depths.
Weak stationarity is not expected to be a realistic assumption across many decades. A relatively short averaging interval of thirty years, or the Climate Normals interval, is used for the averaging interval to identify a climate description from weather. Use of the Climate Normals interval produces a stepwise approximation of expected climate trends across multiple Climate Normals.
"LOCA Archive" simulation results are available for 1950-2099. PRISM weather data are based on historical observations and are available from 1981-present. The total framework analysis period for the study site is 1981-2100 as dictated by the intersection of PRISM data availability, "LOCA Archive" simulation duration, and inclusion of four complete Climate Normals intervals. This total period is split into the four intervals shown in Table 2. The first interval, 1981-2010, is the most recent Climate Normals period. This "Data Interval," 1981-2010, provides a natural interval for climate comparison between "LOCA Archive" simulated weather results and PRISM weather data. The next three Climate Normals (2011-2040, 2041-2070, and 2071-2100) are future, or projection, intervals. Daily weather parameters are simulated stochastically within the framework and drive water budget calculations. Framework outputs will vary each simulated day in accordance with the stochastically simulated weather inputs. Probabilistic time history results will become smoother and display fewer high frequency oscillations as the number of realizations is increased. To provide relatively smooth time history depiction while limiting the total number of realizations, smoothing is applied to post-process framework results. This time history smoothing facilitates interpretation of results while limiting the overall computational burden.
A simple low pass filter using a cutoff frequency, like that used for temperature time series processing in the WGEN calculation algorithm [39,40], is employed to smooth result time histories and temperature time series. Specifically, this low pass filter is used to generate the first-order, vector autoregression representation for daily minimum and maximum air temperature, to compare observed air temperature time series between data sets, and to smooth probabilistic result time histories.
For smoothing of oscillations in result time histories, daily simulated values are aggregated to a monthly time series covering a 30-year period to generate a 360 month-long time history. This time series is transformed to the frequency domain using discrete Fourier transform. The transformed series is multiplied by a rectangular filter created by setting the first 15 frequency indexes to one and the remaining indexes to zero. The product series is back transformed to the time domain. This low pass filtering approach maintains the frequency with a two-year period as well as lower frequencies and generally provides a smoothed representation of simulation results.

Analysis of Climate Change Projections and Model Uncertainty
Creation of a weather generator requires analysis of the basis data set, which provides samples of contiguous state days, daily precipitation depth, and day of the year minimum and maximum temperature. An average value for each of these samples is used in weather generator formulation, and a weather generator must be formulated for an interval where weak stationarity is assumed.
Statistical analyses of these samples and the distribution of values within each sample provides a built-in comparison between historical observations and ensemble GCM results within overlap periods. Comparison of the distribution of values in and the means of samples from 1981-2010 between the PRISM and "LOCA Archive" data sets is used to identify components of structural uncertainty in downscaled, ensemble GCM results. If structural uncertainties can be identified in framework inputs, the framework representation can be tweaked to remove these influences from framework results.
Comparison of the distribution of values in and the means of samples collected from the "LOCA Archive" data set among the future Climate Normals in Table 2 provides a way to identify projections for future climate change. If ensemble GCM results provide different descriptive statistical summaries between 30-year analysis intervals and thus provide a trend, the trend is identified as the climate change projection. Because weak stationarity is assumed within Climate Normals intervals, identified trends will be represented in a stepwise fashion with changes between 30-year analysis intervals and consistent averages within 30-year intervals.

Results
Primary outputs of the framework are probabilistic comparisons AET, runoff, and recharge between pathways. Two different water balance models: (1) monthly water budget and (2) daily continuous simulation, distributed parameter models are used in separate framework implementations.
Prior to implementing the water balance models, weather generators are created and calibrated. Comparative results for simulated weather parameters and climate are available from the weather generators as secondary results.

Projected Climate Change Trends and Weather Generator Formulation
Site-specific climate trends are identified from analysis of downscaled, GCM simulation results in the "LOCA Archive." The weather generators within the framework are then formulated so that the average differences between the H1 and H0 weather generators reproduce the identified departures of future climatic conditions from historical conditions.

Site Specific Future Climate Trends
Study site climate trends are identified from "LOCA Archive" results through comparison of the distribution of values in, and the means of, samples of contiguous state days, daily precipitation depth, and day of the year minimum and maximum temperature from future Climate Normals in Table 2. In terms of temperature, "LOCA Archive" simulated temperatures show an approximately 1 • C increase from one projection interval to the next so that 2071-2099 is about 3 • C warmer on average than 1981-2010 ( Figure S4 and Table S2 provide a summary of the differences between average daily minimum and maximum temperatures for future Climate Normals). Distribution of annual precipitation depth from the "LOCA Archive" is almost identical across all four Climate Normals. The lower whisker in Figure 6 displays a decreasing trend from 2010 to 2099; however, the means, medians, and interquartile ranges are similar or consistent across 2010 to 2099. Given the similarity in empirical distributions of annual precipitation depth across the three future Climate Normals in the "LOCA Archive," no significant change in expected annual precipitation is projected for the study site from 2011-2100.
Study site climate trends are identified from "LOCA Archive" results through comparison of the distribution of values in, and the means of, samples of contiguous state days, daily precipitation depth, and day of the year minimum and maximum temperature from future Climate Normals in Table 2. In terms of temperature, "LOCA Archive" simulated temperatures show an approximately 1 °C increase from one projection interval to the next so that 2071-2099 is about 3 °C warmer on average than 1981-2010 ( Figure S4 and Table S2 provide a summary of the differences between average daily minimum and maximum temperatures for future Climate Normals). Figure 6 displays a comparison of empirical distributions of annual precipitation depth observed in the PRISM 1981-2010 data set and the "LOCA Archive" simulation results. Distribution of annual precipitation depth from the "LOCA Archive" is almost identical across all four Climate Normals. The lower whisker in Figure 6 displays a decreasing trend from 2010 to 2099; however, the means, medians, and interquartile ranges are similar or consistent across 2010 to 2099. Given the similarity in empirical distributions of annual precipitation depth across the three future Climate Normals in the "LOCA Archive," no significant change in expected annual precipitation is projected for the study site from 2011-2100. Distribution of annual maximum, daily precipitation depths among the four Climate Normals intervals was examined to identify trends in extreme event occurrence. Extreme events are low probability phenomena that occur within a discrete period. Figure 7 shows these distributions along with site-specific, estimates of 2-yr, 25-yr, 50-yr, and 100-yr recurrence interval, daily event depth. The recurrence interval estimates are "point" estimates, and these are compared to the distribution of values from all grid cells completely or partially within the site watershed (see Figure 1). Overall, comparison of "LOCA Archive" values among the analysis intervals in Figure 7 suggests no significant change in expected extreme event magnitude from 2011-2100. Distribution of annual maximum, daily precipitation depths among the four Climate Normals intervals was examined to identify trends in extreme event occurrence. Extreme events are low probability phenomena that occur within a discrete period. Figure 7 shows these distributions along with site-specific, estimates of 2-yr, 25-yr, 50-yr, and 100-yr recurrence interval, daily event depth. The recurrence interval estimates are "point" estimates, and these are compared to the distribution of values from all grid cells completely or partially within the site watershed (see Figure 1). Overall, comparison of "LOCA Archive" values among the analysis intervals in Figure 7 suggests no significant change in expected extreme event magnitude from 2011-2100.

Identified Structural Uncertainty in "LOCA Archive" Simulation Results
Structural uncertainty issues in "LOCA Archive" results are identified through comparison between the distribution of values in, and the means of, samples of contiguous state days, daily precipitation depth, and day of the year minimum and maximum temperature in the PRISM and "LOCA Archive" data sets during 1981-2010. Synthetic drizzle and underprediction of extreme event precipitation are the focus issues because these are known issues in downscaled, GCM simulation results. Both issues are evident in "LOCA Archive" simulation results for the study site. Water 2021, 13, x FOR PEER REVIEW 16 of 41 Figure 7. Observed, empirical distribution of individual grid cell, annual maximum, daily precipitation depth. A violin plot combines the box plot format with a kernel density estimator to provide a smoothed depiction of an empirical distribution [66], in this case the distribution of annual maximum, daily precipitation depth. The horizontal, dashed lines within the shaded portions of the violin plots identify the three inner quartile thresholds (i.e., 25th, 50th, and 75th percentile or 1/4, 1/2, and 3/4 quantiles). Daily extreme event magnitude estimates by recurrence interval come from NOAA Atlas 14 [67] 24-h event magnitude estimates that are divided by 1.13, which is a conversion multiplier to convert from daily extreme values to 24-h extreme values [68].

Identified Structural Uncertainty in "LOCA Archive" Simulation Results
Structural uncertainty issues in "LOCA Archive" results are identified through comparison between the distribution of values in, and the means of, samples of contiguous state days, daily precipitation depth, and day of the year minimum and maximum temperature in the PRISM and "LOCA Archive" data sets during 1981-2010. Synthetic drizzle and underprediction of extreme event precipitation are the focus issues because these are known issues in downscaled, GCM simulation results. Both issues are evident in "LOCA Archive" simulation results for the study site.
The distribution of average annual precipitation depth in Figure 6 is similar between PRISM 1981-2010 and "LOCA Archive" 1981-2010 samples suggesting that on an average annual basis the two data sets are providing a similar depiction of precipitation for the study site. However, the combination of relatively longer wet spell durations (see Figure  S5), and thus shorter dry spell durations (see Figure S6), with lower median daily precipitation amounts (see Figure S7) suggests a synthetic drizzle issue in "LOCA Archive" results. Relatively longer wet spells and lower daily precipitation depths mean more rainy days with small amounts of precipitation are simulated than observed.
In terms of extreme events, the "LOCA Archive" median value is approximately equal to the PRISM 25th percentile value in Figure 7, denoting that the annual maximum daily precipitation depth in the "LOCA Archive" tends to be significantly smaller than the annual maximum daily precipitation depth observed for the watershed. Additionally, the "LOCA Archive" values appear to be biased low as the 75th percentile threshold is always lower in magnitude than a 2-year recurrence, daily event. Figure 7. Observed, empirical distribution of individual grid cell, annual maximum, daily precipitation depth. A violin plot combines the box plot format with a kernel density estimator to provide a smoothed depiction of an empirical distribution [66], in this case the distribution of annual maximum, daily precipitation depth. The horizontal, dashed lines within the shaded portions of the violin plots identify the three inner quartile thresholds (i.e., 25th, 50th, and 75th percentile or 1/4, 1/2, and 3/4 quantiles). Daily extreme event magnitude estimates by recurrence interval come from NOAA Atlas 14 [67] 24-h event magnitude estimates that are divided by 1.13, which is a conversion multiplier to convert from daily extreme values to 24-h extreme values [68].
The distribution of average annual precipitation depth in Figure 6 is similar between PRISM 1981-2010 and "LOCA Archive" 1981-2010 samples suggesting that on an average annual basis the two data sets are providing a similar depiction of precipitation for the study site. However, the combination of relatively longer wet spell durations (see Figure S5), and thus shorter dry spell durations (see Figure S6), with lower median daily precipitation amounts (see Figure S7) suggests a synthetic drizzle issue in "LOCA Archive" results. Relatively longer wet spells and lower daily precipitation depths mean more rainy days with small amounts of precipitation are simulated than observed.
In terms of extreme events, the "LOCA Archive" median value is approximately equal to the PRISM 25th percentile value in Figure 7, denoting that the annual maximum daily precipitation depth in the "LOCA Archive" tends to be significantly smaller than the annual maximum daily precipitation depth observed for the watershed. Additionally, the "LOCA Archive" values appear to be biased low as the 75th percentile threshold is always lower in magnitude than a 2-year recurrence, daily event.

Calibrated Weather Generator Representation
The alternative hypothesis experiment, or H1 pathway, uses a weather generator formulated to generate a statistical description of projected climate from 2011-2100. The H1 pathway weather generator is adjusted on a trial-and-error basis to produce average differences in precipitation and temperature between the pathways that mirror expected climate trends. Table 3 describes a weather generator formulation that reproduces the identified climate trends of an approximately 3 • C increase in average temperature and no significant change to average annual precipitation depth. The average daily precipitation depth (see Figure S7) is similar between PRISM and "LOCA Archive" data sets; consequently, use of spell length distributions derived from the PRISM data set eliminates synthetic drizzle issues. When "LOCA Archive" derived spell length distributions are used in conjunction with "LOCA Archive" derived precipitation intensity distributions in the H1 pathway weather generator, the H1 weather generator simulates a 30% increase in average annual precipitation (the calibration process and additional details of weather generator conceptualization and formulation are provided in Appendix A).  (Tables S3 and S4) PRISM (Tables S3 and S4) PRISM (Tables S3 and S4) Precip. Intensity 1 PRISM (Tables S6-S9) PRISM (Tables S6-S9) PRISM (Tables S6-S9) PRISM (Tables S6-S9 (Tables S3 and S4) PRISM (Tables S3 and S4) PRISM (Tables S3 and S4) PRISM (Tables S3 and S4) Precip. Intensity 2 PRISM (Tables S6-S9) LOCA (Table S5) LOCA (Table S5) LOCA (Table S5) Temperature PRISM ( Figure Figure 8A, are close to zero but the average time history is always positive, and the median time history is always negative. Having the average and median precipitation ∆ time histories near zero with opposing signs suggests that the comparative weather generator representation is sufficiently representing no significant trend in precipitation depth from 2011-2100. Both the average and median precipitation ∆ time histories, however, display a slight increasing trend from 2011-2100, and the 5th to 95th percentile range is positively skewed. This positive skew and slight increasing trend are attributed to enforcing the occurrence of larger magnitude extreme events into the calibrated, future weather generator formulation. PET ∆ time histories, Figure 8B, show an increasing trend across all three future intervals because of the simulated and projected increase in temperature.

Monthly Water Balance Modeling
The framework implementation was applied to the site watershed using a monthly water balance model calculation. In this approach, a hypothetical soil column applies to the entire watershed. To isolate climate change impacts on water resources, identical water balance models are used in both pathways and weather forcing is provided by the calibrated weather generator formulation. The monthly water balance model was calibrated to site-specific conditions (details of the calibration and additional description of the monthly water balance model are provided in Appendix B).
The monthly water budget for the site was simulated from 1981-2100 using 10,000 framework realizations where each realization included calculation of water balance results in each pathway. Results are presented for 2011-2100 because simulation of the "Data Interval," 1981-2010, is only used to provide initial watershed conditions for the simulation of the first, 2011-2040, of three future Climate Normals in Table 2.

Monthly Water Balance Modeling
The framework implementation was applied to the site watershed using a monthly water balance model calculation. In this approach, a hypothetical soil column applies to the entire watershed. To isolate climate change impacts on water resources, identical water balance models are used in both pathways and weather forcing is provided by the calibrated weather generator formulation. The monthly water balance model was calibrated to site-specific conditions (details of the calibration and additional description of the monthly water balance model are provided in Appendix B).
The monthly water budget for the site was simulated from 1981-2100 using 10,000 framework realizations where each realization included calculation of water balance results in each pathway. Results are presented for 2011-2100 because simulation of the "Data Interval," 1981-2010, is only used to provide initial watershed conditions for the simulation of the first, 2011-2040, of three future Climate Normals in Table 2.
Average simulated quantities provide context for relative differences, or Δ time histories, that are the primary framework output (Table S11 provides summaries of the average, actual simulated water budget quantities from the H1 and the H0 pathways.). The simulated average annual precipitation depth from 2011-2100 is 629 mm. The AET proportion of total precipitation is 89% for future Climate Normals intervals. Runoff accounts for less than 5% of gross precipitation on average; recharge is estimated to be approximately 7% of gross precipitation.
Actionable outputs from the framework are the probabilistic relative, or Δ, time histories of AET, runoff, and recharge. Figure 9 displays simulated AET, runoff, and recharge Average simulated quantities provide context for relative differences, or ∆ time histories, that are the primary framework output (Table S11 provides summaries of the average, actual simulated water budget quantities from the H1 and the H0 pathways.). The simulated average annual precipitation depth from 2011-2100 is 629 mm. The AET proportion of total precipitation is 89% for future Climate Normals intervals. Runoff accounts for less than 5% of gross precipitation on average; recharge is estimated to be approximately 7% of gross precipitation.
Actionable outputs from the framework are the probabilistic relative, or ∆, time histories of AET, runoff, and recharge. Figure 9 displays simulated AET, runoff, and recharge ∆ time histories. In Figure 9A, average and median AET ∆s remain close to zero across the future intervals denoting minimal expected change to AET for typical conditions. As with the precipitation ∆ time histories in Figure 8 The site is a hot, semi-arid environment. Consequently, average conditions provide an excess of monthly PET relative to precipitation (P), or P − PET < 0. Because there is typically excess evaporative capacity, AET will only increase to the extent that precipitation increases so that additional water is available for evaporation and transpiration. The variability, identified as the width of the 5th to 95th percentile range in Figure 9A, in AET ∆ values increases across the future intervals due to the interplay of increased PET, a small increasing trend in average annual precipitation (see Figure 8), and a positive bias in the skew of the precipitation ∆, 5th to 95th percentile range. As identified earlier, the increasing trend in average annual precipitation and positive shift in the 5th to 95th percentile range are attributed to enforcement of larger magnitude extreme events in the representation of future climate.
typically excess evaporative capacity, AET will only increase to the extent that precipitation increases so that additional water is available for evaporation and transpiration. The variability, identified as the width of the 5th to 95th percentile range in Figure 9A, in AET Δ values increases across the future intervals due to the interplay of increased PET, a small increasing trend in average annual precipitation (see Figure 8), and a positive bias in the skew of the precipitation Δ, 5th to 95th percentile range. As identified earlier, the increasing trend in average annual precipitation and positive shift in the 5th to 95th percentile range are attributed to enforcement of larger magnitude extreme events in the representation of future climate. Because site parameterization is unchanged and mean AET ∆ values are approximately constant and near zero across 2011-2100, runoff and recharge are also expected to remain effectively unchanged. In Figure 9B,C, average and median ∆ time history values remain essentially zero across 2011-2100, which denotes no change in or impact to expected runoff and recharge from climate change for the site. The expected variability (denoted by the spread of the 5th to 95th percentile range) of future runoff and recharge is muted and much less than the variability expected for future AET. The positive bias of the 5th to 95th percentile range in Figure 9B,C suggests that infrequent increases in water availability are expected from extreme events.

Daily, Continuous Simulation, Distributed Parameter Water Balance Modeling
The framework implementation was also applied to the site using the HSPF water balance model. The HSPF model provides a heterogenous representation of the watershed as shown in Figure 10 where the watershed is divided into 12 HRUs and five stream segments. Runoff is routed from HRUs to stream reaches and routed through the stream reaches from upstream (i.e., Reach #1) to the watershed outlet at the end of Reach #5.
Because site parameterization is unchanged and mean AET Δ values are approximately constant and near zero across 2011-2100, runoff and recharge are also expected to remain effectively unchanged. In Figure 9B,C, average and median Δ time history values remain essentially zero across 2011-2100, which denotes no change in or impact to expected runoff and recharge from climate change for the site. The expected variability (denoted by the spread of the 5th to 95th percentile range) of future runoff and recharge is muted and much less than the variability expected for future AET. The positive bias of the 5th to 95th percentile range in Figure 9B,C suggests that infrequent increases in water availability are expected from extreme events.

Daily, Continuous Simulation, Distributed Parameter Water Balance Modeling
The framework implementation was also applied to the site using the HSPF water balance model. The HSPF model provides a heterogenous representation of the watershed as shown in Figure 10 where the watershed is divided into 12 HRUs and five stream segments. Runoff is routed from HRUs to stream reaches and routed through the stream reaches from upstream (i.e., Reach #1) to the watershed outlet at the end of Reach #5. Creek is divided into five stream segments or reaches, RCHRES. Reaches #1-#4 are dry streambed areas that are considered to be losing reaches based on the Hydrologic Soil Type mapping from SSURGO [25]. Reach #5 provides the watershed outlet and is simulated as a gaining reach; it is assumed that no seepage or recharge occurs from this stream segment. Creek is divided into five stream segments or reaches, RCHRES. Reaches #1-#4 are dry streambed areas that are considered to be losing reaches based on the Hydrologic Soil Type mapping from SSURGO [25]. Reach #5 provides the watershed outlet and is simulated as a gaining reach; it is assumed that no seepage or recharge occurs from this stream segment.
Because of the heterogeneity provided by 12 HRUs and five stream segments, the attribution of recharge and runoff is complex relative to the monthly water balance model. Deep percolation to inactive groundwater in the previous portion of each HRU and seepage that leaves the model from four of the five stream reaches are attributed to recharge. Runoff from the entire watershed is the outflow stream discharge simulated at the watershed outlet. Weather forcing is provided by the calibrated weather generator formulation, and the HSPF model was calibrated to site-specific conditions (details of the calibration are provided in Appendix B).
Simulated average annual precipitation depth from 2011-2100 is 631 mm (Table S12 provides summaries of the average, actual simulated water budget quantities.). Average annual precipitation depths are slightly different, 631 and 629 mm, between the two framework implementations with different types of water balance models even though the same weather generators are used in both frameworks because the HSPF model implementation breaks the total watershed area into 12 HRUs and the precipitation applied to each HRU is comprised of the area-weighted average of grid cells (see Figure 1) in the HRU. The calculated AET proportion of the water budget is about 78% on average; runoff is 2% of gross precipitation, and recharge is 20% of precipitation on average. The relative proportions of precipitation for average simulated, cumulative AET, runoff, and recharge are approximately constant across all future Climate Normals. This suggests no change, on average, to the future water budget from projected climate trends.
Meaningful outputs from the framework are the probabilistic relative, or ∆, time histories of AET, runoff, and recharge. Figure 11 displays these AET, runoff, and recharge ∆ time histories. In Figure 11A, average and median AET ∆s remain close to zero across the future intervals denoting minimal expected change to AET. There is a slight increasing trend for both average and median ∆ values with increasing simulation time associated with the slight trend toward increased average precipitation in Figure 8.

Discussion
Risk includes probability and consequences. Consequences are relative changes to watershed water availability. The primary actionable outputs are probabilistic time histories of simulated relative changes in water budget components. These time histories provide identification of likelihood per consequence magnitude across simulation time.
A framework implementation is presented for the Dolan Creek Watershed in west- Average and median runoff ∆ time history values, in Figure 11B, remain close to zero across 2011-2100, denoting no expected change or impact to runoff from climate change for the site. Additionally, the expected variability (denoted by the spread of the 5th to 95th percentile range) of future runoff is muted and much less than the variability expected for future AET. Figure 11C shows a slight increasing trend and positive bias in average and median ∆ time history values denoting a prediction for a slight increase in recharge. This figure also presents a positive bias in the skew of the recharge ∆, 5th to 95th percentile range. The slight trend toward increased average precipitation in Figure 8 is responsible for the slight increasing trends and positive bias in recharge ∆ time histories. Precipitation trends are attributed to the simulation of relatively larger extreme events in the future, and precipitation trend and bias appear to be split between AET and recharge ∆ time histories in Figure 11. The positive bias in the recharge and runoff 5th to 95th percentile ranges suggest that infrequent increases in water availability are expected from extreme events.

Discussion
Risk includes probability and consequences. Consequences are relative changes to watershed water availability. The primary actionable outputs are probabilistic time histories of simulated relative changes in water budget components. These time histories provide identification of likelihood per consequence magnitude across simulation time.
A framework implementation is presented for the Dolan Creek Watershed in westcentral Texas. This site is a small, semi-arid watershed in karst terrain. The first implementation step is to identify projected future climate for the site. The "LOCA Archive" of ensemble, downscaled, GCM simulation results provided estimates of future climatic conditions for this study. Only two emissions scenarios, RCP4.5 and RCP8.5, are included in the GCM simulations in the "LOCA Archive." Inclusion of multiple emissions scenarios provides for characterization of scenario uncertainty in framework formulation. If more emissions scenarios were included in the ensemble GCM results, then the range of possible future conditions and the relative likelihood of future conditions would be better constrained. Unfortunately, the limited number of emissions scenarios suggests that the possible range of future conditions may not be completely captured in "LOCA Archive" results.
After identification of future climate trends, comparative weather generator models are formulated to reproduce the average differences expected between historical conditions and future climate. Because of structural uncertainty issues in downscaled GCM results and capability limitations inherent in the stochastic weather generator formulation, a trial-and-error calibration approach provides customization of the H1 pathway weather generator model to represent future climate.
Identical water balance models isolate relative impacts to water budget components from hypothesized variations in long-term, average weather. Two separate framework implementations were created; each implementation uses a different type of water balance model. The two water balance model types are: (1) monthly water balance and (2) daily, continuous simulation, distributed parameter.
Both model types suggest limited change on average to AET, runoff, and recharge from projected climate trends, but each model type predicts a different partitioning of precipitation into AET, runoff, and recharge. These differences occur because each type represents the physical processes of water movement in different ways and uses a different calculation time interval. The monthly water balance model simulates a homogeneous watershed with a monthly time step. The continuous simulation, distributed parameter (i.e., the HSPF) model provides a heterogenous watershed representation and uses a daily calculation time step. Figure 12 displays the average water budget break-out by component under hypothesized climate trends from 2071-2100 for both model types. The monthly water balance model produces relatively larger estimates of AET because it uses a monthly calculation time step. If P − PET is determined monthly, it is expected that precipitation (P) will generally be less than PET (see Figure S3 for an example calculation). When P − PET < 0, AET is approximately equal to precipitation, which means that water availability is zero and there is no water for runoff and recharge. A monthly calculation will tend to attribute a greater proportion of precipitation to AET than a daily calculation. ent calculation time interval. The monthly water balance model simulates a homogeneous watershed with a monthly time step. The continuous simulation, distributed parameter (i.e., the HSPF) model provides a heterogenous watershed representation and uses a daily calculation time step. Figure 12 displays the average water budget break-out by component under hypothesized climate trends from 2071-2100 for both model types. The monthly water balance model produces relatively larger estimates of AET because it uses a monthly calculation time step. If P − PET is determined monthly, it is expected that precipitation (P) will generally be less than PET (see Figure S3 for an example calculation). When P − PET < 0, AET is approximately equal to precipitation, which means that water availability is zero and there is no water for runoff and recharge. A monthly calculation will tend to attribute a greater proportion of precipitation to AET than a daily calculation.  The continuous simulation, distributed parameter model produces a recharge estimate that is three times the value estimated by the monthly water balance model and a runoff estimate that is about one-third the monthly water balance model estimate. The relative increase in recharge, and corresponding decline in runoff, occurs because recharge and runoff are defined and simulated differently in the two model types. The continuous simulation, distributed parameter model calculates recharge as the sum of deep percolation in pervious areas and seepage from losing stream reaches. This approach attributes runoff from headwaters HRUs that travels to stream reaches and then seeps through the stream bed to groundwater as recharge. The monthly water balance model is homogeneous and cannot represent complex flow routing within the watershed. Given the representational differences between model types, analysis of changes to water availability provides a more robust comparison between model types than analysis of recharge and runoff changes.

Conclusions
A systemic change impact analysis framework for water resources at the watershedscale is presented and implemented to examine the water availability impacts from future climate change for a site in west-central Texas. The site is a small watershed, classified as hot semi-arid (BSh) for current and future conditions.
The weather parameter comparison inherent in framework formulation identified future climate trends for the study site of a 3+ • C increase in average temperature and no significant trend in annual precipitation depth from 2010-2100. It also provided for identification of structural uncertainty issues of synthetic drizzle and underprediction of extreme event intensity in downscaled, GCM results.
Trail-and-error weather generator model calibration removed the influence of these structural uncertainty concerns from the framework representation. Comparative weather generator results depict certain increase in potential evapotranspiration from 2011-2100 driven by the 3 • C increase in average temperature and minimal change in the distribution of annual average precipitation from 2011-2100. Impacts to water budget components from future climate trends are examined with two different types of water budget model. Both types provide projections of minimal change on average to AET, runoff, and aquifer recharge and of infrequent increases in water availability driven by an expectation for increased extreme event intensity in the future. Each type of model estimates different magnitudes for runoff and recharge because these processes are represented differently and at different scales.
Framework results provide consequences and likelihood for consequences. Consequences are changes in water availability. Infrequent, low probability increases in water availability are predicted for the study site. For average future conditions, the projected change is an increase in PET driven by an increase in future temperature. The complex, rugged, and harsh study environment limits estimated impacts to water resources for normal conditions. For this type of semi-arid environment, PET on an annual basis is expected, and is simulated, to exceed precipitation. The excess of evaporative capacity limits AET magnitude because AET is limited by the supply of water, which is not changing on average. Future work will involve application of the framework to sites with different characteristics and to examine different systemic changes. In terms of site characteristics, a site with a projected change in climate classification from current conditions to future conditions and a porous media environment with relatively thick soil cover and only gaining stream reaches would provide for testing of the framework on a different environment. This type of site would be more amenable to representation with the conceptual watershed water balance processes shown in Figure 5 and for making equivalent comparisons between results from the two water balance model types.
The framework can be adapted to analyze systemic change produced by evolution in the physical characteristics of the watershed. When the focus of implementation changes to watershed characteristics like economic development or vegetation assemblages, identical weather forcing is used in both pathways and different water balance models are used in each pathway. In this type of scenario, the H0 pathway water balance model would represent existing watershed conditions and the H1 pathway water balance model would represent hypothesized, alternative conditions. The H1 pathway water balance model can also be represented as evolving across simulation time.

Appendix A.2. Precipitation Intensity
Precipitation intensity is represented with mixed exponential distributions. A mixed exponential distribution is the probability mixture of two one-parameter exponential distributions. Equation (A2) provides the probability density function (PDF) for the mixed exponential distribution.
α = mixing weight µ 1 = mean of distribution 1 x = value for which to find probability density µ 2 = mean of distribution 2 Seasonal and spatial variation are accounted for in the representation of precipitation intensity. Seasonality was addressed by creating a different mixed exponential distribution for precipitation depth for each month. Monthly, mixed exponential distributions were fit to collections of observed daily precipitation depths and to daily precipitation depths from the "LOCA Archive." PRISM precipitation depth observations and simulated precipitation depths from the "LOCA Archive" are calculated and distributed for a regular network of cells. Time series of precipitation depths are provided for each cell in the network and the cells provide representation of spatial variation across the site watershed for daily precipitation depth. Mixed exponential distributions for precipitation depth were derived for each grid cell and each month. This requires fitting of 264 distributions for the "LOCA Archive" data set (22 grid cells in the site watershed and 12 months).

Appendix A.2.1. Spatial Variation in Precipitation Intensity
Use of distributions for each grid cell in Figure 1 provides for a spatially varying representation of precipitation intensity. Attempts were made to reduce the number of distributions required for the precipitation intensity process represention by grouping grid cells into larger contiguous regions based on similar descriptive statistics.
As part of examining grid cell-aggregation possibilities, four regions were identified from the gridded PRISM data for each month using the K-Means clustering algorithm. The K-Means algorithm clusters data in separate samples of equal variance to minimize the inertia criterion, and inertia is a Euclidian distance-based criteria which is a sum of squared errors where the errors are distances between member locations and the cluster centroid location [70]. Member locations (or point coordinates) for the cluster were derived from the grid cell mean, median, variance, skewness, and kurtosis of daily precipitation depth. These five precipitation depth statistics were transformed into three values using principal component analysis (PCA). Use of three transformed coordinates to identify each grid cell for the clustering analysis allows for visualization of the identified clusters (e.g., point clusters where points have coordinates in more than three dimensions are difficult to visualize). PCA provides for dimensionality reduction; in this case, reducing the five statistics to three transformed variables. The dimensionality reduction in PCA is accomplished through decomposition of the base data set into successive orthogonal components that explain a maximum amount of the variance of the data set [70].
Spatial variation in precipitation amount is simplified from 44 grid cells, fully or partially within the watershed, to four regions for the PRISM 1981-2010 data set because of the K-Means cluster analysis. Figure A1 provides an example of the regionalization for the month of February. After cluster numbers were assigned using the K-Means algorithms, the clusters were manually cleaned to make contiguous regions from the clusters and to set the region identifier to be 1 for the northeast region, 2 for the southeast region, 3 for the southwest region, and 4 for the northwest region. Manual cleaning generally involved relabeling 14 of 210, or 6.7%, of PRISM cells. The maximum number of grid cells relabeled for region continuity was 26, or 12.4%, in August. data set [70].
Spatial variation in precipitation amount is simplified from 44 grid cells, fully or partially within the watershed, to four regions for the PRISM 1981-2010 data set because of the K-Means cluster analysis. Figure A1 provides an example of the regionalization for the month of February. After cluster numbers were assigned using the K-Means algorithms, the clusters were manually cleaned to make contiguous regions from the clusters and to set the region identifier to be 1 for the northeast region, 2 for the southeast region, 3 for the southwest region, and 4 for the northwest region. Manual cleaning generally involved relabeling 14 of 210, or 6.7%, of PRISM cells. The maximum number of grid cells relabeled for region continuity was 26, or 12.4%, in August. Figure A1. PRISM grid cell regionalization example for February 1981-2010. (A) K-Means clustering with four clusters was used to identify watershed regions. Point coordinates for clustering were three PCA-transformed variables from grid cell mean, median, variance, skewness, and kurtosis of daily precipitation depth. (B) A manual cleaning process was used to produce contiguous regions. Region labels were assigned so that 1 is northeast, 2 is southeast, 3 is southwest, and 4 is northwest. Grid cells were renumbered from 0-209 for the manual cleaning. Point coordinates for clustering were three PCA-transformed variables from grid cell mean, median, variance, skewness, and kurtosis of daily precipitation depth. (B) A manual cleaning process was used to produce contiguous regions. Region labels were assigned so that 1 is northeast, 2 is southeast, 3 is southwest, and 4 is northwest. Grid cells were renumbered from 0-209 for the manual cleaning.
An attempt was also made to aggregate the "LOCA Archive" grid cells into four regions, analogous to the regions identified for the PRISM grid cells. Unfortunately, the aggregation of "LOCA Archive" grid cells into larger regions was not sufficiently successful for use in weather generator implementation. Aggregation or grouping of "LOCA Archive" grid cells was relatively unsuccessful because the "LOCA Archive" provides minimal spatial variation, at the scale of the site watershed, relative to the PRISM data set. As derivation of the PRISM data set involves spatial interpolation and regression, it makes sense that systematic regionalization could be obtained for the study area using the PRISM gridded data. Downscaling, used to create the "LOCA Archive" results, employs different methods than PRISM and so it is not surprising that the same regional groupings do not apply to both. As a result of these regional differences, each LOCA grid cell is treated as a region for the three projection intervals, or future Climate Normals: (1) 2011-2040, (2) 2041-2070, and (3) 2071-2100. Table S5 provides definitions of the mixed exponential distributions fit to each LOCA grid cell for each month and Climate Normals interval. Tables S6-S9 provide definition of mixed exponential distributions fit to the four PRISM regions for 1981-2010.

Appendix A.2.2. Maximum Precipitation Depth Truncation
Mixed exponential distributions were selected a priori to represent the daily precipitation intensity process because mixed exponential distributions provide a better representation of low frequency extreme precipitation events. The maximum likelihood estimation, distribution fitting algorithm used to fit a theoretical, mixed exponential distribution to samples of daily precipitation depth (see Tables S5-S9) focuses on producing a theoretical distribution whose mean reproduces the sample mean. In weather generator implementation, mixed exponential distributions need to be truncated with an estimate of a maximum value to avoid infrequent sampling of extremely large, wet day precipitation depths (e.g., daily values exceeding 2000 mm).
Changes to the truncation depth used for mixed exponential distributions provides a direct means to control the simulated maximum extreme event depth. Truncation implementation is somewhat complicated using mixed exponential distributions for each month to represent seasonality. If all months are truncated at the same maximum value, then it will be more likely to obtain this maximum value multiple times during a simulation year. To address this concern, multipliers are used to set the monthly truncation values. These multipliers are identified and defined in Table S10 where the multiplier is the selected extreme event depth to be used as the truncation threshold divided by the observed maximum daily precipitation depth in the PRISM 1981-2010 data set of 200.7 mm. Table S13 provides the implementation of multipliers with the observed monthly maximum depths for the PRISM data set and the "LOCA Archive" of ensemble simulation results.

Appendix A.3. Minimum and Maximum Air Temperature
Air temperature is conditioned on precipitation state and calculated during stochastic simulation using the methods employed in WGEN [39,40,55]. This representation for non-precipitation variables is a first-order vector autoregression.
The conditional formulation is presented in Equations (A3)-(A8). The day of the year mean, µ, and standard deviation, σ, in Equation (A3) for each state were determined using Fourier series smoothing, or low pass filtering, of the day of the year mean and standard deviation series derived from data and from the "LOCA Archive." In the low pass filter implementation, the lowest five frequencies were retained which correspond to periods of 366, 183, 122, 91.5, and 73.2 days. The resultant, smoothed series will contain some seasonal variability at the 3-month frequency and lower but higher frequency variations are filtered or attenuated.
The smoothed mean series was subtracted from the original data series, and the smoothed standard deviation series was used to normalize the result to produce Z-scaled residual elements. Residual elements are used to calculate the M0, Equation (A7), and M1, Equation (A8), matrices which are used to calculate the A, Equation (A5), and B, Equation (A6), matrices [39,40].
T = non-precipitation weather variables, vector of k values k = number of non-precipitation weather variables 0 = dry day state 1 = wet day state t = index for day of the year µ = mean temperature for day of the year t σ = standard deviation for day of the year t z = white noise or variance term, Equation (A4) ρ 0 = lag 0 cross-correlation coefficient between variables j, k ρ 1 = lag 1 cross-correlation coefficient between variables j, k Appendix A. 4

. Site Watershed Precipitation Depth for Water Budget Calculations
A single value of precipitation depth is calculated for the entire watershed for water budget calculations. Site watershed precipitation depth is determined using area weighting of the grid cells that are completely or partially within the watershed boundaries. Table S14 provides the areas for each PRISM grid cell within or partially within the watershed boundary on Figure 1. Table S15 provides the equivalent table for the LOCA grid cells.

Appendix A.5. Weather Generator Calibration
A trial-and-error, manual calibration approach is used in conjunction with three scenarios to identify the appropriate mitigating measures and representation to reproduce the identified climate variations in comparative weather generator output. The goal of the calibration is to obtain an H1 pathway weather generator formulation that will provide a comparison of simulated weather parameters that yields: (1) A relatively constant increase in daily average temperature of at least 3 • C across 2011-2100; (2) similar, or relatively constant, annual average precipitation depths for each projection interval; and (3) a maximum daily precipitation depth simulated across 2011-2100 that is similar to or greater than the 100-year recurrence, daily event depth in Table A1. The three scenarios are described in Table S16.
The variations in the scenarios only apply to the alternative hypothesis, H1, pathway for simulated values during 2011-2100. During 1981-2010, both pathways use precipitation intensity and occurrence representations based on the PRISM 1981-2010 data set. Appendix A.6. Calculation of Potential Evapotranspiration The Hargreaves equation in its 1985 form is provided in Equation (A9) and provides a calculation for reference crop evapotranspiration (ET o ). A crop coefficient (K c in Equation (A15)) can be used to estimate the potential evapotranspiration for different vegetation than the reference crop which is usually short, actively growing grass [64].

Appendix B. Water Balance Models and Site Calibration
The framework employs two water balance models, one in each pathway. For analysis of climate change impacts to water resources, the water balance models should be identical and only the input weather parameter forcing should vary between the pathways. This allows isolation of impacts to water resources from variations in average weather.
Any water budget calculation that uses precipitation and potential evapotranspiration (PET), calculated from location and air temperature, as inputs and produces actual evapotranspiration (AET), runoff, and recharge as outputs can be used within the framework. Two types of water-budget calculations are used within the framework in this study: (1) monthly water-balance models and (2) continuous simulation models. Water balance models are calibrated to site-specific conditions prior to inclusion in the framework formulation.

Appendix B.1. Additional Study Site Details
The 471 km 2 Dolan Creek Watershed in Val Verde County, TX, USA is the study site. Dolan Creek is the primary stream in the watershed and provides the surface outlet from the watershed. As shown on Figures S1 and S2, it is ephemeral with the exception of the downstream-most reach where a number of springs, including Dolan Springs, provide flow to the stream throughout the year.
The study site is in a remote region of Texas at the intersection of the Edwards Plateau and Chihuahuan Desert biological regions. This region has little economic development and there are no paved roads within the watershed. Vegetation across the study site is predominately dry land scrub or shrublands. Common shrub types include creosote, juniper, and mesquite. Stands of trees, mainly oaks and sycamore, may be found adjacent to perennial rivers and streams.
Beck et al. [26] provide global maps of Köppen-Geiger climate classifications at 1-km resolution for current conditions and for projected future conditions under climate change. The study area is mapped in the hot, semi-arid category (BSh) under both current and future conditions. This suggests limited change to vegetation communities across 2011-2100.
USGS Gauge 08449100 is located on Dolan Creek near the watershed outlet and the confluence of Dolan Creek and Devils River, see Figure S2. This stream gauging station has been in operation since November 2011. A probabilistic, day of the year discharge plot is provided for USGS Gauge 08449100 Dolan Creek on Figure A2. In this figure, the interquartile (25th percentile to 75th percentile) and lower quartile (0th percentile to 25th percentile) ranges suggest a relatively consistent discharge across the year. The consistent flow pattern is attributed to the dominance of spring discharge from Dolan Springs and YR-70-01-701 on the Dolan Creek discharge hydrograph. The study area is in karst terrain. In terms of surficial geology, Dolan Creek watershed is at the southwestern margin of the Edwards Plateau, a resistant carbonate upland of nearly flat-lying limestone and dolostone with thin soils, caprock mesas and dry arroyos [24]. Dolan Springs and four other mapped springs, in or near the site watershed, are located associated with the outcrop of the Fort Terrett limestone of the Edwards Formation. Segovia limestone conformably overlies the Fort Terrett limestone; most of the watershed has Segovia limestone of the Edwards Formation mapped at the surface except for a few ridgetops mapped as Buda limestone of the Edwards Formation. Conduits and caves are present in the Segovia limestone above the contact with the Fort Terrett limestone.
Table S17 provides a listing of the pertinent hydrologic soil properties for the watershed from SSURGO [25]. "Hydrologic Soil Group" provides a descriptor of the expected amount of infiltration for four soil types: A, B, C, and D. Hydrologic Soil Group B is not present at the study site. Hydrologic Soil Group A, 7% of the watershed, occurs in the stream valleys and dry stream beds and provides for rapid infiltration rates. The majority of the watershed, approximately 92% coverage, is characterized as Hydrologic Soil Group D, see Figures S1 and S2, which provides very slow infiltration and rapid surface runoff. Hydrologic Soil Group D denotes almost impervious regions.
The "Available Water Supply" category in Table S17 provides an estimate of the volume of water that can be stored in the soil. For this category, centimeters (cm) represent a length/depth measurement that can be converted to volume through multiplying by surface area. "Available Water Supply" represents the pore space in the soil that could be filled with water at saturation. If 40% pore space is assumed, then over 92% of the watershed is characterized by very thin, less than 10 cm (1/0.4 = 2.5; 2.5 × 4 cm = 10 cm) deep, soil. For reference, it is common to have 50-100 cm deep soil in areas that support significant vegetation. Water storage is available in the soil column primarily in the dry stream beds and valley bottoms where the "Available Water Supply" is between 10 and 23 cm. The study area is in karst terrain. In terms of surficial geology, Dolan Creek watershed is at the southwestern margin of the Edwards Plateau, a resistant carbonate upland of nearly flat-lying limestone and dolostone with thin soils, caprock mesas and dry arroyos [24]. Dolan Springs and four other mapped springs, in or near the site watershed, are located associated with the outcrop of the Fort Terrett limestone of the Edwards Formation. Segovia limestone conformably overlies the Fort Terrett limestone; most of the watershed has Segovia limestone of the Edwards Formation mapped at the surface except for a few ridgetops mapped as Buda limestone of the Edwards Formation. Conduits and caves are present in the Segovia limestone above the contact with the Fort Terrett limestone.
Table S17 provides a listing of the pertinent hydrologic soil properties for the watershed from SSURGO [25]. "Hydrologic Soil Group" provides a descriptor of the expected amount of infiltration for four soil types: A, B, C, and D. Hydrologic Soil Group B is not present at the study site. Hydrologic Soil Group A, 7% of the watershed, occurs in the stream valleys and dry stream beds and provides for rapid infiltration rates. The majority of the watershed, approximately 92% coverage, is characterized as Hydrologic Soil Group D, see Figures S1 and S2, which provides very slow infiltration and rapid surface runoff. Hydrologic Soil Group D denotes almost impervious regions.
The "Available Water Supply" category in Table S17 provides an estimate of the volume of water that can be stored in the soil. For this category, centimeters (cm) represent a length/depth measurement that can be converted to volume through multiplying by surface area. "Available Water Supply" represents the pore space in the soil that could be filled with water at saturation. If 40% pore space is assumed, then over 92% of the watershed is characterized by very thin, less than 10 cm (1/0.4 = 2.5; 2.5 × 4 cm = 10 cm) deep, soil. For reference, it is common to have 50-100 cm deep soil in areas that support significant vegetation. Water storage is available in the soil column primarily in the dry stream beds and valley bottoms where the "Available Water Supply" is between 10 and 23 cm. "Depth to Restrictive Layer" provides the final soil property listed in Table S17. At this site, the "Depth to Restrictive Layer" provides an estimate for the depth to bedrock. The difference between the "Depth to Restrictive Layer" estimate and the estimate soil depth, which can be calculated from the "Available Water Supply," can be considered the depth of regolith.
In summary, the ridgetops and valley/mesa sides (92-93% of total watershed area) have low water storage capacity, shallow bedrock, and high runoff potential while the streambeds and valley bottoms (7-8% of total watershed area) provide high infiltration rates, relatively deep bedrock, and near surface water storage capacity. Most of the watershed (92% from Table S17) is naturally impervious with shallow to no soil due to the rocky nature of the mesa-dominated landscape. However, the site watershed is in karst terrain and valley bottom-areas exhibit enhanced secondary porosity from limestone dissolution. This secondary porosity creates the relatively large storage capacity and elevated permeability which produces the rapid infiltration rates for the valley bottom and dry stream bed locations.
Karst is a topography formed by the dissolution of soluble rocks like limestone and is characterized by underground drainage systems including caves and springs combined with river systems that divert into and out of the underground drainage system in a complex, interrelated system of flow pathways and water storage areas that cannot be neatly divided into traditional surface water and groundwater categories. Given the karstic nature of the site, the idealized water cycle, water budget processes shown on Figure 5 are probably simplistic in comparison with actual subsurface conduit-and channel-based flow patterns.
The working hypothesis for water balance modeling of the site watershed is that the Fort Terrett limestone is relatively impermeable and spring discharge occurs at or associated with the contact of the Fort Terrett and overlying material. The overlying material is the Segovia limestone in which numerous conduits and caves are evident in the mesa sides above the stream valleys. It is likely that conduits and caves provide transitory sub-surface storages and discrete flow pathways to route precipitation through the site water budget. It is possible that there is not an "aquifer" directly associated with the site watershed in which case there would not technically be "recharge" but just perched water tables, interflow, and solution conduit runoff that feeds regional springs, streams, and rivers. Because the locations of the conduits and caves and actual sub-surface flow patterns are unknown, the conceptualized water budget processes in Figure 5 are enforced on the system as the only means to generate water balance estimates.

Appendix B.2. Monthly Water Balance Model
In this paper, a watershed water balance model is defined and described in accordance with Alley [15] and Dunne and Leopold [14]. Water balance models are bookkeeping type procedural models that estimate the balance between incoming water to the watershed from precipitation and snowmelt and outflowing water from the watershed related to the processes of evapotranspiration, stream flow, and groundwater recharge. A Thornthwaite and Mather-style water balance calculation is used for monthly water balance model calculations. The calculation approach follows that presented by Dunne and Leopold [14] with two exceptions. First, a loss from detention storage to aquifer recharge is tacked on to the calculation procedure as shown in Figure S3 (see calculation #13). Second, tables provide a lookup for the calculation of soil moisture (SM) from accumulated potential water loss (APWL) in the standard Thornthwaite and Mather calculation approach. Here, Equation (A16) which was derived from these tables by Westenbroek et al. [71] is used in place of lookup tables. SM = 10 (log 10 W c − ( |APWL| * S l * W c p )) (A16) SM = soil moisture in inches W c = available water capacity of soil, inches APWL = accumulated potential water loss, inches S l = slope coefficient for Thornthwaite  The purpose of the Thornthwaite and Mather-style water balance calculation is to transform precipitation and PET for a watershed to the hydrologic cycle components of actual AET, runoff, and aquifer recharge. This water balance calculation technically only applies to a uniform soil column that is assumed to represent the entire watershed. The calculated runoff (see #11 in Figure S3) is assumed to be equivalent to the monthly volume of stream flow generated from the watershed. However, the physical processes of stream flow are not simulated. Similarly, aquifer recharge is assumed to be the water that leaves the soil column water balance and does not go toward the stream flow generation as estimated with calculation #13 in Figure S3. Aquifer recharge is the water that flows across the water table to join the waters in the saturated zone of an aquifer [57]. A water table is not simulated or involved in the calculations depicted on Figure S3 and so aquifer recharge is not simulated, either, but it is attributed to the deep infiltration.

Calibration
The primary inputs to the water balance calculation are monthly precipitation and PET. Monthly PET is calculated from temperature using the approach detailed in Appendix A, Appendix A.6 to first generate PET for a reference crop (ET o ). ET o is then adjusted using two calibration coefficients to generate the PET value used in water balance calculations. The two calibration coefficients are a simple crop coefficient (K c see Equation (A15) in Appendix A) and a reduction in PET, relative to the PET calculated from temperature, for rainy days (R rd in calculation #3 of Figure S3).
In addition to the weather parameter inputs, this water balance calculation requires a maximum soil-water capacity value for the soil column (W c see Equation (A16)), a runoff coefficient (C RO in calculation #11 of Figure S3) that provides for estimation of the monthly runoff from the total available for runoff each month and a recharge coefficient (C Re in calculation #13 of Figure S3) that generates the estimate of monthly recharge from the monthly detention volume. W c was calculated as the area-average total available water supply from Table S17. C RO and C Re were estimated using a manual calibration approach.
The manual calibration process involved matching the cumulative, simulated monthly runoff to the cumulative, discharge at the Dolan Creek Gauge during 2015-2018. PRISM data were used for the temperature and precipitation inputs. The best fit of calculated discharge to observed discharge was judged using subjective visual examination and by comparison of the monthly root mean square error (RMSE). Figure A3 presents the match of simulated runoff to observed discharge on a cumulative monthly basis, and Table A2 provides the calibration results for the calibration parameters C RO , C Re , K c , and R rd . The RMSE value of 1 mm per month provided in Table A2 should be viewed in context of a minimum monthly runoff of 1.5 mm, a maximum monthly runoff of 6.8 mm, and a cumulative total runoff of 125.6 mm from the gauge record during 2015-2018. If the RMSE is normalized by the range in monthly runoff values observed in the gauge record for 2015-2018, then the normalized root mean square error (NRMSE) is 19%. Figure A3. Monthly water balance model-calibrated match of simulated runoff to observed gauge discharge. The monthly water balance calculation approach will only provide a reasonable match of cumulative gauge discharge to water balance runoff, and runoff includes spring discharge in this simulation. It is assumed that the spring shed is within the watershed footprint and that subsurface runoff and interflow from within the watershed provides spring discharge. Also, note the limited months with positive P − PET.  Table A3 provides the estimated water budget for the calibration simulation. AET is estimated to be about 85% of total water inflow to the watershed from precipitation. Watershed runoff, which is used as a surrogate for stream flow generation, is calculated as 4.5% of precipitation. This highlights that the observed annual average volume of water measured by the stream gauge is equivalent to between 4% and 5% of the average volume of precipitation that falls on the surface of the watershed during a year. Recharge (Re) 186 6.7 Figure A3. Monthly water balance model-calibrated match of simulated runoff to observed gauge discharge. The monthly water balance calculation approach will only provide a reasonable match of cumulative gauge discharge to water balance runoff, and runoff includes spring discharge in this simulation. It is assumed that the spring shed is within the watershed footprint and that subsurface runoff and interflow from within the watershed provides spring discharge. Also, note the limited months with positive P − PET.  Table A3 provides the estimated water budget for the calibration simulation. AET is estimated to be about 85% of total water inflow to the watershed from precipitation. Watershed runoff, which is used as a surrogate for stream flow generation, is calculated as 4.5% of precipitation. This highlights that the observed annual average volume of water measured by the stream gauge is equivalent to between 4% and 5% of the average volume of precipitation that falls on the surface of the watershed during a year.
It should be noted that a significant portion of the discharge in the gauge record comes from spring discharge and that it is not known how the spring discharge should be characterized in terms of subsurface runoff, interflow, and discharge from saturated groundwater. Here, it is assumed that all stream flow is generated by subsurface runoff, interflow, and surface water runoff (calculation #11 in Figure S3) from the site watershed. This is equivalent to assuming that the spring sheds are within the watershed footprint and that subsurface runoff and interflow provide all spring discharge. Recharge (calculated as #13 in Figure S3) is estimated as 6.7% of the total precipitation. One approach toward improving the fit between calculated and observed discharge would be to use monthly values in place of each of the constant coefficient values (C RO , C Re , K c , and R rd ) that were used as calibration parameters. However, the purpose of the calibrated water balance model is to provide a screening-level analysis and a transfer function for estimating relative changes in AET, runoff, and recharge from changes to precipitation and temperature. The calibrated water balance model presented in Figure A3 and Table A2 can provide a first cut estimate of these relative changes without additional improvement or augmentation.

Appendix B.3. Continuous Simulation, Distributed Parameter Model
The Hydrological Simulation Program-FORTRAN (HSPF) is the continuous simulation, water balance model used in this study. HSPF works on the concepts of hydrologic response units (HRUs), or sub-basins, and stream segments. HRUs represent relatively homogenous property, sub-watershed areas. Each HRU is divided into a pervious component, called a PERLND segment, and an impervious component, called an IMPLND segment. Stream reaches, called RCHRES components, are represented with the "reservoir" ordinary differential equation and routing is used to represent flow from upstream segments to downstream segments. The routing algorithm provides for multiple exits from each RCHRES structure; each exit can be used to direct water to a different destination. The default behavior is to utilize one exit that sends water downstream to the next reach.
Impervious land, IMPLND, regions in an HSPF model only represent the processes of AET and surface runoff. Because the land is impervious, there is no infiltration. Without infiltration there can be no percolation and no recharge. Soil column processes are not simulated for IMPLND areas.
In contrast, pervious land, PERLND, regions have a soil column with upper and lower zones. Surface runoff is the rainfall that is left over after infiltration, interception, ET during the storm, and surface depression storage. Infiltration occurs from the land surface into the upper zone and percolation is simulated from the upper to the lower zone. AET is calculated from both soil zones. Interflow is simulated as a loss from the lower soil zone which provides for delayed runoff to streams. mHSP2 is the HSPF-variant used for continuous simulation water balance modeling. It is one component of the pyHS2MF6 integrated hydrologic model [72]. mHSP2 was used for this study because the author had full access to the source code and is intimately familiar with the source code. Access and knowledge of the source code facilitates the minor modifications required to have mHSP2 function as an integrated part of the simulation framework.

Calibration
The site watershed was divided into 12 HRUs to facilitate routing of water from the headwaters to the watershed outlet. Figure 10 displays the HRU delineation; each HRU contains a pervious land, PERLND, and an impervious land, IMPLND, component. The percentage of impervious area was estimated using the NLCD2016 land cover map-ping [73,74]. Dolan Creek is split into five reaches, each represented with a RCHRES structure, as shown on Figure 10.
The streams are ephemeral, and the stream valleys are designated as regions of high infiltration. The one area where streams are perennial and where the stream valleys are discharge points is the outcrop of the Fort Terrett limestone shown on Figures S1 and S2. The contact of the Fort Terrett with the overlying rocks is associated with spring discharge, as evidenced by the location of the five springs in Figure S2. Mesa tops and sides, which comprise the study watershed area that is not stream valley, are runoff generating areas.
The working hypothesis for continuous simulation water balance modeling is that the stream valleys, where the Fort Terrett is not present at the surface, are regions of high infiltration because of enhanced secondary porosity. Consequently, Reaches #1-#4 in Figure 10 are losing reaches. For these losing reaches, RCHRES outflow exit number two is used to remove water from the stream reach and send it to saturated groundwater as focused recharge. In contrast, RCHRES outflow exit number one routes stream water to the next stream segment downstream for all stream reaches in the model.
The Fort Terrett outcrop is relatively impermeable; subsurface runoff and groundwater piles up on top of it and discharges at mapped springs and generally along the contact of the Fort Terrett and overlying rocks. Reach #5 in Figure 10 is the only RCHRES that has Fort Terrett mapped in the valley bottom. Consequently, Reach #5 does not have an outflow exit number two. Spring discharge provides baseflow for Dolan Creek in this short perennial reach, i.e., Reach #5, just upstream of the Dolan Creek Gauge. This spring discharge and baseflow enters the model from an external inflow time series. The 25th percentile day of year discharge values for USGS Gauge 08449100, see the boundary between the interquartile and lower quartile ranges in Figure A2, provide an estimate of baseflow for Dolan Creek. The 25th percentile day of the year discharge values are Fourier smoothed using the approach discussed in Section 2.2.6 to provide a generic depiction of expected baseflow that contains seasonal variations for use in simulation of future conditions. All baseflow is provided by spring discharge from Dolan Springs and YR-70-01-701 in Figure S2.
AET is calculated for each HRU and stream segment in the model. For a stream segment, AET is the evaporation from the water surface. In the pervious portion of an HRU, AET is calculated from the upper soil zone, the lower soil zone, interflow, and depression storage. AET is only calculated from depression storage for the impervious portions of the HRU.
Runoff is simulated for each HRU. The simulated runoff is discharged to the nearest stream segment and then routed downstream along the reaches. During the routing along the stream reaches, water may be lost to evaporation (i.e., AET) and to recharge. Consequently, the runoff from the study watershed is the outflow discharge from Reach #5. This provides the equivalent representation to runoff in the monthly water balance model. Both distributed and focused recharge mechanisms are explicitly represented in the HSPF model. Distributed recharge is attributed to inactive groundwater inflow (IGWI) which is the water that infiltrates into the pervious portion of an HRU and then percolates across both soil zones and out of the bottom of the soil column. Focused recharge is simulated using a specified RCHRES outflow exit (exit number two) for Reaches #1-#4. These are hypothesized to be the losing reaches in the model and so only these four stream segments can produce focused recharge. Total recharge simulated for the site watershed is the sum of the IGWI values for each HRU (i.e., distributed recharge) and the sum of losses to groundwater from outflow exit number 2 for Reaches #1-#4 (i.e., focused recharge). Given the thin soil column and high runoff rates for most of the watershed in Table S17, focused recharge provides the majority of recharge.
Watershed parameters were estimated, or calibrated, using automated calibration techniques with the USGS Dolan Creek gauge record as the calibration targets. Figure A4 provides a comparison of the calibrated, standalone HSPF model output to the gauge record. Simulated discharge provides a reasonably good match to flows observed at the gauge. provides a comparison of the calibrated, standalone HSPF model output to the gauge record. Simulated discharge provides a reasonably good match to flows observed at the gauge. Figure A4. mHSP2 model calibration results. Daily measured discharge at USGS Gage 08449100 provides the calibration targets. Spring discharge to Reach #5 enters the HSPF model as an external inflow time series. Overall, the match between simulated and observed is adequate. In time intervals where the peaks and/or valleys are not captured correctly, the working assumption is that the 25th percentile day of year discharge values do not appropriately represent spring discharge.
As part of the calibration process, state variables defining initial conditions for the mHSP2 PWATER module were modified to provide reasonable initial discharge values. Discharge values from all reaches were calculated using volume-based tables. Parameters for the impervious and pervious HRU segments were manually calibrated within reasonable parameter value ranges to accurately capture runoff from storm events and provide a good match to peak flows. The primary calibration criterion was visual match to the gauge record. Automated calibration techniques used minimization of the total absolute error between simulated discharge and gauge discharge as the goodness-of-fit measure. Automated calibration occurred first, and then manual calibration was used in a trial-anderror process to provide the subjective, best visual match to the gauge record. Figure A4. mHSP2 model calibration results. Daily measured discharge at USGS Gage 08449100 provides the calibration targets. Spring discharge to Reach #5 enters the HSPF model as an external inflow time series. Overall, the match between simulated and observed is adequate. In time intervals where the peaks and/or valleys are not captured correctly, the working assumption is that the 25th percentile day of year discharge values do not appropriately represent spring discharge.
As part of the calibration process, state variables defining initial conditions for the mHSP2 PWATER module were modified to provide reasonable initial discharge values. Discharge values from all reaches were calculated using volume-based tables. Parameters for the impervious and pervious HRU segments were manually calibrated within reasonable parameter value ranges to accurately capture runoff from storm events and provide a good match to peak flows. The primary calibration criterion was visual match to the gauge record. Automated calibration techniques used minimization of the total absolute error between simulated discharge and gauge discharge as the goodness-of-fit measure. Automated calibration occurred first, and then manual calibration was used in a trial-and-error process to provide the subjective, best visual match to the gauge record.