Evaluating Potential for Groundwater Contamination from Surface Spills Associated with Unconventional Oil and Gas Production: Methodology and Application to the South Platte Alluvial Aquifer

: Surface spills occur frequently during unconventional oil and gas production operations and have the potential to impact groundwater quality. A screening-level analysis using contaminant fate and transport simulations was performed to: (1) evaluate whether hypothetical (yet realistic) spills of aqueous produced ﬂuids pose risks to groundwater quality in the South Platte Aquifer, (2) identify the key hydrologic and transport factors that determine these risks, and (3) develop a screening-level methodology that could be applied for other sites and pollutants. This assessment considered a range of representative hydrologic conditions and transport behavior for benzene, a regulated pollutant in production ﬂuids. Realistic spill volumes and areas were determined using publicly available data collected by Colorado’s regulatory agency. Risk of groundwater pollution was based on predicted benzene concentrations at the groundwater table. Results suggest that the risk of groundwater contamination from benzene in a produced water spill was relatively low in the South Platte Aquifer. Spill size was the dominant factor inﬂuencing whether a contaminant reached the water table. Only statistically larger spills (volume per surface area ≥ 12.0 cm) posed a clear risk. Storm events following a spill were generally required to transport typical (median)-sized spills (0.38 cm volume per surface area) to the water table; typical spills only posed risk if a 500 or 100 year storm (followed by little degradation or sorption) occurred right after the spill. This methodology could be applied to evaluate spills occurring over other aquifers.


Introduction
The U.S., Colorado in particular, has seen a dramatic increase in unconventional oil and gas activity over the past decade or so due to the use of hydraulic fracturing and directional drilling. Scientific and public debate has arisen over the potential risks to groundwater resources associated with this development. Although there are several ways for unconventional oil and gas activity to potentially impact groundwater quality, this study focuses on surface spills of produced water because they occur frequently, can potentially release harmful substances into the environment, and few studies have evaluated the potential risks that these spills may pose to groundwater quality. This work presents a screening-level analysis using contaminant fate and transport simulations to evaluate whether hypothetical (yet realistic) spills of aqueous produced fluids pose risks to groundwater quality in the South Platte Aquifer in Colorado, and to identify the key hydrologic and transport factors that determine these risks.
The The number of active oil and gas wells in Colorado increased from approximately 22,700 wells in January 2002 to 53,651 wells in July 2016 [1]. Between 2004 and 2014, Colorado crude oil production more than quadrupled while marketed natural gas production increased by 51% [2]. Much of this development is focused in Weld County, located in central-northern Colorado, which was the top oil-producing county in the state in 2013 [3]. Oil and gas activity in Weld County overlaps with the expanding suburban corridor along the Front Range of Colorado as well as agricultural interests towards the east.
Although there are several ways for unconventional oil and gas activity in Colorado to potentially impact groundwater quality, a surface spill and subsequent leakage into a shallow groundwater aquifer is the most likely groundwater contamination pathway [4,5]. Surface spills are a common occurrence during oil and gas development [6][7][8]. From 2010 to 2014, the Colorado Oil and Gas Conservation Commission (state oil and gas regulatory agency) reported 3449 surface spills related to oil and gas activity [9], despite various containment measures implemented at oil and gas sites.
Surface spills occurring in the region of high-density oil and gas activity in Weld County have the potential to impact the South Platte Alluvial Aquifer, a key shallow aquifer with over 12,000 groundwater wells [10]. Figure 1 displays a map of the South Platte Alluvial Aquifer showing depths to the water table (areas in red indicate regions where the groundwater table is less than 10 ft deep), locations of water wells in light blue, and oil and gas wells in yellow. Clearly, there are many oil and gas wells in areas with shallow groundwater. Oil and gas activity in this region may pose a risk to shallow groundwater quality. The discussion of the risks to groundwater resources associated with oil and gas activity is especially heightened in Colorado, with an arid climate, limited water resources, and large projected population increases.  [11,12] shown in color; (b) showing overlap between presence of shallow groundwater tables, oil and gas activity [13], and water wells [14]. Counties of Colorado are labeled and shown in light purple [15]. Light blue circles represent water wells drilled into the South Platte Alluvium.
There are many different fluids that can be accidentally released during oil and gas production. In a study conducted by S.S. Papadopulos & Associates (SSPA) for the Colorado Oil and Gas Conservation Commission (COGCC), SSPA found that from January 2010 to August 2013, 78% of oil-and gas-related spills in Colorado occurred during the production phase. These spills had an average volume of 104 barrels (bbls, 1 bbl = 42 gallons = ~159 L) and likely released produced water (water from the geologic formation which flows to the surface and is usually co-produced with natural gas or oil). Although the composition of produced water varies by geologic formation, produced water is known for containing many different aliphatic, aromatic, resin, and asphaltene hydrocarbons as well as hazardous volatile organic compounds, semi-volatile organic compounds, and polycyclic aromatic hydrocarbons [16][17][18][19]. Produced water is often highly saline with high total dissolved solids (TDS) [16][17][18]20,21].
Although studies have considered the potential for oil-and gas-related surface spills to impact groundwater, an in-depth study evaluating the physicochemical factors that control the degree of risk associated with spills at the scale of an oil or gas "play" (generally regional or subregional) and precipitation events has never been conducted. Furthermore, few studies have focused on spills in Colorado, a major oil and gas state in an arid region with scarce water resources.
Here, we use contaminant fate and transport modeling to simulate surface spills of produced water associated with oil and gas operations typically related to the Niobrara formation in the Colorado Front Range to evaluate the relative importance of hydrologic and transport factors that impact whether spills pose risks to groundwater quality. For these simulations, we use conditions representative of the unconfined South Platte Aquifer (but not necessarily applicable to a specific site in the aquifer basin). A secondary purpose is to present a screening-level analysis that could later be applied at specific sites deemed most vulnerable (which would require a more careful collection of site-specific data).
We assess factors and outputs that relate to risk of contaminating an aquifer. We define high risk as concentrations above EPA maximum contaminant levels (MCLs), the  [11,12] shown in color; (b) showing overlap between presence of shallow groundwater tables, oil and gas activity [13], and water wells [14]. Counties of Colorado are labeled and shown in light purple [15]. Light blue circles represent water wells drilled into the South Platte Alluvium.
There are many different fluids that can be accidentally released during oil and gas production. In a study conducted by S.S. Papadopulos & Associates (SSPA) for the Colorado Oil and Gas Conservation Commission (COGCC), SSPA found that from January 2010 to August 2013, 78% of oil-and gas-related spills in Colorado occurred during the production phase. These spills had an average volume of 104 barrels (bbls, 1 bbl = 42 gallons =~159 L) and likely released produced water (water from the geologic formation which flows to the surface and is usually co-produced with natural gas or oil). Although the composition of produced water varies by geologic formation, produced water is known for containing many different aliphatic, aromatic, resin, and asphaltene hydrocarbons as well as hazardous volatile organic compounds, semi-volatile organic compounds, and polycyclic aromatic hydrocarbons [16][17][18][19]. Produced water is often highly saline with high total dissolved solids (TDS) [16][17][18]20,21].
Although studies have considered the potential for oil-and gas-related surface spills to impact groundwater, an in-depth study evaluating the physicochemical factors that control the degree of risk associated with spills at the scale of an oil or gas "play" (generally regional or subregional) and precipitation events has never been conducted. Furthermore, few studies have focused on spills in Colorado, a major oil and gas state in an arid region with scarce water resources.
Here, we use contaminant fate and transport modeling to simulate surface spills of produced water associated with oil and gas operations typically related to the Niobrara formation in the Colorado Front Range to evaluate the relative importance of hydrologic and transport factors that impact whether spills pose risks to groundwater quality. For these simulations, we use conditions representative of the unconfined South Platte Aquifer (but not necessarily applicable to a specific site in the aquifer basin). A secondary purpose is to present a screening-level analysis that could later be applied at specific sites deemed most vulnerable (which would require a more careful collection of site-specific data).
We assess factors and outputs that relate to risk of contaminating an aquifer. We define high risk as concentrations above EPA maximum contaminant levels (MCLs), the highest level of a contaminant that is allowed in drinking water, reaching the groundwater table. This condition would not necessarily result in risk to human health at a downstream groundwater receptor (owing to mixing with flowing clean groundwater and additional attenuation processes during aquifer transport). However, the inherent assumption is that infiltrating spill water that reaches the groundwater table at concentrations greater than regulatory standards will be of concern to policy makers and/or the general public, and would be considered a threat to groundwater quality. The findings discuss what conditions control whether the spill reaches groundwater, and what processes are important to consider when modeling these spills.

Data Compilation
We first analyzed data collected by COGCC on oil-and gas-related surface spills to estimate the number of spills and range of spill sizes that occur in Colorado. Starting in approximately 1983, the COGCC required spills ≥5 barrels (210 gallons,~795 L), or any spill volume that impacted state waters, to be reported [22]. In August 2013, the minimum reporting volume for spills occurring outside secondary containment (usually earthen or steel berms) decreased to 1 barrel (42 gallons,~159 L) [23]. These spill reports are publicly available through the online database, the Colorado Oil and Gas Information System (COGIS). Because the COGCC does not offer a bulk download of all spill data available, we extracted data from various sources that had scraped COGIS data with varying degrees of success. The AirWaterGas NSF Sustainability Network Data Portal (AWGDP) [24], S.S. Papadopulos and Associates dataset [22], and the COGCC dataset [25] were the three datasets used for different parts of the spill analysis.
The AWGDP is a publicly available tool developed by researchers at the University of Colorado Boulder as part of an NSF Sustainability Research Network. Of the three datasets, the AWGDP is the most comprehensive including 7729 spill entries for Colorado from 1990 to 2015. The COGCC dataset is a publicly available bulk download of more recent spill data prepared by the COGCC and updated every month. The data used for this analysis were last updated on 4 January 2016, and consisted of 2738 Colorado spill records from 2009 to 2015. The SSPA dataset included spill reports that were downloaded and organized for a technical report the firm conducted for the COGCC [22]. The dataset was obtained from SSPA and consisted of 1416 spill records from 2010 to 2013.
Spill data from all three databases were combined and prepared for analysis. Every effort was made to use only produced water spill volumes for the analysis. This proved to be challenging because some spill reports did not have sufficient information to properly categorize the released fluid as produced water (see Appendix A for further details). We acknowledge the possibility that this data subset may contain a few spills of other fluids (e.g., frac water, treated produced water, flowback water). EPA ran into a similar issue when conducting their national study [26]. We note that oil or condensate spill volumes were not included in this analysis and simulating these types of spills may require multiphase fluid modeling. Produced water spills tend to be more frequent than condensate and oil releases [25] and between 2010 and 2018, approximately 8 million gallons (~190,476 bbl, 30,283,294 L) of produced water compared to approximately 1 million gallons (~23,810 bbl, 3,785,412 L) of oil were spilled in Colorado [9]. Thus, we chose to simulate spills of aqueous fluids to represent typical spill conditions.

Produced Water Spill History and Characterization
To determine how many produced water spills have occurred in Colorado on an annual basis, we analyzed 3638 unique spills from the AirWaterGas Data Portal (3295) and the COGCC dataset (343). Following personal communication with practitioners in the oil and gas industry, it became clear that spill size (i.e., volume and area) may be correlated to the targeted geologic formation, flowrates, and onsite equipment (the latter two somewhat dependent on the first). To capture a spill distribution that would be representative of spills that could impact the South Platte Alluvial Aquifer, we chose to only consider spills that occurred in counties that also contained portions of the South Platte Alluvial Aquifer (i.e., represent geology, flowrates, and equipment used for relevant production sites). A subset of 90 unique surface spills (85 from the COGCC dataset and 5 from the SSPA dataset) that occurred at separate distinct locations with available spill volume and spill area data (necessary parameters for accurate vadose zone modeling-described in more detail later) was used for spill characterization. We calculated net spill volumes (total spill volumerecovered spill volume) for the 90 spills. We divided the net spill volume by the area to obtain the spill volume per area, which yields a "depth of water" for each spill (required for model input). We computed the 50th to the 100th spill depth percentiles, in increments of 5. Table 1 lists the spill percentile and corresponding spill volume per area or spill depth. We note that these spill percentiles were calculated based on a limited dataset of spill observations and therefore do not represent a theoretical probability distribution function for all possible spill conditions (i.e., more larger spills are possible, even if they have not yet occurred). Rather, these percentiles provide a range of realistic spill conditions to use for modeling based on spills that have occurred from 2010 to 2015. Further details regarding the spill data analysis process can be found in Appendix A.

Modeling Methods
HYDRUS-1D [27] was parameterized to simulate produced water spills and represent the unsaturated zone overlying the South Platte Alluvial Aquifer. HYDRUS-1D is a onedimensional finite element groundwater model capable of modeling water and solute flow through variably saturated porous media [27]. HYDRUS-1D describes unsaturated soil hydraulic behavior using van Genuchten and Mualem constitutive equations [28,29] and parameterizes these equations using soil hydraulic properties data for each of the U.S. Department of Agriculture (USDA)-based soil texture classes [30,31]. HYDRUS-1D is cost free to the public, relatively easy to use (with a graphic user interface), and commonly used by practitioners who conduct vadose zone flow and transport analyses. To simulate solute transport, our model utilizes the advection-dispersion equation with equilibrium, linear, reversible partitioning between water, air, and soil phases, and firstorder transformation of contaminants. Because this work presents a screening-level model of risk associated with spills, rather than a site-specific analysis, and numerous simulations were required to identify and evaluate dominant governing processes, use of a 1D model was deemed appropriate.
The model domain consisted of a hypothetical but realistic unsaturated zone (soil surface to the groundwater table). Groundwater table depths of 2 and 10 ft were specified; these depths are representative of shallower groundwater levels in the South Platte Alluvial Aquifer in the Front Range of Colorado more likely to be associated with risk of aquifer contamination. To specify the surface spill flux, we assume that the spill infiltrates at a rate equal to the long-term infiltration capacity of the soil (i.e., at a rate equal to the saturated hydraulic conductivity, K SAT , assuming ponding of the spill at the soil surface). The initial impacts of dry-soil capillarity are neglected, which implies the soil surface saturates very quickly as the spill begins. Assuming infiltration is equal to K SAT and neglecting capillarity may underestimate the short-term water velocity through the vadose zone. On the other hand, assuming an infiltration rate of K SAT could overestimate the longer-term infiltration velocity when ponding of the spill subsides. We would require knowledge of initial soil moisture in the profile to more accurately model infiltration. For the non-site-specific purposes of this study stated above, using K SAT as the infiltration rate is appropriate.
Spill duration was determined using the assumed infiltration rate and the spill depth, spill volume per area calculated during the spill data analysis (Table 1). Fluid infiltrates through the soil surface for the duration of the spill. Unless noted otherwise (i.e., the rainfall simulations discussed later), no additional water or solute enters the system following the spill. The bottom boundary condition for water flow was set assuming a constant pressure head equal to the atmospheric pressure head (a head value of zero with the convention used) to represent the groundwater table. For solute transport, the upper boundary condition was specified using a concentration flux, while a zero concentration gradient was used for the lower boundary condition. No pollutants were present in the vadose zone before the spill.
Run times were determined by studying model output and ensuring that both the water balance error and concentration balance error did not exceed 2%. Details regarding the numerical simulations can be found in the Supplementary Information.

Aquifer Parameterization
We assumed a sandy loam soil type, which out of the 12 USDA-based soil texture classes, best represents the South Platte Alluvial Aquifer based on county-level soil survey data. Soil hydraulic parameters (residual water content, saturated water content, and soil-water retention curve parameters) representative of sandy loam soils were taken from Carsel and Parrish 1988 to parameterize the van Genuchten and Mualem equations. We assumed homogeneous soils, initially hydrostatic conditions (based on specified water table depth), and no hysteresis.
At most sites, surface soil is removed during well-pad construction (based on personal experience visiting well pads), including the A and B soil horizons, so organic carbon is expected to be relatively low. For simulations that considered contaminant sorption, we assumed a fraction of organic carbon of 10 −4 , which is the value recommended by the EPA for aquifers containing low organic carbon [32,33]. For most pollutants, higher organic carbon will cause significant transport retardation via sorption, so this assumption is conservative with respect to risk of pollutants reaching the underlying aquifer. Dispersivity was set to 0.33 ft for the 10 ft simulations and 0.066 ft for the 2 ft simulations, based on values suggested in Gelhar et al. 1992 [34].

Chemical-Specific Parameterization
For the produced water spill simulations, we chose to focus on contaminant transport of benzene. Benzene is a common constituent of produced water [35], a known carcinogen, federally regulated (with a relatively low EPA MCL of 0.005 mg/L), and relatively mobile. If high concentrations of benzene from surface spills reach the groundwater table, this finding would likely mandate remedial action or additional regulatory restrictions on oil and gas operations. For the purposes of a screening-level analysis, we felt benzene was an appropriate choice. The initial benzene concentration in the spill was calculated by taking the average benzene concentration (~0.28 mg/L) measured in produced water streams 90 days after hydraulic fracturing at 3 different well locations in southeastern Pennsylvania [36]. It is unclear whether this value represents a low-end or high-end value for benzene concentrations in produced water from unconventional oil and gas production; benzene concentrations in produced water from conventional oil and gas sites can reach up to 27 mg/L [35]. Table 2 lists the chemical-and aquifer-specific parameters used for the produced water simulations. Simulations in this study, unless otherwise noted, accounted for first-order aerobic microbial degradation and instantaneous, reversible partitioning to soil and air phases. Median values of organic carbon-soil partitioning coefficients, aerobic microbial degradation rates (we expect the shallow unsaturated zone to be generally oxic), soil bulk density, and Henry's constants were calculated considering the range of values measured in aqueous environments reported in the references cited in Table 2 and chosen for simulations. The potential impact of using median values in simulations compared to high or low values in the reported range is discussed in more detail in the forthcoming Discussion section. To consider conservative scenarios and account for variability in benzene's chemical behavior, benzene was also simulated without any degradation or sorption. Benzene may not or may slowly degrade if microbes preferentially degrade other hydrocarbons or organics present in the produced water [37] or the system goes anoxic due to low diffusive flux of oxygen in near-saturated soil conditions. Anaerobic benzene degradation tends to occur slowly (possibly at rates orders of magnitude lower than aerobic degradation rates [38,39]), is associated with long lag times, and may require particular environmental conditions to optimally occur [40]. A median aerobic benzene degradation rate (half-life of approximately 10 days) then represents a realistic scenario (tending toward best case) in terms of benzene removal while conservative transport represents a upper bound impact scenario.

Post-Spill Storm Simulations
In the storm simulation scenarios, storm events were simulated immediately following a spill (i.e., prior to significant evaporation and redistribution, which mitigates infiltration of spill fluids to the water table), which represents an upper bound transport scenario. Because the average storm in Fort Collins, CO is estimated to be 10.36 hr long [44], we considered precipitation frequency data estimated and reported for the most similar storm duration, the 12 h storm at the Greeley UNC station (ID 05-3553) in Greeley, CO [45] (for additional information see Appendix A). Combinations of precipitation rates within a particular storm are essentially infinite; thus, we assumed a constant precipitation rate during the 12 h storm which was adequate for the purposes of this study. Table 3 summarizes the precipitation rates used for the various storm frequencies.
Of course, it would be logical to consider various storms occurring in the months or seasons after a spill, although the possibilities are too numerous to effectively simulate. With this in mind, it is interesting to note that total precipitation depths associated with relatively large storms can be similar to monthly total precipitation or seasonal data (see Table 4). For context, total precipitation during an average May (the wettest month of year for this station) is equivalent to a 10 year storm and total precipitation for an average spring (April to June) is very similar to a 500 year storm. Assuming equal runoff ratios and insignificant evapotranspiration or redistribution, the amount of infiltrating water would be similar for both cases, and thus may have a similar impact on moving spills downward in the vadose zone. Although the impact of different storm intensities was the focus of this analysis, we considered the possibility that similar effects could be expected for sustained rainy periods over days, weeks, or months.

Spill Analysis
The annual number of reported produced water spills in Colorado has generally increased over time, and the number of spills increased drastically following the beginning of the shale-gas boom in 2004, as shown in Figure 2. However, this trend may also be due to spill reporting becoming more stringent over time; Colorado oil and gas regulations underwent major changes in 2008, 2012, and 2013 [3]. A total of 185 produced water spills were reported in Colorado in 2015.
For produced water spills occurring in areas overlying the South Platte Alluvial Aquifer and reporting both spill volume and spill area data, net released spill volumes ranged from 0 to 8400 gallons (200 bbl,~31,797 L), with a median of 84 gallons (2 bbl, 318 L) ( Figure 3). A majority (62) of the 90 spills have volumes below 250 gallons (~6 bbl, 946 L), including 26 spills with net 0 volume spilled. A total of 6 spills had volumes exceeding 1000 gallons (~24 bbl,~3785 L). The overall trend of many low-volume and few high-volume spills, as shown in Figure 3, is consistent with conclusions from previous studies [26,46].

Spill Analysis
The annual number of reported produced water spills in Colorado has generally increased over time, and the number of spills increased drastically following the beginning of the shale-gas boom in 2004, as shown in Figure 2. However, this trend may also be due to spill reporting becoming more stringent over time; Colorado oil and gas regulations underwent major changes in 2008, 2012, and 2013 [3]. A total of 185 produced water spills were reported in Colorado in 2015. For produced water spills occurring in areas overlying the South Platte Alluvial Aquifer and reporting both spill volume and spill area data, net released spill volumes ranged from 0 to 8400 gallons (200 bbl, ~31797 L), with a median of 84 gallons (2 bbl, ~318 L) (Figure 3). A majority (62) of the 90 spills have volumes below 250 gallons (~6 bbl, ~946 L), including 26 spills with net 0 volume spilled. A total of 6 spills had volumes exceeding 1000 gallons (~24 bbl, ~3785 L). The overall trend of many low-volume and few high-volume spills, as shown in Figure 3, is consistent with conclusions from previous studies [26,46].  Net spill depth was similarly distributed to net volume spilled with many low values and a few high values (see Supplementary Information). For the produced water spills, 80 out of 90 spills had net spill depths less than 5 cm. The median spill depth was 0.38 cm (Table 1). Spill areas in our dataset ranged from 4 to 217,800 ft 2 , with a median of 800 ft 2 . Spill depths (depth of impact) reported for 55 of 77 spills in Gross et al. 2013 ranged from 1 to 18 ft with a mean of 7 ft [47]. Differences between our analysis and Gross et al. 2013 may be due to the fact that Gross et al. 2013 only considered spills that impacted groundwater and included spills with unknown spill volumes (but known spill areas) and mixtures of released fluids (condensate and oil in addition to produced water), which may act Net spill depth was similarly distributed to net volume spilled with many low values and a few high values (see Supplementary Information). For the produced water spills, 80 out of 90 spills had net spill depths less than 5 cm. The median spill depth was 0.38 cm (Table 1). Spill areas in our dataset ranged from 4 to 217,800 ft 2 , with a median of 800 ft 2 .
Spill depths (depth of impact) reported for 55 of 77 spills in Gross et al., 2013 ranged from 1 to 18 ft with a mean of 7 ft [47]. Differences between our analysis and Gross et al., 2013 may be due to the fact that Gross et al., 2013 only considered spills that impacted groundwater and included spills with unknown spill volumes (but known spill areas) and mixtures of released fluids (condensate and oil in addition to produced water), which may act as long-term sources of benzene (i.e., a continuous source of pollution rather than a specified-time-duration spill considered in our study). We accounted for 90 spills of produced water (which had both spill volume and area data available) that occurred across a longer time period and broader geographical space and were not known a priori to impact groundwater.

Spill Simulations
In this section, we first discuss modeling results for spills considering non-conservative transport (using literature median values for various transport parameters) followed by a discussion of simulations conducted for spills considering conservative transport. Figure 4 shows the concentration breakthrough curve expected at a 10 ft groundwater table depth following a maximum spill (64.2 cm of produced water released over~14.5 hr). The maximum spill reaches the groundwater table relatively quickly, after 24 hr, with a concentration peak at 0.037 mg/L. The concentration in infiltrating water exceeds the EPA maximum contaminant level (MCL) for benzene (0.005 mg/L) at the water table by almost 1 order of magnitude. However, no benzene reaches the water table from the 95th percentile spill (12 cm of produced water released over~2.7 h). The benzene in the 95th percentile spill is completely biodegraded before it reaches the groundwater table 10 ft below. percentile spill is completely biodegraded before it reaches the groundwater Table 10 ft below. Running benzene transport as a conservative solute does not substantially change what we term here the "spill size risk threshold" or the smallest spill size that exhibits benzene concentrations exceeding the EPA MCL at a 10 ft deep groundwater table. Assuming no degradation and no sorption, for a 10 ft groundwater table, both the 90th and 95th percentile spills displayed benzene concentrations that were far below the EPA MCL (not shown here). Eventually, the vadose zone reaches hydrostatic conditions and benzene remains in the vadose zone.
With a 2 ft groundwater table along with degradation and sorption, both the maximum and 95th percentile spills exhibit benzene concentrations higher than the EPA MCL, as shown in Figure 5. The breakthrough curve for the maximum spill peaks at a concentration of ~0.28 mg/L. The 95th percentile spill reaches a peak of ~0.038 mg/L. However, the 90th percentile spill concentration curve lies entirely below the EPA MCL. Assuming conservative transport, the 90th percentile spill does exhibit benzene concentrations Running benzene transport as a conservative solute does not substantially change what we term here the "spill size risk threshold" or the smallest spill size that exhibits benzene concentrations exceeding the EPA MCL at a 10 ft deep groundwater table. Assuming no degradation and no sorption, for a 10 ft groundwater table, both the 90th and 95th percentile spills displayed benzene concentrations that were far below the EPA MCL (not shown here). Eventually, the vadose zone reaches hydrostatic conditions and benzene remains in the vadose zone.
With a 2 ft groundwater table along with degradation and sorption, both the maximum and 95th percentile spills exhibit benzene concentrations higher than the EPA MCL, as shown in Figure 5. The breakthrough curve for the maximum spill peaks at a concentration of~0.28 mg/L. The 95th percentile spill reaches a peak of~0.038 mg/L. However, the 90th percentile spill concentration curve lies entirely below the EPA MCL. Assuming conservative transport, the 90th percentile spill does exhibit benzene concentrations higher than the MCL; however, the 85th percentile spill results in benzene concentrations below the MCL (see Figure S2).
suming no degradation and no sorption, for a 10 ft groundwater table, both the 90th and 95th percentile spills displayed benzene concentrations that were far below the EPA MCL (not shown here). Eventually, the vadose zone reaches hydrostatic conditions and benzene remains in the vadose zone.
With a 2 ft groundwater table along with degradation and sorption, both the maximum and 95th percentile spills exhibit benzene concentrations higher than the EPA MCL, as shown in Figure 5. The breakthrough curve for the maximum spill peaks at a concentration of ~0.28 mg/L. The 95th percentile spill reaches a peak of ~0.038 mg/L. However, the 90th percentile spill concentration curve lies entirely below the EPA MCL. Assuming conservative transport, the 90th percentile spill does exhibit benzene concentrations higher than the MCL; however, the 85th percentile spill results in benzene concentrations below the MCL (see Figure S2). In addition to spill depth, degradation rate, and sorption, we explored how changing hydraulic conductivity and residual water content would impact the spill size risk threshold. Increasing the hydraulic conductivity by an order of magnitude increased the predicted peak concentrations relative to the base case but did not change the spill size risk threshold (e.g., for the 90th percentile spill at a 2 ft deep water table, peak concentration increased by ~2.3 × 10 −4 mg/L). Decreasing the hydraulic conductivity ten-fold resulted in a decrease in the peak benzene concentration; more benzene degrades because of the longer vadose zone residence time (e.g., for the 95th percentile spill for a 2 ft deep groundwater table, the peak concentration decreased from ~0.038 to ~0.020 mg/L). For the 90th percentile spill over a 2 ft deep groundwater table considering conservative transport, varying hydraulic conductivity by an order of magnitude only results in small changes to the final concentration.
Residual water content is the amount of water retained in the vadose zone for the driest conditions after an infiltration event (likely closer to the surface under the assumption of hydrostatic conditions). The value used in the simulations (0.065) was the mean value for a sandy loam calculated by Carsel and Parrish 1988 [30]. A smaller residual water content, that might still be reasonable for a sandy loam, could result in a smaller percentile spill reaching the water table because less of the infiltrating water is retained in the vadose zone due to capillarity. Given the variability and difficulty in estimating residual water content [48,49], we simulated spills using a smaller residual water content for a sandy loam. We chose a residual water content of 0.039, an average value calculated from measurements on 481 sandy loam samples from three databases [49]. The reported standard deviation for this value was 0.054, resulting in a range of 0-0.093 for residual water content [49]. Simulations using this lower value for residual water content (0.039) showed no change in the spill size risk threshold. With non-conservative transport there was only In addition to spill depth, degradation rate, and sorption, we explored how changing hydraulic conductivity and residual water content would impact the spill size risk threshold. Increasing the hydraulic conductivity by an order of magnitude increased the predicted peak concentrations relative to the base case but did not change the spill size risk threshold (e.g., for the 90th percentile spill at a 2 ft deep water table, peak concentration increased bỹ 2.3 × 10 −4 mg/L). Decreasing the hydraulic conductivity ten-fold resulted in a decrease in the peak benzene concentration; more benzene degrades because of the longer vadose zone residence time (e.g., for the 95th percentile spill for a 2 ft deep groundwater table, the peak concentration decreased from~0.038 to~0.020 mg/L). For the 90th percentile spill over a 2 ft deep groundwater table considering conservative transport, varying hydraulic conductivity by an order of magnitude only results in small changes to the final concentration.
Residual water content is the amount of water retained in the vadose zone for the driest conditions after an infiltration event (likely closer to the surface under the assumption of hydrostatic conditions). The value used in the simulations (0.065) was the mean value for a sandy loam calculated by Carsel and Parrish 1988 [30]. A smaller residual water content, that might still be reasonable for a sandy loam, could result in a smaller percentile spill reaching the water table because less of the infiltrating water is retained in the vadose zone due to capillarity. Given the variability and difficulty in estimating residual water content [48,49], we simulated spills using a smaller residual water content for a sandy loam. We chose a residual water content of 0.039, an average value calculated from measurements on 481 sandy loam samples from three databases [49]. The reported standard deviation for this value was 0.054, resulting in a range of 0-0.093 for residual water content [49]. Simulations using this lower value for residual water content (0.039) showed no change in the spill size risk threshold. With non-conservative transport there was only~3% difference in the peak concentration in the breakthrough curve for the 90th percentile spill over a 2 ft depth to groundwater. With conservative transport, a lower residual water content results in a 45% higher peak concentration for the 90th percentile spill over a 2 ft depth to groundwater.

Storm Simulations after a Spill
Because only very large (over 90th percentile) spills reached the groundwater table in conservative circumstances in the spill simulations, an obvious question to ask is how storm events (those occurring soon after a spill as well as typical rains for a period after the spill) would impact the spill size risk threshold. Results for the storm simulations are discussed below using, unless noted otherwise, base case chemical and aquifer properties as listed in Table 2 and considering 10 and 2 feet depths to the water table for both non-conservative (including representative sorption and degradation) and conservative transport.

Storm Simulations after a Spill for a 10 Feet Water Table
For a 10 ft groundwater table depth, even following a 1000 year storm, all percentile spills smaller than the maximum spill did not display benzene concentrations greater than the EPA standard at the groundwater table when benzene sorption and degradation were considered (not shown here). However, certain spills considering benzene as a conservative solute did reach the 10 ft deep groundwater table (see Figure S3). With conservative transport, larger spill sizes (90th and 95th percentile) coupled with more intense storms (100 or 500 year storms) led to higher maximum concentrations and in certain cases, exceedances of the MCL at the 10 ft groundwater table. Less intense storms did not provide sufficient velocities to flush the soil profile and transport benzene to the groundwater table at high concentrations. Higher velocities with the larger storms that transport more solute out the bottom boundary would be associated with greater dispersion, resulting in less contaminant mass in the vadose zone. After smaller storms, the soil profile slowly drains and benzene remains stored throughout the soil profile.

Storm Simulations after a Spill for a 2 Feet Water Table
For 2 ft depth to groundwater, benzene concentrations at the water table for the 90th percentile spill without a subsequent rain event did not exceed the EPA MCL ( Figure 5). Storms after a spill result in benzene concentrations greater than the MCL reaching the 2 ft water table for spill sizes smaller than the 90th percentile. As seen in Figure 6, the 500 year storm is sufficient to cause spill sizes from median to 95th percentile, to exceed the EPA MCL at the groundwater table considering representative sorption and degradation. The peak benzene concentration decreases with spill size and ranged from~0.17 to 0.0053 mg/L for the 95th percentile spill and median spill, respectively. Table 5 summarizes the storm simulations for a 2 ft groundwater table depth reporting the predicted peak concentration for each scenario. Higher peak benzene concentrations are predicted for the larger spill sizes-higher intensity storm combinations. Compared to the 500 year storm, the 100 year and 10 year storms did not contain adequate precipitation to cause the median spill to pose risk to groundwater quality. For a 100 year storm and non-conservative benzene transport, all spills greater than or equal to the 70th percentile result in benzene concentrations at the water table exceeding the EPA MCL. For more typical storm frequencies, only the larger spills pose a risk of groundwater contamination; for the 10 year and 1 year storms, the 95th and 90th percentile spills are the only spills predicted to pose risk. percentile spill without a subsequent rain event did not exceed the EPA MCL ( Figure 5). Storms after a spill result in benzene concentrations greater than the MCL reaching the 2 ft water table for spill sizes smaller than the 90th percentile. As seen in Figure 6, the 500 year storm is sufficient to cause spill sizes from median to 95th percentile, to exceed the EPA MCL at the groundwater table considering representative sorption and degradation. The peak benzene concentration decreases with spill size and ranged from ~0.17 to 0.0053 mg/L for the 95th percentile spill and median spill, respectively.  Table 5 summarizes the storm simulations for a 2 ft groundwater table depth reporting the predicted peak concentration for each scenario. Higher peak benzene concentrations are predicted for the larger spill sizes-higher intensity storm combinations. Compared to the 500 year storm, the 100 year and 10 year storms did not contain adequate precipitation to cause the median spill to pose risk to groundwater quality. For a 100 year storm and non-conservative benzene transport, all spills greater than or equal to the 70th percentile result in benzene concentrations at the water table exceeding the EPA MCL. For more typical storm frequencies, only the larger spills pose a risk of groundwater contamination; for the 10 year and 1 year storms, the 95th and 90th percentile spills are the only spills predicted to pose risk.   We also consider scenarios in which benzene may act as a conservative solute. Conservative transport of benzene following a storm event represents an upper bound concentration transport scenario. Peak benzene concentrations from simulations run with a 2 ft groundwater table, different spill sizes, different storm frequencies, and conservative benzene transport (no degradation or sorption) can be seen in Table 6. If benzene acts as a conservative solute, 100 year storms are expected to cause all spills equal to and larger than the median spill to pose risk to groundwater quality. For a 10 year storm, spills 75th percentile and larger result in benzene concentrations above EPA standards. In the case of a 1 year storm, the 85th percentile spill poses concern. Even considering conservative transport and a 2 ft shallow groundwater table, large storms (100 year and 500 year) need to occur for typical (median-sized) spills to begin to pose risk. Storms with more typical frequencies (1 year and 10 year) do not create conditions for sufficiently high velocities to transport median spills to the groundwater table at concentrations above the MCL. Similar to the results from the individual spills, only the large spills pose risk. Table 6. Maximum concentrations a predicted for produced water spill and storm simulations considering a 2 ft groundwater table, assuming no degradation and no sorption.  We also evaluated the possibility for long-term precipitation trends to impact spill risk. For example, because the average total precipitation in May is very similar to a 10 year storm (Table 4), precipitation over the course of a month could be equivalent to the total precipitation of multiple storms, neglecting evapotranspiration and capillary redistribution between events (which would reduce the volume of spills reaching the water table). In scenarios assuming non-conservative transport (accounting for degradation and sorption), the same total precipitation distributed over a month led to lower peak concentrations and larger spill size thresholds. As shown in Figure 7, although precipitation depth for an average May and 10 year storm was 60 mm, the 90th percentile spill over a 2 ft water table followed by a 10 year storm resulted in a peak concentration of 0.019 mg/L while simulations that applied the average May precipitation over a month resulted in a peak concentration of 0.002 mg/L. Considering degradation and sorption, the spill size threshold for the 10 year storm was the 90th percentile. If the same amount of precipitation is distributed over the course of a month following a spill (representative of typical precipitation in May), the spill size threshold was the 95th percentile. A similar trend is seen when comparing a 500 year storm to an average spring (April-May-June) (see Figure S4). With conservative transport, the relative importance of precipitation rate vs. total precipitation varied. As seen in Figure 8, the spill size risk threshold for a 2 ft water table for an average-precipitation May increases to the 80th percentile spill compared to the spill size risk threshold for the 10 year storm which is the 75th percentile spill. For a 10 ft deep water table and assuming no degradation and no sorption, the 500 year storm generally produced benzene concentrations similar to those predicted for an average spring (April-May-June) (see Figure S5).

Spill Percentile
With conservative transport, the relative importance of precipitation rate vs. total precipitation varied. As seen in Figure 8, the spill size risk threshold for a 2 ft water table for an average-precipitation May increases to the 80th percentile spill compared to the spill size risk threshold for the 10 year storm which is the 75th percentile spill. For a 10 ft deep water table and assuming no degradation and no sorption, the 500 year storm generally produced benzene concentrations similar to those predicted for an average spring (April-May-June) (see Figure S5).

Evaluating Hydrologic and Transport Factors
From the simulations of individual produced water spills, we can see that only very large spills, 90th percentile and larger for the case of a 2 ft groundwater table and the maximum spill for a 10 ft groundwater table, pose risks to groundwater quality due to benzene concentrations even when assuming no degradation and no sorption. The primary reason is that, for reported typical spill volumes and areas, the spill is largely contained in the vadose zone (i.e., within the soil moisture retained by capillarity) for moderate time scales. In general, the peak benzene concentration increased and concentration at the groundwater table exceeded the EPA limit for longer periods of time as the spill percentile increased. Although degradation is responsible for benzene removal, in general the biodegradation rate was not the most important factor controlling the degree of risk for the cases considered here. Spills lower than the 90th percentile did not contain sufficient water to adequately saturate the vadose zone and create velocities to quickly transport the spill to the groundwater table. Rather, the spill water displaced clean water in the vadose zone, the displaced water drained slowly, and the soil profile approached hydrostatic conditions, holding contaminated water in the vadose zone and allowing the Figure 8. Benzene concentration breakthrough curves for a 2 ft water table and conservative transport with precipitation events immediately following the spill. This plot compares the 10 year storm to an average May, with the same total precipitation depth but different precipitation rates.

Evaluating Hydrologic and Transport Factors
From the simulations of individual produced water spills, we can see that only very large spills, 90th percentile and larger for the case of a 2 ft groundwater table and the maximum spill for a 10 ft groundwater table, pose risks to groundwater quality due to benzene concentrations even when assuming no degradation and no sorption. The primary reason is that, for reported typical spill volumes and areas, the spill is largely contained in the vadose zone (i.e., within the soil moisture retained by capillarity) for moderate time scales. In general, the peak benzene concentration increased and concentration at the groundwater table exceeded the EPA limit for longer periods of time as the spill percentile increased. Although degradation is responsible for benzene removal, in general the biodegradation rate was not the most important factor controlling the degree of risk for the cases considered here. Spills lower than the 90th percentile did not contain sufficient water to adequately saturate the vadose zone and create velocities to quickly transport the spill to the groundwater table. Rather, the spill water displaced clean water in the vadose zone, the displaced water drained slowly, and the soil profile approached hydrostatic conditions, holding contaminated water in the vadose zone and allowing the benzene to degrade; this near-static condition would have relatively high air-to-water ratios which are favorable for aerobic benzene degradation.
Benzene concentrations predicted at the groundwater table following spill-storm events would change if degradation rate, hydraulic conductivity, and organic carbon content were different than the base case conditions represented in Table 2. For the spills where degradation was important relative to conservative transport, a higher degradation rate would reduce benzene concentrations and thus reduce the water-quality risk of a certain spill size and storm frequency combination (resulting in a larger spill size risk threshold). Lower degradation rates would produce results intermediate between the conservative and non-conservative results discussed above.
Hydraulic conductivity impacts the infiltration rate of spills. Of course, hydraulic conductivity is almost certain to vary over a couple orders of magnitude in the surface sediments above the South Platte Aquifer. For the spills where benzene concentrations exceeded the EPA limit at the water table, a lower hydraulic conductivity value than assumed for our simulations could reduce infiltrative velocities, allow more time for degradation, and generally reduce the risk associated with a specific spill size. A higher hydraulic conductivity may result in higher infiltrative velocities, and thus less degradation, which would cause results to be intermediate between our current conservative and nonconservative results described above.
Sorption generally had a negligible impact in these simulations because of the small fraction of organic carbon used, and benzene's relatively low soil-organic carbon partitioning coefficient, compared to other pollutants. Sorption tends to reduce peak concentrations, but retard transport, allowing more time for degradation. Thus, stronger sorption than assumed here would likely produce results intermediate between the conservative and non-conservative results described above or concentrations further below the non-conservative case.
Residual water content is another hydraulic property that can impact transport of spills through soil that was explored in this study. If the residual water content in the soil was lower than the value assumed here, a smaller spill could reach the water table (because less water is retained in the driest parts of the vadose zone after drainage). However, it is likely that initial water contents in the shallow dry areas of the soil profile would be more similar to the so-called field capacity, which essentially is the steady water content of the soil between precipitation events for the particular climate conditions of the soil location. In general, field capacities are approximately 0.2 for the soils overlying the South Platte Alluvial Aquifer [50], which is similar to the initial hydrostatic water content used in our simulations (0.15 at the soil surface).
Multiple storms after a spill were not considered in these simulations because an infinite combination of possible storm frequencies and durations exist. However, as one can see from Table 3, an unlikely case of 1 year storms in 4 successive days would provide approximately the same total volume as a 500 year storm; the results above for a 500 year storm could be viewed as a reasonable high precipitation scenario for multiple storms soon after a spill. Similarly, two 1 year storms (36 mm total precipitation each) occurring in the same week, although unlikely, would result in a similar total precipitation as a 10 year storm (60 mm). Simulated results would be intermediate between the simulation of one 10 year storm and precipitation for an average May applied evenly over 30 days.
For the spill-storm simulations, higher storm intensities resulted in higher peak concentrations and more typical spill sizes exceeding the benzene EPA MCL at the 2 ft water table. In non-conservative transport scenarios, precipitation rate was an important factor in determining the benzene levels at the groundwater table. Higher precipitation rates typically led to higher saturated soil conditions which increases the hydraulic conductivity and in turn, may increase the transport velocity. In general, these higher transport velocities lead to lower residence times in the soil profile, less time for microbial degradation, greater mass of benzene advected to the aquifer, and higher concentrations of benzene at the groundwater table. In general, a lower precipitation rate dissipating the same total precipitation over a longer time period resulted in higher peak concentration predicted at the groundwater table relative to the base case scenario without any rain events, but did not provide sufficiently high transport velocities to substantially change the spill size threshold.
If we assume conservative transport, then the relative importance of precipitation rate varies. For a 10 ft water table, there was no difference in spill size threshold between the 500 year storm and an average spring. For a 2 ft water table, higher precipitation rates transported smaller spill sizes to the water table at concentrations that exceeded the EPA MCL for benzene. It is important to note that Colorado has an arid climate. Numerous rains over several months or years may provide more rain than a 500 year storm, but it is likely that microbial degradation between rain events during this period would remove benzene from the soil profile.
Long-term precipitation could also impact the initial conditions of the soil profile when the spill occurs. Because certain areas near high density oil and gas development are known to be net recharge areas [51][52][53], initial conditions in the soil column may not be hydrostatic. In the South Platte basin, the annual recharge rate to groundwater from precipitation over native vegetation areas is relatively low (0.43 inches/year) while irrigation and seepage from canals and ditches along the South Platte contribute substantially to aquifer recharge [17].
Benzene source concentration is another factor that would impact concentrations reaching shallow groundwater within the South Platte Alluvial Aquifer. For the various transport conditions (conservative transport, linear sorption, first-order degradation) considered in this study, predicted benzene concentrations scale relative to the source concentration similar to analytical solutions of the advection-dispersion equation [54]. For example, the high end of benzene concentrations in produced water from conventional oil and gas operations is 27 mg/L [35], approximately 100× higher than the benzene source concentration used in our simulations. Assuming all other parameter values stay constant, the peak benzene concentration predicted at the groundwater table following a produced water spill with 27 mg/L of benzene would be approximately 100× greater than the peak concentration predicted in our simulations. By reviewing Tables 5 and 6, we can see that accounting for a 27 mg/L benzene source concentration would allow for smaller spill sizes and precipitation events to transport benzene to the groundwater table at concentrations exceeding the EPA MCL. Still median-sized spills followed by 1 year precipitation events would result in benzene concentrations below the EPA MCL. Using this scaling characteristic of the concentration results, the conservative transport simulations can provide insight into what salt concentrations would be predicted at the groundwater table following a produced water spill.
Although this study did not conduct a rigorous sensitivity analysis of all possible hydrogeologic, transport, climate, and spill conditions, we did rely on scientific literature to determine parameter values that would be most relevant to surface produced water spills containing aqueous phase benzene in areas overlying the South Platter Alluvial Aquifer and for certain parameters (e.g., degradation rate, sorption) that were highly uncertain, we considered a range of values. The modeling framework used here could be used to test any specific spill, site, or climate condition desired, which is a major purpose of this work. We analyzed spill data relevant to areas of the South Platte Alluvial Aquifer but larger spill sizes are possible. "Historical spills" can be discovered after much time has passed making it difficult to estimate the volume released and area impacted. Other types of spills composed of different fluids (e.g., oil, mixture of oil and water), that slowly release fluid over time, or form a source that slowly leaches benzene (as opposed to the pulse-like spills simulated in this study) over time could result in benzene concentrations exceeding the EPA standard at the water table. Although we chose to focus on the contaminant benzene, there are many other hydrocarbon compounds, inorganic constituents, and hydraulic fracturing-related chemicals that are relevant to spills at unconventional oil and gas production sites. Future work could evaluate the potential of other constituents and slow release type spills at oil and gas production sites, refineries, retail sites, or pipelines to impact shallow groundwater.
A couple of modeling studies have simulated surface spills related to oil and gas development and explored different factors that could affect whether spills pose risk to groundwater quality. Shores et al., 2017 conducted a similar modeling study of produced water spills in Weld County considering three different spill intensities, two soil types, three depths to groundwater, and five contaminants. Although our work involved a more comprehensive analysis of spill data and considered the impact of storm-precipitation events, Shores et al., 2017 did report similar results for spills containing benzene occurring over sandy loam soil. High-intensity spills (equivalent to a spill size larger than the maximum spill size we considered) were found to exceed the EPA limit for benzene for groundwater up to 10 ft deep over sandy loam soil [55]. Medium-intensity spills (similar to a 75-80th percentile spill in our study) exceeded the EPA standard at groundwater up to 5 ft deep and low-intensity spills (equivalent to our 65th percentile spill) resulted in peak concentrations below the EPA limit at a 1 ft deep water table [55].
API 2005 conducted a generic modeling study of chloride in produced water releases and explored how a variety of factors such as soil type, depth to groundwater, initial water content in the vadose zone, subsurface heterogeneity, dispersion, climate, chloride source concentration, spill volume, spill volume per area, repeat releases, aquifer groundwater flux, background chloride concentrations, and aquifer saturated thickness would impact predicted chloride concentrations in groundwater and travel times through the vadose zone. Only two spill sizes (similar to our 80th percentile and maximum spill) were considered representing large releases based on experience of oil and gas industry personnel [56]. While our work focused on representing spills and hydrogeologic conditions relevant to the South Platte Alluvial Aquifer, the API 2005 study showed similar results in that large spills had the potential to impair groundwater quality in areas with sandy soils for groundwater depths less than 10 ft. Smaller releases were not evaluated, but the authors inferred that these smaller releases were unlikely to impair groundwater quality for groundwater depths greater than 10 ft [56]. Similar to our findings, the spill volume per area was identified as one of the most important factors in determining the peak chloride concentration in groundwater [56]. The study also found that movement of chloride immediately following a release depended on weather conditions rather than climate; effects of climate only became apparent after approximately 50 weeks [56].

Potential Risk to Receptors
Although we quantify risk to aquifer water quality as exceeding the EPA benzene limit if the spill is transported to the groundwater table, it is important to note that the actual risk to human health is based on the contaminant concentration at the drinking water receptor. Dilution and dispersion effects in an aquifer would likely be significant. Additional human risk may occur if contaminated groundwater is used for agricultural irrigation, and pollutants enter food crops, or crops intended to feed livestock that are food sources. Understanding human health risk would involve modeling groundwater flow in the aquifer to see if significant concentrations reach the groundwater well after the spill enters the aquifer, as well as risk exposure modeling after the water is produced (e.g., [57]). We believe that this type of modeling is most useful when specific receptors are at risk (i.e., if a spill occurs near a shallow drinking water or irrigation well). Unlike the hydrology, the receptor and human uptake conditions cannot be generalized for the South Platte basin; thus, this type of modeling is beyond the scope of this paper. Our results would provide useful information for such a modeling approach, development of which is recommended for future work on site-specific risk analyses.
Fletcher 2012 and Gradient 2013 present two stochastic modeling studies that have considered the potential for surface spills to impact groundwater quality at receptors [46,58]. They both used analytical solutions for conservative contaminant transport and groundwater flow, which account for simple transport processes and spill boundary conditions, but allow for sensitivity analyses to represent regional conditions (on state or country-wide level). Fletcher 2012 developed a broad risk assessment framework to analyze the risk of groundwater contamination due to spills of hydraulic fracturing fluid at shale gas extraction sites in Pennsylvania targeting the Marcellus formation. Gradient 2013 conducted a risk assessment of unintended surface releases of hydraulic fracturing fluid and flowback water and considered the potential for these fluids to affect drinking water sources and cause human health impacts.
For our analysis, dilution factors (ratio between the maximum concentration predicted and the original concentration in the released fluid) for spills that exhibited benzene exceeding the EPA standard ranged from 10 −0.007 (~0.98) to 10 −1.73 (~1.9 × 10 −2 ), which were generally larger than those reported by Fletcher 2012. The longer travel distances and 3D groundwater flow considered in Fletcher 2012 help explain the difference in dilution factors. Gradient 2013 reported a distribution of dilution factors for the unsaturated zone, ranging from 9.9 × 10 −3 to 5.3 × 10 −2 from the 50th to the 95th percentile of dilution factors, respectively. The unsaturated zone dilution factors calculated by Gradient 2013 tended to be lower than the values calculated in our analysis, possibly due to the broad range of U.S. nationwide unsaturated zone conditions Gradient 2013 considered. Gradient 2013 found that the human health risks associated with surface spills related to hydraulic fracturing activity were insignificant based on agency risk management guidelines.

Conclusions
In this study, we focused on produced water spills occurring in the Front Range of Colorado and evaluated the risk to South Platte Alluvial Aquifer groundwater quality. Data analysis of produced water spills showed that most spills released small net volumes and resulted in low spill volume per area. This result is consistent with those from previous studies [26]. From the produced water spill and storm simulations conducted in this study, we can draw the following conclusions:

•
Using representative hydrologic and contaminant transport parameters for transport of benzene in the South Platte basin, which is appropriate for a screening-level assessment of this basin and to present our methodology, the risk of benzene contamination of groundwater from a median produced water spill (based on the range of spill sizes that have previously occurred) is relatively low in the South Platte Aquifer. • Spill size is the dominant factor influencing whether a contaminant reaches the water table because spills are retained in the vadose zone after displacing the pre-spill soil moisture.

•
Only statistically larger spills pose a clear risk to groundwater quality in the absence of immediate post-spill precipitation. Spills at the maximum spill size would be required to consistently pose risk in areas with groundwater tables 10 ft below ground surface, even for conservative transport conditions. Of course, actual site conditions, such as soils with much lower residual water content, could enable smaller spills to reach ground water. The residual water content values considered here are representative of sandy loam soils typically observed in the South Platte River basin; however, areas with higher sand composition should be further evaluated by simulations using lower residual water content. • Storms events following a spill are generally required to flush spills to the water Overall, this study demonstrated that only large produced water spills consistently posed risk with respect to aqueous benzene concentrations in areas with shallow groundwater tables. Typical produced water spills are small enough that the entire volume can be contained in the vadose zone, even under most rainfall conditions, allowing contaminants to attenuate and degrade over time. Most areas of the South Platte Alluvial Aquifer have groundwater tables 30 ft below ground surface [53], and are thus likely to be low risk for groundwater contamination due to the surface spills considered in this study. Areas with high sand content soils may possibly be at risk.
Although our purpose was to understand the general threat of produced water spills to groundwater quality of the South Platte Alluvial Aquifer and the important factors to consider when evaluating this threat, future work could involve collecting site-specific data and information to simulate a particular spill and location. We acknowledge that there are other types of fluids such as oil, condensate, frac water, flowback water, and hydraulic fracturing fluid as well as other contaminants (e.g., hydrocarbons, inorganic constituents) that can be released at production sites. The results of the benzene simulations are important as benzene has associated human health risks, is a common constituent of produced water, and has a low drinking water standard. Furthermore, the conservative transport benzene simulations could be used as a proxy to estimate the concentrations reached at the water table for other constituents. However, spills involving the slow release of fluids or formation of sources that gradually leach contaminants into groundwater are also possible. Future work should evaluate these other types of spills and constituents to determine if they pose risk to groundwater quality of the South Platte Alluvial Aquifer and other areas.
Our results suggest that resources should focus on ensuring that oil and gas sites with shallow groundwater tables are properly lined and have additional measures in place to avoid leakage. Field studies have found strong evidence of groundwater contamination due to produced water spills [59][60][61][62]. Many of these sites were producing oil and gas prior to enforcement of strict environmental regulations, and contamination occurred before sites were upgraded. Some of these sites likely experienced multiple large spill events and many rain events, or spills of oil that can serve as long-term sources for contamination. These spill scenarios should be studied in future work. Pits used to store produced water and trenches dug to convey produced water around production sites were unlined [59][60][61]. In Colorado, it is no longer common practice to use earthen pits to store produced water and equipment handling produced water are usually surrounded by berms and lined. As a result, it seems that large volume pulse-like spills are not commonly released into the environment (recall that our analysis used the net spill volume not recovered during clean up) in areas related to the South Platte Alluvial Aquifer.
There are states (e.g., California) where produced water can be disposed of in unlined ponds, and thus the risk of spills infiltrating into the subsurface and impacting groundwater quality is greater. Although this work focuses on one aquifer in Colorado, the results should be useful to those who wish to conduct similar studies in other regions. This study may be of more direct value as a screening analysis for areas with loamy soils and shallow groundwater that is used for agriculture or drinking water. The modeling methodology could be easily translated to represent aquifer conditions in different areas, or other spill problems not related to oil and gas development.
Supplementary Materials: The following are available online at https://www.mdpi.com/2073-444 1/13/3/353/s1, Table S1: HYDRUS iteration criteria and model tolerances used for all simulations; Figure S1: Histogram of spill volume per area or spill depth for produced water spills occurring in counties overlying the South Platte Alluvial Aquifer; Figure S2: Benzene concentration breakthrough curves for the 90th and 85th percentile spills considering a 2 ft groundwater table and assuming no degradation and no sorption; Figure S3: Benzene concentration breakthrough curves predicted at a 10 ft groundwater table assuming no degradation and no sorption with storm events occurring immediately following a spill; Figure S4: Benzene concentration breakthrough curves for a 2 ft water table considering sorption and degradation with precipitation events occurring immediately following a spill; Figure S5. Benzene concentration breakthrough curves for a 10 ft water table considering conservative transport with precipitation events occurring immediately following a spill. Funding: This research was funded by the ConocoPhillips Center for a Sustainable WE2ST, targeted to promote the joint sustainability of unconventional energy development and water resources in arid regions. The opinions expressed are those of the authors. Findings, opinions, and conclusions in this presentation do not represent and should not be construed to represent any ConocoPhillips determination or policy.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
Publicly available datasets were analyzed in this study. This data can be found at [24,25].

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A Appendix A.1. Data Compilation and Review
The spill databases contained multiple entries for the same spill event. Spill event dates, locations, volumes, and descriptions were reviewed as much as possible to ensure that each spill was a unique event. If insufficient spill information was available to verify a spill was unique, it was removed from the dataset. Spill descriptions were also reviewed to confirm that the spill involved a release of produced water. Spill reports that explicitly mentioned produced water as the released fluid remained in the dataset. If the terms "produced water," "flowback water," or "brine" were not in the spill description, the author reviewed spill reports to try to understand where the spill occurred on the site and what equipment failed. This information was used to infer if releases involved produced water. If insufficient information was available, the spill was removed from the dataset. The same review criteria were applied to all datasets.
We note that the five SSPA spills used to determine the spill percentiles were included in the AWGDP database and accounted for in the produced water spill history analysis.

Appendix A.2. Storm and Precipitation Data
Driscoll et al., 1989 only considered two precipitation gages in Colorado, Denver and Fort Collins, for calculating storm duration [44]. We relied on storm statistics for Fort Collins because it was geographically closer to the area of interest and thus more representative of precipitation for areas overlying the South Platte Alluvial Aquifer. We chose to focus on precipitation frequency estimates calculated for Greeley, CO again, due to the station's close proximity to the area of interest.