A Comparison and Validation of Saturated Hydraulic Conductivity Models

Saturated hydraulic conductivity (Ksat) is fundamental to shallow groundwater processes. There is an ongoing need for observed and model validated Ksat values. A study was initiated in a representative catchment of the Chesapeake Bay Watershed in the Northeast USA, to collect observed Ksat and validate five Ksat pedotransfer functions. Soil physical characteristics were quantified for dry bulk density (bdry), porosity, and soil texture, while Ksat was quantified using piezometric slug tests. Average bdry and porosity ranged from 1.03 to 1.30 g/cm3 and 0.51 to 0.61, respectively. Surface soil (0–5 cm) bdry and porosity were significantly (p < 0.05) lower and higher, respectively, than deeper soils (i.e., 25–30 cm; 45–50 cm). bdry and porosity were significantly different with location (p < 0.05). Average soil composition was 92% sand. Average Ksat ranged from 0.29 to 4.76 m/day and significantly differed (p < 0.05) by location. Four models showed that spatial variability in farm-scale Ksat estimates was small (CV < 0.5) and one model performed better when Ksat was 1.5 to 2.5 m/day. The two-parameter model that relied on silt/clay fractions performed best (ME = 0.78 m/day; SSE = 20.68 m2/day2; RMSE = 1.36 m/day). Results validate the use of simple, soil-property-based models to predict Ksat, thereby increasing model applicability and transferability.


Introduction
Saturated hydraulic conductivity (K sat ) is an important hydraulic parameter [1][2][3][4], as K sat represents the ability of soils to transmit water throughout the saturated zone, which is essential for relating water transport rates to hydraulic gradients [5][6][7]. Accurate K sat estimates are needed to characterize and predict how soil-water dynamics influence local water balances [8][9][10]. Thus, K sat estimates can inform resource management decisions related to water conservation, irrigation systems, fertilizer application, drainage, solute mitigation, and plant growth [11][12][13]. Generally, K sat is measured through field and laboratory techniques (e.g., pumping, permeameter, and slug tests) [14][15][16][17] that are relatively simple to complete [18][19][20]. However, performing a sufficient number of field-based tests may be too expensive, in terms of duration and cost [21,22]. Additionally, field-based K sat estimates can be limited by incomplete aquifer geometry information, while laboratory methods can present problems with obtaining representative sample numbers. These challenges suggest the need for methods to estimate K sat that are accurate and efficient [23][24][25].
Bay Watershed (CBW) (Figure 1) [58][59][60]. The Cacapon River drainage area is approximately 65.2 km 2 ( Table 1). Land use and land cover (LULC) throughout the upper Cacapon River drainage area is 77.8% forest and 14.9% agriculture (Table 1) [61,62]. Land cover across the contributing drainage area for each study site (n = 8) is similarly dominated by forest and agriculture LULC (Table 1).  62]. Land cover across the contributing drainage area for each study site (n = 8) is similarly dominated by forest and agriculture LULC (Table 1).   1 Site specific drainage areas were delineated using a 1-3 m statewide mosaic for West Virginia [62].  62]. (c) Soil core grid design of the current study in Wardensville, West Virginia, USA. SW represents an example stilling well location and P represents an example piezometer location in relation to Moore's Run. The dashed arrow indicates the flow direction towards the Cacapon River.  1 Site specific drainage areas were delineated using a 1-3 m statewide mosaic for West Virginia [62].
The dominant soils in the study area are Basher fine sandy loam in the floodplains and Monongahela silt loam (3-8% slopes) along the stream terraces, with dry bulk density and K sat ranging from 1.34 to 1.53 (g/cm 3 ) and 0.19-1.99 (m/day), respectively [63][64][65][66]. Natural Resources Conservation Service (NRCS) methods used to derive K sat typically include double ring infiltrometers, constant head well permeameters, and empirical relationships between dry bulk density and particle size distributions [27,63]. Both soil types are moderately well drained [67]. Particle size fractions for Basher fine sandy loam range from 50 to 90% sand, 6 to 48% silt, and 5 to 12% clay, while average particle size fractions for Monongahela silt loam range from 20 to 27% sand, 51 to 54% silt, and 19 to 26% clay [63]. Regional geology consists of Silurian limestone, Devonian sandstone, Devonian shale, and Quaternary alluvium [68,69]. Soil textures include sand, sandy loam, silty clay loam, silt loam, and loam [63]. A hydroclimate monitoring station was installed at RMF in 2019 as a part of an ongoing nested-scale experimental watershed study (Figure 1) [52][53][54][55][56][70][71][72]. Average daily air temperature (August 2019 to April 2020) was 9.43 • C, while daily precipitation totals ranged from 0 mm to 41.4 mm. Longer term records  show average annual air temperature at the time of observation and average annual precipitation at 8.6 • C was 810.9 mm, respectively [73].

Soil Strutural Properties
Soil structural properties were determined using the soil core method [37,74]. The soil core method requires the collection of a known soil volume (i.e., core) by driving a cylindrical sampler into the soil to extract relatively undisturbed samples from a series of profile depths. Each core is weighed and then oven-dried at 105 • C for 24 to 48 h, or until the sample mass remains constant with additional drying time [16,75]. The oven-dried sample mass is cooled to room temperature in a desiccator and weighed [37]. At each study site (n = 8), three 98.17 cm 3 soil cores (n = 179) were collected from depth intervals of 0-5, 25-30, and 45-50 cm at nine equidistant locations across a 2 m × 2 m grid ( Figure 1). The grids were designed to be proximal to the piezometers installed along Moore's Run [53]. Depth intervals were selected to characterize the soil structural properties of the upper portion of the unsaturated zone. Soil cores could not be collected at every depth/sampling location due to the presence of large rocks.
After sampling, each core was capped at both ends, stored on ice, and transported to the laboratory within 24 h. Each core was processed and analyzed using the following equations from Hillel [37]. Soil dry bulk density, bdry, (g/cm 3 ) was quantified as where M s (g) is the mass of the soil core solids and V t (cm 3 ) is the soil core total volume. Porosity (unitless) was estimated as where V f (cm 3 ) is the volume of the soil core pore spaces [74,76]. Soil dry bulk density and porosity were quantified because they are structural properties that are known to influence shallow groundwater flow [2,8,38,77] and are easy to measure during routine soil surveys [63]. Two-way analysis of variance (ANOVA) was performed on soil characteristics and tested for significant differences between study site means and independent soil depths [75,78]. After each two-way ANOVA, a Tukey's post hoc multiple-comparison test compared the nominal and measurement variables in all possible combinations [75,78]. All statistical analyses were performed using Origin Pro 2019 (OriginLab Corporation, Northampton, Massachusetts, MA, USA) software.

Particle Size Fractions and Soil Texture
After the soil cores were dried and weighed, particle size fraction analysis was performed on the 183 samples using a combination of dry sieving and gravimetric filtration [79][80][81]. Particle size fractions were defined using the NRCS classification system [82]. First, any remaining soil aggregates were carefully broken-up by hand [79][80][81][82][83]. Individual soil core samples were poured into a nest of stacked sieves and separated into three size classes with a mechanical shaker [2,14,81]. The fraction retained on the largest grade sieve (2 mm mesh opening) consisted of coarse gravel, the fraction retained on the fine mesh sieve (53 µm mesh opening) contained sand particles, and the fraction retained in the pan contained fine particles, which consisted of silt and clay [82,84]. After 15 min of sieving, each size fraction was weighed [79,85].
The remaining fine particle fraction was rehydrated with 300 mL of deionized water and vigorously shaken until the particles and the water were well mixed [79,80]. Next, the fine particle mixture was filtered gravimetrically with a vacuum flask [81]. Washed and dried Whatman filter paper (pore size = 2 µm) was used to physically separate the silt and clay size fractions. Filters were oven-dried for one hour at 105 • C, cooled in a desiccator to room temperature, and weighed. The filter drying and weighing process was repeated to confirm that the filtered sample masses agreed within 4% [86]. The mass retained on the filters consisted of the silt fraction [87]. The clay fraction was determined by residual after subtracting the dry-sieved gravel mass, sand mass, and gravimetrically filtered silt mass from the original soil mass [79,81].
The resulting percentages of sand, silt, and clay for each soil core were used to model saturated hydraulic conductivity, as detailed below in Section 2.5. Similar statistical analyses, as detailed in Section 2.2., were conducted on the particle size fraction data. Soil texture was determined for each core using the NRCS Soil Texture Calculator [88].

Field Saturated Hydraulic Conductivity
A series of falling and rising head slug tests were conducted at the piezometer monitoring locations (n = 8; Figure 1) on 20 September 2019, 20 January 2020, and 6 March 2020, to determine K sat (m/day). Steel drive point piezometers 0.61-1.52 m long with a 3.18 cm inner diameter and a 0.77 m screened bottom segment (i.e., drive point) were driven into the unconfined alluvial aquifer at RMF. Each piezometer was equipped with a Solinst Levelogger Edge [89] pressure transducer to measure water level (M5: error ± 0.003 m; M10: error ± 0.005 m) either every 0.125 s or 0.5 s for the duration of the slug tests. The difference in sampling interval was due to the length of time required for each test and instrument storage capacity. After enough time (i.e., 5-10 min) passed for the water level to equilibrate with the added sensor, a 0.17 cm 3 copper slug was quickly lowered into and removed from each piezometer three times, with time allowed for water levels to equilibrate between each slug insertion and retrieval. Based on earlier field tests, water levels were estimated to equilibrate after a period of at least 5 min. Steady state conditions between replicate slug tests were visually confirmed by identifying asymptotes in the water level data. Average K sat were calculated from the falling and rising water levels using Hvorslev's method [15,17] where r is the radius of the well casing (cm), R is the radius of the well screen (cm), L e is the length of the well screen (cm), and t 37 (s) is the time it takes for the water level inside the piezometer to rise or fall 37% of the initial change during a slug test. It should be noted that the average K sat derived from slug tests requires the assumption that the average K sat is representative of the actual field conditions [23]. Falling and rising head slug tests were conducted to assess whether K sat varied during the piezometer emptying and filling as water levels returned to static conditions. Thus, a significant difference in falling-and rising-head-derived K sat demonstrates the presence of a directional dependence on the slug test type that may be attributed to site non-idealities [18]. Slug tests were repeated to increase the statistical power of the results [19]. Differences between the average (n = 3) falling-head-derived K sat and average rising-head-derived K sat (n = 3) were tested with a paired sample T-test [78]. Differences in site-average K sat (n = 6), independent of slug test type, were quantified with a one-way ANOVA and a Tukey's post hoc multiple-comparison test compared pairs of site-average K sat in all possible combinations [78].

Modeling Saturated Hydraulic Conducitivity
Average site-level K sat was estimated using soil particle size fraction data and five previously published pedotransfer functions [11,27,[39][40][41], with the understanding that K sat predictions reflect neither horizontal nor vertical aquifer properties due to the nature of the soil core sampling procedure [90]. Studies have shown that floodplain soil characteristics converge at depths of approximately 50 cm [75,91], which supports K sat predictions based on data from shallower (i.e., <50 cm) depths. Model results were compared to average measured K sat . Rising and falling head slug tests were combined so that each average measured K sat was based on a larger sample size (n = 6) to increase statistical power and capture unsaturated zone water flow variability. The resulting average K sat for each study site accounted for soil wetting and drying. Model performance was evaluated by comparing predicted K sat to measured K sat at the farm-level.

Puckett et al. Model
Puckett et al. [39] developed a model to predict K sat based on only clay-sized particles. The authors showed that fine sand, sand, and clay percentages were highly correlated with K sat , surface area, and volumetric water content at specific pressure heads [39,82]. The model for estimating K sat based on clay fractions is as follows where K sat (p) i is the predicted soil saturated hydraulic conductivity (cm/s) at each study site (i) and clay represents the average dimensionless clay fraction.

Jabro Model
Jabro [27] proposed a model that used bdry and grain size as predictive variables of K sat where K sat (p) i is the predicted soil saturated hydraulic conductivity (cm/h) from each site (i); silt and clay represent site-average dimensionless fractions of silt and clay, respectively; and bdry is the site-average dry bulk density (g/cm 3 ).

Campbell Model
Campbell [40] published a model to predict K sat from existing soil texture data in which K sat (p) i is the predicted soil saturated hydraulic conductivity (mm/h) from each site (i).
Silt and clay represent site average dimensionless fractions of silt and clay, respectively. The constant C is equal to 144 and was derived from previously published studies by Hall et al. [92] and Smettem and Bristow [41].

Smettem and Bristow Model
Smettem and Bristow [41] developed a model to predict K sat from soil clay content using a variety of agricultural topsoil samples [13]. The two-equation K sat model is as follows where h b is the bubbling pressure (mm), clay represents the average dimensionless fraction of clay, and K sat (p) i is the predicted soil saturated hydraulic conductivity (mm/h) at each study site (i).
The constant, C, in Equation (8) is the same constant as Equation (6).

Saxton et al. Model
Saxton et al. [11] studied the relationships between soil texture and soil moisture content at saturation (Equation (12)) and soil texture and K sat (Equation (9)). The relationships between these parameters and K sat are where K sat (p) i is the predicted soil saturated hydraulic conductivity (mm/s) at each study site (i); sand and clay represent the average dimensionless fraction of sand and clay, respectively; and θ s is the soil moisture content at saturation (m 3 /m 3 ).

Statistical Analysis
Farm-level predicted and measured K sat were compared for each K sat model using a statistical analysis outlined in Duan et al. [13]. The mean error (ME), the sum of squared error (SSE), and the root of the mean-square error (RMSE) were quantified for each model. The mean difference between the average predicted and average measured values was determined for the ME with where K sat (m) i is the measured soil-saturated hydraulic conductivity (m/day) from each of study site (i); K sat (p) i is the predicted soil-saturated hydraulic conductivity (m/day) from each of study site (i); and n is the number of sites included the farm-level metrics (n = 8). The SSE and RMSE were determined using the following equations

Soil Structural Properties
Soils cores were extracted at depths of 0-5, 25-30, and 45-50 cm within the 2 m × 2 m study grid (Figure 1c), for a total sample size ranging from 14 to 27 at each study site (n = 179, total core number). The average soil core results for bdry and porosity over the total depth (50 cm) were 1.11, 1.25, and 1.29 g/cm 3 and 0.58, 0.53, and 0.51, respectively ( Figure 2). Mean bdry ranged from 1.03 g/cm 3 at RMF7 to 1.30 g/cm 3 at RMF5 and RMF8, with an eight site mean of 1.21 g/cm 3 . Mean porosity ranged from 0.51 at RMF4, RMF5, and RMF8 to 0.61 at RMF7, with an eight site mean of 0.54. Average bdry was below the NRCS range for the region (i.e., 1.34-1.54 g/cm 3 ) but within the range expected for sandy soils [63,93]. The average porosity values were within the expected soil range of 0.3-0.7 [94]. The differences in bdry were likely due to differences in sampling locations for the NRCS study. Based on two-way ANOVA results, sites differed significantly in bdry (n = 179; p < 0.05). A comparison of the sites sampled at all depths, using Tukey's post hoc multiple comparison, showed that bdry was significantly lower at the 0-5 cm depth (n = 14 to n = 27 each site; p < 0.05) [78]. Porosity was also significantly different among study sites (n = 179; p < 0.05) and was significantly higher (n = 14 to n = 27 each site, p < 0.05) at the 0-5 cm depth. bdry and porosity were statistically similar (p > 0.05) at 25-30 and 45-50 cm, confirming that below the surface (i.e., 0-5 cm), RMF alluvial soils are homogeneous [75,91]. However, significant differences (p < 0.05) in RMF bdry and porosity are evidence of inter-site heterogeneity between RMF study sites.

Particle Size and Soil Texture
When particle size classes (i.e., sand, silt, and clay) were averaged over the total depth (50 cm), sand was consistently the dominant particle size class, followed by silt, and then clay. Average (n = 179) sand, silt, and clay percentages were 92%, 7%, and <1%, respectively (Figure 3). Based on the results of two-way ANOVA tests, the sites differed significantly in all particle size classes (n = 179; p < 0.05) [78]. The soil core textures were sand (n = 155) with a few instances of loamy sand (n = 24) [63]. A comparison of the particle size fractions at all sampled depths with Tukey's post hoc multiple comparison tests showed that the sites did not differ significantly in average particle size class percentages between soil depths (p > 0.05) [78]. When comparing these results to NRCS-mapped soils from the Web Soil Survey, which included higher percentages of silt and clay fractions, RMF soils had higher sand percentages, and smaller silt and clay percentages [63]. Based on two-way ANOVA results, sites differed significantly in bdry (n = 179; p < 0.05). A comparison of the sites sampled at all depths, using Tukey's post hoc multiple comparison, showed that bdry was significantly lower at the 0-5 cm depth (n = 14 to n = 27 each site; p < 0.05) [78]. Porosity was also significantly different among study sites (n = 179; p < 0.05) and was significantly higher (n = 14 to n = 27 each site, p < 0.05) at the 0-5 cm depth. bdry and porosity were statistically similar (p > 0.05) at 25-30 and 45-50 cm, confirming that below the surface (i.e., 0-5 cm), RMF alluvial soils are homogeneous [75,91]. However, significant differences (p < 0.05) in RMF bdry and porosity are evidence of inter-site heterogeneity between RMF study sites.

Particle Size and Soil Texture
When particle size classes (i.e., sand, silt, and clay) were averaged over the total depth (50 cm), sand was consistently the dominant particle size class, followed by silt, and then clay. Average (n = 179) sand, silt, and clay percentages were 92%, 7%, and <1%, respectively (Figure 3). Based on the results of two-way ANOVA tests, the sites differed significantly in all particle size classes (n = 179; p < 0.05) [78]. The soil core textures were sand (n = 155) with a few instances of loamy sand (n = 24) [63]. A comparison of the particle size fractions at all sampled depths with Tukey's post hoc multiple comparison tests showed that the sites did not differ significantly in average particle size class percentages between soil depths (p > 0.05) [78]. When comparing these results to NRCS-mapped soils from the Web Soil Survey, which included higher percentages of silt and clay fractions, RMF soils had higher sand percentages, and smaller silt and clay percentages [63]. Results confirm that site soil textures were mostly comprised of sand but differed from NRCS findings, which include larger silt and clay fractions ( Figure 3) [63,64]. This could be due to the proximity of the sampling grids to Moore's Run and is evidence of inter-site particle size heterogeneity across RMF study sites and the NRCS sampling locations. Differences in particle size fraction percentages with increasing depth were not significant (p > 0.05), which supports the use of near-surface (i.e., ≤50 cm) particle size fraction data to model Ksat values in the current work.

Field Saturated Hydraulic Conductivity
Site-level comparisons of slug-test-derived Ksat are presented in Figure 4 and separated by slug test type (i.e., FH or RH). FH Ksat were typically higher than RH Ksat, except for RMF6, where FH and RH Ksat were similar. The differences in Ksat between the slug test type can be attributed to nonidealities such as well-skin effects [18,19], while the similar Ksat values at RMF6 may be explained by differing non-idealities at this site. However, these differences may be attributed to the unique soil properties and site features adjacent to the RMF6 piezometer. A Tukey's post hoc multiple comparison of piezometer adjacent soils revealed that particle size fractions at RMF6 were significantly different (p < 0.05) when compared to RMF4 and RMF5, while the RMF6 clay fraction was significantly different (p < 0.05) from RMF3 [78]. Additionally, the piezometer at RMF6 was proximal to a buried culvert pipe that may have created an artificial hydraulic boundary, resulting in FH and RH Ksat estimates that were different from the other sites. Results confirm that site soil textures were mostly comprised of sand but differed from NRCS findings, which include larger silt and clay fractions ( Figure 3) [63,64]. This could be due to the proximity of the sampling grids to Moore's Run and is evidence of inter-site particle size heterogeneity across RMF study sites and the NRCS sampling locations. Differences in particle size fraction percentages with increasing depth were not significant (p > 0.05), which supports the use of near-surface (i.e., ≤50 cm) particle size fraction data to model K sat values in the current work.

Field Saturated Hydraulic Conductivity
Site-level comparisons of slug-test-derived K sat are presented in Figure 4 and separated by slug test type (i.e., FH or RH). FH K sat were typically higher than RH K sat , except for RMF6, where FH and RH K sat were similar. The differences in K sat between the slug test type can be attributed to non-idealities such as well-skin effects [18,19], while the similar K sat values at RMF6 may be explained by differing non-idealities at this site. However, these differences may be attributed to the unique soil properties and site features adjacent to the RMF6 piezometer. A Tukey's post hoc multiple comparison of piezometer adjacent soils revealed that particle size fractions at RMF6 were significantly different (p < 0.05) when compared to RMF4 and RMF5, while the RMF6 clay fraction was significantly different (p < 0.05) from RMF3 [78]. Additionally, the piezometer at RMF6 was proximal to a buried culvert pipe that may have created an artificial hydraulic boundary, resulting in FH and RH K sat estimates that were different from the other sites. Observed Ksat values across all study sites ranged from 0.35-9.33 m/day and 0.21-4.37 m/day for FH and RH slug tests, respectively. The larger range in FH Ksat may be attributed to more variable flow resistance during the FH slug tests [18,95]. When compared across all sites (n = 24 per slug test type), FH and RH Ksat values were significantly different (p < 0.05) (Figure 4) [78]. When examined by site (n = 3 per slug test type), RMF2, RMF3 and RMF8 had significantly different FH and RH Ksat (p < 0.05), likely due to differences in flow resistance near the piezometer screened interval [18,19,95,96].When compared to the expected average Ksat for the region, five average FH Ksat were higher than Ksat range reported by the NRCS of 0.19-1.99 m/day, while only two average RH Ksat were higher than the expected range [63,64]. These differences in Ksat range may be due to methodology, as NRCS uses double ring infiltrometers and constant head well permeameters for field Ksat [97], while soil core bdry and particle size distributions are used for laboratory Ksat estimates [27].
Site-level average Ksat ranged from 0.29 m/day at RMF8 to 4.76 m/day at RMF5, while the average farm-level Ksat was 2.24 m/day ( Table 2). The results of a one-way ANOVA indicated that the sitelevel average Ksat were significantly different across all study sites (n = 8; p < 0.05) [78]. Tukey's posthoc multiple comparison test showed that average measured Ksat at RMF 3 and RMF5 were significantly higher than RMF2 and average measured Ksat at RMF5 was significantly higher than RMF6, RMF7, and RMF8 (p < 0.05). These significant differences highlight the inter-site heterogeneity between RMF study sites [78]. Since site-level average Ksat include FH and RH slug test variability, and thus better capture heterogeneity, they were used to assess Ksat model performance. Site-level average Ksat coefficient of variation (CV) ranged from 0.19 at RMF1 to 0.96 at RMF3, indicating greater variation between slug tests at RMF3. The greater variability in measured Ksat at RMF3 likely contributed to the high farm-level CV ( Table 2) when all measure Ksat were averaged (n = 48). Observed K sat values across all study sites ranged from 0.35-9.33 m/day and 0.21-4.37 m/day for FH and RH slug tests, respectively. The larger range in FH K sat may be attributed to more variable flow resistance during the FH slug tests [18,95]. When compared across all sites (n = 24 per slug test type), FH and RH K sat values were significantly different (p < 0.05) (Figure 4) [78]. When examined by site (n = 3 per slug test type), RMF2, RMF3 and RMF8 had significantly different FH and RH K sat (p < 0.05), likely due to differences in flow resistance near the piezometer screened interval [18,19,95,96]. When compared to the expected average K sat for the region, five average FH K sat were higher than K sat range reported by the NRCS of 0.19-1.99 m/day, while only two average RH K sat were higher than the expected range [63,64]. These differences in K sat range may be due to methodology, as NRCS uses double ring infiltrometers and constant head well permeameters for field K sat [97], while soil core bdry and particle size distributions are used for laboratory K sat estimates [27].
Site-level average K sat ranged from 0.29 m/day at RMF8 to 4.76 m/day at RMF5, while the average farm-level K sat was 2.24 m/day ( Table 2). The results of a one-way ANOVA indicated that the site-level average K sat were significantly different across all study sites (n = 8; p < 0.05) [78]. Tukey's post-hoc multiple comparison test showed that average measured K sat at RMF 3 and RMF5 were significantly higher than RMF2 and average measured K sat at RMF5 was significantly higher than RMF6, RMF7, and RMF8 (p < 0.05). These significant differences highlight the inter-site heterogeneity between RMF study sites [78]. Since site-level average K sat include FH and RH slug test variability, and thus better capture heterogeneity, they were used to assess K sat model performance. Site-level average K sat coefficient of variation (CV) ranged from 0.19 at RMF1 to 0.96 at RMF3, indicating greater variation between slug tests at RMF3. The greater variability in measured K sat at RMF3 likely contributed to the high farm-level CV ( Table 2) when all measure K sat were averaged (n = 48). Table 2. Site-level soil saturated hydraulic conductivity (m/day) comparisons derived from the falling head and rising head slug tests. The coefficient of variation (CV) is presented as a unitless ratio of the standard deviation to the mean.

Modeled Saturated Hydraulic Conductivity
Predicted site-level K sat and descriptive statistics for farm-level K sat are presented in Table 3 for the Puckett et al. [39], Jabro [27], Campbell [40], Smettem and Bristow [41], and Saxton et al. [11] models. Average farm-level K sat predicted by five tested models ranged from 1.94 m/day with the Saxton et al. [11] model to 39.07 m/day with the Jabro [26] model (Table 3). When compared to the observed farm-level K sat (Table 2), the Puckett et al. [39], Campbell [40], and Saxton et al. [11] models resulted in estimated K sat within one standard deviation of the average farm-level K sat and were similar in magnitude. The Smettem and Bristow [41] model resulted in an estimated K sat within two standard deviations of the farm-level K sat . The Jabro [27] model, which was the only model that included bdry as a model parameter, resulted in an unrealistically high K sat that was 178% higher than the observed result. Differences between model results may be attributable to the soil texture range of samples used in the previous studies or the topographic position of the sampling sites. Additionally, the inclusion of additional model parameters did not always result in a lower CV, as evidenced by the increase in CV, apart from the Campbell [40] model which had a slightly improved CV when compared to the Smettem and Bristow [41] model. Table 3. Site-level soil saturated hydraulic conductivity (m/day) comparisons estimated using five K sat models [11,27,[39][40][41], the resulting farm-level descriptive statistics, and additional model information including data source location(s), type of model parameters(s), and the number of estimated model parameters. The coefficient of variation (CV) is presented as a unitless ratio of the standard deviation to the mean. When compared to K sat estimates derived from the NRCS Web Soil Survey, all predicted K sat values, except for the Saxton et al. [11], were higher than the NRCS estimated K sat range of 0.19-1.99 m/day [63]. These NRCS underestimates may be attributable to spatial heterogeneity, where the soils were collected, and/or which method NRCS used to estimate K sat [27,82,97]. The results from four models were characterized by CV scores of less than 0.5 (Table 3), apart from the Jabro [27] model, indicating relatively low spatial variability between farm-scale K sat predictions [78].

Model Performance
Model performance is shown in Figure 5 with average predicted K sat versus averaged measured K sat at each study site. Four out of five models performed similarly well, as predicted average K sat were within the same order of magnitude [11,[39][40][41]. Predicted K sat from the Puckett et al. [39] ( Figure 5a) and the Smettem and Bristow [41] (Figure 5d) models scattered away from the 1:1 line with most or all values, respectively, falling between the positive y-axis and 1:1 line, representing an overestimation by the models. In contrast, half of the predicted K sat from the Campbell [40] (Figure 5c) and Saxton et al. [11] models (Figure 5e) scattered along the 1:1 line, indicating smaller relative error and overall better model fits. The Jabro [27] model ( Figure 5b) performed poorly, as evidenced by the range of K sat predictions and distance from the 1:1 line, indicating it was not valid for the RMF soil data. Notably, the Jabro [27] model had the most model parameters (i.e., three) and was the only model to incorporate bdry, indicating that this pedotransfer function was not suited for RMF soils.
The quantified errors for each model are shown in Table 4. Farm-level ME for the four validated models ranged from 0.26 m/day with the Saxton et al. [11] model to 4.44 m/day with the Smettem and Bristow [41] model, while the Jabro [27] model ME was 37.11 m/day, which was an order of magnitude larger than the other models. Positive MEs indicate that the models generally overestimated RMF K sat , but smaller ME values (<1 m/day) for the Saxton et al. [11] and Campbell [40] model confirms better fits to the observed data (Table 4). Error ranges were similar in magnitude (<10) for four of the five models, while the Jabro [27] model error range was two orders of magnitude larger. Similar model error ranges can be partially explained by the variability in the measured K sat . Although most of the model error ranges were similar, the smaller error ranges for Puckett et al. [39] and Campbell [40] models confirms that they were better fits for the measured data. Table 4. Performance of five site-level soil saturated hydraulic conductivity (K sat ) (m/day) models [11,27,[39][40][41] for RMF soil by site and farm-level.

Model Comparison
Similar magnitude farm-level SSE and RMSE for the Puckett et al. [39], Campbell [40], and Saxton et al. [11] models indicates that these models were similar and better fits for the RMF data when compared to the Jabro [27] and Smettem and Bristow [41] models, the estimates from which were characterized by larger SSE and RMSE values (  (Table 4). The Campbell [40] model was the best fit for the measured data, as it had the smallest farm-level SSE and RMSE, while the Jabro [27] model did not result in a reasonable fit. Additionally, the Campbell [40] model seemed to perform better when the average site-level K sat were between 1.5 and 2.5 m/day, whereas the other models, did not seem to have a range for better model predictions. The presence of an ideal K sat range demonstrates that soil-property-based K sat models could be characterized by an ideal soil texture range, outside of which they become unsuitable. For example, the Saxton et al. [11] model was a better fit for the sandy RMF soils, as it was developed with more sand-dominant textures, while the Jabro [27] model utilized silt-dominant textures and was not accurate for RMF soils. Additionally, the range of soil textures used to develop each model likely influenced the individual model outcomes.

Model Results Implications
One of the most important implications of the current work is that four out of the five evaluated pedotransfer functions provide a valid, alternative approach to direct K sat measurements, as evidenced by low ME values (i.e., <4.5 m/day) ( Table 4). The similarity in model performance between the Campbell [40] and Saxton et al. [11] models demonstrates their applicability in sand textured soils with similar order of magnitude K sat . Model predictions were improved with the inclusion of a second particle size parameter, as RMSE for the Campbell [40] and Saxton et al. [11] models decreased, or was similar to the one particle size parameter models (i.e., Puckett et al. [39] and Smettem and Bristow [41]). Although slightly more complicated, two parameter models may provide increased accuracy in K sat estimates, as in the current work, and therefore may justify additional complexity [98]. The Jabro [27] model, which utilized three model parameters, including bdry, was not a good fit for the measured data, likely due to the RMF soils' high sand content (average 92%). As such, this applied model comparison shows that particle size based K sat models provide a good option for practitioners to predict K sat because (1) gathering the required soil particle size data are relatively simple [36][37][38] and (2) four particle-size-based K sat models adequately and accurately predicted farm-level K sat , as evidenced by RMSE < 0.5.
The implications of the current work extend beyond the presentation of measured K sat and particle size based K sat model validation. The results increase confidence in K sat models, especially when observed the data are not readily available. Further application of soil property K sat models can improve K sat predictions, and thus understanding of soil-water dynamics, in hydrologic studies throughout the northeast region, Chesapeake Bay Watershed, and areas with readily available particle size data. This model validation work provides practitioners and water resource managers with relatively simple alternatives to easily predict a parameter that is essential for governing shallow groundwater flow, which in turn, can increase K sat estimate availability and transferability.

Conclusions
Soil properties and saturated hydraulic conductivity (K sat ) were measured in a small forested drainage area within the Northeast USA to validate five soil property K sat models. Soil cores from three sampling depths were collected (n = 179) in a grid design to determine soil characteristics and a series of slug tests were performed to quantify K sat at eight study sites. Mean dry bulk density (bdry) and porosity ranged from 1.03 to 1.30 g/cm 3 and 0.51 to 0.54, respectively. bdry and porosity varied significantly (n = 179; p < 0.05) at the 0-5 cm depth and with study site location, while bdry and porosity were statistically similar at 25-30 and 45-50 cm, indicating that the site soils were homogenous below the surface. On average, soil cores were 92% sand and soil textures were sand or sandy loam. Particle size fractions sites did not differ significantly in average particle size class percentages between soil depths (n = 179; p < 0.05) but did differ significantly by site (n = 8; p < 0.05).
Measured K sat were 0.35-9.33 m/day and 0.21-4.37 m/day for FH and RH slug tests, respectively, and varied significantly (p < 0.05) with slug test type at three of the eight study sites. Average site K sat ranged from 0.29 to 4.76 m/day and varied significantly with site (n = 8; p < 0.05). Five pedotransfer functions that predicted K sat from soil property data were tested. Four models performed well and resulted in low, spatial variability between farm-scale estimated K sat (CV < 0.5). The models that relied on particle size parameters performed better (RMSE < 4.64 m/day) than the models that relied on particle sizes and bdry (RMSE = 63.26 m/day). Improved K sat estimates justify using a two-parameter particle size model [40] to predict K sat at the farm-level for RMF soils.
Study results provide soil property characteristics and demonstrate that using affordable and readily available soil characteristic data can accurately and efficiently predict K sat . This comparison study validates and supports the use of soil-property-based models to predict K sat in sandy soils. These results are particularly relevant for understanding regional soil-water dynamics but are also informative for hydrologic studies in landscapes with similar soil properties. The broader impacts of this work extend to providing practitioners with an assessment of K sat modeling that can used to effectively inform water resource management decisions and increase the applicability of K sat estimates at locations with available particle size data.