Evaluation of Paired Watershed Runoff Relationships since Recovery from a Major Hurricane on a Coastal Forest—A Basis for Examining Effects of Pinus palustris Restoration on Water Yield

: The objective of this study was to test pre-treatment hydrologic calibration relationships between paired headwater watersheds (WS77 (treatment) and WS80 (control)) and explain the difference in ﬂow, compared to earlier published data, using daily rainfall, runoff, and a water table measured during 2011–2019 in the Santee Experimental Forest in coastal South Carolina, USA. Mean monthly runoff difference between WS80 and WS77 of − 6.80 mm for 2011–2019, excluding October 2015 with an extreme ﬂow event, did not differ signiﬁcantly from − 8.57 mm ( p = 0.27) for the 1969–1978 period or from − 3.89 mm for 2004–2011, the post-Hurricane Hugo (1989) recovery period. Both the mean annual runoff coefﬁcient and monthly runoff were non-signiﬁcantly higher for WS77 than for WS80. The insigniﬁcant higher runoff by chance was attributed to WS77’s three times smaller surface storage and higher hypsometrical integral than those of WS80, but not to rainfall. The 2011–2019 geometric mean regression-based monthly runoff calibration relationship, excluding the October 2015 runoff, did not differ from the relationship for the post-Hugo recovery period, indicating complete recovery of the forest stand by 2011. The 2011–2019 pre-treatment regression relationship, which was not affected by periodic prescribed burning on WS77, was signiﬁcant and predictable, providing a basis for quantifying longleaf pine restoration effects on runoff later in the future. However, the relationship will have to be used cautiously when extrapolating for extremely large ﬂow events that exceed its ﬂow bounds.


Introduction
Restoration of longleaf pine (LLP) (Pinus palustris) ecosystems is a public land management objective throughout the southeastern United States, and it is a principal goal in the Forest Plan for the Francis Marion National Forest in South Carolina, USA. While there have been numerous plot or stand-scale studies of LLP ecology, silviculture, and ecosystem services [1], there are uncertainties regarding the watershed-scale runoff effects of reestablishing longleaf pine communities due to the spatial heterogeneity of soil conditions, microtopography, slope, and understory vegetation, all of which affect soil water storage. In contrast to loblolly pine (Pinus taeda L.) (LP) stands managed for timber production, LLP stands managed for the open canopy with frequent prescribed fire have a much lower stocking, a longer period of open canopy, a sparse mid-story, and an understory generally dominated by grasses and sedges, potentially influencing soil moisture and evapotranspiration (ET) [2]. As a result of these differences in stand structure and composition, it may be Interestingly, the paired pre-Hugo flow relationship (WS77 > WS80), reported by Richter [36] for 1969-1978, reversed (WS80 > WS77) four years after Hurricane Hugo's 1989 arrival for 10 years (1994-2003) before the relationship recovered to the pre-Hugo direction (WS77 > WS80) by 2004, as did the forest stands [24] ( Figure S1). Jayakaran et al. [24], who analyzed pre-and post-Hugo monthly data over 2011, suggested that lowered vegetative water use likely increased outflows in both watersheds because trees were lost to the hurricane. However, WS77 recovered to its pre-hurricane runoff level by 1993, having an abundance of pine seedlings and saplings there compared to WS80, which recovered its flow pattern in 2003. Jayakaran et al. [24] noted that it seems likely that high rain- Richter [36] suggested negligible deep seepage losses for these poorly drained soils and found no evidence of weir leakage on either watershed. Based upon seasonal flows and vegetation composition analysis, the author also argued that differences in water yield cannot be explained by vegetational differences. His analysis also ruled out watershed boundary effects in these low-gradient systems, in which the watershed drainage areas are bounded by the elevated roads built with well compacted soils, minimizing any possible lateral seepage, except for the northeast corner of WS80, which is a watershed divide. However, Richter [36] suggested that because of the consistency in annual ET (rainfall-runoff) and predictability of runoff measured on WS77, differences were attributed possibly to WS80 runoff estimates, particularly for high flows. Nonetheless, the author also suggested a need for calibration of both stream gauges. In a long-term paired watershed study on grasslands in Uruguay, Chescheir et al. [38] also found similar inherent differences between the paired watersheds for the pre-treatment period, with higher runoff from the treatment than from the control, which was attributed to a higher baseflow from the treatment watershed, likely due to lower ET from its shallow soils or groundwater inflow from outside the watershed.
Interestingly, the paired pre-Hugo flow relationship (WS77 > WS80), reported by Richter [36] for 1969-1978, reversed (WS80 > WS77) four years after Hurricane Hugo's 1989 arrival for 10 years (1994-2003) before the relationship recovered to the pre-Hugo direction (WS77 > WS80) by 2004, as did the forest stands [24] ( Figure S1). Jayakaran et al. [24], who analyzed pre-and post-Hugo monthly data over 2011, suggested that lowered vegetative water use likely increased outflows in both watersheds because trees were lost to the hurricane. However, WS77 recovered to its pre-hurricane runoff level by 1993, having an abundance of pine seedlings and saplings there compared to WS80, which recovered its flow pattern in 2003. Jayakaran et al. [24] noted that it seems likely that high The heavy rainfall and subsequent dry conditions in the following year might have somewhat confounded the exact timing and mechanisms responsible for the return of flow relationships to those pre-Hugo. Therefore, the authors cautioned that whether the 2003 wet and 2004 dry years accelerated the recovery to the pre-Hugo (WS77 > WS80) direction is an area for further study. Although the relationship was restored, the difference (WS77 > WS80) in the average magnitude in 2004 through to 2011 was 2-3 times smaller than before Hugo ( Figure S1). Since these authors did not find any significant difference in the monthly rainfall totals between the two watersheds for any of the three periods, they attributed the relative periodical magnitude differences in paired streamflow to relative changes in ET dynamics between the watersheds. Therefore, the objective of this study was to re-evaluate and re-establish the paired calibration relationship of watersheds, recovered since the 1989 hurricane, using climatic data for 2011-2019, which includes very large rainfall and dry events [39] (Table 1). This period was chosen because the stands reported by Jayakaran et al. [24] as being recovered by 2004, as stated above, were hypothesized to be fully recovered by 2011 ( Figure S1) or 22 years after the hurricane. This hypothesis is consistent with studies reporting a recovery period of 7 to 25 years for the annual water yield in three forested watersheds in the northeastern US [8], as many as 20 years in the US and > 14 years in Japan [40], and 10 to 15 years for recovery of the water table and drainage outflow in managed pine forest watersheds in coastal North Carolina [26,41]. Table 1. Measured annual flow, with the average annual and standard deviation (StdDev), for paired watersheds WS77 and WS80 and the difference in flow between them for 2011-2019, pre-Hurricane Hugo (1969)(1970)(1971)(1972)(1973)(1974)(1975)(1976)(1977)(1978), and post-Hugo (2004-2011), reported by Jayakaran et al. [24] (See also Figure S1). In addition, the choice of the 2011-2019 period as a pre-treatment baseline reference was supported by its closer agreement of the computed average annual flow difference of 82.1 mm between the treatment and control watersheds, than the 46.7 mm for 2004-2011, with the pre-Hugo average difference of 102.8 mm (Table 1). Furthermore, the StdDev of the flow difference for the baseline was closer to the pre-Hugo period than that of the post-Hugo, indicating their similar intra-annual variability. A similar approach was reported by Oda et al. [40] for testing disturbance effects using a paired watershed approach.

Pre-Hurricane Hugo
Regarding choosing a stable and sufficient record length for a baseline calibration period, Ssegane et al. [42] found statistically significant pre-treatment calibration relationships using only 762 days and 608 days, respectively, for two treatment watersheds from 2009 to 2012 that included some disturbances. Similarly, Bren and Lane [32] found a rapid increase in the quality of calibration relationship as the record length increased up to 3 years, but no increase was found beyond that, for all temporal scales of flow. The authors suggested that 5 years were adequate for most purposes, consistent with Clausen and Spooner [31], and the main advantage of longer periods was lower mean errors.
It was hypothesized, therefore, that the nine-year (2011-2019) record period, covering years with very low (2012) and very high (2015) runoff (Table 1), should be adequate for obtaining a stable pre-treatment (baseline) calibration relationship that is significant and quantifiable for future applications in treatment evaluations. This model would be applied using the measured flow from the control watershed to estimate expected flows for the WS77 treatment, assuming no disturbance, starting in 2020 when the harvesting and thinning treatments began for longleaf restoration. Next, the expected flow from the treatment watershed would be compared with actual measured flow. Deviations of the treated watershed's measured flow from expected values were considered to represent treatment effects if the deviations fell outside specified confidence intervals (95%) placed around the calibration regression line. In addition, the treatment regressions would also be evaluated against the pre-treatment baseline.
Various potential reasons, including rainfall and storm events, and understory prescribed burning implemented in 2013, 2016, and 2018 on the WS77, as shown by Richter et al. [35] and discussed above, were evaluated for the inherent differences in paired watershed flows. This study is novel in that no other studies, to the authors' knowledge, have reassessed the paired watershed calibration relationship after the reported recovery of forests following a major natural disturbance that altered the pre-disturbance flow regime between the watersheds.
Objective 1: Evaluate the annual rainfall, runoff coefficient, and ET (as the difference between rainfall and flow) in the paired watersheds for the pre-treatment baseline period and compare them with the 2004-2011 post-recovery period.

Hypothesis 1.
There will be no significant difference in the pre-treatment mean annual runoff coefficient (ROC) or in mean monthly rainfall between the paired watersheds, consistent with the post-recovery period, despite the effects of relatively very wet and dry years (defined, respectively, as 30% above and below the long-term average rainfall).
Objective 2: Evaluate the difference in pre-treatment monthly runoff between the paired watersheds compared to the post-recovery period and the possible reasons for difference, if any, building upon earlier studies [24,36].

Hypothesis 2.
The difference in the monthly runoff response between the paired watersheds will be similar (WS77 > WS80) to that in the post-recovery period.
Objective 3: Evaluate the pre-treatment monthly runoff calibration relationship between the watersheds compared to the post-recovery period.

Hypothesis 3.
The paired pre-treatment monthly runoff calibration relationship will not be different from the relationship for the post-recovery period and will be significant and quantifiable, with predictive capability.

Hypothesis 4.
The periodic prescribed burning treatments in the pre-treatment period will not affect the change in paired runoff relationship between the watersheds.

Site Description
The paired watersheds (WS77 and WS80) drain into Fox Gulley Creek and further down to Turkey Creek, a tributary of Huger Creek. These are parts of the headwaters of Huger Creek, a fourth-order stream and a major tributary of the East Branch of Cooper River, which drains into Charleston Harbor (Figure 1a). Basic characteristics of the wa-  Table 2. The original WS80 watershed area was 206 ha when it was installed in 1968 [35], but on 6 November 2001, a culvert was installed to drain its northeastern portion, thus reducing its area by 46 ha to 160 ha ( Table 2). The vegetation in WS80 is a mixed hardwood-pine stand, regenerated since Hugo (Figure 1b,c). The vegetation in WS77 is dominated by loblolly pine (Figure 1b,d), planted for silvicultural research in the late 1970s. Soils in the watersheds are poorly to moderately well-drained sandy clay loam overlaying clay, typified by the Wahee and Craven soil series in the uplands and the Megget and Betheera soils in the riparian zones ( Figure 1b). The control watershed is 48% wetlands compared to only 11% in WS77, as estimated from recent National Wetland Inventory data ( Table 1). The mean surface depressional storage reported by Amoah et al. [17] for the WS80 watershed was nine times higher than for the WS77 ( Table 1). The climate is warm-humid temperate, with an average daily temperature of 17.8 • C and annual rainfall of about 1370 mm [43]. Chronological activities of both watersheds are given in Table S1, and more details are described elsewhere [15,17,43].

Hydro-Meteorologic Monitoring
Beginning in 2003, digital records of precipitation were collected using automatic tipping bucket gauges backed up by a manual gauge at the Met5 station in WS77 and at the Met25 station in WS80. Data from nearby gauges (Figure 1a,b) were also used to fill gaps [44,45]. Digital measurements of stage, also beginning in 2003, were recorded every 10 min by the Teledyne ISCO flowmeters installed upstream of both the WS77 and WS80 watershed weir outlet gauging stations (Figure 1a,b). These digital stage data were used with established rating curves for compound V-notch weirs for estimating streamflow rates [39,44,45]. Details of stream gauges, stage measurements, and estimates of flow rates and the quality control are given in [44,45]. The flow data have been recently verified and are of a high quality for the rating range they were developed for. Daily average weather parameters obtained from weather sensors installed on a 27-m tall tower (above the forest canopy) in WS80 in 2010 (Figure 1a,b) were used to estimate daily Penman-Monteith (P-M) [46] based potential evapotranspiration (PET) for the forest conditions following Amatya et al. [47] (Table 3). A 3 m weather station installed on open grass at the nearby SEF Headquarters (SHQ) (Figure 1a) [48] provided data to fill in some missing values for a few short periods [45]. The WT in upland well H and riparian well D in WS80 and upland well J in WS77 have been measured hourly since 2004 using pressure transducers with a datalogger (Figure 1a [43] for WS80 were used, assuming the LAI of the fully recovered stands on this control watershed remained unchanged. A comparison of WS77 LAI with WS80 LAI is shown in Figure S3. Details of all hydro-meteorologic measurements, including data quality control, can be found elsewhere [15,17,43,44,49].

Data and Statistical Analyses
Measured daily rainfall, streamflow or runoff (watershed area-based depth), and PET, estimated from the daily weather data for the 2011-2019 period, were used to obtain monthly and annual totals and to compute the annual rainfall normalized runoff coefficient (ROC). The number of daily rain events >25 mm in each year was also logged. Flow data were lost for both watersheds for Hurricane Joaquin (3-4 October 2015), while only WS77 lost some data for Hurricane Matthew (8 October 2016), because the measured stage exceeded the rating curve range of each watershed. The exceptionally high flow of October 4, 2015 was assumed to be an outlier, as discussed in the Results section below, so that month was excluded from both watersheds in the comparative monthly analysis. Data for WS77 for October 8, 2016 were constructed by assuming the maximum rating curve flow value for less than 9 h of the day when the measured stage exceeded the rating curve limit. Integration of all 10-min interval flow rates, including the peak rates for this day, yielded 242.2 mm of flow as a response to 204 mm rain on that day, preceded by 90.4 mm rain the day before with only 1.7 mm flow, indicating that most of the two-day rain contributed to this single day large event. This daily value of 242.2 mm, which was lower than the 187.6 mm observed for WS80, was used in the analyses. Daily flow data were used to derive the daily flow duration curves to identify differences in flow magnitudes, frequencies, and duration of daily runoff between the watersheds. Daily WT depths were obtained by integrating hourly data.
Monthly rainfall, as well as annual runoff and ROC, for both watersheds were statistically analyzed to test Hypothesis 1. Measured monthly runoff data were used to (a) compare the mean monthly difference in flow between the paired watersheds against the post-recovery period to test Hypothesis 2 and (b) develop a baseline calibration regression of the monthly flow between the paired watersheds to test Hypothesis 3. Finally, a MOSUM (moving sums of recursive residuals) approach was used to detect changes in the paired flow regime, if any, and also in the paired calibration relationship, due to the prescribed burning, to test Hypothesis 4.
The Shapiro-Wilk normality test [50] showed a non-normal distribution (p < 0.001) of monthly runoff. Therefore, the nonparametric Wilcoxon signed-rank test was used to assess the significance of differences in mean monthly runoff between the two watersheds measured for 108 months or nine (2011-2019) years. An ordinary least squares regression (OLS) was used to develop a calibration equation between the control and treatment watersheds and its significance test [51]. However, since the Durbin-Watson (DW) test [50] showed a positive autocorrelation of the monthly runoff of both watersheds (DW_WS77 = 0.054, p < 0.0001; DW_WS80 = 0.029, p < 0.0001), regression relationships using an OLS versus geometric mean (GM) regression were compared. Based on Ssegane et al. [42], the ts and lmodel2 R statistical packages [52] were used to examine if the OLS was significantly different from the GM. The ts function is used to create time-series objects. These are vectors or matrices with a class of "ts" (and additional attributes), which represent data which have been sampled at equispaced points in time. In the matrix case, each column of the matrix data is assumed to contain a single (univariate) time series. Similarly, the lmodel2 function computes model II simple linear regression using the following methods: ordinary least squares (OLS), major axis (MA), standard major axis (SMA), and reduced major axis (RMA) of the GM. The model accepts only one response and one explanatory variable. Model II regression should be used when the two variables in the regression equation are random, i.e., not controlled by the researcher. GM regression is a resampling technique that accounts for autocorrelation in the time series by resampling the original data in pre-determined blocks 1000 times to estimate regression coefficients. GM, also known as the reduced major axis (RMA) regression, is suited for paired watershed analysis, because it assumes errors are associated with both dependent (treatment watershed) and independent (control watershed) variables [53]. The coefficient of determination (R 2 ), Nash-Sutcliffe efficiency (NSE), and root mean squared error (RMSE) were used to evaluate the strength and significance of the regression. For R 2 (0 ≤ R 2 ≤ 1.0) and for NSE (−∞ ≤ NSE ≤ 1.0), a value of 1.0 indicates an optimal model. All statistical significance tests for similarity with no difference were conducted for the α = 0.05 level.
An OLS-based MOSUM (moving sums of recursive residuals) approach using monthly flow data was conducted to detect the change in flow behavior, if any, between the watersheds due to prescribed burning of the WS77 watershed and potential effects on the monthly flow regression relationship. The null hypothesis (Hypothesis 4) tested by the MO-SUM is that regression coefficients of a linear model are constant over time; the alternative hypothesis is that the coefficients change over time due to external factors [24,42].
A morphometric analysis, among other factors, was also used for explaining possible reasons for inherent differences in streamflow between the paired watersheds, with a higher, but insignificant, flow from the treatment than from the control watershed since the historic study [36]. A morphologic analysis was conducted by deriving the hypsometric curves and indices [54,55] to examine the effects of land morphologic characteristics on runoff generation for the paired watersheds. A system for automated geoscientific analysis (SAGA)-GIS [56] and LiDAR-based DEM were used to generate the hypsometric curves of WS77 and WS80. The hypsometric integral (HI), skewness (skew), and kurtosis of the hypsometric curves were computed using general formulations by Harlin [57] and Pérez-Peña, Azañón, and Azor [58].

Annual Rainfall, Runoff, Runoff Coefficient (ROC), and ET
The first year (2011) of the pre-treatment (baseline) period was relatively dry, with rainfall below 32% of the long-term average (1370 mm) [43], and 2015 was relatively wet, with 58% above average rainfall. The nine-year average baseline period rainfall in WS77 was about 9% above the long-term average ( Table 3). The nine-year baseline and the eight-year post-recovery periods yielded the highest and the lowest mean annual flow, respectively, for both watersheds (WS77 > WS80) ( Table 1). An unusually high outflow in 2015 due to an extreme October event [59] might have caused the largest average flow in the baseline period. However, the mean annual ROC values, although almost consistently higher in WS77 (mean of 0.24) than in WS80 (0.19) ( Table 3) The annual ET, calculated as a difference between the annual rainfall and runoff, assuming no change in storage, varied from 903 mm in the relatively dry year of 2011 to as high as 1272 mm in the relatively wet year of 2018, with an average of 1167 mm for the control watershed (WS80) ( Table 3). The annual ET was consistently lower in WS77, although not significantly so (p = 0.07), with a mean of only 1105 mm, primarily because it had a higher runoff than WS80. The mean annual ET for WS80 was very close to the estimated P-M PET, with no significant difference (p = 0.46), while the ET for WS77 was significantly lower (p = 0.046) than the PET, potentially indicating WS77's soil water limitations (Table 3). However, the annual ET increased insignificantly with rainfall, yielding a higher R 2 (0.71) for WS77 than for WS80 (R 2 = 0.42), indicating, again, WS77 as being more soil water limited than WS80.

Monthly Rainfall and Runoff between the Watersheds
A plot of the monthly rainfall averaged from each month of the 2009-2011 period is shown in Figure 2a for the paired watersheds with their standard deviations. Data show similar rainfall between the watersheds but higher values with larger variabilities in June-October, influenced by tropical storms/hurricanes, than in winter for both. For example, October 2015 yielded a very large rainfall of 667 mm for WS80 and 686 mm for WS77 due to an extreme two-day rainfall of nearly 500 mm on 3-4 October caused by Hurricane Joaquin [59], followed by the second large rainfall event of 296 mm for WS80 and 294 mm for WS77 in October of 2016 as a result of Hurricane Matthew (October 8). These data highly influenced the variability of rainfall in October (Figure 2a). However, the paired watershed monthly rainfall for this study period showed similar means (124.7 ± 93.4 mm for WS77 and 122.9 ± 90.8 mm for WS80) with no significant difference (p = 0.89), consistent with the earlier post-Hugo period reported by Jayakaran et al. [24]. However, the mean monthly rainfall for that period was insignificantly lower, by chance, than the baseline period. This was likely due to more than six times the average number of daily rain events > 25 mm during the 2011-2019 baseline period (not shown), compared to the earlier long-term (1946-2008) period reported by Dai et al. [43]. Moreover, the 2011-2019 period had eight rainfall events exceeding 100 mm in 24 h, induced by hurricanes and tropical depressions.
with the earlier post-Hugo period reported by Jayakaran et al. [24]. However, the mean monthly rainfall for that period was insignificantly lower, by chance, than the baseline period. This was likely due to more than six times the average number of daily rain events > 25 mm during the 2011-2019 baseline period (not shown), compared to the earlier longterm  period reported by Dai et al. [43]. Moreover, the 2011-2019 period had eight rainfall events exceeding 100 mm in 24 h, induced by hurricanes and tropical depressions.   Figure S4a). Similarly, there was no difference in variance in the monthly flow between WS77 and WS80 (p = 0.21).
The effects of extreme rainfall on the water table influencing runoff for both watersheds during the October 2015 and 2016 hurricanes (Figure 2a,b) were similar due to fully saturated antecedent soil conditions ( Figure S2). Monthly runoff responses to hurricanes in the following years, i.e., Irma (11-12 September 2017, with rainfall of 130 mm), Florence (14-15 September 2018, with rainfall of 110 mm), and Dorian (4-5 September 2019, with rainfall of 190 mm), were smaller than the two previous ones. The study period also experienced relatively drier months with more no-flow months for the control watershed than for WS77 (Figure 2b), with slightly more variability in monthly summer rainfall between the pair. Data in Figure 3 for the monthly difference in runoff between the watersheds for 2011-2019 showed WS77 yielding somewhat higher flows (negative difference) than WS80, except for a few periods, consistent with the pre-Hugo (1969)(1970)(1971)(1972)(1973)(1974)(1975)(1976)(1977)(1978) pattern [36] ( Figure S1). The mean monthly runoff difference of −6. higher, although not significantly different (p = 0.54), than the −3.89 mm (±1.09 mm [SE]) obtained by Jayakaran et al. [24] for the 2004-2011 period, when the forest stands recovered. The difference for the 2011-2019 period was slightly, but not statistically (p = 0.27), lower than the pre-Hugo (1969-1978) mean of −8.57 mm (±1.65 mm [SE]) obtained by Richter [36], although the difference between the baseline and each of the pre-and post-Hugo periods was in the same direction. Thus, this result supports Hypothesis 2, confirming the validity of the 2011-2019 period data for pre-treatment calibration.

Ordinary Least Squares Regression Versus Geometric Mean Regression for Paired monthly Runoff
The plot in Figure 4a compares the relationships of monthly runoff using ordinary least squares (OLS) and geometric mean (GM) regressions between the control watershed (WS80) and the treatment watershed (WS77) without October 2015 because of its extreme flow event. Results showed that the regression slope for the GM (WS77 = 1.15 xWS80 + 3.70; R 2 = 0.87) lay just at the border of the 95% confidence bounds (0.99-1.15) of the slope of the OLS regression (WS77 = 1.07 xWS80 + 5.39; R 2 = 0.87) (Figure 4a), and so it was barely statistically different (p = 0.01). Therefore, subsequent analysis focused only on the GM

Ordinary Least Squares Regression versus Geometric Mean Regression for Paired Monthly Runoff
The plot in Figure 4a

Calibration Regression of Paired Monthly Flows
The plot in Figure 4a shows the regression relationships of measured monthly runoff between the control (WS80) and treatment (WS77) watersheds for the pre-treatment calibration period of 2011-2019, without October 2015 because of its extreme rainfall event. The GM regression in Figure 4a yielded a significant monthly runoff relationship (WS77 = 1.15 xWS80 + 3.70; R 2 = 0.87) between the paired watersheds. Both the slope of 1.15 and an intercept of 3.7 mm were significant (p < 0.0001). This significance indicates that both the flow rate as well as the shift from the zero intercept could be attributed to the average

Calibration Regression of Paired Monthly Flows
The plot in Figure 4a shows the regression relationships of measured monthly runoff between the control (WS80) and treatment (WS77) watersheds for the pre-treatment calibration period of 2011-2019, without October 2015 because of its extreme rainfall event.
The GM regression in Figure 4a yielded a significant monthly runoff relationship (WS77 = 1.15 × WS80 + 3.70; R 2 = 0.87) between the paired watersheds. Both the slope of 1.15 and an intercept of 3.7 mm were significant (p < 0.0001). This significance indicates that both the flow rate as well as the shift from the zero intercept could be attributed to the average difference in monthly flow, with WS77 insignificantly higher than WS80, as discussed above (Figure 3). The variability of flow around the 95% confidence limits of the regression line showed somewhat higher discharges for WS77 than for WS80 for most of the months for a flow of less than 100 mm. However, the regression with slope = 1.15 and intercept = 3.7 for the 2011-2019 pre-treatment period differed significantly from the 1969-1978 period with a slope = 1.43 and intercept of −0.68 [36] but not from the 2004-2011 post-Hugo period with 1.14 slope and 1.70 intercept [24] (Figure 4b). Thus, it both validates and invalidates Hypothesis 3. However, the relationship has to be cautiously interpreted as it includes one large event with flow >200 mm on October 8, 2016 (Hurricane Matthew), when a few hours of the unusable data during the peak flow of WS77 were estimated as explained above in Section 3.3, and thus it may have some uncertainties [60]. This event was included in this pre-treatment regression analysis because WS80 had good data, and frequencies of such large events are expected to increase in coming years [11]. For example, 17 out of 17 flow events >30 mm per day −1 occurred mostly because of hurricanes and tropical depressions within the 5 years since 2015 in this study ( Figure 5).

Daily Flow (Runoff) Duration Curves
The watersheds' daily flow frequency duration curves for 2011-2020 are compared in Figure 5. Daily runoff was consistently lower from WS80 than from WS77, as in 1969-1978 [36]. The magnitude of daily runoff of 20 mm was exceeded 0.74% of the time for WS77 and 0.49% of the time for WS80 ( Figure 5).
For 10% of the time, daily runoff exceeded 1.53 mm for WS77 and 1.23 mm for WS80. Similarly, WS77 had zero runoff 41.2% of the time, compared to 46.6% of the time for WS80 ( Figure 5), which was somewhat similar to the 1969-1978 period, when WS77 had zero runoff 35.6% of the time and WS80 43.8% of the time (not shown). Furthermore, the difference in the percentage of zero runoff days between the watersheds was found to be smaller (5.5%) for the current period than the 8.2% for 1969-1978, although the two periods covered different numbers of days. Excluding the three days of the extreme event in October 2015, runoff exceeded nearly 100 mm day −1 for two hurricanes (Irma and Dorian, in September 2017 and September 2019, respectively), with a steeper slope for both, which indicates a flooding regime consistent with Amatya et al. [7]. Two other storm event days exceeded a 50 mm runoff, in August 2016 and July 2017, indicating an increasing pattern of large flow events during 2015-2019 for these low-gradient watersheds ( Figure 5).

Daily Flow (Runoff) Duration Curves
The watersheds' daily flow frequency duration curves for 2011-2020 are compared in Figure 5. Daily runoff was consistently lower from WS80 than from WS77, as in 1969-1978 [36]. The magnitude of daily runoff of 20 mm was exceeded 0.74% of the time for WS77 and 0.49% of the time for WS80 ( Figure 5).
For 10% of the time, daily runoff exceeded 1.53 mm for WS77 and 1.23 mm for WS80. Similarly, WS77 had zero runoff 41.2% of the time, compared to 46.6% of the time for WS80 ( Figure 5), which was somewhat similar to the 1969-1978 period, when WS77 had zero runoff 35.6% of the time and WS80 43.8% of the time (not shown). Furthermore, the difference in the percentage of zero runoff days between the watersheds was found to be smaller (5.5%) for the current period than the 8.2% for 1969-1978, although the two periods covered different numbers of days. Excluding the three days of the extreme event in October 2015, runoff exceeded nearly 100 mm day −1 for two hurricanes (Irma and Dorian, in September 2017 and September 2019, respectively), with a steeper slope for both, which indicates a flooding regime consistent with Amatya et al. [7]. Two other storm event days exceeded a 50 mm runoff, in August 2016 and July 2017, indicating an increasing pattern of large flow events during 2015-2019 for these low-gradient watersheds ( Figure 5).

Discussion
The paired watersheds, besides being adjacent, are similar in many characteristics, including the area, topography, drainage, dominant soil types (Table 2), and mean Leaf Area Index (LAI) of the existing forest stands ( Figure S3). Despite these similarities, the treatment watershed (WS77) yielded non-significantly, by chance, a higher monthly runoff than WS80 (Figure 3), except for a few periods with saturated soils (WT near the surface) in July 2013, March 2015, November-December 2015, and October 2016 ( Figure S2), when WS80 runoff exceeded that of WS77 (Figure 3). The events in those periods resulted in large peak discharges of WS80, consistent with Harder et al. [15], who found exponentially increasing runoff as the WT in well H neared the surface or got ponded ( Figure S2). The insignificantly higher monthly runoff (WS77 > WS80) (Figure 3), consistent with earlier studies [24,36], was also supported by the daily flow duration curves for 2011-2019 ( Figure 5). Richter [36], in his study prior to Hugo (1989), ruled out the possible causes of groundwater seepage, drainage area, and vegetation effects for this difference in flow. However, he speculated some possible shortfalls in WS80 flow measurements for that period, particularly during high flow periods with a daily flow >5 mm. Since that historic study, however, several recent studies, including this one, have verified the flow measurements for WS80, as well as the drainage area and seepage [4,15,17,24,41,59,61]. Therefore, the flow measurements were consistent, except for the October 2015 hurricane [44], when flow rates exceeded the limits of the established rating curves for both watersheds and were estimated using theoretical equations for the WS80 outlet structure [59]. October 2015 was assumed to be an outlier and not used in the monthly analysis of this study.
Below, a few other potential factors were examined that may help explain the reasons why WS77 yielded slightly, but non-significantly, higher runoff than WS80. The mean annual rainfall, as a primary driver of runoff, was not significantly different (p = 0.90) between the watersheds, but both had more rainfall than the long-term average of 1370 mm reported by Dai et al. [43] for 1946-2008, meaning the study period was wetter. Total rainfall for each month was not significantly (α = 0.05) different between the two watersheds, although WS77 experienced somewhat more (Table 3). This finding was consistent with Jayakaran et al. [24], who suggested that given the similarity of rainfall across the two watersheds, relative changes in streamflow are good indicators of relative changes in ET dynamics, as shown in Table 3. Data in Table 3 also show that WS77 is slightly more energy limited (ET/PET) than WS80, which is slightly more moisture limited (ET/Rain) than WS77 [7]. These observations are also supported by the annual rainfall-runoff relationships between the paired watersheds ( Figure S4b), which yielded similar slopes, indicating similar rates of runoff response to rainfall. However, WS80 had a larger intercept, indicating more storage, potentially due to its greater ET than that of WS77 (Table 3). For example, although the annual ET through the baseline period was relatively constant, with a coefficient of variation (COV) <0.1 for both watersheds, WS80 yielded a somewhat higher mean value (1167 mm) than WS77 (1105 mm). These results, including the ROCs, are consistent with Boulet et al. [62], who also found such a hydrological difference between two paired Mediterranean headwater catchments with dissimilar land covers (Pinis pinaster and Eucalyptus globulus). In addition, the watersheds in this study differed in three important land use and management aspects, which were not addressed in earlier studies and are discussed below.
First, historic land use differed between WS80 and WS77. The lower reaches of WS80 were used for rice cultivation. As a result, the watershed has historical water management structures which WS77 does not have. LiDAR data analysis also revealed some depressions caused by legacy dikes on downstream riparian areas of WS80 [63]. More evidence comes from Amoah et al. [17], who found a nine-times higher mean overall surface depressional storage capacity (DSC) in WS80 than WS77, as well as more than four-times more wetland area in WS80 (Table 2). Thus, it is very likely that the high WS80 DSC values may have contributed to a higher water table during winter with lower ET demands and an increased ET with a lower water table during the summer growing season ( Figure S2). Another likely cause of WS80's reduced streamflow is modulated peaks caused by the storage, as evidenced by WS80's flatter slope in the range of an approximately 10-40 mm daily flow ( Figure 5), consistent with historical records [36]. The smaller flow rates of WS80 were further evidenced by the daily flow and 10-min hydrographs for two events of 8 June and 5 September in 2019, as an example (Figure 6a,b), in which the flow rates of WS77 with a much lower DSC were 3-4 times higher than that of WS80, consistent with some other years as well (not shown). These observations are consistent with other studies [18,19,[64][65][66][67] that reported a water table position and microtopography influence storage, and they are critical factors that affect streamflow patterns, stormflow peaks, and volume on shallow coastal forests. For example, Rains et al. [66] noted that the cumulative effect of depressions (WS80 in our study) can play an important role in landscape-scale hydrology by regulating the frequency, magnitude, timing, duration, and rate of flows to downgradient waters along overland and groundwater flow paths. Similarly, Acreman and Holden's conclusions [64] on five characteristics (landscape location and configuration, topography, soil characteristics, soil moisture status, and drainage management) largely determined the influence on floods, consistent with our runoff observations for these two watersheds. Holden's conclusions [63] on five characteristics (landscape location and configuration, topography, soil characteristics, soil moisture status, and drainage management) largely determined the influence on floods, consistent with our runoff observations for these two watersheds. Secondly, a contemporary difference between the two watersheds is that WS80 has not received any forest management activities since it was established in 1968, while WS77 has been actively managed for loblolly pine silvicultural research using prescribed fire in a 2-3-year cycle ( Figure SI-5(a)) for the past 20 years. There is a potential for flow to increase soon after fire [67,68]. For example, reduced understory vegetation ( Figure SI-5(b)), LAI, and ET caused by prescribed burning in March 2013, April 2016, and April 2018 (Figure 3) might have contributed to some temporary increased runoff in June 2013, August 2016, and August 2018. Accordingly, a detailed analysis using a MOSUM test in Figure 7 shows that the months immediately after prescribed burning captured a slight change in Secondly, a contemporary difference between the two watersheds is that WS80 has not received any forest management activities since it was established in 1968, while WS77 has been actively managed for loblolly pine silvicultural research using prescribed fire in a 2-3-year cycle ( Figure S5a) for the past 20 years. There is a potential for flow to increase soon after fire [68,69]. For example, reduced understory vegetation ( Figure S5b), LAI, and ET caused by prescribed burning in March 2013, April 2016, and April 2018 ( Figure 3) might have contributed to some temporary increased runoff in June 2013, August 2016, and August 2018. Accordingly, a detailed analysis using a MOSUM test in Figure 7 shows that the months immediately after prescribed burning captured a slight change in the relationship of paired flow, as shown by the upward or downward movement of the MOSUM curve, but not significantly impacting the linear regression coefficients (curve within the two red horizontal lines). Change was detected in June 2017, more than a year after prescribed burning. In addition, earlier studies [34,35] showed that prescribed fire had minimal or non-significant effects on soil properties, water quality, and water yield compared to the untreated reference for these watersheds. Furthermore, no heavy equipment, which may compact the soil, potentially reducing the conductivity, was used in this treatment. In addition, a rapid establishment of ground cover after the fire stabilizes the soil. The fact that the mean monthly difference (WS80-WS77) in runoff (−6.8 mm) for the 2011-2019 period was found to be not different from the 1969-1978 (−8.6 mm) and 2004-2011 (−3.9 mm) periods further indicated the initiation of forest regeneration by 2004, with complete recovery by 2011 [24], as shown in Figure S1.
Third, active management of WS77 results in a stand that is predominately loblolly pine, in contrast to WS80, which is mixed hardwood-pine. Despite this difference in stand composition, the mean LAI of 2.31 m 2 m −2 (1.23 m 2 m −2 -3.36 m 2 m −2 ), measured on the control watershed with pine-mixed-hardwood forest [43], was not significantly different (p = 0.34) from the mean of 2.54 m 2 m −2 (1.62 m 2 m −2 -2.92 m 2 m −2 ) measured in 2019-2020 in WS77 with pine stands (Figure SI-3). Although the WS77 mean LAI was slightly higher than that of WS80, the growing season LAI, with a potential to influence ET, was higher (as high as 4-6 m 2 m −2 ) in some plots during the growing season for WS80. Therefore, the lower WS80 runoff during growing season was partially attributed to a higher ET of the mixed hardwood-pine forest. For example, in the daily flow comparison for 2019 in Figure  6a, despite a lower WS77 total rainfall (235 mm) from Hurricane Dorian (September 5) than WS80′s 242 mm, the WS77 daily flow was larger by 80 mm than WS80's. The larger flow of WS77 was likely due to its shallower WT, with less storage than that of WS80 in late August/early September (Figure 6c,d), resulting in an early initiation of flows. Accordingly, for WS80, a deeper WT and larger storage possibly caused larger ET loss than for WS77, with no flows for 54 days until this September 5 event, in contrast with only 2 days without flows for WS77. This pattern, with deeper WT depths and larger growing season deficits of WS80 than of WS77, was also evident in the summers of 2011 to 2014, as well as briefly in 2016, 2017, and 2019 when WT fell below 100 cm ( Figure S2), potentially contributing to higher ET and lower flows.
In addition, land morphology, defined by the hypsometric curve, also might have The fact that the mean monthly difference (WS80-WS77) in runoff (−6.8 mm) for the 2011-2019 period was found to be not different from the 1969-1978 (−8.6 mm) and 2004-2011 (−3.9 mm) periods further indicated the initiation of forest regeneration by 2004, with complete recovery by 2011 [24], as shown in Figure S1.
Third, active management of WS77 results in a stand that is predominately loblolly pine, in contrast to WS80, which is mixed hardwood-pine. Despite this difference in stand composition, the mean LAI of 2.31 m 2 m −2 (1.23 m 2 m −2 -3.36 m 2 m −2 ), measured on the control watershed with pine-mixed-hardwood forest [43], was not significantly different (p = 0.34) from the mean of 2.54 m 2 m −2 (1.62 m 2 m −2 -2.92 m 2 m −2 ) measured in 2019-2020 in WS77 with pine stands ( Figure S3). Although the WS77 mean LAI was slightly higher than that of WS80, the growing season LAI, with a potential to influence ET, was higher (as high as 4-6 m 2 m −2 ) in some plots during the growing season for WS80. Therefore, the lower WS80 runoff during growing season was partially attributed to a higher ET of the mixed hardwood-pine forest. For example, in the daily flow comparison for 2019 in Figure  6a, despite a lower WS77 total rainfall (235 mm) from Hurricane Dorian (September 5) than WS80 s 242 mm, the WS77 daily flow was larger by 80 mm than WS80's. The larger flow of WS77 was likely due to its shallower WT, with less storage than that of WS80 in late August/early September (Figure 6c,d), resulting in an early initiation of flows. Accordingly, for WS80, a deeper WT and larger storage possibly caused larger ET loss than for WS77, with no flows for 54 days until this September 5 event, in contrast with only 2 days without flows for WS77. This pattern, with deeper WT depths and larger growing season deficits of WS80 than of WS77, was also evident in the summers of 2011 to 2014, as well as briefly in 2016, 2017, and 2019 when WT fell below 100 cm ( Figure S2), potentially contributing to higher ET and lower flows.
In addition, land morphology, defined by the hypsometric curve, also might have played a role in the runoff difference between the watersheds. The shape of the hypsometric curve is represented by a hypsometric integral, HI [70,71], with a value of 0.5 representing a threshold between the concave (HI < 0.5) and convex (HI ≥ 0.5) hypsometric forms. Vivoni et al. [71] found, keeping all other watershed variables constant (e.g., land use, land cover, rainfall), that modeled watersheds with a higher HI yielded higher runoff than those with a lower HI. Concave hypsometric curves for WS77 and WS80 (Figure 8a,b) clearly indicated that WS77 may be expected to yield more runoff than WS80 because of its higher HI of 0.405, compared to 0.285 for WS80 until 2001. The recomputed HI value of 0.313 after the drainage area of WS80 changed from 206 ha to 160 ha in 2001 was still < 0.405 (WS77). These results suggest that the shape of the basin hypsometry could be another reason for the difference in runoff between the two watersheds.  The pre-treatment monthly paired flow relationship without the October 2015 extreme runoff did not differ significantly from the 2004-2011 relationship (Figure 4b) reported by Jayakaran et al. [24]. The estimated monthly runoff of 609 mm was dominated by a one-day (4 October 2015) estimated extreme runoff of 311 mm on WS80. The peak flow rate for this hurricane event, estimated as 17.4 m 3 s −1 (10.9 m 3 s −1 km −2 ), was assumed to have exceeded the 500-year flood [59,71], and therefore, this month, as an outlier, was omitted from the monthly analysis. However, the daily flow frequency duration analysis in this study also included the October 2015 month, except for the 3-5 October extreme flow days, and other large flow events caused by other hurricanes ( Figure 5). These data may explain how the 2011-2019 calibration relationship might have been influenced by an increased number of high precipitation events. However, the fact that the 2004-2011 regression for the period with a reportedly recovered forest [24], but not the pre-Hugo 1969-1978, was like that of 2011-2019 may indicate the similar runoff response to rainfall during the two recent periods, dissimilar from 1969-1978. This similarity is potentially supported by the observations of Dai et al. [43], who reported more annual average storms > 50 mm in the 1982 to 2008 period than in 1946-2008, with even more storms by 2019 (not shown). We suggest, therefore, that the 2011-2019 relationship, which included some hurricane/tropical storm events, with no difference in either the mean monthly flow or regression relationship of the recent 2004-2011 period, is more justified than the pre-Hugo 1969-78 for its application in treatment effects evaluation of WS77 water yield later.
Our computed p-value, R 2 , NSE, and RMSE statistics characterizing statistical significance and predictive quantifiable regression were also consistent with similar statistics (R 2 = 0.97, NSE = 0.97) for the paired daily flow relationships for 1988-1989 and higher than R 2 = 0. 48  The pre-treatment monthly paired flow relationship without the October 2015 extreme runoff did not differ significantly from the 2004-2011 relationship (Figure 4b) reported by Jayakaran et al. [24]. The estimated monthly runoff of 609 mm was dominated by a one-day (4 October 2015) estimated extreme runoff of 311 mm on WS80. The peak flow rate for this hurricane event, estimated as 17.4 m 3 s −1 (10.9 m 3 s −1 km −2 ), was assumed to have exceeded the 500-year flood [59,72], and therefore, this month, as an outlier, was omitted from the monthly analysis. However, the daily flow frequency duration analysis in this study also included the October 2015 month, except for the 3-5 October extreme flow days, and other large flow events caused by other hurricanes ( Figure 5). These data may explain how the 2011-2019 calibration relationship might have been influenced by an increased number of high precipitation events. However, the fact that the 2004-2011 regression for the period with a reportedly recovered forest [24], but not the pre-Hugo 1969-1978, was like that of 2011-2019 may indicate the similar runoff response to rainfall during the two recent periods, dissimilar from 1969-1978. This similarity is potentially supported by the observations of Dai et al. [43], who reported more annual average storms > 50 mm in the 1982 to 2008 period than in 1946-2008, with even more storms by 2019 (not shown). We suggest, therefore, that the 2011-2019 relationship, which included some hurricane/tropical storm events, with no difference in either the mean monthly flow or regression relationship of the recent 2004-2011 period, is more justified than the pre-Hugo 1969-78 for its application in treatment effects evaluation of WS77 water yield later.
Our computed p-value, R 2 , NSE, and RMSE statistics characterizing statistical significance and predictive quantifiable regression were also consistent with similar statistics (R 2 = 0.97, NSE = 0.97) for the paired daily flow relationships for 1988-1989 and higher than R 2 = 0.48 and NSE = 0.34 for the 2007-2008 calibration period reported by Ssegane [26] in their North Carolina pine forest study. Those values were also similar to R 2 = 0.83 and NSE = 0.82 and R 2 = 0.91 and NSE = 0.91 for two separate paired watersheds for the 2009-2012 calibration periods reported for studies in coastal North Carolina by Ssegane et al. [42].
Thus, the strong and significant 2011-2019 geometric regression-based pre-treatment baseline monthly runoff calibration relationship with given confidence limits (Figure 4b) could be used to compare the actual measured WS77 flow response with its expected flow response compared to WS80, within the bounds of data used in the calibration regression for quantifying the magnitude and significance of effects of longleaf pine restoration treatments in the near future. Nevertheless, it should still be cautiously interpreted and applied if frequencies of extreme events, like the October 2015 hurricane excluded from this study, continue to increase, as predicted by regional studies across the southeastern region [73]. This study also emphasizes a need to analyze long-term datasets, when available, to better understand the role of hydrological dynamics and their evolution and adaptation, including the paired watershed calibration for assessing treatment effects, in the context of a changing climate [74].

Conclusions
This study evaluated the seasonal rainfall and runoff response pattern and the flow calibration relationship using nine years (2011-2019) of hydro-meteorologic data for two long-term paired watersheds (155 ha, WS77 (treatment) and 160 ha, WS80 (control)) designated for a longleaf pine (LLP) restoration project at Santee Experimental Forest on the Atlantic Coastal Plain. The geometric mean regression-based monthly runoff relationship, proposed as a pre-treatment baseline, was compared to relationships reported earlier using 1969-1978 for pre-hurricane Hugo and 2004-2011 as post-Hugo recovery periods by Jayakaran et al. [24]. Other paired hydrologic metrics with a potential to influence the runoff were also used. Results revealed that the historical pattern in the runoff difference of WS77 > WS80 was maintained in the current baseline assessment. Furthermore, the difference in the mean monthly runoff between the two watersheds did not vary significantly (α = 0.05) from the pre-Hugo and post-Hugo periods, indicating a complete runoff recovery, as shown earlier by Jayakaran et al. [24]. The insignificantly higher, by chance, mean seasonal flow for WS77 than for WS80 was attributed to a lower surface storage (mean depressional storage capacity; Table 2) and higher hypsometric integral (a land morphological characteristic; Figure 8) for WS77 than for WS80, with a larger surface storage as well as subsurface storage indicated by a deeper average water table than that of WS77. In addition, the baseline monthly runoff calibration relationship, with multiple large flow events covering 2011-2019, except for an extreme of October 2015, did not differ from the 2004-2011 period but differed from 1969-1978, indicating a complete forest recovery and, possibly, a similarity in the climatic pattern of two recent periods. The baseline calibration relationship, found to be unaffected by periodic prescribed burning, was also significant (α = 0.05), predictable, and consistent, thereby providing a basis for quantifying post-treatment effects of the full LLP restoration on water yield later in the future. However, the relationship will have to be used cautiously when extrapolating for extremely large flow events, exceeding flow limits of the relationship as well as possibly exceeding the rating curve limits of the current gauging stations, otherwise equipped with well-defined compound weir control structures and dual sensors, including a backup for precise measurements of stage elevations.  Figure S5: Pictures of (a) operational prescribed burning and (b) post-burn land cover on WS77 treatment watershed, Table S1: Chronology of activities that took place on the WS77 and WS80 watersheds.