Regression Modeling of Baseflow and Baseflow Index for Michigan, USA

Baseflow plays an important role in maintaining streamflow. Seventeen gauged watersheds and their characteristics were used to develop regression models for annual baseflow and baseflow index (BFI) estimation in Michigan. Baseflow was estimated from daily streamflow records using the two-parameter recursive digital filter method for baseflow separation implemented in the Web-based Hydrograph Analysis Tool (WHAT) program. Three equations (two for annual baseflow and one for BFI estimation) were developed and validated. Results indicated that observed average annual baseflow ranged from 162 to 345 mm, and BFI varied from 0.45 to 0.80 during 1967–2011. The average BFI value during the study period was 0.71, suggesting that about 70% of long-term streamflow in the studied watersheds was likely supported by baseflow. The regression models estimated baseflow and BFI with relative errors (RE) varying from −29% to 48% and from −14% to 19%, respectively. In the absence of reliable information to determine groundwater discharge in streams and rivers, these equations can be used to estimate BFI and annual baseflow in Michigan.


Introduction
Baseflow is an important component of streamflow generated from groundwater inflow or discharge. Baseflow is generally derived from available streamflow records using hydrograph separation techniques such as graphical methods [1], recession-curve methods [2], analytical methods [3,4], mass-balance methods [5,6], and digital baseflow filter methods [7,8]. Many of these techniques have ...
The study comprised four steps: (1) developing a database to compile hydrologic and physiographic characteristics of the studied watersheds; (2) partitioning baseflow from daily streamflow records using the Web-based Hydrograph Analysis Tool; (3) developing regression equations for baseflow and BFI estimation using multiple regression analysis; and (4) validating the regression equations with data from different watersheds in Michigan.

Study Watersheds
This study was conducted with a group of watersheds in Michigan. Seventeen gauging stations with data from 1967–2011 and with no effects of regulation or diversion on streamflow were selected based on the 2011 USGS water report [35]. The selected gauging stations have long-term streamflow records and are distributed across the state. The watersheds draining to these gauging stations were delineated to summarize various hydrophysiographic and geologic characteristics based on the National Hydrography Dataset streamflow lines [36] using Spatial Analyst Tools in ArcGIS. Results of watershed delineation were compared with published USGS watershed boundaries for Michigan and are shown in Figure 1 and Table 1.
Michigan is located in the Great Lakes region of the United States (Figure 1). It consists of a Lower Peninsula (LP) and an Upper Peninsula (UP), which lie approximately between 82°30' and 90°30' west. The two peninsulas are separated by the Straits of Mackinac, which connect Lake Michigan and Lake Huron. The principal river in Michigan is the Grand, which is 420 km long and flows through the LP into Lake Michigan. The Saginaw River and its tributaries drain 15,500 km² and form the largest watershed within Michigan. In the UP, most rivers flow southward into Lake Michigan and its various bays. About one-fifth of the state is covered by forest, and the principal agricultural region is located in the southern half of the LP, where farmlands account for about 50% of the total land area [37]. The distribution of precipitation in Michigan depends on the season and location. The southwest of the LP and parts of the UP receive about 1020 mm of precipitation per year, including snowfall, while the northeast of the LP receives only 660–760 mm of precipitation per year. In areas with plant cover, approximately 40% of rainfall is returned to the atmosphere through evapotranspiration, and 10% flows directly into streams [38]. Nearly 50% of rainfall in the state infiltrates the soil and replenishes groundwater [38]. As a major contributor to streams, inland lakes, wetlands, and Great Lakes coastal wetlands, groundwater provides about 23% of public water supply in Michigan [39]. More than 2.7 million people, including the majority of the rural population, rely on domestic wells for their daily needs [39]. Groundwater accounts for a large proportion of total streamflow in Michigan [34].

Baseflow Separation
Baseflow was separated from long-term streamflow records using the Web-based Hydrograph Analysis Tool (WHAT) [14]. Two digital filter methods are available in the WHAT program: the one-parameter digital filter method [40] and the two-parameter digital filter method (also known as the Eckhardt filter method [41]). The Eckhardt filter was used for baseflow separation in this study because the method was previously validated against seven baseflow separation techniques [18]. The two parameters of the Eckhardt method are the filter parameter and $BFI_{max}$. The filter parameter describes the rate at which streamflow decreases with time following a recharge event and can be derived by recession analysis. $BFI_{max}$ is the maximum baseflow index that can be modeled by the recursive digital filter algorithm [41,42]. Daily baseflow computation with the Eckhardt filter method can be expressed as [18,41]:
$$Q_{b,t} = \frac{(1 - BFI_{max})\, a\, Q_{b,t-1} + (1 - a)\, BFI_{max}\, Q_{s,t}}{1 - a\, BFI_{max}} \qquad (1)$$
where $Q_{b,t}$ and $Q_{b,t-1}$ are the baseflow at time steps t and t−1; $Q_{s,t}$ is the total streamflow at time step t; and a is the filter parameter. For the first time step, $Q_{b,t-1}$ in Equation (1) was assumed to be 50% of streamflow.
Eckhardt [41] proposed values for $BFI_{max}$ [in Equation (1)] based on various aquifer types such as perennial streams with porous aquifers, ephemeral streams with porous aquifers, and perennial streams with hard rock aquifers. In this study, the 17 selected watersheds were considered perennial streams with porous aquifers based on their hydrologic and geological characteristics [35,43]. Thus, the default $BFI_{max}$ and filter parameter values of 0.80 and 0.98, describing watersheds with perennial streams and porous aquifers, were used as implemented in WHAT.
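The recursive filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the WHAT implementation: the function names, the hypothetical flow values, and the cap that keeps baseflow from exceeding total streamflow are assumptions of this sketch. The defaults mirror the values reported in the text (filter parameter 0.98, BFI max 0.80, first-step baseflow equal to 50% of streamflow).

```python
def eckhardt_filter(streamflow, a=0.98, bfi_max=0.80, initial_fraction=0.5):
    """Two-parameter recursive digital filter (Eckhardt form).

    streamflow: daily total streamflow values (any consistent unit).
    a: filter (recession) parameter; bfi_max: maximum baseflow index.
    initial_fraction: assumed share of streamflow that is baseflow on day one.
    """
    if not streamflow:
        return []
    baseflow = [initial_fraction * streamflow[0]]
    for q_total in streamflow[1:]:
        q_prev = baseflow[-1]
        q_base = ((1.0 - bfi_max) * a * q_prev
                  + (1.0 - a) * bfi_max * q_total) / (1.0 - a * bfi_max)
        # Assumption of this sketch: baseflow is capped at total streamflow.
        baseflow.append(min(q_base, q_total))
    return baseflow


def baseflow_index(streamflow, baseflow):
    """BFI as the ratio of total baseflow to total streamflow."""
    return sum(baseflow) / sum(streamflow)


# Hypothetical daily flows, for illustration only.
daily_flow = [10.0, 10.0, 50.0, 10.0, 1.0]
daily_baseflow = eckhardt_filter(daily_flow)
print([round(q, 2) for q in daily_baseflow])
print(round(baseflow_index(daily_flow, daily_baseflow), 2))
```

Under constant streamflow the filtered baseflow converges to BFI max times streamflow, which is why BFI max acts as an upper bound on the long-term BFI the filter can produce.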

Watershed Characteristics
Baseflow is generally influenced by watershed characteristics such as watershed physiographic features, distribution of water storage, evapotranspiration, geomorphology, land use, and soil types [32]. As mentioned in the Introduction, many of these watershed characteristics, or indices developed from them, were used for baseflow and BFI modeling in previous studies [17,23–25,27,28,31,32,44]. For instance, Longobardi and Villani [25] used a catchment permeability index (i.e., the ratio of permeable area to watershed drainage area) to establish regression equations for estimating BFI. Lacey and Grayson [28] related BFI to geology-vegetation groups (i.e., combinations of rock types with vegetation communities), topographic indices (i.e., a drainage index defined as the ratio of total stream network length to the square root of drainage area, a slope index defined as the ratio of catchment relief to the square root of drainage area, and a flat area ratio), a climatic index (i.e., the ratio of rainfall to potential evapotranspiration), forest cover, and forest growth stage. In this study, a total of 21 watershed characteristics were compiled with data processing techniques in ArcGIS. These watershed characteristics and the sources of the datasets used are shown in Tables 2 and 3.

Regression Analysis
Multiple linear regression was used to develop equations for estimating annual baseflow and BFI in the following form:
$$Q_b = b_0\, X_1^{b_1} X_2^{b_2} X_3^{b_3} \cdots X_n^{b_n} \qquad (2)$$
where $Q_b$ is the predicted annual baseflow (m³) or BFI; $b_0$ is the regression constant; $b_1, b_2, b_3, \ldots, b_n$ are regression coefficients; and $X_1, X_2, X_3, \ldots, X_n$ are watershed characteristics. The log-transformation of Equation (2) is written as:
$$\log Q_b = \log b_0 + b_1 \log X_1 + b_2 \log X_2 + b_3 \log X_3 + \cdots + b_n \log X_n \qquad (3)$$
The models developed were evaluated using Relative Error (RE), R² and the Nash-Sutcliffe coefficient ($E_{NS}$), shown respectively as [45–47]:
$$RE = \frac{Q_{b(pred)}(i) - Q_{b(obs)}(i)}{Q_{b(obs)}(i)} \times 100 \qquad (4)$$
$$R^2 = \left[ \frac{n \sum_{i=1}^{n} Q_{b(obs)}(i)\, Q_{b(pred)}(i) - \sum_{i=1}^{n} Q_{b(obs)}(i) \sum_{i=1}^{n} Q_{b(pred)}(i)}{\sqrt{\left[ n \sum_{i=1}^{n} Q_{b(obs)}(i)^2 - \left( \sum_{i=1}^{n} Q_{b(obs)}(i) \right)^2 \right] \left[ n \sum_{i=1}^{n} Q_{b(pred)}(i)^2 - \left( \sum_{i=1}^{n} Q_{b(pred)}(i) \right)^2 \right]}} \right]^2 \qquad (5)$$
$$E_{NS} = 1 - \frac{\sum_{i=1}^{n} \left( Q_{b(obs)}(i) - Q_{b(pred)}(i) \right)^2}{\sum_{i=1}^{n} \left( Q_{b(obs)}(i) - \bar{Q}_{b(obs)} \right)^2} \qquad (6)$$
where $Q_{b(obs)}(i)$ is the observed baseflow or BFI, which was separated from the daily streamflow record; $Q_{b(pred)}(i)$ is the predicted baseflow or BFI; $\bar{Q}_{b(obs)}$ is the mean of $Q_{b(obs)}$; and n is the total number of years. These statistics are widely used to evaluate the performance of hydrologic and water quality models [45,48–51]. The scientific literature provides guidelines for acceptable levels of model performance [50]. For example, Santhi et al. [49] pointed out that R² greater than 0.5 could be considered acceptable. Moriasi et al. [50] recommended that model simulations could be judged as satisfactory if $E_{NS}$ was greater than 0.50. Ramanarayanan et al. [51] suggested that model performance could be considered satisfactory if the correlation coefficient and $E_{NS}$ were greater than 0.5 and 0.4, respectively. It appears that what counts as acceptable model performance based on statistical measures depends on project-specific requirements [48]. Prior to model development, the Spearman correlation test was used to determine the correlation among baseflow, BFI and watershed characteristics. The correlation analysis showed that BFI, BDA and HSGA were independent of each other but related to baseflow, while BFI was affected by WLC and WTD (Table 4). Precipitation was also considered as an independent variable in this study, although it was not strongly correlated with baseflow in Michigan (Table 4). The SAS statistical analysis software [58] was used for the analysis.
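The three evaluation statistics named above can be computed as in the following sketch. It uses the common textbook forms: RE as the percent difference between predicted and observed values (taken here on the period means, which is an assumption about how a single RE per watershed is obtained), R² as the squared Pearson correlation, and the Nash-Sutcliffe coefficient as one minus the ratio of squared errors to the variance of the observations. Function names and data values are hypothetical.

```python
def relative_error(observed, predicted):
    """Percent relative error of the predicted mean against the observed mean."""
    mean_obs = sum(observed) / len(observed)
    mean_pred = sum(predicted) / len(predicted)
    return 100.0 * (mean_pred - mean_obs) / mean_obs


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov * cov / (var_obs * var_pred)


def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe coefficient; values below 0 mean the observed mean predicts better."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical annual baseflow values (mm), for illustration only.
obs = [210.0, 245.0, 190.0, 230.0, 220.0]
pred = [225.0, 240.0, 215.0, 235.0, 228.0]
print(relative_error(obs, pred), r_squared(obs, pred), nash_sutcliffe(obs, pred))
```

A prediction series can have a high R² and still a negative Nash-Sutcliffe coefficient when it is correlated with the observations but biased, which is why the two statistics are reported together.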
After the independent variables were selected, regression models were developed in SAS (at a significance level of 5%) using the "proc reg" procedure [27,58]. For annual baseflow estimation, the "BEST = 10" option of the SAS "proc reg" procedure was used to output the best 10 models, based on different combinations of explanatory variables, with the highest R² and adjusted R² values. Then, p-values of individual explanatory variables were examined for significance. If two independent variables had similar significance, the simpler one (i.e., the one more readily available for practical applications) was used for model development. This process allowed selection of the final independent variables used to develop the models. In addition, residuals of the fitted models were checked for normality. Similar model development steps were followed using two independent variables (i.e., WLC and WTD) to develop an equation for BFI. Ahiablame et al. [27] can be consulted for a detailed description of the steps for model development and validation.
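The model-building steps above can be illustrated with a small, self-contained sketch. It is not the SAS "proc reg" procedure: it fits the log-transformed power-law model by ordinary least squares in plain Python and ranks every subset of candidate predictors by adjusted R², as a stand-in for the best-subsets listing. All function names are hypothetical, base-10 logarithms are an assumption, and the data are synthetic values generated from known exponents.

```python
import math
from itertools import combinations


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_log_log(y, predictors):
    """Fit log10(y) = log10(b0) + sum(b_i * log10(X_i)) via the normal equations."""
    names = list(predictors)
    log_y = [math.log10(v) for v in y]
    rows = [[1.0] + [math.log10(predictors[k][i]) for k in names]
            for i in range(len(y))]
    p = len(names) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * ly for r, ly in zip(rows, log_y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_ly = sum(log_y) / len(log_y)
    ss_res = sum((o - f) ** 2 for o, f in zip(log_y, fitted))
    ss_tot = sum((o - mean_ly) ** 2 for o in log_y)
    r2 = 1.0 - ss_res / ss_tot
    n = len(y)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    return {"b0": 10.0 ** beta[0], "exponents": dict(zip(names, beta[1:])),
            "r2": r2, "adj_r2": adj_r2}


def best_subsets(y, predictors, top=10):
    """Fit every non-empty predictor subset and return the top models by adjusted R²."""
    results = []
    names = list(predictors)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            fit = fit_log_log(y, {k: predictors[k] for k in combo})
            results.append((combo, fit))
    results.sort(key=lambda item: item[1]["adj_r2"], reverse=True)
    return results[:top]


# Synthetic data generated from known exponents (not values from the study).
bda = [100.0, 250.0, 400.0, 800.0, 1500.0, 3000.0]   # drainage area
ap = [700.0, 900.0, 760.0, 1020.0, 820.0, 950.0]     # annual precipitation
baseflow = [0.5 * d ** 0.9 * p ** 1.2 for d, p in zip(bda, ap)]
best_vars, best_fit = best_subsets(baseflow, {"BDA": bda, "AP": ap})[0]
print(best_vars, best_fit["b0"], best_fit["exponents"])
```

Because the fit is done in log space, the recovered intercept is back-transformed to give the multiplicative constant of the power-law form. Significance tests on individual coefficients and residual normality checks, which the text describes, are not part of this sketch.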

Baseflow and Baseflow Index in the Studied Watersheds
Average annual baseflow and BFI ranged from 162 to 345 mm/yr and from 0.45 to 0.80, respectively, in the studied watersheds for the period of 1967–2011 (Table 5). In general, baseflow decreases from north to south in the UP. The largest baseflow and total streamflow were 345 mm/yr and 529 mm/yr in the Trap Rock River (04043050), which is located in the northernmost part of the UP (Figure 1; Table 5). In this area, heavy snow and large amounts of spring snowmelt are the main recharge sources of streamflow. Shallow mixed glacial drifts and high stream gradients in the Trap Rock River may also lead to moderate groundwater inflow and high peak flow [59]. The Ford River (04059500) has the lowest baseflow in the UP (187 mm/yr) (Tables 1 and 5), which could be attributed to mixed glacial deposits and thin glacial tills over bedrock in this watershed [59]. The Ford River also has a low total streamflow (278 mm/yr), likely due to the reduced amounts of precipitation recorded in this watershed over the study period. The Ford River is located in the southwestern UP and is adjacent to Wisconsin, so the climate of this watershed may be influenced by continental climate rather than lake-effect climate, causing relatively little precipitation (especially snowfall) and large seasonal variability in the watershed. In the LP, baseflow varied from 162 mm/yr for Stony Creek (04161580) to 342 mm/yr for the Manistee River (04124000), and the corresponding total streamflow ranged from 237 to 428 mm/yr during the study period (Tables 1 and 5). Baseflow in the northern LP tends to be higher than baseflow in the southern LP, mainly because the northern region is dominated by permeable coarse glacial deposits that provide favorable conditions for groundwater storage [37,60]. Overall, there is no particular pattern in baseflow variation across the studied watersheds in Michigan. For watersheds with coarse materials and well-drained soils, such as the Manistee River (04124000) and Pere Marquette River (04122500) watersheds [39], streamflow is typically dominated by groundwater [56]. These streams are mostly located in the UP watersheds (e.g., the Sturgeon River and Tahquamenon River watersheds) (Table 5, Figure 1). Baseflow appears low for streams that drain fine-textured soils, such as the Stony Creek (04161580) and River Rouge (04166100) watersheds (Figure 1), due to the low infiltration capacity of the fine materials [39].
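As a quick arithmetic check on the figures quoted above, the share of streamflow carried by baseflow can be computed directly from the long-term averages. The sketch below uses only the two watersheds for which the text gives both values explicitly. The ratio of long-term averages is an approximation and need not equal the BFI values reported in Table 5, which may be averaged differently; the function name is hypothetical.

```python
def baseflow_fraction(baseflow_mm, streamflow_mm):
    """Ratio of average annual baseflow to average annual total streamflow."""
    return baseflow_mm / streamflow_mm


# (baseflow, total streamflow) in mm/yr, as quoted in the text.
quoted_values = {
    "Trap Rock River (04043050)": (345.0, 529.0),
    "Ford River (04059500)": (187.0, 278.0),
}

for name, (baseflow_mm, streamflow_mm) in quoted_values.items():
    print(name, round(baseflow_fraction(baseflow_mm, streamflow_mm), 2))
```

Both ratios fall within the 0.45 to 0.80 range of BFI values reported for the studied watersheds.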
The average BFI value of 0.71 for the studied watersheds suggests that about 70% of long-term total streamflow in these watersheds may be contributed by groundwater discharge. Holtschlag and Nicholas [61] analyzed streamflow for 195 streams in the Great Lakes basin and attributed 67% of streamflow to groundwater discharge. Neff et al. [17] also reported that about 80% of annual streamflow in the LP resulted from groundwater discharge. It should be noted that watersheds covered principally by coarse materials and natural vegetation tend to contribute high proportions of groundwater to streams, while low percentages of groundwater discharge are associated with watersheds having large proportions of impervious surface. The analysis showed large differences in BFI between Augusta Creek (04105700) and the Macatawa River (04108800), although total streamflow in both watersheds approaches 400 mm/yr (Tables 1 and 5). BFI is 0.78 for Augusta Creek (04105700) and 0.45 for the Macatawa River (04108800) (Tables 1 and 5). The difference in BFI between these two watersheds could be explained by the fact that the Macatawa River Basin has large proportions of agricultural and urban land use. The watershed is also dominated by hydrologic soil group C (HSGC), which would influence infiltration by reducing the rate of water transmission to the underlying aquifer and groundwater discharge into the streams [62]. On average, BFI seems to be slightly lower in the UP (0.70) than in the LP (0.73).

Model Development
Twelve of the 17 watersheds were used for model development and the remaining five watersheds were used for model validation. The regression equations developed for estimating annual baseflow and BFI are shown in Table 6. In Table 6, $Q_{b(pred)}$ is the predicted baseflow (m³); BFI is the baseflow index; BDA is the basin drainage area (km²); HSGA is hydrologic soil group A (%); AP is annual precipitation (mm); $BFI_{(pred)}$ is the predicted BFI; WLC is wetland cover (%); and WTD is water table depth (m).

Model Description
Both Model 1 and Model 2 were developed for baseflow estimation, and Model 3 was developed to estimate BFI. The significant explanatory variables for estimating baseflow in this study include basin drainage area (BDA), precipitation (AP), hydrologic soil group A (HSGA) and baseflow index (BFI) (Table 6). Watershed characteristics such as basin drainage area, precipitation, and baseflow index were retained as explanatory variables in previous similar studies to estimate baseflow (e.g., [27,31,34,44]). For example, Holtschlag [44] related drainage area, forest land cover, snowfall, outwash, clay and fine-texture glacial till to low flow characteristics in Michigan. Zhu and Day [31] correlated baseflow with basin drainage area, precipitation, evapotranspiration and elevation. Ahiablame et al. [27] utilized basin drainage area, precipitation, baseflow index and proportion of tile drainage area to predict baseflow.
For BFI estimation, the significant explanatory variables in the present study include wetland cover (WLC) and water table depth (WTD) (Table 6). When the water table rises above the streambed, groundwater flows from upland areas toward streams or other surface water bodies [39]. When the water table is higher than the river level, groundwater discharges to the streams; in the opposite case, streamflow recharges groundwater. The interaction between surface water and groundwater may result in baseflow variations in a watershed. The contribution of groundwater discharge to streams may also vary with land cover due to differences in permeability. Wetlands are critical components of the water balance for most Michigan watersheds [39]. The interaction between groundwater discharge and water from wetlands would affect the fluctuation of the groundwater table, with potential impacts on groundwater contributions to the streams.
Watershed characteristics such as precipitation, land cover, slope and soils have also been used in previous studies to estimate BFI (e.g., [24,25,27,63]). Haberlandt et al. [63], for example, found that BFI is strongly correlated with topographical, pedological, hydrogeological and precipitation characteristics. Mazvimavi et al. [24] considered slope and grassland cover for watersheds in Zimbabwe. Ahiablame et al. [27] developed a regression equation for BFI estimation using water land cover, HSGB and HSGC.
The evaluation of the two baseflow equations (Model 1 and Model 2) in the 12 watersheds used for model development shows that the RE between predicted and observed annual baseflow varies from −26% to 45% for Model 1 and from −29% to 48% for Model 2 (Table 7). The R² values range from 0.17 to 0.57, and $E_{NS}$ values vary between −2.95 and 0.39 for Model 1 and Model 2, indicating that Model 1 performed slightly better than Model 2 (Figure 2, Table 7). This difference in performance between the two models could be due to the presence of HSGA in Model 1 in lieu of BFI (in Model 2) (Table 6). Different hydrologic soil groups support different infiltration rates in a watershed. Hydrologic soil group A supports high infiltration and water transmission rates, allowing abundant groundwater discharge to the streams [62]. Hydrologic soil groups have also been used to develop regression equations for baseflow estimation in other studies (e.g., [21,64]). The index of relative infiltration adopted by Armbruster [64] was calculated on the basis of hydrologic soil group. Gebert et al. [21] also related baseflow to basin drainage area, baseflow factor and soil infiltration rate in Wisconsin. There are various techniques for partitioning baseflow from streamflow records (e.g., [1,6,7]). The resulting baseflow and BFI values would likely differ from one technique to another. The same baseflow separation method could also generate different results with the same dataset if implemented in different software packages [65]. These varying results could affect the predictive capacity of regression equations developed with the estimated baseflows.
It should be noted that the RE values in the Stony Creek watershed (04161580) are high for both Model 1 (39%) and Model 2 (48%) compared to the other studied watersheds, and the corresponding $E_{NS}$ values are −0.86 and −1.49, respectively (Table 7). Groundwater discharge to Stony Creek may be influenced by decreased water infiltration within the watershed due to large proportions of residential, commercial and industrial land uses [34]. Variations in land use conditions over time were not explicitly taken into account by the regression models, and may therefore create large disparities between the observed and predicted baseflow in the Stony Creek watershed (Table 1, Figure 2). Overall, $E_{NS}$ values appeared to be negative for watersheds with high RE between predicted and observed baseflow (Table 7). This trend is observable for the Stony Creek (04161580), Trap Rock River (04043050), and Sturgeon River (04127997) watersheds. However, RE and $E_{NS}$ values for Model 1 are slightly better than those of Model 2 for these three watersheds. The negative $E_{NS}$ values indicate that there were large deviations between the predicted and observed annual baseflow, suggesting that the models have limited predictive power when used for baseflow estimation in these three watersheds (i.e., the observed mean baseflow was a better estimator than the predicted value). The large deviations of the predicted baseflow in the Trap Rock River watershed could be explained by moderate groundwater inflow due to the presence of shallow mixed glacial drifts and high stream gradients in this watershed [59]. In the Sturgeon River watershed, the presence of permeable soils, large proportions of forest cover, and a large variability in topography facilitate high groundwater inflow into the streams [66], leading to the relatively high bias between the predicted and observed baseflow. The RE between predicted and observed (calculated with WHAT) BFI ranges from −14% to 19% (Table 7). In general, the equation developed to predict BFI (i.e., Model 3) tends to overestimate BFI values for watersheds in the UP, while the opposite pattern appears in the LP, except for the Macatawa River watershed (04108800) (Figure 3, Table 7), where the predicted BFI (0.54) is greater than the observed BFI (0.45). The relatively large RE for the estimated BFI in the Macatawa River watershed (Table 7) may be attributed to the presence in this watershed of fine-texture soils and large proportions of agricultural and urban land cover, which would reduce infiltration and thus affect groundwater discharge into the streams [62].

Model Validation
The three regression models (Models 1, 2, and 3) were validated with five watersheds for the 1967–2011 study period. The predicted BFI values (calculated with Model 3) were used as input to evaluate Model 2. Results showed that Model 1 performed slightly better than Model 2 for three of the five watersheds used for model validation [i.e., the Ford River (04059500), Battle Creek (04105000), and Manistee River (04124000) watersheds]. In the remaining two watersheds [the Rabbit River (04108600) and River Rouge (04166100) watersheds], Model 2 predicted baseflow better than Model 1 (Table 1, Figure 4). Model 1 tends to estimate baseflow better than Model 2 in large watersheds, while Model 2 performed slightly better in the small watersheds (Table 1, Figures 2 and 4). The same pattern was found with the watersheds used for model development (Table 7, Figure 2). The models developed in this study mostly overestimated baseflow in the watersheds used for model validation, except in the Manistee River watershed (04124000), where most predicted baseflow values are smaller than the observed baseflow (Figure 4). It should be noted that annual precipitation is the only variable that is not constant in the baseflow estimation equations (Table 6), indicating that variability in precipitation would explain variability in baseflow estimated with the regression models in a given watershed. Jeffrey [67] reported that precipitation in Michigan and the Great Lakes region has shown an increasing trend since the 1940s. However, this increasing trend has remained steady during the past decade. A trend analysis with the Mann-Kendall method [68], carried out to assess changes in precipitation over the study period in each of the watersheds used for model validation, did not reveal any statistically significant trend. The predicted baseflow in these watersheds also did not show any significant increase or decrease over the study period. In addition, precipitation did not vary considerably from year to year over the study period in these watersheds, resulting in little annual variation of predicted baseflows as shown in Figure 4, where the predicted annual baseflows are clustered around the 1:1 line.
The models developed in this study did not seem to predict annual baseflow with high accuracy in the watersheds used for model validation. This could be explained by factors such as lake effects and snowmelt that were not explicitly considered in model development. Limitations in the predictive capacity of the models could also be due to the fact that not all baseflow in a watershed necessarily originates from areas within the watershed boundary.
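The Mann-Kendall test mentioned above can be sketched as follows. This is a generic textbook version using the normal approximation with a tie correction and a two-sided p-value; it is not taken from the cited reference, and the function name and precipitation values are hypothetical.

```python
import math
from collections import Counter


def mann_kendall(series, alpha=0.05):
    """Return (S, Z, p_value, significant) for a monotonic trend test."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    # Variance of S, reduced for groups of tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return s, z, p_value, p_value < alpha


# Hypothetical annual precipitation (mm), for illustration only.
annual_precip = [812.0, 790.0, 845.0, 801.0, 830.0, 795.0, 820.0, 808.0, 835.0, 799.0]
s_stat, z_stat, p_value, significant = mann_kendall(annual_precip)
print(s_stat, round(z_stat, 3), round(p_value, 3), significant)
```

A series without a significant trend, as reported for precipitation in the validation watersheds, yields a p-value above the chosen significance level. The normal approximation is rough for very short records, so a record as long as the 1967–2011 study period is better suited to it than the short series used here.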

Conclusions
Regression equations for estimating baseflow and BFI in Michigan were developed in this study. Seventeen watersheds were delineated to summarize various hydrophysiographic and geologic characteristics using ArcGIS. Baseflow was partitioned from daily streamflow records from 1967–2011 using the two-parameter recursive digital filter method. Twelve watersheds were used to develop two regression models for baseflow estimation and one model for BFI estimation. The remaining five watersheds were used for model validation.
Results indicate that average annual baseflow and BFI vary from 162 to 345 mm and from 0.45 to 0.80, respectively. The average BFI value is 0.71 across the studied watersheds, suggesting that about 70% of streamflow in these watersheds might be derived from groundwater discharge. The significant explanatory variables for estimating annual baseflow include basin drainage area, precipitation, hydrologic soil group A, and baseflow index. For BFI estimation, the significant independent variables are wetland cover and water table depth. Overall, Model 1 performed slightly better than Model 2, possibly due to the presence of HSGA as an explanatory variable in Model 1. The BFI equation (i.e., Model 3) tends to overestimate BFI values for watersheds in the UP, while the opposite pattern appears in the LP, except for the Macatawa River watershed. Fine-texture soils and large proportions of agricultural and urban land cover in this watershed could reduce infiltration and affect baseflow recharge. In validation, Model 1 and Model 2 mostly overestimated baseflow, and Model 1 performed slightly better than Model 2 in three of the five watersheds used for model validation. Taking into consideration methodological limitations (e.g., the same recession constant was used for baseflow separation in all studied watersheds, and the variables in the equations are all considered constant except for precipitation), the equations developed in this study can be used to predict baseflow and BFI in watersheds across Michigan.

Figure 1. Gauging stations and delineated watersheds used for the study in Michigan.

Figure 2. Predicted and observed average annual baseflow in watersheds used for model development (values on the bars represent average annual baseflow in the studied watersheds).

Figure 3. Predicted and observed BFI in watersheds used for model development.

Figure 4. Predicted versus observed annual baseflow for Model 1 and Model 2 in watersheds used for model validation during the 1967–2011 period.

Table 1. Michigan watersheds used for model development and validation.

Table 2. Abbreviations and units of all watershed characteristics for multiple regressions.

Table 3. Sources of datasets for all watershed characteristics.

Table 4. Correlation analysis of variables used for the development of regression models.

Table 5. Average annual baseflow for 17 watersheds in Michigan.

Table 7. Relative Error (%), R² and $E_{NS}$ in watersheds used for model development.