A Study to Suggest Monthly Baseflow Estimation Approach for the Long-Term Hydrologic Impact Analysis Models: A Case Study in South Korea

Changes in both land use and rainfall patterns can lead to changes in the hydrologic behavior of the watershed. The long-term hydrologic impact analysis (L-THIA) model has been used to predict such changes and analyze the changes in mitigation scenarios. The model is simple as only a small amount of input data are required, but it can predict only the direct runoff and cannot determine the streamflow. This study, therefore, aimed to propose a method for predicting the monthly baseflow while maintaining the simplicity of the model. The monthly baseflows for 20 watersheds in South Korea were estimated under different land use conditions. Calibration of the monthly baseflow prediction method produced values for R² and the Nash–Sutcliffe efficiency (NSE) within the ranges of 0.600–0.817 and 0.504–0.677, respectively; during validation, these values were in the ranges of 0.618–0.786 and 0.567–0.727, respectively. This indicates that the proposed method can reliably predict the monthly baseflow while maintaining the simplicity of the L-THIA model. The proposed method is expected to be applicable to all the various forms of the model.


Introduction
Changes in land use or rainfall pattern not only change the behavior of direct runoff and baseflow in the watershed, but also affect the occurrence of nonpoint source (NPS) pollution. Urbanization and industrialization have increased the ratio of impervious surfaces in watersheds. Therefore, it is necessary to analyze such changes in land use and investigate scenarios that may reduce the impacts of these changes. Hydrologic models are generally used for such analyses. Bieger et al. [1] assessed the impact of land use changes using the soil and water assessment tool (SWAT) [2,3], establishing three land use scenarios for changes in forest, cropland, and orchard area. The result indicated that forest, cropland, and orchard area changes of −34.48%, +265.32%, and +204.51% led to a surface flow increase of 46.1%. Guse et al. [4] analyzed the impact of five spatially distributed crop rotations using the SWAT model; nitrate loads were reduced with dynamic changes in agricultural crop rotations. Martin et al. [5] used the regional hydro-ecological simulation system model [6]; the study result indicated that high flows (highest 10 percentiles) increased by 37–88% and that low flows (lowest 10 percentiles) increased by 23–37% as a result of land use changes. Additionally, Srivastava et al. [7] and Aghsaei et al. [8] reported that vegetation can have a significant effect on hydrological components through variations in the physical characteristics of the land surface, soil, and vegetation, namely the roughness, albedo, architectural resistance, infiltration capacity, leaf area index, root depth, and stomatal conductance.
The long-term hydrologic impact analysis (L-THIA) model has been in use for this purpose since 1994 [9]. The L-THIA model was first developed in the form of a spreadsheet in 1994 [9], followed by redevelopment so that it could be integrated with geographic information systems (GIS) [10,11]. However, the model is limited in terms of reflecting the various types of land use. To counter such limitations, Lim et al. [12] developed the L-THIA/NPS WWW model so that L-THIA could be used in the form of a spreadsheet in L-THIA/NPS GIS, which is based on the ArcView software. Bhaduri et al. [13] used the GIS-based L-THIA/NPS GIS model to predict changes in the direct runoff and pollution from nonpoint sources resulting from land use changes; they reported that the average annual direct runoff increased by 80%, while pollution from lead, copper, and zinc increased by more than 50%, as the urban and impervious area increased by 18% during the period 1973–1991. Wilson and Weng [14] analyzed changes in the direct runoff that were caused by land use changes using the L-THIA NPS model, which is an ArcHydro GIS extension. They reported a two-fold increase in the direct runoff in places where the residential land use increased by 37.3%, even though the precipitation increased by <30%. Liu et al. [15] considered best management practices (BMPs) such as wet ponds, green roofs, and bioretention, together with the construction cost, annual maintenance cost, and interest rate of 12 low impact development (LID) techniques, using the L-THIA-LID model [16]. They determined cost-effective strategies for 15 scenarios to reduce direct runoff, total nitrogen (TN), total phosphorus (TP), total suspended solids (TSS), Pb, biochemical oxygen demand (BOD), and chemical oxygen demand (COD).
Eaton [17] examined a direct runoff reduction method for green infrastructure screening using the L-THIA LID model and reported that direct runoff can be reduced by 12% by using bioretention and raingardens in the watershed. Li et al. [18] analyzed changes in the surface runoff caused by land use and rainfall changes using the ArcL-THIA 10.1 model, which is based on ArcGIS 10.0 [19]. The authors calibrated the ArcL-THIA 10.1 model by using the baseflow filter program (BFLOW) [20] to separate direct runoff; they reported that enhanced precipitation contributed more significantly than land use to the changes observed during the period 2005–2015.
Different methods for analyzing scenarios resulting from changes in land use or rainfall conditions, which also consider baseflow, have been proposed for the L-THIA model. Ahiablame et al. [21] analyzed the effects of rain barrels/cisterns and porous pavement using the L-THIA-LID model in a study that reflected the influence of baseflow. An annual baseflow regression equation, which includes the area of the watershed, annual precipitation, and baseflow index, was utilized for baseflow analysis in the L-THIA model. However, as this method can only predict the annual baseflow, it has limitations when used to determine monthly characteristics. Ryu et al. [22] refined the prediction process of the existing L-THIA by using the asymptotic curve number (ACN) instead of CN for predicting the direct runoff. This made it possible to predict streamflow by adding modules for baseflow prediction and channel routing, overcoming the most significant limitation of the existing L-THIA. However, this process increases the complexity of the model: three additional model parameters related to direct runoff prediction, four related to baseflow prediction, and three related to channel routing were required, and an optimization algorithm was suggested for calibrating the model.
The L-THIA model is based on spreadsheets [9,16] and GIS [10–14,19]. It has been in continuous use since its conception for analyzing direct runoff resulting from changes in land use or rainfall patterns as well as the effects of BMP and LID techniques. However, a comparison of the predictions made using this model with actual streamflow is required to improve the utility of this model, and it is necessary to reflect the influence of baseflow in analyses based on it. At the same time, the model is widely used for the above purposes precisely because of its simplicity, as it requires only the input data needed to define CN and its computation process is not complicated. It is, therefore, necessary to maintain the current benefits of this model while predicting baseflow. This study aims to propose a method for predicting the baseflow while maintaining the simplicity of the current L-THIA model.

Description of the Study Area
As the purpose of this study is the proposal of a method that can predict baseflow while maintaining the simplicity of the current L-THIA model, streamflow data describing the runoff in a watershed, the land use map, and monthly precipitation data were required as input into the L-THIA model.
The land use map (scale 1:5000) was provided by the Environmental Geographic Information Service [23], the monthly precipitation data by the Korea Meteorological Administration [24], and the daily stream flow data by the Water Resources Management Information System (WAMIS) [25]. The watersheds selected for the study use flow measurement points operated by WAMIS as watershed outlets, and they are distributed such that the conditions in various regions of Korea are considered without any spatial overlap (Figure 1). A total of 20 watersheds were selected, with areas ranging from 5694.8 to 155,805.9 ha (Table 1). The land use types were classified into urban, agriculture, forest, pasture, wetland, bare land, and water. Forest occupied the largest area in all the watersheds studied, followed by agriculture. The exception to this trend was Wsd-02, in which agriculture (2741.0 ha) represented the largest proportion (41.5%), followed by urban (2042.1 ha), which accounted for 30.9% of the watershed area (Table 1). The analysis period was set to the five-year period from 1 January 2016 to 31 December 2020. The minimum monthly precipitation in each watershed ranged from 0.5 mm (Wsd-01) to 8.1 mm (Wsd-07), whereas the maximum monthly precipitation ranged from 454.5 mm (Wsd-17) to 822.0 mm (Wsd-14). The maximum precipitation was, therefore, 73 times (Wsd-07) to 1271 times (Wsd-01) higher than the minimum in each watershed, indicating a significant difference in monthly precipitation (Table 2). The minimum daily stream flow ranged upward from 0.01 m³/s (Table 2). The difference between the minimum and maximum daily stream flow was significant in each watershed. Therefore, it was judged necessary to reflect monthly precipitation conditions in predicting the monthly baseflow.

Baseflow Separation
As the purpose of this study is to propose a method for predicting monthly baseflow, a reference value of the actual baseflow is required for comparison. The measured streamflow is the sum of the surface runoff and the baseflow, which means that it is difficult to evaluate the monthly baseflow prediction method solely by comparison with the corresponding streamflow. Therefore, the monthly baseflow prediction method was examined by separating the baseflow from the streamflow in each watershed using the method suggested by Eckhardt (Equation (1)). In this baseflow separation method [26], the baseflow on day t (b_t) is determined using the maximum value of the long-term ratio of baseflow to streamflow (BFI_max), the filter parameter (a), the filtered baseflow at the t − 1 time step (b_{t−1}), and the streamflow at the t time step (Q_t):

$$b_t = \frac{(1 - BFI_{max})\, a\, b_{t-1} + (1 - a)\, BFI_{max}\, Q_t}{1 - a\, BFI_{max}} \qquad (1)$$

After separating the baseflow (m³/s) from the streamflow (m³/s) in each watershed using Equation (1), the baseflow can be expressed as a volume (in m³), which reflects the area of each watershed, as shown in Table 3. Overall, the minimum, maximum, and average values of the streamflow and baseflow showed a tendency to rise as the area of the watershed increased. The mean flow percentage (%), which is the ratio of baseflow to streamflow, ranged from 32.7% (Wsd-15) to 59.2% (Wsd-19), indicating significant differences among the watersheds in relative terms despite the seemingly insignificant differences in the absolute numbers. Wsd-15, with a relatively large watershed area, exhibited the minimum percentage of 32.7%; overall, however, the mean flow percentage showed a tendency to slightly increase as the watershed area increased. This indicates that it is necessary to consider parameters related to the area under investigation when predicting the monthly baseflow.
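The separation step can be illustrated with a minimal sketch of the Eckhardt recursive filter. The source does not report the BFI_max and a values used, so the defaults below (0.80 and 0.98) are placeholder assumptions, as are the function names, the starting value of the recursion, and the capping of baseflow at the streamflow.

```python
def eckhardt_filter(streamflow, bfi_max=0.80, a=0.98):
    """Separate baseflow from a daily streamflow series (same units as input).

    bfi_max and a are assumed placeholder values, not taken from the study.
    """
    if not streamflow:
        return []
    # Assumption: start the recursion at BFI_max times the first flow value.
    baseflow = [min(bfi_max * streamflow[0], streamflow[0])]
    for q in streamflow[1:]:
        b_prev = baseflow[-1]
        b = ((1.0 - bfi_max) * a * b_prev + (1.0 - a) * bfi_max * q) / (1.0 - a * bfi_max)
        baseflow.append(min(b, q))  # baseflow is not allowed to exceed streamflow
    return baseflow


def daily_rates_to_volume(rates_m3_per_s):
    """Convert daily mean flow rates (m3/s) to a total volume (m3)."""
    return sum(rates_m3_per_s) * 86400.0  # seconds per day
```

Summing the filtered daily values month by month with `daily_rates_to_volume` would give a monthly baseflow volume that can be compared against a monthly estimate.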

Monthly Baseflow Estimation Approach
A notable method that can be used for baseflow prediction is the spreadsheet tool for the estimation of pollutant load (STEPL) [27] model. This model was proposed by the U.S. Environmental Protection Agency (U.S. EPA) to establish total maximum daily load plans for the United States, and it can analyze the effects of more than 50 BMPs while also simulating the average annual runoff, sediment, nitrogen, phosphorus, and BOD. In this model, prediction of the baseflow is based on the annual precipitation, land use, and hydrologic soil group (HSG). For example, it predicts that the baseflow will correspond to 36, 24, 12, and 6% of the annual precipitation in an urban area when the HSG is A, B, C, and D, respectively. Land uses other than urban in this model include cropland, pasture, and forest, and the baseflow is estimated using the same method for all of these land uses: the model predicts that the baseflow will correspond to 45, 30, 15, and 7.5% of the annual precipitation for HSGs A, B, C, and D, respectively. Since this method is a statistical approach with a simple prediction process, it was judged that applying it could also maintain model simplicity in the baseflow prediction process of the L-THIA model.
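The STEPL-style rule described above amounts to a lookup table. The following sketch uses only the percentages quoted in the text; the dictionary layout and function name are illustrative assumptions and do not reproduce the STEPL spreadsheet itself.

```python
# Baseflow as a fraction of annual precipitation, by hydrologic soil group (HSG).
# Values are those quoted in the text; "other" covers cropland, pasture, and forest.
STEPL_BASEFLOW_FRACTION = {
    "urban": {"A": 0.36, "B": 0.24, "C": 0.12, "D": 0.06},
    "other": {"A": 0.45, "B": 0.30, "C": 0.15, "D": 0.075},
}


def stepl_annual_baseflow_depth(annual_precip_mm, land_use, hsg):
    """Annual baseflow depth (mm) as a fixed fraction of annual precipitation."""
    group = "urban" if land_use == "urban" else "other"
    return annual_precip_mm * STEPL_BASEFLOW_FRACTION[group][hsg]
```

For instance, 1000 mm of annual precipitation on an urban area with HSG B gives a baseflow depth of about 240 mm under this rule.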
A basic formula is required to predict the monthly baseflow. Land use conditions need to be reflected because the baseflow is likely to vary alongside land use. It was also judged that monthly rainfall requires consideration because both the baseflow and streamflow are affected by precipitation. The basic formula for predicting the monthly baseflow is shown in Equation (2). For predicting the baseflow in month i, this equation considers the precipitation in month i (Precipitation_i), the coefficient for urban land (C_URBN) and the area covered by urban land (A_URBN), the coefficient for agriculture (C_AGRL) and the area of agricultural land (A_AGRL), the coefficient for forest (C_FRST) and the area of forested land (A_FRST), the coefficient for pasture (C_PAST) and the area covered by pasture (A_PAST), the coefficient for wetland (C_WTLD) and the area covered by wetland (A_WTLD), the coefficient for bare land (C_BARE) and the area covered by bare land (A_BARE), and the coefficient for water (C_WATR) and the area covered by water (A_WATR).

$$Baseflow_i = Precipitation_i \times \left( C_{URBN} A_{URBN} + C_{AGRL} A_{AGRL} + C_{FRST} A_{FRST} + C_{PAST} A_{PAST} + C_{WTLD} A_{WTLD} + C_{BARE} A_{BARE} + C_{WATR} A_{WATR} \right) \qquad (2)$$
Equation (2) can reflect the conditions surrounding both monthly precipitation and land use for predicting the baseflow, as the coefficients allow the degree of influence from each land use type to vary. However, the coefficients require definition, for which a genetic algorithm (GA) [28] was used in this case. GA, which mimics the evolutionary process of nature, is used to obtain an optimized solution to a given problem. It is, therefore, useful in solving highly complex problems in the fields of business or engineering [29–31]. In this study, the selected watersheds were divided into two groups for determination of the seven coefficients. The first group included 11 watersheds: Wsd-01 together with the even-numbered watersheds Wsd-02, Wsd-04, Wsd-06, Wsd-08, Wsd-10, Wsd-12, Wsd-14, Wsd-16, Wsd-18, and Wsd-20. This group included the watersheds with the minimum and maximum areas. The second group consisted of the remaining nine watersheds: Wsd-03, Wsd-05, Wsd-07, Wsd-09, Wsd-11, Wsd-13, Wsd-15, Wsd-17, and Wsd-19. After the coefficients of Equation (2) were defined through GA for the watersheds in the first group, Equation (2) was applied with these fixed coefficients to the watersheds in the second group to examine the monthly baseflow prediction method. The processes described in Section 2.1 are summarized in Figure 2.
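How a GA could search for the seven coefficients can be sketched as follows. The source does not describe its GA settings, so every design choice here is an assumption made for illustration: the population size, truncation selection, one-point crossover, Gaussian mutation, the [0, 1] bounds, and the sum-of-squared-errors objective. The `records` structure of (precipitation, areas, separated baseflow) tuples is likewise hypothetical.

```python
import random

LAND_USES = ["URBN", "AGRL", "FRST", "PAST", "WTLD", "BARE", "WATR"]


def predict(coeffs, precip, areas):
    """Precipitation times the coefficient-weighted sum of land use areas."""
    return precip * sum(c * a for c, a in zip(coeffs, areas))


def sse(coeffs, records):
    """Sum of squared errors between estimated and separated baseflow."""
    return sum((predict(coeffs, p, areas) - obs) ** 2 for p, areas, obs in records)


def fit_coefficients(records, pop_size=60, generations=200, seed=0):
    """Return (best coefficients, best SSE per generation)."""
    rng = random.Random(seed)
    n = len(LAND_USES)
    population = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda c: sse(c, records))
        history.append(sse(population[0], records))
        parents = population[: pop_size // 2]  # truncation selection
        children = [population[0][:]]          # elitism: keep the best unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = mother[:cut] + father[cut:]
            i = rng.randrange(n)               # Gaussian mutation of one gene
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0.0, 0.05)))
            children.append(child)
        population = children
    best = min(population, key=lambda c: sse(c, records))
    return best, history
```

Because the best individual is carried over unchanged, the best error in `history` never increases from one generation to the next; whether the search reaches coefficients close to any particular set depends on the data and settings.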

Determination of Regression Model Coefficients
Optimal values for the seven coefficients in Equation (2) for predicting the monthly baseflow were determined by GA, based on the monthly baseflow estimated for each watershed and the monthly baseflow separated from the streamflow using the Eckhardt filter equation (Equation (1)). The coefficients that gave the smallest difference between the monthly baseflow separated using Equation (1) and the monthly baseflow estimated using Equation (2) in each watershed were taken to be optimal. It was also deemed necessary to define criteria for examining the validity of the estimated monthly baseflow against the separated monthly baseflow. Duda et al. [32] reported that the estimated result is applicable when the R² is higher than 0.65 and the difference is 45% or less. Skaggs et al. [33] mentioned that the result is applicable when the Nash–Sutcliffe efficiency (NSE) is higher than 0.50, and Wang et al. [34] suggested that it is applicable when the NSE is higher than 0.50, the R² is higher than 0.60, and the PBIAS is within ±15%. Moriasi et al. [35] deemed the result to be applicable when the R² is higher than 0.60, the NSE is higher than 0.50, and the PBIAS is less than 15%. In other words, there are various criteria for determining the applicability of the estimated result. These criteria were summarized in this study, and the applicability of the estimated monthly baseflow was judged using a scatter plot together with the thresholds of an NSE higher than 0.50 and an R² higher than 0.60.
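The two statistics adopted as criteria can be computed as in the sketch below. It assumes that R² is taken as the squared Pearson correlation between the separated and estimated series, which is a common convention but is not stated in the source; the function names are illustrative.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


def r_squared(observed, simulated):
    """Squared Pearson correlation (assumed definition of R2)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_sim = sum((s - mean_sim) ** 2 for s in simulated)
    return cov ** 2 / (var_obs * var_sim)


def is_applicable(observed, simulated, r2_min=0.60, nse_min=0.50):
    """Apply the thresholds adopted in the text: R2 > 0.60 and NSE > 0.50."""
    return r_squared(observed, simulated) > r2_min and nse(observed, simulated) > nse_min
```

The two measures are complementary: an estimate that is perfectly correlated with the separated baseflow but biased (for example, twice as large) keeps an R² of 1 while its NSE drops sharply, which is why both thresholds are checked.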
The optimized coefficients for the 11 watersheds in the first group varied among the watersheds, with C_URBN ranging upward from 0.00289 (Table 4). The optimized coefficients in each watershed were obtained when the difference between the separated and estimated baseflows was minimal. Since the purpose of this study is to propose a method for predicting the monthly baseflow in multiple watersheds rather than judging the accuracy of the monthly baseflow prediction for a specific watershed, the final values of the coefficients of Equation (2) were determined based on the average of each coefficient. The coefficients were, therefore, determined to be 0.04 for C_URBN, 0.40 for C_AGRL, 0.20 for C_FRST, 0.18 for C_PAST, 0.48 for C_WTLD, 0.15 for C_BARE, and 0.22 for C_WATR. In general, the contribution of land use to the baseflow can be considered to be related to the impervious surface ratio. The contribution of urban areas to the baseflow will be low because the impervious surface ratio is high. The final C_URBN of 0.04 reflects this impermeability, as it is lower than the coefficients obtained for the other land uses. In contrast, the contribution of wetlands and reservoirs to the baseflow is high because of the constant infiltration of water. The value of 0.48 for C_WTLD also appears to reflect this condition, as it is high compared to the coefficients for other types of land use.
It is noteworthy that the coefficient for agricultural land, C_AGRL, had the second highest value after C_WTLD. Agriculture in Korea is dominated by rice paddies, which are kept ponded during the rice cultivation period from May to October. The contribution of agriculture to the baseflow should, therefore, be similar to that of wetland, and this condition is reflected in the coefficient for agriculture.
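Applying the final coefficients is a single weighted sum per month, as the sketch below shows. The coefficient values are those reported in the text. The unit factor is an assumption: 1 mm of precipitation over 1 ha equals 10 m³, but the source does not state which unit conversion, if any, was applied, so `unit_factor` is exposed as a parameter. The example areas are hypothetical.

```python
# Final coefficients reported in the text, keyed by land use code.
FINAL_COEFFICIENTS = {
    "URBN": 0.04, "AGRL": 0.40, "FRST": 0.20, "PAST": 0.18,
    "WTLD": 0.48, "BARE": 0.15, "WATR": 0.22,
}


def monthly_baseflow(precip_mm, areas_ha, coefficients=FINAL_COEFFICIENTS, unit_factor=10.0):
    """Monthly baseflow from precipitation and land use areas.

    unit_factor=10.0 converts mm x ha to m3 (assumed, not stated in the source).
    Land uses missing from areas_ha are treated as having zero area.
    """
    weighted_area = sum(c * areas_ha.get(code, 0.0) for code, c in coefficients.items())
    return precip_mm * weighted_area * unit_factor


# Hypothetical watershed: 100 ha urban and 200 ha forest, 50 mm of rain in the month.
example = monthly_baseflow(50.0, {"URBN": 100.0, "FRST": 200.0})
```

In this hypothetical case, the forest contributes ten times as much as the urban area to the weighted sum (40 versus 4) despite being only twice as large, which illustrates how the coefficients encode the land use effect.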
The monthly baseflow in the first watershed group was calculated again using Equation (2) by applying the finally determined coefficients, and the suitability of the monthly baseflow was determined by the values of R², NSE, and the scatter plot. Both R² and NSE fell within the applicable ranges, with R² ranging from 0.600 (Wsd-06) to 0.817 (Wsd-16) and NSE from 0.504 (Wsd-01) to 0.677 (Wsd-18 and Wsd-20) (Table 5). Based on the scatter plot, the estimated monthly baseflow tended to be slightly lower than the separated monthly baseflow for Wsd-06, Wsd-08, and Wsd-10; Wsd-20 showed a scattered tendency that was comparable to the other watersheds. Overall, however, there were no significant differences in the tendencies or values between the estimated and the separated monthly baseflow (Figure 3). In the plot of flow duration curves, the estimated monthly baseflow often did not capture the separated monthly baseflow in the dry-conditions (flow duration intervals from 60% to 90%) and the low-flow (flow duration intervals from 90% to 100%) regimes; however, it did in the other flow regimes, which are the high-flow (flow duration intervals from 0% to 10%), the moist-conditions (flow duration intervals from 10% to 40%), and the mid-range flow (flow duration intervals from 40% to 60%) regimes (Figure 4). Therefore, the application of optimized coefficients to each watershed will render the estimated monthly baseflow similar to the separated monthly baseflow. Thus, the application of the finally determined coefficients is expected to result in satisfactory predictions.
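The flow regimes used in the flow duration curve comparison can be assigned as in the following sketch. The interval boundaries are those given in the text; the Weibull plotting position used for the exceedance percentage and the treatment of values falling exactly on a boundary are assumptions, since the source does not specify them.

```python
def regime(exceedance_pct):
    """Map a flow duration interval (percent of time exceeded) to its regime."""
    if exceedance_pct < 10.0:
        return "high-flow"
    if exceedance_pct < 40.0:
        return "moist-conditions"
    if exceedance_pct < 60.0:
        return "mid-range"
    if exceedance_pct < 90.0:
        return "dry-conditions"
    return "low-flow"


def flow_duration_curve(flows):
    """Return (exceedance %, flow, regime) tuples, largest flow first."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    curve = []
    for rank, q in enumerate(ranked, start=1):
        exceedance = 100.0 * rank / (n + 1)  # Weibull plotting position (assumed)
        curve.append((exceedance, q, regime(exceedance)))
    return curve
```

Building one curve from the separated and one from the estimated monthly baseflow, then comparing them regime by regime, would reproduce the kind of comparison described above.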

Validation of Regression Model Coefficients
When model parameters are adjusted for a set of watersheds during calibration, the results derived for those same watersheds can be expected to be satisfactory. However, it is necessary to apply these model parameters to watersheds other than those used in the calibration process to examine whether the model parameters were well calibrated and determine whether the estimated results are applicable. Therefore, in this study, the values determined for the coefficients that correspond to the model parameters in the calibration process were applied to the second group to determine whether the estimated monthly baseflow is also applicable in this group.
The estimated monthly baseflow was determined to be applicable because the R² ranged from 0.618 (Wsd-09) to 0.786 (Wsd-17) and the NSE from 0.567 (Wsd-07) to 0.727 (Wsd-05) (Table 6). Based on the scatter plot, the estimated monthly baseflow tended to be slightly lower than the separated monthly baseflow for Wsd-03, Wsd-11, and Wsd-19. Overall, however, no significant differences were observed in either tendency or value between the estimated and separated monthly baseflow (Figure 5). Similar to the results of calibration, the estimated monthly baseflow often did not capture the separated monthly baseflow in the dry-conditions and the low-flow regimes in the flow duration curve plots; however, it did in the other flow regimes, which are the high-flow, the moist-conditions, and the mid-range flow regimes (Figure 6). Based on the flow duration curves in both the calibration and validation processes, the estimated monthly baseflow fit the separated monthly baseflow reasonably well in the high-flow and the moist-conditions regimes, but not in the low-flow regime in particular. This means that the monthly baseflow approach will be reasonable in applications concerned with high flow and moist conditions, such as flooding or nonpoint source pollution analysis. However, caution needs to be exercised when the approach is used for applications concerned with low flow, such as water supply simulations in drought.

Conclusions
This study aimed to propose a method for predicting the baseflow in order to address the shortcomings of the current long-term hydrologic impact analysis (L-THIA) model, which can only predict surface runoff. To achieve this aim, the measured streamflow was separated into surface runoff and baseflow to determine whether the L-THIA model can predict the hydrologic behavior in a given watershed through comparison with the measured flow. The model is limited in that it cannot determine the influence of the baseflow because only the surface runoff can be estimated. It has remained in use because it requires only land use maps, soil maps, and precipitation data and its computation process is simple from a user's point of view. Therefore, it is necessary to improve the current L-THIA model so that it can predict the baseflow while maintaining its simplicity. As such, a method to overcome the limitations of the L-THIA model was proposed by adapting the baseflow prediction method based on land use and annual precipitation that is used in the spreadsheet tool for the estimation of pollutant load (STEPL) model.
To this end, 20 independent watersheds in South Korea were selected and divided into two groups. The first group of 11 watersheds was used to propose the monthly baseflow prediction method, and the second group of nine watersheds was used to examine the proposed method. Since the proposed method uses monthly precipitation data along with the areas and coefficients associated with seven different land use types, it does not exceed the range of data used in the current L-THIA model to predict surface runoff. The simplicity of the model can, therefore, be maintained. In addition, the coefficients represent the contribution of precipitation to the baseflow for each land use type. In the case of urban areas, which are generally considered to have a low contribution to the baseflow due to their high impervious surface ratio, the coefficient was determined to be lower than that of other land uses. Wetland is considered to have a high contribution to the baseflow because of the continuous infiltration that is associated with water storage. Therefore, it has a higher coefficient than the other land use types. In addition, ponded paddy fields, which are characteristic of the agriculture in Korea, are assumed to contribute to the baseflow in a similar manner as wetland during the cultivation period from May to October. Therefore, the coefficient for agricultural land was determined to be similar to that for wetland. In other words, it appears that the proposed monthly baseflow prediction method can sufficiently reflect the conditions of the different land uses while reflecting seasonal conditions, as it uses monthly precipitation.
The proposed procedure for using the approach is, in turn, to first separate direct runoff and baseflow from the measured streamflow, then to calibrate the direct runoff estimated with curve numbers, and finally to calibrate the baseflow estimated by the suggested method. Additionally, since the proposed method does not reflect the hydrologic behavior beneath the surface and can be seen as a statistical or probabilistic approach based on the characteristics of the watersheds in Korea, caution needs to be exercised when it is applied to watersheds without ponded paddy fields. Attention also needs to be paid when this monthly baseflow regression equation is applied to watersheds whose area lies outside the range used in the derivation process. In addition, the suggested regression was developed on a monthly basis; therefore, its applicability to daily or weekly baseflow estimation will be limited.

Data Availability Statement:
The data presented and used in the study are available on request from the corresponding author.