Data- and Model-Based Discharge Hindcasting over a Subtropical River Basin

This study aims to evaluate the performance of the Soil and Water Assessment Tool (SWAT), a simple Auto-Regressive with eXogenous input (ARX) model, and a gene expression programming (GEP)-based model in one-day-ahead discharge prediction for the upper Kentucky River Basin. Calibration of the models were carried out for the period of 2002–2005 using daily flow at a stream gauging station unaffected by the flow regulation. Validation of the calibrated models were executed for the period of 2008–2010 at the same gauging station along with another station 88 km downstream. GEP provided the best calibration (coefficient of determination (R) value 0.94 and Nash-Sutcliffe Efficiency (NSE) value of 0.88) and validation (R values of 0.93 and 0.93, NSE values of 0.87 and 0.87, respectively) results at the two gauging stations. While SWAT performed reasonably well in calibration (R value 0.85 and NSE value 0.72), its performance somewhat degraded in validation (R values of 0.85 and 0.82, NSE values of 0.65 and 0.65, for the two stations). ARX performed very well in calibration (R value 0.92, NSE value 0.82) and reasonably well in validation (R values of 0.88 and 0.92, NSE values of 0.76 and 0.85) at the two stations. Research results suggest that sophisticated hydrological models could be outperformed by simple data-driven models and GEP has the advantage to generate functional relationships that allows investigation of the complex nonlinear interrelationships among the input variables.


Introduction
The availability of water at the watershed or basin scales in addition to the spatial and temporal distribution of water are largely affected by climatic and topographic factors [1]. Soil, land cover, and land use characteristics are also considered when studying the movement and exchange of water [2]. The recent development of robust hydrologic models and significant advancement in the processing power of computers [3] allow modelers to take these factors into account when trying to solve the complexity of hydrological processes. These models, often embedded within decision support tools, are especially necessary when complete information and observed data on discharge, inflow, climate, soil moisture, or other related factors of a basin are limited [4]. Changes in water budget and fluxes within a watershed can be studied based on the estimation and simulation of the models. Several popular conceptual and physically based watershed models have been used for discharge and water quality modeling in the last three decades including CREAMS (Chemical Runoff and Erosion from Agricultural Management Systems) [5], PRMS (Precipitation Runoff Modeling System) [6], HSPF (Hydrologic Simulation Program-Fortran) [7], SWAT (Soil and Water Assessment Tool) [8], and SWBM (Spatial Water Budget Model) [9]. These models are computationally effective and require relatively small time steps to produce outputs through simulations [10]. Among these models, SWAT can analyze how the changes in land use and land cover impact the runoff and water quality in addition to its ability to quantify the hydrological responses to the variation of hydrometeorological conditions, The Kentucky River Basin covers 42 counties of the state of Kentucky, with a total area of about 18,000 km 2 . The major tributaries of Kentucky River include Red River, Eagle Creek, North Fork Kentucky River, Elkhorn Creek, Middle Fork Kentucky River, Dix River, and the South Fork Kentucky River ( Figure 1). The length of the main stem of Kentucky River is 418 km with a dense network of tributaries and streams that feed the river with a total length of approximately 25,700 km. The elevation of Kentucky River Basin ranges from 243 m to 305 m above the mean sea level with an average slope 0.13 m/km. The mean annual rainfall within the basin is approximately 1168 mm [17]. Climate in the basin is moist-continental and the average annual temperature is 13 • C. The basin experiences the minimum daily temperature in January and February (about −4 • C) while July and August bring the maximum daily temperature (about 31 • C) [18].  [19]. The older LDs (built in 19th century) were primarily built as timber crib and stone masonry structures and the relatively newer LDs (built in 20th century) were primarily built out of concrete. While most of the LDs (LD [5][6][7][8][9][10][11][12][13][14] are out of operation at present and have been sealed with concrete barriers, four LDs (LD 1-4) are still in operation (located within 65 river miles above mouth) and have been rebuilt and repaired over time [20]. To avoid the impact of the LDs on the natural flow of Kentucky River, a point that drains the North Fork Kentucky watershed, the Middle Fork Kentucky watershed, and the South Fork Kentucky watershed where none of the LDs is present is selected for calibration of the hydrologic model. This point drains about 6888 km 2 and has its discharge recorded daily by USGS 03282000 stream gauge, which is located just upstream of LD 14 at Heidelberg, Kentucky. However, in addition to the natural flow sub-basin, the validation of the hydrologic model and other models was also performed on upper Kentucky River Basin (Figure 1). The outlet of upper Kentucky River is located just upstream from LD 10, near Winchester, Kentucky at the USGS 03284000 stream gauge. With the presence of four dams LD 11-14 within the upper Kentucky River, the outlet discharge is affected, to some extent, by operation of the LDs.

The SWAT Model
SWAT is a semi-distributed, physically based eco-hydrologic model that is based on dividing the basin being modeled into smaller sub-basins, each of which can be divided into smaller hydrologic response units (HRUs) incorporating unique combinations of soil type, ground slope, and land use [22,23]. The SWAT is being widely used in hydrologic and ecologic simulations over basins of different scales in the US and other regions of a wide range of climatic and geologic conditions [24][25][26]. SWAT simulates the following water balance equation in its hydrologic simulation: To avoid the impact of the LDs on the natural flow of Kentucky River, a point that drains the North Fork Kentucky watershed, the Middle Fork Kentucky watershed, and the South Fork Kentucky watershed where none of the LDs is present is selected for calibration of the hydrologic model. This point drains about 6888 km 2 and has its discharge recorded daily by USGS 03282000 stream gauge, which is located just upstream of LD 14 at Heidelberg, Kentucky. However, in addition to the natural flow sub-basin, the validation of the hydrologic model and other models was also performed on upper Kentucky River Basin (Figure 1). The outlet of upper Kentucky River is located just upstream from LD 10, near Winchester, Kentucky at the USGS 03284000 stream gauge. With the presence of four dams LD 11-14 within the upper Kentucky River, the outlet discharge is affected, to some extent, by operation of the LDs.

The SWAT Model
SWAT is a semi-distributed, physically based eco-hydrologic model that is based on dividing the basin being modeled into smaller sub-basins, each of which can be divided into smaller hydrologic response units (HRUs) incorporating unique combinations of soil type, ground slope, and land use [22,23]. The SWAT is being widely used in hydrologic and ecologic simulations over basins of different scales in the US and other regions of a wide range of climatic and geologic conditions [24][25][26]. SWAT simulates the following water balance equation in its hydrologic simulation: where SW represents soil water content minus the 15-bar pressure water content; t represents the time in days; R represents the amount of precipitation on day i (mm); Q represents the daily runoff (mm); ET represents the daily evapotranspiration on day i (mm); P represents the percolation on day i (mm); and QR represents the return flow on day i (mm). The SWAT model can select from a variety of methods in the calculation of surface runoff, channel flow, evapotranspiration, and other watershed dynamics before running a simulation. The model thus incorporates a myriad of parameters. Arnold and Srinivasan [22] described all the parameters of the water balance equation and its components in detail.

The ARX Model
The ARX (Auto-Regressive with eXogenous input) is a computationally and conceptually simple linear regression-based representation of a random process output variable that linearly depends on its own previous values, the previous values of another external (exogenous) variable, and on a stochastic term. ARX has been widely used to model the dynamic response of a system to exogenous factors in different fields including hydrology, agriculture, chemical engineering, medicine, biological science, and energy economics [27,28]. An autoregressive model has a physical hydrologic basis in baseflow recession theory because the lag-1 serial correlation coefficient of the model is equivalent to the widely used hydrograph recession constant [29,30]. The objective in this study is to develop and implement an ARX model at LD10 and LD14 of Kentucky River Basin that can forecast the discharge one-step ahead. The following equation represents the ARX model: (2) where e(t) refers to the noise (assumed to be Gaussian); a na and b nb represents the model parameters; the order of the polynomials of the output A(q) and the input B(q) are represented by n a and n b , respectively; and the time delay between y(t) and u(t) is represented by n k . Equation (2) can be expressed as follows: where A(q) and B(q) are given by: where q −1 represents the delay operator and least-squares identification is used in the estimation of A(q) and B(q) [31,32].

The GEP Model
Gene expression programming (GEP) belongs to the 'Evolutionary Algorithms' group and, similar to other evolutionary algorithms, uses populations of individual solutions, selects and reproduces (breeds) some of them in accordance with fitness, and introduces genetic variation using mutation or recombination as prerequisites for evolution to occur [33]. A Genetic Algorithm (GA) and Genetic Programming (GP), on the other hand, are simple replicator systems with GP being more complex than GA. The fundamental difference among GA, GP, and GEP is in the nature of their individuals. The individuals are linear strings of fixed length (chromosomes) in the case of GA whereas GP consists of individuals that are of different sizes and shapes. The advantages of GP and GA are retained in GEP, making it capable of solving the coding explosion problem observed in GP while keeping the genetic operations simple similar to GA [34]. Attributes from both GA and GP are carried to GEP, and the individuals of GEP are encoded as linear strings of fixed length. These individuals are expressed in GEP as nonlinear entities afterwards having different sizes and shapes [35]. The function of Expression Trees (ETs) is to hold the genetic information. GEP follows simplistic genetic code of one-on-one relationship between symbols of the genes and the nodes of the ETs. ETs can be further divided into sub-ETs. Figure 2 shows the working principle of GEP in a brief flowchart. The model randomly generates chromosomes of initial population, expresses each individual chromosome, evaluates them based on a fitness function, and modifies the best fitted ones [36]. Evolved individuals of the new generation possess developed expressions of the genomes and evolving processes are repeated until a predefined best-fitting value is achieved or for a predefined number of generations [37].
Water 2021, 13, x FOR PEER REVIEW 5 of 21 genetic information. GEP follows simplistic genetic code of one-on-one relationship between symbols of the genes and the nodes of the ETs. ETs can be further divided into sub-ETs. Figure 2 shows the working principle of GEP in a brief flowchart. The model randomly generates chromosomes of initial population, expresses each individual chromosome, evaluates them based on a fitness function, and modifies the best fitted ones [36]. Evolved individuals of the new generation possess developed expressions of the genomes and evolving processes are repeated until a predefined best-fitting value is achieved or for a predefined number of generations [37].    were extracted from USGS GIRAS (Geographic Information Retrieval and Analysis System) LULC (Land Use Land Cover) files. Data gaps and overlaps were removed through manual adjustment and the final land use map was re-classified by land use groups. Deciduous forest (FRSD) is predominant in the basin (covering 84.1% of total area) while pasture (PAST) and barren (BARR) cover 11.2% and 1.9% of the area, respectively ( Figure  4).  SSURGO (Soil Survey Geographic Database) soil data of the NRCS (Natural Resources Conservation Service) for the upper Kentucky River Basin were obtained from the Soil Data Mart (http://soildatamart.nrcs.usda.gov (accessed on 23 June 2021)). SSURGO soil maps allow users to link with SSURGO soil database in ArcSWAT through the MUKEY attribute. The dominant soil group in the upper Kentucky River Basin is ultisols, also known as red clay soil, covering 90% of the basin's total area. Inceptisols and alfisols are found in scattered areas across the basin with a total area of less the 10% of the basin's area. Figure 5 suggests that the dominant soil hydraulic group in the upper Kentucky River Basin is Group B (45%; silt loam/loam, moderately well to well drained, moderate infiltration, and runoff rate) followed by Group C (25%; sandy clay loam, low infiltration rate) and Group A (23%; sand/loamy sand/sandy loam, well drained, low runoff, high infiltration rate). Water 2021, 13, x FOR PEER REVIEW 7 of 21 SSURGO (Soil Survey Geographic Database) soil data of the NRCS (Natural Resources Conservation Service) for the upper Kentucky River Basin were obtained from the Soil Data Mart (http://soildatamart.nrcs.usda.gov (accessed on 23 June 2021)). SSURGO soil maps allow users to link with SSURGO soil database in ArcSWAT through the MUKEY attribute. The dominant soil group in the upper Kentucky River Basin is ultisols, also known as red clay soil, covering 90% of the basin's total area. Inceptisols and alfisols are found in scattered areas across the basin with a total area of less the 10% of the basin's area. Figure 5 suggests that the dominant soil hydraulic group in the upper Kentucky River Basin is Group B (45%; silt loam/loam, moderately well to well drained, moderate infiltration, and runoff rate) followed by Group C (25%; sandy clay loam, low infiltration rate) and Group A (23%; sand/loamy sand/sandy loam, well drained, low runoff, high infiltration rate).

Meteorological Data
SWAT simulation requires meteorological data inputs including daily precipitation, maximum and minimum air temperature, solar radiation, humidity, and wind speed. Global Historical Climatology Network-Daily (GHCN-Daily) product from the National Climatic Data Center (NCDC) was used as the rainfall product. The product is based on observations by rain gauges within or near the basin at a daily time scale. Data were ob- Meteorological Data SWAT simulation requires meteorological data inputs including daily precipitation, maximum and minimum air temperature, solar radiation, humidity, and wind speed. Global Historical Climatology Network-Daily (GHCN-Daily) product from the National Climatic Data Center (NCDC) was used as the rainfall product. The product is based on observations by rain gauges within or near the basin at a daily time scale. Data were obtained from the Hydrologic Information System of the Consortium of Universities for the Advancement of Hydrologic Sciences (CUAHSI-HIS, cuahsi.org (accessed on 23 June 2021)). The CUAHSI HydroDesktop application allows users to query a specific type of data for a time-period for a specific area. Queried precipitation data for the two periods, 2002-2005 and 2008-2010, were downloaded at daily time step for 21 rain gauges. Temperature, humidity, and wind speed data were obtained from the National Climatic Data Center (NCDC) of the National Oceanic and Atmospheric Administration (NOAA) (http://gis.ncdc.noaa.gov/map/cdo/ (accessed on 23 June 2021)). These data were recorded at an hourly time-step for the study period and were averaged to daily values. Solar radiation data used in this study are satellite-based products that are estimated from NOAA Geostationary Operational Environmental Satellites (GOES) visible satellite images and were available to download at https://solaranywhere.com/ (accessed on 23 June 2021).

The ARX Model
The ARX model only requires daily discharge and precipitation data as input with precipitation as the exogenous variable. The ARX model uses the same precipitation time series data that have been used in the SWAT model. The selection of number of lagged days of precipitation and discharge used in the model affects the result and the optimum combination varies with the size, shape, and characteristics of the watershed [38]. The ARX models were explored with different combination of lagged days and the model with the best performance is the one with three lagged days of precipitation and discharge. The following equation was used to calculate the simulated discharge: where Q(t) is the simulated discharge; Q(t − 1), Q(t − 2), and Q(t − 3) are lagged discharges; P(t − 1), P(t − 2), and P(t − 3) are lagged precipitations; X1-X6 are coefficients obtained through regression; and C is the intercept.

The GEP Model
One of the main advantages of data-driven models is the simplicity of data and elimination of complexities of physically-based models [39]. The solution model is typically an easy-to-track linear or non-linear formula that relates the dependent and independent variables, and the user can control the complexity of the solution model. This study used the commercial software GeneXproTools 5.0 for GEP modelling in which appropriate dataset(s) are fed as input and the software can analyze the input dataset(s) and gives the best model(s) to fit the observed output. Discharge data alone or discharge data along with precipitation data can be used as input data for the models with the output being the one-day-ahead discharge. Selection of mathematical functions (e.g., Sqrt, Exp, Ln, Log, xˆt, Sine, Cosine, average, among many more) for the model can either be performed manually or the software can select the most appropriate functions. The best performing model is selected from numerous generated models and the expression tree (ET) of the selected model includes nine sub-ETs. The sub-ET 7 and the sub-ET 9 of the selected model ( Figure 6) are expressed by the algebraic formulas in Equations (7) and (8): Water 2021, 13, 2560 9 of 20 or the software can select the most appropriate functions. The best performing model is selected from numerous generated models and the expression tree (ET) of the selected model includes nine sub-ETs. The sub-ET 7 and the sub-ET 9 of the selected model ( Figure 6) are expressed by the algebraic formulas in Equations (7) and (8): where 0; where Here, A, B, C, D, E, F, G, and H are simplified forms to express mathematical formulas with less complexity. d0 is precipitation of the day in which discharge is to be predicted and d1 is discharge of the previous day of prediction day. c0, c1, c2, c3, c4, c5, c6, c7, c8, and c9 are numerical constants for each gene. The selected model contains total nine genes Here, A, B, C, D, E, F, G, and H are simplified forms to express mathematical formulas with less complexity. d0 is precipitation of the day in which discharge is to be predicted and d1 is discharge of the previous day of prediction day. c0, c1, c2, c3, c4, c5, c6, c7, c8, and c9 are numerical constants for each gene. The selected model contains total nine genes and the values of c0 to c9 varies for each gene. These sub-ETs are considered as phenotypes while the genotypes are as follows: Genotype for sub-ET 9: * Atan 3Rt * NOT d1 + − X2 NOT + 3Rt -Avg2 * c3 d0 c8 c3 c4 d1 d0 d1 Genotype for sub-ET 7: + Avg2 Atan d0 NOT * /+ c7 d0 X2 c8 * Avg2 Avg2 d0 d0 c9 d1 d1 (10)

Observed Discharge Data
Daily discharge data recorded at two stream gauges, USGS 03282000 at Heidelberg, Kentucky and USGS 03284000 near Winchester, Kentucky were used in this study. USGS 03282000 stream gauge measures discharge at the confluence of North Fork Kentucky River, Middle Fork Kentucky River, and the South Fork Kentucky River and represents the natural flow of the upper Kentucky River as it is located upstream of all the locks and dams. Meanwhile, USGS 03284000 stream gauge measures the outlet of the whole upper Kentucky River. Located downstream of four locks and dams (LD 11, 12, 13, and 14) on the main stem of upper Kentucky River, discharge at USGS 03284000 stream gauge is affected by the operation of the four LDs.
In general, discharge recorded at the downstream station, USGS 03284000, is higher than that measured at USGS 03282000 station because the river gains more water from downstream tributaries. Figure 7 shows time series of daily flows recorded at the two stations during the calibration and validation periods. On average, discharge at the downstream station is 26% and 23% higher for the calibration and validation periods, respectively. However, the flow at the downstream station is sometimes lower than that at the upstream stream. For instance, the peak flows during the rainfall events in January and May 2002 or in April, June, and September 2003 present are lower discharge at USGS 03284000 at the downstream station. That is mainly due to the flow regulation by the dams between the two stations. However, on average, measured discharge at USGS 03284000 gauge is about 30% lower than value at above measuring location in those periods.

Model Performance Evaluation
Three standard statistical measures were used in the model performance evaluation: correlation coefficient (R), Nash-Sutcliffe Efficiency (NSE), and Error in runoff volume (Bias). These measures are defined as follows: where ( ) and ( ) represents the observed and the simulated discharge (on day t), respectively; and represents the average of the observed and the simulated discharge, respectively; and N represents the total number of records.
The degree of linear correlation (between the simulated and the observed discharge time series) can be evaluated using the R 2 statistical measure (value of 1 indicates the perfect match between the simulated and the observed discharge [40]. The NSE indicates the ability of the model to reproduce observed discharge and holds values smaller or equal to 1 [41]. Larger NSE values imply high ability of in reproducing the observed data. E_peak and E_runoff measure the percentage differences between simulated and observed peak flow runoff volume, respectively. Table 1 represents the performance ratings of NSE, R 2 , and PBIAS:

Model Performance Evaluation
Three standard statistical measures were used in the model performance evaluation: correlation coefficient (R), Nash-Sutcliffe Efficiency (NSE), and Error in runoff volume (Bias). These measures are defined as follows: where Qo(t) and Qs(t) represents the observed and the simulated discharge (on day t), respectively; Qo and Qs represents the average of the observed and the simulated discharge, respectively; and N represents the total number of records. The degree of linear correlation (between the simulated and the observed discharge time series) can be evaluated using the R 2 statistical measure (value of 1 indicates the perfect match between the simulated and the observed discharge [40]. The NSE indicates the ability of the model to reproduce observed discharge and holds values smaller or equal to 1 [41]. Larger NSE values imply high ability of in reproducing the observed data. E_peak and E_runoff measure the percentage differences between simulated and observed peak flow runoff volume, respectively. Table 1 represents the performance ratings of NSE, R 2 , and PBIAS:

Model Calibration and Validation
For all three models, the natural flow of the upper Kentucky River for the period of 2002-2005 as observed by USGS 03282000 stream gauge above LD 14 was used for calibration. When necessary, manual calibration was performed by modifying values of model parameters individually to reach the best fit between simulated and observed discharge. For the SWAT model, the calibration involved adjusting the values of several parameters. SWAT outputs were more sensitive to CH_N2 (Manning's N value for stream channels), CN2 (Soil Conservation Service (SCS) curve numbers), SOL_K (soil hydraulic conductivity), and SOL_AWC (available water capacity of the soil). Other parameters with less impact on model outputs, such as ALPHA_BF (base flow alpha factor), ESCO (soil evaporation compensation factor), and GW_REVAP (groundwater "revap" coefficient) were also included in the calibration. In case of the ARX model, several combinations were tried out with different numbers of lagged days of precipitation and discharge to obtain the combination with the best agreement between the ARX-predicted and observed one-day-ahead discharge was identified. For the GEP model, the number of chromosomes and genes, the head size, and the linking functions were controlled manually with several iterations to obtain the best model combination.
For validation, the calibrated models were used to predict daily discharge at the calibration stream gauge and for the entire upper Kentucky River Basin with the outlet near USGS 03284000 stream gauge for the 2008-2010 period. Validation at the downstream station assesses the effect of regulation on the discharge predictability and how the calibrated model can simulate the basin response beyond the calibration area.

Calibration Results
The 4-year calibration period included numerous rainfall events with an average annual precipitation of 1360 mm (16% above average). Initially, the SWAT model was run using the default values of model parameters and the result from the un-calibrated model overestimated both the base-flows and the peaks of Kentucky River discharge. The model was then calibrated manually with the objective of obtaining the best values of the statistical measures described above. Table 2 presents the calibrated values of the major SWAT model parameters. The SWAT model was able to adequately reproduce the daily discharge (Figures 8 and 9) with good R 2 and NSE values of 0.72 and 0.71, respectively ( Table 3). The errors in peak discharge and runoff volume were 10.1% and 16.9%, respectively.
Data-driven models require no input parameters for calibration. In this study, only lagged-day discharge and precipitation data were used, and the best prediction statistical measures were obtained with three previous time steps of rainfall and one previous time step of streamflow observations as inputs for both models (ARX and GEP). For the ARX model, the R 2 , NSE, and bias for the calibration are 0.85, 0.85, and −0.02%, respectively ( Table 3). The same measures for the GEP model are 0.88, 0.88, and −0.17%, respectively (Table 3). These measures indicate that the GEP and ARX models clearly outperform the SWAT model with the GEP model producing the best calibration results.
SWAT model performance generally improved in the last 2 years of the calibration period because such a physically based model needs a spin-up time for the effects of errors in assumed initial conditions estimates to diminish. ARX and GEP, on the other hand, are data-driven models and do not require any warm-up period. Their performances in individual years appear to be random and no specific trend was observed.    Data-driven models require no input parameters for calibration. In this study, only lagged-day discharge and precipitation data were used, and the best prediction statistical measures were obtained with three previous time steps of rainfall and one previous time step of streamflow observations as inputs for both models (ARX and GEP). For the ARX model, the R 2 , NSE, and bias for the calibration are 0.85, 0.85, and −0.02%, respectively ( Table 3). The same measures for the GEP model are 0.88, 0.88, and −0.17%, respectively (Table 3). These measures indicate that the GEP and ARX models clearly outperform the SWAT model with the GEP model producing the best calibration results.

Validation Results
The three models were validated for the period 2008-2010, which had an average annual precipitation of about 10% lower than the calibration period. As mentioned earlier, validation was performed at LD14 where models were calibrated and at LD10, which is located approximately 88 km downstream. The comparison between observed and simulated daily discharge at LD14 and LD10 during validation period of 2008-2010 is represented in Figures 10 and 11. The statistical performance measures are summarized in Table 4.    Table 4 indicate that GEP model has the best performance at both locations with very good performance rating which is consistent with prior studies [14,36,45]. As in the calibration results, GEP model outperformed the other two models with R 2 values of 0.86 and 0.87 and NSE values of 0.86 and 0.87 for validation at LD10 and LD14, respectively. However, the ARX model and the GEP model provided similar performance at LD10. The performance of the ARX model is similar to the previous studies (good/very good performance rating) in one-day-ahead discharge prediction [46,47]. The SWAT model displayed moderate performance with NSE of about 0.65 and an R 2 value of about 0.64. This observation is in agreement with the previous studies which concluded performance rating of SWAT model was satisfactory or good [48][49][50]. It is very clear that both GEP and ARX perform better than SWAT during low-flow periods.  Table 4 indicate that GEP model has the best performance at both locations with very good performance rating which is consistent with prior studies [14,36,45]. As in the calibration results, GEP model outperformed the other two models with R 2 values of 0.86 and 0.87 and NSE values of 0.86 and 0.87 for validation at LD10 and LD14, respectively. However, the ARX model and the GEP model provided similar performance at LD10. The performance of the ARX model is similar to the previous studies (good/very good performance rating) in one-day-ahead discharge prediction [46,47]. The SWAT model displayed moderate performance with NSE of about 0.65 and an R 2 value of about 0.64. This observation is in agreement with the previous studies which concluded performance rating of SWAT model was satisfactory or good [48][49][50]. It is very clear that both GEP and ARX perform better than SWAT during low-flow periods. Water 2021, 13, x FOR PEER REVIEW 18 of 21  Statistical analysis of SWAT results indicates that the model performed better during the second and the third year of validation period. This once more suggests that the model performance significantly improved after a warm-up period. ARX and GEP do not have this problem. Moreover, ARX and GEP are much less affected by the length of the calibration period compared to SWAT. For example, using only 1 year in calibrating the models reduced by SWAT validation NSE values by more than 25% while NSE in the ARX model deteriorated by about 5% and GEP performance remained virtually unaffected. It is well known in hydrology that the spatio-temporal and spatial heterogeneity of precipitation and watershed characteristics and the myriad of processes involved in runoff and discharge generation make the rainfall-runoff relationships highly nonlinear, complex, and hard to accurately simulate. The SWAT model was run at daily time step in this study due to the resolution of precipitation data, which could have affected its performance. Moreover, the output of SWAT depends on the accuracy of the input data, which can never be guaranteed. Additionally, more vigorous calibration of SWAT model could have improved the accuracy of its predictions. Nonetheless, the SWAT results reported in this study are not inferior to those reported in similar studies.

Conclusions
Streamflow forecasting is of enormous importance for water budget planning and numerous operational water resources application. The study demonstrates that machine learning (ML) techniques can be useful in the prediction of hydrologic variables, such as streamflow, particularly when the underlying processes have complex nonlinear interrelationships. However, many ML techniques, unlike GEP, do not produce solutions that provide insight into the explicit relationship between input and output variable. These characteristics are illustrated using a simple, though very widely used, autoregressive  Statistical analysis of SWAT results indicates that the model performed better during the second and the third year of validation period. This once more suggests that the model performance significantly improved after a warm-up period. ARX and GEP do not have this problem. Moreover, ARX and GEP are much less affected by the length of the calibration period compared to SWAT. For example, using only 1 year in calibrating the models reduced by SWAT validation NSE values by more than 25% while NSE in the ARX model deteriorated by about 5% and GEP performance remained virtually unaffected. It is well known in hydrology that the spatio-temporal and spatial heterogeneity of precipitation and watershed characteristics and the myriad of processes involved in runoff and discharge generation make the rainfall-runoff relationships highly nonlinear, complex, and hard to accurately simulate. The SWAT model was run at daily time step in this study due to the resolution of precipitation data, which could have affected its performance. Moreover, the output of SWAT depends on the accuracy of the input data, which can never be guaranteed. Additionally, more vigorous calibration of SWAT model could have improved the accuracy of its predictions. Nonetheless, the SWAT results reported in this study are not inferior to those reported in similar studies.

Conclusions
Streamflow forecasting is of enormous importance for water budget planning and numerous operational water resources application. The study demonstrates that machine learning (ML) techniques can be useful in the prediction of hydrologic variables, such as streamflow, particularly when the underlying processes have complex nonlinear interrelationships. However, many ML techniques, unlike GEP, do not produce solutions that provide insight into the explicit relationship between input and output variable. These characteristics are illustrated using a simple, though very widely used, autoregressive Statistical analysis of SWAT results indicates that the model performed better during the second and the third year of validation period. This once more suggests that the model performance significantly improved after a warm-up period. ARX and GEP do not have this problem. Moreover, ARX and GEP are much less affected by the length of the calibration period compared to SWAT. For example, using only 1 year in calibrating the models reduced by SWAT validation NSE values by more than 25% while NSE in the ARX model deteriorated by about 5% and GEP performance remained virtually unaffected. It is well known in hydrology that the spatio-temporal and spatial heterogeneity of precipitation and watershed characteristics and the myriad of processes involved in runoff and discharge generation make the rainfall-runoff relationships highly nonlinear, complex, and hard to accurately simulate. The SWAT model was run at daily time step in this study due to the resolution of precipitation data, which could have affected its performance. Moreover, the output of SWAT depends on the accuracy of the input data, which can never be guaranteed. Additionally, more vigorous calibration of SWAT model could have improved the accuracy of its predictions. Nonetheless, the SWAT results reported in this study are not inferior to those reported in similar studies.

Conclusions
Streamflow forecasting is of enormous importance for water budget planning and numerous operational water resources application. The study demonstrates that machine learning (ML) techniques can be useful in the prediction of hydrologic variables, such as streamflow, particularly when the underlying processes have complex nonlinear interrelationships. However, many ML techniques, unlike GEP, do not produce solutions that provide insight into the explicit relationship between input and output variable. These char-acteristics are illustrated using a simple, though very widely used, autoregressive model and a classic hydrologic prediction problem. The Kentucky River case study demonstrates the superiority of GEP and its ability to provide fast, relatively accurate, and computationally inexpensive estimation of the underlying physical/functional processes that are employable beyond forecast applications. This study used SWAT, ARX, and GEP models to simulate daily river discharge of the Upper Kentucky River to examine the capability of the three models to reproduce river discharge using four evaluation metrics.
The two data-driven models outperformed the physically based SWAT model, which produced less accurate but still acceptable results. The performance metrics indicate that the GEP model provided the best performance with the ARX model being not far off. The SWAT model overestimated runoff and underestimated peak discharge while the ARX and GEP models underestimated the peak discharge. Runoff is mostly underestimated by GEP and ARX except for a few exceptions. The percentage of error in predicting peak discharge and runoff was higher for the hydrologic model than data-driven models most of the time. Research results also suggest that a warm-up period (1 or 2 years) is necessary for SWAT to achieve improved and reliable performance. Performance of data-driven models, however, exhibits no specific trend with time and largely dependent on selection of modelling datasets and adaptation of methods. Overall, statistical measures for model calibration and validation suggest that the data-driven models, specially the GEP model, provided outstanding performance to reproduce daily discharge in alternative locations of the Upper Kentucky River under changing climatic variables.
A major advantage of GEP, unlike many other data-based models, is that it can provide an analytical form of the relationship between input and output variables. This will allow modelers to gain insight into this relationship and perform modifications if necessary. This study used the GEP model in its simplest form. Modelers also have the flexibility to improve the prediction capability of the GEP model by judiciously including additional input variables that can affect the basin response, e.g., temperature. GEP models can be further improved by preprocessing the input data, e.g., by applying wavelet-de-noising or modeling low-and high-flow periods separately [51]. The flexibility of GEP suggests immense potential applications in hydrometeorology such as estimation of precipitation, evapotranspiration, and soil moisture based on direct and indirect measurements. In general, GEP will allow progressive model identification when new data become available by adapting the functional relationship for the former and modifying the parameters of the latter in addition to the recursive model updating that allows to track temporal changes. Lastly, unlike the SWAT model, GEP and ARX model do not require a spin-up period as they are less sensitive to the extent of the calibration time series.