A Hybrid Approach Combining Conceptual Hydrological Models, Support Vector Machines and Remote Sensing Data for Rainfall-Runo ﬀ Modeling

: Understanding catchment response to rainfall events is important for accurate runo ﬀ estimation in many water-related applications, including water resources management. This study introduced a hybrid model, the Tank-least squared support vector machine (LSSVM), that incorporated intermediate state variables from a conceptual tank model within the least squared support vector machine (LSSVM) framework in order to describe aspects of the rainfall-runo ﬀ (RR) process. The e ﬃ cacy of the Tank-LSSVM model was demonstrated with hydro-meteorological data measured in the Yongdam Catchment between 2007 and 2016, South Korea. We ﬁrst explored the role of satellite soil moisture (SM) data (i.e., European Space Agency (ESA) CCI) in the rainfall-runo ﬀ modeling. The results indicated that the SM states inferred from the ESA CCISWI provided an e ﬀ ective means of describing the temporal dynamics of SM. Further, the Tank-LSSVM model’s ability to simulate daily runo ﬀ was assessed by using goodness of ﬁt measures (i.e., root mean square error, Nash Sutcli ﬀ e coe ﬃ cient (NSE), and coe ﬃ cient of determination). The Tank-LSSVM models’ NSE were all classiﬁed as “very good” based on their performance during the training and testing periods. Compared to individual LSSVM and Tank models, improved daily runo ﬀ simulations were seen in the proposed Tank-LSSVM model. In particular, low ﬂow simulations demonstrated the improvement of the Tank-LSSVM model compared to the conventional tank model. ﬀ is detected during low ﬂow periods. This result suggests that the tank model alone may be limited in its ability to describe low ﬂow dynamics adequately. A more rigorous approach to low ﬂow simulation is to use a hybrid model that incorporates the intermediate variables obtained from the tank model into the LSSVM-based RR modeling framework. runo ﬀ data. The ﬁgure indicates that rapid low ﬂow ﬂuctuations are better simulated by the Tank-LSSVM model.


Introduction
A hydrologic model is a simplified representation of a complex system that facilitates the understanding of the hydrologic cycle in a simplified manner. The rainfall-runoff (RR) model plays a central role in describing different aspects of water resources management. These include streamflow record extension for designing hydraulic structures [1], operational applications to hydraulic structures [2][3][4], streamflow prediction for ungauged catchments [5], and assessments of land use and climate change impacts [6,7]. Over the last three decades, the hydrology field has focused on the development and evaluation of runoff prediction models. RR models have undergone considerable efforts and improvements to understand the impacts of rainfall events on catchment response, but whether these improvements meet scientific and practical demands is still in question [8]. Spatio-temporal variability is substantially dependent on spatial aspects. That is, the runoff response to rainfall is intricately linked to influencing factors such as climate conditions; land cover; soil properties; topography; and human activities, including irrigation and urbanization [6,9,10].

1.
How much do the intermediate variables obtained from the RR model correlate with in situ SM or remotely sensed SM products? 2.
Can the intermediate variables be effective in RR modeling within a machine learning-based regression framework, particularly for low flow simulations in a hybrid model?
This study investigates a hybrid RR model that uses intermediate state variables obtained from a conceptual model (i.e., the tank model) within an LSSVM-based regression framework. The fundamental hypothesis behind the proposed hybrid approach is that the combined use of different models may capture complex features of the RR process, thus providing a better representation than individual models. First, we built the tank model and explored its performance in terms of simulating daily runoff discharge in the Yongdam Catchment, South Korea. We assumed that the water depths at each tank were the intermediate state variables to be considered as a proxy measure of SM. We then explored the water depths at each tank and their role in the LSSVM-based RR model. In addition, we examined the potential use of remotely sensed SM products and combined them with the state variables.
The hydrologic and satellite data used in this study are illustrated in the following section. The theoretical backgrounds for the RR model and the machine learning model are presented in Section 3. A detailed discussion of the proposed modeling scheme is provided in Section 4, and this paper ends with a summary of the findings and future work in Section 5.

Study Area and In Situ Observations
The Yongdam Catchment has a drainage area of 930 km 2 . The primary land use consists of forest, paddy, and mountainous areas, covering 71.1%, 9.4%, and 9.9% of the catchment area, respectively. An annual rainfall of 1299 mm and runoff depth of 680 mm was measured in the area between 2007 and 2016. The Yongdam Multipurpose Dam, operated by Korea Water Resources Cooperation (K-water), was built in 2000 and used for water supply (1,050,000 m 3 /d), hydroelectric power (24,400 kW), and flood control [44]. Hydrologic data such as runoff, rainfall, and SM records are measured by K-water (http://english.kwater.or.kr). There are six hydrologic stations where precipitation has been measured since 2001. Here, the Yongdam dam inflow records are regarded as the unregulated flow (or natural flow). SM has been recorded since 2014 using a time-domain reflectometer (TDR; [45]). The study area and the location of stations used in this research are presented in Figure 1. Here, the areal rainfall is defined by Thiessen Polygons, and climate data (i.e., air temperature, relative humidity, wind speed, and hours of sunshine) used to estimate reference evapotranspiration were obtained from the Jangsu weather station operated by the Korea Meteorological Administration (KMA; https://web.kma.go.kr/eng/).

Satellite SM Measurements
The European Space Agency (ESA) released satellite-based SM products on a global scale by combining active and passive microwave sensors [33]. The SM products from the individual sensors were blended by spatiotemporal resampling and rescaling (i.e., the cumulative distribution function matching) techniques [46]. The ESA provides SM datasets from 1978 to 2016 to the public domain with different units (i.e., volumetric units (m 3 /m 3 ) for the passive and combined product, and percent of saturation units (%) for the active dataset). The active measurements were obtained by merging Active Microwave Instruments Windscat (AMI-WS) and ASCAT. Passive products were acquired by combining several remote sensing data sources, including SMMR, SSM/I, TRMM Microwave Imager (TMI), AMSR-E, WindSat, AMSR2 (successor to AMSR-E) and SMOS data. Additional details on the methodology and comparison to the ESA CCI SM can be found in the literature [33,46,47]. The ESA CCI SM products (v04.2) are used in this study and were obtained from their website (https://www.esa-soilmoisture-cci.org/; accessed and downloaded on 17th September 2018). The active-passive combined SM products with a 0.25° spatial resolution were chosen for this study due to their higher temporal resolution and better accuracy compared to both the active and passive retrievals [48]. It should be noted that both the satellite soil moisture retrievals and in situ soil moisture were averaged over the entire watershed for a representation. To be specific, the satellite pixel values whose centroids are located in the study site are extracted then averaged. Similarly, daily in situ soil moisture data are first collected then averaged to reproduce a catchment mean soil moisture time series.

The Tank Model
The tank model is classified as a deterministic, conceptual, and continuous model. Similar to other conceptual models consisting of a series of interconnected subsystems (i.e., storages), the tank

Satellite SM Measurements
The European Space Agency (ESA) released satellite-based SM products on a global scale by combining active and passive microwave sensors [33]. The SM products from the individual sensors were blended by spatiotemporal resampling and rescaling (i.e., the cumulative distribution function matching) techniques [46]. The ESA provides SM datasets from 1978 to 2016 to the public domain with different units (i.e., volumetric units (m 3 /m 3 ) for the passive and combined product, and percent of saturation units (%) for the active dataset). The active measurements were obtained by merging Active Microwave Instruments Windscat (AMI-WS) and ASCAT. Passive products were acquired by combining several remote sensing data sources, including SMMR, SSM/I, TRMM Microwave Imager (TMI), AMSR-E, WindSat, AMSR2 (successor to AMSR-E) and SMOS data. Additional details on the methodology and comparison to the ESA CCI SM can be found in the literature [33,46,47]. The ESA CCI SM products (v04.2) are used in this study and were obtained from their website (https: //www.esa-soilmoisture-cci.org/; accessed and downloaded on 17th September 2018). The active-passive combined SM products with a 0.25 • spatial resolution were chosen for this study due to their higher temporal resolution and better accuracy compared to both the active and passive retrievals [48]. It should be noted that both the satellite soil moisture retrievals and in situ soil moisture were averaged over the entire watershed for a representation. To be specific, the satellite pixel values whose centroids are located in the study site are extracted then averaged. Similarly, daily in situ soil moisture data are first collected then averaged to reproduce a catchment mean soil moisture time series.

The Tank Model
The tank model is classified as a deterministic, conceptual, and continuous model. Similar to other conceptual models consisting of a series of interconnected subsystems (i.e., storages), the tank model is Remote Sens. 2020, 12, 1801 5 of 21 composed of three vertically interconnected tanks (e.g., 3-tank), as illustrated in Figure 2. The structure of the tank model, such as the number of tanks and their associated side outlets, can be formulated in terms of physical catchment attributes and climatological conditions. For instance, a model with two tanks (e.g., 2-tank) was used to evaluate the RR relationship for a paddy field [23], while [14] used a 4-tank model to represent the deep percolation process in a forested region.
The structure of the tank model, such as the number of tanks and their associated side outlets, can be formulated in terms of physical catchment attributes and climatological conditions. For instance, a model with two tanks (e.g., 2-tank) was used to evaluate the RR relationship for a paddy field [23], while [14] used a 4-tank model to represent the deep percolation process in a forested region. The 1 st tank's level is determined by rainfall, while the other two tanks' depths are governed by infiltration and percolation. The tanks are depleted through evapotranspiration and runoff, and the amount of water left in each tank represents catchment storage. A continuous daily rainfall is used as input for the 1 st tank; evapotranspiration is assumed to occur in all three tanks. If there is insufficient water in the 1 st tank, the storage level in the 2 nd tank is reduced to fulfill the demand for evapotranspiration in the 1 st tank. Similarly, a lack of water in both of the top two tanks affects storage in the 3 rd tank. The side outlets in each tank simulate runoff in different layers, including surface runoff, intermediate runoff, and baseflow. The total runoff is calculated by a summation of the runoff accumulated from the side outlets as follows [13]: where is total runoff and refers to the runoff of j th side outlet at the i th tank (mm) with the associated runoff coefficient . represents the storage of the i th tank (mm). is the height of side outlet for the j th side outlet in the i th tank (mm). More details on the model equation can be found in Appendix A.
To accurately estimate runoff with the tank model, reliable daily rainfall and evapotranspiration datasets over a given catchment area are required for the input data. In our study, we used daily rainfall sequences from six stations and the areal mean rainfall over the catchment area calculated by the Thiessen polygon method. The standard FAO-56 Penman-Monteith method was used to estimate the reference evapotranspiration [49]. The potential evapotranspiration estimates were adjusted via the model's calibration process and considering the water balance over the catchment between the input (rainfall) and output (runoff). The 1st tank's level is determined by rainfall, while the other two tanks' depths are governed by infiltration and percolation. The tanks are depleted through evapotranspiration and runoff, and the amount of water left in each tank represents catchment storage. A continuous daily rainfall is used as input for the 1st tank; evapotranspiration is assumed to occur in all three tanks. If there is insufficient water in the 1st tank, the storage level in the 2nd tank is reduced to fulfill the demand for evapotranspiration in the 1st tank. Similarly, a lack of water in both of the top two tanks affects storage in the 3rd tank. The side outlets in each tank simulate runoff in different layers, including surface runoff, intermediate runoff, and baseflow. The total runoff is calculated by a summation of the runoff accumulated from the side outlets as follows [13]: where Q is total runoff and q ij refers to the runoff of j th side outlet at the i th tank (mm) with the associated runoff coefficient a ij . ST i represents the storage of the i th tank (mm). H ij is the height of side outlet for the j th side outlet in the i th tank (mm). More details on the model equation can be found in Appendix A.
To accurately estimate runoff with the tank model, reliable daily rainfall and evapotranspiration datasets over a given catchment area are required for the input data. In our study, we used daily rainfall sequences from six stations and the areal mean rainfall over the catchment area calculated by the Thiessen polygon method. The standard FAO-56 Penman-Monteith method was used to estimate the reference evapotranspiration [49]. The potential evapotranspiration estimates were adjusted via the model's calibration process and considering the water balance over the catchment between the input (rainfall) and output (runoff).

The Least Square Support Vector Machine (LSSVM) Model
The SVM developed by [50] has been widely used for classification and regression tasks in many different fields, including RR predictions [31,51]. The LSSVM is a modified version of the SVM. More specifically, the LSSVM differs from the conventional SVM in that the LSSVM approach uses a set of linear equations for solving optimization problems instead of the quadratic form used in the conventional SVM [9,52]. The LSSVM is presented schematically in Figure 3. The LSSVM adopts a least squares linear system as the loss function with two-layer networks. Below is a brief description of the LSSVM; a more detailed explanation can be found in the literature [53].
where φ(•) is the feature map embedding the input data into a higher-dimensional feature space, is a weight vector, and is a bias term. For the LSSVM regression, the optimization problem is given as follows: Minimize: where is the regulation parameter and is the error variable. Eq. (3) can be rewritten by introducing Lagrange multipliers as follows: where is the Lagrangian multiplier and ( , ) is the kernel function.
σ is the width of the radial basis function. The LSSVM's performance is highly dependent on the kernel function. Among the many kernel functions, such as linear, polynomial, sigmoidal, and radial basis functions (RBFs), an RBF was used for its practical aspects. The RBF is more flexible than the others and has fewer parameters to estimate Let x k , y k N k=1 be an N length set of datasets with the input (x ∈ R N ) and output (y ∈ R), where R N denotes the N-dimensional input/output vector space. The LSSVM model can be formulated as follows: where ϕ(·) is the feature map embedding the input data into a higher-dimensional feature space, w is a weight vector, and b is a bias term. For the LSSVM regression, the optimization problem is given as follows: Minimize : where γ is the regulation parameter and e 2 k is the error variable. Equation (3) can be rewritten by introducing Lagrange multipliers as follows: Remote Sens. 2020, 12, 1801 where a k is the Lagrangian multiplier and K(x k , x l ) is the kernel function.
σ is the width of the radial basis function. The LSSVM's performance is highly dependent on the kernel function. Among the many kernel functions, such as linear, polynomial, sigmoidal, and radial basis functions (RBFs), an RBF was used for its practical aspects. The RBF is more flexible than the others and has fewer parameters to estimate [30]. The LSSVM with a Coupled Simulated Annealing (CSA) optimization algorithm [54] determines an initial set of suitable parameters based on five multiple starters. These parameters are used in the second optimization procedure to further fine-tune the parameters. A simplex optimization approach [55] is then employed here for tuning the parameters through a cross-validation procedure by partitioning samples into training data and test data. To compare the tank model under the same conditions, hydrologic data used for the calibration period (2007-2013) were also used for the LSSVM model's training process. The remaining verification period data (2014-2016) were used for the testing phase.

The Tank-LSSVM Hybrid Model
Our hybrid RR model, hereinafter called the Tank-LSSVM model, was constructed using the output (i.e., ST 1 , ST 2 , ST 3 ) obtained from the tank model in an LSSVM-based regression framework. In this study, the tank model's storage variables ST 1 , ST 2 , and ST 3 were considered intermediate state variables representing the SM temporal variation over the catchment area. A schematic representation of the proposed hybrid model, together with specific implementation procedures, is presented in Figure 4. Input variable (or predictor) selection is important in a machine learning-based approach [26]. In this study, predictors considered for the Tank-LSSVM model were rainfall, tank storage, and ESA CCI SM data. Satellite SM products are of particular interest for ungauged watersheds. We conducted an experimental study to explore the use of satellite SM products within the proposed model structure. We used an iterative approach to obtain the optimal combination of independent variables in a time-lagged stepwise regression manner by minimizing the difference between the simulated and observed flow. The partial autocorrelation function (PACF) was utilized to explore the relative importance of the time-lagged variables in the hybrid prediction model. Remote Sens. 2020, 12, x FOR PEER REVIEW 7 of 22 [30]. The LSSVM with a Coupled Simulated Annealing (CSA) optimization algorithm [54] determines an initial set of suitable parameters based on five multiple starters. These parameters are used in the second optimization procedure to further fine-tune the parameters. A simplex optimization approach [55] is then employed here for tuning the parameters through a cross-validation procedure by partitioning samples into training data and test data. To compare the tank model under the same conditions, hydrologic data used for the calibration period (2007-2013) were also used for the LSSVM model's training process. The remaining verification period data (2014-2016) were used for the testing phase.

The Tank-LSSVM Hybrid Model
Our hybrid RR model, hereinafter called the Tank-LSSVM model, was constructed using the output (i.e., , , ) obtained from the tank model in an LSSVM-based regression framework. In this study, the tank model's storage variables , , and were considered intermediate state variables representing the SM temporal variation over the catchment area. A schematic representation of the proposed hybrid model, together with specific implementation procedures, is presented in Figure 4. Input variable (or predictor) selection is important in a machine learning-based approach [26]. In this study, predictors considered for the Tank-LSSVM model were rainfall, tank storage, and ESA CCI SM data. Satellite SM products are of particular interest for ungauged watersheds. We conducted an experimental study to explore the use of satellite SM products within the proposed model structure. We used an iterative approach to obtain the optimal combination of independent variables in a time-lagged stepwise regression manner by minimizing the difference between the simulated and observed flow. The partial autocorrelation function (PACF) was utilized to explore the relative importance of the time-lagged variables in the hybrid prediction model. Prior to the construction of the LSSVM-based RR model, the exploratory (independent) variables considered in this study were all normalized. This is a common approach in data-driven models. It reduces the problems caused by relatively high values [56]. In this context, normalization is done by subtracting the mean ( ̅ ) from each variable (x) prior to dividing by the standard deviation (S). This is called a Z-score (Z).

Root-Zone ESA CCI SM Products
One fundamental issue in using remote sensing SM products is that SM is measured a few centimeters below the surface. This makes the data difficult to use in hydrological applications [57]. Prior to the construction of the LSSVM-based RR model, the exploratory (independent) variables considered in this study were all normalized. This is a common approach in data-driven models. It reduces the problems caused by relatively high values [56]. In this context, normalization is done by Remote Sens. 2020, 12, 1801 8 of 21 subtracting the mean (x) from each variable (x) prior to dividing by the standard deviation (S). This is called a Z-score (Z).

Root-Zone ESA CCI SM Products
One fundamental issue in using remote sensing SM products is that SM is measured a few centimeters below the surface. This makes the data difficult to use in hydrological applications [57]. Root-zone SM contents are more representative when simulating a catchment response to rainfall [32,35]. Given the mismatch between the satellite SM measurements and the tank model's configuration for soil layers, the exponential filter method, also known as the soil water index (SWI), was used to derive the root-zone SM from the original ESA CCI SM data. This approach is a common preprocessing step for satellite-derived SM data used as an input to hydrological models [34,35,58,59]. In this study, a recursive formulation of the exponential filtering method introduced by [60] was adopted and expressed as: where SWI n and SSM (t n ) are the soil water index and the ESA CCI SM at t n , respectively. The gain parameter K ranging from 0 to 1 can be obtained from the following relationship: The filtering was initialized by applying SWI 0 = SSM (t 0 ) and K 0 = 1 in Equatuon (6). The optimal characteristic time length T was obtained by maximizing the correlation coefficient (r) between the SWI and simulated SM (i.e., tank storage) from the tank model. The parameter T substantially adjusted the SM temporal variation. More specifically, this approach smoothed the original ESA CCI SM surface series and could be considered an SM proxy for a deeper layer. For further details, the reader is referred to [60,61].

Performance Scores
The proposed RR model's efficiency was evaluated using three goodness-of-fit (GOF) measures-the Nash-Sutcliffe coefficient, the coefficient of determination, and the root mean square error. These are commonly used in hydrologic and hydroclimatic models [62]. A more detailed description of the GOF measures is summarized in Table 1. In this study, we adopted RMSE_Q70 to quantify the model performance, particularly for low flows within the range of 70-100% time exceedance, following [7]. We also considered the descriptive performance criteria, using the NSE as proposed by [63] to determine the degree of accuracy in simulating the daily runoff. The performance criteria were very good: NSE ≥ 0.7; good: 0.5≤ NSE <0.7; satisfactory: 0.3≤ NSE < 0.5; unsatisfactory: NSE< 0.3. Table 1. Performance metrics employed in this study. O and O indicate the observed runoff and observed runoff mean, respectively. E is the simulated runoff and E is the simulated runoff mean. n is the number of observations.

Rainfall-Runoff Using the Tank Model
The tank model is a conceptual RR model that requires a small amount of data and relatively few model parameters. Through a model calibration process, ten parameters were adjusted to approximate the observed streamflow, and the validation process was carried using the calibrated parameters. The purpose of the validation process was to ensure that the model could be used with new data and successfully reproduce daily streamflow observations over different periods of time. In this context, both the calibration and validation were performed under different climate regimes to evaluate the model performance better. In this study, daily rainfall and streamflow data from 2007 to 2013 were used for calibration because this date range covers average, wet, and dry climate conditions. Data from 2014 to 2016 were used for the model validation ( Figure 5). The optimal parameter set for the calibration period was derived from a standard gradient-based automatic optimization scheme [64], the MATLAB function 'fmincon', which is a built-in function in MATLAB. More details on the optimization procedure adopted in this study can be found at https://www.mathworks.com/help/optim/ug/fmincon.html. The model parameters and their values, determined via the optimization scheme, are summarized in Table 2. Figure 6 shows the daily simulated and observed runoff (Q) together with rainfall recorded during the 2007-2016 investigation period. The results suggest that the tank model's performance for both the calibration and validation periods can be regarded as "very good" in terms of the NSE. The results show that the runoff process is reproduced effectively by the tank model, with an R 2 = 0.92 (0.81) and RMSE = 20.18 m 3 /s (14.72 m 3 /s) for the calibration and validation phases, respectively. However, the tank model does not fully capture the RR process's complex behavior in the validation phase compared to the calibration period (i.e., 0.92 for the calibration period and 0.81 for the validation phase). A notable difference between the observed and simulated runoff is detected during low flow periods. This result suggests that the tank model alone may be limited in its ability to describe low flow dynamics adequately. A more rigorous approach to low flow simulation is to use a hybrid model that incorporates the intermediate variables obtained from the tank model into the LSSVM-based RR modeling framework. The optimal parameter set for the calibration period was derived from a standard gradient-based automatic optimization scheme [64], the MATLAB function 'fmincon', which is a built-in function in MATLAB. More details on the optimization procedure adopted in this study can be found at https://www.mathworks.com/help/optim/ug/fmincon.html. The model parameters and their values, determined via the optimization scheme, are summarized in Table 2. Figure 6 shows the daily simulated and observed runoff (Q) together with rainfall recorded during the 2007-2016 investigation period. The results suggest that the tank model's performance for both the calibration and validation periods can be regarded as "very good" in terms of the NSE. The results show that the runoff process is reproduced effectively by the tank model, with an R 2 = 0.92 (0.81) and RMSE = 20.18 m 3 /s (14.72 m 3 /s) for the calibration and validation phases, respectively. However, the tank model does not fully capture the RR process's complex behavior in the validation phase compared to the calibration period (i.e., 0.92 for the calibration period and 0.81 for the validation phase). A notable difference between the observed and simulated runoff is detected during low flow periods. This result suggests that the tank model alone may be limited in its ability to describe low flow dynamics adequately. A more rigorous approach to low flow simulation is to use a hybrid model that incorporates the intermediate variables obtained from the tank model into the LSSVM-based RR modeling framework.  [64], the MATLAB function 'fmincon', which is a built-in function in MATLAB. More details on the optimization procedure adopted in this study can be found at https://www.mathworks.com/help/optim/ug/fmincon.html. The model parameters and their values, determined via the optimization scheme, are summarized in Table 2. Figure 6 shows the daily simulated and observed runoff (Q) together with rainfall recorded during the 2007-2016 investigation period. The results suggest that the tank model's performance for both the calibration and validation periods can be regarded as "very good" in terms of the NSE. The results show that the runoff process is reproduced effectively by the tank model, with an R 2 = 0.92 (0.81) and RMSE = 20.18 m 3 /s (14.72 m 3 /s) for the calibration and validation phases, respectively. However, the tank model does not fully capture the RR process's complex behavior in the validation phase compared to the calibration period (i.e., 0.92 for the calibration period and 0.81 for the validation phase). A notable difference between the observed and simulated runoff is detected during low flow periods. This result suggests that the tank model alone may be limited in its ability to describe low flow dynamics adequately. A more rigorous approach to low flow simulation is to use a hybrid model that incorporates the intermediate variables obtained from the tank model into the LSSVM-based RR modeling framework.  First, we explored the correlation between tank storage and in situ SM to determine if the tank storage was an effective proxy for SM and if it represented the SM temporal variability over the catchment area. Six cases were studied to understand how the individual tanks (i.e., Cases 1-3) and their combinations (i.e., Cases 4-6) correlated with the in situ SM (Figure 7). The in situ SM observations were limited to relatively small periods between 2014 and 2016. We analyzed the lagged windowed cross-correlation to quantify the temporal coherence between the tank storage and in situ SM (Figure 8) and found a statistically significant correlation at the 95% confidence level. A confidence interval is obtained by the Gaussian distribution N(0,1/N), whose standard deviation is 1/ √ N. For a 95% confidence level, the confidence interval can be defined as CI = 0 ± 1.96/ √ N. The strongest temporal coherence for the zero-lag correlation between the individual tank storage and in situ SM was observed in Case 2 (the 2nd tank), where the correlation coefficient (r) was 0.77. The lowest r value (0.43) was found for Case 3, suggesting that the intermediate variables from each tank describe the SM temporal dynamics. This is because the upper layer of the soil is subjected to more rapid drying and rewetting, and soil moisture variations in this layer are more prominent compared to that of the lower layer. The tank combinations (i.e., Cases 4-6) demonstrated slightly higher or lower r values compared to Case 2. Case 2 had the highest time lag r value (0.78) with lag-1, while the strongest correlation for Case 3 was observed with lag-11. This may be due to the SM's slow response behavior in Case 3, since it represents the deepest soil layer in the tank model. Overall, the results obtained from the cross-correlation analysis confirm that the tank storage temporal dynamics are closely related to SM variations. This suggests that the tank model's intermediate variables can be used as a proxy for the SM. values compared to Case 2. Case 2 had the highest time lag r value (0.78) with lag-1, while the strongest correlation for Case 3 was observed with lag-11. This may be due to the SM's slow response behavior in Case 3, since it represents the deepest soil layer in the tank model. Overall, the results obtained from the cross-correlation analysis confirm that the tank storage temporal dynamics are closely related to SM variations. This suggests that the tank model's intermediate variables can be used as a proxy for the SM.

Determination of Model Inputs
A machine learning-based hydrological model's performance is dependent on the lagged input vectors chosen for the model training [26,56]. Apart from rainfall and satellite SM, the tank storage derived from the tank model in this study is considered a proxy for the SM content. Additionally, the time-lagged relationships between the hydrologic variables enable the consideration of a sequential hydrological process in the proposed framework. Even though the input variables and time lags are important, their selection procedures are rather ad hoc when using the machine learning-based regression approach [26]. There is no universal or generalized model that selects the optimal timelagged input vector directly; this is why an empirical approach is favorable. In this study, we used the partial autocorrelation function (PACF) to understand the smallest lag time for a parsimonious model. The PACF function analysis plots for each input variable are presented in Figure 9. Based on these results, the lag extent for each variable was determined and the input vectors were considered predictors for the proposed hybrid model. For example, a statistically significant PACF in ST1 was observed by lag-2, and the PACF falls outside of the 95% confidence interval at lag-3. Here, the ST1 lagged vectors ranging from lag-0 to lag-2 were considered primary. Other variables were similar.

Determination of Model Inputs
A machine learning-based hydrological model's performance is dependent on the lagged input vectors chosen for the model training [26,56]. Apart from rainfall and satellite SM, the tank storage derived from the tank model in this study is considered a proxy for the SM content. Additionally, the time-lagged relationships between the hydrologic variables enable the consideration of a sequential hydrological process in the proposed framework. Even though the input variables and time lags are important, their selection procedures are rather ad hoc when using the machine learning-based regression approach [26]. There is no universal or generalized model that selects the optimal time-lagged input vector directly; this is why an empirical approach is favorable. In this study, we used the partial autocorrelation function (PACF) to understand the smallest lag time for a parsimonious model. The PACF function analysis plots for each input variable are presented in Figure 9. Based on these results, the lag extent for each variable was determined and the input vectors were considered predictors for the proposed hybrid model. For example, a statistically significant PACF in ST1 was observed by lag-2, and the PACF falls outside of the 95% confidence interval at lag-3. Here, the ST1 lagged vectors ranging from lag-0 to lag-2 were considered primary. Other variables were similar. Figure 9. The partial autocorrelation function of the input variables (a-f) and their time lags considered in this study (f). STn, RF, and ESA CCI SWI represent the n-th tank storage in the tank model, rainfall, and root-zone European Space Agency (ESA) CCI SM data, respectively. The solid blue line represents partial autocorrelation along with the 95% confidence interval for a white noise process. Values outside of the confidence interval are statistically significant.

LSSVM Model
The LSSVM-based RR model was constructed using several lagged independent variable values, rainfall (P t-n ) and ESA CCI SWI (θ t−n ), as inputs without a set of intermediate variables from the tank model. The input vectors were partitioned into two subsets, a training phase and a testing phase, with the same period as the tank model to facilitate comparisons with the tank model results under the same conditions. Both the dependent and independent variables were normalized before applying the LSSVM model.
We built the first LSSVM-based runoff model using a single rainfall data predictor. The ESA CCI SWI data were added as an additional input to explore the satellite-based SM contribution to the LSSVM-based RR model. Among many hydrological variables, rainfall and satellite-based SM inputs were selected specifically because the two variables are both closely related to the runoff process and also closely related. This allowed us to consider the interdependence in the hydrological cycle indirectly. Table 3 presents the results of the runoff simulation with different combinations of input variables. The relative impact of these combinations on the model performance was investigated in a stepwise manner by repeatedly adding more lagged values until the improvement stopped. To avoid overfitting, we mainly focused on performance improvement for unseen data during the testing phase. An obvious increase in the NSE performance efficiency was seen (0.40 for the SV1 and 0.68 for the SV5) until lag-4, after which no further performance improvement was seen. Similar results were obtained for other performance measures, such as an increase in R 2 from 0.49 to 0.75 and a decrease in RMSE from 26.02 to 18.91 m 3 /s. Therefore, rainfall data from lag-0 to lag-4 were used in the subsequent analysis. As summarized in Table 3, the SV6-SV9 models introduced the lagged ESA CCI SWI data as an input to the LSSVM modeling framework and showed better, more comparable performance than the SV1-SV5 models. The only exception to this was RMSE Q70 during the testing period, which indicated that the SM states inferred from the ESA CCI SWI provided an effective means of describing the SM's temporal dynamics in the LSSVM-based RR model. A slight increase in the NSE values, ranging from 0.69 to 0.73, was identified for the SV6-SV9 models. Similar improvements in the performance measures R 2 , RMSE, and RMSE Q70 were also observed. Based on the performance criteria, the SV8 and SV9 models demonstrate "very good" performance (NSE > 0. 7). No improvement in RMSE Q70 was seen when the ESA CCI SWI data was included, suggesting that the direct use of remote sensing SM products played a limited role in the simulation of low flow. The simulated runoff was compared to the observed runoff over the course of the testing period and depicted as scatter plots with the corresponding R 2 (Figure 10), where enhanced results are seen for the SV6-SV9 models.

Tank-LSSVM Model
The Tank-LSSVM model bridges the intermediate state variables from the tank model and the machine learning framework (LSSVM). Several lagged values of tank storage (ST t−n ) derived from the tank model were utilized together with the rainfall and ESA CCI SWI data in the LSSVM model. The RR simulation results from different input variable combinations are summarized in Table 4. The HY1-HY4 models were carried out without time lag consideration, whereas the HY5-HY9 models included time-lagged input vectors. Contrary to the LSSVM model results, the use of time-lagged input variables showed little or no improvement in the Tank-LSSVM model's performance when the intermediate SM state variables from the tank model were used (models HY5-HY9). This is partially due to the fact that the SM state variables derived from the well-tuned RR model themselves could represent the behavior of catchment-scale SM in this study without time-lagged values. The HY1 model only considered the 1st tank storage and rainfall as input variables, while the 2nd and 3rd tanks were included in the HY2 and HY3 models, respectively. The HY4 model combined the ESA CCI SWI datasets with the HY3 model to assess the satellite SM contribution.  Table 4. The HY1-HY4 models were carried out without time lag consideration, whereas the HY5-HY9 models included time-lagged input vectors. Contrary to the LSSVM model results, the use of time-lagged input variables showed little or no improvement in the Tank-LSSVM model's performance when the intermediate SM state variables from the tank model were used (models HY5-HY9). This is partially due to the fact that the SM state variables derived from the well-tuned RR model themselves could represent the behavior of catchment-scale SM in this study without time-lagged values. The HY1 model only considered the 1 st tank storage and rainfall as input variables, while the 2 nd and 3 rd tanks were included in the HY2 and HY3 models, respectively. The HY4 model combined the ESA CCISWI datasets with the HY3 model to assess the satellite SM contribution.
The Tank-LSSVM models' NSE were all classified as "very good" based on their performance during the training and testing periods. Compared to the individual LSSVM and tank models, improved daily runoff simulations were seen in the Tank-LSSVM model. The HY2 model's overall performance showed a considerable improvement over the HY1 model in terms of the RMSE Q70 low flow simulation. This was evident during both the training and testing periods, where the HY2 model results showed a considerable reduction in RMSE (i.e., 2.67 m 3 /s) with respect to the tank model's 3.12 m 3 /s. We concluded that the intermediate SM states inferred from the 2 nd tank storage  The Tank-LSSVM models' NSE were all classified as "very good" based on their performance during the training and testing periods. Compared to the individual LSSVM and tank models, improved daily runoff simulations were seen in the Tank-LSSVM model. The HY2 model's overall performance showed a considerable improvement over the HY1 model in terms of the RMSE Q70 low flow simulation. This was evident during both the training and testing periods, where the HY2 model results showed a considerable reduction in RMSE (i.e., 2.67 m 3 /s) with respect to the tank model's 3.12 m 3 /s. We concluded that the intermediate SM states inferred from the 2nd tank storage ST2 played an essential role in simulating low flow in the HY2 model. Including the 3rd tank storage in the HY3 model substantially improved the low flow simulation, where the RMSE Q70 was 2.13 m 3 /s. These results were comparable to the performance metrics in the models HY1 and HY2. Similar to the 2nd tank storage's role in HY2, the 3rd tank storage played a critical role in describing HY3's base flow as a proxy variable. This led to a substantial improvement in the low flow simulation. The Tank-LSSVM models' performances were also confirmed by graphical representation, as displayed in Figure 11. A linear regression of the HY4 model showed that the observed runoff was underestimated by about 6% (i.e., y = 0.94 x + 0.5), which appeared to fail to provide a visible improvement in the RR simulation when adding satellite SM products into the proposed LSSVM-based RR framework. In other words, the satellite SM product contribution appeared insignificant. This was partially due to the fact that the SM temporal dynamics were described by the tank storage.
in Figure 11. A linear regression of the HY4 model showed that the observed runoff was underestimated by about 6% (i.e., y = 0.94 x + 0.5), which appeared to fail to provide a visible improvement in the RR simulation when adding satellite SM products into the proposed LSSVMbased RR framework. In other words, the satellite SM product contribution appeared insignificant. This was partially due to the fact that the SM temporal dynamics were described by the tank storage.   Accurate RR modeling is of great importance during dry seasons for water resource planning and management studies, especially for water quality and drought issues associated with low flows. To assess the accuracy of low flow simulations, flow duration curves derived from the simulated runoff were compared to the observed runoff from 2014 to 2016. Apart from the HY1 model, the Tank-LSSVM model's low flow simulations were better than those of the tank model ( Figure 12). Figure 13 shows the simulated runoff time series from the tank model and the Tank-LSSVM model, along with the observed runoff data. The figure indicates that rapid low flow fluctuations are better simulated by the Tank-LSSVM model.
To assess the accuracy of low flow simulations, flow duration curves derived from the simulated runoff were compared to the observed runoff from 2014 to 2016. Apart from the HY1 model, the Tank-LSSVM model's low flow simulations were better than those of the tank model ( Figure 12). Figure 13 shows the simulated runoff time series from the tank model and the Tank-LSSVM model, along with the observed runoff data. The figure indicates that rapid low flow fluctuations are better simulated by the Tank-LSSVM model.

Concluding Remarks
In this study, we explored a new RR model that combined intermediate state variables obtained from a tank model with an LSSVM-based nonlinear regression model. The main study assumption was that combining the different models would be more favorable and efficient for runoff modeling than using each model individually. The hybrid RR model's performance was compared to two individual RR models-the tank model and the LSSVM model. The main findings are summarized

Concluding Remarks
In this study, we explored a new RR model that combined intermediate state variables obtained from a tank model with an LSSVM-based nonlinear regression model. The main study assumption was that combining the different models would be more favorable and efficient for runoff modeling than using each model individually. The hybrid RR model's performance was compared to two individual RR models-the tank model and the LSSVM model. The main findings are summarized as follows: (1) The tank model's performance with the calibrated parameters confirms that it is capable of accurately describing rainfall-runoff relationships and is categorized as "very good (NSE > 0. 7)". The tank model shows relatively large deviations, however, for low flow simulations, suggesting that the tank model alone is insufficient for simulating particular RR processes.
(2) This study first explored the LSSVM-based RR model without the intermediate variables from the tank model. The LSSVM-based RR model with rainfall and ESA CCI SWI lagged predictor input values demonstrated that the satellite SM data were effective for describing the SM's temporal dynamics. The LSSVM model's performance is classified as "good (0.5≤ NSE <0.7)" or "very good (NSE > 0.7)", depending on different combinations of time-lagged input variables. Although the overall performance of the LSSVM model alone is generally lower than that of the tank model, the results support satellite-based SM product use for hydrological applications.
(3) The Tank-LSSVM models' NSE are all classified as "very good" based on their performance during the training and testing periods. Compared to the individual LSSVM and tank models, improved daily runoff simulations are seen in the Tank-LSSVM model. In particular, the Tank-LSSVM model including the intermediate state variables has considerably improved low flow simulation during the training and testing periods. The improvement of the LSSVM over the tank model may be partially due to the time-lagged input vectors that represent the routing effect in the rainfall-runoff process. The satellite SM products have not substantially contributed to the low flow simulations because the SM's temporal dynamics are largely described by tank storage. The results confirm that the SM state variables derived from the well-calibrated continuous RR model can better represent the SM's temporal dynamics than those obtained from the satellite SM data.
Due to the runoff simulation improvements in the hybrid RR model, this study's modeling framework could be beneficial and relevant for a number of different hydrologic applications. To support this study's findings, future work is necessary to explore other conceptual models and physically-based models for different regions with longer records. Combining existing RR models with an LSSVM-based RR framework outperforms models that use satellite SM data. This is partially because the existing RR models use the mean value over the catchment area rather than spatially distributed SM. This can lead to a misrepresentation of overall performance in the proposed modeling the framework. Moreover, we acknowledge that the obtained results from the tank model could be affected by the optimization scheme used in the calibration process. More specifically, the gradient-based automatic optimization scheme could be possibly trapped in local optima that are far from the desired global optima. Future work will further investigate the model sensitivity with different optimization schemes.