Machine Learning Models Coupled with Variational Mode Decomposition: A New Approach for Modeling Daily Rainfall-Runoff

Accurate modeling of nonlinear and nonstationary rainfall-runoff processes is essential for performing hydrologic practices effectively. This paper proposes two hybrid machine learning models (MLMs) coupled with variational mode decomposition (VMD) to enhance the accuracy of daily rainfall-runoff modeling. These hybrid MLMs consist of the VMD-based extreme learning machine (VMD-ELM) and the VMD-based least squares support vector regression (VMD-LSSVR). The VMD is employed to decompose the original input and target time series into sub-time series called intrinsic mode functions (IMFs). The ELM and LSSVR models are selected for developing daily rainfall-runoff models utilizing the IMFs as inputs. The performances of the VMD-ELM and VMD-LSSVR models are evaluated utilizing efficiency and effectiveness indices. Their performances are also compared with those of the VMD-based artificial neural network (VMD-ANN), discrete wavelet transform (DWT)-based MLMs (DWT-ELM, DWT-LSSVR, and DWT-ANN), and single MLMs (ELM, LSSVR, and ANN). As a result, the VMD-based MLMs provide better accuracy than the single MLMs and yield slightly better performance than the DWT-based MLMs. Among all models, the VMD-ELM and VMD-LSSVR models achieve the best performance in daily rainfall-runoff modeling with respect to efficiency and effectiveness. Therefore, the VMD-ELM and VMD-LSSVR models can serve as alternative tools for reliable and accurate daily rainfall-runoff modeling.


Introduction
Estimating the rainfall-runoff relationship and streamflow accurately is a significant element of effective water resources management [1,2]. Hydrologic practices, including water supply and allocation, reservoir planning and operation, flood and drought management, and other hydrological applications, can be conducted successfully only when the rainfall-runoff relationship and streamflow behavior in a river watershed are estimated accurately. However, developing accurate rainfall-runoff and streamflow models remains a challenging task since hydrological processes inherently exhibit nonlinear and complex behavior [3].
According to Wang [4], rainfall-runoff and streamflow models can be largely categorized into process-driven models (also known as white-box, physical, or conceptual models) and data-driven models (also called black-box, meta-models, or surrogate models). The process-driven models are based on the physical interpretation of the watershed system. These models are formulated utilizing complex physical equations and parametric assumptions [2]. In contrast, the data-driven models characterize the relationship between input and output without describing the natural watershed process [2,5]. Their modeling simplicity and high accuracy have drawn hydrologists' attention to rainfall-runoff and streamflow modeling based on data-driven models. Furthermore, the increased availability of gauging data, the development of advanced modeling techniques, and the increase of computing power have accelerated the development of data-driven rainfall-runoff and streamflow models [2].
Over the last few decades, the development of data-driven rainfall-runoff and streamflow models has been conducted using statistical time series models (also called stochastic models), including autoregressive (AR), autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), and transfer function-noise (TFN) models [5,6]. Since rainfall-runoff relationships and streamflow time series are usually highly non-stationary and nonlinear, it has been noted that the modeling capability of these linear models is limited [5,7].
To overcome the limitation of the statistical time series models, various machine learning models (MLMs) have been applied successfully for rainfall-runoff and streamflow modeling. The MLMs include artificial neural network (ANN) [1,8], neuro-fuzzy (NF) [9], support vector machines (SVMs) (for regression, also called support vector regression (SVR)) [10,11], random forest (RF) [12], least squares support vector machine (LSSVM) (for regression, also called least squares support vector regression (LSSVR)) [13,14], and extreme learning machine (ELM) [15,16]. The MLMs are able to deal effectively with the nonlinearity and non-stationarity inherent in rainfall-runoff relationships and streamflow time series. Therefore, these models have achieved better performance than the conventional statistical time series models and have been accepted as effective tools for rainfall-runoff and streamflow modeling. Comprehensive reviews of MLMs can be found in ASCE [17,18], Yassen et al. [19], Fahimi et al. [20], and Fotovatikhah et al. [21].
On the other hand, many studies have also been conducted to enhance the performance of MLMs for rainfall-runoff and streamflow modeling. These studies were mostly on the development of hybrid MLMs in which the MLMs were combined with various statistical and mathematical methods. The hybrid model development for rainfall-runoff and streamflow modeling can be classified into the following four types. First, to improve the model performance, the MLMs have been combined with statistical methods, including phase-space reconstruction [22,23], principal component analysis [24,25], fuzzy c-means clustering [7,22], k-means clustering [26,27], self-organizing map (SOM) [28,29], and bootstrap [30]. Second, the MLMs have been coupled with evolutionary optimization algorithms, including genetic algorithm (GA) [31,32], particle swarm optimization (PSO) [11,33], artificial bee colony [34], bat algorithm [35], and firefly algorithm [36]. These algorithms were very helpful for efficient model learning and optimal parameter searching. Third, as preprocessing techniques for the MLMs, time series decomposition methods have been applied to hybrid MLM development. The methods included discrete wavelet transform (DWT) [37,38], maximal overlap DWT (MODWT) [39], wavelet packet transform (WPT) [40], empirical mode decomposition (EMD) [41,42], and ensemble EMD (EEMD) [43,44]. It has been reported that these hybrid MLMs, which consist of time series decomposition and sub-time series modeling, were able to achieve better performance compared with the single MLMs. Finally, hybrid MLMs combining more than two methods have been developed for rainfall-runoff and streamflow modeling, including DWT, PSO, and SVMs [45]; DWT, GA, and adaptive neuro-fuzzy inference system (ANFIS) [46]; EEMD, PSO, and SVMs [47]; EEMD, SOM, and linear genetic programming [48]; wavelet transform, singular spectrum, chaotic approach, and SVR [49]; and Hermite-projection pursuit regression, social spider optimization, and least square algorithm [50].
Especially, the time series decomposition methods, including DWT, MODWT, WPT, EMD, and EEMD, have been effectively applied for improving the performance of MLMs in rainfall-runoff and streamflow modeling. Using these methods, an original time series is decomposed into sub-time series, and the MLMs are then trained utilizing the decomposed sub-time series. In detail, the DWT and MODWT, which are data-preprocessing techniques for analyzing a time series in the time-frequency domain, produce a low-frequency component (approximation) and multiple high-frequency components (details) by decomposing an original time series, whereas the WPT decomposes both the approximation and the details at each level. It has been reported that these wavelet transform methods were effective tools for improving the performance of MLMs in rainfall-runoff and streamflow modeling since the methods were able to capture useful information at different levels [51]. Adamowski and Sun [39] presented an ANN model combined with MODWT for daily streamflow forecasting and confirmed that the hybrid model produced better accuracy than the single ANN model. Kisi and Cimen [37] examined the accuracy of a hybrid model combining DWT and SVMs to predict monthly streamflow and suggested that the conjunction model could enhance the prediction accuracy of SVMs. Liu et al. [38] developed the conjunction model of DWT and SVR for forecasting daily and monthly streamflow and investigated its performance. They considered the wavelet decomposition factors, including decomposition level, mother wavelet, and edge effect, for improving the model accuracy. They found that the ensemble approach was able to produce better performance compared with the best single DWT-SVR model. Zhang et al. [45] developed a streamflow forecasting model combining SVM with wavelet transform (WT) and PSO (WT-PSO-SVM). They revealed that the hybrid model provided a better alternative compared with the SVMs for monthly streamflow forecasting. Baydaroğlu et al. [49] presented a coupling model of WT, chaotic approach (CA), singular spectrum analysis (SSA), and SVR. They proved that WT, SSA, and CA for configuring the input matrix of the SVR were effective in the hybrid modeling for river flow prediction. Moosavi et al. [40] developed a robust model combining WPT and the group method of data handling (GMDH) to estimate daily runoff. They concluded that the WPT dealt with non-stationarities in daily runoff data effectively and improved the performance of the GMDH model efficiently.
The EMD and EEMD, which are self-adaptive and empirical methods, generate multiple sub-time series called intrinsic mode functions (IMFs) by decomposing an original time series. The wavelet transform methods work in frequency space and require basis functions (mother wavelets) which must be predetermined, whereas the EMD and EEMD work directly in temporal space and do not require any basis functions [52]. Recently, hybrid models utilizing the EMD and EEMD have been applied successfully for rainfall-runoff and streamflow modeling. Napolitano et al. [41] explored the effects of data preprocessing for an EMD-based ANN streamflow model. They found that the advantages of data preprocessing were dependent on the characteristics of the intrinsic modes. Wang et al. [47] proposed a coupling of EEMD, PSO, and SVMs for forecasting annual rainfall-runoff and concluded that the hybrid approach could improve the accuracy of annual runoff forecasting significantly. Huang et al. [42] assessed the performance of a modified EMD-SVMs model to forecast monthly streamflow and confirmed that the hybrid model provided high prediction accuracy and reliable stability. Wang et al. [43] developed an ANN modeling approach based on EEMD to forecast medium- and long-term runoff. They confirmed that the EEMD was able to increase the forecasting accuracy effectively and that the hybrid approach could provide higher forecasting accuracy compared with the single ANN model. Barge and Sharif [48] proposed the coupling of linear genetic programming (LGP), EEMD, and SOM, and they demonstrated the effectiveness of the hybrid model in streamflow forecasting.
Based on the previous studies mentioned above, it can be noted that the time series decomposition techniques, including DWT, MODWT, WPT, EMD, and EEMD, were effective for developing hybrid MLMs. However, these methods have some drawbacks. As Liu et al. [38] suggested, three factors, namely, decomposition level, mother wavelet, and edge effect, should be considered when applying the wavelet methods. Among them, especially, the optimal mother wavelet should be selected by evaluating the performances of the wavelet-based MLMs for different mother wavelets and wavelet indices. Since these factors must be considered for effective wavelet-based MLM modeling, they can make the modeling computationally intensive and time-consuming. The EMD has a disadvantage called mode mixing [53], which results in incorrect time-frequency representation and consequently degrades the accuracy of time series processing. Furthermore, since the EMD is a recursive algorithm, the error of envelope estimation can be amplified progressively, and the efficiency can be decreased [54]. The stopping criteria and end-point effect also affect the decomposition process [53]. To alleviate these problems, especially the mode mixing, Wu and Huang [53] proposed the EEMD. Although the problems can be reduced by the EEMD, they have not been settled completely. The EEMD still has some unresolved problems, including dissatisfaction of the IMF requirements, treatment of multi-mode distribution for an IMF, the end-point effect, and the stopping criteria [53].
Recently, Dragomiretskiy and Zosso [55] developed an adaptive and non-recursive signal analysis technique called the variational mode decomposition (VMD) to resolve the drawbacks of the EMD. Unlike the EMD, the VMD decomposes an original time series into multiple modes and then updates them concurrently. Compared with the EMD, the VMD is more robust to sampling and noise and has excellent performance in frequency search and separation. Furthermore, the VMD can extract time-frequency features accurately since it alleviates the mode mixing by yielding narrow-banded modes [54]. Due to these advantages, the development of hybrid MLMs based on the VMD has been accomplished successfully in various fields, including the renewable energy, financial, and economic fields [56-58]. On the other hand, the VMD is a relatively new method for hydrological applications. To the best of the authors' knowledge, hydrological hybrid MLM modeling based on the VMD has never been attempted. Therefore, the application of the VMD is required for developing hybrid MLMs in hydrological modeling.
Modeling the rainfall-runoff relationship accurately is essential for effective hydrologic practices. However, since the rainfall-runoff process is nonlinear and nonstationary, accurate rainfall-runoff modeling is very difficult and thus remains a significant task in the hydrological field. Therefore, this paper proposes hybrid MLMs coupled with VMD for modeling the nonlinear and nonstationary rainfall-runoff process effectively. In this study, two hybrid MLMs based on the VMD are proposed: the VMD-based ELM (VMD-ELM) and the VMD-based LSSVR (VMD-LSSVR). ELM and LSSVR are adopted for developing the VMD-based rainfall-runoff models. MLMs including ANN, ANFIS, SVM, and RF can be used as alternatives. However, ELM and LSSVR have advantages over other models when considering generalization performance, learning speed, over-training, the number of user-defined parameters, and the possibility of getting trapped in local minima [13,18,59-64]. Therefore, in this study, ELM and LSSVR are employed for VMD-based rainfall-runoff modeling, and ANN is selected for performance comparison. The model performances are evaluated through quantitative performance indices (efficiency and effectiveness indices). For confirming the usefulness of these conjunction models, their performances are compared with those of the VMD-based ANN (VMD-ANN), DWT-based MLMs (DWT-ANN, DWT-ELM, and DWT-LSSVR), and single MLMs (ANN, ELM, and LSSVR).

Discrete Wavelet Transform (DWT)
DWT is a multiresolution signal processing method. Using the DWT, an original time series is separated into different frequency elements, namely, an approximation and multiple details. When X = {X_t : t = 0, 1, ..., N − 1} is a real-valued time series, the J_0-level DWT of X yields DWT coefficients W via an orthonormal transform, W = 𝒲X, where 𝒲 is a matrix that defines the DWT. The W and 𝒲 can be written as follows [65], where W_j is the vector of wavelet coefficients and V_{J_0} is the vector of scaling coefficients. The X can be reconstructed from W as follows [65], where D_j = 𝒲_j^T W_j is the jth level detail and S_{J_0} = 𝒱_{J_0}^T V_{J_0} is the J_0 level approximation. Practically, the DWT is performed by the Mallat algorithm, also known as the pyramid algorithm [66]. The key point of the algorithm is the two-channel filters (also called half-band filters), which are comprised of the wavelet (high-pass) filter {h_l : l = 0, 1, ..., L − 1} and the scaling (low-pass) filter {g_l : l = 0, 1, ..., L − 1} of even width L. The main algorithm consists of circular filtering and downsampling. According to Percival and Walden [65], the wavelet and scaling coefficients for the jth decomposition level are defined as Equation (4):

W_{j,t} = Σ_{l=0}^{L−1} h_l V_{j−1, (2t+1−l) mod N_{j−1}},  V_{j,t} = Σ_{l=0}^{L−1} g_l V_{j−1, (2t+1−l) mod N_{j−1}} (4)

where W_{j,t} and V_{j,t} are the elements of W_j and V_j, respectively. Figure 1 depicts a flowchart for three-level DWT decomposition. By utilizing the pyramid algorithm, three details (D_1, D_2, and D_3) and an approximation (A_3) are produced from an original time series. Details on the DWT can be found in Percival and Walden [65].
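As a concrete illustration of the pyramid algorithm, the following sketch performs one level of circular filtering and downsampling with the Haar filter pair. The Haar filters are a minimal stand-in for the mother wavelets evaluated in wavelet studies, and the function names are illustrative:

```python
import numpy as np

def dwt_level(x, h, g):
    """One level of the pyramid algorithm: circular filtering + downsampling.

    W[t] = sum_l h[l] * x[(2t + 1 - l) mod N]  (wavelet coefficients)
    V[t] = sum_l g[l] * x[(2t + 1 - l) mod N]  (scaling coefficients)
    """
    N = len(x)
    W = np.array([sum(h[l] * x[(2 * t + 1 - l) % N] for l in range(len(h)))
                  for t in range(N // 2)])
    V = np.array([sum(g[l] * x[(2 * t + 1 - l) % N] for l in range(len(g)))
                  for t in range(N // 2)])
    return W, V

# Haar two-channel filter pair of even width L = 2
h = np.array([1.0, -1.0]) / np.sqrt(2.0)   # wavelet (high-pass) filter
g = np.array([1.0, 1.0]) / np.sqrt(2.0)    # scaling (low-pass) filter
```

Applying `dwt_level` recursively to the scaling coefficients V yields the details D_1, D_2, D_3 and the approximation A_3 of Figure 1; since the transform is orthonormal, the energy of x is preserved across W and V.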


Variational Mode Decomposition (VMD)
VMD is a fully adaptive and non-recursive algorithm for time-frequency signal analysis. Using the VMD, an original time series f is decomposed into K IMFs. According to Dragomiretskiy and Zosso [55], the constrained variational formulation for yielding the IMFs can be written as Equation (5):

min_{{u_k},{ω_k}} { Σ_{k=1}^{K} ||∂_t[(δ(t) + j/(πt)) * u_k(t)] e^{−jω_k t}||_2^2 } subject to Σ_{k=1}^{K} u_k = f (5)
where δ = the Dirac function; u_k(t) = A_k(t) cos(φ_k(t)) = the kth IMF; φ_k(t) = the non-decreasing phase function; and A_k(t) = the non-negative envelope. The constrained variational formulation is changed to the following unconstrained one by introducing an augmented Lagrangian method [55,67]:

L({u_k}, {ω_k}, λ) = α Σ_{k=1}^{K} ||∂_t[(δ(t) + j/(πt)) * u_k(t)] e^{−jω_k t}||_2^2 + ||f(t) − Σ_{k=1}^{K} u_k(t)||_2^2 + ⟨λ(t), f(t) − Σ_{k=1}^{K} u_k(t)⟩ (6)

where L = the augmented Lagrangian, λ = the Lagrange multiplier, and ⟨a, b⟩ = the scalar product of a and b.
The saddle point (also called minimax point) of the L is then obtained by updating u_k^{n+1}, ω_k^{n+1}, and λ^{n+1} in a sequence of iterative sub-optimizations using the alternate direction method of multipliers (ADMM) [68]. Compared with Newton's method and sequential quadratic programming, which are local convergence methods, the ADMM is globally convergent, robust, and fast. Moreover, parallel computing becomes possible and computational cost can be greatly reduced since the ADMM can decouple a large problem into a series of sub-problems [69]. The final updated formulations can be expressed as Equations (7)-(9) [55]:

û_k^{n+1}(ω) = (f̂(ω) − Σ_{i≠k} û_i(ω) + λ̂(ω)/2) / (1 + 2α(ω − ω_k)²) (7)

ω_k^{n+1} = ∫_0^∞ ω|û_k^{n+1}(ω)|² dω / ∫_0^∞ |û_k^{n+1}(ω)|² dω (8)

λ̂^{n+1}(ω) = λ̂^n(ω) + τ(f̂(ω) − Σ_k û_k^{n+1}(ω)) (9)

where ˆ = the Fourier transform, n = the iteration number, α = the quadratic penalty factor, and τ = the time step of the dual ascent. Figure 2 shows a flowchart for the VMD. Details on the VMD can be found in Dragomiretskiy and Zosso [55].
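The update loop of Equations (7)-(9) can be sketched directly in the Fourier domain. The following simplified implementation omits refinements of the published algorithm (e.g., boundary mirroring), and the parameter defaults and initialization are illustrative assumptions:

```python
import numpy as np

def vmd(f, K=3, alpha=2000.0, tau=0.0, n_iter=200, tol=1e-7):
    """Simplified VMD: decompose signal f into K narrow-banded modes."""
    N = len(f)
    f_hat = np.fft.rfft(f)                       # one-sided spectrum
    freqs = np.fft.rfftfreq(N)                   # normalized frequencies in [0, 0.5]
    u_hat = np.zeros((K, len(f_hat)), dtype=complex)
    omega = np.linspace(0.05, 0.45, K)           # initial center frequencies (assumed)
    lam = np.zeros(len(f_hat), dtype=complex)    # Lagrange multiplier spectrum
    for _ in range(n_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Equation (7): Wiener-filter update of the kth mode spectrum
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            u_hat[k] = (residual + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Equation (8): center frequency = power-weighted mean frequency
            power = np.abs(u_hat[k]) ** 2
            omega[k] = (freqs * power).sum() / (power.sum() + 1e-14)
        # Equation (9): dual ascent on the reconstruction constraint
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        if np.sum(np.abs(u_hat - u_prev) ** 2) < tol:
            break
    modes = np.array([np.fft.irfft(u_hat[k], n=N) for k in range(K)])
    return modes, omega
```

For a two-tone test signal, the recovered center frequencies converge to the true normalized frequencies, and summing the modes approximately reconstructs the input.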


Extreme Learning Machine (ELM)
ELM is a non-iterative and least squares-based learning algorithm for training single-hidden layer feed-forward neural networks (SLFNs) effectively [70]. As stated in Huang et al. [70], the parameters of the hidden layer (weights and biases) are generated randomly, and the SLFN is reduced to the linear system Hβ = T, as given in Equations (10)-(13).
where H = the output matrix of the hidden layer; β = the output weight matrix; T = the target data matrix; h = the activation function; w_i = the weight vector between the ith hidden neuron and the input neurons; β_i = the weight vector between the ith hidden neuron and the output neurons; and L = the number of hidden neurons. Unlike the conventional ANN, the β is estimated analytically utilizing the least-squares method. The optimal solution for β can be obtained by inverting the H as follows [70]:

β̂ = H†T (14)

where H† = PH^T is the Moore-Penrose generalized inverse of H, and P = (H^T H)^{−1} is the inverse of the covariance matrix of H. Figure 3 shows an example of ELM model structure comprised of eight (input), 30 (hidden), and one (output) neurons, respectively. Huang et al. [70] can be referred to for more details on the ELM.
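A minimal numerical sketch of the ELM training rule: the hidden parameters are drawn randomly and fixed, and β is obtained in a single least-squares solve via the Moore-Penrose pseudoinverse. Function names and parameter ranges here are illustrative assumptions:

```python
import numpy as np

def elm_train(X, T, L=30, seed=0):
    """Train an SLFN the ELM way: random hidden layer, analytic output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-3.0, 3.0, (X.shape[1], L))   # random input-to-hidden weights w_i
    b = rng.uniform(-3.0, 3.0, L)                 # random hidden biases
    H = np.tanh(X @ W + b)                        # hidden-layer output matrix H
    beta = np.linalg.pinv(H) @ T                  # beta = H† T (Moore-Penrose solution)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because no iterative weight tuning is involved, training reduces to one matrix pseudoinverse, which is the source of the ELM's learning-speed advantage cited above.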


Least Squares Support Vector Regression (LSSVR)
LSSVR, which is a least-squares version of standard SVR, is a kernel-based statistical learning algorithm. The SVR performs a quadratic optimization involving inequality constraints and a ε-insensitive loss function, whereas the LSSVR uses equality constraints and a least-squares loss function and solves a system of linear equations instead of the quadratic optimization. When a training set {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)} is given, the LSSVR can be written as Equation (15) [71]:

y(x) = w^T φ(x) + b (15)
where x ∈ R^p = the input data, p = the dimension of the input data, y ∈ R = the target data, N = the data length, w ∈ R^{n_f} = the weight vector, n_f = the dimension of a Hilbert space H, φ : R^p → H = the mapping from the input space to the high-dimensional feature space, and b ∈ R = the bias term.
From the research of Suykens et al. [71], the LSSVR can be converted into the primal optimization form as Equation (16):

min_{w,b,e} (1/2)w^T w + (γ/2) Σ_{i=1}^{N} e_i², subject to y_i = w^T φ(x_i) + b + e_i, i = 1, ..., N (16)
where e_i ∈ R is the slack variable and γ > 0 is the regularization parameter. Equation (16) is solved by constructing the following Lagrangian [71]:

L(w, b, e; α) = (1/2)w^T w + (γ/2) Σ_{i=1}^{N} e_i² − Σ_{i=1}^{N} α_i (w^T φ(x_i) + b + e_i − y_i) (17)

where α_i ∈ R is the Lagrange multiplier. Equation (17) can be solved using partial differentiation for w, b, e_i, and α_i. The solution of α and b can be obtained by Equation (18) after removing the variables w and e_i [71]:

[0 1^T; 1 Ω + I/γ][b; α] = [0; y] (18)

where Ω_{ij} = K(x_i, x_j), 1 = [1, ..., 1]^T, and y = [y_1, ..., y_N]^T. Finally, the LSSVR can be expressed as the following equation [71]:

y(x) = Σ_{i=1}^{N} α_i K(x, x_i) + b (19)

where K(x_i, x_j) = φ(x_i)^T φ(x_j) is the kernel function, which is symmetric and continuous positive definite. The typical kernels which can be utilized in the LSSVR include linear, polynomial, sigmoid, and radial basis function (RBF) kernels. For regression problems, the following RBF is generally applied [72]:

K(x_i, x_j) = exp(−||x_i − x_j||² / (2σ²)) (20)

where σ is the parameter of the RBF kernel, also called the width parameter. The RBF kernel has advantages over other kernel functions, including the linear, sigmoid, and polynomial kernels, in terms of nonlinear mapping capability, number of parameters, numerical limiting conditions, global superiority, and positive definiteness [73]. Furthermore, the linear kernel is effective only for linear problems, the sigmoid kernel is not applied widely, and the polynomial kernel suffers from computational difficulties [74]. On the other hand, the RBF kernel can be used for any problem as long as its parameter is selected appropriately [75]. Figure 4 shows an example of LSSVR model structure. Details on the LSSVR can be found in Suykens et al. [71].
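Because the LSSVR dual reduces to one linear system, training is a single solve. The sketch below builds the RBF kernel matrix, solves the block system for (b, α), and predicts with the kernel expansion; the kernel form and parameter values are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """RBF kernel K(a, b) = exp(-||a - b||^2 / (2 sigma^2)) between row sets."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvr_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LSSVR dual linear system for the bias b and multipliers alpha."""
    N = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0                      # top row: [0, 1^T]
    A[1:, 0] = 1.0                      # left column: [0; 1]
    A[1:, 1:] = K + np.eye(N) / gamma   # kernel matrix plus ridge I/gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]              # alpha, b

def lssvr_predict(Xq, X, alpha, b, sigma=1.0):
    """Kernel expansion: y(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(Xq, X, sigma) @ alpha + b
```

Larger γ weakens the least-squares regularization (the I/γ ridge), moving the fit toward interpolation of the training data.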

Artificial Neural Network (ANN)
ANN is an artificial intelligence computing system that is a collection of linked layers comprised of multiple nodes called neurons, analogous to the biological neural network system. Multilayer perceptron (MLP), which is a nonparametric estimator and the most widely applied ANN model, is a feedforward ANN with intermediate layers called hidden layers to implement nonlinear discriminants for classification and approximate nonlinear functions for regression [76]. As seen in Figure 5, the MLP generally has a three-layer architecture.
The MLP can be expressed as follows [15]:

ŷ_j = β_0 + Σ_{i=1}^{L} β_i h(w_i · x_j + b_i), j = 1, 2, ..., N

where x j and ŷj = the input and output vectors, respectively, L = the number of hidden neurons, h = the activation function (also called the transfer function), w i and β i = the connection weights for the hidden and output layers, respectively, b i and β 0 = the biases for the hidden and output layers, respectively, and N = the data length.The parameters can be adjusted iteratively using learning algorithms such as backpropagation (BP) algorithm.Detailed information on the ANN can be found in Alpaydin [76].
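For contrast with the non-iterative ELM solution, a minimal MLP trained by full-batch backpropagation (gradient descent) might look as follows; the architecture, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

def train_mlp(X, y, L=20, lr=0.1, n_iter=10000, seed=0):
    """Three-layer MLP (cf. Figure 5) trained with gradient-descent backpropagation."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 1.0, (X.shape[1], L))   # hidden-layer weights w_i
    b = np.zeros(L)                             # hidden-layer biases b_i
    beta = rng.normal(0.0, 0.1, (L, 1))         # output-layer weights beta_i
    beta0 = np.zeros(1)                         # output-layer bias beta_0
    for _ in range(n_iter):
        H = np.tanh(X @ w + b)                  # hidden activations h(.)
        y_hat = H @ beta + beta0                # network output
        err = y_hat - y
        g_H = (err @ beta.T) * (1.0 - H ** 2)   # error backpropagated through tanh
        beta -= lr * H.T @ err / len(X)         # gradient steps on all parameters
        beta0 -= lr * err.mean(axis=0)
        w -= lr * X.T @ g_H / len(X)
        b -= lr * g_H.mean(axis=0)
    return w, b, beta, beta0

def mlp_predict(X, w, b, beta, beta0):
    return np.tanh(X @ w + b) @ beta + beta0
```

The iterative adjustment of all weights and biases is what distinguishes this training procedure from the ELM's single analytic solve for the output weights.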

VMD and DWT-based MLM Modeling
VMD-based MLMs (VMD-ELM, VMD-LSSVR, and VMD-ANN) are the hybrid models coupling the VMD with the single MLMs (ELM, LSSVR, and ANN), respectively. In the same manner, the DWT-based MLMs (DWT-ELM, DWT-LSSVR, and DWT-ANN) combine the DWT with the single MLMs, respectively. As shown in Figure 6, the VMD and DWT-based MLMs consist of the following three steps: (1) the original input and target time series are decomposed into sub-time series (IMFs for the VMD; an approximation and details for the DWT); (2) the single MLMs are developed utilizing the decomposed sub-time series; and (3) the final runoff estimates are obtained from the trained models.
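The three-step structure can be sketched end-to-end on synthetic data. Here a cheap two-band moving-average split stands in for the VMD/DWT decomposition, and per-component least-squares regressors on lagged rainfall stand in for the MLMs; all names, the lag count, and the synthetic watershed response are illustrative assumptions:

```python
import numpy as np

def two_band_split(x, w=7):
    """Stand-in two-component decomposition (the paper uses VMD or DWT here):
    a low-frequency trend plus a high-frequency residual."""
    low = np.convolve(x, np.ones(w) / w, mode="same")
    return np.vstack([low, x - low])

def lagged_matrix(x, lags):
    """Row t holds [x_t, x_{t-1}, ..., x_{t-lags+1}]."""
    return np.column_stack([x[lags - 1 - i: len(x) - i] for i in range(lags)])

# Synthetic example: runoff as a short linear response to rainfall.
rng = np.random.default_rng(1)
rain = rng.gamma(2.0, 1.0, 400)
runoff = np.convolve(rain, [0.5, 0.3, 0.2])[:400]

lags = 3
rain_modes = two_band_split(rain)       # Step 1: decompose input series
run_modes = two_band_split(runoff)      #         and target series
pred = np.zeros(400 - lags + 1)
for rm, qm in zip(rain_modes, run_modes):
    X = lagged_matrix(rm, lags)         # Step 2: one sub-model per component
    coef, *_ = np.linalg.lstsq(X, qm[lags - 1:], rcond=None)
    pred += X @ coef                    # Step 3: sum component estimates
obs = runoff[lags - 1:]
ce = 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)
```

Because the stand-in decomposition is linear, the summed component estimates recover the runoff almost exactly in this toy setting; the benefit of VMD/DWT in the real problem is that each sub-model faces a simpler, narrower-band series.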

Quantitative Performance Indices
In this study, the model performances were assessed by efficiency and effectiveness indices. The efficiency indices include the coefficient of efficiency (CE), the index of agreement (IOA), the coefficient of determination (r²), the persistence index (PI), the root-mean-square error (RMSE), the mean absolute error (MAE), the mean squared relative error (MSRE), the mean absolute relative error (MARE), the relative volume error (RVE), and the fourth root mean quadrupled error (R4MS4E). The effectiveness indices involve the average absolute relative error (AARE) and the threshold statistics (TS). Table 1 summarizes the quantitative performance indices employed in this study. Details on the indices can be found in Jain and Srinivasulu [31] for the effectiveness indices and Dawson et al. [77] for the efficiency indices. CE, r², IOA, and PI are dimensionless indices. These indices can provide a useful comparison between different studies since they are independent of data scale. RMSE and MAE can be used as more representative measures than other indices (e.g., mean squared error) since they have the same unit as the original data. RMSE is a good efficiency index for high flows, whereas MAE evaluates all deviations from observed values. MARE is sensitive to errors for low flows but less sensitive to them for high flows; for this reason, MARE is a good efficiency index for low flows. MSRE is a good efficiency index for moderate flows. RVE is a relative index for overall volume error and can provide an indication of overall water balance. R4MS4E is a good efficiency index for high and peak flows and has the same unit as the original data [77,78]. AARE and TS give appropriate weights to low, moderate, and high flows, and it has been reported that they can provide better performance evaluation [31].
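Several of the efficiency indices can be computed directly from observed and estimated streamflow series. The sketch below follows the usual definitions; the exact Table 1 formulas (e.g., the sign convention of RVE) may differ slightly:

```python
import numpy as np

def performance_indices(q_obs, q_est):
    """Compute a subset of the efficiency indices from Table 1.

    Formulas follow common conventions (cf. Dawson et al. [77]); the RVE sign
    convention here is an assumption of this sketch.
    """
    e = q_obs - q_est
    ce = 1.0 - np.sum(e ** 2) / np.sum((q_obs - q_obs.mean()) ** 2)  # coeff. of efficiency
    rmse = np.sqrt(np.mean(e ** 2))                                  # root-mean-square error
    mae = np.mean(np.abs(e))                                         # mean absolute error
    msre = np.mean((e / q_obs) ** 2)                                 # mean squared relative error
    mare = np.mean(np.abs(e) / q_obs)                                # mean absolute relative error
    rve = np.sum(e) / np.sum(q_obs)                                  # relative volume error
    r4ms4e = np.mean(e ** 4) ** 0.25                                 # fourth root mean quadrupled error
    return {"CE": ce, "RMSE": rmse, "MAE": mae, "MSRE": msre,
            "MARE": mare, "RVE": rve, "R4MS4E": r4ms4e}
```

A perfect model yields CE = 1 and zero for all error measures; the relative indices (MSRE, MARE, RVE) require strictly positive observed flows.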



[Table 1 lists the formulas for the efficiency indices (CE, IOA, r^2, PI, RMSE, MAE, MSRE, MARE, RVE, and R4MS4E) and the effectiveness indices (AARE and TS).]

Notation for Table 1: Q̂_i is the estimated streamflow; Q_i the observed streamflow; Q̄ the mean of the observed streamflow and the mean of the estimated streamflow, respectively; n the size of the testing data; m the size of the training data; p the number of free parameters in a model; and n_x the total number of estimated streamflow data for which the absolute relative error is less than x%.
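Several of the indices in Table 1 translate directly into code. The sketch below (pure Python; the observed/estimated arrays are a toy example, not data from this study) computes CE, IOA, RMSE, MAE, and MARE:

```python
# Minimal sketches of a few efficiency indices from Table 1.
import math

def ce(obs, est):
    """Coefficient of efficiency (Nash-Sutcliffe)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - e) ** 2 for o, e in zip(obs, est))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def ioa(obs, est):
    """Index of agreement (Willmott)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - e) ** 2 for o, e in zip(obs, est))
    den = sum((abs(e - mean_obs) + abs(o - mean_obs)) ** 2
              for o, e in zip(obs, est))
    return 1.0 - num / den

def rmse(obs, est):
    """Root-mean-square error; emphasizes high-flow errors."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))

def mae(obs, est):
    """Mean absolute error; weighs all deviations equally."""
    return sum(abs(o - e) for o, e in zip(obs, est)) / len(obs)

def mare(obs, est):
    """Mean absolute relative error; sensitive to low-flow errors."""
    return sum(abs(o - e) / o for o, e in zip(obs, est)) / len(obs)

obs = [1.0, 2.0, 3.0, 4.0]   # toy observed streamflow
est = [1.1, 1.9, 3.2, 3.8]   # toy estimated streamflow
```

A perfect model gives CE = IOA = 1 and RMSE = MAE = MARE = 0, which is why efficiency values near one (or errors near zero) indicate a better model.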

Study Area and Observed Data
The study area for investigating the model performance is the Geumho River Watershed, South Korea. Sufficient and reliable observed data are essential for developing MLMs. The Geumho River Watershed has a number of gauging stations with long-term observation periods of more than 20 years. Moreover, since the gauging stations have been strictly managed by the Ministry of Land, Infrastructure and Transport, the quality and reliability of their data are good. The Geumho River Watershed is thus an adequate study area in terms of the number of gauging stations and the availability and reliability of observed data. Figure 7 displays the study area and the locations of the gauging stations. The watershed lies in the central-eastern part of the Nakdong River Basin, the second largest river basin of South Korea. The watershed has an area of 2092.4 km^2, an average elevation of 235.4 m, an average slope of 33.6%, and a stream length of 119.0 km [79]. To develop the hybrid and single MLMs, daily streamflow and rainfall data were gathered from four streamflow and 15 rainfall gauging stations, respectively (see Figure 7). The data are available from the Water Management Information System (WAMIS) [79], a web portal operated to provide and manage the water resources and environmental information of South Korea. The areal mean rainfall (AMR) time series was calculated from the collected rainfall data using the Thiessen polygon method [80]. The AMR and streamflow time series were utilized for model training and testing and were scaled to the range [0,1] for efficient model training [78].
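The two preprocessing steps described above can be sketched as follows; the station rainfall values and Thiessen polygon areas below are hypothetical, for illustration only:

```python
# Thiessen-weighted areal mean rainfall and min-max scaling to [0, 1].
def areal_mean_rainfall(rain_by_station, thiessen_areas):
    """Area-weighted average of station rainfall (Thiessen polygon method)."""
    total_area = sum(thiessen_areas.values())
    return sum(rain_by_station[s] * a
               for s, a in thiessen_areas.items()) / total_area

def minmax_scale(series):
    """Scale a time series to [0, 1] for efficient model training."""
    lo, hi = min(series), max(series)
    return [(x - lo) / (hi - lo) for x in series]

# Hypothetical daily rainfall (mm) at three stations and polygon areas (km^2)
rain = {"A": 10.0, "B": 20.0, "C": 40.0}
areas = {"A": 100.0, "B": 100.0, "C": 200.0}
amr = areal_mean_rainfall(rain, areas)  # (10*100 + 20*100 + 40*200)/400 = 27.5
```

The Thiessen weights are simply each station's polygon area divided by the watershed area, so stations covering larger polygons contribute more to the AMR.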

Development of Hybrid and Single MLMs
To decompose the input and target time series with the VMD, the number of IMFs (K) and the quadratic penalty factor (α) must be determined beforehand. In this study, K and α were determined by the following steps:
Step 1. Decompose the input and target time series into IMFs for different K ∈ [1, 20] and α ∈ [5, 2000].
Step 2. Sum the IMFs again for each pair of K and α values and compute the correlation coefficient (r) between the reconstructed and original time series.
Step 3. Select the sets of K and α values for which r ≈ 1.
Step 4. Select the optimal K and α values producing the best performance of the VMD-based MLMs.
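The reconstruction check in Steps 2 and 3 can be sketched as follows. No VMD implementation is included here, so `decompose` is a simple moving-average stand-in whose K additive components sum back to the signal exactly; with a true VMD the same Pearson-r test applies to its IMFs:

```python
# Verify that summed components reconstruct the original series (r ~ 1).
import math, random

def moving_average(x, w):
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - w), min(len(x), i + w + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def decompose(x, K, w=3):
    """Stand-in for VMD: K additive components via successive smoothing."""
    comps, current = [], list(x)
    for _ in range(K - 1):
        smooth = moving_average(current, w)
        comps.append([a - b for a, b in zip(current, smooth)])
        current = smooth
    comps.append(current)  # telescoping sum: components add back to x
    return comps

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

random.seed(0)
signal = [math.sin(0.1 * t) + 0.2 * random.random() for t in range(200)]
imfs = decompose(signal, K=5)
reconstructed = [sum(vals) for vals in zip(*imfs)]
r = pearson_r(signal, reconstructed)  # ~1, since the split is exactly additive
```

In the actual grid search, any (K, α) pair whose reconstruction yields r noticeably below 1 would be discarded in Step 3 before Step 4 compares model performance.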
Based on the above method, K = 5 and α = 10 were determined in this study. Figure 8 shows the five IMFs decomposed by the VMD for the daily streamflow data observed at the Dongchon streamflow gauging station. The scaled AMR and streamflow time series were grouped into two data sets: a training set (2001-2010, data length = 3652) and a testing set (2011-2014, data length = 1461).

To decompose the input and target time series with the DWT, the optimal level of decomposition (L) must first be selected. In this study, Equation (22) [81] was used to determine the L value:

L = int[log(N)],  (22)

where N is the length of the time series, int[k] returns the integer portion of k, k is a real number, and log denotes the common (base-10) logarithm.
Using Equation (22), L = 3 was determined in this study. This way of determining the decomposition level has been adopted in many previous studies [81-86]. Although the decomposition level can also be selected by trial and error, that approach is computationally burdensome and time-consuming. Furthermore, a mother wavelet must be selected before performing the DWT, and the performance of DWT-based MLMs depends on this choice [6]. For each model, the mother wavelet producing the best model performance was selected: d12 for DWT-ELM, d12 for DWT-LSSVR, and d18 for DWT-ANN. Figure 9 shows the approximation and three details decomposed by the DWT for the daily streamflow data observed at the Dongchon streamflow gauging station.
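Equation (22) can be checked numerically. Assuming the base-10 logarithm (which is consistent with L = 3 for the 3652-day training series):

```python
# Decomposition level from the series length, Equation (22): L = int[log10(N)].
import math

def decomposition_level(n):
    """Optimal DWT decomposition level for a series of length n."""
    return int(math.log10(n))

L = decomposition_level(3652)  # log10(3652) ~ 3.56 -> L = 3
```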
Atmosphere 2018, 9, x FOR PEER REVIEW
In developing rainfall-runoff models using MLMs, one of the most important modeling steps is the selection of appropriate input variables [78]. Tables 2-4 summarize the input sets for the single MLMs, VMD-based MLMs, and DWT-based MLMs, respectively. The input sets were set up by the following steps:
Step 1. Select the potential influencing variables based on the lag times of the input variables determined by the autocorrelation function (ACF), partial autocorrelation function (PACF), cross-correlation function (CCF), and average mutual information (AMI).
Step 2. Select five input sets for each model utilizing the variables selected in Step 1 and the all-possible-regression method [87], where Mallows's Cp and the adjusted r^2 [87] are used as the selection criteria. Mallows's Cp is minimized when the input set consists of only statistically significant variables, and the adjusted r^2 helps prevent overfitting since a penalty is imposed when variables are added to a model [87,88].
Step 3. Select the optimal input set producing the best performance for each model based on the quantitative performance indices.
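The role of the adjusted r^2 in Step 2 can be illustrated as follows; the candidate r^2 values and input-set sizes below are hypothetical, not results from this study:

```python
# Adjusted r^2 penalizes extra input variables, discouraging overfitting.
def adjusted_r2(r2, n, p):
    """n = number of training samples, p = number of input variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

n = 3652  # training-series length used in this study
candidates = {            # hypothetical (r2, number of inputs) per input set
    "set1": (0.880, 3),
    "set2": (0.884, 6),
    "set3": (0.883, 10),
}
best = max(candidates,
           key=lambda k: adjusted_r2(candidates[k][0], n, candidates[k][1]))
# "set3" has nearly the same raw r2 as "set2" but uses more inputs,
# so its adjusted r2 is lower and "set2" is preferred.
```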
For the LSSVR modeling, selecting the regularization and kernel parameters is a significant step, and the parameters must be chosen in advance. Coupled simulated annealing (CSA) and derivative-free simplex search (DSS) [94,95] were used to select the optimal parameters: the CSA determines the initial values, and the DSS performs the fine-tuning [96]. Optimization algorithms, including gradient-based techniques, simulated annealing, genetic algorithms, and particle swarm optimization, can be used for this task. Gradient-based methods have limitations when the cost function is non-differentiable, and their computational cost for large-scale problems is high; they also suffer from performance degradation when the search space is multimodal and multidimensional with many local minima. Global optimization algorithms, including simulated annealing, genetic algorithms, and particle swarm optimization, have the advantage that they can escape from multiple local minima. However, they are computationally burdensome and converge very slowly, since they require a great many cost-function evaluations to reach the global optimum, and they are sensitive to the initial parameters [95]. By contrast, the CSA features fast convergence and reduced sensitivity to the initial parameters, although there is a tradeoff between the number of cost-function evaluations and the quality of the final solutions. This tradeoff can be resolved by fine-tuning with the DSS. Thus, the coupled CSA and DSS yield better performance and more nearly optimal solutions [97,98].
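The two-stage idea (a global stochastic search supplies a starting point, then a derivative-free local search fine-tunes it) can be sketched generically. The code below is a toy illustration on a quadratic cost; it is not the actual CSA or DSS algorithm of [94,95], and the local step uses a simple coordinate search standing in for the simplex method:

```python
# Stage 1: simulated annealing finds a rough optimum.
# Stage 2: derivative-free local search refines it.
import math, random

def cost(p):                       # toy cost surface: minimum at (2, -1)
    return (p[0] - 2.0) ** 2 + (p[1] + 1.0) ** 2

def anneal(start, steps=2000, temp0=1.0):
    random.seed(42)                # deterministic for reproducibility
    best = cur = start
    for k in range(steps):
        t = temp0 * (1 - k / steps) + 1e-9
        cand = [c + random.gauss(0, 0.5) for c in cur]
        # accept improvements always; accept worse moves with prob exp(-d/t)
        if cost(cand) < cost(cur) or \
           random.random() < math.exp((cost(cur) - cost(cand)) / t):
            cur = cand
        if cost(cur) < cost(best):
            best = cur
    return best

def local_refine(p, step=0.1, iters=60):
    """Derivative-free coordinate search (stand-in for simplex fine-tuning)."""
    p = list(p)
    for _ in range(iters):
        improved = False
        for i in range(len(p)):
            for d in (step, -step):
                trial = list(p)
                trial[i] += d
                if cost(trial) < cost(p):
                    p, improved = trial, True
        if not improved:
            step /= 2.0            # shrink the step when no move helps
    return p

rough = anneal([0.0, 0.0])
fine = local_refine(rough)         # ends close to the true optimum (2, -1)
```

The division of labor mirrors the text: the stochastic stage needs few assumptions about the cost surface, while the cheap local stage recovers the precision the annealing gives up.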
Considering the above modeling strategies, three VMD-based MLMs (VMD-ELM, VMD-LSSVR, and VMD-ANN), three DWT-based MLMs (DWT-ELM, DWT-LSSVR, and DWT-ANN), and three single MLMs (ELM, LSSVR, and ANN) were developed. Table 5 shows the optimal model architectures. For the ANN, ELM, VMD-ANN, VMD-ELM, DWT-ANN, and DWT-ELM models, the digits represent the numbers of input, hidden, and output neurons, respectively. For the LSSVR, VMD-LSSVR, and DWT-LSSVR models, the digits represent the numbers of input nodes, RBF kernel functions, and output nodes, respectively.

Performance Assessment
As stated in the Introduction, this study aims to examine the performances of the VMD-based MLMs for daily rainfall-runoff modeling and to compare them with those of the DWT-based and single MLMs. The model performances were evaluated using the quantitative performance indices, which measure model efficiency and effectiveness. The results are summarized in Table 6.
Values of CE, IOA, r^2, and PI close to one and values of RMSE, MAE, MSRE, MARE, RVE, and R4MS4E close to zero indicate that a model achieves better efficiency than the others; likewise, a lower AARE and a higher TS indicate better effectiveness. Comparing the VMD-based and single MLMs, the VMD-based MLMs yielded better efficiency and effectiveness indices than the ELM, LSSVR, and ANN models, respectively. This indicates that coupling the VMD with the single MLMs can improve their performance in terms of both efficiency and effectiveness. The performances of the VMD-based MLMs were also compared with those of the DWT-based MLMs. The VMD-ELM and VMD-LSSVR models provided better efficiency, whereas the VMD-ELM and DWT-LSSVR models performed better in terms of effectiveness. When both efficiency and effectiveness were considered, the VMD-ELM and VMD-LSSVR models achieved the better performance. On the other hand, the VMD-ANN model yielded poorer efficiency and effectiveness than the DWT-ANN model. These results reveal that the VMD was able to enhance the performances of the ELM and LSSVR models, whereas the DWT was more suitable than the VMD for improving the ANN model in daily rainfall-runoff modeling. Among all the models, the top three in efficiency and effectiveness were the VMD-ELM, VMD-LSSVR, and DWT-ELM models. These results indicate that the VMD provided better efficiency and effectiveness than the DWT for daily rainfall-runoff modeling with MLMs, and that the performance also depends on the underlying MLM.
Figures 10 and 11 show the scatter plots and the residual boxplots for the VMD-based, DWT-based, and single MLMs, respectively. The plots provide a graphical comparison of model accuracy. The scatter plots represent the degree of correlation and dispersion between estimated and observed values, whereas the residual boxplots depict the distribution of residuals, i.e., the differences between estimated and observed values. For the LSSVR, VMD-LSSVR, and DWT-LSSVR models, the optimal parameters were determined using the coupled CSA and DSS algorithms in this study. The
algorithms feature enhanced computation speed and accuracy compared with conventional global optimization algorithms. However, the coupled CSA and DSS algorithms could be further enhanced by exploiting parallel computing, for example the multi-computer parallel genetic algorithm proposed by Cheng et al. [105] or graphics processing unit (GPU)-based computing (e.g., Zhou and Tan [106]), which has recently been attracting considerable attention. Such methods can help an optimization algorithm search a multidimensional solution space efficiently, since the enhanced computing power significantly reduces the computation time. In this study, the original time series were decomposed into sub-time series by the VMD and DWT. In contrast, Taormina et al. [107] split the hydrograph into baseflow and excess-flow components using a digital filter and then trained modular neural networks on each component, while Wu and Chau [108] used singular spectrum analysis for the decomposition and analysis of the original time series. Combining the advantages of these methods may help improve the performance of rainfall-runoff MLMs. In other words, an original time series could be divided into baseflow and excess-flow components by a baseflow separation method, each component decomposed into sub-time series by applying the VMD or DWT, and the sub-time series then utilized as the training data for rainfall-runoff MLMs.

Conclusions
In this study, two conjunction models, VMD-ELM and VMD-LSSVR, are developed for daily rainfall-runoff modeling in the Geumho River Watershed, South Korea, and their performances are investigated. The performances of the coupled models are evaluated utilizing the quantitative performance indices, namely, the efficiency and effectiveness indices. The results are compared with those of the VMD-ANN model, the DWT-based MLMs (DWT-ELM, DWT-LSSVR, and DWT-ANN), and the single MLMs (ELM, LSSVR, and ANN). The VMD- and DWT-based MLMs perform better than the single MLMs. The VMD-ELM and VMD-LSSVR models yield slightly better performance than the DWT-ELM and DWT-LSSVR models, whereas the DWT-ANN model performs slightly better than the VMD-ANN model. Considering both efficiency and effectiveness, the VMD-ELM and VMD-LSSVR models achieve the best performance. These results confirm that the VMD can enhance the performances of single MLMs for daily rainfall-runoff modeling and that the performances of VMD-based MLMs depend on the single MLM being coupled. Therefore, the VMD can be a novel alternative technique for hybrid rainfall-runoff modeling based on time series decomposition.
In this study, two VMD-based hybrid MLMs, VMD-ELM and VMD-LSSVR, are proposed for rainfall-runoff modeling on a daily basis. However, it is also necessary to investigate the performances of VMD-based rainfall-runoff models on weekly, monthly, and annual bases for effective river basin management, water supply and allocation, and reservoir planning and operation. Moreover, this study is limited in that it employs only rainfall and streamflow as predictors and does not consider runoff components (surface, subsurface, and groundwater flow), hydro-physical elements (evapotranspiration, infiltration, etc.), or human-made factors. Future studies could investigate VMD-based rainfall-runoff modeling considering runoff components and various other factors; effective variable selection methods; comparisons with different time series decomposition methods and MLMs; hybrid learning algorithms combining global and local search; parameter optimization using GPU-based parallel computing; and rainfall-runoff MLMs using baseflow separation and time series decomposition methods.
Figure 1 depicts a flowchart of the three-level DWT decomposition. By utilizing the pyramid algorithm, three details (D1, D2, and D3) and an approximation (A3) are produced from an original time series. Details on the DWT can be found in Percival and Walden [65].

where ||·||2 denotes the L2 distance; ωk the center frequency; * the convolution operator; uk the kth IMF; φk a non-decreasing phase function; and Ak a non-negative envelope function. The constrained variational formulation is changed to an unconstrained one by introducing an augmented Lagrangian [55,67].
where βi denotes the weight vector between the ith hidden neuron and the output neurons, and L is the number of hidden neurons. Unlike the conventional ANN, β is estimated analytically by the least-squares method. The optimal solution for β is obtained as β = H†T, where H† = (HᵀH)⁻¹Hᵀ is the Moore-Penrose generalized inverse of H and (HᵀH)⁻¹ is the inverse of the covariance matrix of H [70]. Figure 3 shows an example of an ELM model structure comprising eight input, 30 hidden, and one output neurons. Huang et al. [70] may be consulted for more details on the ELM.
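A minimal numeric sketch of this analytic solution, assuming the usual least-squares form β = (HᵀH)⁻¹HᵀT; the tiny hidden-layer matrix H and target vector T below are made up for illustration:

```python
# Analytic ELM output weights via the Moore-Penrose pseudo-inverse.
def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inv2(M):
    """Inverse of a 2x2 matrix (enough for two hidden neurons)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hidden-layer output matrix H (4 samples x 2 hidden neurons) and targets T,
# generated exactly by the weights beta = (3, 5)
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
T = [[3.0], [5.0], [8.0], [11.0]]

HtH = matmul(transpose(H), H)
beta = matmul(matmul(inv2(HtH), transpose(H)), T)  # ~ [[3.0], [5.0]]
```

Because H is fixed once the random input weights are drawn, this single linear solve replaces the iterative backpropagation of a conventional ANN, which is the source of the ELM's training speed.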

Figure 3. An example of Extreme Learning Machine (ELM) architecture.

Figure 4. A general architecture of Least Squares Support Vector Regression (LSSVR).

Figure 5. An example of Artificial Neural Network (ANN) architecture.
The VMD-based MLMs (VMD-ELM, VMD-LSSVR, and VMD-ANN) are hybrid models coupling the VMD with the single MLMs (ELM, LSSVR, and ANN), respectively. In the same manner, the DWT-based MLMs (DWT-ELM, DWT-LSSVR, and DWT-ANN) combine the DWT with the single MLMs. As shown in Figure 6, the VMD- and DWT-based MLMs consist of the following three steps:
Step 1. The training and testing data sets are decomposed into multiple IMFs by the VMD, or into an approximation and multiple details by the DWT.
Step 2. For each decomposed training data set, the single MLMs (ELM, LSSVR, and ANN) are developed.
Step 3. The final estimates of the streamflow time series are obtained by aggregating the sub-time series estimated by the single MLMs.
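The three steps above can be sketched generically. In the sketch below, `decompose` and `MeanModel` are trivial placeholders standing in for the VMD/DWT and the single MLMs; only the decompose-fit-aggregate structure is the point:

```python
# Generic decompose -> model-per-component -> aggregate pipeline.
def decompose(series, K):
    """Placeholder decomposition: K equal fractions that sum to the series."""
    return [[x / K for x in series] for _ in range(K)]

class MeanModel:
    """Placeholder per-component model: predicts the training mean."""
    def fit(self, targets):
        self.mean = sum(targets) / len(targets)
        return self
    def predict(self, n):
        return [self.mean] * n

def hybrid_pipeline(train_target, n_test, K=3):
    # Step 1: decompose the target series into K sub-series
    components = decompose(train_target, K)
    # Step 2: fit one model per sub-series
    models = [MeanModel().fit(c) for c in components]
    # Step 3: aggregate the sub-series predictions into the final estimate
    sub_preds = [m.predict(n_test) for m in models]
    return [sum(vals) for vals in zip(*sub_preds)]

preds = hybrid_pipeline([2.0, 4.0, 6.0], n_test=2, K=3)  # ~ [4.0, 4.0]
```

Each sub-series is smoother and narrower-band than the original signal, which is why the per-component models in Step 2 are typically easier to train than a single model on the raw series.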


Figure 7. Study area and locations of gauging stations.

Figure 8. Original streamflow time series and sub-time series (five IMFs) decomposed by VMD for the daily streamflow data observed at the Dongchon streamflow gauging station.

Figure 9. Original streamflow time series and sub-time series (D1, D2, D3, and A3) decomposed by DWT for the daily streamflow data observed at the Dongchon streamflow gauging station.

Figure 10. Scatter plots of the VMD-based, DWT-based, and single MLMs for the testing period.

Table 2. Configuration of input sets for the single MLMs.

Table 3. Configuration of input sets for the VMD-based MLMs.

Table 4. Configuration of input sets for the DWT-based MLMs.

Table 6. Performance evaluation for the testing period.