River Stage Modeling by Combining Maximal Overlap Discrete Wavelet Transform, Support Vector Machines and Genetic Algorithm

Abstract: This paper proposes a river stage modeling approach combining maximal overlap discrete wavelet transform (MODWT), support vector machines (SVMs) and a genetic algorithm (GA). The MODWT decomposes the original river stage time series into sub-time series (detail and approximation components). The SVM computes daily river stage values using the decomposed sub-time series. The GA searches for the optimal hyperparameters of the SVM. The performance of the MODWT–SVM models is evaluated using efficiency and effectiveness indices and compared with that of single models (multilayer perceptron (MLP) and SVM), discrete wavelet transform (DWT)-based models (DWT–MLP and DWT–SVM) and MODWT–MLP models. The conjunction of MODWT, SVM and GA improves the performance of the SVM model and outperforms the single models. The MODWT-based models using SVM are more accurate than those using MLP. Also, hybrid models coupling MODWT, SVM and GA improve model performance and accuracy in daily river stage modeling compared with those combined with DWT. The MODWT–SVM model using the Coiflet 12 (c12) mother wavelet, MODWT–SVM-c12, produces the best efficiency and effectiveness among all models. Therefore, the conjunction of MODWT, SVM and GA can be an efficient and effective approach for modeling daily river stages.

Recently, hybrid time series modeling approaches utilizing wavelet transform have been actively studied in the hydrological field [8][9][10][11][12][13][14]. In terms of signal analysis, the wavelet transform is a signal decomposition method which splits an original signal into sub-signals, including detail and approximation (smooth) components. Adamowski and Sun [10] suggested a hybrid model combining discrete wavelet transform (DWT) and ANN for streamflow forecasting. Kisi et al. [11] predicted short- and long-term air temperatures using wavelet-based genetic programming. Okkan and Serbes [12] modeled reservoir inflow using a combination of DWT and black box models including ANNs, multiple linear regression and least square support vector machines (LS-SVMs).

Discrete Wavelet Transform (DWT)
DWT, which is a simpler version of continuous wavelet transform, is a multiresolution analysis (MRA) technique which decomposes an original signal into approximation and detail components. This section outlines the basic concept of DWT. Detailed information on the DWT can be found in Nason [17]. According to Mallat [18], DWT can be written as the following equation [10]:
$$\psi_{j,k}(t) = \frac{1}{\sqrt{s_0^{j}}}\,\psi\left(\frac{t - k\tau_0 s_0^{j}}{s_0^{j}}\right) \tag{1}$$
where $j$ and $k$ are integer values controlling wavelet scale and translation; $s_0$ is the fixed scale step (commonly $s_0 = 2$); $\psi$ is the mother wavelet; and $\tau_0$ is the location parameter (commonly $\tau_0 = 1$). For a discrete signal $X = \{X_t, t = 0, 1, \cdots, N-1\}$, the DWT computes the wavelet coefficient for the discrete wavelet of scale $2^j$ and location $2^j k$ using the following equation [16]:
$$W_X(j,k) = 2^{-j/2}\sum_{t=0}^{N-1} X_t\,\psi\left(2^{-j}t - k\right) \tag{2}$$
where $W_X(j,k)$ is the wavelet coefficient and $N$ is an integer power of two.
Mallat's algorithm [18] is generally applied for the practical implementation of DWT. The algorithm uses low- and high-pass filters instead of wavelets. Figure 2 shows a flowchart for three-level DWT. As seen in Figure 2, an original signal is decomposed into detail components ($D_1$, $D_2$ and $D_3$) and an approximation component ($A_3$) using the algorithm. The filters are determined by the mother wavelet, which is selected in advance. The mother wavelets include the Daubechies wavelet (Daublet), Daubechies' least-asymmetric wavelet (Symmlet), Coiflet, Morlet (Gabor wavelet), Meyer wavelet and Shannon wavelet (Littlewood-Paley wavelet), among others. Details on different wavelets can be found in Percival and Walden [19] and Nason [17]. For decomposing input signals using wavelet analysis, the decomposition level should be determined beforehand. The level $L$ was determined based on Equation (3) [20]:
$$L = \mathrm{int}[\log(n)] \tag{3}$$
where int[·] is the function that returns the nearest integer of a number and $n$ is the data length.
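As a minimal sketch of the filter-bank idea behind Mallat's algorithm, the following Python code applies Haar low- and high-pass filters followed by downsampling by two at each level. The Haar filter is an assumption made for brevity (the study uses Daublets, Symmlets and Coiflets), the function names are hypothetical, and the logarithm in the level rule is assumed to be base 10. The code returns wavelet and scaling coefficients rather than the reconstructed components shown in Figure 2.

```python
import math


def decomposition_level(n):
    """Level L = int[log(n)], assuming a base-10 logarithm and nearest-integer rounding."""
    return int(round(math.log10(n)))


def haar_dwt_step(signal):
    """One pyramid step with Haar filters: filter, then keep every second output."""
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    detail = [s * (signal[i + 1] - signal[i]) for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def haar_dwt(signal, levels):
    """Return ([level-1, ..., level-L detail coefficients], level-L approximation coefficients)."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    details, approx = [], list(signal)
    for _ in range(levels):
        # The low-pass output is decomposed again at the next level.
        approx, detail = haar_dwt_step(approx)
        details.append(detail)
    return details, approx
```

Because each step halves the number of coefficients, the signal length must be divisible by 2 raised to the number of levels, which is the length restriction that motivates MODWT.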

Maximal Overlap Discrete Wavelet Transform (MODWT)
MODWT is a mathematical technique which transforms a signal into multilevel wavelet and scaling coefficients. MODWT has several merits in comparison with DWT as discussed in Cornish et al. [15]. For example, MODWT can be properly defined for arbitrary signal length, while the DWT is limited to a signal length with an integer multiple of a power of two. This section outlines the concept of MODWT. Details on MODWT can be found in Percival and Walden [19].
For a discrete signal $X = \{X_t, t = 0, 1, \cdots, n-1\}$, the elements of the $j$th level MODWT wavelet and scaling coefficients, $\tilde{W}_j$ and $\tilde{V}_j$, can be written as Equations (4) and (5), respectively [19]:
$$\tilde{W}_{j,t} = \sum_{l=0}^{n-1} \tilde{h}^{\circ}_{j,l}\, X_{(t-l) \bmod n} \tag{4}$$
$$\tilde{V}_{j,t} = \sum_{l=0}^{n-1} \tilde{g}^{\circ}_{j,l}\, X_{(t-l) \bmod n} \tag{5}$$
where $\tilde{W}_{j,t}$ is the $t$th element of the $j$th level MODWT wavelet coefficient; $\tilde{V}_{j,t}$ is the $t$th element of the $j$th level MODWT scaling coefficient; $\tilde{h}^{\circ}_{j,l}$ and $\tilde{g}^{\circ}_{j,l}$ are the $j$th level MODWT high- and low-pass filters (wavelet and scaling filters) yielded by periodizing $\tilde{h}_{j,l}$ and $\tilde{g}_{j,l}$ to length $n$, respectively; $\tilde{h}_{j,l}$ and $\tilde{g}_{j,l}$ are the $j$th level MODWT high-pass filter ($\tilde{h}_{j,l} \equiv h_{j,l}/2^{j/2}$) and low-pass filter ($\tilde{g}_{j,l} \equiv g_{j,l}/2^{j/2}$); $h_{j,l}$ and $g_{j,l}$ are the $j$th level DWT high- and low-pass filters; and $L$ is the highest decomposition level ($j = 1, 2, \cdots, L$). The filters are determined by the mother wavelet, as in DWT. Figure 3 shows a flowchart for three-level MODWT. As seen in Figure 3, the MODWT-based MRA decomposes an original signal $X$ into a low-pass filtered approximation component ($A_3$) and high-pass filtered detail components ($D_1$, $D_2$ and $D_3$). The MODWT-based MRA can be written as Equations (6)-(8) [19]:
$$X = \sum_{j=1}^{L} D_j + A_L \tag{6}$$
$$D_{j,t} = \sum_{l=0}^{n-1} \tilde{h}^{\circ}_{j,l}\, \tilde{W}_{j,(t+l) \bmod n} \tag{7}$$
$$A_{L,t} = \sum_{l=0}^{n-1} \tilde{g}^{\circ}_{L,l}\, \tilde{V}_{L,(t+l) \bmod n} \tag{8}$$
where $A_L$ is the approximation component and $D_j$ are the detail components ($j = 1, 2, \cdots, L$).
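The circular filtering that defines the MODWT coefficients can be sketched with the pyramid algorithm, in which the level-j coefficients are obtained from the level-(j-1) scaling coefficients with filter taps spaced 2^(j-1) samples apart. The sketch below assumes the Haar filter for brevity (the study uses Daublets, Symmlets and Coiflets), and the function name `modwt_haar` is hypothetical. It computes the wavelet and scaling coefficients only, not the detail and approximation components of the MRA.

```python
def modwt_haar(signal, levels):
    """Haar MODWT by the pyramid algorithm with circular (periodic) filtering.

    Returns (wavelet_coeffs, scaling_coeffs): wavelet_coeffs[j - 1] holds the
    level-j wavelet coefficients and scaling_coeffs holds the coefficients of
    the highest level. Every output has the same length as the input because
    no downsampling takes place.
    """
    n = len(signal)
    g = (0.5, 0.5)    # MODWT low-pass filter: Haar DWT filter divided by sqrt(2)
    h = (0.5, -0.5)   # MODWT high-pass filter: Haar DWT filter divided by sqrt(2)
    v = list(signal)
    wavelet_coeffs = []
    for j in range(1, levels + 1):
        shift = 2 ** (j - 1)  # spacing between filter taps at level j
        w_next = [h[0] * v[t] + h[1] * v[(t - shift) % n] for t in range(n)]
        v_next = [g[0] * v[t] + g[1] * v[(t - shift) % n] for t in range(n)]
        wavelet_coeffs.append(w_next)
        v = v_next
    return wavelet_coeffs, v
```

A signal of length 7, which a three-level DWT cannot handle, is decomposed without padding, and the squared coefficients sum to the squared signal, so the transform preserves energy.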

Multilayer Perceptron (MLP)
ANN is a multilayered computing system for modeling complex nonlinear and multi-dimensional relationships. MLP, which is the most commonly applied ANN structure, consists of several layers. For hydrological applications, MLP with three layers, including input, hidden and output layers, is typically used, as shown in Figure 4. As described in Günther and Fritsch [21], three-layered MLP with $J$ hidden nodes calculates Equation (9):
$$o(\mathbf{x}) = f\left(w_0 + \sum_{j=1}^{J} w_j\, f\left(w_{0j} + \sum_{i=1}^{n} w_{ij} x_i\right)\right) = f\left(w_0 + \sum_{j=1}^{J} w_j\, f\left(w_{0j} + \mathbf{w}_j^{T}\mathbf{x}\right)\right) \tag{9}$$
where $\mathbf{x} = \{x_i, i = 1, 2, \cdots, n\}$ is the input vector; $f$ is the activation function; $o(\mathbf{x})$ is the output vector; $w_0$ is the intercept for the output node; $w_j$ is the connection weight between the $j$th hidden node and the output node; $\mathbf{w}_j = (w_{1j}, \cdots, w_{nj})$ is the connection weight vector for the $j$th hidden node; and $w_{0j}$ is the intercept for the $j$th hidden node. Details on the MLP are given in Günther and Fritsch [21].

For MLP modeling, the number of hidden nodes and the type of activation function should be determined in advance. The optimal number of hidden nodes can be determined utilizing trial-and-error or optimization methods. The activation functions used in the MLP include logistic sigmoid, linear and hyperbolic tangent functions. Although the activation functions depend on the type of network and training algorithm, the logistic sigmoid activation function is often employed since it is real-valued, continuous, differentiable and computationally easy to perform. Furthermore, the logistic sigmoid activation function is often used to introduce nonlinear behavior in the MLP [21].
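The forward computation of a three-layer MLP with a logistic sigmoid activation can be sketched as follows. This is an illustrative sketch only: the function and argument names are hypothetical, the weights are supplied by the caller rather than trained, and the logistic function is applied at both the hidden and the output nodes.

```python
import math


def logistic(z):
    """Logistic sigmoid activation, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, hidden_weights, hidden_intercepts, output_weights, output_intercept):
    """Output of a three-layer MLP for one input vector x.

    hidden_weights[j] is the weight vector of hidden node j, and
    hidden_intercepts[j] is its intercept. output_weights[j] connects hidden
    node j to the single output node.
    """
    hidden = [
        logistic(b + sum(w_i * x_i for w_i, x_i in zip(w, x)))
        for w, b in zip(hidden_weights, hidden_intercepts)
    ]
    return logistic(output_intercept + sum(w_j * h_j for w_j, h_j in zip(output_weights, hidden)))
```

Since the logistic output lies in (0, 1), target data are usually scaled to that interval before training.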

Support Vector Machine (SVM)
The SVM, which is a class of statistical models developed by Vapnik [22], is a supervised machine learning model for solving classification and regression problems. In an SVM, a nonlinear mapping function maps original data into a high-dimensional space. This section outlines the basic concept of SVM. Details on the SVM can be found in Vapnik [22] and Noori et al. [23]. Given a training dataset of input-target pairs $(x, y)$, where $x \in R^m$ is the input vector of $m$ components and $y \in R$ is the target value, the SVM for regression can be formulated as Equation (10) [22]: where $w$ is the weight vector; $\varphi(x)$ is the mapping function; and $b$ is the bias. The parameters, $w$ and $b$, are estimated by minimizing Equation (11) [22]: where $R_{reg}$ is the regularized risk function; $C$ is the regularization parameter; and $L_\varepsilon$ is the ε-insensitive loss function. The function can be written as Equation (12) [22]: where $\varepsilon$ is the parameter of the insensitive loss function. The hyperparameters, $C$ and $\varepsilon$, should be determined beforehand. Thus, the SVM conducts linear regression in the high-dimensional space utilizing the ε-insensitive loss function. By introducing the non-negative slack variables $\xi$ and $\xi^*$, $R_{reg}$ is converted into the optimization problem in Equation (13) [22]: By introducing the dual set of Lagrange multipliers, $\alpha_k$ and $\alpha_k^*$, the SVM can be written as follows [22]: Thus, the nonlinear regression function of SVM can be expressed as Equation (15) [22]: where $x_k$ is the support vector; $m$ is the number of support vectors; and $K(x_k, x) = \varphi(x_k) \cdot \varphi(x)$ is the kernel function. Figure 5 shows a three-layer SVM model architecture with eight inputs and one output. The radial basis function (RBF), which is suitable for regression problems, was used in this study. The function can be written as Equation (16) [22]: where $\gamma$ is the kernel parameter ($\gamma = 1/(2p^2)$) and $p$ is the width parameter.
For SVM modeling, the optimal hyperparameters, including regularization, insensitive loss function and kernel parameters, should be selected in advance. Since the hyperparameters are difficult to determine by a trial-and-error approach, the optimal hyperparameters are usually selected utilizing optimization algorithms.
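To make the roles of the three hyperparameter groups more concrete, the sketch below writes out the textbook forms of the ε-insensitive loss, the RBF kernel and the kernel expansion of the regression function. These are the standard definitions and may differ in notation from the equations cited from [22]. All function names are hypothetical, and the dual coefficients (the differences between the paired Lagrange multipliers) are assumed to come from a solver that is not shown.

```python
import math


def eps_insensitive_loss(prediction, target, eps):
    """Zero inside the eps-tube around the target, linear outside it."""
    return max(0.0, abs(prediction - target) - eps)


def rbf_kernel(x_k, x, gamma):
    """RBF kernel exp(-gamma * ||x_k - x||^2), with gamma = 1 / (2 * p**2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_k, x))
    return math.exp(-gamma * squared_distance)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Kernel expansion: bias plus the sum over support vectors of coef_k * K(x_k, x).

    dual_coefs[k] is assumed to hold alpha_k - alpha_k* for support vector k.
    """
    return bias + sum(
        coef * rbf_kernel(sv, x, gamma) for sv, coef in zip(support_vectors, dual_coefs)
    )
```

A larger ε widens the tube in which errors are ignored, a larger γ makes each support vector act more locally, and the regularization parameter bounds the magnitude of the dual coefficients.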

Genetic Algorithm (GA)
GA, which is a class of evolutionary algorithms, is a stochastic optimization algorithm based on evolution strategy. GAs have been successfully applied to search for approximate or exact solutions to optimization problems [24]. A GA was applied to search for the optimal hyperparameters of SVM in this study. The main genetic operators consist of selection, crossover and mutation operators. The selection operator chooses excellent chromosomes (sets of candidate parameters), which are also called individuals, to be reproduced. The crossover operator exchanges genes (candidate parameters) between two chromosomes. The mutation operator determines whether a chromosome mutates to the next generation or not. The crossover and mutation operators generate new offspring and population (set of all parameters) in the next generation. The evolution process can be summarized as follows [25]:
• Step 1. Generate an initial random population $\{\theta_i^{(0)}\}$.
• Step 2. Compute the fitness $f(\theta_i^{(k)})$ of each chromosome in the population, and assign probability $p_i^{(k)}$ typically proportional to the fitness.
• Step 3. Reproduce new population.
• Step 4. Repeat from step 2 to step 3 until stop conditions are met. The algorithm yields $\theta^* \equiv \arg\max f(\theta_i^{(k)})$ as the optimum.
For applying the GA optimization, the tuning parameters should be set in advance. The main tuning parameters include population size, number of generations, elite count, and crossover and mutation rates. The population size is the number of chromosomes in the population (typically, population size = 20-100). The number of generations is related to the improvement in the fitness function. The crossover rate is the probability that crossover will occur between chromosomes (typically, crossover rate = 0.80-0.95). The mutation rate is the probability that a mutation will occur in a parent chromosome (typically, mutation rate = 0.5-1.0) [25,26]. The elite count is the number of best fitness individuals to survive at each generation, which can be computed using the following equation [25]: where popSize is the population size and int(·) is the function which returns the integer part.
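The four steps above can be sketched as a small real-valued GA that maximizes a fitness function of one parameter. This is a minimal sketch under stated assumptions, not the implementation used in the study: the function name `genetic_search`, the shifted-fitness selection weights, the blend crossover and the Gaussian mutation are all illustrative choices, and the default tuning values are placeholders.

```python
import random


def genetic_search(fitness, lower, upper, pop_size=50, generations=100,
                   elite_count=3, crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Maximize fitness(theta) for a single real-valued parameter in [lower, upper]."""
    rng = random.Random(seed)
    # Step 1: generate an initial random population.
    population = [rng.uniform(lower, upper) for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: compute fitness; shift it so every selection weight is positive.
        scores = [fitness(theta) for theta in population]
        worst = min(scores)
        weights = [score - worst + 1e-12 for score in scores]
        # Step 3: reproduce. The elite_count best chromosomes survive unchanged.
        ranked = sorted(zip(scores, population), reverse=True)
        next_population = [theta for _, theta in ranked[:elite_count]]
        while len(next_population) < pop_size:
            parent_a, parent_b = rng.choices(population, weights=weights, k=2)
            child = parent_a
            if rng.random() < crossover_rate:
                mix = rng.random()  # blend crossover between the two parents
                child = mix * parent_a + (1.0 - mix) * parent_b
            if rng.random() < mutation_rate:
                child += rng.gauss(0.0, 0.1 * (upper - lower))
            next_population.append(min(max(child, lower), upper))
        # Step 4: repeat with the new population until the generation limit.
        population = next_population
    return max(population, key=fitness)
```

In a hyperparameter search, `fitness` would be the negative validation error of an SVM trained with the candidate parameters, and each chromosome would carry one gene per hyperparameter instead of a single value.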

River Stage Modeling Using DWT and MODWT
In river stage modeling using DWT and MODWT, the DWT and MODWT decompose original input signals (daily river stage data) into sub-signals (detail and approximation components). The sub-signals are then used as the inputs of single models, MLP and SVM. As depicted in Figure 6, the DWT- and MODWT-based river stage modeling approaches consist of a three-step algorithm. The algorithm is outlined as follows:

• Step 1. Decompose original input signals into sub-signals (detail and approximation components) utilizing DWT and MODWT.
• Step 2. Select effective inputs among the sub-signals.
• Step 3. Train and test single models, MLP and SVM, utilizing the effective inputs.

Figure 6. Flowchart of river stage modeling using DWT and MODWT.

Model Efficiency Evaluation
The efficiencies of single, DWT- and MODWT-based river stage modeling approaches were assessed utilizing dimensionless and residual error-based indices (see Appendix A). The CE, d and $r^2$ provide measures of correlation between the estimated and observed data. The CE measures the capability of the model to estimate stage values different from the mean stage. The $r^2$ measures the variability of the observed stage which is explained by a model. The d measures overall agreement between the estimated and observed data. The RMSE and MS4E measure goodness of fit at high stages, whereas the MSRE measures it at moderate stages. The MAE measures overall agreement between the estimated and observed data [27,28]. Higher dimensionless indices and lower residual error-based indices indicate that a model is more efficient than other models. Details on the indices can be found in Dawson and Wilby [27].
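The indices named above can be computed as in the following sketch. Appendix A is not reproduced here, so the formulas are the commonly used definitions (for example, CE as one minus the ratio of the squared error sum to the variance sum of the observations) and are assumed to match it. The function name `efficiency_indices` is hypothetical.

```python
import math


def efficiency_indices(observed, estimated):
    """Dimensionless (CE, d, r2) and residual error-based (RMSE, MAE, MS4E, MSRE) indices."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n
    errors = [o - e for o, e in zip(observed, estimated)]
    sse = sum(err ** 2 for err in errors)
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_est = sum((e - mean_est) ** 2 for e in estimated)
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(observed, estimated))
    potential = sum((abs(e - mean_obs) + abs(o - mean_obs)) ** 2
                    for o, e in zip(observed, estimated))
    return {
        "CE": 1.0 - sse / var_obs,            # coefficient of efficiency
        "d": 1.0 - sse / potential,           # index of agreement
        "r2": cov ** 2 / (var_obs * var_est),  # coefficient of determination
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(err) for err in errors) / n,
        "MS4E": sum(err ** 4 for err in errors) / n,  # fourth power stresses large errors
        "MSRE": sum((err / o) ** 2 for err, o in zip(errors, observed)) / n,
    }
```

The relative error in MSRE divides by the observed value, so the sketch assumes that no observed stage equals zero.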

Model Effectiveness Evaluation
The effectiveness of single, DWT- and MODWT-based river stage modeling approaches was assessed utilizing average absolute relative error (AARE) and threshold statistics (TS) (see Appendix B). AARE and TS evaluate model effectiveness by measuring the predictive ability of a model. Furthermore, AARE and TS provide a more appropriate assessment since they give appropriate weight to flows of all magnitudes [29][30][31][32]. Lower AARE and higher TS values indicate that a model is more effective than other models.
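A short sketch of the two effectiveness measures follows. Appendix B is not reproduced here, so the code assumes the usual definitions: AARE is the mean of the absolute relative errors in percent, and TS for a level of x% is the percentage of data points whose absolute relative error is below x%. The function names are hypothetical.

```python
def absolute_relative_errors(observed, estimated):
    """Absolute relative error of each estimate, in percent of the observed value."""
    return [abs((e - o) / o) * 100.0 for o, e in zip(observed, estimated)]


def aare(observed, estimated):
    """Average absolute relative error in percent."""
    are = absolute_relative_errors(observed, estimated)
    return sum(are) / len(are)


def threshold_statistic(observed, estimated, level_percent):
    """Percentage of points whose absolute relative error is below level_percent."""
    are = absolute_relative_errors(observed, estimated)
    return 100.0 * sum(1 for value in are if value < level_percent) / len(are)
```

Because every point contributes through its relative rather than absolute error, low and high stages are weighted alike.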

Model Development
One of the most important steps for developing single, DWT- and MODWT-based models is to select effective inputs. In this study, the optimal lags of inputs were determined based on average mutual information (AMI), autocorrelation function (ACF), partial autocorrelation function (PACF) and cross correlation function (CCF) [33,34]. The optimal lag for the river stage series of the Simcheon gauging station can be defined as a lag value at which the ACF, PACF and AMI show significant correlation. Specifically, the optimal lag is determined when the ACF reaches zero or a small value, or the PACF decays within the confidence interval, or the AMI attains the first minimum [33,34]. The optimal lags for the other gauging stations were determined as the lag values at which the CCFs between Simcheon and each of the other gauging stations showed significant correlation. Based on these methods, the optimal lags were determined as lag 6 for Simcheon and lag 1 for the other gauging stations (Songcheon, Baekhwagyo and Hwanggan). Table 1 summarizes the input combination for developing the models.
Table 1. Input combination for model development.
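As an illustration of lag-based input selection, the sketch below computes the sample autocorrelation at a given lag and builds a matrix of lagged inputs. It assumes a one-step-ahead setup in which the stage at time t is modeled from its previous values, which may not match the input combination in Table 1 exactly. The function names are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((value - mean) ** 2 for value in series)
    numerator = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return numerator / denominator


def lagged_inputs(series, max_lag):
    """Rows [x(t-1), ..., x(t-max_lag)] paired with the targets x(t)."""
    rows = [
        [series[t - k] for k in range(1, max_lag + 1)]
        for t in range(max_lag, len(series))
    ]
    targets = list(series[max_lag:])
    return rows, targets
```

With a maximum lag of 6, each row would hold the six preceding daily stages, and the first six observations serve only as inputs.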
For decomposing input signals using DWT and MODWT, the decomposition level should be determined beforehand. In this study, the level L = 3 was determined based on Equation (3). Mother wavelets should also be selected ahead of time. Since the accuracy of DWT- and MODWT-based models depends on the mother wavelet, nine mother wavelets were used, including Daublets (d6, d12 and d18), Symmlets (s6, s12 and s18) and Coiflets (c6, c12 and c18). The Daublet, also called Daubechies wavelet, is a collection of orthogonal wavelets with compact support. The Symmlet, also called least-asymmetric wavelet, is a modified version of the Daublet which was proposed to enhance symmetry. The Symmlet is an orthogonal, continuous, compactly supported, but only nearly symmetric wavelet. The Coiflet, also called Coifman wavelet, is a discrete wavelet with compact support which is more symmetric than the Daublet. Figures 7 and 8 show examples of sub-time series decomposed by three-level DWT and MODWT utilizing the c12 mother wavelet, respectively. The sub-time series include detail components (D 1 , D 2 and D 3 ) and an approximation component (A 3 ) based on the c12 mother wavelet.
For the MLP, DWT-MLP and MODWT-MLP models, the number of hidden nodes was determined utilizing a trial-and-error approach as described in Seo et al. [8]. For selecting the optimal number of hidden nodes, the RMSE values of the MLP, DWT-MLP and MODWT-MLP models were estimated by varying the number of hidden nodes from 1 to 2k, where k is the number of input nodes. The optimal number of hidden nodes was determined based on the minimum RMSE. In this study, the output of each node was computed using the logistic sigmoid activation function. The MLP, DWT-MLP and MODWT-MLP models were trained using a backpropagation algorithm. In the algorithm, the connection weights are updated iteratively such that the overall error is decreased. Input and target data were normalized to the interval of (0, 1) for efficient training [21].
For SVM, DWT-SVM and MODWT-SVM models, the most significant step is to determine the optimal hyperparameters including regularization, insensitive loss function and kernel parameters. In this study, the optimal hyperparameters were selected using a GA. For applying the GA optimization, the tuning parameters of the GA should be set in advance. Considering the typical range of tuning parameters [25,26], they were set as follows: population size = 50, number of generations = 100, elite count = 3, crossover rate = 0.8 and mutation rate = 0.1.

Model Performance Assessment
The performance (efficiency and effectiveness) of single models (MLP and SVM), DWT-based models (DWT-MLP and DWT-SVM) and MODWT-based models (MODWT-MLP and MODWT-SVM) was evaluated utilizing performance criteria (efficiency and effectiveness indices). Table 2 summarizes the performance evaluation for the better-performing single, DWT- and MODWT-based models for the overall stage. For example, MODWT-SVM2-c12 denotes the MODWT-based SVM model for Set 2 and the c12 mother wavelet. TSx denotes the threshold statistic for the absolute relative error level of x%.
For the overall stage, the performance of MODWT-SVM models was compared with that of single models. The MODWT-SVM models yielded higher dimensionless indices and slightly lower residual error-based indices than the single models. The AARE values of the MODWT-SVM models were lower than those of the single models. For the absolute relative error (ARE) levels of 0.01%, 0.02%, 0.05%, 0.10% and 0.50%, the TS values of the MODWT-SVM models were higher than those of the single models. These results indicated that the MODWT-SVM models achieved better efficiency and effectiveness than the single models for the overall stage, based on the statistical indices.
The performance of the MODWT-SVM models was compared with that of DWT-based models for the overall stage. The MODWT-SVM models yielded slightly higher dimensionless indices and slightly lower residual error-based indices than the DWT-based models, except for MS4E. The AARE values of the MODWT-SVM models were slightly lower than those of the DWT-based models. For the ARE levels of 0.01%, 0.02%, 0.05%, 0.10% and 0.50%, the TS values of the MODWT-SVM models were mostly higher than those of the DWT-based models. These results demonstrated that the MODWT-SVM models achieved slightly better efficiency and effectiveness than the DWT-based models for the overall stage, based on statistical indices.
For the overall stage, the performance of the MODWT-SVM models was compared with that of the MODWT-MLP models. The MODWT-SVM models yielded slightly higher dimensionless indices and slightly lower residual error-based indices than MODWT-MLP models. The AARE values of the MODWT-SVM models were slightly lower than those of the MODWT-MLP models. For the ARE levels of 0.01%, 0.02%, 0.05%, 0.10% and 0.50%, the MODWT-SVM models yielded higher TS values than the MODWT-MLP models. From these results, it was found that the MODWT-SVM models performed better than the MODWT-MLP models for the overall stage, in terms of model efficiency and effectiveness.
When all the models were compared for the overall stage, the MODWT-SVM models yielded better dimensionless indices and lower residual error-based indices than the other models, except for MS4E. The AARE values of the MODWT-SVM models were lower than those of the other models. The TS values of the MODWT-SVM models were mostly higher than those of the other models. These results demonstrated that the MODWT-SVM models produced better efficiency and effectiveness than the other models for the overall stage. For the overall stage, the MODWT-SVM2-c12 model achieved the best efficiency, and the MODWT-SVM2-c12 and the MODWT-SVM1-c12 models produced the best effectiveness among all the models.
For more specific model comparison, the model performance was evaluated for low, medium and high stages. Table 3 summarizes the performance evaluation for the better-performing single, DWT- and MODWT-based models for the low stage. For the low stage, the performance of the MODWT-SVM models was compared with that of the single models. The MODWT-SVM models yielded higher dimensionless indices and lower residual error-based indices than the single models for the low stage, except for MS4E. The AARE values of the MODWT-SVM models were lower than those of the single models. For the ARE levels of 0.01%, 0.02%, 0.05% and 0.10%, the TS values of the MODWT-SVM models were higher than those of the single models. These results indicated that the MODWT-SVM models achieved better efficiency and effectiveness than the single models for the low stage, based on the statistical indices.
The performance of the MODWT-SVM models was compared with that of the DWT-based models for the low stage. The MODWT-SVM models yielded higher dimensionless indices and lower residual error-based indices than the DWT-based models. The AARE values of the MODWT-SVM models were lower than those of the DWT-based models. For the ARE levels of 0.01%, 0.02%, 0.05% and 0.10%, the MODWT-SVM models yielded higher TS values than the DWT-based models. These results demonstrated that the MODWT-SVM models achieved better efficiency and effectiveness than the DWT-based models for the low stage, based on the statistical indices.
For the low stage, the performance of the MODWT-SVM models was compared with that of the MODWT-MLP models. The MODWT-SVM models yielded higher dimensionless indices and lower residual error-based indices than the MODWT-MLP models. Also, the MODWT-SVM models produced lower AARE values than the MODWT-MLP models. For the ARE levels of 0.01%, 0.02% and 0.05%, the TS values of the MODWT-SVM models were higher than those of the MODWT-MLP models. These results indicated that the MODWT-SVM models achieved better efficiency and effectiveness than the MODWT-MLP models for the low stage, based on the statistical indices.
When all the models were compared for the low stage, the MODWT-SVM models yielded higher dimensionless indices and lower residual error-based indices, except for MS4E. The AARE values of the MODWT-SVM models were lower than those of the other models. The TS values of the MODWT-SVM models were mostly higher than those of the other models. From these results, it was found that the MODWT-SVM models produced better efficiency and effectiveness than the other models for the low stage. Among all the models, the MODWT-SVM2-c12 and MODWT-SVM1-c12 models achieved the best efficiency and effectiveness for the low stage.
Table 4 summarizes the performance evaluation for the single, DWT- and MODWT-based models with higher performance for the medium stage. For the medium stage, the performance of the MODWT-SVM models was compared with that of the single models. The MODWT-SVM models yielded higher dimensionless indices and lower residual error-based indices than the single models. Also, the MODWT-SVM models produced lower AARE and higher TS values for the ARE levels of 0.01%, 0.02%, 0.05% and 0.10%. These results indicated that the MODWT-SVM models achieved better efficiency and effectiveness than the single models for the medium stage.
The performance of the MODWT-SVM models was compared with that of the DWT-based models for the medium stage. The MODWT-SVM models yielded higher dimensionless indices than the DWT-SVM models, but lower dimensionless indices than the DWT-MLP models. Also, the MODWT-SVM models produced lower residual error-based indices than the DWT-SVM models, but higher residual error-based indices than the DWT-MLP models, except for MAE. The AARE values of the MODWT-SVM models were lower than those of the DWT-SVM models. The MODWT-SVM models, except for the MODWT-SVM1-s18 model, yielded lower AARE values than the DWT-MLP models. For the ARE levels of 0.01%, 0.02%, 0.05% and 0.10%, the TS values of the MODWT-SVM models were higher than those of the DWT-SVM models. The MODWT-SVM models, except for the MODWT-SVM1-s18 model, yielded higher TS values than the DWT-MLP models for the ARE levels of 0.01% and 0.02%, whereas the MODWT-SVM models produced lower TS values than the DWT-MLP models for the ARE levels of 0.05% and 0.10%. These results indicated that the DWT-MLP models performed better than the MODWT-SVM models for the medium stage, in terms of model efficiency and effectiveness.
For the medium stage, the performance of the MODWT-SVM models was compared with that of the MODWT-MLP models. The MODWT-SVM models, except for the MODWT-SVM1-s18 model, yielded slightly higher dimensionless indices than the MODWT-MLP models. Also, the MODWT-SVM models, except for the MODWT-SVM1-s18 model, produced slightly lower residual error-based indices than the MODWT-MLP models, except for MS4E. The AARE values of the MODWT-SVM models were lower than those of the MODWT-MLP models. For the ARE levels of 0.01%, 0.02% and 0.05%, the MODWT-SVM models yielded higher TS values than the MODWT-MLP models. These results indicated that the MODWT-SVM models, except for the MODWT-SVM1-s18 model, achieved better efficiency than the MODWT-MLP models for the medium stage. Also, the MODWT-SVM models performed better than the MODWT-MLP models for the medium stage, in terms of model effectiveness.
When all the models were compared for the medium stage, the DWT-MLP models yielded higher dimensionless indices and lower residual error-based indices than the other models, except for the MAE of the MODWT-SVM2-c12 and the MODWT-SVM1-c12 models. The MODWT-SVM2-c12 and MODWT-SVM1-c12 models yielded better effectiveness indices than the other models. From these results, it was found that the DWT-MLP models produced better efficiency than other models, whereas the MODWT-SVM2-c12 and MODWT-SVM1-c12 models achieved better effectiveness than the other models for the medium stage. Among all the models, the DWT-MLP1-d18 model achieved the best efficiency and the MODWT-SVM2-c12 model produced the best effectiveness for the medium stage, based on the statistical indices.
Table 5 summarizes the performance evaluation for the single, DWT- and MODWT-based models with higher performance for the high stage. For the high stage, the performance of the MODWT-SVM models was compared with that of the single models. The MODWT-SVM models yielded higher dimensionless indices and lower residual error-based indices than the single models. The AARE values of the MODWT-SVM models were lower than those of the single models. For the ARE levels of 0.01%, 0.02%, 0.05%, 0.10% and 0.50%, the TS values of the MODWT-SVM models were higher than those of the single models. These results demonstrated that the MODWT-SVM models achieved better efficiency and effectiveness than the single models for the high stage, based on the statistical indices.
The performance of the MODWT-SVM models was compared with that of the DWT-based models for the high stage. Although the MODWT-SVM1-s18 model yielded slightly higher dimensionless indices and slightly lower residual error-based indices than the DWT-MLP models, the MODWT-SVM and the DWT-MLP models produced similar dimensionless indices and residual error-based indices. The AARE values of the MODWT-SVM models were lower than those of the DWT-MLP models. For the ARE levels of 0.01%, 0.02% and 0.05%, the TS values of the MODWT-SVM models were mostly higher than those of the DWT-MLP models. These results indicated that the MODWT-SVM and DWT-MLP models achieved similar efficiency, whereas the MODWT-SVM models produced better effectiveness than the DWT-MLP models for the high stage, based on the statistical indices. The MODWT-SVM models yielded higher dimensionless indices and lower residual error-based indices than the DWT-SVM models. The AARE values of the MODWT-SVM models were lower than those of the DWT-SVM models. For the ARE levels of 0.01%, 0.02%, 0.05% and 0.10%, the TS values of the MODWT-SVM models were higher than those of the DWT-SVM models. These results demonstrated that the MODWT-SVM models achieved better efficiency and effectiveness than the DWT-SVM models for the high stage, based on the statistical indices.
For the high stage, the performance of the MODWT-SVM models was compared with that of the MODWT-MLP models. The MODWT-SVM models yielded higher dimensionless indices and lower residual error-based indices than the MODWT-MLP models. The AARE values of the MODWT-SVM models were lower than those of the MODWT-MLP models. For the ARE levels of 0.01%, 0.02%, 0.05% and 0.10%, the MODWT-SVM models yielded higher TS values than the MODWT-MLP models. These results indicated that the MODWT-SVM models achieved better efficiency and effectiveness than the MODWT-MLP models for the high stage, based on model performance.
When all the models were compared for the high stage, the MODWT-SVM and DWT-MLP models achieved better efficiency and effectiveness than the other models. Among all the models, the MODWT-SVM1-s18 and MODWT-SVM2-c12 models were the best for the high stage, in terms of efficiency and effectiveness.

Graphical Comparison
This study compared the accuracy of single, DWT- and MODWT-based models graphically. The graphical comparison included scatter plots, error time series plots and error boxplots. Figures 9-11 show the scatter plots for the single, DWT- and MODWT-based models during the testing period.
From Figures 9 and 11d-f, the scatter points of the MODWT-SVM models were closer to the y = x lines (blue lines) than those of the single models. The best-fitting lines (red lines) of the MODWT-SVM models were closer to the y = x lines than those of the single models. These results indicated that the MODWT-SVM models were more accurate than the single models. From Figures 10 and 11, the scatter points of the MODWT-SVM and DWT-MLP models were located closer to the y = x lines than those of the DWT-SVM and MODWT-MLP models. The best-fitting lines of the MODWT-SVM and DWT-MLP models were closer to the y = x lines than those of the DWT-SVM and MODWT-MLP models. These results indicated that the MODWT-SVM and DWT-MLP models were more accurate than the DWT-SVM and MODWT-MLP models.
Figures 12-14 show error time series plots and error boxplots for the single, DWT- and MODWT-based models during the testing period. The error is defined as the difference between the estimated and observed river stage time series as follows:
$$\mathrm{Err}_i = S_i^{*} - S_i$$
where $\mathrm{Err}_i$ is the ith error; $S_i^{*}$ is the ith estimated river stage value; and $S_i$ is the ith observed river stage value. The error boxplots summarize the distribution of the error values graphically. From Figures 12 and 13d-f, the errors of the MODWT-SVM models were lower than those of the single models. From Figures 13 and 14, the errors of the DWT-MLP and MODWT-SVM models were lower than those of the MODWT-MLP and DWT-SVM models. From Figures 12-14, it can be seen that the MODWT-SVM models produced lower errors than the other models. These results indicated that the MODWT-SVM models were more accurate than the single models. The DWT-MLP and MODWT-SVM models produced more accurate results than the MODWT-MLP and DWT-SVM models. Also, the MODWT-SVM models were more accurate than the other models, based on the graphical comparison.
Consequently, the MODWT-SVM and DWT-MLP-d18 models were found to produce better performance and accuracy than the other models, based on the performance assessment and graphical comparison. The MODWT-SVM1-c12 model was the optimal model among all the models. These results indicated that the model performance depended on the input combination and mother wavelets, and the MODWT-SVM model using the c12 mother wavelet can improve model performance and accuracy in daily river stage modeling.
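As an illustration of the error definition used in the graphical comparison, the following minimal Python sketch computes the error series (estimated minus observed) and a five-number summary of the kind an error boxplot is built on. The function names and the sample stage values are hypothetical and are not taken from the study; the quartile convention (inclusive interpolation) is also an assumption, since the plotting software used for the figures is not stated.

```python
import statistics


def error_series(estimated, observed):
    """Return Err_i = S*_i - S_i for each time step."""
    if len(estimated) != len(observed):
        raise ValueError("estimated and observed series must have equal length")
    return [s_est - s_obs for s_est, s_obs in zip(estimated, observed)]


def boxplot_summary(errors):
    """Minimum, quartiles and maximum of the error values."""
    # Assumption: inclusive quartile interpolation; other conventions exist.
    q1, median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    return {"min": min(errors), "q1": q1, "median": median,
            "q3": q3, "max": max(errors)}


# Hypothetical river stage values (m), for illustration only.
observed = [1.20, 1.25, 1.40, 1.90, 1.60]
estimated = [1.22, 1.24, 1.43, 1.85, 1.60]

errors = error_series(estimated, observed)
summary = boxplot_summary(errors)
print(summary)
```

A narrower spread between the quartiles, centered near zero, corresponds to the tighter boxplots that the text associates with the more accurate models.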

Conclusions
This study proposes a conjunction model of MODWT, SVM and GA for modeling daily river stages. MODWT was adopted for decomposing an original river stage time series into sub-time series (detail and approximation components). The SVM computed the daily river stages using sub-time series as inputs. The GA was adopted for selecting the optimal hyperparameters of the SVM. The performance of the MODWT-SVM models was compared with that of the single models (MLP3 and SVM2 models), DWT-based models (DWT-MLP and DWT-SVM models) and the MODWT-MLP models. The model performance for the overall stage was assessed based on the statistical indices (efficiency and effectiveness indices) and a graphical comparison. Furthermore, the model performance was assessed more specifically for the low, medium and high stages based on the statistical indices. The main conclusions are summarized as follows:
(1) For the overall stage, the MODWT-SVM models achieve better efficiency and effectiveness based on the statistical indices, and are more accurate than the single models based on the graphical comparison. For the low, medium and high stages, the MODWT-SVM models perform better than the single models, in terms of efficiency and effectiveness. These results indicate that the conjunction of MODWT, SVM and GA can improve the performance of SVM models and outperform single models in daily river stage modeling.
(2) For the overall stage, the MODWT-SVM models achieve better efficiency and effectiveness based on the statistical indices, and are more accurate than the MODWT-MLP and DWT-based models based on the graphical comparison. For the low and high stages, the MODWT-SVM models perform better than the MODWT-MLP and DWT-based models, in terms of efficiency and effectiveness. For the medium stage, the DWT-MLP models outperform the MODWT-SVM models, in terms of the statistical indices. These results demonstrate that the MODWT-based models using the SVM model can improve model performance and accuracy better than those using the MLP model in daily river stage modeling. Also, hybrid models coupling MODWT, SVM and GA can enhance model performance and accuracy in daily river stage modeling as compared with those combined with DWT.
(3) The MODWT-SVM2-c12 model achieves the best efficiency for the overall, low and high stages, based on the efficiency indices; the MODWT-SVM1-c12 model for the low stage; the DWT-MLP1-d18 model for the medium stage; and the MODWT-SVM1-s18 model for the high stage. Also, the MODWT-SVM1-c12 model achieves the best effectiveness for the overall and low stages; the MODWT-SVM2-c12 model for the overall, low, medium and high stages; and the MODWT-SVM1-s18 model for the high stage. These results indicate that the performance of the MODWT-SVM models is dependent on input combination and mother wavelets. Furthermore, the MODWT-SVM model using the c12 mother wavelet can improve model efficiency and effectiveness in daily river stage modeling.
Therefore, the results obtained from this study demonstrate that the conjunction of MODWT, SVM and GA can be an efficient and effective method for modeling daily river stages.
This study investigated the performance of single and hybrid models for a single watershed. To broaden the applicability of the models, future studies could apply the river stage modeling approach proposed in this study to watersheds with different hydrological, geographical and climate conditions.
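The decomposition step that the conclusions refer to can be illustrated with a minimal sketch. The code below is not the implementation used in this study: it swaps in the Haar filter at a single level (the study uses Coiflet, Symmlet and Daubechies mother wavelets), assumes a circular boundary, and uses a hypothetical function name and made-up stage values. It only shows the property that makes MODWT convenient for building SVM inputs: every sub-time series keeps the length of the original series, so each day has one detail and one approximation value.

```python
def modwt_haar_level1(series):
    """Level-1 MODWT coefficients with the Haar filter (circular boundary).

    Assumption: Haar MODWT filters, wavelet [1/2, -1/2] and scaling [1/2, 1/2].
    series[t - 1] wraps around to the last sample when t == 0.
    """
    n = len(series)
    detail = [(series[t] - series[t - 1]) / 2.0 for t in range(n)]
    approx = [(series[t] + series[t - 1]) / 2.0 for t in range(n)]
    return detail, approx


# Hypothetical daily river stage values (m), for illustration only.
stage = [1.0, 1.5, 2.0, 1.5]
detail, approx = modwt_haar_level1(stage)

# Unlike the decimated DWT, no downsampling occurs: both outputs have len(stage).
print(detail)
print(approx)
```

For this Haar level-1 case the two coefficient series add back to the original series sample by sample, which gives a quick sanity check on the decomposition before the sub-time series are passed to a regression model.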

Author Contributions: Youngmin Seo designed this research, reviewed literature, developed models, and prepared this manuscript; Yunyoung Choi reviewed model conceptualization and revised the manuscript; Jeongwoo Choi supervised the research and revised the manuscript.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A. Model Efficiency Indices
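Coefficient of efficiency (CE):
$$\mathrm{CE} = 1 - \frac{\sum_{i=1}^{N} \left(S_i - S_i^{*}\right)^2}{\sum_{i=1}^{N} \left(S_i - \bar{S}\right)^2}$$
Index of agreement (d):
$$d = 1 - \frac{\sum_{i=1}^{N} \left(S_i - S_i^{*}\right)^2}{\sum_{i=1}^{N} \left(\left|S_i^{*} - \bar{S}\right| + \left|S_i - \bar{S}\right|\right)^2}$$
Coefficient of determination ($r^2$):
$$r^2 = \left[\frac{\sum_{i=1}^{N} \left(S_i - \bar{S}\right)\left(S_i^{*} - \bar{S}^{*}\right)}{\sqrt{\sum_{i=1}^{N} \left(S_i - \bar{S}\right)^2 \sum_{i=1}^{N} \left(S_i^{*} - \bar{S}^{*}\right)^2}}\right]^2$$
Root-mean-square error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(S_i^{*} - S_i\right)^2}$$
where $S_i^{*}$ is the ith estimated river stage value, $S_i$ is the ith observed river stage value, $\bar{S}$ is the average of the observed river stage values, $\bar{S}^{*}$ is the average of the estimated river stage values, and $N$ is the data length.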
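The four efficiency indices named in this appendix can be computed with a short Python sketch. It assumes the standard textbook definitions of CE (Nash-Sutcliffe form), the index of agreement, the squared Pearson correlation and RMSE; the function name and the sample values are hypothetical and do not come from the study.

```python
import math


def efficiency_indices(estimated, observed):
    """Return CE, d, r2 and RMSE for paired estimated/observed series."""
    n = len(observed)
    if n == 0 or n != len(estimated):
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n
    pairs = list(zip(estimated, observed))
    sq_err = sum((o - e) ** 2 for e, o in pairs)
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_est = sum((e - mean_est) ** 2 for e in estimated)
    cov = sum((o - mean_obs) * (e - mean_est) for e, o in pairs)
    # Potential error term of the index of agreement.
    potential = sum((abs(e - mean_obs) + abs(o - mean_obs)) ** 2 for e, o in pairs)
    return {
        "CE": 1.0 - sq_err / var_obs,
        "d": 1.0 - sq_err / potential,
        "r2": cov ** 2 / (var_obs * var_est),
        "RMSE": math.sqrt(sq_err / n),
    }


# Hypothetical values (m): the estimate is biased upward by 0.1 m everywhere.
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [1.1, 2.1, 3.1, 4.1]
biased = efficiency_indices(estimated, observed)
perfect = efficiency_indices(observed, observed)
print(biased)
```

In the biased example the coefficient of determination stays at 1 because a constant offset does not affect correlation, while CE, d and RMSE all register the bias. This is one reason several indices are reported together.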

Appendix B. Model Effectiveness Indices
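Average absolute relative error (AARE):
$$\mathrm{AARE} = \frac{1}{N} \sum_{i=1}^{N} \left|\frac{S_i^{*} - S_i}{S_i}\right| \times 100$$
Threshold statistics (TS):
$$\mathrm{TS}_x = \frac{n_x}{N} \times 100$$
where $n_x$ is the total number of estimated river stage data in which the absolute relative error is less than x%.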
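The two effectiveness indices can be sketched in Python as follows. The sketch assumes that the absolute relative error is expressed in percent of the observed value and that TS is the percentage of points whose error falls below the chosen level; function names and sample values are hypothetical.

```python
def absolute_relative_errors(estimated, observed):
    """ARE_i in percent; observed values are assumed to be non-zero."""
    if len(estimated) != len(observed) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    return [abs(e - o) / abs(o) * 100.0 for e, o in zip(estimated, observed)]


def aare(estimated, observed):
    """Average absolute relative error (percent)."""
    ares = absolute_relative_errors(estimated, observed)
    return sum(ares) / len(ares)


def threshold_statistic(estimated, observed, level):
    """Percentage of points whose ARE is below `level` percent."""
    ares = absolute_relative_errors(estimated, observed)
    n_x = sum(1 for a in ares if a < level)
    return n_x / len(ares) * 100.0


# Hypothetical river stage values (m), for illustration only.
observed = [2.0, 4.0, 5.0, 10.0]
estimated = [2.0, 4.001, 5.01, 10.2]

aare_value = aare(estimated, observed)
ts_005 = threshold_statistic(estimated, observed, 0.05)
ts_050 = threshold_statistic(estimated, observed, 0.50)
print(aare_value, ts_005, ts_050)
```

Evaluating TS at several levels, as done in the text for 0.01% to 0.50%, describes how the errors are distributed rather than only their average, which AARE alone cannot show.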