A Novel Runoff Forecasting Model Based on the Decomposition-Integration-Prediction Framework

Runoff forecasting is of great importance for flood mitigation and power generation plan preparation. To explore better applications of time-frequency decomposition technology in runoff forecasting and to improve prediction accuracy, this research developed a runoff forecasting framework named Decomposition-Integration-Prediction (DIP) using a parallel-input neural network, and proposed a novel runoff forecasting model combining Variational Mode Decomposition (VMD), Gated Recurrent Unit (GRU), and the Stochastic Fractal Search (SFS) algorithm under this framework. In this model, the observed runoff series is first decomposed into several sub-series via the VMD method to extract different frequency information. Secondly, the parallel layers in the parallel-input neural network based on GRU are trained to receive the input samples of each sub-component, and their outputs are integrated adaptively through the concatenation layers. Finally, the output of the concatenation layers is treated as the final runoff forecasting result. In this process, the SFS algorithm was adopted to optimize the structure of the neural network. The prediction performance of the proposed model was evaluated using the historical monthly runoff data at the Pingshan and Yichang hydrological stations in the Upper Yangtze River Basin of China, and seven single and decomposition-based hybrid models were developed for comparison. The results show that the proposed model has obvious advantages in overall prediction performance, model training time, and multi-step-ahead prediction compared to the comparative methods, and is a reasonable and more efficient monthly runoff forecasting method based on time series decomposition and neural networks.


Introduction
Flood control and power generation are the two most important operation objectives of reservoirs. Accurate runoff forecasting provides reliable inflow information for the forecast period, helping maximize the flood control benefits of a reservoir; it can also provide effective and exact input conditions for preparing the power generation plans of hydropower stations, guiding their economic and optimized operation [1]. Therefore, accurate runoff forecasting is crucial. However, the formation process of hydrological runoff is affected by rainfall, evaporation, underlying surface conditions, and human activities, so the runoff time series usually shows nonlinear and non-stationary characteristics [2][3][4][5]. In addition, many factors can impact the streamflow from upstream to downstream, such as tributaries, agricultural utilization, and dams; accurate runoff prediction is thus difficult and challenging [6][7][8].
In the past few decades, several runoff forecasting methods and models have been developed, ranging from conceptual models based on physical formation processes to fully data-driven models [9][10][11][12]. Although conceptual models can help people understand the process of runoff formation clearly, they require substantial hydrological and meteorological data, among other inputs.
The remainder of this paper is organized as follows: Section 2 briefly introduces the above-mentioned methods and presents the details of the proposed framework for monthly runoff forecasting. Section 3 describes the selected prediction performance evaluation indicators. Section 4 presents the case studies of monthly runoff forecasting for the proposed method and the contrast methods. Section 5 carries out result analysis and discussion, and the conclusions are provided in Section 6. Abbreviation descriptions for this paper are shown in Table A1.

Variational Mode Decomposition
To address the limitations of EMD regarding sensitivity to noise and sampling, Variational Mode Decomposition (VMD), an adaptive nonstationary signal decomposition method, was proposed by Dragomiretskiy and Zosso [37]. EMD extracts sub-signals with different frequencies by recursive decomposition [46]. In contrast, VMD uses a non-recursive method that casts signal decomposition as a constrained variational problem: it minimizes the sum of the estimated bandwidths of the modes, and uses the Alternate Direction Method of Multipliers (ADMM) algorithm to continuously update each mode and its center frequency, gradually shifting each mode's frequency spectrum to baseband until each mode and its center frequency are extracted. In this process, each mode is assumed to have finite bandwidth around a distinct center frequency. The constrained variational problem can be expressed as follows:

$$\min_{\{u_k\},\{\omega_k\}} \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k = f,$$

where $u_k$ is the $k$th mode, $\omega_k$ is the center frequency of the $k$th mode, $\delta(t)$ is the Dirac distribution, $*$ denotes convolution, and $f$ is the given observed signal.
To render the problem unconstrained, a quadratic penalty factor and a Lagrangian multiplier are introduced, transforming the constrained variational problem into the following unconstrained problem:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k}\left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k} u_k(t) \right\rangle,$$

where $\alpha$ and $\lambda$ are the penalty parameter and Lagrange multiplier, respectively. Parameter $\alpha$ can effectively reduce the interference of Gaussian noise, and $\lambda$ enforces the strictness of the constraint. The original minimization problem is thus converted into a saddle-point problem of the Lagrangian function, which ADMM solves by alternately updating the mode $u_k^{n+1}$, the center frequency $\omega_k^{n+1}$, and the Lagrangian multiplier $\lambda^{n+1}$:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2},$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \, |\hat{u}_k^{n+1}(\omega)|^2 \, d\omega}{\int_0^{\infty} |\hat{u}_k^{n+1}(\omega)|^2 \, d\omega},$$

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left[ \hat{f}(\omega) - \sum_{k} \hat{u}_k^{n+1}(\omega) \right],$$

where $n$ is the number of iterations, $\tau$ is the update step, and $\hat{f}(\omega)$, $\hat{u}_i(\omega)$, $\hat{\lambda}(\omega)$, and $\hat{u}_k^{n+1}(\omega)$ denote the Fourier transforms of $f(t)$, $u_i(t)$, $\lambda(t)$, and $u_k^{n+1}(t)$, respectively.
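The spectral-domain ADMM updates above can be prototyped compactly. The following is a minimal NumPy sketch, not the exact implementation used in this study: boundary mirroring and several refinements of the original algorithm are omitted, the center-frequency integral is approximated by a sum over the positive-frequency bins, and the default τ = 0 disables the multiplier update (relaxing exact reconstruction), all of which are simplifying assumptions.

```python
import numpy as np

def vmd(signal, K=3, alpha=2000.0, tau=0.0, n_iter=500, tol=1e-7):
    """Simplified VMD: alternate Wiener-filter mode updates and
    center-frequency updates in the Fourier domain (ADMM loop)."""
    N = len(signal)
    f_hat = np.fft.fft(signal)
    omega = np.fft.fftfreq(N)                    # normalized frequency grid
    u_hat = np.zeros((K, N), dtype=complex)      # mode spectra
    omega_k = np.linspace(0.05, 0.45, K)         # initial center frequencies
    lam = np.zeros(N, dtype=complex)             # Lagrange multiplier spectrum
    pos = omega >= 0                             # positive-frequency half
    for _ in range(n_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Wiener-filter update of mode k against the residual spectrum
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            u_hat[k] = (residual + lam / 2) / (1 + 2 * alpha * (omega - omega_k[k]) ** 2)
            # center frequency = spectral centroid of the mode's power
            power = np.abs(u_hat[k, pos]) ** 2
            omega_k[k] = (omega[pos] * power).sum() / (power.sum() + 1e-14)
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        denom = (np.abs(u_prev) ** 2).sum() + 1e-14
        if (np.abs(u_hat - u_prev) ** 2).sum() / denom < tol:
            break
    return np.real(np.fft.ifft(u_hat, axis=1)), omega_k
```

On a synthetic two-tone signal, the sketch recovers the two center frequencies, which illustrates how the iteration separates frequency bands before any forecasting takes place.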

Gated Recurrent Unit
Gated Recurrent Unit (GRU) [26,27] is a neural network model first proposed by Cho et al. in 2014 to capture the long-term dependence of time series. The GRU network can be regarded as a very effective variant of the LSTM network: its structure is simpler and it requires less time for model training [28]. Therefore, the GRU model has been widely used in the field of artificial intelligence [27,47], especially in processing time series [48,49]. LSTM introduces three gate operations, the input gate, forget gate, and output gate, to control the input, storage, and output values [50]. In the GRU model, the gate operations are simplified to an update gate and a reset gate. The specific structure of GRU is shown in Figure 1.
A GRU cell is mainly composed of the update gate $z_t$ and the reset gate $r_t$. The update gate controls the degree to which the state information from the previous time step is brought into the current time step: the larger the value of the update gate, the more previous state information is brought in. The reset gate controls how much information from the previous state is written into the current candidate state; more historical information is ignored as the reset gate approaches 0. The calculation equations between the variables are as follows:

$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z),$$
$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r),$$
$$\tilde{h}_t = \tanh(W_h \cdot [r_t \circ h_{t-1}, x_t] + b_h),$$
$$h_t = z_t \circ h_{t-1} + (1 - z_t) \circ \tilde{h}_t,$$

where the subscript $t$ denotes the current time step; $W_z$, $W_r$, and $W_h$ are the weight matrices of the update gate, the reset gate, and the candidate hidden state in the GRU cell, respectively; $b_z$, $b_r$, and $b_h$ are the biases corresponding to the different weight matrices; $\tilde{h}_t$ is the candidate hidden state; $h_t$ is the hidden state; and $\circ$ denotes element-wise multiplication.
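The GRU gate computations described above can be checked with a tiny NumPy forward step. This sketch uses the concatenated-input convention, acting on $[h_{t-1}, x_t]$; the weight shapes and values are illustrative assumptions, not the paper's trained configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, b):
    """One GRU time step: update gate z, reset gate r, candidate state,
    then a convex combination of previous and candidate states."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W["z"] @ hx + b["z"])          # update gate
    r = sigmoid(W["r"] @ hx + b["r"])          # reset gate
    h_cand = np.tanh(W["h"] @ np.concatenate([r * h_prev, x_t]) + b["h"])
    # larger z carries more of the previous state into the current step
    return z * h_prev + (1.0 - z) * h_cand
```

A quick sanity check of the gating behavior: forcing the update gate toward 1 (via a large bias) makes the cell copy its previous hidden state almost unchanged, which matches the role of the update gate described in the text.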

Stochastic Fractal Search Optimization
The Stochastic Fractal Search (SFS) algorithm searches for the optimal solution using the diffusion property of fractals [51][52][53]. The algorithm mainly includes two processes: a diffusion process and an update process. In the diffusion process, a Gaussian distribution is selected as the random-walk mode, and each individual diffuses around its current position to generate new individuals; this step can be regarded as the exploitation phase of the SFS algorithm. For each diffusing individual, the position of each new individual is created through Gaussian walking, and the best individual among all created individuals is retained. The Gaussian walking can be expressed as one of the following two equations:

$$GW_1 = \text{Gaussian}(\mu_{BP}, \sigma) + (\varepsilon \times BP - \varepsilon' \times P_i),$$
$$GW_2 = \text{Gaussian}(\mu_{P}, \sigma),$$

where $\varepsilon$ and $\varepsilon'$ are random numbers in $[0, 1]$, $BP$ is the optimal individual, $P_i$ is the position of the $i$th individual, and $\mu_{BP}$ and $\sigma$ are the Gaussian distribution parameters.

Afterwards, the update process is carried out, which can be considered the exploration stage of the SFS algorithm and consists of two updates. The first update acts on individual components: the chance that a component of $P_i$ changes is determined by sorting the individuals by fitness and computing a rank-based probability, and the modified component is computed as

$$P_i'(j) = P_r(j) - \varepsilon \times (P_t(j) - P_i(j)),$$

where $P_i'$ is the new modified position of $P_i$, and $P_r$ and $P_t$ are randomly selected individuals in the population. The second update process changes the position of an individual by considering the positions of other individuals in the population; this procedure improves the quality of exploration and thus maintains diversity. Before the second update begins, all individuals obtained from the first update are sorted again by fitness rank, and the position of $P_i'$ is then modified as

$$P_i'' = \begin{cases} P_i' - \hat{\varepsilon} \times (P_t' - BP), & \varepsilon' \le 0.5 \\ P_i' + \hat{\varepsilon} \times (P_t' - P_r'), & \varepsilon' > 0.5 \end{cases}$$

where $P_i''$ is the position of the individual after the second update, $P_r'$ and $P_t'$ are two different random points selected from the group, $BP$ is the best point, and $\varepsilon'$ and $\hat{\varepsilon}$ are random numbers within the range $[0, 1]$.
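The diffusion/update structure can be illustrated with a toy minimizer. This is a deliberately simplified sketch, not the exact SFS of [51]: candidates are accepted greedily, the diffusion step size is tied to each point's distance from the best point, and the rank-based selection is one common variant of the first-update rule.

```python
import numpy as np

def sfs_minimize(f, lo, hi, n_pop=30, n_diff=3, n_gen=200, seed=0):
    """Toy Stochastic Fractal Search: Gaussian diffusion (exploitation)
    plus two rank/best-based position updates (exploration)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    dim = lo.size
    P = rng.uniform(lo, hi, size=(n_pop, dim))
    fit = np.array([f(p) for p in P])
    for g in range(1, n_gen + 1):
        best = P[np.argmin(fit)].copy()
        # diffusion: Gaussian walk around each point, step shrinking over time
        sigma = np.abs(np.log(g + 1) / (g + 1) * (P - best)) + 1e-12
        for i in range(n_pop):
            for _ in range(n_diff):
                cand = np.clip(rng.normal(P[i], sigma[i]), lo, hi)
                fc = f(cand)
                if fc < fit[i]:
                    P[i], fit[i] = cand, fc
        # first update: rank-based component-wise recombination
        rank = np.argsort(np.argsort(fit)) / n_pop   # 0 = best fitness
        for i in range(n_pop):
            r, t = rng.integers(0, n_pop, size=2)
            mask = rng.random(dim) < rank[i]          # worse points change more
            cand = np.clip(np.where(mask, P[r] - rng.random() * (P[t] - P[i]), P[i]), lo, hi)
            fc = f(cand)
            if fc < fit[i]:
                P[i], fit[i] = cand, fc
        # second update: move whole positions relative to the best point
        for i in range(n_pop):
            t = rng.integers(0, n_pop)
            cand = np.clip(P[i] - rng.random() * (P[t] - best), lo, hi)
            fc = f(cand)
            if fc < fit[i]:
                P[i], fit[i] = cand, fc
    b = int(np.argmin(fit))
    return P[b], float(fit[b])
```

For hyper-parameter tuning as in this paper, the continuous decision variables would simply be rounded to integer layer and node counts before each fitness evaluation.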

The Novel Runoff Forecasting Model Based on DIP Framework
As a result of climate change and human activities, hydrological time series are often nonlinear and nonstationary [2,54]. To explore better applications of time series decomposition technology in runoff forecasting and effectively enhance the prediction accuracy, we developed a runoff forecasting framework named Decomposition-Integration-Prediction (DIP) using a parallel-input neural network, and proposed a novel runoff forecasting model, VMD-GRU-SFS (DIP), combining VMD, GRU, and SFS under this framework, as shown in Figure 2; the parallel-input neural network structure is illustrated in Figure 3.
In the developed DIP framework, the observed runoff series is first decomposed into several sub-series via a time series decomposition method to extract different frequency information and reduce the difficulty of model prediction. Secondly, the parallel layers in the parallel-input neural network are trained to receive the input samples of each sub-component, and their outputs are integrated adaptively through the concatenation layers. Finally, the output of the concatenation layers is treated as the final runoff forecasting result. In this process, a hyper-parameter optimization algorithm is used to optimize the structure of the neural network to obtain the best prediction performance. Under the DIP framework, VMD, GRU, and SFS are adopted to form the novel hybrid runoff forecasting model VMD-GRU-SFS (DIP), in which VMD decomposes the original runoff series, GRU is used to build the parallel-input neural network, and SFS optimizes the hyper-parameters of the model. It is important to note that when the model is trained, the output samples of the model are the raw observed series, not the decomposed sub-series.

The structure of the constructed parallel-input neural network mainly includes two composite layers. The first are the parallel layers, in which several parallel neural networks receive the input variable samples of each runoff sub-component. The second are the concatenation layers, through which the outputs of the parallel networks are adaptively aggregated to produce the predicted runoff value.
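The parallel-plus-concatenation structure can be sketched as a forward pass. For brevity this sketch uses small dense branches in place of the GRU branches, and all weight shapes are illustrative assumptions; the point is how per-sub-series branches feed one concatenation layer that emits the single runoff forecast.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dip_forward(sub_inputs, branch_W, branch_b, W_cat, b_cat):
    """Forward pass of a parallel-input network: one branch per VMD
    sub-series; branch outputs are concatenated and mapped to a single
    runoff forecast by the concatenation layer."""
    branch_out = [relu(W @ x + b)                       # parallel layers
                  for x, W, b in zip(sub_inputs, branch_W, branch_b)]
    h = np.concatenate(branch_out)                      # concatenation layer input
    return float(W_cat @ h + b_cat)                     # final forecast (scalar)
```

Note that each branch may have a different input length, since each sub-series keeps its own PACF-selected lags; the concatenation layer is what fuses them adaptively during training.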
The overall procedure used by the proposed forecasting method can be described as the following steps.
(1) Data pre-processing. Data cleaning is first applied to the runoff time series if the consistency check is rejected. Next, VMD is used to decompose the runoff data into a set of sub-series with different frequencies; for the selection of the decomposition level, we check whether the last extracted component exhibits central-frequency aliasing. Then, the data in all sub-series are divided into training and testing datasets and normalized to the range [0, 1] as

$$x_{norm} = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}},$$

where $x_{norm}$ and $x_i$ are the normalized and raw values of the $i$th runoff data sample, respectively, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum of the data samples.
(2) Lag period selection. For each sub-series decomposed by VMD, the Partial Autocorrelation Function (PACF), with the lag count set to 36, is used to select the optimal lag periods for generating modeling samples. Assuming $x_k(t)$ is the value to be predicted of the $k$th runoff component, we select the antecedent values whose PACF exceeds the confidence bounds as input variables.
(3) Hyper-parameter optimization. To improve the forecasting performance, SFS is applied to optimize the hyper-parameters of the proposed framework. In this work, we mainly consider the numbers of hidden layers and hidden nodes for the parallel layers and concatenation layers in the parallel-input neural network; the number of hidden layers is initialized from 1 to 3, and the number of hidden nodes ranges from 5 to 100.
(4) Forecasting application. After the third step, the parallel-input neural network has been trained. In forecasting application, the prediction result of the model is de-normalized to obtain the final predicted value.
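Steps (1) and (2) reduce to two small utilities: min-max scaling with saved parameters (so step (4) can de-normalize), and a sliding-window transform that turns a series into supervised samples. This is a generic sketch; in the paper only the PACF-selected lags of each sub-series would be kept as input columns.

```python
import numpy as np

def minmax_normalize(x):
    """Scale a series to [0, 1]; return scaler params for de-normalization."""
    x_min, x_max = float(np.min(x)), float(np.max(x))
    return (x - x_min) / (x_max - x_min), x_min, x_max

def denormalize(x_norm, x_min, x_max):
    """Invert min-max scaling (step 4 of the procedure)."""
    return x_norm * (x_max - x_min) + x_min

def make_samples(series, lags):
    """Build (input, target) pairs: the `lags` antecedent values
    predict the next one."""
    X = np.stack([series[i:i + lags] for i in range(len(series) - lags)])
    y = series[lags:]
    return X, y
```

The training/testing split is then just a slice of `X` and `y` along the time axis, applied before any shuffling so the test period stays strictly after the training period.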

Model Evaluation
In this section, four statistical indicators are employed to evaluate the forecasting performance of the proposed framework: Root Mean Square Error (RMSE), Normalized Root Mean Square Error (NRMSE), Mean Absolute Percentage Error (MAPE), and the Nash-Sutcliffe Efficiency coefficient (NSE). RMSE and NRMSE effectively describe the regression loss; MAPE tests the relative error between the predicted and observed values; NSE measures the consistency between the forecasted and actual runoff processes and can evaluate the stability of the prediction results. Generally, forecast models with larger NSE or smaller RMSE, NRMSE, and MAPE provide better forecasting performance. The four performance metrics are defined as follows:

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},$$
$$\text{NRMSE} = \frac{\text{RMSE}}{y_{\max} - y_{\min}} \times 100\%,$$
$$\text{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|,$$
$$\text{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},$$

where $n$ is the length of the observed runoff series, $\hat{y}_i$ stands for the forecast value, $y_i$ represents the observed value, $\bar{y}$ denotes the average observed runoff, and $y_{\max}$ and $y_{\min}$ represent the maximum and minimum values of the observed runoff series, respectively.
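The four indicators translate directly into code. Expressing NRMSE and MAPE in percent follows the convention used in the reported results; that scaling is an assumption here.

```python
import numpy as np

def rmse(y, yhat):
    """Root mean square error."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def nrmse(y, yhat):
    """RMSE as a percentage of the observed range."""
    y = np.asarray(y, float)
    return 100.0 * rmse(y, yhat) / (y.max() - y.min())

def mape(y, yhat):
    """Mean absolute percentage error."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return 100.0 * float(np.mean(np.abs((y - yhat) / y)))

def nse(y, yhat):
    """Nash-Sutcliffe efficiency; 1 is a perfect forecast."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return 1.0 - float(np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2))
```

A perfect forecast gives RMSE = NRMSE = MAPE = 0 and NSE = 1, so the functions double as a consistency check on each other.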

Study Area and Dataset
In this research, the Upper Yangtze River (UYR) Basin, from the source of the Yangtze River to Yichang, Hubei Province, is considered as the study object. As shown in Figure 4, the basin area is about 1 × 10^6 km² (90°55′–111°52′ E, 24°49′–35°71′ N), approximately 55% of the total area of the Yangtze River Basin [55]. The UYR includes nine provinces, autonomous regions, and municipalities, including Qinghai, Tibet, Sichuan, Yunnan, Chongqing, and Hubei. The topography of the basin is complex and diverse, ranging from the Qinghai-Tibet Plateau to the Sichuan Basin, with altitudes from about 40 to 7143 m above sea level. Influenced by the southeast monsoon, the southwest monsoon, and the Qinghai-Tibet Plateau, the climate in this area is very sensitive and the rainfall distribution is significantly uneven. The flood season usually occurs from May to September and accounts for 78% of the total annual rainfall, while the dry season is affected by water shortage. The impact of this climate diversity has brought many difficulties to the development and utilization of water resources in the region [43].
The monthly runoff data analyzed in this paper are from the Pingshan and Yichang hydrological stations in the UYR Basin, two key control stations on the Upper Yangtze River; their positions are shown in Figure 4. The monthly runoff data cover January 1940 to December 2010 for Pingshan and January 1884 to December 2020 for Yichang, respectively. 80% of the monthly runoff data were used for training and 20% for testing. Within the training dataset, approximately 90% of the data were utilized for model parameter optimization and the remainder for validation. In this work, all data preprocessing and management was performed with the NumPy and Pandas scientific computing packages in Python.

Data Preparation
To reduce the difficulty of prediction, VMD was utilized to decompose the original runoff series. The setting of the decomposition parameter K is crucial to the decomposition result; a value too large or too small directly affects the prediction accuracy. In our work, K was iterated from 2 to 20; mode aliasing appeared at K = 9, so the optimal number of decomposition levels was determined as K = 8 [56]. The frequencies of the generated sub-series varied from low to high, which indicates that the components of the runoff series are quite complicated. After decomposition, the complexity of the forecast greatly decreased due to the more regular data patterns.
Generally, the selection of input variables is significant for the performance of the prediction model, so it is necessary to optimize it. Previous research has established that sequence decomposition may reduce the correlation between sub-components and meteorological factors [57]; therefore, in this study we only used the autocorrelation of historical data to predict runoff. Following the procedure of Section 2.4, PACF is used to determine the optimal lag periods. After calculation, the PACF of each sub-sequence was different. Fully considering the information in the PACF together with domain knowledge from practical application, the input variables for each series of the Yichang station monthly runoff forecasting model are listed in Table 1, where X_{t−n} denotes the nth antecedent variable of the sample to be predicted. The different number of input variables for each sub-series prediction model also reflects the complexity of the runoff sequence composition and the difficulty of forecasting.
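The PACF used for lag selection can be computed with the Durbin-Levinson recursion; in practice a library routine (e.g., statsmodels) would normally be used, so the sketch below is for illustration, and the biased ACF estimator it uses is an assumption.

```python
import numpy as np

def pacf(x, nlags=36):
    """Partial autocorrelation function via the Durbin-Levinson recursion."""
    x = np.asarray(x, float) - np.mean(x)
    n = len(x)
    # biased sample autocorrelations rho_0 .. rho_nlags
    acf = np.array([np.dot(x[:n - k], x[k:]) / np.dot(x, x)
                    for k in range(nlags + 1)])
    p = np.zeros(nlags + 1)
    p[0] = 1.0
    phi_prev = np.zeros(0)                      # phi_{k-1, 1..k-1}
    for k in range(1, nlags + 1):
        num = acf[k] - np.dot(phi_prev, acf[k - 1:0:-1])
        den = 1.0 - np.dot(phi_prev, acf[1:k])
        phi_kk = num / den                      # partial autocorrelation at lag k
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        p[k] = phi_kk
    return p
```

Lags whose PACF magnitude exceeds the approximate 95% confidence bound, about 1.96/sqrt(n), would then be kept as input variables for that sub-series.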

Model Parameters
After data preparation of the monthly runoff series for the two hydrological stations, a proper forecasting model was selected. According to Section 2.4, we built the VMD-GRU-SFS (DIP) runoff forecasting model based on the DIP framework. The model parameters for the Pingshan and Yichang stations were optimized by the SFS algorithm, including the number of hidden layers and hidden nodes of each parallel GRU network and of the concatenation layers. The optimized network parameters are summarized in Table 2, where Adam, an extension of stochastic gradient descent, is selected as the optimizer [58].

Comparative Experiment Development
To verify the rationality and prediction accuracy of the proposed runoff forecasting method, seven forecasting models were developed for comparison: four single models (AR, BP, LSTM, and GRU) and three decomposition-based hybrid models (VMD-LSTM, VMD-GRU, and VMD-GRU-SFS). All models were implemented in Python. For the standard BP, LSTM, and GRU neural network models, the prediction problem is directly transformed into a supervised learning problem, in which an observed runoff series of length N is divided into N − L data samples (where L is the length of the lag period): the input of the model is the L observations before the forecast value, and the output is the forecast target. In general, a three-layer structure is proven able to fit arbitrary nonlinear functions [41], so a three-layer structure was selected for these models. By contrast, for the hybrid methods VMD-LSTM and VMD-GRU, the original runoff series was first divided into several sub-series via VMD; each sub-component was then modeled with a standard LSTM or GRU, respectively, and the results of the individual sub-series were finally aggregated to obtain the forecasting result. In addition, VMD-GRU-SFS combines the SFS method with VMD-GRU to optimize the model parameters, which helps select more suitable structural hyper-parameters. It should be noted that these three hybrid models are based on the DPR prediction framework, in which each sub-series is predicted separately and the predictions are then reconstructed by aggregation.

Result Analysis and Discussion
Based on the descriptions above, the forecast results of the proposed method and the contrast methods, including AR, BP, LSTM, GRU, VMD-LSTM (DPR), VMD-GRU (DPR), and VMD-GRU-SFS (DPR), are discussed in this section. Figures 5 and 6 compare the prediction and residual processes of the eight models in the testing period for the Pingshan and Yichang stations. Figures 7 and 8 show scatter plots of the runoff prediction results in the testing period for the two stations. In general, the prediction accuracy of the decomposition-based hybrid models is higher than that of the single models, and SFS hyper-parameter optimization improves the prediction accuracy further.
Among the single models, AR and BP have poor fitting ability and high prediction error. Compared with AR and BP, the LSTM model has a recurrent neural network structure that better learns the long-term dependence of the runoff series, so its prediction accuracy is improved. The GRU model, a variant of LSTM, simplifies the model structure without losing accuracy, and its prediction accuracy is basically consistent with LSTM. By contrast, the prediction performance of the hybrid models is always better than that of the single models, with better fitting of the peak flows. However, neural network model parameters are difficult to select, which may lead to under-fitting or over-fitting; using the SFS optimization algorithm to optimize the parameters of the GRU model obviously improves its prediction performance. Moreover, it is apparent from these figures that the prediction results of the proposed VMD-GRU-SFS (DIP) method based on the DIP framework are always better than those of the other seven comparison models.
It can be clearly seen from the scatter diagrams of Figures 7 and 8 that the prediction performance of the various models differs markedly. The scatter points of the AR and BP models deviate far from the ideal line, especially at the runoff peaks. The prediction effects of the LSTM and GRU models are similar and improved over the above two models on the whole, but their points are still relatively scattered. In general, the scatter plots show that the decomposition-based hybrid models are significantly better than the single models. However, at the runoff peaks, the points of the VMD-LSTM (DPR) and VMD-GRU (DPR) models deviate from the ideal fitting line, while the hyper-parameter-optimized VMD-GRU-SFS (DPR) model deviates less.
The most striking finding is that, for both hydrological stations, the scatter points of forecasted versus observed data for the proposed model lie very close to the ideal baseline, both overall and in the peak prediction range, which shows that the proposed method can effectively improve the generalization ability of the runoff forecasting model.

Although the predicted-observed runoff comparison hydrograph and scatter diagram can intuitively evaluate the corresponding relationship between the predicted value and the observed data, the statistical indicators can more accurately evaluate the prediction ability of a different model. To decrease the accidental error of neural network model, the statistical indexes were obtained after 10 trainings and the average value was calculated. Tables 3 and 4 show the evaluation results of eight models in Pingshan and Yichang stations during training period and testing period. Figures 9 and 10 show the prediction performance improvement ratio of the proposed model compared with other models for two stations. In summary, the application of time series decomposition algorithm and hyperparameter optimization method can enhance the prediction performance of the runoff forecasting model. Compared with GRU and VMD-GRU (DPR), the NSE of VMD-GRU-SFS (DPR) model in the testing set of both Pingshan and Yichang stations can be increased by more than 7% and 1%. It shows that time series decomposition can effectively reduce the difficulty of prediction by extracting the internal variation characteristics of runoff series, and also demonstrating the importance of hyper-parameter optimization to improve the accuracy of the forecasting model. Furthermore, one can clearly see from the table that the statistical indexes of the proposed runoff forecasting method are optimal in both the training period and the testing period. Taking Yichang station as an example, the RMSE, NRMSE, MAPE, and NSE in the test period are 3145.08, 8.28, 16.2, and 0.878, respectively. As shown in the Figure 10, the predicted statistical indexes during the test period of the proposed model have been improved to varying degrees compared with the other seven models. Compared with the single GRU model, the RMSE and NRMSE decreased by 23.3%, the MAPE decreased by 7.95% and the NSE increased by 10.6%. 
Compared with the VMD-GRU-SFS (DPR) hybrid model, RMSE and NRMSE decreased by 7.88%, MAPE decreased by 15.2%, and NSE increased by 2.57%. The analysis shows that the proposed runoff forecasting method can provide more satisfactory prediction results for runoff forecasting than other methods, and also demonstrates that the DIP prediction framework developed in this paper has better prediction performance than DPR framework for complex hydrological time series. and the observed data, the statistical indicators can more accurately evaluate the prediction ability of a different model. To decrease the accidental error of neural network model, the statistical indexes were obtained after 10 trainings and the average value was calculated. Tables 3 and 4 show the evaluation results of eight models in Pingshan and Yichang stations during training period and testing period. Figures 9 and 10 show the prediction performance improvement ratio of the proposed model compared with other models for two stations.     In summary, the application of time series decomposition algorithm and hyper-parameter optimization method can enhance the prediction performance of the runoff forecasting model. Compared with GRU and VMD-GRU (DPR), the NSE of VMD-GRU-SFS (DPR) model in the testing set of both Pingshan and Yichang stations can be increased by more than 7% and 1%. It shows that time series decomposition can effectively reduce the difficulty of prediction by extracting the internal variation characteristics of runoff series, and also demonstrating the importance of hyper-parameter optimization to improve the accuracy of the forecasting model. Furthermore, one can clearly see from the table that the statistical indexes of the proposed runoff forecasting method are optimal in both the training period and the testing period. 
In addition, one interesting finding is that the training time of our proposed model is significantly less than that of the other VMD-LSTM, VMD-GRU, and VMD-GRU-SFS hybrid models based on the DPR framework. Figure 11 shows the training time of the different runoff forecasting models for the two stations. In general, a single model requires much less training time than a hybrid model because of its simpler structure and because only one runoff series needs to be trained. It should be pointed out that, for the methods with hyper-parameter optimization, the time spent on the hyper-parameter search itself is not included, because that process is very time-consuming.
Here we focus on the comparison of training time between the hybrid models. As shown in Figure 11, the training times of VMD-LSTM (DPR), VMD-GRU (DPR), VMD-GRU-SFS (DPR), and the proposed method at Pingshan and Yichang stations decrease in turn. The VMD-LSTM (DPR) model has the longest total training time, 108 s at Pingshan and 182.5 s at Yichang. Compared with VMD-LSTM (DPR), the training time of the VMD-GRU (DPR) model is reduced because of its more concise structure; this finding is consistent with that of Gao et al. [28]. VMD-GRU-SFS (DPR) adds SFS hyper-parameter optimization to VMD-GRU (DPR), which helps to find the best and most concise model structure, so its training time is also significantly shortened. It is worth noting that the proposed model has the shortest training time among the hybrid models: 64.9 s and 98.3 s at the two stations, a significant reduction compared with the other three hybrid models. Taking Yichang station as an example, the training time of the proposed model is reduced by 46%, 42%, and 37% compared with VMD-LSTM (DPR), VMD-GRU (DPR), and VMD-GRU-SFS (DPR), respectively. Therefore, the proposed method saves substantial model training time and is more practical, which again demonstrates its advantages in runoff forecasting.
The studies above show that the performance of AR, BP, LSTM, GRU, VMD-LSTM (DPR), VMD-GRU (DPR), VMD-GRU-SFS (DPR), and the proposed model differs in single-step-ahead monthly runoff forecasting. The NSE of the proposed VMD-GRU-SFS (DIP) method on the testing set reaches 0.87 at both Pingshan and Yichang stations, and both the RMSE and MAPE are small. Compared with the other models, the performance of the proposed model improves to varying degrees, which shows a satisfactory monthly runoff forecasting ability. In addition to the accuracy of one-step prediction, multi-step monthly runoff forecasting is also of great significance for flood control and disaster reduction of reservoirs and for the compilation of long-term operation plans. We therefore discuss the multi-step-ahead forecasting performance of the proposed method next, taking the proposed method and the VMD-GRU-SFS (DPR) model as examples and comparing their results at prediction steps of 1, 2, and 3. The scatter diagrams of the predicted and observed values of the two models at the different prediction steps are shown in Figure 12, and Table 5 lists the corresponding forecast results. The figure shows that, on the whole, the prediction performance of the proposed method is better than that of the VMD-GRU-SFS (DPR) model at prediction steps of 1, 2, and 3, and its scatter is relatively more concentrated. In particular, in the peak part, the scatter of the proposed method is closer to the baseline when the flow exceeds 30,000 m³.
With the increase of the prediction step, the scattered points become more dispersed and the prediction accuracy of the model decreases gradually; these results appear to stem from the accumulation of prediction errors as the number of steps grows. Table 5 shows that as the prediction steps increase, the RMSE, NRMSE, and MAPE of both the VMD-GRU-SFS (DPR) model and the proposed model gradually increase, while the NSE decreases. For three-step-ahead prediction, the NSE of the VMD-GRU-SFS (DPR) model is 0.765, while the NSE of the proposed model still exceeds 0.8. This indicates the better performance of the proposed model in multi-step prediction and further shows its outstanding advantages in forecasting stability and in extending the forecast period.
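The error accumulation described above arises naturally under iterated (recursive) multi-step forecasting, where each one-step prediction is fed back as the newest input. The following is a minimal sketch assuming a generic one-step model interface; the actual multi-step scheme used by the compared models is not detailed in this section, so this only illustrates the mechanism.

```python
def recursive_forecast(model, history, steps):
    """Iterated multi-step forecasting: each one-step prediction is fed back
    as the newest input, so any one-step error propagates into later steps
    (consistent with the degradation from 1- to 3-step-ahead results)."""
    window = list(history)
    preds = []
    for _ in range(steps):
        y = model(window)            # hypothetical one-step-ahead model
        preds.append(y)
        window = window[1:] + [y]    # slide the input window forward
    return preds
```

For example, a persistence model `lambda w: w[-1]` applied to the history `[1, 2, 3]` yields `[3, 3, 3]` for three steps ahead, with each step conditioned on the previous prediction rather than on an observation.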
Through the above comparative analysis, the prediction results of the proposed VMD-GRU-SFS (DIP) method based on the DIP framework are consistently better than those of the other seven comparison models in overall prediction performance, model training time, and multi-step-ahead prediction. The main reasons for the superiority of the hybrid method are analyzed below. On the one hand, the parallel-input neural network in the proposed method adaptively aggregates the outputs of the parallel GRU models to obtain the prediction result. This model structure has an error self-calibration function, which can effectively avoid error accumulation when integrating the sub-series prediction results, thus achieving better single-step and multi-step prediction performance than the other models. On the other hand, the parallel-input neural network takes the decomposed runoff sub-series samples as input and the original runoff sequence samples as output. Although its structure is more complex than that of a single-input model, it requires only one model training rather than training each sub-series separately, and can therefore save a great deal of training time.
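The input-output arrangement described here, with aligned windows from each decomposed sub-series as input and the next value of the original series as the target, can be sketched as follows. `make_dip_samples` is a hypothetical helper name, and the one-step-ahead sliding-window construction is an assumption about the sampling scheme.

```python
import numpy as np

def make_dip_samples(sub_series, original, window):
    """Build DIP training samples: each input is a set of aligned windows,
    one per VMD sub-series, and the target is the next value of the
    ORIGINAL runoff series (not of the sub-series)."""
    X, y = [], []
    for t in range(window, len(original)):
        X.append([s[t - window:t] for s in sub_series])  # K windows per sample
        y.append(original[t])
    return np.array(X), np.array(y)  # X: (N, K, window), y: (N,)
```

Because the target is the original series, a single training run fits all sub-series jointly, which is the structural reason the DIP framework avoids both separate per-sub-series training and explicit summation of sub-series predictions.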

Conclusions
In this paper, to improve the prediction performance for nonlinear, nonstationary runoff series, we proposed a novel runoff forecasting method based on the Decomposition-Integration-Prediction (DIP) framework. This method first obtained the sub-series samples decomposed from the original runoff series by VMD as the model input. Then, several parallel GRUs of the parallel-input neural network were used to fit the trend, cycle, and seasonal variation characteristics of the runoff subcomponents and to integrate their outputs adaptively through the concatenation layers. Finally, the output of the concatenation layers was treated as the final runoff forecasting result. In this process, we used the SFS algorithm to optimize the hyper-parameters of the model. In addition, using RMSE, NRMSE, MAPE, and NSE as the performance evaluation indices, the prediction results of the proposed method were compared with those of seven prediction models: AR, BP, LSTM, GRU, VMD-LSTM (DPR), VMD-GRU (DPR), and VMD-GRU-SFS (DPR).
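The parallel-GRU-plus-concatenation forward pass summarized above can be sketched with untrained NumPy weights. The hidden size, window length, scalar sub-series inputs, and random initialization are illustrative assumptions, not the optimized configuration found by SFS.

```python
import numpy as np

rng = np.random.default_rng(42)

def gru_branch(seq, p):
    """Minimal GRU over one scalar sub-series window; returns the final hidden state."""
    Wz, Uz, Wr, Ur, Wh, Uh = p
    h = np.zeros(Uz.shape[0])
    for x in seq:
        z = 1.0 / (1.0 + np.exp(-(Wz * x + Uz @ h)))   # update gate
        r = 1.0 / (1.0 + np.exp(-(Wr * x + Ur @ h)))   # reset gate
        h_hat = np.tanh(Wh * x + Uh @ (r * h))         # candidate state
        h = (1.0 - z) * h + z * h_hat
    return h

def make_branch(hidden):
    """Random (untrained) parameters for one parallel GRU branch."""
    v = lambda: 0.1 * rng.standard_normal(hidden)
    m = lambda: 0.1 * rng.standard_normal((hidden, hidden))
    return (v(), m(), v(), m(), v(), m())

def parallel_forward(sub_windows, branches, W_out, b_out):
    """One GRU per sub-series; hidden states are concatenated and mapped
    to a single runoff forecast by the output (dense) layer."""
    feats = np.concatenate([gru_branch(s, p) for s, p in zip(sub_windows, branches)])
    return float(W_out @ feats + b_out)

# Illustration: 3 VMD sub-series, window length 12, hidden size 8 (all assumed).
K, window, hidden = 3, 12, 8
branches = [make_branch(hidden) for _ in range(K)]
W_out = 0.1 * rng.standard_normal(K * hidden)
sub_windows = rng.standard_normal((K, window))
forecast = parallel_forward(sub_windows, branches, W_out, 0.0)
```

In practice a deep learning framework would be used and the weights trained jointly against the original runoff series; the sketch only shows how the concatenation layer lets the network weight each sub-series' contribution adaptively instead of summing per-sub-series forecasts.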
In general, the method proposed in this paper can significantly improve the prediction performance of monthly runoff. Compared with the other seven runoff forecasting models, the key features of the proposed method lie in the following aspects: (1) This method can more effectively identify the internal variation characteristics of runoff series, adaptively aggregate the prediction results of sub-series, and improve the overall prediction accuracy of the model. (2) Compared with other hybrid runoff forecasting models based on DPR framework, this method can significantly shorten the training time of the model and has better practical value. (3) Compared with the VMD-GRU-SFS hybrid model based on DPR, the proposed method has better multi-step prediction ability and can effectively extend the prediction period.
The prediction results show that, among all of the methods mentioned, the proposed method is the best in prediction accuracy, model training time, and multi-step-ahead prediction. Overall, the proposed framework is a useful tool for forecasting nonlinear and nonstationary runoff series and is thus a promising model for runoff forecasting. However, it is worth noting that the main limitation of this study is that only the data series of two stations were used for model verification; future research needs to be extended to stations in other basins to verify the robustness of the model. In addition, the proposed model requires interpolation pretreatment for missing values in the runoff series. In the future, we will apply advanced time-frequency analysis methods such as LSWA to runoff prediction research.

Data Availability Statement:
Some data used during the study are proprietary or confidential in nature and may only be provided with restrictions.

Conflicts of Interest:
The authors declare that they have no known competing financial interest or personal relationship that could have appeared to influence the work reported in this paper.