A Compound Approach for Monthly Runoff Forecasting Based on Multiscale Analysis and Deep Network with Sequential Structure

Abstract: Accurate runoff forecasting is of great significance for the optimization of water resource management and regulation. To address this challenge, a novel compound approach combining time-varying filtering-based empirical mode decomposition (TVFEMD), sample entropy (SE)-based subseries recombination, and a newly developed deep sequential structure incorporating a convolutional neural network (CNN) into a gated recurrent unit network (GRU) is proposed for monthly runoff forecasting. Firstly, considering the volatility of the runoff series caused by complex environmental and human factors, the runoff series is decomposed into a collection of subseries by TVFEMD. The subseries recombination strategy based on SE and the recombination criterion is then employed to merge the subseries possessing approximately the same complexity. Subsequently, the newly developed deep sequential structure based on CNN and GRU (CNNGRU) is applied to predict all the preprocessed subseries. Finally, the predicted values obtained above are aggregated to deduce the ultimate prediction results. To verify the efficiency and effectiveness of the proposed approach, eight relevant contrastive models were applied to the monthly runoff series collected from Baishan reservoir, where the experimental results demonstrated that the evaluation metrics obtained by the proposed model achieved an average index decrease of 44.35% compared with all the contrast models.


Introduction
The implementation of reliable and timely water resource management is of considerable significance to the hydrological system in various aspects, including water distribution, flood control, and disaster relief, while accurate forecasting and the corresponding scientific evaluation of monthly runoff play a vital role in responding to such challenges [1,2]. To effectively handle the runoff forecasting task, a large number of forecasting approaches have been developed in previous studies. Among the entropy-based subseries recombination strategies, for instance, the calculated entropy value of each decomposed series is plotted to observe the tendency of the entropy values, after which the recombination is carried out according to the subjective judgment of the researchers [28,29]. Moreover, by comparing the entropy value of the raw data with the entropy values of all the decomposed series, the subseries possessing larger entropy values than that of the original data are recombined [30]. It can be found that the strategies mentioned above are flawed in terms of subjectivity and incomplete analysis. To this end, an adaptive recombination strategy based on the approximation criterion was proposed by Zhou et al. [31]. The criterion is derived from the difference between the maximum and minimum entropy values. Nevertheless, the strictness of this recombination strategy depends on the denominator of the approximation criterion. Hence, to achieve better forecasting performance, the approximation criterion is appropriately adjusted in this study.
In summary, to construct an accurate monthly runoff forecasting approach balancing efficiency and effectiveness, a novel compound approach integrating TVFEMD, sample entropy (SE)-based subseries recombination, and the newly developed CNNGRU is proposed in this study. To begin with, TVFEMD is applied to decompose the runoff data into a collection of intrinsic mode functions (IMFs). The subseries recombination is then implemented to reduce the number of decomposed series. Subsequently, CNNGRU is adopted to predict each recombined subseries, and all the predicted subseries are summed to deduce the final prediction for the raw runoff series. Furthermore, eight relevant contrastive models and the proposed one are applied to the monthly runoff data collected from the Baishan reservoir to verify the superiority of the proposed approach quantitatively.
The remainder of this study is organized as follows: Section 2 introduces the basis of TVFEMD, SE, CNN, and GRU. Section 3 presents the subseries recombination based on SE, the newly developed deep sequential structure based on CNN and GRU, and the specific framework of the proposed compound approach. Section 4 evaluates the efficiency and effectiveness of all the experiments based on comprehensive evaluation methods. Section 5 presents the conclusions. Additionally, a list of abbreviations is given in the Appendix.

Time-Varying Filtering-Based Empirical Mode Decomposition (TVFEMD)
Empirical mode decomposition (EMD) is an adaptive decomposition method that filters a given signal $x(t)$ into a set of intrinsic mode functions (IMFs) with various frequencies. However, intermittent signals in the sifting process result in modal-aliasing, which restricts the forecasting performance of predictors. To this end, a time-varying filter (TVF) was incorporated into the sifting process by Li et al. [27] to handle the modal-aliasing problem. Additionally, a realignment of the local cut-off frequency $\varphi'_{bis}(t)$ was developed to handle the intermittence problem of EMD, where the detailed procedures of the realignment are expressed as follows [27]:
Step 1: Locate the maximum timing $\{u_i \mid i = 1, 2, \ldots\}$ of the given signal $x(t)$.
Step 4: Access the ultimate local cut-off frequency based on the interpolation achieved among the peaks.
Furthermore, the major procedures of the sifting process based on TVF are summarized as follows: (1) estimate the local cut-off frequency, (2) apply TVF to the signal to obtain the local mean, and (3) terminate the process based on the improved measurement criterion. The detailed mathematical representation can be found in [27].

Sample Entropy (SE)
SE is a modified version of approximate entropy (AE) that provides better performance and more consistent measurements for time series than AE. For the given bounded time series $\{x_i \mid i = 1, 2, \ldots, N\}$, the specific implementation of SE is as follows:
Step 1: Reconstruct the given time series into $m$-dimensional vectors $X_i$.
Step 2: Find the maximum difference of the components between $X_i$ and $X_j$, which is defined as $D_m(X_i, X_j)$.
Step 3: Calculate the ratio $B_i^m(r)$ corresponding to the total number of $D_m(X_i, X_j) < r$ for the $i$-th vector, after which the mean value of $B_i^m(r)$ is defined as $B^m(r)$.
Step 4: Given a new dimension $m + 1$, deduce $B^{m+1}(r)$ by repeating Step 1 to Step 3.
Step 5: For the given bounded time series, the SE value is defined as $se(N, m, r)$, where $N$ is the length of the series, $m$ represents the embedded dimension, and $r$ is the similarity tolerance, set within the range $[0.1\,SD, 0.25\,SD]$ (SD indicates the standard deviation of the time series) [32].
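As an illustration of Steps 1 to 5, a minimal Python sketch of SE is given below. It assumes the widely used formulation $se(N, m, r) = -\ln\left(B^{m+1}(r) / B^m(r)\right)$, which is not reproduced in the text above. The function name `sample_entropy`, the defaults $m = 2$ and $r = 0.2\,SD$, and the use of the same $N - m$ template vectors for both dimensions are assumptions of this sketch rather than settings reported in this study.

```python
import math


def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy se(N, m, r) with r = r_factor * SD (assumed standard form)."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    r = r_factor * sd  # similarity tolerance, inside [0.1 SD, 0.25 SD] by default

    def count_matches(dim):
        # Step 1: reconstruct the series into vectors of length `dim`.
        # The same N - m start indices are used for dim = m and dim = m + 1.
        templates = [series[i:i + dim] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Step 2: maximum component-wise difference D(X_i, X_j).
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                # Step 3: count the pairs that are closer than r.
                if dist < r:
                    count += 1
        return count

    b = count_matches(m)      # proportional to B^m(r)
    a = count_matches(m + 1)  # Step 4: proportional to B^{m+1}(r)
    if a == 0 or b == 0:
        return float("inf")   # no matches: entropy is undefined, reported as inf
    # Step 5: the normalization constants cancel in the ratio.
    return -math.log(a / b)
```

Under these assumptions, a regular series such as a sampled sine wave yields a value close to zero, whereas an irregular series yields a larger value, which is the property exploited when subseries of similar complexity are grouped.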

Convolutional Neural Network (CNN)
CNN is a deep network consisting of multiple convolutional layers, pooling layers, and fully connected layers. Automatic feature extraction from the input matrix is implemented by the filters. Moreover, the weights of a convolutional layer are shared between the neurons, so that the forward propagation of the convolutional layers can be formulated as in [33], where $W_k$ denotes the weights of the $k$-th feature map and $b_k$ represents the bias corresponding to the $k$-th feature map [34]. It is worth mentioning that the rectified linear unit (ReLU) function is generally employed as the activation function for CNN, formulated as $f(x) = \max(0, x)$.
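To make the weight sharing concrete, the following pure-Python sketch applies a one-dimensional convolution with ReLU activation followed by max-pooling. It assumes the common form $h_k = f(W_k * x + b_k)$ for the $k$-th feature map, which is a standard formulation and not necessarily the exact expression used in [33]. The function names and the pooling size are hypothetical choices made for illustration.

```python
def relu(x):
    """Rectified linear unit f(x) = max(0, x)."""
    return max(0.0, x)


def conv1d_relu(signal, kernels, biases):
    """Valid 1-D convolution (cross-correlation form) followed by ReLU.

    Each kernel W_k is slid along the whole signal, so its weights are
    shared by every position of the k-th feature map.
    """
    width = len(kernels[0])
    feature_maps = []
    for w_k, b_k in zip(kernels, biases):
        fmap = [
            relu(sum(w * x for w, x in zip(w_k, signal[t:t + width])) + b_k)
            for t in range(len(signal) - width + 1)
        ]
        feature_maps.append(fmap)
    return feature_maps


def max_pool1d(fmap, size=2):
    """Non-overlapping max-pooling over windows of `size` values."""
    return [max(fmap[i:i + size]) for i in range(0, len(fmap) - size + 1, size)]


# Usage: two feature maps extracted from a short input window.
maps = conv1d_relu([1.0, 2.0, 3.0, 4.0, 5.0],
                   kernels=[[1.0, -1.0], [1.0, 1.0]],
                   biases=[0.0, -4.0])
pooled = [max_pool1d(fm) for fm in maps]
```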

Gated Recurrent Unit Network (GRU)
GRU, possessing gate units and a recursive structure, is developed based on the long short-term memory network (LSTM) and achieves computation reduction by altering the gate operations in LSTM. Specifically, there exist two gated units, namely, the update and reset gates, with which the superior capability of LSTM to capture the dependencies within various scales is inherited by GRU. The structure of a single GRU cell is depicted in Figure 1. Within GRU, irrelevant information is discarded by the reset gate, while the amount of information carried over from the previous state is controlled by the update gate and further affects the current state. In the forward propagation of GRU, $r_t$ and $z_t$ indicate the outputs of the reset and update gates, respectively; $W_r$, $W_z$, and $W_{\tilde{h}}$ represent the weight matrices; and $\sigma(\cdot)$ denotes the sigmoid function.
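A single forward step of a GRU cell can be sketched as follows. The sketch assumes the commonly used bias-free formulation $r_t = \sigma(W_r [h_{t-1}, x_t])$, $z_t = \sigma(W_z [h_{t-1}, x_t])$, $\tilde{h}_t = \tanh(W_{\tilde{h}} [r_t \odot h_{t-1}, x_t])$, and $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$. The exact expressions of this study are not reproduced in the text, and some references swap the roles of $z_t$ and $1 - z_t$, so this convention is an assumption. The helper names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def gru_step(x_t, h_prev, w_r, w_z, w_h):
    """One GRU step (assumed standard form, without bias terms).

    x_t and h_prev are lists; each weight matrix has len(h_prev) rows
    and len(h_prev) + len(x_t) columns.
    """
    concat = h_prev + x_t
    r_t = [sigmoid(v) for v in matvec(w_r, concat)]  # reset gate
    z_t = [sigmoid(v) for v in matvec(w_z, concat)]  # update gate
    # The reset gate scales the previous state before the candidate is formed.
    gated = [r * h for r, h in zip(r_t, h_prev)] + x_t
    h_cand = [math.tanh(v) for v in matvec(w_h, gated)]
    # The update gate blends the previous state with the candidate state.
    return [(1.0 - z) * h + z * c for z, h, c in zip(z_t, h_prev, h_cand)]
```

With all weights set to zero, both gates output 0.5 and the candidate state is zero, so the new state is half of the previous one, which gives a quick sanity check of the gating arithmetic.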

SE-Based Subseries Recombination for TVFEMD
According to previous investigations applying time-frequency decomposition technologies [35,36], such combined methods achieve a significant improvement in forecasting accuracy compared with the corresponding individual ones. Nevertheless, the computation time of the traditional decomposition-based approaches increases significantly with the number of decomposed subseries. Therefore, entropy-based subseries recombination strategies for reducing the number of subseries to be predicted have been widely investigated, where SE is one of the extensively studied entropies [31,32,37]. The implementation of the subseries recombination is generally a subjective process based on the personal experience of each researcher, in which the calculated entropy value of each decomposed series is plotted and the subseries with similar entropy values are recombined [28,29]. Such recombination strategies, based on the observation of researchers, are highly subjective and do not adapt to different datasets. To this end, a recombination criterion considering an averaging of twice the difference between the maximum and minimum entropy values was proposed by Zhou et al. [31], which contributes to adaptively recombining the decomposed series. However, this recombination criterion, employed in the field of vibration tendency prediction, is not suitable for runoff prediction because its looseness leads to inappropriate recombination. Hence, the denominator of the approximation criterion is set as G/1.5 in this study to bind the recombination more strictly, where the recombination criterion is expressed as follows:
$$\text{criterion} = \frac{\max(se_i,\ i = 1, \ldots, G) - \min(se_i,\ i = 1, \ldots, G)}{G / 1.5}$$
where $se_i$ is the SE value of the $i$-th subseries and $G$ is the number of subseries decomposed by TVFEMD.
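The criterion can be computed directly from the SE values, as sketched below. How the criterion is then used to group the subseries is not spelled out in the text, so the grouping rule in `recombine` is an assumption of this sketch: the subseries are sorted by SE value, and a subseries is merged into the current group when its SE value differs from that of the previous member by less than the criterion. All function and parameter names are hypothetical.

```python
def recombination_criterion(se_values, scale=1.5):
    """(max(se) - min(se)) / (G / scale), with G the number of subseries."""
    g = len(se_values)
    return (max(se_values) - min(se_values)) / (g / scale)


def recombine(subseries, se_values, scale=1.5):
    """Group subseries with similar SE values and sum each group.

    ASSUMPTION: neighbours in SE-sorted order are merged while the SE gap
    to the previous member stays below the criterion.
    """
    threshold = recombination_criterion(se_values, scale)
    order = sorted(range(len(se_values)), key=lambda i: se_values[i], reverse=True)
    groups = [[order[0]]]
    for idx in order[1:]:
        previous = groups[-1][-1]
        if abs(se_values[previous] - se_values[idx]) < threshold:
            groups[-1].append(idx)
        else:
            groups.append([idx])
    length = len(subseries[0])
    # A recombined subseries is the point-wise sum of its members.
    merged = [[sum(subseries[i][t] for i in grp) for t in range(length)]
              for grp in groups]
    return groups, merged
```

Because each recombined subseries is a plain sum of its members, the sum of all recombined subseries still equals the sum of the original ones, so the later aggregation of the predictions is unaffected by the grouping.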

CNN Incorporated into GRU with Deep Sequential Structure (CNNGRU)
To combine the feature extraction ability of CNN with the superior time series forecasting performance of GRU, a deep sequential structure incorporating CNN into GRU is developed, in which the CNN component is composed of a convolution layer and a max-pooling layer. A flatten layer is placed after the max-pooling layer to reduce the dimension of the tensors, after which the outputs of the flatten layer are fed into the GRU layer. Subsequently, the final outputs are obtained by appending a fully connected layer to the GRU layer. The specific framework of CNNGRU developed for monthly runoff forecasting is presented in Figure 2.

Specific Procedures of the Proposed Compound Approach
The principal procedures of the proposed compound approach, integrating TVFEMD, the SE-based subseries recombination, and CNNGRU for monthly runoff forecasting, are summarized as follows:
Step 1. Normalize the collected runoff dataset and divide it into training and testing sets.
Step 2. Decompose the normalized runoff data into a series of IMFs with TVFEMD, applying appropriate parameters.
Step 3. Calculate the SE value for each IMF, and adaptively recombine the IMFs based on the recombination criterion.
Step 4. Predict each recombined subseries with CNNGRU.
Step 5. Accumulate all the prediction results of the recombined subseries and implement denormalization to deduce the ultimate prediction results of the collected runoff series.
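The data flow of these procedures can be sketched in a few lines of Python. The sketch assumes min-max normalization and a chronological split with a hypothetical `train_ratio`, neither of which is specified in the text. The `decompose` and `predict` arguments are placeholders for TVFEMD with SE-based recombination and for CNNGRU, respectively, and are not implementations of those methods.

```python
def forecast_pipeline(runoff, decompose, predict, train_ratio=0.8):
    """Normalize, decompose, predict each subseries, aggregate, denormalize.

    decompose(series) -> list of subseries (stand-in for TVFEMD + recombination)
    predict(train, test) -> predictions for the test part (stand-in for CNNGRU)
    """
    # Normalization (ASSUMPTION: min-max scaling over the whole dataset).
    lo, hi = min(runoff), max(runoff)
    scaled = [(x - lo) / (hi - lo) for x in runoff]
    # Chronological division into training and testing sets.
    split = int(len(scaled) * train_ratio)
    # Decomposition and recombination of the normalized series.
    components = decompose(scaled)
    # One predictor per recombined subseries.
    sub_predictions = [predict(c[:split], c[split:]) for c in components]
    # Accumulation of the subseries predictions.
    total = [sum(values) for values in zip(*sub_predictions)]
    # Denormalization back to the original runoff scale.
    return [x * (hi - lo) + lo for x in total]


# Usage with trivial stand-ins: the series is split into two equal halves
# and an "oracle" predictor returns the test part unchanged.
halves = lambda s: [[0.5 * x for x in s], [0.5 * x for x in s]]
oracle = lambda train, test: list(test)
result = forecast_pipeline([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], halves, oracle)
```

With these stand-ins the pipeline returns the last two observations, which confirms that accumulation followed by denormalization inverts the preprocessing when the subseries predictions are exact.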
The entire flow chart of the developed hybrid runoff forecasting approach is shown in

Study Area and Data
Baishan Hydropower Station, located in eastern Jilin Province, China, has an installed capacity of approximately 1.5 GW and a storage capacity of 6.51 billion cubic meters, and plays a vital role in peak shaving and frequency regulation of the power system, as well as in flood control during the flood period. The location of Baishan Hydropower Station is depicted in Figure 4. Moreover, the Baishan reservoir, located on the upper reaches of the Second Songhua River, has possessed incomplete regulation performance for many years, where the annual average runoff amount is 228 m³/s. Hence, it is necessary to construct an accurate monthly runoff forecasting approach to achieve better dispatching effectiveness for the reservoir. Additionally, the annual dispatching cycle of Baishan reservoir generally runs from April of the current year to March of the next year. For this purpose, the monthly runoff series collected from April 1933 to March 2001 is employed to evaluate the performance of the proposed method in this study; the collected dataset contains 816 samples and is illustrated in Figure 5. Meanwhile, the statistical information of the runoff data, such as the mean, maximum, minimum, and standard deviation values, is presented in Table 1. It can be seen from Figure 5 and Table 1

Experimental Description
To adequately assess the forecasting performance of the proposed compound approach, eight relevant models, including SVR, back propagation neural network (BPNN), CNN, GRU, CNNGRU, EMD-CNNGRU, CEEMDAN-CNNGRU, and TVFEMD-CNNGRU, were applied as comparison experiments, among which EMD and CEEMDAN were successively combined with CNNGRU to verify the superior decomposition performance of the employed TVFEMD. Additionally, TVFEMD-CNNGRU was constructed based on TVFEMD and CNNGRU to test the effectiveness of the SE-based subseries recombination employed in the proposed model. It is worth noting that the data preprocessing approaches, including EMD, CEEMDAN, TVFEMD, and SE-based subseries recombination, were implemented with MATLAB (Mathematical computing software, Natick, Massachusetts, USA). In addition, the forecasting modules, including SVR, BPNN, CNN, GRU, and CNNGRU, were developed with Python, where the Adam optimizer was employed to optimize the basic parameters of the neural networks. For the approaches mentioned above, the corresponding parameter settings are listed in Table 2, among which the parameters of SVR were determined by grid search, and the hyperparameters of all the deep networks were obtained by trial and error [38]. Subsequently, the collected runoff series was decomposed into several sets of subsequences by adopting EMD, CEEMDAN, TVFEMD, and TVFEMD combined with subseries recombination with the parameters given in Table 2, where the processed subseries are depicted in Figure 6. Based on the comparisons among Figure 6a-c, modal-aliasing can be observed in the subseries decomposed by EMD and CEEMDAN, while this phenomenon is effectively handled by TVFEMD.
Furthermore, it can be seen from Figure 6c,d that the number of subseries is significantly decreased by incorporating the SE-based subseries recombination into TVFEMD, which contributes to further reducing the computation time of the proposed model. In addition, five commonly employed indicators, namely, root-mean-square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), correlation coefficient ($R^2$), and the Nash-Sutcliffe efficiency coefficient (CE), were adopted to quantitatively evaluate all the experimental models, which helps to interpret the improvements obtained by the proposed approach [39]. They are illustrated in Table 3.
Table 3. Evaluation metrics of the root-mean-square error (RMSE), the mean absolute error (MAE), the mean absolute percentage error (MAPE), and the correlation coefficient ($R^2$).

RMSE: Root-mean-square error (m³/s)
MAE: Mean absolute error (m³/s)
where $Y$ and $\hat{Y}$ indicate the actual value and the predicted value, $\bar{Y}$ is the mean value of the actual values, and $N$ is the number of predicted values. Additionally, the decline ratios of RMSE, MAE, and MAPE, as well as the Diebold-Mariano (DM) test [40], were employed to reveal the differences between the experimental models, where the definition of the decline ratios is given in Table 4. Moreover, for a given significance level α, the null hypothesis $H_0$ states that there is no significant difference in forecasting performance between the proposed approach and a contrastive one, whereas the alternative hypothesis $H_1$ states the opposite. Hence, the DM test is computed from the prediction errors obtained by all the models [40].
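For reference, the error metrics, the decline ratio, and the DM statistic can be sketched as follows. The formulas are the standard textbook definitions and are assumed here, since the entries of Tables 3 and 4 and the DM expression are not reproduced in the text. $R^2$ is omitted because its exact definition cannot be recovered. The DM sketch additionally assumes a squared-error loss, a one-step forecasting horizon, and a positive sign when the proposed model has the smaller loss.

```python
import math


def rmse(y, p):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def mae(y, p):
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def mape(y, p):
    """Mean absolute percentage error in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, p)) / len(y)


def nash_sutcliffe(y, p):
    """CE = 1 - sum((Y - Y_hat)^2) / sum((Y - Y_bar)^2)."""
    y_bar = sum(y) / len(y)
    return 1.0 - sum((a - b) ** 2 for a, b in zip(y, p)) / sum((a - y_bar) ** 2 for a in y)


def decline_ratio(metric_contrast, metric_proposed):
    """Relative decrease (in percent) of a metric achieved by the proposed model."""
    return 100.0 * (metric_contrast - metric_proposed) / metric_contrast


def dm_statistic(err_contrast, err_proposed):
    """Diebold-Mariano statistic (ASSUMPTION: squared-error loss, horizon 1)."""
    d = [ec ** 2 - ep ** 2 for ec, ep in zip(err_contrast, err_proposed)]
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((v - d_bar) ** 2 for v in d) / n
    if var_d == 0:
        raise ValueError("loss differential is constant; statistic undefined")
    # Approximately standard normal under H0 for large n.
    return d_bar / math.sqrt(var_d / n)
```

Under these conventions, a DM value above the two-sided 1% critical value of about 2.58 indicates that the loss of the contrastive model is significantly larger than that of the proposed model.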

Contrastive Analyses
In this section, the quantitative evaluations of all the forecasting approaches are analyzed and discussed. The evaluation indicators RMSE, MAE, MAPE, $R^2$, and CE obtained by all the experimental models are listed in Table 5, where the metrics obtained by the proposed model are marked in boldface. Moreover, the corresponding decline ratios of RMSE, MAE, and MAPE, calculated between the proposed model and the contrast ones, are given in Table 6, together with the results of the DM test for all the contrastive models. Following the evaluation metrics presented in Tables 5 and 6, several hypotheses and conclusions can be drawn as follows:
Furthermore, it can be observed from the results of the DM test in Table 6 that all the values except that of TVFEMD-CNNGRU are larger than 2.5800, which corresponds to the critical value at the 1% significance level. It can therefore be concluded that the proposed model achieves a significant improvement in forecasting accuracy over the other contrastive models and, compared with TVFEMD-CNNGRU, reduces the computational cost without significantly reducing prediction accuracy. On the other hand, the fitting curves obtained by the single models and the combined ones are shown in Figures 7 and 8, respectively, where the actual values are represented by the blue histogram to distinguish them from the predicted values. It can be observed from Figure 7 that the fitting curve of CNNGRU is closer to the actual values and achieves better performance at the peaks, although satisfactory forecasting results cannot be obtained by the single models. In contrast, as illustrated in Figure 8, the combined models applying decomposition approaches possess fitting curves that are closer to the actual values.
Additionally, bar diagrams of RMSE, MAE, MAPE, and $R^2$ are shown in Figures 9-11 in order to visualize the trends of the metrics obtained by the various models, from which the conclusions stated above can be drawn intuitively. The combined models based on decomposition approaches obtain much better forecasting results, while the TVFEMD-based models are superior to all the other models. Moreover, the proposed model, applying TVFEMD, subseries recombination, and CNNGRU, achieves a significant reduction in computational resources without a significant decrease in forecasting accuracy. In addition, the Taylor diagram, which visually demonstrates the performance of the forecasting models in terms of the correlation coefficient, centered root mean square difference (RMSD), and standard deviation [26], is depicted in Figure 12. As shown in Figure 12, the proposed model, represented by the brown dot, is closer to the observation in terms of the above three indicators. Furthermore, based on the visualization of all the metric decline ratios presented in Figure 13, the metrics of the proposed model are generally decreased by more than 40% compared with all the other models except TVFEMD-CNNGRU, thus intuitively supporting Conclusion (3) discussed above.

Conclusions
To implement runoff forecasting with an improved balance between accuracy and efficiency, a novel composite approach coupling TVFEMD, subseries recombination based on SE, and the newly constructed deep network CNNGRU is proposed in this study. TVFEMD was applied to decompose the collected runoff series into a set of subsequences, thus weakening the volatility of the runoff series. The initially decomposed subsequences were recombined according to the SE values of all the subseries and the recombination criterion. Subsequently, CNNGRU was adopted to obtain the prediction values of all the preprocessed subseries, after which the final forecasting results were deduced by accumulating the prediction results obtained above. Additionally, eight relevant contrastive models were applied to the runoff series collected from the Baishan reservoir, where the experimental results demonstrated that (1) TVFEMD possesses superior decomposition performance compared with EMD and CEEMDAN, so that the TVFEMD-based combined models achieved satisfactory forecasting results; (2) the SE-based subseries recombination is conducive to reducing the computation time of the whole model without significantly decreasing prediction accuracy; and (3) the newly developed deep network, namely, CNNGRU, captures the intrinsic characteristics within runoff series well, thus obtaining satisfactory estimation results. In summary, the proposed compound runoff forecasting approach, which balances prediction accuracy and computational efficiency, can be applied as a potent tool to implement better water resource management.
Author Contributions: All the authors contributed to this paper. Conceptualization, S.D.; methodology, software, validation, formal analysis, investigation, writing, C.S.; resources, Z.C.; funding acquisition, J.G. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.