1. Introduction
Hydrological forecasting, especially medium- and long-term runoff forecasting, is an indispensable part of water resources management and the operation of water conservancy projects [1, 2, 3]. Forecasting at different time scales provides valuable information for flood control, power generation, water supply, and drought resistance [4, 5, 6, 7]. Medium- and long-term runoff forecasting, with a forecast period of more than three days and less than one year, refers to the scientific prediction of future runoff before rainfall occurs, based on antecedent hydrometeorological elements. To improve the accuracy and reliability of runoff forecasting models, scholars at home and abroad have carried out numerous application studies on selecting forecasting models and screening forecasting factors.
As far as forecasting models are concerned, cause analysis methods, mathematical statistics methods, and artificial intelligence methods proposed for improving runoff prediction accuracy have received tremendous attention over the past decades [8, 9]. Cause analysis methods focus on the physical formation process of hydrological phenomena and comprehensively consider the influence of atmospheric circulation, meteorological factors, and the underlying surface physical environment on runoff variation. It has been demonstrated that key hydrometeorological phenomena, such as sunspots, El Niño, ocean current oscillations, and plateau snow, are closely related to runoff [10, 11]. Nevertheless, cause analysis methods are mostly used to explore the relationship between atmospheric circulation and hydrological elements; they depend heavily on meteorological data and are difficult to apply widely. Time series methods and regression analysis methods are representative mathematical statistics methods that have been extensively adopted in runoff forecasting [12, 13]. The former focus on single-factor forecasting, while the latter place more emphasis on multi-factor forecasting. Auto-regressive (AR), auto-regressive moving average (ARMA), auto-regressive integrated moving average (ARIMA), and Markov chain methods are popular time series models in hydrological forecasting [14, 15]. In regression analysis methods, the statistical relationship between the forecasting factors and the forecasting object is investigated first, and the key factors with the greatest effect on the forecasting object are then screened from the candidate factors. On the whole, mathematical statistics methods avoid heavy computation by relying on simple principles, but they suffer from low reliability and poor accuracy. It is also worth pointing out that these methods depend on the integrity and reliability of the historical statistical data. Artificial intelligence methods, such as fuzzy mathematics, grey systems, artificial neural networks (ANN), and wavelet analysis, are the most widely applied in current medium- and long-term runoff forecasting. Mahabir et al. (2003) investigated whether a fuzzy expert system was an alternative methodology for predicting potential snowmelt runoff, and found that it was more reliable than regression models in spring runoff forecasts, especially in identifying low or average runoff years [16]. Trivedi et al. (2005) recommended grey system theory as a potentially valuable tool for watersheds with scanty hydrological data, given its suitability for problems with uncertain mechanisms and insufficient information [17]. Compared to other intelligence methods, ANN has a wide range of applications in hydrology because of its good robustness, strong nonlinear mapping, and self-learning ability [18]. Despite the good performance of these intelligence methods, there is still room to improve their prediction accuracy. With ANN in particular, the results differ from one prediction to the next because of the parameter uncertainty of neural network models. As a consequence, radial basis function (RBF) networks, the Elman neural network (ENN), the adaptive neuro-fuzzy inference system (ANFIS), and long short-term memory (LSTM) networks have all been applied as alternative methods to predict runoff [19, 20, 21, 22]. Moreover, to cope with complicated nonstationary runoff time series, empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD), proposed by Huang et al. (2003, 2008), have become new methods for nonstationary and nonlinear time series analysis [23, 24]. In addition, hybrid models have been used in many studies, because they can provide higher accuracy and reliability than a single forecasting model [25]. Zhao et al. (2015) introduced a novel hybrid model combining EEMD and AR for predicting nonstationary time series, and found EEMD-AR suitable for predicting the annual runoff of four hydrologic stations in the upper reaches of the Fenhe River basin [26]. A hybrid support vector machine–quantum behaved particle swarm optimization (SVM–QPSO) model was employed to predict monthly streamflows and was able to deal with complex and highly nonlinear data patterns. The prediction results indicated that the hybrid model performed far better than the original support vector machine (SVM) model [27].
Apart from selecting appropriate forecasting models, identifying the key predictors driving runoff variation is another step towards developing a reliable forecasting model [28]. Rough set (RS), global sensitivity analysis (GSA), factor analysis (FA), principal component analysis (PCA), Gamma test (GT), and forward selection (FS) techniques are used to reduce the number of input variables when identifying forecasting factors [29]. With the development of information theory, mutual information (MI), a measure of the information shared between two random variables, provides an alternative means of screening forecasting factors [30]. To tackle the key problems of generating a minimal inference rule set and selecting complex factors, Zhu et al. (2009) proposed a forecasting model integrating rough set theory with the fuzzy inference technique to improve medium- and long-term forecast precision [31]. Li et al. (2012) identified five principal impact factors by means of GSA and the back-propagation algorithm; these were the pivotal factors affecting runoff during the flood season in the Nenjiang River Basin [29]. Input selection techniques (e.g., GT, FS) were used to reduce the number of input variables fed to an SVM model for predicting monthly streamflow, and the resulting GT-SVM model was superior to the original SVM model [32]. As a multivariate statistical technique for identifying important factors, PCA reduces the number of variables, which provides a better interpretation of variables involving large volumes of information and reduces the computational dimension [33, 34]. Moreover, by forming independent linear combinations of the input variables, this method retains their information with minimum loss [35]. Thus, PCA is acknowledged to be pivotal in reducing the complexity of input variables and has been widely adopted for simplifying forecasting factors.
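The variable-reduction idea behind PCA can be illustrated with a minimal sketch. It is not the procedure used in this study: the function names, the 85% cumulative-variance threshold, and the use of power iteration with deflation (chosen only to avoid external libraries) are all assumptions made for illustration.

```python
import math
import random

def standardize(rows):
    """Center each column and scale it to unit sample variance.
    Assumes no column is constant (a zero standard deviation would fail)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - mu) ** 2 for x in c) / (n - 1))
            for c, mu in zip(cols, means)]
    return [[(x - mu) / sd for x, mu, sd in zip(r, means, stds)] for r in rows]

def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, m = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(m)]
            for i in range(m)]

def leading_eigenpair(a, iters=1000, seed=0):
    """Largest eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    m = len(a)
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in range(m)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-15:  # nothing left to extract
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * a[i][j] * v[j] for i in range(m) for j in range(m))
    return lam, v

def screen_components(rows, threshold=0.85):
    """Keep leading components until their cumulative share of the
    total variance reaches `threshold` (hypothetical default)."""
    a = correlation_matrix(standardize(rows))
    m = len(a)
    total = sum(a[i][i] for i in range(m))
    kept, cumulative = [], 0.0
    for _ in range(m):
        lam, v = leading_eigenpair(a)
        kept.append((lam, v))
        cumulative += lam / total
        if cumulative >= threshold:
            break
        # Deflation: remove the component just found before searching again.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return kept

# Toy data: the second candidate factor duplicates the first one.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 10]
x3 = [5, 3, 6, 2, 4]
components = screen_components(list(zip(x1, x2, x3)))
```

In the toy data, the three candidate factors collapse to two retained components because the second factor carries no information beyond the first.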
Nevertheless, relying on a single forecasting model, such as multiple linear regression (MLR), the back propagation neural network (BPNN), the Elman neural network (ENN), or particle swarm optimization-support vector machine for regression (PSO-SVR), to predict annual runoff is not robust, because runoff time series tend to be nonlinear, nonstationary, and even chaotic. In view of this, multi-model information fusion technology and residual error correction methods are introduced to obtain more accurate annual runoff forecasts by taking advantage of the strengths of different forecasting methods [36, 37]. The main objective of this paper is to develop a modified coupling forecasting model to predict annual runoff time series. Firstly, key forecasting factors are screened from two conventional factors (i.e., rainfall and runoff) and 130 atmospheric circulation indexes with the help of PCA. Then, annual runoff time series are predicted with the MLR, BPNN, ENN, and PSO-SVR models, respectively. Subsequently, a coupling model is constructed to predict annual runoff using the multi-model information fusion technique. Finally, the residual error correction method is employed to further correct the annual runoff time series in the validation period.
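As a rough illustration of what combining several single-model forecasts can look like, the sketch below weights each model by the inverse of its mean squared error in the calibration period. This weighting rule is an assumption chosen for simplicity and is not the fusion scheme developed in this paper; the function names, model labels, and numbers are hypothetical.

```python
def inverse_error_weights(observed, model_predictions, eps=1e-12):
    """Weights proportional to 1 / MSE of each model on the calibration data.
    `model_predictions` maps a model name to its list of predictions.
    `eps` guards against division by zero for a perfect model."""
    n = len(observed)
    inverse = {}
    for name, preds in model_predictions.items():
        mse = sum((p - o) ** 2 for o, p in zip(observed, preds)) / n
        inverse[name] = 1.0 / (mse + eps)
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}

def fuse(model_predictions, weights):
    """Weighted average of the model predictions, one value per time step."""
    names = list(weights)
    length = len(model_predictions[names[0]])
    return [sum(weights[k] * model_predictions[k][t] for k in names)
            for t in range(length)]

# Hypothetical calibration data for two models.
observed = [10.0, 20.0]
predictions = {"model_a": [11.0, 21.0], "model_b": [12.0, 22.0]}
weights = inverse_error_weights(observed, predictions)
fused = fuse(predictions, weights)
```

The model with the smaller calibration error receives the larger weight, so the fused series stays closer to it.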
5. Conclusions and Discussions
To improve the forecasting accuracy of annual runoff time series, the principal component analysis (PCA) method is adopted to screen forecasting factors from rainfall, runoff, and 130 monitoring indexes. A modified coupling runoff forecasting model is then proposed based on multiple linear regression (MLR), the back propagation neural network (BPNN), the Elman neural network (ENN), and particle swarm optimization-support vector machine for regression (PSO-SVR), using multi-model information fusion and residual error correction. The main conclusions of this study are as follows:
Firstly, seven principal components are screened as key forecasting factors from the Eastern Pacific Subtropical High Northern Boundary Position Index (in June last year), the Pacific Subtropical High Northern Boundary Position Index (in June last year), the Atlantic Meridional Mode SST Index (in July last year), and other monitoring indexes, as well as annual rainfall.
Then, the BPNN, ENN, and PSO-SVR models predict annual runoff better than the MLR model. Among the single forecasting models, the PSO-SVR model performs best in the validation period, while the ENN model performs best in the calibration period.
Subsequently, from the perspective of hydrological year types (i.e., wet year, normal year, dry year), a coupling model is proposed by means of multi-model parameter equations, taking advantage of the strengths of three forecasting models (i.e., BPNN, ENN, PSO-SVR). In the calibration period, the MARE, RMSE, QR, and NS of the coupling model are 2.92%, 42.13, 100%, and 0.980, while those of the ENN model are 4.63%, 73.64, 97.44%, and 0.941. In the validation period, the MARE, RMSE, QR, and NS of the coupling model are 8.06%, 157.90, 84.62%, and 0.642, while those of the PSO-SVR model are 15.73%, 202.50, 84.62%, and 0.411.
Finally, the residual error correction technique is applied to correct the runoff sequences predicted by the coupling model in the validation period. Compared to the coupling model, the modified coupling model has a smaller MARE, a smaller RMSE, and a larger NS.
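For readers who wish to reproduce comparisons of this kind, the four evaluation indexes can be sketched as follows. The formulas are the commonly used definitions and are assumed here rather than taken from this section; in particular, the 20% relative-error tolerance in the qualification rate (QR) is an assumption, as are the function names and the sample values.

```python
import math

def mare(observed, predicted):
    """Mean absolute relative error, in percent."""
    errors = [abs(p - o) / abs(o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(errors) / len(errors)

def rmse(observed, predicted):
    """Root mean square error, in the units of the runoff series."""
    squares = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squares) / len(squares))

def qualification_rate(observed, predicted, tolerance=0.20):
    """Percentage of forecasts whose relative error is within `tolerance`
    (the 20% default is an assumed threshold)."""
    qualified = sum(1 for o, p in zip(observed, predicted)
                    if abs(p - o) / abs(o) <= tolerance)
    return 100.0 * qualified / len(observed)

def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance

# Hypothetical observed and predicted annual runoff values.
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
scores = (mare(observed, predicted), rmse(observed, predicted),
          qualification_rate(observed, predicted),
          nash_sutcliffe(observed, predicted))
```

Lower MARE and RMSE and higher QR and NS indicate a better forecast, which is the sense in which the models above are ranked.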
In conclusion, the modified coupling method proposed in this paper can be applied to the Ganjiang River Basin to improve the prediction of annual runoff sequences. It may also help improve runoff forecasting in other similar river basins, especially through multi-model information fusion techniques that combine the advantages of different forecasting models, which is an important innovation of this paper. In addition, the residual error correction method in the modified coupling model, which is based on the least-square sum of errors, is another feature of this study that further improves the predicted annual runoff in the validation period. Nevertheless, there is still room for improvement. For example, the number of nodes in the input layer and the hidden layer is determined by trial and error rather than by an optimization method, which may reduce computational efficiency. Using optimization algorithms to determine the number of nodes and the weights of neural networks may be a direction for future development. Furthermore, the least-square method is not always a good correction strategy when there are outliers in the predicted results, that is, when the error between the predicted and observed values of the coupled model is large. The mean absolute error method and the smoothed mean absolute error method might be alternative options for data fitting. Therefore, making full use of the advantages of different models to achieve an optimal coupled model remains a challenge in runoff forecasting and will be the focus of future research.
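The sensitivity of a least-squares correction to outliers can be seen in a deliberately simplified case, in which the correction is a single constant added to every prediction. This is an assumed simplification, not the residual error correction used in this paper. For a constant shift, minimizing the sum of squared residuals gives the mean residual, whereas minimizing the sum of absolute residuals gives the median residual. The function names and numbers below are hypothetical.

```python
from statistics import mean, median

def constant_shift(observed, predicted, loss="squared"):
    """Constant correction fitted on calibration residuals.
    'squared'  -> mean residual (least-squares solution)
    'absolute' -> median residual (least-absolute-error solution)"""
    residuals = [o - p for o, p in zip(observed, predicted)]
    if loss == "squared":
        return mean(residuals)
    if loss == "absolute":
        return median(residuals)
    raise ValueError("loss must be 'squared' or 'absolute'")

def apply_shift(predicted, shift):
    """Add the fitted correction to each prediction."""
    return [p + shift for p in predicted]

# Hypothetical calibration data; the last year is an outlier.
observed = [10, 12, 11, 13, 50]
predicted = [9, 11, 10, 12, 10]
shift_squared = constant_shift(observed, predicted, "squared")
shift_absolute = constant_shift(observed, predicted, "absolute")
corrected = apply_shift([20, 21], shift_absolute)
```

A single outlying year pulls the least-squares shift far from the typical residual, while the absolute-error shift is unaffected, which is the motivation for considering absolute-error-based alternatives.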