A Comparison of BPNN, GMDH, and ARIMA for Monthly Rainfall Forecasting Based on Wavelet Packet Decomposition

Accurate rainfall forecasting in watersheds is of indispensable importance for predicting streamflow and flash floods. This paper investigates the accuracy of several forecasting technologies based on Wavelet Packet Decomposition (WPD) in monthly rainfall forecasting. First, WPD decomposes the observed monthly rainfall data into several subcomponents. Then, three data-based models, namely Back-propagation Neural Network (BPNN) model, group method of data handing (GMDH) model, and autoregressive integrated moving average (ARIMA) model, are utilized to complete the prediction of the decomposed monthly rainfall series, respectively. Finally, the ensemble prediction result of the model is formulated by summing the outputs of all submodules. Meanwhile, these six models are employed for benchmark comparison to study the prediction performance of these conjunction methods, which are BPNN, WPD-BPNN, GMDH, WPD-GMDH, ARIMA, and WPD-ARIMA models. The paper takes monthly data from Luoning and Zuoyu stations in Luoyang city of China as the case study. The performance of these conjunction methods is tested by four quantitative indexes. Results show that WPD can efficiently improve the forecasting accuracy and the proposed WPD-BPNN model can achieve better prediction results. It is concluded that the hybrid forecast model is a very efficient tool to improve the accuracy of midand long-term rainfall forecasting.


Introduction
Hydrological time series forecasting is essential for a variety of real-world managements or operation of water resources systems [1,2]. Precipitation is affected by many factors such as atmospheric circulation, topography, climate change, and human activities. The improvement of precipitation prediction has received a lot of attention across the world and many models have been constructed to improve the hydrological process simulation and prediction accuracy [3][4][5].
These models can fall into two categories: knowledge-based models and data-based models [6]. Knowledge-based model is a numerical simulation technology that describes natural phenomena on the basis of an internal physical mechanism of the system [7]. However, the lack of multisource information and optimization complexity of computation parameters limit the generalization of physical-based models [8]. In contrast, databased models can obtain satisfying results by using historical data without involving the Most common decomposition approaches perform well only when the input variables meet certain conditions. For example, EMD may suffer from mode mixing due to intermittent signal [45], and this effect is important to hydrological applications. The stationarity of time series has a great influence on the accuracy of position in the domain identified by FT method [46]. Nevertheless, hydrological time series are non-stationary, which means that statistical properties will fluctuate over time [47]. In recent years, researchers have paid great attention in WPD. The main idea of WPD method is using multiple filters to decompose the original signal into more linear sub-signals with different frequency characteristics, which can be regarded as an improved version of the wavelet decomposition (WD). In discrete wavelet transform (DWT), when performing next layer decomposition, only approximate coefficients obtained from the upper layer can pass through the filter [48]. However, when the WPD method performs the next level of decomposition, both the low-frequency sequence and high-frequency sequence can pass the filter [49], and the total number of coefficients is still the same without redundancy. Therefore, WPD can extract the features of the original signal more comprehensively, which not only provides a wide range of possibilities for signal analysis but also allows the best matching analysis of the signal. Meanwhile, compared with DWT, the decomposition structure of WPD provides more opportunities to improve computational efficiency [50]. Therefore, WPD is preferred in this paper in consideration of the complex nonlinearity and non-stationary characteristics of hydrologic time series.
The purpose of this paper is to investigate the accuracy of ARIMA, GMDH, and BPNN models based on WPD in monthly rainfall forecasting. Most former research often improve the accuracy of prediction models by optimizing model parameters using optimization algorithms, and the improvement effect of this method is often not obvious. In this paper, the data preprocessing method is adopted to improve the accuracy of forecasting models, which can attain more linear sub-series and significantly reduce the difficulty of prediction. Firstly, we use WPD to decompose original monthly rainfall series into a series of sub-series with different frequencies and spatiotemporal resolutions. Then, the subseries decomposed by WPD are used as input data of ARIMA, GMDH, and BPNN to train for prediction. Finally, the prediction results of each hybrid model are obtained by linearly accumulating the outputs of each submodule.
The paper is arranged as follows: Section 2 introduces the basic theory principles of methods and evaluations indices. The forecasting experiments and discussion are presented in Section 3. Finally, Section 4 concludes the paper.

Study Region
Two hydrological stations located in Yiluo River Basin on the south bank of the middle stream of Yellow River are considered as the case study. Yiluo River is an important first-level tributary of Yellow River and one of the main sources of floods in the lower reaches of Yellow River, with a drainage area of 18,881 km 2 . The mainstream Luo River is 446.9 km long, and the tributary Yi River is 264.8 km long. Luoning and Zuoyu Stations are located in the middle stream of Luo River and the middle and upper stream of Yi River, respectively. The average annual rainfall of the two stations are 635.2 mm and 834.3 mm, respectively. The inter-seasonal fluctuations of rainfall in two stations are very strong. For Luoning station, the average annual rainfall in December and January are 7.9 mm and 7.5 mm, respectively, and the average annual rainfall in July and August are 115.6 mm and 96.6 mm, respectively, indicating high difficulty of modeling. The location of the study area is shown in Figure 1.

Data Sets and Pre-Processing
Monthly rainfall data from two stations are used to investigate the accuracy of several prediction methods. Table 1 shows statistical parameters of monthly rainfall data at Luoning and Zuoyu stations. It can be observed that the original data shows obvious standard deviation, indicating a high difficulty of modeling. Figures 2 and 3 present rainfall data for the two stations, where the data run from 1980 to 2016. In this study, data from 1980 to 2013 are used for training and the final three years are utilized for testing.   WPD is used to decompose two observed monthly rainfall series into a series of subseries. The data of all series are divided into training and testing datasets that are normalized to a range of [0, 1] as where ′ and are the normalized and the observed value of the i-th data sample, respectively.

ARIMA Model
The ARIMA model proposed by Box and Jenkins [15] has been extensively utilized for analyzing and forecasting hydrologic time series [51]. The principle of ARIMA is to use historical times series to find the forecasting noise, so that the data can be processed smoothly, thus solving the random disturbance problem of the series [52]. ARIMA model construction includes six steps: data acquisition, data preprocessing, model identification, model order determination, parameter estimation, and model verification. Two monthly series collected from China are taken as the test cases. Data preprocessing is the test of stationarity of time series. Recently, ACF (autocorrelation function) and PACF (partial autocorrelation function) are generally adopted to test the stationarity of data. In this paper, the Box-Jenkins method is used for model identification, and Bayes information criteria (BIC) method is used for model order determination. In ARIMA (p, d, q), p represents the number of autoregressive terms, q is the number of moving average terms, and d is the order of differential.
AR model of order p, which is written as () AR p , can be expressed as follows: The () MA q model is: Thus, the expression of ARMA (p, q) is defined below as: The ARIMA model is obtained by the d-order difference of the ARMA model. Therefore, the ARIMA (p, d, q) model is:  [53], is a typical multilayer ANN on the basis of error back propagation. BPNN uses the slope reduction method to find the point(s) with minimum error [54]. These three layers, that is, input layer, hidden layer, and output layer, are employed in BPNN (as shown in Figure 4). The signal is input into the network by the input layer and output by the output layer. BPNN adds several layers (one or more layers) of neurons between the input layer and the output layer. These neurons are called hidden layer neurons. They have no direct contact with the outside world, but the change of their state can affect the relationship between input and output. A conventional three-layer BPNN is used to establish the prediction model of monthly precipitation series in this paper. Tansigmoid is the transfer function between the output layer and hidden layer, and the nonlinear Levenberg-Marquardt (LM) algorithm is the training function of BPNN.    The mathematical principle of BPNN model is as follows: where xj is input neuron and j ∈ (0, m), m is the number of input neurons, wij is weight of the ith neuron in the input layer corresponding to the th j neuron in the hidden layer, j  is bias-related weight of hidden neurons, yi is input of the hidden layer node ( , and n is the number of neurons in the hidden layer. Tan-sigmoid is the transfer function between the layer output and the hidden layer, and its form is as follows: The output layer is estimated by the following equation: Among them, gk and O represent input and output values of the output layer, respectively.
The formulas above are the principles of the feedforward propagation mode of the BPNN model. In the process of cyclic simulation, errors generated by the system are collected and returned to the output value. By adjusting the weights and thresholds of neurons, network parameters corresponding to the minimum error are determined to generate an ANN system which can simulate the original problem.

GMDH Model
GMDH was developed by Ivakhnenko [55] as a self-organizing approach, which can be applied for multivariate analysis and modeling of complex systems. GMDH has been used to deal with problems of high-order polynomial regression, especially modeling and classification of systems [56]. An important feature of the GMDH method is that external information (i.e., information and data not used in model construction and parameter estimation) is used in modeling, the data of training period is used for modeling, and the information of testing period is only used to select the optimal complexity model. Input and output variables of GMDH are connected by a complex Volterra function in the following form [57]: where x denotes the input variable of system, i s is the weight, ŷ is output variable, and n is the number of input variables. Many applications in the quadratic form with only two variables are termed partial descriptions, and use the following form to predict output:  (12) where N is the sample size of the training set. The GMDH model adopts the principle of the classic neural network whose signal propagates forward through network nodes. After the weight has been computed, optimal transfer function of the node is obtained and then its output is passed to the next layer of nodes. As shown in Figure 5, the structure of GMDH network is constantly changing during the training process. GMDH will select the input variables that affect the prediction, which means that the connection between neurons in the network is not fixed, but is selected during training to optimize the network structure; The number of layers in the network is also automatically selected to produce maximum accuracy and avoid over fitting. Solid neurons in each layer are selected neurons, and hollow ones represent unselected neurons.

Wavelet Packet Decomposition (WPD)
Mallat [58] proposed a Wavelet Representation (WR) theory to compute and interpret multiresolution representation by decomposing the original signal utilizing orthogonal wavelets. WPD reduces the noise of signal by decomposing the signal into different frequencies, which can be regarded as a special WR. In the procedure of orthogonal wavelet decomposition, the signal is decomposed into approximate coefficients and detail coefficients after passing through multiple filters. When performing the next layer decomposition, the upper low-frequency series and high-frequency series are split into two components, and so on. Wavelet function ψ(t) can be defined as: The form of SWT (successive wavelet transform) of x(t) is: where () t  represents the mother wavelet and *  denotes its complex conjugate, are the scale expansion parameter and time translation parameter, respectively. In engineering applications, input signal is usually discrete. Let  , j and n denote the frequency localization and time localization. The DWT form of () xt is shown in the following equation: Normally, we can set 0 2 a = and 0 1 b = , and this case is the most efficient for practical applications [58]. Therefore Equation (14) becomes the binary orthogonal wavelet transform: Unlike DWT, WPD passes more filters, which decompose the signal using both highfrequency components and low-frequency components: is the scaling function or the approximation coefficients, and j () t  is wavelet function (also termed detail coefficient). The two functions correspond to two finite pulse filters, namely, low-pass filter (LPF) () hn and high-pass filter (HPF) () gn . Hence, the equation of orthogonal wavelet packet is: where () hn and () gn are subject to the following condition: The wavelet packet function is written by: The wavelet packet coefficients can be computed by: Figure 6 illustrates the binary tree of a three-layer WPD. The original signal is shown by x and each node corresponds to a frequency band. LPF and HPF represent low-pass filter and high-pass filter, respectively. The original signal is decomposed into eight subsequences by a three-level WPD. AAA3 and DDD3 represent the lowest frequency and highest frequency, respectively.

Evaluation Indices
To evaluate the forecast capacity of different models, four generally adopted standard statistical metrics are used in this study to estimate the global and local errors of models. They are namely RMSE (root mean-squared error) [59], MAE (mean absolute error) [60], R (coefficient of correlation), and NSEC (the Nash-Sutcliffe efficiency coefficient) [61]. RMSE is sensitive even to small errors, which can size the model performance for high rainfall values. However, MAE is suitable for measuring the goodness of fit of model in cases of moderate precipitation. R sizes the degree of collinearity criterion of two variables. NSEC is a widely used index to evaluate the performance measurement of hydrological models. The following formulas are used for computing these parameters: Qi are the observed and predicted rainfall, respectively, x Q and y Q represent their average values, and n is the total number of input samples.

Hybrid Forecasting Models
This study investigates the accuracy of ARIMA, BPNN, and GMDH models based on WPD in monthly rainfall forecasting. The framework of hybrid models is showed in Figure 7. It can be summarized from Figure 4 that the main steps of the hybrid model prediction architecture are: Step 1: Observed monthly rainfall series are decomposed into eight subsequences with different frequencies and spatiotemporal resolutions, four low-frequency series, and four high-frequency series using WPD.
Step 2: In this study, ACF and PACF are employed to select the number of input variables for the model, and then set values of basic model parameters.
Step 3: ARIMA, BPNN, and GMDH models are used as forecasting tools to model and predict each decomposed sub-sequence separately.
Step 4: Finally, the ensemble monthly rainfall forecasting result of model is formulated by summing the outputs of all submodules.
To sum up, the hybrid WPD-ARIMA, WPD-BPNN, and WPD-GMDH forecasting models use the idea of "decomposition and ensemble". The paper takes 35-year monthly rainfall data from Luoning and Zuoyu stations in Luoyang, China as the test cases.

Decomposition Results Using WPD and Input Variables Determination
The original monthly rainfall time series are decomposed into eight subsequences with different frequencies and amplitudes using the WPD method. The frequency characteristics of each subsequence are different, and each sub-series plays a different role in the original dataset. The results of WPD of the original monthly rainfall time series data at level 3 are shown in Figures 8 and 9.
Generally, it is very important to set an appropriate number of input variables for databased prediction models because it is closely related to the characteristics of system to be modeled [62]. In this paper, ACF and PACF are selected as the potential indicators for determining the appropriate input variable. ACF and PACF are normally utilized to pre-determine the sequence of the autoregressive process and modeling of time series [63]. Figures 10 and 11 show ACF and PACF values of the original precipitation series for Luoning and Zuoyu stations, whilst the values of ACF and PACF for all decomposed subseries are not presented here. Referring to ACF and PACF values of the series and influencing factors of precipitation, Table 2 lists input variables of the original series and their subsequences at Luoning and Zuoyu stations. Among them, tp q − represents the th variable before the target output variable.

Model Development
Six models, namely BPNN, WPD-BPNN, GMDH, WPD-GMDH, ARIMA, and WPD-ARIMA models, are employed for benchmark comparison to study the prediction performance of these conjunction methods.
(1) ARIMA Generally, the ARIMA model based on the difference process is applied to the modeling of non-stationary series. In this paper, the stationarity of the original monthly rainfall series and subsequences are tested by the Augmented Dickey-Fuller (ADF) test. The results of ADF unit root tests are shown in Table 3. The h value of the original and all subsequences of the two stations are zero. The p-value of the original sequence and all sub sequences of the two stations is zero, except that the p-value of the original sequence of Zuoyu station is 0.0004. When 1 h = , p-value < 0.05, and the value of t-statistic is less than the preset upper limit, the null hypothesis is rejected, and the sequence can be considered as stationary; otherwise, the series needs to be differential. It can be seen from Table 3 that the sample set data is stationary series without a single root effect. The next step is to choose the optimal ARIMA (p, d, q) model, and the best fitted values of p and q are selected according to the BIC method. ACF and PACF are used to predetermine the structure of data sets. Furthermore, referring to the BIC minimum criterion, the best fitting model is determined for the original sequence and the decomposed subsequence of the two stations. The values of p and q are determined based on ACF and PACF, and the significance test has to be passed, that is, when p-value is less than 0.05, select the parameter with minimum BIC statistics. ARIMA models for various sequences are shown in Table 4. The decomposed sub-sequences of Luoning and Zuoyu stations are modeled by the ARIMA model. The original time series are modeled by seasonal ARIMA model (SARIMA), where p, d, and q represent the autoregressive term, the order of difference, and the moving average term of SARIMA model, respectively.  (2) BPNN A conventional three-layer BPNN is used to establish the prediction model of monthly precipitation series in this paper. Tan   (3) GMDH The number of input layer nodes is the same as the number of input variables, and then the regression of output value of upper layer is computed to create the second layer network. GMDH uses the best new variables in each layer to build the next layer network. The GMDH model includes three parameters, namely a denoting the maximum number of layers, b denoting the maximum number of nodes in each layer, and p denoting the selection pressure. In this paper, a and b are determined as 3 and 15, respectively, whilst p is set equal to 0.75 via a trial-and-error method, and the convergence criteria is RMSE. This paper determines an appropriate maximum number of hidden layers and nodes of GMDH model by a trial-and-error method. We set a equal to 2, 3, and 5, and b equal to 5, 10, and 15. The results (not supplied) show that the numbers of a and b have a significant effect on the performance of the GMDH model.
(4) WPD WPD is adopted for data preprocessing, which can eliminate noises in hydrological time series. The selection of an appropriate mother wavelet is very significant to WPD. The Symlet wavelet function is an improved version of the classical Daubechies wavelet function, which evades the change of waveform in the process of signal decomposition [64]. Therefore, the fourth order Symlet wavelet function is considered as the mother wavelet function. In this paper, three-scale wavelet WPD is selected because large-scale wavelet packet decomposition may lead to information loss.

Results and Discussion
Based on the above description, different methods are utilized to model the observed rainfall and extracted sub-sequences. Tables 5 and 6 list the statistical indexes of different algorithms for Luoning and Zuoyu stations during training and testing periods.
For Luoning station, the WPD-BPNN model attains the best RMSE, MAE, R, and NSEC values during the training period, which are 3.292, 2.384, 0.998, and 0.956, respectively. In the testing phase, the WPD-BPNN model also attains the best R, RMSE, MAE, and NSEC statistics of 0.997, 4.054, 2.912, and 0.994, respectively. Meanwhile, for Zuoyu station, the WPD-BPNN model attains the best RMSE, MAE, R, and NSEC values during the training period, which are 5.935, 4.102, 0.997, and 0.994, respectively. In analyzing the results during the testing phase, the WPD-BPNN model attains the best R, RMSE, MAE, and NSEC statistics of 0.998, 3.705, 2.889, and 0.996, respectively. Referring to the four evaluation indicators in this paper, WPD-BPNN can attain the best performance in monthly precipitation prediction.
Tables 7 and 8 list the comparison of results on model prediction performance by different indicators. When forecasting monthly rainfall at Luoning station, WPD-BPNN is able to attain the best improving capability of RMSE and MAE in the training phase, while WPD-GMDH is able to attain the best improving capability of R and NSEC in the training phase. In analyzing the figures during the testing phase, WPD-BPNN attains the best improving capability of RMSE and MAE, while WPD-ARIMA attains the best improving capability of R and NSEC. In addition, it can be seen from Table 8 that the prediction performance of the models is similar for Zuoyu and Luoning stations. Therefore, the monthly rainfall series decomposed by WPD method as the input of BPNN model can drastically improve the forecasting accuracy. This reaffirms the superior performance of WPD. Furthermore, the enhancement capabilities of different evaluation methods are different in terms of different phases and different forecasting measures.
For the two research objects in this paper, the performance of all models during training and test periods are shown in Figures 14-17. The performances of hybrid models for monthly rainfall simulation are able to attain better performance than those of conventional ARIMA, BPNN, and GMDH methods. WPD-BPNN presents the best performance, and its trend line is almost perfectly close to the smooth line of the observed data. In contrast, there are huge deviations between the prediction results obtained by ARIMA, BPNN, and GMDH methods and observed data. In addition, the prediction values of the extreme points of the three single models are far less than the observed value, and the peak prediction also has an obvious lag effect. However, compared with ARIMA, GMDH, and BPNN, the three WPD-based models have greatly improved the peak value accuracy and time positioning. Meanwhile, the models prior to improvement cannot capture abrupt changes of precipitation in rainy season. Therefore, compared with several existing methods in this paper, WPD-BPNN is the most efficient tool for monthly rainfall forecasting, since it can achieve excellent prediction results.

Conclusions
In recent years, the improvement of hydrological forecasting accuracy has attracted widespread attention around the world. In order to broaden the scope of hydrological forecasting theory, this study explores the performance of several data-driven methods based on WPD in monthly precipitation forecasting. Firstly, the observed monthly rainfall time series are decomposed into eight subsequences with different frequencies and spatiotemporal resolutions by WPD. Then, three data-based models, namely BPNN, GMDH, and ARIMA models, are utilized to complete the prediction for the decomposed monthly rainfall series, respectively. Finally, the ensembled prediction result of the model is formulated by summing the outputs of all submodules. Monthly rainfall data from two stations in China are utilized to test the performance of these methods. To evaluate the forecast capacity of different models, four standard statistical metrics are adopted to estimate the global and local errors of the models.
The results reveal that the WPD model is suitable for the decomposition of monthly rainfall series, and WPD-BPNN can provide the best performance during both training and testing periods in terms of the four evaluation indicators in this paper. The following briefly introduces the advantages of the WPD-BPNN method. Firstly, the principle of WPD is simple and inclusive, and it can comprehensively and deeply analyze the characteristics of monthly precipitation series. Secondly, the prediction performance of BPNN only depends on the characteristics of input variables. Finally, the proposed model does not require complex decision-making for the explicit form of the model in different cases. Therefore, the hybrid forecast model based on WPD technology is an efficient tool to improve the accuracy of mid-and long-term rainfall forecasting.
It should be pointed out that, although this paper has fully verified the feasibility of WPD-BPNN in monthly precipitation forecasting, there are still several limitations to be explored in the future research. Firstly, the study is carried out based on two time series, so we will test the generalization of the proposed model. The second is to test the performance of other algorithms combined with WPD. The last major issue is to develop an appropriate optimization algorithm to improve the performance of WPD-BPNN. In future research, it is necessary to conduct in-depth research on the three aspects above to explore more efficient and accurate forecasting techniques and make contributions to the field of hydrological forecasting. Data Availability Statement: All authors made sure that all data and materials support our published claims and comply with field standards.

Conflicts of Interest:
The authors declare that they have no conflict of interest.