Examining the Applicability of Wavelet Packet Decomposition on Different Forecasting Models in Annual Rainfall Prediction

: Accurate precipitation prediction can help plan for different water resources management demands and provide an extension of lead-time for the tactical and strategic planning of courses of action. This paper examines the applicability of several forecasting models based on wavelet packet decomposition (WPD) in annual rainfall forecasting, and a novel hybrid precipitation prediction framework (WPD-ELM) is proposed coupling extreme learning machine (ELM) and WPD. The works of this paper can be described as follows: (a) WPD is used to decompose the original precipitation data into several sub-layers; (b) ELM model, autoregressive integrated moving average model (ARIMA), and back-propagation neural network (BPNN) are employed to realize the forecasting computation for the decomposed series; (c) the results are integrated to attain the ﬁnal prediction. Four evaluation indexes (RMSE, MAE, R, and NSEC) are adopted to assess the performance of the models. The results indicate that the WPD-ELM model outperforms other models used in this paper and WPD can signiﬁcantly enhance the performance of forecasting models. In conclusion, WPD-ELM can be a promising alternative for annual precipitation forecasting and WPD is an effective data pre-processing technique in producing convincing forecasting models.


Introduction
Precipitation prediction takes a key role in many practical applications, i.e., agriculture [1], streamflow forecasting [2,3], flood prediction [4], water resources management [5], urban flooding prediction [6], facilities maintenance and control [7], etc. Accurate precipitation prediction can help plan for different environmental water demands and provide an extension of lead-time for the tactical and strategic planning of courses of action. In hydrology, the forecasts of precipitation are commonly obtained from numerical weather prediction models. However, numerical weather prediction models are unable to provide quantitative precipitation forecasts in sufficient spatial and time resolutions compatible with the time-space variability of precipitation processes [8]. Soft computing approaches have several advantages: they are easy to operate and can carry out large-scale data operation, and can adjust the model structure according to the characteristics of the watershed, and the performance is indeed very competitive compared to numerical models [9].
The Box-Jenkins models [10], which can be considered as the most conventional and comprehensive technology for time series prediction, include AR (auto-regressive), ARMA (autoregressive moving average), ARIMA (auto-regressive integrated moving average), etc. These methods have been extensively utilized in non-stationary data analysis and prediction in recent decades [11][12][13][14][15]. Rainfall forecasting was performed assuming that quency bands. WPD was a further extension of the WT technique [38]. WPD could provide a more complete wavelet packet tree, which extracted the features of the original signal more comprehensively [28]. WPD has been proved to exhibit good performance for time series forecastings, such as wind speed forecasting [39,40] and river stage forecasting [41]. However, few studies have investigated the performance of combined WPD and prediction models for prediction in the hydrology field, which needs further exploration.
Motivated by the principle of "decomposition and ensemble" [42,43], the raw annual rainfall data can be decomposed into different components. Each component can be predicted with the purpose of fine results and easy forecasting tasks, and the forecasting results of all components are aggregated to generate the final prediction [2,14]. In this paper, a novel hybrid precipitation prediction framework is presented based on an extreme learning machine (ELM) and WPD. The framework can be described as follows: (a) WPD is used to decompose the original precipitation data into several linear sub-layers; (b) ELM model is employed to realize the forecasting computation for the decomposed series; (c) results of (b) are integrated to produce the final prediction. To ascertain the performance of the proposed framework, six models are employed for benchmark comparison: ARIMA, ARIM-WPD, BPNN, BPNN-WPD, ELM, and ELM-WPD. Validation of the model performance is made using four evaluation criteria, namely RMSE (root mean square errors), R (coefficient of correlation), NSEC (Nash-Sutcliffe efficiency coefficient), and MAE (mean absolute error).

Study Area
In this paper, annual precipitation data measured at Jinsha weather station, Chishui River located in the northwest of Guizhou Province were used. Figure 1 displays the location of the study area. The average annual precipitation of the Chishui River Basin is 800-1200 mm, and the average surface precipitation depth is 1020.6 mm. Abundant precipitation, frequent torrential rain events, and a fragile geological environment lead to the frequent occurrence of debris flow, landslide, and other mountain torrents. Therefore, it is of practical significance to forecast precipitation in this area for flood control, disaster reduction, and agricultural production. The annual rainfall data from 1958 to 2016 were employed. Figure 2 shows the original data for Jinsha station, data from 1958 to 2011 were adopted for training and those from 2012 to 2016 were used for testing. Statistical parameters of the original data are listed in Table 1. It can be observed that the original data shows obvious skewness, indicating a high difficulty of modeling.  The annual rainfall data from 1958 to 2016 were employed. Figure 2 shows the original data for Jinsha station, data from 1958 to 2011 were adopted for training and those from 2012 to 2016 were used for testing. Statistical parameters of the original data are listed in Table 1. It can be observed that the original data shows obvious skewness, indicating a high difficulty of modeling.

Wavelet Packet Decomposition WPD
WPD is similar to WD, yet the former complements the defect of WD [44]. A three-layer binary tree structure of WPD is shown in Figure 3. WPD decomposes the signal into approximation coefficients and detail coefficients by a mother wavelet function. The decomposition level and mother wavelet function has a significant influence on the performance of WPD. WPD comprises continuous wavelet transform (CWT) and discrete wavelet transform (DWT). CWT is expressed as follows: where is the input signal, a is the scale parameter, b is the translation parameter, * is the complex conjugate, is the mother wavelet function. and in the DWT are expressed as: where and are the scale parameter and translation parameter, respectively.

Wavelet Packet Decomposition WPD
WPD is similar to WD, yet the former complements the defect of WD [44]. A threelayer binary tree structure of WPD is shown in Figure 3. WPD decomposes the signal into approximation coefficients and detail coefficients by a mother wavelet function. The decomposition level and mother wavelet function has a significant influence on the performance of WPD. WPD comprises continuous wavelet transform (CWT) and discrete wavelet transform (DWT). CWT is expressed as follows: Water 2021, 13, 1997 5 of 15 where x(t) is the input signal, a is the scale parameter, b is the translation parameter, * is the complex conjugate, ψ(t) is the mother wavelet function. a and b in the DWT are expressed as: where i and j are the scale parameter and translation parameter, respectively.

Extreme Learning Machine (ELM)
The ELM proposed by Huang et al. [45] is a single hidden layer feed-forward network (SLFN), which is a modified version of a traditional ANN with the characteristic of no adjustment made on internal parameters [46]. The ELM selects the hidden threshold randomly in training the network. The computation of output weight does not need complicated iteration, which greatly enhances the training speed.
For a vector ( , ), , , ⋯ , ∈ , , , ⋯ , ∈ , ELM can be modeled by: where denotes the weight connecting the nodes of the hidden layer and output layer, denotes the activation function, , , ⋯ denotes the weight vector connecting the nodes of the input layer and output layer, denotes input vectors, denotes the threshold of the hidden node, denotes the output value, and denotes the number of hidden nodes.
The goal of the ELM is to minimize the error between the original and predicted values, ∑ ∥ ∥ 0, there exist , and . So, Equation (4) can be expressed by the following equation: where: , ,

Extreme Learning Machine (ELM)
The ELM proposed by Huang et al. [45] is a single hidden layer feed-forward network (SLFN), which is a modified version of a traditional ANN with the characteristic of no adjustment made on internal parameters [46]. The ELM selects the hidden threshold randomly in training the network. The computation of output weight does not need complicated iteration, which greatly enhances the training speed.
For a vector (x i , y i ), , y i2 , · · · , y in ] T ∈ R n , ELM can be modeled by: where β i denotes the weight connecting the nodes of the hidden layer and output layer, , ω i2 , · · · ω im ] denotes the weight vector connecting the nodes of the input layer and output layer, x j denotes input vectors, α i denotes the threshold of the hidden node, z j denotes the output value, and L denotes the number of hidden nodes. The goal of the ELM is to minimize the error between the original and predicted values, Equation (4) can be expressed by the following equation: where: H stands for the output matrix of hidden nodes, β is the output weight, Y is the target output. Thus the minimum norm least-squares solutionβ of Equation (5) is: where H + represents the Moore-Penrose generalized inverse (MPGI) of H and Y is the target.

Back-Propagation Neural Network (BPNN)
The BPNN proposed by McClelland and Rumelhart [47] is a common multilayered ANN. BPNN includes input nodes, hidden layers, and output layers. The characteristic of BPNN is the forward transmission of the signal and the reverse transmission of the error. BPNN minimizes the global error according to the gradient descent approach [48]. The connection weights are modified based on errors between the input and output data in the reverse transmission. Forward and backward propagation is repeated until the errors reach the expected precision. In this study, the Levenberg-Marquardt (LM) method, sigmoid function, and the purelin formula were adopted as the training function, transfer function, and output function, respectively. The mathematical formula of the BPNN model can be expressed as follows: where x m−1 i represents the input data of node i in layer m − 1, w m is weight of x m−1 i , β m is the bias of layer m, x m j is the output value of node i in layer m; f (x) denotes the transfer function of layer m, which can be written as: The purelin formula is expressed as follows:

ARIMA
ARIMA, proposed by Box [49], is normally utilized in time series analysis. The mathematical forecasting equation of ARIMA is linear, in which the predictors include autoregressive (AR) terms and moving average (MA) terms. The representation of ARIMA is ARIMA (p, d, q), and the representation of SARIMA (seasonal ARIMA) is ARIMA (p, d, q) × (P, D, Q) s , where (p, d, q) represents the non-seasonal order and (P, D, Q) s denotes the seasonal order. The ARIMA [50] model can be expressed as: (12) where y t is the time series, φ i is the AR coefficient, θ i is the MA coefficient, µ is the model parameter, p is the order of AR component, q is the order of MA component, and d is the order of differentiation.

Framework of the Proposed Hybrid Model
The architecture of the hybrid precipitation prediction framework (WPD-ELM) is presented in Figure 4. Detailed steps of the model are indicated as follows: Dickey-fuller Test (ADF); (d) normalize all data within [0, 1] by: ′ (13) where is the original data, is the normalized data series, n the number of data.
Step 2: Model building. (a) partial autocorrelation function (PACF) and precipitation theory were employed to select the number of input variables; (b) values of the model parameters, such as the number of hidden layers in ANN and the optimized parameters for ARIMA, were set; (c) ELM, ARIMA, and BPNN were used as forecasting tools to model and predict each decomposed sub-sequence separately; (d) the results were integrated to produce the final prediction.  Step 1: Data pre-processing. (a) WPD was adopted to decompose the original precipitation series into several sub-series; (b) data (both in original series and sub-series) were partitioned into training and testing sets; (c) all data were tested by the Augmented Dickey-fuller Test (ADF); (d) normalize all data within [0, 1] by: where x i is the original data, x i is the normalized data series, n the number of data.
Step 2: Model building. (a) partial autocorrelation function (PACF) and precipitation theory were employed to select the number of input variables; (b) values of the model parameters, such as the number of hidden layers in ANN and the optimized parameters for ARIMA, were set; (c) ELM, ARIMA, and BPNN were used as forecasting tools to model and predict each decomposed sub-sequence separately; (d) the results were integrated to produce the final prediction.

Evaluation Indicators
Results of the models were evaluated with respect to four evaluation indicators. These indexes included root mean square errors (RMSE) [51], mean absolute error (MAE) [52], Nash-Sutcliffe efficiency coefficient (NSEC) [53], and coefficient of correlation (R). Their equations are provided below.
where y e (i), y o (i), y e and y o are the estimated, observed, mean estimated, and mean observed values of precipitation, respectively.

Decomposition Results
WPD was adopted to split the original annual precipitation data into several sub-series. The frequency of the sub-series was different, and each sub-series played a different role in the original dataset. The decomposed results for the Jinsha weather station using WPD at level 3 are shown in Figure 5.

Evaluation Indicators
Results of the models were evaluated with respect to four evaluation indicators. These indexes included root mean square errors (RMSE) [51], mean absolute error (MAE) [52], Nash-Sutcliffe efficiency coefficient (NSEC) [53], and coefficient of correlation (R). Their equations are provided below.
where , , and are the estimated, observed, mean estimated, and mean observed values of precipitation, respectively.

Decomposition Results
WPD was adopted to split the original annual precipitation data into several sub-series. The frequency of the sub-series was different, and each sub-series played a different role in the original dataset. The decomposed results for the Jinsha weather station using WPD at level 3 are shown in Figure 5.

Selection of Input Variable
The selection of input variables has a significant effect on the prediction results. In this paper, two methods were utilized to select input combinations: (a) trial-and-error method; (b) PACF statistical approach. PACF values for all series from the Jinsha weather station are shown in Figure 6, where PACF0 denotes the original rainfall series and others are for the sub-series. Twelve ANN models with different input combinations were employed. Table 1 lists the input variables for different series with respect to the information in Figure 6 and the trial-and-error method, where q (t) represents the estimated value of rainfall and q (t−p) is the rainfall at time t − p.
Water 2021, 13, x FOR PEER REVIEW 10 of 16

Selection of Input Variable
The selection of input variables has a significant effect on the prediction results. In this paper, two methods were utilized to select input combinations: (a) trial-and-error method; (b) PACF statistical approach. PACF values for all series from the Jinsha weather station are shown in Figure 6, where PACF0 denotes the original rainfall series and others are for the sub-series. Twelve ANN models with different input combinations were employed. Table 1 lists the input variables for different series with respect to the information in Figure 6 and the trial-and-error method, where represents the estimated value of rainfall and is the rainfall at time t−p.

Model Development
To verify the WPD-ELM model, six models, namely ELM, BPNN, ARIMA, WPD-ELM, WPD-BPNN, WPD-ARIMA, were employed for benchmark comparison. Detailed information relating to these models is described in this section.
(1) ELM and BPNN models For conventional ELM and BPNN models, the observed precipitation values are adopted as the target value. The number of nodes in the input and output layers is equal to the number of input variables and one, respectively. The best number of nodes in the hidden layer is determined by the trial-and-error method. In this paper, the best numbers of nodes in the hidden layer in the ELM and BPNN were set to twenty and eight, respectively. The LM algorithm is adopted to train the BPNN model and the training epoch was set to 500. The sigmoidal function was selected as the activation function of the ELM and the MPGI method was adopted to determine the hidden output weights after randomly setting its hidden threshold and weight vector between the hidden and input layers.

Model Development
To verify the WPD-ELM model, six models, namely ELM, BPNN, ARIMA, WPD-ELM, WPD-BPNN, WPD-ARIMA, were employed for benchmark comparison. Detailed information relating to these models is described in this section.
(1) ELM and BPNN models For conventional ELM and BPNN models, the observed precipitation values are adopted as the target value. The number of nodes in the input and output layers is equal to the number of input variables and one, respectively. The best number of nodes in the hidden layer is determined by the trial-and-error method. In this paper, the best numbers of nodes in the hidden layer in the ELM and BPNN were set to twenty and eight, respectively. The LM algorithm is adopted to train the BPNN model and the training epoch was set to 500. The sigmoidal function was selected as the activation function of the ELM and the MPGI method was adopted to determine the hidden output weights after randomly setting its hidden threshold and weight vector between the hidden and input layers.
(2) ARIMA The Augmented Dickey-Fuller (ADF) unit root test was first employed to judge the stationarity of the input series. If the dataset was nonstationary, the difference and MA could be used to smooth it. The results are shown in Table 2. Value h = 1 means that the test rejects the null hypothesis of a unit root. The significance level of the p-value was determined as 0.05. Thus, when the t-statistic value was less than the critical value and the p-value < 0.05, the test results could be considered feasible and it would reject the null hypothesis. It can be seen from Table 2 that the sample dataset was stationary without a single root effect. Subsequently, the optimal ARIMA (p, d, q) structure was mainly determined according to the minimum Bayes information criteria (BIC) value. The resulting ARIMA models are shown in Table 3.  (3) WPD-ANN and WPD-ARIMA For WPD-ELM, WPD-BPNN, and WPD-ARIMA models, the original precipitation is split into eight sub-series using WPD, and several special hybrid models are reconstructed for each subsequence. For the WPD method, the selection of an appropriate wavelet basis function is very significant. The symlet wavelet is an improved approximate symmetric wavelet function based on Daubechies wavelet, which can avoid signal distortion during decomposition and reconstruction. Thus, a three-scale and four-order Symlet wavelet was considered as the wavelet basis function in this paper.

Comparative Analysis
The forecast of the series was implemented using the above models with the input data of original annual precipitation and extracted subseries. The assessment results obtained by different models in training and test phases are shown in Tables 4 and 5. Figure 7 exhibits the forecasting results using the six models.   One can see from Table 4 that when forecasting the annual precipitation in the Jinsha weather station, WPD-ELM could attain the best results in terms of four evaluation in- One can see from Table 4 that when forecasting the annual precipitation in the Jinsha weather station, WPD-ELM could attain the best results in terms of four evaluation indexes. For example, in the testing period, the proposed hybrid model exhibited the highest R (0.997) and NSEC (0.973) and exhibited the lowest RMSE (23.069) and MAE (19.051).
Compared with the prediction accuracy of six models in the training period, WPD-ARIMA provided the best results in terms of four evaluation indexes, the performance of WPD-ELM was slightly lower than WPD-ARIMA. Table 5 lists the improvements obtained by several models. In the training period, WPD-ELM improved the ELM model with a 77.69% and 78.33% reduction in MAE and RMSE, respectively. Besides, improvements in prediction results regarding the NSEC and R were 146.35% and 56.91%, respectively. In the testing period, compared to the ELM model, WPD-ELM yielded reductions in terms of MAE and RMSE with 75.78% and 72.44%, respectively. Besides, the improvements in the forecasting results of WPD-ELM regarding the NSEC and R were 49.89% and 21.66%, respectively. Compared with the WPD-BPNN and WPD-ELM models, WPD-ARIMA could exhibit the best improvements in terms of all measures during both training and testing periods. Meanwhile, we could see that the improvements of WPD-BPNN were superior to WPD-ELM, whilst there was no dramatic difference between them.
The following conclusions could be drawn based on this analysis: (a) when comparing the performance of three single models with three hybrid models, the hybrid models attained better performance during both training and testing periods; (b) ELM outperformed the other single methods; (c) WPD-ARIMA demonstrated the optimal performance in terms of all measures in the training period, and WPD-ELM attained the best forecasting accuracy in terms of four evaluation indexes during the testing phase; (d) ARIMA provided the worst results during both training and testing periods; (e) there was no significant difference in the forecasting accuracy between BPNN and ELM and those between WPD-BPNN and WPD-ELM.
In addition, the prediction accuracy was different in terms of different phrases. The performance of the ARIMA method during the testing period was far inferior to that during the training period. The WPD-ARIMA model was of the highest level in the training period, but the performance in the testing period was the worst among several hybrid models, which meant the generalization ability of WPD-ARIMA was very poor. However, it was considered that the model had strong generalization capability for reliable performance when it performed modestly during the training period and well in the testing period, e.g., BPNN, WPD-BPNN, ELM, WPD-ELM.
The performances of all forecasting models developed in this paper during the training and testing phases are shown in Figure 7. One can clearly see that the performances of the hybrid models were better than those single methods as their trend lines were very close to the original data line. Meanwhile, the algorithm prior to the improvements had difficulty capturing the drastic changes in rainfall.

Discussion of Results
The following should be considered before analyzing the results. The forecasting performance of the testing phase plays a greater role than that of the training phase because the training phase is utilized to train the model, and its performance is measured by the data related to the modeling. However, the testing dataset does not participate in modeling, so its performance can truly reflect the model application efficiency. Based on the above considerations, we can draw the following conclusions from the above analysis. Firstly, the WPD-ELM model proposed in this paper can attain the best performance in terms of all statistical measures. Secondly, WPD is suitable for decomposing the annual precipitation series as it can overcome the complicated abrupt change of precipitation. Thirdly, there are obvious differences between the accuracies of the ARIMA, BPNN, and ELM models, which highlight the significant role of the training tool in modeling. Finally, the annual rainfall series decomposed by WPD as input in modeling can substantially enhance forecasting accuracy.
The WPD-ELM model outperforms all compared models. The reasons why the novel method can improve the accuracy are analyzed below. Firstly, WPD decomposes the original data into more linear subseries, reducing the modeling difficulty. Secondly, the ELM is employed to model the input-output relationship in each subsequence. The random initialization of the feature mapping parameters in the algorithm enhances the mutual independence of each input signal, creates a larger solution space, and improves the generalization ability. Finally, the WPD-ELM hybrid model overcomes the shortcomings of the single model by generating a synergistic effect in prediction.
Besides, this paper investigates the performance of WPD-ELM in annual precipitation forecasting. There are still several research directions to fill the research gap in future work. The first is to study an appropriate optimization algorithm to improve the performance of the WPD-ELM model. The study is carried out based on a limited dataset, so the second is to test the generalization of the proposed model. Due to uncertainties caused by the stochastic processes in the neural network model, the last major issue is to solve the problem of randomness. In future research, it is necessary to conduct in-depth research on the above three aspects to study a more accurate prediction model and make contributions to the field of hydrological forecasting.

Conclusions
Improving the accuracy of long-term annual precipitation forecasting is an important yet challenging work in water resources management. We propose a novel hybrid precipitation prediction framework (WPD-ELM). Firstly, WPD is adopted to decompose the original annual precipitation series into several sub-series. Secondly, ANN models are utilized to predict each sub-series. Finally, results are integrated to produce the final prediction. The annual precipitation series gathered at the Jinsha weather station of Guizhou Province in China are taken to develop the empirical study. Results indicate that WPD-ELM outperforms other benchmark methods in this paper in terms of four evaluation indicators. Meanwhile, the performances of the hybrid models are better than those of the standard BPNN, ARIMA, and ELM, which means that WPD can significantly improve the forecasting accuracy. In conclusion, WPD can be used to preprocess precipitation data and WPD-ELM can provide more accurate and reliable results, thus, it may be a promising alternative for long-term precipitation prediction. Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: All authors made sure that all data and materials support our published claims and comply with field standards.