A New Hybrid Wind Power Forecaster Using the Beveridge-Nelson Decomposition Method and a Relevance Vector Machine Optimized by the Ant Lion Optimizer

As one of the most promising forms of renewable energy, wind power has developed rapidly in recent years. However, wind power is intermittent and volatile, so its penetration into electric power systems challenges their safe and stable operation. Accurate wind power forecasting is therefore increasingly important, although it remains a difficult task. In this paper, a new hybrid wind power forecasting method, named the BND-ALO-RVM forecaster, is proposed. It combines the Beveridge-Nelson decomposition method (BND), the relevance vector machine (RVM) and the ant lion optimizer (ALO). Considering the nonlinear and non-stationary characteristics of wind power data, the wind power time series were first decomposed into deterministic, cyclical and stochastic components using BND. Then, each of these three components was forecasted using RVM. To improve the forecasting performance, the kernel width parameter of RVM was optimally determined by ALO, a new nature-inspired meta-heuristic algorithm. Finally, the wind power forecast was obtained by multiplying the forecasts of the three components. The proposed BND-ALO-RVM wind power forecaster was tested with real-world hourly wind power data from the Xinjiang Uygur autonomous region in China. To verify its effectiveness and feasibility, it was compared with single RVM without time series decomposition and parameter optimization, RVM with time series decomposition based on BND (BND-RVM), RVM with parameter optimization (ALO-RVM), and a Generalized Regression Neural Network with data decomposition based on the Wavelet Transform (WT-GRNN), using three forecasting performance criteria: MAE (Mean Absolute Error), MAPE (Mean Absolute Percentage Error) and RMSE (Root Mean Square Error). The results indicate that the proposed BND-ALO-RVM wind power forecaster has the best forecasting performance of all the tested options, which confirms its validity.


Introduction
Facing fossil energy resource depletion and environmental deterioration, people are increasingly focusing on the exploitation and utilization of renewable energy resources, such as wind power and solar photovoltaic power [1]. Wind power has become one of the fastest growing and most promising renewable energy sources, and its share in total electricity output has been increasing yearly [2,3]. According to data released by the Global Wind Energy Council, the installed wind power capacity around the world reached 486.7 GW by the end of 2016. The cumulative installed wind power capacity in China amounted to 168.7 GW, which accounts for 34.7% of the world total.
Wind power is environmentally friendly. However, wind power output is characterized by stochastic fluctuation, intermittency and uncertainty [4]. When a wind generator is connected to the power grid, it imposes new requirements and challenges on electric power systems, such as efficient scheduling of power resources and a continuing guarantee of smooth power system operation [5]. As wind power penetration increases, operation costs must rise to deal with the electric energy imbalance caused by the stochastic fluctuation of wind power [6]. Meanwhile, to coordinate with wind generators, thermal power generating units have no choice but to adjust their power output frequently, which reduces their operational efficiency and increases running costs [7]. Accurate wind power forecasting is an effective and important way to alleviate these adverse effects.
In past years, many researchers have developed models and methods for wind power/speed forecasting, with considerable success [8,9]. Currently, there are two main kinds of forecasting techniques for wind power/speed: physical forecasting techniques and statistical forecasting techniques. Physical forecasting techniques represent a traditional approach, which needs detailed physical descriptions of the on-site conditions of wind farms, such as the wind farm layout, wind turbines, and atmospheric conditions [10]. The main representatives are Prediktor developed by the Risoe National Laboratory in Denmark [11], Previento developed by the University of Oldenburg in Germany [12], and eWind developed by AWS True Wind Inc. (New York, NY, USA) [13]. In the past few years, statistical forecasting techniques, which forecast wind power/speed based on historical power/speed data and other meteorological data, have developed greatly. This kind of forecasting technique includes two kinds of approaches, namely conventional statistical approaches and emerging artificial intelligence approaches. Among conventional statistical approaches, the auto-regressive moving average (ARMA) model [14,15] and the auto-regressive integrated moving average (ARIMA) model [16] have been widely used to forecast wind speed and wind power. The conventional statistical approaches assume that wind power and wind speed have linear relationships with their influencing factors. In fact, however, these relationships are non-linear. Therefore, the conventional statistical approaches fall short of high forecasting accuracy. The emerging artificial intelligence approaches do not assume a linear relationship between wind power/speed and their influencing factors, and can therefore overcome this shortcoming. Several of them have been employed to forecast wind power and wind speed, such as artificial neural networks [17,18], support vector machines [19], and extreme learning machines [20,21].
When emerging artificial intelligence approaches are employed to forecast wind power/speed, several parameters must be set first, such as the number of neurons in artificial neural networks and the kernel parameters of support vector machines. This is very difficult for practitioners. To tackle this issue, an intelligent optimization algorithm is usually introduced to determine the optimal parameters. Ren, et al. [22] applied particle swarm optimization (PSO) to automatically determine the parameters of a back propagation neural network (BPNN) for short-term wind speed forecasting. Amjady, et al. [23] used enhanced particle swarm optimization to optimize a modified neural network for wind power prediction. Jursa and Rohrig [24] employed PSO and differential evolution (DE) to optimize artificial neural networks for short-term wind power forecasting. Liu, et al. [25] used a genetic algorithm to select the optimal parameters of support vector machines for short-term wind speed forecasting. Salcedo-Sanz, et al. [26] used a coral reefs optimization algorithm to optimize an extreme learning machine for wind speed prediction.
Generally speaking, wind power time series have both nonlinear and nonstationary characteristics, so decomposition of the wind power time series is often needed to improve forecasting accuracy [27]. Currently, there are two main kinds of wind power time series decomposition methods: the wavelet transform (WT) [28,29] and empirical mode decomposition (EMD) [30,31]. The Beveridge-Nelson decomposition (BND) method, proposed by Stephen Beveridge and Charles Nelson in 1981, is a non-stationary time series decomposition technique [32], which has been widely applied in economic fields, such as business cycle analysis [33], GNP and stock prices [34], and regional income fluctuations [35]. Although the BND is an effective time series decomposition method, it has not yet been used for wind power time series. To fill this gap, in this paper the BND technique is employed to decompose wind power time series, which is a new application.
In this paper, the relevance vector machine (RVM), a sparse, supervised, probabilistic learning method, is also used for wind power forecasting. RVM has some merits compared with other machine learning methods, such as better adaptability, a need for fewer sample data, sparsity, and simplified parameter setting [36]. RVM has been employed in many practical problems, such as silent speech classification [37], battery health monitoring [38], canal flow prediction [39], daily potential evapotranspiration forecasting [40], and system fault diagnosis [41]. However, the RVM technique has rarely been used for wind power forecasting. To improve the RVM-based forecasting performance for wind power, a new nature-inspired meta-heuristic algorithm, named the ant lion optimizer (ALO) [42], is also employed in this paper to automatically determine the optimal parameters of RVM. Therefore, a new hybrid BND-ALO-RVM method for wind power forecasting is proposed in this paper. To verify the effectiveness and applicability of the proposed method, real-world hourly wind power data from the Xinjiang Uygur autonomous region in China are selected for the empirical analysis, and the forecasting results are compared with those of other forecasting methods, including single RVM, BND-RVM, ALO-RVM, and WT-GRNN. The main contributions of this paper are as follows:
(1) A new hybrid BND-ALO-RVM method for wind power forecasting is proposed, which combines Beveridge-Nelson decomposition (BND), relevance vector machine (RVM) and ant lion optimizer (ALO). Empirical results indicate that the proposed method can improve wind power forecasting accuracy and outperforms the compared methods. The proposed method can be a promising alternative forecasting technique for wind power, which enriches the current wind power forecasting toolbox.
(2) The Beveridge-Nelson decomposition (BND) method, which has been frequently and widely used for economic issues, is employed for energy issues for the first time. In this paper, the wind power time series are decomposed into three components, namely the deterministic, cyclical and stochastic components. Empirical results show that the wind power forecasting accuracy can be improved after decomposing the wind power time series using BND, which indicates that BND is an effective method for wind power time series decomposition. This paper thus expands the application domains of the BND method and enriches the data decomposition library for wind power time series.
(3) The relevance vector machine (RVM) technique is employed to forecast the different decomposed components of the wind power time series. To improve the forecasting performance of RVM, a new nature-inspired meta-heuristic algorithm, namely the ant lion optimizer (ALO), is used to optimally determine the kernel width parameter of the RVM model. Forecasting results reveal that the ALO is effective: it can determine the optimal kernel width parameter of RVM and improve the RVM-based wind power forecasting accuracy. In our previous study [43], it was verified that the ALO can improve GM (1,1)-based power load forecasting accuracy. Therefore, ALO, as a new intelligent optimization algorithm, is promising and has good development prospects. This paper makes a new attempt to use ALO for parameter optimization of RVM, which also enlarges the application scope of the ALO algorithm.
The remainder of this paper is organized as follows: Section 2 gives a brief introduction to the basic methods and algorithms used, including the Beveridge-Nelson decomposition (BND) method, relevance vector machine (RVM), and ant lion optimizer (ALO); the proposed hybrid BND-ALO-RVM forecaster for wind power is described in Section 3; Section 4 conducts an empirical analysis, and the forecasting performance of the proposed method is compared with other methods. The main conclusions are drawn in Section 5.

Beveridge-Nelson Decomposition Method (BND)
In 1981, Stephen Beveridge and Charles Nelson proposed a general procedure for non-stationary time series decomposition, named the Beveridge-Nelson decomposition (BND) method [32]. The BND method decomposes an original time series that is integrated of order one (that is, its first-order difference is stationary) into a permanent component and a transitory (cyclical) component [44]. The permanent component is a random walk process with drift, which includes a deterministic component and a stochastic component. The deterministic component can be estimated using the ARIMA technique, and the transitory (cyclical) component is a stationary process with zero mean.
The first step of the BND method is to determine whether the first-order difference of a non-stationary wind power time series is stationary [45]. If it is, the wind power time series is decomposed using the BND method as follows. The wind power time series is denoted as WP. According to the Wold theorem, when the first-order difference is stationary, the natural logarithm of the wind power at time t (denoted as $\ln WP_t$) satisfies:

$$\Delta \ln WP_t = \mu + \varepsilon_t + \lambda_1 \varepsilon_{t-1} + \lambda_2 \varepsilon_{t-2} + \cdots = \mu + \sum_{i=0}^{\infty} \lambda_i \varepsilon_{t-i}, \quad \lambda_0 = 1 \qquad (1)$$

where $WP_t$ is the wind power at time t; $\mu$ is the long-run mean value of $\Delta \ln WP_t$; $\varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2)$ (i.i.d. stands for independently and identically distributed); $\lambda_i$ is the coefficient; and $\Delta \ln WP_t = \ln WP_t - \ln WP_{t-1}$.
Taking the expectation on both sides of Equation (1), we obtain:

$$E(\Delta \ln WP_t) = \mu \qquad (2)$$

where $E(\cdot)$ is the expectation operator.
According to the Beveridge-Nelson decomposition theorem, the deterministic component (denoted as $D_t$) of the wind power time series can be written as:

$$D_t = \ln WP_0 + \mu t \qquad (3)$$

where $D_t$ represents the deterministic component of the wind power time series at time t; and $\ln WP_0$ is the natural logarithm of the initial wind power data. According to Morley [44], the time series can be forecasted using an AR(1) model for the first-order difference, namely:

$$\Delta \ln WP_t - \mu = \varphi \left( \Delta \ln WP_{t-1} - \mu \right) + \varepsilon_t \qquad (4)$$

where $|\varphi| < 1$, and $\varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2)$. According to the Wold theorem, the minimum mean squared error (MMSE) forecast of the first-order difference $\Delta \ln WP_t$ at j periods ahead under the assumption of normality is:

$$E_t\left[ \Delta \ln WP_{t+j} - \mu \right] = \varphi^{j} \left( \Delta \ln WP_t - \mu \right) \qquad (5)$$

where $E_t[\cdot]$ denotes the expectation conditional on the information available at time t. The BN trend of the wind power time series, denoted as $T_t$, is defined as the MMSE forecast of the long-term level of the time series, namely:

$$T_t = \lim_{J \to \infty} E_t\left[ \ln WP_{t+J} - J\mu \right] = \ln WP_t + \lim_{J \to \infty} \sum_{j=1}^{J} E_t\left[ \Delta \ln WP_{t+j} - \mu \right] \qquad (6)$$

Thus, substituting Equation (5) into Equation (6), the BN trend of $\ln WP_t$ for the AR(1) case can be obtained as:

$$T_t = \ln WP_t + \frac{\varphi}{1 - \varphi} \left( \Delta \ln WP_t - \mu \right) \qquad (7)$$

Meanwhile, the cyclical component $C_t$ of the wind power time series can be calculated by:

$$C_t = \ln WP_t - T_t = -\frac{\varphi}{1 - \varphi} \left( \Delta \ln WP_t - \mu \right) \qquad (8)$$

Finally, the stochastic component $S_t$ of the wind power time series can be computed as:

$$S_t = T_t - D_t \qquad (9)$$
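The AR(1) case described in this section can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the helper name `bn_decompose_ar1`, the use of the sample mean of the differenced log series as the drift, the least-squares estimate of the AR(1) coefficient, and the synthetic input series are all assumptions made here.

```python
import numpy as np

def bn_decompose_ar1(wp):
    """Beveridge-Nelson decomposition of ln(WP), assuming an AR(1) model
    for the first difference (hypothetical helper, illustration only)."""
    y = np.log(np.asarray(wp, dtype=float))   # ln WP_t, requires WP_t > 0
    dy = np.diff(y)                           # first difference, t = 1..n-1
    mu = dy.mean()                            # drift: sample mean of the difference
    z = dy - mu
    # Least-squares estimate of phi in z_t = phi * z_{t-1} + eps_t
    phi = float(z[1:] @ z[:-1] / (z[:-1] @ z[:-1]))
    if abs(phi) >= 1:
        raise ValueError("AR(1) coefficient must satisfy |phi| < 1")
    t = np.arange(1, len(y))
    trend = y[1:] + phi / (1.0 - phi) * z     # BN trend
    cyclical = y[1:] - trend                  # log level minus BN trend
    deterministic = y[0] + mu * t             # ln WP_0 + mu * t
    stochastic = trend - deterministic        # remaining random-walk part of the trend
    return deterministic, cyclical, stochastic, mu, phi

# Synthetic, strictly positive series (not the Xinjiang data).
rng = np.random.default_rng(0)
wp = 50.0 * np.exp(np.cumsum(0.1 * rng.standard_normal(200)))
d, c, s, mu, phi = bn_decompose_ar1(wp)

# On the log scale the three components add up to ln WP_t, so their
# exponentials multiply back to the original series.
recombined = np.exp(d) * np.exp(c) * np.exp(s)
print(np.allclose(recombined, wp[1:]))
```

Because the decomposition is carried out on the logarithm, the components are additive in logs and multiplicative on the original scale. That is one way to read the multiplicative recombination used by the forecaster, although the exact scale on which the components are forecasted is not stated here.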

Relevance Vector Machine (RVM)
Relevance vector machine (RVM), proposed by Tipping, is a sparse, supervised, probabilistic learning method [46]. Compared with traditional supervised learning algorithms, RVM under a Bayesian framework has a better non-linear mapping capability, can be used with a small number of samples, and can also obtain good generalization performance.
Given a training sample set $\{p_i, v_i\}_{i=1}^{N}$, $p_i$ is a two-dimensional input vector, and $v_i$ is a one-dimensional target value. The RVM model can be formulated as: where $N$ is the number of training samples; $w_i$ is the weight; and $K(p, p_i)$ is a kernel function.
There are several kinds of kernel functions, and the Gaussian kernel function is usually employed, namely: where $\sigma$ is the width of the kernel function. Suppose that the noise $\varepsilon_i$ obeys a normal distribution with mean 0 and variance $\sigma^2$. Then: Since the $v_i$ are independent and identically distributed, the likelihood function of the training sample is: where To avoid the issues of too many relevance vectors, over-fitting and poor generalization capability, the weight $w$ should obey a normal distribution, namely: According to the Bayesian criterion, it can be inferred that the a posteriori distributions of w and t are both normal distributions, namely:

$$P(w \mid v, \alpha, \sigma^2) = \frac{P(v \mid w, \sigma^2)\, P(w \mid \alpha)}{P(v \mid \alpha, \sigma^2)}$$

where To maximize the hyper-parameter likelihood distribution $P(v \mid \alpha, \sigma^2)$, the optimal values $\alpha^{best}$ and $(\sigma^2)^{best}$ can be obtained according to the following iterations, namely: where $\alpha_i^{best}$ is the ith value of the optimal parameter $\alpha^{best}$; $\mu_i$ is the ith posterior mean value of $\mu$; and $N_{ii}$ is the ith diagonal element of the posterior variance matrix.
When a new input $p_*$ is given, the corresponding probability distribution of the forecasting output can be obtained as: The forecasting variance is: The mean value of the forecasting output is: During the parameter estimation process, most $\alpha_i$ will tend to $\infty$, and the corresponding $w_i$ will then equal 0. This means that many terms of the kernel matrix will not participate in the forecasting process, which is why RVM can achieve sparsity [36]. Compared with the support vector machine (SVM) technique, there is only one parameter which needs to be set for RVM, namely the kernel width parameter $\sigma$ [36].
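The forecasting step can be illustrated with a short NumPy sketch. It assumes the standard RVM predictive distribution (mean equal to the kernel vector times the posterior mean of the weights, variance equal to the noise variance plus the kernel-weighted posterior covariance), a Gaussian kernel scaled as exp(-||p - p_i||^2 / width^2), and no bias weight. The function names, the kernel scaling and the toy numbers are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def gaussian_kernel(p, centers, width):
    """Kernel values between one input p and each retained relevance vector.
    The scaling exp(-||p - p_i||^2 / width^2) is an assumed convention."""
    diff = np.asarray(centers, dtype=float) - np.asarray(p, dtype=float)
    return np.exp(-np.sum(diff ** 2, axis=1) / width ** 2)

def rvm_predict(p_new, relevance_vectors, post_mean, post_cov, noise_var, width):
    """Predictive mean and variance for a single new input (no bias term)."""
    phi = gaussian_kernel(p_new, relevance_vectors, width)
    mean = float(phi @ post_mean)                    # phi^T mu
    var = float(noise_var + phi @ post_cov @ phi)    # sigma^2 + phi^T Sigma phi
    return mean, var

# Toy posterior with two surviving relevance vectors (made-up numbers).
rv = np.array([[0.0, 0.0], [1.0, 1.0]])
post_mean = np.array([2.0, -1.0])
post_cov = np.diag([0.5, 0.2])
mean, var = rvm_predict([0.0, 0.0], rv, post_mean, post_cov, noise_var=0.1, width=1.0)
print(mean, var)
```

Only the relevance vectors whose weights survive training enter the kernel vector, which is where the sparsity of RVM shows up at forecasting time.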

Ant Lion Optimizer (ALO)
In 2015, Mirjalili proposed a new nature-inspired meta-heuristic algorithm, namely the ant lion optimizer (ALO) [42]. The ALO is inspired by the intelligent behavior of antlions hunting ants. The detailed steps of the ALO algorithm are as follows:
Step 1: Set initial parameters.
When ALO is used, the initial values of five parameters need to be set, which are the number of ants and antlions Agents_no; the maximum iteration number Max_iteration; the number of variables dim;
Step 2: Initialize the positions of ants and antlions.
The positions of ants and antlions need to be initialized, which can be represented by Equations (22) and (23).
where $M_{Ant}$ represents each ant position; $A_{ij}$ represents the j-th parameter's value of the i-th ant; $M_{Antlion}$ represents each antlion position; $AL_{ij}$ represents the j-th parameter's value of the i-th antlion. The positions of ants and antlions are generated at random. Therefore, the entries of the position matrices of ants and antlions can be obtained according to Equation (24):

$$A_{*j} \ \text{or} \ AL_{*j} = rand \times \left( ub_j - lb_j \right) + lb_j \qquad (24)$$

where $A_{*j}$ and $AL_{*j}$ are the j-th column values of the position matrices; $rand$ is a random number generated from the uniform distribution on the interval [0, 1]; and $lb_j$ and $ub_j$ respectively represent the lower and upper boundary of the j-th variable.
The elite refers to the best antlion obtained in the iteration process. The best antlion can be determined according to the fitness function, and the antlion with the maximal fitness value is the elite, which is also called the fittest antlion. In ALO, the fitness function of the antlion is represented by $f[\ast]$, and the matrix $M_{OAL}$ is used to store the fitness values of antlions, namely: where $M_{OAL}$ is the fitness matrix of antlions. According to Equation (25), the fitness values of antlions can be calculated, and then the initial elite can be selected.
In ALO, the ant position is influenced by both the antlion and the selected elite. Each ant randomly walks around the selected elite and around an antlion selected by a roulette wheel algorithm. According to this rule, the i-th ant position at the t-th iteration can be obtained by: where $Ant_i^t$ is the i-th ant position at the t-th iteration, $R_E^t$ represents the random walk around the selected elite at the t-th iteration, and $R_A^t$ represents the random walk around the antlion selected by the roulette wheel algorithm.
Random walk is an important strategy for modelling the positions and movements of ants and antlions in the ALO algorithm. For a detailed interpretation of the random walk in ALO, readers can refer to [42,43].
In the iteration process, the ants update their positions according to Equation (26). In order to keep the random walk inside the search space, the ant position needs to be normalized at each iteration as follows: where $X_j^t$ represents the normalized value of the j-th variable at the t-th iteration; $a_j$ and $b_j$ respectively represent the minimum and maximum of the random walk of the j-th variable; and $c_j^t$ and $d_j^t$ are the minimum and maximum of the j-th variable at the t-th iteration, respectively.
In the ALO algorithm, the random walk of ants is influenced by the traps of antlions, which can be modeled as follows: where $c_i^t$ and $d_i^t$ respectively represent the minimum and maximum of the variables related to the i-th ant at the t-th iteration; $c^t$ and $d^t$ respectively represent the minimum and maximum of the variables at the t-th iteration; and $Antlion_i^t$ represents the i-th antlion position at the t-th iteration. During the iteration and optimization process, antlions build pits proportional to their fitness values. The antlions with better fitness values have larger pits, which gives them better chances of catching ants. When the antlions find that ants are trapped, they throw sand outward from the centers of their pits. Therefore, the random walk range is set to decrease adaptively to simulate the ants sliding towards the antlions, which can be modelled as follows: where $I$ is a decreasing ratio as follows:
Step 5: Select the optimal antlion (final elite).
The antlion catches an ant when the ant reaches the bottom of the cone-shaped pit, and then pulls the ant into the sand and consumes it. To improve the probability of catching a new ant, the antlion updates its position according to the position of the latest caught ant and then builds a new pit for catching prey. In the ALO algorithm, an ant is caught when it is fitter than the antlion. The position of the antlion can be updated by: For each iteration, the fitness and position of antlions can be updated according to Equation (32), and then the new elite can be redetermined. When the stopping criterion for iteration (such as the maximum iteration number) is satisfied, the ALO algorithm ends. At this time, the final elite, namely the optimal antlion, is obtained.
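The overall flow of the algorithm (random initialization, roulette wheel selection, random walks around the selected antlion and the elite, a shrinking walk range, and replacement of an antlion by a fitter ant) can be sketched as follows. This is a simplified illustration and not the reference implementation of [42]: the objective is minimized (as with an error-based fitness), the shrinking schedule `1 + 10 * t / max_iter` is a stand-in for the ratio I, the walk range is centred symmetrically on the antlion, and the names `random_walk_position` and `alo_minimize` are hypothetical.

```python
import numpy as np

def random_walk_position(center, radius, t, max_iter, rng):
    """Position at iteration t of a +/-1 random walk, min-max scaled into
    the interval [center - radius, center + radius] for each variable."""
    dim = len(center)
    steps = np.where(rng.random((max_iter, dim)) > 0.5, 1.0, -1.0)
    walk = np.vstack([np.zeros(dim), np.cumsum(steps, axis=0)])
    a, b = walk.min(axis=0), walk.max(axis=0)      # walk extremes per variable
    c, d = center - radius, center + radius        # trap boundaries
    return (walk[t] - a) * (d - c) / (b - a) + c

def alo_minimize(f, lb, ub, n_agents=10, max_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    # Random initial antlion positions: rand * (ub - lb) + lb
    antlions = rng.random((n_agents, len(lb))) * (ub - lb) + lb
    fitness = np.array([f(x) for x in antlions])
    best = int(fitness.argmin())
    elite, elite_fit = antlions[best].copy(), float(fitness[best])
    history = [elite_fit]
    for t in range(1, max_iter + 1):
        # Walk range shrinks as iterations proceed (simplified schedule).
        radius = (ub - lb) / 2.0 / (1.0 + 10.0 * t / max_iter)
        # Roulette wheel: a lower objective value gives a higher probability.
        weights = 1.0 / (1.0 + fitness - fitness.min())
        probs = weights / weights.sum()
        for _ in range(n_agents):
            j = rng.choice(n_agents, p=probs)
            r_a = random_walk_position(antlions[j], radius, t, max_iter, rng)
            r_e = random_walk_position(elite, radius, t, max_iter, rng)
            ant = np.clip((r_a + r_e) / 2.0, lb, ub)   # average of the two walks
            ant_fit = float(f(ant))
            if ant_fit < fitness[j]:                   # fitter ant is caught:
                antlions[j], fitness[j] = ant, ant_fit  # antlion moves to its position
        best = int(fitness.argmin())
        if fitness[best] < elite_fit:
            elite, elite_fit = antlions[best].copy(), float(fitness[best])
        history.append(elite_fit)
    return elite, elite_fit, history

def sphere(x):
    return float(np.sum(np.asarray(x) ** 2))

elite, value, history = alo_minimize(sphere, lb=[-5.0, -5.0], ub=[5.0, 5.0])
print(elite, value)
```

In the setting of this paper the search would be one-dimensional, with the single variable being the RVM kernel width and the objective being the forecasting error.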

The Proposed Hybrid BND-ALO-RVM Forecaster for Wind Power
In this paper, a new hybrid BND-ALO-RVM forecaster is proposed for wind power. Considering the non-stationary characteristics of wind power time series, the BND method is first employed to decompose the initial wind power time series into three components: the deterministic, cyclical and stochastic components. Then, the RVM technique is used to forecast each of these three components. To improve the forecasting performance, the kernel width parameter σ of the RVM model is optimally determined using ALO, a new swarm intelligence algorithm. Finally, the forecasted wind power can be obtained by multiplying the forecasting results of the three decomposed components.
The detailed procedures of the proposed hybrid BND-ALO-RVM method for wind power forecasting are elaborated below:
Step 1: Perform unit root test.
When the Beveridge-Nelson decomposition method is used, the first task is to examine whether the logarithmic sequence of the initial wind power time series is first-order stationary. If the first-order difference of the logarithmic sequence of the initial wind power time series is stationary, then we can proceed to the next step. In this paper, the Augmented Dickey-Fuller (ADF) method is used to perform unit root tests.
Step 2: Decompose wind power time series.
Once the condition that the first-order difference of logarithmic sequence of initial wind power data is stationary is confirmed, the wind power time series can be decomposed into the deterministic, cyclical and stochastic components by using the BND method.
Step 4: Start optimization search.
The fitness function $f[\ast]$ needs to be determined first when the ALO is used to optimize the kernel width parameter of RVM. In this study, the root mean square error (RMSE) between the actual wind power data and the forecasted wind power data is employed to build the fitness function, namely:

$$f = \sqrt{\frac{1}{n} \sum_{k=1}^{n} \left( x(k) - \hat{x}(k) \right)^2}$$

where $x(k)$ is the actual wind power data at time k; $\hat{x}(k)$ is the forecasted wind power value at time k; and $n$ is the number of data points.
The kernel width parameter of RVM is represented by the antlion's position $M_{Antlion}$, namely each column of $M_{Antlion}$. The optimal position of the antlions is updated at each iteration, and the optimal kernel width parameter of RVM found so far is updated accordingly. Suppose that the actual wind power decomposition data $\{x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n)\}$ are used in the first iteration; the forecasted wind power decomposition values $\{\hat{x}^{(0)}(1), \hat{x}^{(0)}(2), \ldots, \hat{x}^{(0)}(n)\}$ can then be calculated using the optimized RVM model. The fitness function and optimization objective at this iteration can be built by minimizing the RMSE as follows:

$$\min f(\sigma) = \sqrt{\frac{1}{n} \sum_{k=1}^{n} \left( x^{(0)}(k) - \hat{x}^{(0)}(k) \right)^2}$$

Step 5: Determine the optimal parameter of RVM.
During the iteration and optimization process, different RMSEs will be obtained with different kernel width parameters of RVM. When the iteration number reaches its maximum, the minimum RMSE can be found, and then the optimal kernel width parameter σ of RVM can be determined, which will be used for forecasting the three components of the wind power time series with the RVM technique.
Step 6: Integrate three components forecasting results and forecast wind power.
After the initial wind power time series are decomposed into three components and the optimal kernel width parameter of RVM is determined, the RVM optimized by ALO is employed to forecast each of the three decomposed components, namely the deterministic, cyclical and stochastic components. Then, the forecasting results of these three components are integrated: the forecasting result of wind power is obtained by multiplying the forecasting results of the corresponding deterministic, cyclical and stochastic components.
The procedure of the proposed BND-ALO-RVM method used for wind power forecasting in this paper is shown in Figure 1.

Empirical Analysis
In this paper, hourly wind power from the Xinjiang Uygur autonomous region in China collected during June 2015 was employed for empirical analysis to validate the proposed BND-ALO-RVM forecaster. The Xinjiang Uygur autonomous region, which has plentiful wind resources, is located in the northwest of China, as shown in Figure 2. The sample set in this paper includes 408 hourly wind power points from 14 June to 30 June, which are shown in Figure 3. It can be seen that the wind power, which ranges from about 5 MW to 164 MW, fluctuates greatly. No apparent variation pattern of the wind power time series can be seen.

Wind Power Time Series Decomposition Using BND Method
The BND method is applied to decompose the original wind power time series. Before data decomposition, a unit root test based on the ADF test is necessary to judge whether the logarithmic sequence of the wind power time series is first-order stationary. The ADF test results are listed in Table 1, where it can be seen that the wind power time series are stationary after first-order differencing. Then, the BND method can be used to decompose the wind power time series. The decomposition result of the original wind power time series is shown in Figure 4, which includes the deterministic, cyclical and stochastic components.
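The logic of the unit root check can be illustrated with a simplified Dickey-Fuller regression in NumPy. This sketch is not the augmented test used in the paper: it includes a constant but no lagged differences, it is run on a synthetic random walk rather than the Xinjiang data, and it compares the statistic with -2.86, the commonly tabulated approximate 5% critical value for the constant-only case. The function name `dickey_fuller_stat` is hypothetical.

```python
import numpy as np

def dickey_fuller_stat(series):
    """t-statistic of rho in  dy_t = a + rho * y_{t-1} + e_t
    (simplified Dickey-Fuller regression without lagged differences)."""
    y = np.asarray(series, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS coefficient covariance
    return float(beta[1] / np.sqrt(cov[1, 1]))

CRITICAL_5PCT = -2.86  # approximate asymptotic 5% value, constant-only case

# Synthetic log series behaving like a random walk (408 hourly points).
rng = np.random.default_rng(1)
log_wp = 4.0 + np.cumsum(0.1 * rng.standard_normal(408))

stat_level = dickey_fuller_stat(log_wp)            # log level
stat_diff = dickey_fuller_stat(np.diff(log_wp))    # first-order difference
print(stat_level, stat_diff, stat_diff < CRITICAL_5PCT)
```

A statistic below the critical value rejects the unit root, so the decomposition would proceed only when the differenced log series passes this check.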

Forecasting Results
After the original wind power time series are decomposed, the deterministic, cyclical and stochastic components are each forecasted using ALO-RVM, which means the kernel width parameter of RVM is optimized and determined by ALO separately for each component. The sample set of wind power from 14 June to 30 June is divided into a training sample set and a testing sample set. Two hundred and sixteen (216) sample data points of hourly wind power from 14 June to 23 June are used as the training sample, and the remaining 168 sample data points from 24 June to 30 June are treated as the testing sample.
The inputs of the ALO-RVM model are the hourly wind power one hour earlier and the hourly wind power at the same time on the previous day. For example, for forecasting the deterministic component of the wind power time series at the 1st hour of 24 June, the deterministic component one hour earlier (i.e., the deterministic component of the 24th hour wind power data on 23 June) and the deterministic component at the same time on the previous day (i.e., the deterministic component of the 1st hour wind power data on 23 June) are employed as the input variables. In this way, the deterministic components of the wind power time series from the 2nd hour of 24 June to the 24th hour of 30 June can be forecasted. Meanwhile, the cyclical components and stochastic components of the wind power time series from the 1st hour of 24 June to the 24th hour of 30 June can also be forecasted. Finally, the wind power from the 1st hour of 24 June to the 24th hour of 30 June can be obtained by multiplying the forecasted deterministic, cyclical and stochastic components for the same hours. The same procedure is followed for the training period. The training sample data and testing sample data are each normalized before training and testing with the ALO-RVM method, and the normalization method is as follows:

$$\hat{x} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ and $\hat{x}$ are the original and normalized wind power data, respectively; and $x_{\max}$ and $x_{\min}$ respectively represent the maximum and minimum value of each input wind power time series.
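The input construction and normalization can be sketched as below. The sketch assumes two lagged inputs (1 hour and 24 hours back) and min-max scaling of each input column; the helper names and the placeholder series are hypothetical. It also assumes that the first 24 hours serve only as lagged inputs, which would explain why 408 hourly points yield 216 + 168 = 384 samples; this reading is an inference from the reported sample sizes.

```python
import numpy as np

def make_lagged_samples(series, lags=(1, 24)):
    """Build one input row per target hour from the values `lags` hours earlier."""
    s = np.asarray(series, dtype=float)
    start = max(lags)                                  # first hour with all lags available
    X = np.column_stack([s[start - lag: len(s) - lag] for lag in lags])
    y = s[start:]
    return X, y

def min_max_normalize(x, x_min, x_max):
    """Scale values to [0, 1] using the given minimum and maximum."""
    return (np.asarray(x, dtype=float) - x_min) / (x_max - x_min)

# Placeholder for one decomposed component with 408 hourly values.
component = np.arange(408, dtype=float)

X, y = make_lagged_samples(component)
X_train, y_train = X[:216], y[:216]    # targets on 15 June to 23 June
X_test, y_test = X[216:], y[216:]      # targets on 24 June to 30 June

# Normalize each input column of each set with its own minimum and maximum.
X_train_n = min_max_normalize(X_train, X_train.min(axis=0), X_train.max(axis=0))
X_test_n = min_max_normalize(X_test, X_test.min(axis=0), X_test.max(axis=0))
print(X.shape, X_train_n.shape, X_test_n.shape)
```

Whether the scaling constants are taken from each set separately or from the training set alone is a design choice; the sketch follows the separate normalization described above.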
During the training period, the kernel width parameter of RVM is optimally determined using the ALO algorithm for the deterministic, cyclical and stochastic components, respectively. The optimal values of the RVM kernel width parameter σ for the deterministic, cyclical and stochastic components are respectively 29.2504, 0.0137 and 11.8443, which are also the RVM kernel widths for the deterministic, cyclical and stochastic components during the testing period. The forecasting results of the deterministic, cyclical and stochastic components of the hourly wind power from 24 June to 30 June are shown in Figure 5. Then, the hourly wind power from 24 June to 30 June can be forecast by multiplying the forecasted deterministic, cyclical and stochastic components from 24 June to 30 June, which are shown in Figure 6.
Energies 2017, 10, 922

Forecasting Performance Evaluation
To evaluate the forecasting performance of the proposed BND-ALO-RVM forecaster for wind power, four comparison methods are selected: single RVM without data decomposition and parameter optimization (referred to as RVM), RVM only with data decomposition using BND (referred to as BND-RVM), RVM only with parameter optimization (referred to as ALO-RVM), and generalized regression neural network with data decomposition using wavelet transform (referred to as WT-GRNN). The input variables and output variable of these four comparison methods are the same as those of the BND-ALO-RVM method. For RVM and BND-RVM, the kernel width parameter is set as 3. For ALO-RVM, through training, the optimal kernel width parameter of RVM is determined as 7.4645. For WT-GRNN, a fast discrete wavelet transform based on four filters developed by Mallat [47] is employed, and the spread parameter value of GRNN is selected as 3. The forecasted hourly wind powers from 24 June to 30 June using RVM, BND-RVM, ALO-RVM, and WT-GRNN are shown in Figure 7. The relative errors of the forecasted hourly wind power using the BND-ALO-RVM, RVM, BND-RVM, ALO-RVM, and WT-GRNN methods are shown in Figure 8.
From Figure 8, it can be roughly seen that the proposed BND-ALO-RVM method has the best forecasting performance due to its much smaller relative errors, and RVM without data decomposition and parameter optimization has the poorest forecasting capacity due to its larger relative errors.
To further compare the forecasting performances of the different methods, three forecasting error criteria are selected: Mean Absolute Percentage Error (MAPE, see Equation (36)), Root Mean Square Error (RMSE) and Mean Absolute Error (MAE, see Equation (37)). The MAPEs, RMSEs and MAEs of the different forecasting methods, namely BND-ALO-RVM, RVM, BND-RVM, ALO-RVM, and WT-GRNN, on each forecasted day are listed in Table 2. Over the whole testing period, the MAPE, RMSE and MAE of BND-ALO-RVM are all lower than those of RVM, BND-RVM, ALO-RVM, and WT-GRNN. Therefore, the proposed BND-ALO-RVM method has the best forecasting performance for short-term wind power, and the single RVM has the worst forecasting performance. The MAPE of BND-RVM is larger than that of ALO-RVM, but smaller than that of WT-GRNN. The RMSE and MAE of BND-RVM are both smaller than those of ALO-RVM and WT-GRNN. Therefore, it can be said that the forecasting performance of BND-RVM is better than that of WT-GRNN, but it is hard to say which is better between BND-RVM and ALO-RVM. The MAPE of ALO-RVM is smaller than that of WT-GRNN, but the RMSE and MAE of ALO-RVM are larger than those of WT-GRNN, so we cannot decide in this paper which of ALO-RVM and WT-GRNN is better in terms of wind power forecasting.

From the perspective of forecasting performance on each day, the proposed BND-ALO-RVM method obtains the smallest MAPEs, RMSEs and MAEs among all five forecasting models on 24 June, 25 June, 27 June, 28 June, 29 June and 30 June. On 26 June, the BND-RVM obtains the smallest MAPE, RMSE, and MAE. The RVM model gets the largest MAPE, RMSE, and MAE on each day between 24 June and 30 June, which again indicates that it has the worst forecasting performance. This is because the RVM without data decomposition and parameter optimization cannot grasp the characteristics of the wind power time series. The forecasting performance of the BND-RVM method is better than that of the single RVM because the BND-RVM method obtains smaller MAPEs, RMSEs and MAEs, which indicates that BND is an effective wind power time series decomposition method for improving wind power forecasting accuracy. Meanwhile, the forecasting performance of the ALO-RVM method is better than that of the single RVM, which indicates that ALO is an effective algorithm for kernel width parameter optimization of RVM for improving wind power forecasting accuracy. For the BND-RVM and ALO-RVM methods, BND-RVM has better forecasting performance than ALO-RVM on 24 June, 25 June, 26 June and 30 June, and ALO-RVM has better forecasting performance than BND-RVM on 27 June. It is hard to judge which of BND-RVM and ALO-RVM has better forecasting performance because of the different rankings in terms of MAPEs, RMSEs, and MAEs on 28 June and 29 June. For the BND-RVM and WT-GRNN methods, except for the MAE criterion on 24 June and 28 June as well as the MAPE criterion on 28 June, BND-RVM shows better forecasting performance than WT-GRNN. On the whole, BND-RVM has better forecasting performance than WT-GRNN, which indicates that the BND employed in this paper is an effective technique for wind power time series decomposition.
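For reference, the three error criteria used in this comparison can be computed as in the sketch below. It assumes the standard definitions (MAPE as the mean of |actual − forecast| / |actual| in percent, RMSE as the square root of the mean squared error, MAE as the mean absolute error); the notation and function names are illustrative, and the sample values are invented rather than taken from Table 2.

```python
import numpy as np

def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent; assumes no actual value is zero."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100.0)

def rmse(actual, forecast):
    """Root Mean Square Error, in the unit of the data (MW for wind power)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

def mae(actual, forecast):
    """Mean Absolute Error, in the unit of the data (MW for wind power)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs(actual - forecast)))

# Invented example values (MW), only to show the calls.
actual = [100.0, 200.0, 50.0]
forecast = [110.0, 190.0, 55.0]
print(mape(actual, forecast), rmse(actual, forecast), mae(actual, forecast))
```

Because MAPE divides by the actual value, it weights errors at low output hours more heavily than RMSE and MAE do, which is one reason the three criteria can rank two methods differently, as seen for BND-RVM and ALO-RVM.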
To sum up the above analysis, we can conclude that the proposed BND-ALO-RVM forecaster is an effective and applicable technique, which can improve the forecasting accuracy of hourly wind power. Meanwhile, the BND method is a valid wind power time series decomposition technique, and ALO is an efficient meta-heuristic algorithm for kernel width parameter determination of RVM in the field of wind power forecasting.

Conclusions
Wind power is an environmentally friendly form of renewable energy, which has developed rapidly in recent years. However, the large-scale penetration of wind power, with its stochastic and intermittent characteristics, into electric power systems poses threats to the stable and safe operation of these systems. Improving wind power forecasting accuracy can alleviate these threats. Therefore, a new hybrid BND-ALO-RVM forecaster for wind power is proposed in this paper, which combines the Beveridge-Nelson decomposition method, relevance vector machine and ant lion optimizer. The wind power time series are firstly decomposed into deterministic, cyclical and stochastic components. Then, these three decomposed components are respectively forecasted using the ALO-RVM method, which means that the kernel width parameter of RVM is optimally determined by the ALO algorithm. Finally, the wind power forecasting results are obtained by multiplying the forecasted deterministic, cyclical and stochastic components. Taking hourly wind power from the Xinjiang Uygur autonomous region in China as an example, the empirical results indicate that the proposed BND-ALO-RVM forecaster obtains the best forecasting performance compared with the single RVM, BND-RVM, ALO-RVM, and WT-GRNN methods. The proposed BND-ALO-RVM method is effective and practical for short-term wind power forecasting. The BND method is a valid wind power time series decomposition technique, and ALO is an attractive meta-heuristic algorithm for RVM parameter determination. This paper enriches the methodology library related to wind power forecasting, and also extends the application domains of BND and ALO.
In future research, the proposed BND-ALO-RVM method may also be employed for other issues, such as photovoltaic power forecasting and power load forecasting.

Figure 2. Geographical location of Xinjiang and its wind resources.

Figure 3. Wind power time series from 14 June to 30 June (408 sample points).

Figure 4. Decomposed deterministic component, cyclical component and stochastic component of the original wind power time series.
The hourly wind power from 14 June to 30 June is divided into a training sample set and a testing sample set. Two hundred and sixteen (216) sample data points of hourly wind power from 14 June to 23 June are used as the training sample, and the remaining 168 sample data points from 24 June to 30 June are treated as the testing sample.


Figure 5. Forecasted deterministic component, cyclical component and stochastic component of the wind power time series from 24 June to 30 June.

Figure 6. Forecasting result of wind power from 24 June to 30 June.

Figure 7. Forecasting results of hourly wind power by using different comparison methods.

Figure 8. Relative errors of the forecasted hourly wind power of different methods.

Table 1. ADF test result of wind power time series.

Table 2. Forecasting performances of different methods.
The proposed BND-ALO-RVM method has the minimum MAPE, RMSE and MAE, which are 8.95%, 8.79 MW and 6.86 MW, respectively. By comparison, the MAPE, RMSE and MAE of the single RVM method are 29.20%, 33.64 MW and 26.79 MW, respectively; those of the BND-RVM method are 11.01%, 10.29 MW and 8.08 MW; those of the ALO-RVM method are 10.74%, 11.10 MW and 8.37 MW; and those of the WT-GRNN method are 11.07%, 10.46 MW and 8.13 MW.
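The relative improvement of the proposed method over each comparison method can be derived from the overall figures quoted above. The sketch below does this arithmetic; it is an illustration only, the last set of values (11.07%, 10.46 MW, 8.13 MW) is taken to belong to WT-GRNN, the only method otherwise missing from the list, and because the inputs are already rounded, the resulting percentages may differ slightly in the last digit from values computed on unrounded data.

```python
# Overall (MAPE %, RMSE MW, MAE MW) for each method, as quoted in the text.
RESULTS = {
    "BND-ALO-RVM": (8.95, 8.79, 6.86),
    "RVM": (29.20, 33.64, 26.79),
    "BND-RVM": (11.01, 10.29, 8.08),
    "ALO-RVM": (10.74, 11.10, 8.37),
    "WT-GRNN": (11.07, 10.46, 8.13),  # assumed assignment, see lead-in
}

def reduction_percent(proposed, baseline):
    """Relative reduction of an error criterion, in percent of the baseline value."""
    return (baseline - proposed) / baseline * 100.0

proposed = RESULTS["BND-ALO-RVM"]
reductions = {
    method: tuple(reduction_percent(p, b) for p, b in zip(proposed, values))
    for method, values in RESULTS.items()
    if method != "BND-ALO-RVM"
}

for method, (d_mape, d_rmse, d_mae) in reductions.items():
    print(f"{method}: MAPE -{d_mape:.2f}%, RMSE -{d_rmse:.2f}%, MAE -{d_mae:.2f}%")
```

All twelve reductions are positive, which is the numerical content of the statement that BND-ALO-RVM has the minimum MAPE, RMSE and MAE over the whole testing period.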