A New Hybrid Short-Term Interval Forecasting of PV Output Power Based on EEMD-SE-RVM

Abstract: The main characteristics of photovoltaic (PV) output power are randomness and uncertainty, which make it difficult to establish an accurate forecasting method. Accurate short-term forecasting of PV output power is of great significance for the stability, safe operation and economic dispatch of the power grid. Deterministic point forecasting ignores the randomness and volatility of PV output power. To overcome this defect, this paper proposes a novel hybrid model for short-term PV output power interval forecasting based on ensemble empirical mode decomposition (EEMD), sample entropy (SE) and the relevance vector machine (RVM). Firstly, EEMD is used to decompose the PV output power sequence into several intrinsic mode functions (IMFs) and a residual (RES) component. Next, the SE algorithm is used to reconstruct the decomposed components into three new components with typical characteristics. Then, a forecasting model is developed for every new component using RVM. Finally, the forecasting results of the new components are superimposed to obtain the overall forecasting result at a given confidence level. Simulation results and a comparison with several previous methods show that the hybrid EEMD-SE-RVM method has higher forecasting accuracy, a more reliable forecasting interval and high engineering application value.


Introduction
With the development of industrialization, traditional fossil fuels are being increasingly depleted, and the environmental pollution caused by their combustion has become a main obstacle to global economic development. To address this problem, increasing attention has been paid over the past few decades to renewable energy sources such as biomass, tidal, wind and solar energy [1]. However, because of their intermittency and variability, these sources cause unavoidable fluctuations and instability when they are integrated into the power grid at a high level. Therefore, accurate forecasting of renewable energy generation is very important for the safe, stable and reliable operation of the power grid [2].
Existing models for short-term renewable energy generation forecasting can be roughly divided into four categories: artificial intelligence based models (AIBM), statistical models, physical models and hybrid models [3]. In [4], statistical smoothing techniques were used to create a statistical normalization of the solar energy, which facilitated online short-term forecasting of photovoltaic (PV) power. In [5], the ARIMA model was used as a statistical model to forecast the output power of a grid-connected PV system. As a method of statistical [...] original sequence into different new components. That method was also used to construct the different components in order to analyze the complexity. Then, the characteristic of the EEMD method was optimized. Interval forecasting with a hybrid method that combines EEMD, SE and the relevance vector machine (RVM) is both challenging and important, and it can enhance the accuracy of the conventional RVM method.
Based on the above discussion, this paper proposes a new hybrid model based on EEMD-SE-RVM for short-term interval forecasting of PV output power. EEMD is used to decompose the original PV output power sequence into several intrinsic mode functions (IMFs) and a residual (RES) component. The SE algorithm is then used to obtain three new components with typical characteristics. For each new component, a prediction model is established using RVM, and the forecasting results of the new components are superimposed to obtain the overall forecasting result at a given confidence level. The simulated case study shows that this hybrid approach is effective, generalizes robustly and has strong practical application value.
The rest of this article is organized as follows. Section 2 introduces the basic models of EEMD, SE and RVM algorithms, respectively. Section 3 develops the hybrid model interval forecasting of PV output power. Case studies and numerical results are given in Section 4. Finally, conclusions are drawn in Section 5.

EEMD Principle
The most obvious drawback of conventional EMD is mode mixing, in which either a single IMF contains signals of widely different scales or signals of a similar scale reside in different IMF components; this usually leads to signal instability. To overcome this drawback, a new method named EEMD was proposed, which is essentially a noise-assisted data analysis method. It shows that added noise can assist the EMD method.
EEMD has two important parameters: the amplitude k of the white noise and the maximum number of iterations M of EMD. The values of M and k are usually chosen according to experience and the characteristics of the data. Without loss of generality, in this paper M was taken as 100 and k was taken in the range 0.05-0.5.
The detailed steps of EEMD can be summarized in the following five points [18]:
(1) Set the values of k and M.
(2) Add a white noise sequence to the signal.
(3) Use EMD to decompose the noise-added signal into IMFs.
(4) Repeat steps (2) and (3), adding a different white noise sequence each time, to obtain the corresponding IMF components of each decomposition.
(5) Take the average of all corresponding IMFs as the final result of each IMF, and take the average of all residual components as the final result of the residual.
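The noise-adding, decomposition and averaging steps above can be sketched in a few lines of Python. This is only an illustration of the ensemble loop, not the implementation used in this paper. The function names are hypothetical, the noise standard deviation is assumed to be k times the standard deviation of the signal, and `moving_average_split` is a simple stand-in decomposer that is not EMD. A real EMD routine would be passed in its place, and it may return a different number of IMFs per trial, which this sketch does not handle.

```python
import random


def moving_average_split(signal, window=5):
    """Stand-in decomposer (NOT real EMD): returns [detail, smooth],
    two components whose sum reproduces the input exactly."""
    n = len(signal)
    half = window // 2
    smooth = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smooth.append(sum(signal[lo:hi]) / (hi - lo))
    detail = [s - m for s, m in zip(signal, smooth)]
    return [detail, smooth]


def ensemble_decompose(signal, decompose, k=0.2, M=100, seed=0):
    """Average the components of M noise-assisted decompositions.

    k scales the white-noise standard deviation relative to the signal's
    standard deviation (an assumed convention); M is the number of trials.
    """
    rng = random.Random(seed)
    mean = sum(signal) / len(signal)
    sd = (sum((s - mean) ** 2 for s in signal) / len(signal)) ** 0.5
    sums = None
    for _ in range(M):
        # Add a fresh white-noise sequence to the signal.
        noisy = [s + rng.gauss(0.0, k * sd) for s in signal]
        # Decompose the noise-added signal.
        comps = decompose(noisy)
        if sums is None:
            sums = [[0.0] * len(signal) for _ in comps]
        for acc, comp in zip(sums, comps):
            for i, v in enumerate(comp):
                acc[i] += v
    # Average corresponding components over all trials.
    return [[v / M for v in acc] for acc in sums]
```

Because the added noise has zero mean, its contribution shrinks as M grows, so the averaged components sum back to approximately the original signal.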

SE Principle
If a forecasting model were developed individually for each IMF component and the RES component produced by EEMD, the computational burden would increase greatly and the correlation between different components would be ignored. In this paper, sample entropy theory was used to recombine components with similar characteristics. For given k, r and N, where k is the embedding dimension, r is the tolerance and N is the number of data points, SampEn(N, k, r) is the negative logarithm of the conditional probability that two sequences that match for k points within tolerance r also match for k + 1 points. For a data sequence {x_i} = x(1), ..., x(N), the sample entropy algorithm is as follows:
(1) From the sequence {x_i}, construct the k-dimensional vectors
$$X(i) = [x(i), x(i+1), \ldots, x(i+k-1)], \quad i = 1, \ldots, N-k+1$$
(2) Define the distance d_k(X(i), X(j)) between vectors X(i) and X(j) as the maximum absolute difference between their scalar components:
$$d_k(X(i), X(j)) = \max_{0 \le l \le k-1} \left| x(i+l) - x(j+l) \right|$$
(3) For a given value of r, count the number of vectors X(j), j ≠ i, with d_k(X(i), X(j)) ≤ r, and divide this count by N − k:
$$B_i^k(r) = \frac{1}{N-k}\, \mathrm{num}\{\, d_k(X(i), X(j)) \le r \,\}, \quad j \ne i$$
where r > 0 is the threshold, which serves as a noise filter, and i = 1, ..., N − k + 1.
(4) The mean value of B_i^k(r) is
$$B^k(r) = \frac{1}{N-k+1} \sum_{i=1}^{N-k+1} B_i^k(r)$$
(5) Increase the dimension to k + 1 and repeat steps (1) to (4); the mean value of B_i^{k+1}(r) is
$$B^{k+1}(r) = \frac{1}{N-k} \sum_{i=1}^{N-k} B_i^{k+1}(r)$$
(6) Finally, SampEn for a finite data length N can be estimated as
$$\mathrm{SampEn}(N, k, r) = -\ln \frac{B^{k+1}(r)}{B^k(r)}$$
In general, r is between 0.1 SD and 0.25 SD and k equals 1 or 2, where SD is the standard deviation of the time series. Here k is set to 2 and r to 0.15 SD.
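As an illustration, the sample entropy calculation can be sketched in plain Python as below. This is a minimal sketch, not the code used in this paper. It assumes the common convention of using the same N − k template vectors for both lengths k and k + 1, which may differ slightly in normalization from the definition in the text, and it uses the population standard deviation for SD. The function name and the `r_factor` parameter are hypothetical.

```python
import math
import statistics


def sample_entropy(x, k=2, r_factor=0.15):
    """Estimate SampEn(N, k, r) with r = r_factor * SD of the series."""
    n = len(x)
    r = r_factor * statistics.pstdev(x)

    def match_count(dim):
        # Use the same N - k starting points for both template lengths
        # (an assumed, commonly used convention).
        templates = [x[i:i + dim] for i in range(n - k)]
        count = 0
        for i, a in enumerate(templates):
            for j, b in enumerate(templates):
                # Chebyshev distance; self-matches are excluded.
                if i != j and max(abs(u - v) for u, v in zip(a, b)) <= r:
                    count += 1
        return count

    b = match_count(k)       # matches of length k
    a = match_count(k + 1)   # matches of length k + 1
    if a == 0 or b == 0:
        return math.inf      # entropy is undefined without matches
    return -math.log(a / b)
```

A perfectly periodic series gives a value of zero, while an irregular series gives a larger value, which is the property used to separate smooth components from fluctuating ones. The double loop is quadratic in N, which is acceptable for short daily PV series but slow for long records.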

RVM Principle
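Compared with other forecasting algorithms, RVM offers a highly sparse model, few parameters to optimize, flexible kernel selection and strong generalization ability, and it can also produce interval forecasts directly. Therefore, RVM is used to develop the interval forecasting model for the new components reconstructed by SE.
For a given set of training inputs {x_n}, n = 1, ..., N, and the corresponding outputs {t_n}, n = 1, ..., N, the relevance vector machine regression model can be defined as
$$t_n = y(x_n; \omega) + \varepsilon_n, \qquad y(x; \omega) = \sum_{i=1}^{N} \omega_i K(x, x_i) + \omega_0$$
where ε_n ∼ N(0, σ²) is the independent sample error, ω_i are the model weights, N is the sample size and K(x, x_i) is a nonlinear kernel function. Given a training sample set {x_i, t_i}, i = 1, ..., N, suppose the target values t_i are independent and the noise in the data follows a Gaussian distribution with variance σ². The likelihood function of the training sample set can then be described as
$$p(t \mid \omega, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left( -\frac{\lVert t - \Phi\omega \rVert^2}{2\sigma^2} \right)$$
where t = (t_1, ..., t_N)^T, ω = (ω_0, ω_1, ..., ω_N)^T and Φ is the design matrix defined by
$$\Phi = [\phi(x_1), \ldots, \phi(x_N)]^T, \qquad \phi(x_n) = [1, K(x_n, x_1), \ldots, K(x_n, x_N)]^T$$
Based on the prior distribution p(ω | α) of the weights, which is governed by the hyperparameters α, and the likelihood distribution, the posterior distribution over the weights follows from Bayes' rule:
$$p(\omega \mid t, \alpha, \sigma^2) = \frac{p(t \mid \omega, \sigma^2)\, p(\omega \mid \alpha)}{p(t \mid \alpha, \sigma^2)} = \mathcal{N}(\omega \mid \mu, \Sigma)$$
with
$$\Sigma = (\sigma^{-2} \Phi^T \Phi + A)^{-1}, \qquad \mu = \sigma^{-2} \Sigma \Phi^T t, \qquad A = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$$
Finally, the hyperparameters α and the variance σ² can be estimated using the maximum likelihood algorithm.
For an input value x_*, the corresponding forecasting value and its variance can be described as [13]
$$y_* = \mu^T \phi(x_*), \qquad \sigma_*^2 = \sigma^2 + \phi(x_*)^T \Sigma\, \phi(x_*)$$
Under the confidence level α, the interval forecasting result can be described as [25]
$$[L_b^{\alpha}, U_b^{\alpha}] = [\, y_* - z_{\alpha/2}\, \sigma_*, \;\; y_* + z_{\alpha/2}\, \sigma_* \,]$$
where L_b^α and U_b^α represent the lower and upper bounds of the forecasting value, and z_{α/2} is the quantile of the standard Gaussian distribution determined by the confidence level.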
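To make the interval construction concrete, the short Python sketch below turns a predictive mean and variance into lower and upper bounds. It is an illustrative sketch only. It assumes that the confidence level is expressed as a coverage probability (for example 0.90), that the predictive distribution is Gaussian, and that the mean and variance have already been produced by a trained RVM. The function name is hypothetical.

```python
from statistics import NormalDist


def prediction_interval(mean, variance, confidence=0.90):
    """Return (lower, upper) bounds of a symmetric Gaussian interval.

    `confidence` is the nominal coverage probability (assumed convention).
    """
    # Two-sided quantile of the standard Gaussian distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * variance ** 0.5
    return mean - half_width, mean + half_width
```

For a predictive mean of 10 and variance of 4, the 90% interval uses z ≈ 1.645 and the 60% interval uses z ≈ 0.842, so a lower confidence level gives a narrower interval around the same point forecast.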

Hybrid Forecasting Model
The proposed hybrid method constructs the prediction interval (PI) in three main stages: decomposition of the historical PV output power series using EEMD, reconstruction of the components using SE, and construction of the forecasting model using RVM. This part is divided into five sections. The first section introduces the principle of sample selection. The second section describes the decomposition of the data using EEMD, and the third section demonstrates the reconstruction of the components using SE. The remaining sections analyze the RVM method and give the corresponding flow chart and pseudo-code.

Sample Selection
To validate the forecasting ability of the proposed method, PV output power simulation data of a PV power plant in Jiangsu province from July 2011 to June 2012 were obtained. Because sunrise and sunset times differ between seasons, and to ensure that the data are meaningful and uniform across seasons, only the 10 h of data from 8:00 to 17:00 each day were used. In addition, changes in weather have a large impact on PV output power: comparing the historical output power curve with the meteorological curve shows that meteorological conditions strongly influence the PV power output. To keep data of the same kind consistent and to predict the PV output power more accurately, the historical PV output power data were divided into three types (sunny days, cloudy days and rainy days) according to the numerical weather prediction (NWP). For each weather type, the historical PV output power was decomposed using EEMD and a forecasting model was developed separately. Six hours of historical PV output power data and the NWP at the time to be predicted were used as the input of the model. The model in this paper is a rolling prediction model: for each time to be predicted, the input data are updated online and in real time.
Energies 2020, 13, 87
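The rolling construction of model inputs can be illustrated with the following Python sketch. It rests on several assumptions that the text does not state explicitly: the data are hourly, the six historical values are the ones immediately preceding the target time, and the NWP is represented by a single number per hour. The function and parameter names are hypothetical.

```python
def rolling_samples(power, nwp, history=6):
    """Build (features, target) pairs for a rolling forecast.

    Each feature vector holds the `history` most recent power values
    followed by the NWP value at the target time (assumed layout).
    """
    samples = []
    for t in range(history, len(power)):
        features = list(power[t - history:t]) + [nwp[t]]
        samples.append((features, power[t]))
    return samples
```

With the ten hourly values of one day (8:00 to 17:00), this yields four samples per day. Whether windows should be allowed to span consecutive days is a design choice that this sketch leaves open.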

Decomposing the Classified PV Output Power Using EEMD
Because PV output power is random and volatile under the influence of weather changes and other factors, direct forecasting would produce a large error. To improve the forecasting results, it is essential to preprocess the original data. In the comparison performed, EEMD showed better noise robustness and decomposition results than other decomposition algorithms. In this paper, the PV output power was therefore decomposed using EEMD to obtain a set of new components. For example, Figure 1 shows the result of applying EEMD to the PV output power data of a sunny day.

Reconstructing the New Components Using SE
As can be seen from Figure 1, some components follow a similar trend. If components are highly similar, their sample entropy values will be close to each other. Therefore, the rules for reconstructing the new components based on SE are as follows:
(1) The sample entropy of the given PV data sequence, of the IMF components and of the RES component is calculated.
(2) The components with a sample entropy clearly lower than that of the given sequence form the trend component.
(3) The components with a sample entropy clearly higher than that of the given sequence form the random component.
(4) The components with a sample entropy within a given threshold θ of that of the given sequence form the detail component. In this paper, we chose θ = 0.7.
Figure 2 gives the trend graph of the new components after reconstruction. Table 1 shows the composition of each new component. To further simplify the calculation, the forecasting interval was reduced: the trend component was selected for point forecasting, and the detail and random components were selected for interval forecasting. The forecasting results of the different components were then superposed to obtain the interval forecast at a given confidence level.
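The grouping rules can be expressed compactly in Python. The sketch below is one possible reading, not the authors' implementation. It assumes that "within a threshold θ" means an absolute difference in sample entropy of at most θ, and that a new component is the pointwise sum of its member components. The function names and the sample entropy values in the usage note are hypothetical.

```python
def group_by_sample_entropy(component_se, original_se, theta=0.7):
    """Assign each component index to 'trend', 'detail' or 'random'."""
    groups = {"trend": [], "detail": [], "random": []}
    for idx, se in enumerate(component_se):
        if abs(se - original_se) <= theta:
            groups["detail"].append(idx)   # close to the original sequence
        elif se < original_se:
            groups["trend"].append(idx)    # clearly more regular
        else:
            groups["random"].append(idx)   # clearly more irregular
    return groups


def reconstruct(components, groups):
    """Sum the member components of each group point by point."""
    length = len(components[0])
    return {
        name: [sum(components[i][t] for i in members) for t in range(length)]
        for name, members in groups.items()
    }
```

For instance, with illustrative entropy values `[2.5, 1.5, 1.2, 0.9, 0.3, 0.1]` for six components and `1.2` for the original sequence, the first component falls into the random group, the next three into the detail group and the last two into the trend group.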

Kernel Function of RVM
RVM is a kernel-based method for pattern recognition and regression forecasting, in which the kernel implements a non-linear transformation between feature spaces. The basic idea of a mixed kernel is to combine several kernels with different characteristics in a certain proportion and to optimize the combined kernel function for better performance. RVM places few restrictions on the choice of kernel function; the RBF kernel handles local fluctuations well, and the polynomial kernel handles global fluctuations well. Therefore, a combination of the polynomial kernel (a global kernel) and the RBF kernel (a typical local kernel) is used for short-term PV output power interval forecasting to obtain better forecasting results. The hybrid kernel is given by [13,28]
$$K(x, y) = \theta\, G(x, y) + (1 - \theta)\, P(x, y), \qquad P(x, y) = (x \cdot y + 1)^2 \tag{16}$$
where G(x, y) is the Gaussian kernel function, P(x, y) is the binomial (second-order polynomial) kernel function, θ is the weight of the kernel function and σ is the width of the Gaussian kernel. θ and σ are the parameters to be optimized. In this paper, the optimal values of θ and σ are obtained by grid search [36].
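A minimal Python sketch of the hybrid kernel follows. The weighted combination and the polynomial term follow Equation (16). The exact scaling of the Gaussian kernel is not recoverable from the text, so the form exp(−‖x − y‖² / (2σ²)) used here is an assumption. The function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma):
    """RBF kernel; the 2*sigma^2 scaling is an assumed convention."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def polynomial_kernel(x, y):
    """Second-order polynomial kernel (x . y + 1)^2."""
    return (sum(a * b for a, b in zip(x, y)) + 1.0) ** 2


def hybrid_kernel(x, y, theta, sigma):
    """Weighted sum of a local (Gaussian) and a global (polynomial) kernel."""
    return (theta * gaussian_kernel(x, y, sigma)
            + (1.0 - theta) * polynomial_kernel(x, y))
```

Setting θ = 1 recovers the pure Gaussian kernel and θ = 0 the pure polynomial kernel, so a grid search over θ in [0, 1] and over a range of σ covers both extremes as well as their mixtures.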

Evaluating Indicator
Interval forecasting requires evaluation indicators that differ from the well-known point forecasting indicators such as MAPE and RMSE. The following evaluation indicators were used in this paper.
(1) Mean absolute percentage error (MAPE)
$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_{for}(i) - y_{tru}(i)}{y_{tru}(i)} \right| \times 100\% \tag{17}$$
where y_for is the forecast value, y_tru is the actual value of the sample and N is the number of samples.
(2) Forecasting interval coverage percentage (FICP)
$$\mathrm{FICP} = \frac{\xi}{N} \times 100\% \tag{18}$$
where N is the number of samples and ξ is the number of actual PV output power values that fall within the interval at the level 1 − β.
(3) Forecasting interval average width (FIAW)
$$\mathrm{FIAW} = \frac{1}{N} \sum_{i=1}^{N} \frac{U_{\beta}(i) - L_{\beta}(i)}{y_{tru}(i)} \tag{19}$$
where N is the number of samples, y_tru is the actual value of the sample, and U_β and L_β are the upper and lower boundaries at the level 1 − β.
This paper proposes a new EEMD-SE-RVM method for short-term interval forecasting of PV output power. A simplified pseudo-algorithm that summarizes the process is provided in Algorithm 1. EEMD performs better in interval forecasting because it eliminates the mode mixing problem of the EMD method. However, interval forecasting based on conventional EEMD still suffers from high complexity. The proposed method uses SE to analyze the decomposed components so that the complexity is reduced. As analyzed above, SE recombines the decomposed components into trend, detail and random components to optimize the forecasting method. The trend component, which is smoother and steadier, is used for point forecasting, whereas the detail and random components are difficult to handle with conventional point forecasting because of their uncertainty and non-stationarity. Treating point and interval forecasts separately gives better performance by reducing the fluctuation of the numerical values.
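The three indicators defined in this section can be computed with a few lines of Python. This is a sketch under stated assumptions: MAPE and FICP are reported in percent, FIAW is the interval width normalized by the actual value and averaged over the samples, and all actual values are non-zero (which the 8:00 to 17:00 restriction is intended to support). The function names are hypothetical.

```python
def mape(forecast, actual):
    """Mean absolute percentage error, in percent."""
    total = sum(abs(f - a) / abs(a) for f, a in zip(forecast, actual))
    return 100.0 * total / len(actual)


def ficp(lower, upper, actual):
    """Share of actual values inside [lower, upper], in percent."""
    hits = sum(1 for l, u, a in zip(lower, upper, actual) if l <= a <= u)
    return 100.0 * hits / len(actual)


def fiaw(lower, upper, actual):
    """Average interval width relative to the actual value."""
    total = sum((u - l) / abs(a) for l, u, a in zip(lower, upper, actual))
    return total / len(actual)
```

A good interval forecast has an FICP at or above the nominal confidence level while keeping FIAW small. The two indicators need to be read together, because coverage can always be raised by widening the interval.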

Case Study
In this part, the PV data of the Jiangsu photovoltaic power station from July 2011 to June 2012 were used to test the accuracy and effectiveness of the proposed EEMD-SE-RVM model. The installed capacity of this PV plant was 30 MW, consisting of 28 PV arrays of 1.09 MW each. The data were collected once an hour (24 times a day), and each record is the instantaneous value of the PV output power at the time of collection. The prediction date was randomly selected, and the data before the prediction date were used as the training data of the model.
To validate the interval forecasting performance of the proposed model under different confidence levels, two confidence levels, 90% and 60%, were chosen as examples. Figures 3-8 depict the interval forecasting results for the different cases. Three common indices, the forecasting interval coverage percentage (FICP), the forecasting interval average width (FIAW) and the mean absolute percentage error (MAPE), were used to assess the interval forecasting [24,27]. Tables 2-4 give the interval forecasting results and analysis for the different cases.
To demonstrate the superiority of the proposed method, the RVM model, the EMD-RVM model and the EEMD-RVM model were also applied to the same short-term PV output power interval forecasting task. The forecasting results for the sunny day were taken as an example. The three evaluation indices FICP, FIAW and MAPE, together with the model running time, were used to evaluate the interval prediction. Table 5 provides the results of the four models at the 90% confidence level.
Based on four days of original PV output power data, the one-hour-ahead PV output power in these four days was predicted at the 90% confidence level, and the results are illustrated in Figure 9. Table 6 gives the corresponding values of FICP, FIAW and MAPE.
Taking sunny days as an example, short-term interval prediction at two different time scales was carried out at the 90% confidence level, and the results are depicted in Figure 10. Table 7 gives the corresponding values of FICP, FIAW and MAPE.
The comparison results show that the forecasting performance of the proposed method was better than that of the other methods, which supports the superiority and adaptability of the proposed model.

Conclusions
Firstly, considering the influence of different meteorological conditions on PV output power, the original PV output power data were classified into three categories. EEMD has a strong theoretical basis and good noise robustness. These features overcome two drawbacks of earlier methods: the manual selection of basis functions required by wavelet analysis and the mode mixing phenomenon of EMD. Consequently, EEMD gives a better decomposition of the original PV output power. Secondly, SE exploits the correlation among the components and reduces the model complexity, which improves the running efficiency. Thirdly, the hybrid kernel RVM method was implemented to achieve short-term interval forecasting of PV output power. In the illustrative results, the EEMD-SE-RVM model obtained better MAPE and FIAW values than the compared models, and its FICP was higher than that of the compared models. The proposed hybrid model therefore improved the forecasting precision, increased the interval coverage rate and reduced the width of the prediction interval, which makes it suitable for practical application to output power forecasting of other renewable energy sources.