An Advanced Bayesian Method for Short-Term Probabilistic Forecasting of the Generation of Wind Power

Currently, among renewable distributed generation systems, wind generators are receiving a great deal of interest due to the great economic, technological, and environmental incentives they involve. However, the uncertainties due to the intermittent nature of wind energy make it difficult to operate electrical power systems optimally and make decisions that satisfy the needs of all the stakeholders of the electricity energy market. Thus, there is increasing interest determining how to forecast wind power production accurately. Most the methods that have been published in the relevant literature provided deterministic forecasts even though great interest has been focused recently on probabilistic forecast methods. In this paper, an advanced probabilistic method is proposed for short-term forecasting of wind power production. A mixture of two Weibull distributions was used as a probability function to model the uncertainties associated with wind speed. Then, a Bayesian inference approach with a particularly-effective, autoregressive, integrated, moving-average model was used to determine the parameters of the mixture Weibull distribution. Numerical applications also are presented to provide evidence of the forecasting performance of the Bayesian-based approach.


Introduction
Currently, there is a vast amount of research, as well as many new visions and concepts, concerning future electrical systems [1][2][3][4][5][6].For example, super grids, smart grids, micro grids, intelligent grids, active networks, and virtual power plants are becoming the keywords related to the future development of power systems.In this context, it is expected that the penetration of distributed generation (DG) systems, especially those based on renewable energy sources, will become increasingly important in future Smart Grids due to environmental and technical reasons.In fact, the presence of DG systems in Smart Grids will result in advantages for energy-users and for the social wellness [7,8].
The foreseeable extensive use of DG systems in the future requires that distribution system engineers properly account for their impact in the system.In fact, their interconnection with the system significantly alters the characteristics of the distribution systems, traditionally designed with the assumption of a passive network.The consequence of the presence of DG systems is that the assumption of a passive network is no longer valid; instead, the network becomes active, which generates a number of new technical considerations that must be addressed, such as distribution network planning and operation, especially protection coordination, steady-state analysis, and power quality issues.
Among renewable DG systems, wind generators currently are receiving a great deal of interest; in fact, the unsubsidized cost of energy at the bus has decreased by more than 80% [9].
Wind energy is a new, emerging research field characterized by a high degree of interdisciplinary studies, and there are several related topics of interest in the relevant literature.
Increasing the quality and value of wind power generation will be one of the priorities in wind energy research in the coming years, and this requires that we improve our ability to predict the performance of wind systems [10].In fact, accurate forecasting is needed to solve several distribution system engineers' problems, and in particular to allow for unit commitment and the provision of ancillary services in the framework of competitive electricity markets as well as for the scheduling and dispatch of the required hourly ramping and load following [11].Then, accurate and reliable forecasts are mandatory for the optimal design and management of the Smart Grid resources.
In the relevant literature, several wind forecasting methods have been proposed with different levels of success [12].These include physical, statistical, artificial neural network, and hybrid methods, which differ in the use of input and output data, as well as in the time horizon of their application.
In particular, deterministic and probabilistic forecasting techniques have been provided depending on the type of information on the predicted output.In deterministic forecasting, a single value is provided without any other information about the nature of the wind's uncertainties; in probabilistic forecasting, the output value is accompanied by information on the wind's random nature.A review of existing methods is reported in [12].
Recently, interest in probabilistic forecasting has been increasing because of the need to take into account the unavoidable uncertainties that characterize the availability of wind energy resources [13,14].
On the other hand, deterministic forecast does not meet various applications needs, such as power system operations where uncertainties and risks have to be quantified [14,15].In particular, given the significant variability of the level of forecasting errors, forecast-users usually need additional information about forecast uncertainty; in fact, this additional information, given i.e., in terms of risk indices or quantile or interval forecasts, can be introduced in users' decision-making processes.
In this paper, a new method for short-term probabilistic forecasting is proposed that directly supports the probabilistic representation of the predicted wind power output.The proposed method uses the classical relationships that link wind active power to wind speed, the probability density function (PDF) of which is predicted by applying the Bayesian inference (BI) approach.
The BI approaches were extensively used to improve the forecasts for economic, social and weather time series [16][17][18] and in relevant literature they were demonstrated to significantly improve the forecasts obtained through AutoRegressive Integrated Moving Average (ARIMA) models, by estimating the parameters from a probabilistic point of view [19][20][21].Bayesian methods are being used increasingly in wind energy conversion systems due to the significant advantages they offer when uncertainty and variability are predominant concerns [17].They have been used in several fields of interest to wind energy engineers, such as forecasting long-and short-term production, modeling extreme wind conditions, evaluating the reliability of the systems, and in the process of designing the systems' components.However, the applications of wind energy are still in the early stage, and they are limited in number, so it can be expected that their use will increase as much more attention is paid to both methodologies and applications [17].
As is well known [22], two of the key steps of a Bayesian-based method for the short-term, probabilistic forecasting of wind speed are (i) the choice of the analytical expressions of the PDF modeling the uncertainties associated with wind speed and (i) defining the best time series model to determine the PDF parameters that are not assumed to be prior random parameters of the Bayesian approach.Some studies addressing these steps were presented in [16,19,23,24].In [19], a Gaussian PDF was used to model wind speed, and a sixth-order autoregressive model that involved only wind speed was used.In [23], a mixture model was used with a normal distribution that fit the values around the stall speed, and a Weibull distribution was used to fit the remaining values.In [16,24], a Weibull distribution was used to model wind speed, after which a first-order autoregressive model that involved the mean value of the wind speed was used.A review of existing Bayesian applications to short-term wind forecasting is reported in Section 2.3 of Reference [17].In this paper, a more complex PDF for modeling the uncertainties of wind speed and a particularly effective approach are used to identify the most adequate time series model.In particular, a mixture distribution of two Weibull distributions is used as the PDF analytical expression.
We considered a mixture distribution of two Weibull distributions because it was concluded in [25] that the use of the classical Weibull distribution of two parameters cannot represent all of the wind regimes encountered in nature, such as, for example, those with bimodal distributions.Therefore, a more suitable PDF must be selected for each wind regime in order to minimize errors in the estimation of the energy produced.A mixture of two Weibull distributions seemed most suitable for both unimodal and bimodal wind regimes, and it was evaluated experimentally for some actual cases.
Concerning the autoregressive, integrated, moving-average time series model, we applied the Box-Jenkins approach based on the use of the sample autocorrelation function [26].This approach is particularly effective in determining the orders and the parameters of the model itself.
The aims of the research reported in this paper are (i) to propose a new Bayesian-based method for the short-term forecasting of wind power; (ii) to include a new probability function and improved time series models in the frame of the Bayesian method; and (iii) to conduct a critical comparison of the performances of the new Bayesian method with both a traditional Bayesian method and a reference predictor (the probabilistic persistence method) in order to outline the advantages and disadvantages of the proposed method.
Note that there exists a large extend of literature on wind power forecasting, including several state-of-the-art papers, that evidences that numerical weather prediction (NWP) models have enabled relatively accurate wind forecasts [27][28][29][30][31][32].However, as the operating time moves closer to the near-term (e.g., hour-ahead or 15 minute-ahead), at a high spatial resolution, the computation complexity (in terms of simulation time and memory requirements) often renders NWP models intractable [31,33].In sharp contrast, data-driven statistical model is thought to be the most competitive method for near-term wind forecasting problems being able to capture the rapidly changing dynamics of the atmosphere and with nice model interpretation [32].Then, our proposed probabilistic method is targeted directly at computationally-efficient, near-term wind forecasts (e.g., hour-ahead or 15 minutes-ahead forecasts).The emphasis in our work was on computational efficiency because computational complexity (in terms of simulation time and memory requirements) often makes numerical weather prediction models excessively burdensome and expensive to operate [34].
This paper is organized as follows.Section 2 describes the probabilistic method we used that was based on Bayesian theory.In Section 3, the results of the numerical applications of the proposed method are reported, and they are discussed and compared with the results provided by both a traditional Bayesian method that uses distributions of two parameters and a probabilistic extension of the persistence method in order to show the advantages and benefits of the proposed method.

A Probabilistic Approach for Forecasting Wind Power Production: The Bayesian-Based Method
In the research reported in this paper, the Bayesian-based method was used to predict the PDF of the active power generated by wind systems.In particular, a relationship linking the wind active power with the wind speed was selected.Then, using the selected relationship in the frame of a Monte Carlo simulation approach, we forecasted the PDF of the active power production at the time horizon standing at the origin time , where is the lead time, starting from the evaluation of the PDF of the wind speed at time .The forecast of the PDF of the wind speed at time step was obtained by selecting an appropriate analytical expression for the wind speed PDF and evaluating the PDF parameters by applying the BI with an autoregressive, integrated, moving-average, time-series model.
Details about the various steps of this method are reported in the following subsections; for the sake of conciseness, only the results of the numerical applications with reference to the case of 1 are shown in Section 3.

Description of the Relationship that Links Wind Active Power with Wind Speed
In the most general case, the wind active power depends not only on wind speed, but also on meteorological variables such as wind direction, temperature, local air density, and precipitation.Moreover, the behavior of power curves when the wind speed increases can be different from the behavior when the speed decreases.We should also consider that, in many cases, there is the problem to predict wind power for an entire wind farm so that the choice of a deterministic power curve can be complicated by the fact that the wind turbines in a wind farm can have different cut-in and rated speeds; finally, there may be changes in the power capacity of the wind farm due to the addition of new turbines and turbine maintenance [35][36][37][38][39].
Considering the wind active power as a random variable dependent not only on wind speed, but also on other explicative variables (such as wind direction or air density) would require complexity increase in the Bayesian inference, given the augmented number of parameters to be estimated.Therefore, since our aim was to propose a computationally-efficient forecasting tool, we decided to use the deterministic power curve furnished by manufacturers.
The following analytical relationship between the active power, , and wind speed, , at the time horizon can be written: where is a non-linear function usually approximated by a linear function, linear pieces, a parabolic function, or a cubic function; and , , and are the cut-in, rated, and cut-off characteristic values, respectively, of the wind turbine power generation unit.

Selection of the Analytical Expression of the PDF of the Wind Speed
As is well known, wind speed is frequently modeled using the Weibull distribution (WB), as reported in [27]: where  is the scale parameter and β is the shape parameter.The scale parameter  can be expressed in terms of the mean value μ of the distribution of the wind speed, according to the following relationship: where  • is the Gamma function.Consequently, one can treat the PDF in Equation ( 2) as a function of the mean value and the shape factor.
In [25], it was concluded that the Weibull distribution of two parameters presents a series of advantages that simply its use, i.e., (i) flexibility; (ii) dependence on only two parameters; (iii) the simplicity of the estimation of its parameters; and (iv) its closed form.However, the Weibull PDF cannot represent all the wind regimes encountered in nature, e.g., those with bimodal distributions.The mixture of two Weibull distributions can be particularly suitable for these wind regimes.
As is well known, mixture density is a probability density function that is a convex linear combination of other probability density functions [45,46].
A two-component mixture Weibull distribution (MWB) depends on five parameters (ω ,  , β ,  , β ) and is given by: with 0 ω 1.The scale parameter  in Equation ( 4) can be expressed in terms of the mean value μ of the distribution of the wind speed and of the other parameters ω , β ,  , β , according to the following relationship [25]: As a result of the analysis of Equations ( 4) and ( 5), for the time horizon , the estimation of the mean value μ and of the parameters ω , β ,  , β is sufficient to unequivocally predict the probability density function . In this paper, the parameters ω , β ,  , β were assumed to be prior random parameters of the Bayesian approach, while the mean value μ was estimated using the AutoRegressive Moving Average (ARMA) and ARIMA time-series models reported in the next subsection.Note that we considered both relationships Equations ( 1) and ( 4) separately, because they were to be used in a Monte Carlo simulation approach that can handle them very easily.
Unfortunately, some time-series can present non-stationary characteristics.To obtain a better mathematical representation of such time-series, an extended version of the ARMA model must be used in order to take into account the past values of the stochastic variable and the differences among actual and past values of the stochastic variable, i.e., ).Such models belong to the ARIMA family, and, for the generic stochastic variable , they can be represented as: where is the backward difference operator defined by .Note that the polynomial Φ must satisfy the condition of stationary mentioned above.Expanding Equation ( 8) in terms of past values of and , we obtain the following form of the difference equation: where the coefficients φ , … , φ are the coefficients of the operator φ Φ 1 1 φ φ ⋯ φ .In practice, the polynomial φ can be separated into two contributions, i.e., the polynomial 1 that has solutions equal to unity and the polynomial Φ that presents the aforesaid stationary requirements consisting of all of the roots of Φ to be greater than unity.
Therefore, an ARIMA model is determined unequivocally by fixing its orders , , and the 2 unknown parameters θ , φ , … , φ , θ , … , θ , σ .Note that the ARIMA family includes the ARMA family in the particular case of 0; so, one can use a general methodology for the identification of an ARIMA model to represent an examined time-series presenting either stationary characteristics ( equal to 0) or non-stationary characteristics ( not equal to 0).
In [26], Box and Jenkins proposed different techniques for the identification of the orders , , of an ARIMA model; in this paper, we used the Box-Jenkins approach based on the use of the sample autocorrelation function , which is an estimation of the following theoretical autocorrelation function ρ at different lags : where μ and σ are the theoretical mean and the theoretical variance of the stochastic variable , respectively.Since time-series always consist of a finite number of samples, , only an estimation ρ of ρ can be provided as follows: where μ is the sample mean of the time-series.The first step of the Box-Jenkins approach is to identify the degree of differencing, , exploiting the properties of the autocorrelation functions.In fact, for a stationary time-series, the sample autocorrelation function ρ quickly decays to zero for moderate lags , while the non-stationary characteristics in an examined time-series can be observed by the fact that the sample autocorrelation function ρ decreases very slowly and does not tend to reach zero even for large lags .This fact suggests that:  if the sample autocorrelation function ρ decreases quickly for increasing values of , the time-series can be represented by a stationary model, and therefore is assumed to be equal to zero;  if the sample autocorrelation function ρ does not decrease quickly for increasing values of , the stochastic process is supposed to be non-stationary in but stationary in for 1.Specifically, the stochastic process is studied iteratively for 1,2, … ; at each iteration, the autocorrelation function ρ of is investigated, and the iterative process is stopped when the autocorrelation function ρ decreases quickly for increasing values of .Therefore, is assumed to be equal to the number of the iteration that achieved this result; in practice, is normally equal to 1 or 2, and is sufficient to inspect the first 20 estimated autocorrelation coefficients ( 1,2, … ,20) of the original series and of its first and second differences to determine the value of .
Once the value of the differencing order, , is selected, the appropriately-differenced time-series 1 shows characteristics of a stationary process; therefore, it can be modeled by an ARMA process of order , .Having built the time-series in such a way, the ARMA , process representing and the ARIMA , , process representing the original time-series share the same orders , ; therefore, in the second step of the Box-Jenkins approach, one can study the differenced time-series , and, by fixing the orders , of the correspondent ARMA model, the orders , of the original ARIMA model also are individuated automatically.
Specifically, in [26], it was shown that different behaviors of the autocorrelation function ρ for the differenced series suggest different values of , , and Table 1 reports the values for the most common time-series.
Table 1.Behavior of the sample autocorrelation function, ρ , for the th difference of an ARIMA process of order , , .
Order of the ARIMA Model , , , Once the three orders , , of the ARIMA process have been determined, a consolidated estimation procedure can be used to obtain estimates of the 2 unknown parameters θ , Φ , … , Φ , θ , … , θ , σ in Equation ( 9), which unequivocally identify the time-series model.
In this paper, the parameters of the ARIMA model were evaluated by minimizing the unconditional log-likelihood function of samples of via the unconditional least squares estimates reported in [26].
Here, the stochastic variable, , was assumed to be the wind speed, .Moreover, once the minimum mean square error forecast of the wind speed for the time horizon was obtained by using the appropriately-estimated ARIMA model, we assumed it to be the expected value of the forecasted distribution, i.e., the mean value μ to be included in Equation (5).

Evaluation of the PDFs of the Parameters , ,  ,
Once the mean value of the distribution at the time horizon were determined as described in Section 2.3, the remaining parameters ω , β ,  , β of the distribution in Equation ( 4) must be obtained.The BI allows the probabilistic estimation of these parameters, identifying their joint posterior probability distribution, by the inference of an array of observations upon the known (or hypothesized) prior probability distributions of each parameter.
The full procedure that is used is described as follows.
The set , . . ., , composed of measurements of wind speed observed until the origin time , is provided initially.In addition, the prior distributions of the parameters are chosen.Let be the generic parameter whose prior distribution must be provided for the time horizon h; the parameters of the prior distributions commonly are called hyperparameters.There is a great debate in the relevant literature [47] concerning how to determine the type of the prior distribution of the parameter and the corresponding hyperparameters.For example, when little or no prior information is provided on the parameter , an uninformative distribution, such as Jeffreys prior or uniform distribution, is commonly used.However, when some prior statistical information is provided on the parameter , an informative, appropriate distribution can be used, such as the Gaussian PDF with hyperparameters μ , σ .
The main advantage of the Gaussian distribution is the simplicity of the operations in that only the estimates of two hyperparameters are needed, and one can fix the variance σ immediately on the basis of her or his confidence in the estimate of the mean value μ ; a large variance yields a larger, more uniform distribution around the mean value, while a small variance yields a distribution that is more concentrated around the mean value.Coherently, with the behavior of uninformative distributions, specifying a large variance σ ensures that the historical data used for the inference determines the relevant changes in the posterior distribution of to a greater extent than the prior distribution [47].
In this paper, an initial estimation ̂ was performed for each time horizon for each parameter β ,  , β , by applying the well-known moment estimation procedure [48] on the set of observations of wind speed .Then, the resulting value of each estimate was assumed to be equal to the mean value of the corresponding prior informative Gaussian distribution, i.e., ̂ μ ; then, the variance was assumed to be very high (σ 10 for each parameter β ,  , β , as in [23]) in order to ensure that the historical data used for the inference, more than the prior distribution, determines the relevant changes in the posterior distribution of parameters.Instead, for the parameter ω , a completely uninformative uniform distribution in the interval from 0 to 1 was chosen due the restricted domain in which the parameter is defined.

Now, let
ω , β ,  , β be the random parameter vector to be estimated for each time horizon in the BI approach.
Once the prior distributions are set, the BI allows the estimation of the joint posterior distribution | , given the set of measurements , through the extension of the Bayes' Theorem.Unfortunately, a closed-form expression of | cannot be provided analytically, but the expression of the un-normalized posterior distribution, | , which is directly proportional to | , is sufficient for the probabilistic estimation of the parameters.
We can calculate the un-normalized posterior distribution | of the random parameters by: where | is the likelihood function; and z is the prior distribution of the th prior random parameter of the vector .The likelihood function | in Equation ( 12) is given by: where  μ is the Equation ( 5) evaluated in correspondence to parameters and μ , where μ is the minimum mean square error forecast [26] drawn from the selected ARIMA , , model for the time horizon , given the past values of wind speed , … ., contained in the set .
The explicit expression of the likelihood function can be provided in the following form: Numerical values of Equation ( 12) can be obtained through different methods that have been extensively used in Bayesian relevant literature [22,47,49,50].In this paper, the Monte Carlo Markov Chain simulation method based on the Metropolis-Hasting algorithm was used to obtain samples of the posterior distributions of the parameters in from the evaluation of the un-normalized posterior distribution | [22,47,49,50].Moreover, the size of the historical data can be selected with adequate criteria, thus improving the accuracy of the forecasting method.

Evaluation of the Samples of the PDF
The samples of the posterior distributions of the parameters ω , β ,  , β for the time horizon , obtained as shown in Section 2.4, and the mean value μ of Section 2.3, can be used together to obtain samples of the parameter  from Equation (5); then, the samples of wind speed, , can be acquired from the estimated distribution ω ,  , β ,  , β (4).In this paper, these samples were acquired by applying the random rejection sampling algorithm by Von Neumann [51].
Then, the samples of can be used in a Monte Carlo procedure to obtain samples of from Equation ( 1) and to provide a probabilistic estimation of the wind active power for the time horizon .

Experimental Section
The procedure presented in Section 2 was firstly used on an USA.wind speed time-series (TS1) to forecast the hourly active power produced by a 75 kW wind generator for the time horizon with a lead time 1 h, acquiring the hourly measurements until the origin time 1.The wind generator was characterized by the wind speed values of 2.3 m/s, 9 m/s, and 16 m/s.The non-linear part of the function in Equation ( 1) was approximated numerically through a sixth-order polynomial, interpolating the data provided by the manufacturer.
Then, to further validate the proposed approach, the results of forecasts in case of three additional time-series related to two USA sites (TS2 and TS3) and an Italian site (TS4) are reported in Section 3.3.

Characteristics of the Data
The dataset of available measurements consisted of 8760 hourly observations of wind speed obtained from 1 January 2013 to 31 December 2013 by the NWTC M2 Tower at latitude 39°54′ north and longitude 105°14′ west (USA).Table 2 provides the monthly and annual mean values of wind speed observed during the entire year.The procedure for identifying the ARIMA model and estimating the corresponding parameters, described in Section 2.3, was applied to the first 4380 measurements of wind speed, taken during the first six months of the year; then, wind speeds were predicted for the second half of the year.
Figure 1 shows the autocorrelation coefficients and the corresponding significance levels for the original time-series (Figure 1a) and for the once-differenced time-series (Figure 1b) of the observations of wind speed.
Figure 1a shows that the sample autocorrelation function decreases quickly for increasing values of lag, ; this suggests that the original time-series was stationary and non-seasonal.The behavior of the autocorrelation function for the once-differenced time-series shown in Figure 1b supports this hypothesis.Therefore, for the ARIMA model the order 0 was selected.Furthermore, the decay of the autocorrelation function is pseudo-exponential, so the ARIMA model (1, 0, 0) and the ARIMA model (2, 0, 0) were used to evaluate the mean value of the distribution of wind speed; using these models in the form of difference Equation ( 9), we have Equations ( 15) and ( 16), respectively: Forecasts were made by using the MWB Bayesian method presented in Sections 2.4 and 2.5, in which the ARIMA (1, 0, 0) model (B-MWB-1 case) and the ARIMA (2, 0, 0) model (B-MWB-2 case) were used to individuate the mean value μ of the distribution of wind speed.

Evaluation of the Quality of the Forecasts
The quality of the probabilistic forecast methods was quantified in this Section by using either a single (spot) value (i.e., the mean value) or the entire probability function.In particular, we define the "spot-value framework" as the case in which the quality of the probabilistic forecast is quantified using a single value of the random variable wind power production, whereas we define "distribution framework" as the case in which the quality of the probabilistic forecast is quantified using a score that uses the entire probability function.
Then, considering the spot-value framework, the mean absolute error (MAE), the root mean square error (RMSE), and their corresponding normalized indices, i.e., NMAE and NRMSE, respectively were selected to quantify the quality of the forecast.These values were calculated as shown below: where is the spot-forecasted value (i.e., the mean value); * is the observed hourly value of active power produced; and is the total number of forecasting hours.
With reference to the distribution framework, it was assumed that the continuous ranked probability score (CRPS) and its corresponding normalized index, i.e., NCRPS (probabilistic indices), can be used to quantify the quality of the forecast, and they were calculated as follows: where Θ is the Heaviside function, and is cumulative density function (CDF) of the entire forecasted power.From the analysis of relationships Equation ( 18) it clearly appears that the CRPS is linked to the total area between the CDF of the forecast and the Heaviside function along all of the hours that were considered.The calculation of the CRPS will result in a value that has the units of the forecast variable, and, therefore, CRPS can be interpreted as a probabilistic version of the MAE [52,53].
Furthermore, an evaluation of the reliability of the forecasting methods also could be obtained by following the well-known reliability diagrams approach shown in [13,14].
The values of indices Equations ( 17) and ( 18), obtained by using the Bayesian method proposed in Section 2, were compared with the values obtained using a reference method, i.e., in the spot-value framework, the well-known persistence method was used as in references [54,55], and, in the distribution framework, the selected benchmark was the probabilistic extension of the persistence method (PPM) proposed in [56,57].The probability function used in the frame of the PPM was a classical, two-parameter Weibull distribution, the mean value (μ of which was assumed to be the observed hourly value of active power at the origin time 1.The shape parameter β was calculated through an iteration of the variance of samples, coherently to the model that was used for the iteration of the mean value, as shown in [56,57].Further comparisons of the results obtained with the proposed method were provided by using the Bayesian method (B-WB case) proposed in [23], which was based on the use of a two-parameter Weibull distribution, and the Bayesian method which was based on the use of a Beta distribution (B-β case) and a Gamma distribution (B-γ case).
Forecasts were made for different time intervals and for several days during the period from 14 June 2013 to 31 December 2013.For sake of conciseness, only the results of the forecasts made for 90 days (2160 h), from 14 June 2013 to 12 August 2013, are reported and discussed here to validate the proposed Bayesian method.
Figure 2 shows the results of the one-hour-ahead forecasts as an example of the obtained results.Figure 2a specifically shows the results obtained using the proposed Bayesian-based approaches (B-MWB-1 and B-MWB-2) and using the PPM, and Figure 2b shows the results obtained using the Bayesian methods with unimodal distributions (B-WB, B-β and B-γ); the actual hourly wind power is reported in both figures for comparison.For all of the Bayesian forecasts, we assumed that the mean value is the spot forecast value. the forecasts of all of the Bayesian methods were closer to the observed values of wind active power than the forecasts of PPM for almost all hours of the day.This is an interesting result, since the persistence method usually has good behavior for low values of lead time, ;  all of the Bayesian methods had very similar results due the similar approaches that were used to evaluate the spot value by the ARIMA models.Thus, a more comprehensive analysis was required in terms of sharpness and reliability of the forecasting methods, and it had to consider the entire forecasted PDF in the distribution framework.
Figure 3 shows the CDFs of forecasts and the CDF of a one-hour observation during the day (i.e., 1:00 PM). Figure 3a compares the forecasted CDFs obtained through the proposed method (B-MWB-1 and B-MWB-2 cases) and through PPM to the CDF of the actual value of wind power.Figure 3b compares the forecasted CDFs obtained through the Bayesian methods with unimodal distributions (B-WB, B-β and B-γ) to the CDF of the actual value of wind power.Figure 2 shows that, at 1:00 PM, the deviations of the forecasted spot values from the actual value of wind power were very similar for all the Bayesian forecasting methods that were considered; however, this behavior was not the same in terms of hourly contribution to the CRPS.In fact, when Figure 3a,b were compared, it was evident that the total areas between the measured CDFs and the forecasted CDF obtained by using the proposed method were smaller than the total areas obtained by using the other methods.In this case, both underproduction error area (where the measured CDF is greater than the forecasted CDF) and overproduction error area (where the measured CDF is smaller than the forecasted CDF) for the reference methods were greater than the corresponding areas for the proposed method.The best performances were obtained by the proposed methods; B-γ performed slightly worse, leading to a greater underproduction area, while PPM, B-WB and B-β led to greater underproduction and overproduction areas.There were no appreciable differences between the results obtained in the B-MWB-1 and B-MWB-2 cases.As a further example of the results that were obtained, Table 3 shows the values of the absolute and percentage error indices Equations ( 17) and ( 18) obtained for all of the methods and averaged over the time interval from 14 June 2013 to 12 August 2013.The values in Table 3 indicate that:  Bayesian methods performed better than PPM in the spot-value framework; in fact, the MAE index decreased by about 5%.In addition, the proposed Bayesian method provided results that were slightly better than the B-WB, B-β and B-γ methods; for example, the RMSE calculated with B-MWB-2 was 2% lower than the calculated with all of the Bayesian methods with unimodal distributions;  in the distribution framework, all of the Bayesian methods performed better than PPM; in fact, they provided CRPS values that were about 11.5% even lower than the values obtained by PPM.The proposed method had the best performances; for example, the CRPS values obtained by using either B-MWB-1 or B-MWB-2 were 4% lower than the values obtained by using B-WB, 2.5% lower than the values obtained with B-γ and 2% lower than the values obtained with B-β;  a comparison of B-MWB-1 and B-MWB-2 indicates that the latter provided slightly better performance, since the fact that all of its indices had relatively smaller values.
The CRPS is a consolidated tool to evaluate the calibration and the sharpness of forecasts.However, for sake of completeness, the calibration of forecasts can be further checked through the inspection of PIT histograms [58][59][60].The PIT histogram is a visual, informal diagnostic tool; deviations from uniformity usually evidence forecast failures and model deficiencies.U-shaped histograms indicate under-dispersed predictive distributions, inverse-U shaped histograms indicate over-dispersed predictive distributions, and skewed histograms usually evidence biased forecasts.
Figure 4 shows the PIT histograms obtained through the proposed methods B-MWB-1 and B-MWB-2 and PPM (Figure 4a) and the PIT histograms obtained through the Bayesian benchmark methods (B-WB, B-β and B-γ in Figure 4b).
From the graphical inspection of Figure 4, the behavior of PIT histograms seems to be coherent to the corresponding values of CRPS in Table 3; the proposed method provides almost uniform histograms, coupled with the lowest values of CRPS.B-γ also provides an almost uniform histogram, even if the corresponding value of CRPS is higher, while PPM, B-WB and B-β show a typical inverse-U shaped histogram, suggesting an over-dispersion in the forecasted distributions.The reliability of the proposed methods was evaluated by comparing the empirical coverages of the various quantiles of the forecasted PDFs to the nominal coverages for the entire interval of forecasts.Figure 5 shows the estimated coverage with respect to the nominal coverage obtained by the four methods that were considered.Figure 5 shows that the forecasts produced by either B-MWB-1 or B-MWB-2 provided the best reliabilities, with their reliability diagrams virtually overlapping each other.B-γ performed slightly worse than the proposed method, especially in correspondence to the higher quantiles.B-WB and B-β deviated more than the proposed methods from the ideal reliability curve, and the reliability of PPM was particularly poor.

Further Analysis
The proposed forecasting method was validated using several additional wind time-series.For sake of conciseness, in this section the results of forecasts performed for two U.S. locations (TS2 at latitude  34°59′ and longitude 104°02′; TS3 at latitude 45°20′ and longitude 104°25′, respectively) and for an Italian location (TS4, at latitude 16°21′ and longitude 40°57′) are reported.The selected locations were chosen in order to test the proposed method in different conditions of wind occurrences.In the following, the results of a 1-month time interval forecasts are shown.Table 4 shows the yearly mean values of wind speed, the ARIMA model chosen through the procedure described in Section 2.3, and the characteristics of the corresponding wind turbines for the three locations.The quality of the forecasts was quantified by using the indices defined in Section 3.2; the values of indices are shown in Tables 5-7 for TS2, TS3, and TS4, respectively, and compared with the values obtained through the same benchmark methods used in Section 3.2.The values in Tables 3, 5-7 indicate that:  the proposed method seems to be the best forecasting tool, since it provides the lowest values in all of the indices, with the only exception of MAE for TS4; particular, it seems to be very suitable to provide sharp probabilistic forecasts, since it provides the lowest values of CRPS index;  comparing the proposed method to the PPM benchmark, the values of MAE decrease by about 2% and 8% for TS2 and TS3, respectively, while for TS4 the value of MAE provided by the proposed method is slightly (0.5%) higher than PPM benchmark.The dispersion of forecasts around the mean value is however smaller for the proposed method, since it provides lower values of RMSE (about 5%, 9%, and 3% lower for TS2, TS2, and TS4, respectively) and CRPS (about 8%, 3% and 2% lower for TS2, TS2, and TS4, respectively);  among the Bayesian benchmarks, B-γ seems to provide the most accurate forecasts; however, the proposed method outperforms B-γ in all of the considered time-series.In fact, the values of MAE decrease by about 4%, 9%, and less than 1% using the proposed method; the values of RMSE decrease by about 3%, 7%, and less than 2%; the values of CRPS decrease by about 3%, 10% and 11% for TS2, TS3, and TS4, respectively;  the global behavior of the proposed method seem to be coherent for all of the considered time-series, even if the application is significantly different in terms of rated power.This behavior can be detected by comparing the normalized values of the indices provided by the proposed method; they are very similar for all of the four considered time-series, even if the rated power of the considered turbines vary significantly (from 75 to 1500 kW).

Conclusions
In this paper, we proposed a new probabilistic method for forecasting the generation of wind power.This method was based on the Bayesian theory and was particularly appropriate for forecasting wind power in correspondence with wind speed regimes that vary significantly over time; this result was achieved by using a more complex probability distribution to characterize wind speed, i.e., the mixture Weibull distribution, which seemed the most suitable to represent both unimodal and bimodal wind regimes.
The results obtained with the proposed method were compared with the results obtained by using two probabilistic forecasting approaches that have been used extensively, i.e., the persistence method and a traditional Bayesian method using two-parameter distributions.
The numerical applications were performed with respect to various wind turbines; the proposed method proved to be useful in short-term probabilistic forecasting of wind power, performing better than the reference methods in terms of both point-value and distribution forecasts.In particular, the proposed method offered significant improvements in terms of the sharpness and reliability of forecasts.
We concluded that the proposed Bayesian method is the most suitable for representing both unimodal and bimodal wind regimes.Also, additional significant improvements of the forecasts are expected in particular regimes that are characterized by a significant number of days of bimodal wind speed distributions.
Future work will focus on the application of the Box-Jenkins approach, based on the use of the sample autocorrelation function, to forecast the photovoltaic power generation.In addition, since in this paper we assumed a deterministic power curve, studies on the probabilistic behavior of power curve depending on meteorological variables such as wind direction, temperature, local air density, and precipitation will be carried out.

Figure 1 .
Figure 1.Autocorrelation coefficients of observations of wind speed and corresponding significance levels: (a) for the original time-series; (b) for the differenced time-series.

Figure 2
Figure 2 indicates that:

Figure 5 .
Figure 5. Reliability diagrams for the proposed methods and the reference methods compared to the ideal reliability; (a) Proposed methods: B-MWB-1 and B-MWB-2 cases, and PPM; and (b) Bayesian methods with unimodal distributions: B-WB, B-β and B-γ cases.

Table 2 .
Monthly and annual mean values of observed wind speed.

Table 3 .
Time-series TS1: Values of absolute and percentage indices for the forecasts averaged from 14 June 2013 to 12 August 2013.

Table 4 .
Values of yearly mean wind speed and characteristics of the wind turbines for TS2, TS3, and TS4.

Table 5 .
Time-series TS2: Values of absolute and percentage indices for the forecasts averaged from 1 October 2013 to 31 October 2013.

Table 6 .
Time-series TS3: Values of absolute and percentage indices for the forecasts averaged from 1 October 2013 to 31 October 2013.

Table 7 .
Time-series TS4: Values of absolute and percentage indices for the forecasts averaged from 1 October 2013 to 31 October 2013.