A Bayesian Method for Short-Term Probabilistic Forecasting of Photovoltaic Generation in Smart Grid Operation and Control

A new short-term probabilistic forecasting method is proposed to predict the probability density function of the hourly active power generated by a photovoltaic system. Firstly, the probability density function of the hourly clearness index is forecasted making use of a Bayesian auto regressive time series model; the model takes into account the dependence of the solar radiation on some meteorological variables, such as the cloud cover and humidity. Then, a Monte Carlo simulation procedure is used to evaluate the predictive probability density function of the hourly active power by applying the photovoltaic system model to the random sampling of the clearness index distribution. A numerical application demonstrates the effectiveness and advantages of the proposed forecasting method.


Introduction
In recent years, power systems have been undergoing radical changes and in the near future their planning and operation will be undertaken according to the Smart Grid (SG) vision.The SG initiatives aim at introducing new technologies and services in power systems to make the electrical networks more reliable, efficient, secure and environmentally-friendly [1].
Increasing the exploitation of renewable energy sources (such as wind and solar energy) is certainly one of the most important goals of SGs.Indeed, the random behavior of such energy sources introduce challenging issues in the design of advanced tools and techniques for the optimal SG operation and control.In tackling these issues, forecasting is a fundamental task for an efficient utilization of the available distributed energy resources and for a secure and economic behaviour of the power system [2].
In general, the power system operator can use accurate forecast information about renewable power generation and load consumption to guarantee a balance between generation and demand at all the time with reduced capacity and costs of the operating reserves [3,4].From the perspective of the producers, forecasting the renewable power output can be very useful for decision making on the energy market.In this way, not only the deviation between scheduled and actual generation can be minimized, but also the revenues are increased, thus reducing the penalties related to regulation costs and enhancing the competitiveness of renewable energies in comparison with dispatchable energy sources [5].Finally, prosumers can use prediction models to plan their consumption patterns so as to match the power they generate on-site thus maximizing their benefits [6].
In the relevant literature, various forecasting methods have been proposed to estimate the expected power generated from a renewable energy source, which essentially differ in the type of the information characterizing the predicted output and in the time horizon of their application.
Concerning the type of information, two main forecasting methods can be adopted, referred to as deterministic and probabilistic forecasting.In the former one, a single value of the renewable power generation is provided and no uncertainty of the prediction is considered.In the latter one, the output value is accompanied with information on its intrinsic unpredictability and, then, it is more appropriate to solve problems of management and control in future SGs [3,5,7].Probabilistic forecasting methods can be distinguished in two further categories according to the adopted approach: the prediction error or the direct approach.While the first one provides the uncertainty of the error deriving from the application of a deterministic forecasting method, the second one directly yields the statistic representation of the predicted output.
Concerning the time horizon, renewable generation forecasting can basically be divided into different time intervals, depending on the time frames corresponding to the tasks of grid operation and control and to the sessions of electricity markets.Short-term forecasting covers time intervals ranging from less than 1 hour to few hours ahead and is very useful for frequency regulation and load balancing.Medium term forecasting, up from several hours to few days ahead, is needed for unit commitment and energy trading.Finally, long-term forecasting can be required to support system planning and economic analyses in seasonal and annual horizons.However, recent renewable integration studies have shown that it is the short term forecasting that gains the most in a SG [3].
One of the most promising renewable energy conversion system to be integrated in SGs is the PhotoVoltaic (PV) power generation, due to the expected cost reduction and the increased efficiency of both PV panels and converters [8].The power generated by a PV power system varies according to the solar radiation on the earth's surface, which mainly depends on the installation site and the weather conditions.While the dependence on the specific location can be essentially predicted on a deterministic way, the atmospheric conditions (such as cloud cover, ambient temperature, relative humidity) are the main causes of the randomness of the solar radiation and it is very important to consider them when short-term forecasting is concerned [9,10].
Several methods have been proposed in the relevant literature for forecasting the PV power generated in a short time horizon.In [11] a recurrent neural network has been proposed to perform a short term forecasting of the PV power production using meteorological data of the last 16 days and has been compared with a feed-forward neural network.A method to predict PV power output has been presented in [12] by deriving hourly site-specific irradiance forecasts from data provided by a weather forecasts center.In [13] an advanced Grey-Markov chain model has been applied to predict the daily power production of grid-connected PV systems using operating data collected at 15 minute intervals.A two-stage method to predict hourly value of the PV power for time horizon up to 36 hours has been proposed in [14].In [15] Kalman filters are applied to predict sub-hourly and hourly PV power production using solar irradiance as input.First studies on the application of Bayesian theory for PV power production forecasting are shown in [16,17].
In this paper, extending and improving the approach based on the Bayesian theory outlined in [16][17][18], a new method for short-term probabilistic forecasting is proposed, that directly yields to the statistic representation of the predicted PV power output.The proposed method forecasts at the generic hour the probability density function (pdf) of the active power produced by a PV power system at the hour with 1, . . ., , starting from the evaluation of samples of the pdf of the hourly clearness index at hour .The forecast of the pdf of the hourly clearness index is obtained firstly selecting for the pdf an analytical expression and, then, evaluating the pdf parameters by applying the Bayesian Inference (BI).To this aim, an Auto Regressive (AR) time series model, representing the relationship between the pdf parameters, the clearness index and some explanatory meteorological variables, is used together with appropriate sets of historical measurements of the random variables involved in the AR model.Finally, a Monte Carlo (MC) simulation procedure is applied to generate the predicted pdf of the PV active power: a random sampling of the pdf of the hourly clearness index is performed and, using the PV system model, the PV power samples are obtained.
The key steps of the proposed method are: (i) the choice of the analytical expression of the pdf modeling the hourly clearness index; (ii) the definition of an adequate AR time series model so as to consider only the meteorological variables that most affect the hourly clearness index behaviour; and (iii) the selection of appropriate data vectors from historical measurements collected before the time of the forecast.
The peculiarity of the method is that it takes into account the dependence of the terrestrial solar radiation on some explanatory atmospheric variables and combines probabilistic techniques, such as BI and MC simulation, to provide a probabilistic forecasting of the PV power generation useful for optimal SG operation and control.
This paper is organized as follows: Section 2 briefly recalls the probabilistic forecasting method based on the Bayesian approach.In Section 3 the probabilistic method is applied to forecast the power production of a PV system.Finally, numerical simulations are reported in Section 4 to give evidence of the effectiveness of the proposed approach.

Probabilistic Forecasting method based on the Bayesian approach
The probabilistic forecasting method based on the Bayesian approach predicts at the generic hour the pdf of a random variable at the hour , with 1, … , K. For the sake of simplicity, in the following the analysis is referred to the case of 1.In applying this method, the starting point is the knowledge of the analytical expression of the pdf of the random variable to be forecasted.Usually, the analytical expression of the pdf is characterized by some distribution parameters and it is modeled as a conditional pdf.For the sake of conciseness, reference is made to only one distribution parameter (i.e., the mean value), generically referred to as , and the conditional pdf is indicated as | .
Forecasting at the pdf | requires an estimation of .To this aim, a first order AR time series model can be used, representing the relationship between and both the measurements and ( , , … , , collected at the hour of, respectively, the random variable to be forecasted and the explanatory random variables , , … , , influencing : where , , , … , are the coefficients of the AR model.Explanatory variables are variables such that changes in their value are thought to cause changes in another variable.
In the classical statistics, , , , … , are assumed to be constant.Indeed, when Bayesian approach are adopted, such coefficients are modeled as random variables, known as prior random parameters, and the BI [19] allows to estimate the conditional pdf , , , … , | of the parameters , , , … , given the set , . . ., composed of measurements of observed before the hour .The pdf , , , … , | is known as a posteriori distribution of the prior random parameters and it is very difficult to obtain its expression in closed form.Actually, only a simplified expression, known as unnormalized a posteriori distribution of the prior random parameters, and indicated as , , , … , , can be provided.Fortunately, the knowledge of the unnormalized a posteriori distribution of the prior random parameters is sufficient for developing algorithms that provide information about the normalised a posteriori distributions.The unnormalized a posteriori distribution of the prior random parameters is derived from the application of the Bayes' rule assuming the independency of the prior random parameters so that: where , , , … , is the likelihood function; and and are the a priori distributions of the prior random parameters.
The a priori distributions are the initial pdfs of the prior random parameters which are not conditional on observed data.Their expressions can be vague or informative and reflects the knowledge that we have in advance about the pdfs that we are interested in.
The likelihood function is the conditional data distribution, that is the pdf modeling , whose realizations are contained in , given the prior random parameters.
Once the unnormalized a posteriori distribution of the prior random parameters is known, it is trivial to evaluate the normalised a posteriori distributions of each parameter by applying the theory of the joint pdfs [20].Then, the Monte Carlo Markov Chain (MCMC) simulation method based on the Metropolitan-Hasting algorithm [21] can be directly applied to the unnormalized distributions of every parameter to obtain samples of their a posteriori distributions.In the MCMC approach, a Markov chain is constructed, characterized by a transition probability matrix reflecting the a posteriori distributions of the prior random parameters.Then, the Markov chain is simulated until the samples are representative of the a posteriori distributions of every parameter.
Eventually, incorporating the AR time series model in this procedure, the samples derived from the a posteriori pdfs of , , , … , can be used together with the measurements and , , … , , collected at the hour to obtain samples of .Finally, for each simulated sample of , the samples of the random variable are drawn from the analytical expression of the pdf so as to provide the full predictive distribution | .

Probabilistic Forecasting of the Photovoltaic Generation
In the following, the probabilistic forecasting method described in Section 2 is applied to predict at hour the pdf of the PV power production at hour 1, starting from an estimation of the terrestrial hourly solar radiation, expressed in terms of the pdf of the hourly clearness index at hour 1.The next four subsections dealt with:  The description of the adopted model for the PV system;  The description of the pdf modeling the hourly clearness index;  The definition of the AR time-series model including meteorological variables; and  The probabilistic characterization of the prior random parameters.

PV System Model
The hourly active power produced by a PV system depends on the availability of the solar radiation at the installation site.The solar radiation in a given locality cannot be exactly predicted owing mainly to the irregular presence of clouds.The sky conditions are often taken into account by representing the terrestrial solar radiation in terms of clearness index that is defined as the ratio of the surface radiation to the extraterrestrial radiation for a given period [22].
When the PV system is equipped with a maximum power point tracker, an analytical relationship exists between the PV active power at the hour and the corresponding hourly clearness index , [22][23][24] that is defined as the ratio of the hourly total solar radiation on an horizontal plane I t to the extra-terrestrial hourly total solar radiation I 0 ; it results: where is the array surface area;  is the efficiency of the PV system; and and are defined as: (5) where is the ratio of beam radiation on a tilted surface to that on a horizontal surface at any time; ρ is the reflectance of the ground;  is the inclination of the array surface to the horizontal plane; is the ratio between diffuse radiation in hours and diffuse radiation in a day; is the extra-terrestrial total solar radiation; and , are coefficients reported in [23], which link the diffuse fraction of the hourly total solar radiation on horizontal plane with the hourly clearness index.
The analysis of the relationship (4) clearly reveals that is the only variable affecting , once the hour of the day, the installation site and the technical characteristics of the PV system are assigned.The hourly clearness index is a random variable modelling the uncertain behaviour of the terrestrial solar radiation.The hourly PV active power , as function of , is itself a random variable and its pdf can be determined by the pdf of the hourly clearness index.
In this paper, a MC simulation procedure is used to generate at the hour samples of the predictive pdf of the hourly PV active power by performing a random sampling of the pdf of the hourly clearness index estimated for the hour 1 and applying Equation (4).

Probability Density Function of the Hourly Clearness Index
The analytical expression of the pdf of the hourly clearness index can be experimentally obtained by a statistical analysis of historical solar measurements collected in the site in which the PV system is installed.In the literature, starting from the fitting of the hourly clearness index data collected in a specific location, the investigation has often resulted in the individuation of pdfs characterized by standard distributions.In [25] the hourly clearness index measurements recorded at various locations in Algeria are conveniently described by Beta distributions.In [26] bimodal distributions are considered more adequate to model clear and cloudy sky conditions of hourly clearness index measurements collected in different cities in the U.S.A.On the other hand, several attempts have been made to find universal standard pdfs that are independent of the location and the time period used to define the clearness index [27][28][29].Following the latter approach, in this paper the model proposed in [29] has been adopted and the following modified Gamma distribution | ,  is used to model the pdf of the hourly clearness index : where is the upper bound of the observed values of ; and and  are the distribution parameters, defined as: with: where  is the mean value of the hourly clearness index at hour .Assuming the knowledge of , the distribution parameters and  only depend on the mean value  and the pdf in Equation ( 7) can be rewritten as:

AR Time-Series Model
To predict at the hour the pdf | at the hour 1 an estimation of the mean value  is required, as shown in the Subsection 3.2.To this aim, an AR time series model can be used to define the relationship among such mean value and the measurements of the clearness index and of some meteorological variables influencing the solar radiation, such as the ambient temperature , the relative humidity , the wind speed and the cloud cover , where the cloud cover is defined as the ratio in % of the sky hidden by all visible cloud.In this paper, the following first order AR time series model is adopted: where , , , , are the measurements of, respectively, the clearness index, the ambient temperature, the relative humidity, the wind speed and the cloud cover, which are collected at the hour .The inclusion of meteorological variables in the AR model significantly increases the computational efforts in the application of the proposed forecasting method.Despite of the highest complexity of the procedure, taking into account the dependence of the clearness index on the meteorological variables allows to perform a more accurate forecasting [10,30].To reduce computational efforts, a correlation analysis can help to individuate the meteorological variables presenting the highest influence on the solar radiation.Such analysis is performed "off-line" and correlates historical measurements collected in the specific site in which the PV system is installed.In this way, only the meteorological variables presenting the highest correlation value with the clearness index are selected so as to found a good compromise between results' accuracy and computational efforts.

Probabilistic Characterization of the Prior Random Parameters
In this paper, the coefficients , , , , , of the AR time series model in Equation ( 12) are assumed to be the prior random parameters of the BI.The a priori pdfs of the prior random parameters  ,  , , , , are usually chosen with a large variance so that the data, rather than the a priori distributions, determine the relevant parameters values in the a posteriori distributions [19][20][21].In this paper the a priori pdfs are assumed to be Gaussian with a mean value 0.5 and a standard deviation 0.5 so as: The likelihood function , , , , , is the pdf in Equation ( 11) specified for the set , . . ., , . . ., of measurements of observed before the hour : where: Relationship ( 15) is obtained by substituting for  the AR time series model in Equation ( 16).It should be noted that the choice of the measurements contained in the sets , , , , represent a key issue in the BI, since they are used to make inference about the prior random parameters , , , , , .In general, these sets contain measurements recorded before the hour of the forecast.Actually, these data are not necessarily the ones collected from the hour 1 to 1, but can be selected with adequate criteria thus improving the accuracy of the proposed forecasting method.In [16] the homologue and the coded group criteria have been proposed.In the first one, the sets contain measurements at the hour which are collected days before the forecast (e.g., if the forecast has to be performed at h = 10:00, the sets include measurements recorded days before at 10:00).In the second one, the sets contain a coded group of measurements around the hour collected some days before the forecast (e.g. if the forecast has to be performed at h = 10:00, the sets include measurements from the 8:00 to 10:00 recorded 3 ⁄ days before).In addition, the measurements contained in , , , , can be collected at time intervals different from an hour.In [17] the data of the clearness index and of the meteorological variables contained in the vectors are extracted from measurements registered at time intervals of 15 minutes.If this is the case, the application of the proposed forecasting method will provide at the hour the predictive distribution of the clearness index at the first 15 minutes of the hour 1.To estimate the pdf at the hour 1 the following approach is adopted in this paper: that is the pdf forecasted at 1 is assumed to be equal to the pdf forecasted at the first 15 minutes of the hour 1.Eventually, Figure 1 shows a block diagram describing the main steps applied in the proposed Bayesian approach. AR model [12]  pdfs of hourly prior random parameters [13,14] Obtain samples of power

Experimental Section
In this section, the proposed Bayesian forecasting method is applied to a 75-kWp PV system, presenting an array surface S 600 m 2 and an efficiency  0.09.Measurements of the clearness indexand of the meteorological variables cited in the Subsection 3.4 (air temperature, relative humidity, wind speed and total cloud cover) are available at the website of the National Renewable Energy Laboratory.In particular, a meteorological station in Colorado (39.742°N, 105.18°W) has been selected and measurements referred to the time interval [8 a.m., 8 p.m.] and collected every 15 minutes are chosen.
In the following, at first a correlation analysis is performed to individuate the meteorological variables to be included in the time series AR model; then, the proposed method is used to forecast the pdf of the hourly active power produced by the PV system.
To individuate the most suitable AR time series model, an "off-line" correlation analysis between the clearness index and the meteorological variables is carried out, on the basis of measurements recorded from January to December 2010.Figure 2 reports the time evolution of the correlation coefficient between the clearness index and the meteorological variables.To avoid excessive computational efforts, only the meteorological variables furnishing the highest values of the correlation coefficient are taken into account.As such, the analysis of the Figure 2 clearly reveals that the total cloud cover and the relative humidity are the meteorological variables presenting the highest influence on the clearness index; consequently, the AR time-series model in Equation ( 12) reduces to: To make inference about the prior random parameters α , α , β and β , the sets , ,  The application of the proposed approach to forecast the hourly PV active power is performed referring to the four seasons of the 2011.Figures 3-6 show, respectively, the actual measured values of the PV active powers (red lines), the mean value (blue line) and the range between the 5th and 95th percentile values of the forecasted pdfs of hourly PV active power.In particular, the results refer to the Recommended Average days of winter (Figure 3), spring (Figure 4), summer (Figure 5) and autumn (Figure 6).In [22] Recommended Average Days are days which have the extraterrestrial radiation closest to the average value in the month.A similar behavior characterizes the vast majority of the considered days (in almost all considered days).From the analysis of the figures, it appears that the actual values of the hourly PV active power are always comprised between the 5th and 95th percentile values.In addition, it should be noted that the mean value appears in most cases a good estimator for the forecasted pdfs, particularly in the range of hours between 11 a.m. and 3 p.m.At the beginning (end) of the period characterized by the presence of solar radiation, usually higher (lower) percentiles appear the most adequate estimators.Anyway, if the mean value of forecasted pdf would be used as the only estimator of the forecasted PV power, the Mean Absolute Relative Error (MARE), defined as: is estimated between 14.5% (winter season) and 18.0% (autumn season).
Finally, Figures 7 and 8 show the forecasted (represented by a blue histogram) and the analytical (represented by a continuous red line) pdfs of the hourly PV active power in March (Figure 7a), April (Figure 7b), August (Figure 8a) and September (Figure 8b).The analytical pdf is obtained applying the fundamental theorem for the function of a random variable to Equation (9) proposed in [31].From the analysis of Figures 7 and 8 it is evident that in different conditions of solar radiation the forecasted pdfs are close to the analytical distributions.

Conclusions
A new method based on the Bayesian inference has been proposed to perform a short-term forecasting of the active power produced by a photovoltaic system starting from an estimation of the hourly clearness index.It takes into account the dependence of the terrestrial solar radiation on some explanatory atmospheric variables, including the cloud cover and humidity.The combination of probabilistic techniques, such as Bayesian inference and Monte Carlo simulation, allows to provide the predictive probability density function of the photovoltaic generated power, which is very useful for the optimal operation and control of the smart grids of the future.
However, if only a value is requested as estimator of the forecasted photovoltaic power, arises the problem of individuate which pdf parameter (mean value, percentiles ...) is the most representative for the distribution.Moreover, the non-linear relationship between the clearness index and the photovoltaic power output can reflect in not negligible errors in the forecasted distributions of the photovoltaic power.Then, future works will investigate the direct application of the proposed probabilistic forecasting method to the active power produced by the PV system, even if the application of the Bayesian inference in this case seems to be arduous.The research will also focus on the choice of the best parameter to be extracted from the predicted probability distribution so as to test the performance of the proposed method in terms of traditional measures of the forecasting accuracy; in this case comparison with ARIMA and neural networks methods will be affected.
of the individual a posteriori distributions are evaluated by applying the MCMC simulation method based on the Metropolitan-Hasting algorithm.The samples of the a posteriori pdfs of , , , , , are used in Equation (12) together with the measurements , , , , collected at the hour to provide samples of the mean value  .Finally, for each samples of  , the samples of the hourly clearness index are drawn from the analytical expression of the pdf in Equation (11).The simulated samples of describes the predictive distribution | .

Figure 1 .
Figure 1.Block diagram describing the main steps applied in the proposed Bayesian approach.
see Subsection 3.4) contain 144 measurements collected at time intervals of 15 minutes recorded before the hour of the forecast.

Figure 2 .
Figure 2. Time evolution of the correlation coefficient between the clearness index and the selected meteorological variables.

Figure 3 .Figure 4 .Figure 5 .
Figure 3. Actual measures of the hourly PV active power (red line); mean values (blue line) and range between 5th and 95th percentile values of the forecasted pdfs of the hourly PV active power.(a) 17 January; (b) 16 February; and (c) 10 December.

Figure 6 .
Figure 6.Actual measures of the hourly PV active power (red line); mean values (blue line) and range between 5th and 95th percentile values of the forecasted pdfs of the hourly PV active power.(a) 15 September; (b) 15 October and; (c) 14 November.