Probability Density Function Characterization for Aggregated Large-Scale Wind Power Based on Weibull Mixtures

Abstract: The Weibull probability distribution has been widely applied to characterize wind speeds for wind energy resources. Wind power generation modeling is different, however, due in particular to power curve limitations, wind turbine control methods, and transmission system operation requirements. These differences are even greater for aggregated wind power generation in power systems with high wind penetration. Consequently, models based on a single Weibull component can provide poor characterizations of aggregated wind power generation. Accordingly, the present paper discusses Weibull mixtures to characterize the probability density function (PDF) of aggregated wind power generation. PDFs of wind power data are first classified according to hourly and seasonal patterns. The selection of the number of components in the mixture is analyzed through two well-known criteria: the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Finally, the optimal number of Weibull components for maximum likelihood is explored for the defined patterns, including the estimated weight, scale, and shape parameters. Results show that multi-Weibull models are more suitable to characterize aggregated wind power data due to the impact of distributed generation, the variety of wind speed values, and wind power curtailment.


Introduction
The growing integration of renewable resources into the electricity sector can be attributed to different factors, including deregulation of the electricity market, environmental goals, economic incentives, and technical maturity. The share of energy consumption produced from renewable resources is currently considered a relevant short- and mid-term target in many countries. Among the different renewable resources, wind and solar power currently receive the most attention, with wind power the most prevalent in terms of installed capacity [1]. In fact, the amount of wind power generation integrated into power systems, together with other time-variable, non-dispatchable electricity generation, has been increasing exponentially during the past decade [2]. This increase can be easily identified in power systems with significant penetration of variable renewable generation, such as in Spain, where the share of wind power cannot be neglected on the supply side. Indeed, Spain is the world's fourth biggest producer of wind power, with a year-end installed capacity of 22.8 GW and a share of total electricity consumption of 20.4% in 2014 (21.2% in 2013).
In contrast to traditional power sources, wind power is highly variable and uncertain [3]. Under the assumption that the share of renewable energy sources is expected to increase very significantly in the next few years, wind and its impact on power systems have been widely studied. Multiple operational timescales have been considered in the literature in an attempt to assess the impact that high penetration of renewable resources has on future power systems [4]. Likewise, the impact of wind integration on reserve requirements is a current topic of interest for integration studies and power system operators [5]. Some contributions have focused on evaluating how power systems with large shares of variable and uncertain renewables can be efficiently designed and operated to maintain reliability and economic efficiency while maximizing the penetration of these resources. A computationally efficient probabilistic wind energy production simulation to determine the variable effects of systems for varying levels of wind power penetration is presented in [6]. A technique to evaluate operational reliability and energy utilization efficiency of power systems with high wind power penetration is discussed in [7]. As for wind power production and forecasts of wind resources, [8] discusses different simulated scenarios based on high-resolution numerical weather prediction models and wind speed measurements to forecast wind power production. Gaussian processes combined with numerical weather prediction are applied to wind power forecasting up to one day ahead [9]. Apart from deterministic prediction, other contributions describe probabilistic wind power forecasting algorithms [10], incorporating economic dispatch via a probability distribution model, the versatile distribution [11]. Correlations of wind speeds following different distributions and a literature review are presented in [12]. In [13], three new mixture distributions (Weibull-lognormal, generalized extreme value (GEV)-lognormal and Weibull-GEV) are introduced for wind speed forecasting purposes.
The characterization of wind power is usually carried out by using metrics that highlight two principal properties: variability and uncertainty. In this context, probability density functions (PDFs) are considered a suitable solution to evaluate wind power conditions in time series, identifying areas with high and low wind power occurrence. Under highly aggregated wind power generation conditions, this characterization represents an important advantage for wind power plant owners and transmission system operators (TSOs), since the wind power PDF is widely used in various system functions, such as determining required reserves, stochastic modeling, or transmission planning scenarios. However, few works have focused on characterizing and modeling wind power production for large areas or whole power systems with high wind power penetration. In that case, the target is to find a suitable but simple solution to characterize the large amount of data corresponding to wind power production over a multi-year period, including geographically dispersed wind power generation. In this framework, this paper characterizes the PDF of geographically distributed, aggregated large-scale wind power production by estimating the empirical density function and fitting Weibull mixtures. Wind power data spanning several years from Spain's power system are used to evaluate the proposed characterization methodology.
The rest of the paper is organized as follows: in Section 2, relevant aspects of wind speed and wind power are discussed, including the main wind power characteristics in power systems with high wind power penetration. Section 3 describes the proposed model based on a mixture of Weibull density functions, providing a suitable and reliable characterization for aggregated wind power time series. Real data corresponding to the Spanish power system are used to assess our proposal. Results and comparisons between different criteria to select the number of components are discussed in Section 4. Finally, conclusions are drawn and future work is outlined in Section 5.

Characterization of Probability Density Function for Wind Power Generation
The Weibull distribution has been traditionally used to model wind speed distributions for applications in wind energy studies [14][15][16][17]. This density function profile provides a suitable fit when measured wind speed data are considered [18]. The wind speed Weibull distribution function can be represented by:

$$f(x \mid \lambda, \beta) = \frac{\beta}{\lambda}\left(\frac{x}{\lambda}\right)^{\beta-1} \exp\left[-\left(\frac{x}{\lambda}\right)^{\beta}\right], \quad x \ge 0 \tag{1}$$

where x is the wind speed, λ is the Weibull scale parameter, with units equal to the wind speed units, and β is the unitless Weibull shape parameter. Following the International Electrotechnical Commission (IEC) standard for power performance measurements of the generation provided by wind turbines [19], estimations of the annual energy production of wind turbines can be determined by using the Weibull density function profile. According to this standard, the annual energy production can be estimated by assuming 100% availability of the wind turbines and by using different reference wind speed frequency distributions, such as the Weibull and the Rayleigh density functions, where the latter is a particular case of a Weibull distribution with a shape factor of 2.
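As a minimal sketch of the density discussed above, the following Python fragment evaluates the standard two-parameter Weibull PDF and checks numerically that the Rayleigh density is the special case with shape factor 2. The function names and the numerical values are illustrative assumptions, not taken from the paper; the Rayleigh parameter σ relates to the Weibull scale as λ = σ√2.

```python
import math

def weibull_pdf(x, scale, shape):
    """Two-parameter Weibull density; scale is lambda, shape is beta."""
    if x <= 0:
        # Simplification: the single point x = 0 is treated as zero density.
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def rayleigh_pdf(x, sigma):
    """Rayleigh density with parameter sigma."""
    if x <= 0:
        return 0.0
    return (x / sigma ** 2) * math.exp(-x ** 2 / (2 * sigma ** 2))

# Hypothetical values: wind speed of 5 m/s, Rayleigh parameter of 6 m/s.
sigma = 6.0
difference = abs(weibull_pdf(5.0, sigma * math.sqrt(2), 2.0) - rayleigh_pdf(5.0, sigma))
```

The value of `difference` should be at the level of floating-point rounding, which is consistent with the Rayleigh distribution being a Weibull distribution with β = 2.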
Weibull distributions have been commonly proposed in the literature to model wind resources, taking into account that 10 min or hourly averaged wind speeds throughout a year are the result of a considerable degree of random variation [14][15][16,20,21]. Weibull distributions have thus been applied to characterize the PDF of wind speeds, mainly when wind speed data are restricted to a specific geographical location with a single meteorological tower (also known as a met mast). However, there are locations where the wind speed distributions are not properly characterized by only one Weibull distribution, as depicted in [22]. Moreover, the variability of the wind distribution with wind direction can require a more complex representation based on a double-peaked bi-Weibull distribution [23][24][25][26], with different scale factors and shape factors according to the seasons [27]. Bi-Weibull distribution presents several advantages, such as flexibility, the dependence on only two parameters, the simplicity of the parameter estimation process, as well as its specific goodness-of-fit tests when its parameters are estimated from the sample [28]. Other PDFs have been recently proposed in the specific literature for these purposes [29,30]. These previous contributions and many others found in the literature discuss the application of mixtures of Weibull functions to estimate the wind energy potential for a given area [26,27,31,32]. However, little attention has been given to the problem of characterizing large-scale aggregated wind power generation, including the effect of geographical dispersion. Therefore, considering the wind power generation obtained from geographically dispersed wind power plants, a natural and interesting question is whether one Weibull distribution is suitable to characterize the wind power production of a large area or a whole country. In this context, a first difference is due to the natural smoothing effects as a consequence of the aggregation
of wind power productions [33,34]. Additionally, other aspects should be considered to characterize in detail aggregated large-scale wind power generation:
• The relationship between wind speed and wind power generation is derived from the wind turbine power curve. Each wind turbine type has a different power curve, whose characteristics depend on the wind turbine class. A wind power curve is a non-linear function defining the relationship between wind speed and wind power production. Its main characteristics are the cut-in speed, rated power output and cut-out speed.
• The variability and uncertainty of the wind speed affect the wind power generation, resulting in highly variable and only partially controllable power output. These wind power fluctuations can lead to oscillations and occasionally intermittent features, directly related to weather phenomena [35]. In fact, storms and other unstable weather events induce random variability in wind power generation. In contrast, under stable weather conditions, wind power is mainly driven by the diurnal cycle. This fact is highlighted in the daily aggregated wind power generation in Spain for years 2007-2012 (Figure 1). For each hour, an annual averaged wind power production is determined accounting for the wind power supplied by the aggregated wind farms.
• In countries or areas with high integration of wind power generation, the power system operation can have a significant influence on wind power production. For example, fluctuations as a consequence of either technical or operational requirements can influence wind power generation even more than meteorological phenomena [37]. Wind power curtailment due to economic or reliability issues is another relevant example of a large influence on wind power production. Additionally, these operating system procedures are influenced by most power system disturbances, mainly voltage dips. These events can produce a sudden drop in wind power generation, particularly in wind turbines not equipped with fault ride-through capability [37].
• Wind power generation presents inter-annual oscillations as well [38], as illustrated in Figure 2.
• The time-aggregation unit selected for wind power also affects the characterization of the wind power production for a large area or a power system: the longer the selected time interval, the weaker the smoothing effects shown by the aggregated data [28].
In summary, the wind speed PDF is usually well fitted by unimodal models, and its main applications are resource assessment and wind farm design. In contrast, the wind power PDF is preferred for probabilistic forecasting, reserve quantification and stochastic operation in power systems. Furthermore, the "natural" variability of the wind combined with the power curve and the artificial or imposed variability (curtailments, voltage sags, maintenance, etc.) produce important differences from the wind speed PDF. When wind power is aggregated, the smoothing effect must also be taken into account. The novel contribution of this paper is the implementation of a PDF model for aggregated wind power production that includes the effects of the previously described events. This issue is not considered in previous works.
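The power-curve relationship described in the list above can be sketched as follows. This is an idealized illustration under stated assumptions, not the model used in the paper: the cut-in, rated and cut-out speeds, the cubic rise between cut-in and rated speed, and the Weibull wind speed parameters are all hypothetical values chosen for the example. Output is expressed per unit of rated power.

```python
import random

def power_curve(v, cut_in=3.0, rated_speed=12.0, cut_out=25.0):
    """Idealized per-unit power curve; the default speeds (m/s) are hypothetical."""
    if v < cut_in or v >= cut_out:
        return 0.0   # below cut-in or at/above cut-out: no output
    if v >= rated_speed:
        return 1.0   # rated power output (per unit)
    # Cubic rise between cut-in and rated speed, a common simplification.
    return (v ** 3 - cut_in ** 3) / (rated_speed ** 3 - cut_in ** 3)

random.seed(0)
# Hypothetical site: Weibull wind speeds with scale 8 m/s and shape 2.
speeds = [random.weibullvariate(8.0, 2.0) for _ in range(10000)]
power = [power_curve(v) for v in speeds]

# Shares of samples with zero output and with rated output.
share_zero = sum(p == 0.0 for p in power) / len(power)
share_rated = sum(p == 1.0 for p in power) / len(power)
```

In this single-turbine sketch, a noticeable share of the samples sits exactly at zero or at rated output, even though the underlying wind speeds follow one Weibull distribution. This is one simple way to see why the wind power PDF can differ markedly from the wind speed PDF.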

The Proposed Model: Weibull Mixture Characterization
In this section, we formulate the Weibull mixture characterization for aggregated wind power production PDFs, describing the model selection criteria as well as the iterative algorithm to estimate their parameters.

The Mixture Model Formulation
The candidate density function to represent the wind power generation PDF is expressed as:

$$f_c(x) = \sum_{l=1}^{c} w_l \, f(x \mid \theta_l) \tag{2}$$

where c is the number of components; w_l (with l = 1, ..., c) are the component weights, which must sum to 1; and f(x|θ_l) is the Weibull density function with parameters θ_l = (λ_l, β_l), see Equation (1).
A particular and important case is obtained when c = 1, in which a single Weibull distribution is fitted to the data. If c is set to 2, the model describes a double-peaked bi-Weibull distribution, whereas if c is set to 3, it describes a tri-Weibull distribution. The weights w_1, w_2, ... determine the contribution of each individual Weibull component to the global PDF.
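The mixture formulation above can be illustrated with a short Python sketch. The weights, scale and shape values below are hypothetical, chosen only to produce a tri-Weibull example on normalized power; the numerical integration simply checks that a weighted sum of Weibull densities whose weights sum to 1 is itself a density.

```python
import math

def weibull_pdf(x, scale, shape):
    """Two-parameter Weibull density; zero for x <= 0 (simplification at x = 0)."""
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def mixture_pdf(x, weights, scales, shapes):
    """Weighted sum of c Weibull densities; the weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("component weights must sum to 1")
    return sum(w * weibull_pdf(x, s, b) for w, s, b in zip(weights, scales, shapes))

# Hypothetical tri-Weibull parameters (c = 3) for normalized wind power.
weights = (0.5, 0.3, 0.2)
scales = (0.15, 0.35, 0.60)
shapes = (2.0, 4.0, 8.0)

# Trapezoidal integration on [0, 2]; the result should be close to 1.
step = 1e-4
values = [mixture_pdf(i * step, weights, scales, shapes) for i in range(20001)]
area = step * (sum(values) - 0.5 * (values[0] + values[-1]))
```

Setting a single weight equal to 1 recovers the one-component case, so the same function covers c = 1, 2 and 3.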

Estimation and Model Selection
We aim to fit a mixture of Weibull distributions to empirical density functions corresponding to wind power data. To obtain a better fit, a relevant issue is whether to choose a single Weibull distribution, a bi-Weibull, or a tri-Weibull distribution. Additional components can also be included, but over-fitting may become an issue. For this reason, parsimonious representations of the data are preferred.
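For the simplest candidate, a single Weibull distribution, the maximum likelihood fit can be sketched in a few lines. This is only an illustrative baseline and not the algorithm used in this paper: it solves the standard profile-likelihood equation for the shape by bisection, and the function name, the bracketing interval and the synthetic data are assumptions made for the example.

```python
import math
import random

def fit_weibull_mle(data, tol=1e-10):
    """Maximum likelihood fit of one Weibull; returns (scale, shape)."""
    if any(x <= 0 for x in data):
        raise ValueError("data must be strictly positive")
    m = max(data)
    s = [x / m for x in data]            # rescale to (0, 1] to avoid overflow
    logs = [math.log(v) for v in s]
    mean_log = sum(logs) / len(s)

    def g(shape):
        # Profile-likelihood equation for the shape; its root is the MLE.
        powered = [v ** shape for v in s]
        num = sum(p * l for p, l in zip(powered, logs))
        return num / sum(powered) - 1.0 / shape - mean_log

    lo, hi = 0.01, 100.0                 # bracketing interval (assumed wide enough)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    # Closed-form scale given the shape, mapped back to the original units.
    scale = m * (sum(v ** shape for v in s) / len(s)) ** (1.0 / shape)
    return scale, shape

random.seed(1)
# Synthetic sample with known parameters: scale 0.3, shape 2 (hypothetical).
sample = [random.weibullvariate(0.3, 2.0) for _ in range(5000)]
scale_hat, shape_hat = fit_weibull_mle(sample)
```

Fitting mixtures with two or more components requires an iterative procedure over weights and component parameters, which this sketch does not attempt.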
The choice of the number of components in Equation (2) is a model selection problem, and a wide variety of information criteria have been proposed in the literature to compare a finite set of models.
The most widely used are the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). A general exposition of information theoretic criteria and model selection can be found in [39].
For a given model, with parameters θ, the AIC is defined as:

$$\mathrm{AIC} = -2\,L(\hat{\theta} \mid x) + 2k \tag{3}$$

where $\hat{\theta}$ is the maximum likelihood estimator for the model based on the observed data x, $L(\hat{\theta} \mid x)$ is the log-likelihood function evaluated at $\hat{\theta}$, and k is the number of parameters involved in the model. For a given dataset, we can determine the AIC for a single Weibull distribution (AIC_1), a bi-Weibull (AIC_2), and a tri-Weibull (AIC_3). The preferred model is the one that has the lowest AIC [39]. A corrected AIC (AICC) is used for small samples (n/k < 40), given by:

$$\mathrm{AICC} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1} \tag{4}$$

where n is the number of observed data. The BIC is defined as:

$$\mathrm{BIC} = -2\,L(\hat{\theta} \mid x) + k \log(n) \tag{5}$$

Given any two proposed models, the model that has the lower value of BIC is preferred. It is generally considered that BIC penalizes the number of parameters in the model more strongly than does AIC, since BIC takes into account the length of the dataset, as seen in the log(n) value in Equation (5). Figure 3 shows an example of Weibull estimation where, for the same data source, BIC and AIC suggest a different number of components: BIC selects a single Weibull distribution, and AIC gives a tri-Weibull solution to characterize the empirical density distribution function. To estimate the parameters of a finite number of Weibull components, iterative algorithms such as expectation-maximization and its variants are typically used [40]. In this paper, the rough-enhanced-Bayesian finite mixture modeling (REBMIX) algorithm was chosen [41]. It is an iterative algorithm introduced to estimate the component weights and component parameters for finite mixture models [42], particularly tailored to mixtures of Weibull distributions [43]. This algorithm has been implemented in the rebmix package (Version 2.7.1) for the R language and environment for statistical computing.
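The information criteria described above can be written down directly. The sketch below is illustrative only: the log-likelihood values are hypothetical numbers (not results from this paper), and the parameter count k = 3c - 1 for a c-component Weibull mixture (c scales, c shapes and c - 1 free weights) is an assumption about how the parameters are counted.

```python
import math

def n_params(c):
    # Assumed count: c scales + c shapes + (c - 1) free weights (weights sum to 1).
    return 3 * c - 1

def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k

def aicc(loglik, k, n):
    # Small-sample correction added to the AIC.
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)

def bic(loglik, k, n):
    return -2.0 * loglik + k * math.log(n)

# Hypothetical maximized log-likelihoods for c = 1, 2, 3 components.
logliks = {1: 412.0, 2: 420.0, 3: 427.5}
n = 540  # hypothetical number of observations in one bin

# The preferred model has the lowest value of the criterion.
best_aic = min(logliks, key=lambda c: aic(logliks[c], n_params(c)))
best_bic = min(logliks, key=lambda c: bic(logliks[c], n_params(c), n))
```

With these made-up numbers, AIC prefers three components while BIC prefers one, because each extra parameter costs 2 under AIC but log(540) ≈ 6.3 under BIC. This mirrors the kind of disagreement between the two criteria described in the text.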

General Overview: Optimal Weibull-Mixture Component Analysis
The Spanish power system is a suitable example of a power system with high wind power penetration, accounting for over 800 wind farms and 20,000 wind turbines. The installed capacity increased from 15,071 MW (2007) to 22,784 MW (2012). Real data corresponding to 10-min samples of Spain's aggregated wind power generation from 2007 to 2012, including events and operations related to wind power generation, have been used to evaluate the proposed model. The considered 10-min time unit thus provides 6 points per hour. The aggregated data have been normalized by the installed power capacity for each month of the year. Each year is divided into four quarters, from Q1 (January, February and March) to Q4 (October, November and December), according to similarities between monthly wind power generation due to seasonal effects. As a consequence, each bin (year, quarter and hour of the day) includes approximately 540 data points (6 samples per hour over roughly 90 days), which is a significant number of samples for a density estimation problem.
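The binning by year, quarter and hour of the day can be sketched as follows. The timestamps are generated synthetically for one year and assume a gap-free 10-min series; the function and variable names are illustrative and not taken from the paper.

```python
from collections import Counter
from datetime import datetime, timedelta

def bin_key(ts):
    """Map a timestamp to its (year, quarter, hour of day) bin."""
    quarter = (ts.month - 1) // 3 + 1
    return (ts.year, quarter, ts.hour)

# Count 10-min samples per bin for a gap-free year (2010 used as an example).
counts = Counter()
ts = datetime(2010, 1, 1)
end = datetime(2011, 1, 1)
step = timedelta(minutes=10)
while ts < end:
    counts[bin_key(ts)] += 1
    ts += step

q1_hour0 = counts[(2010, 1, 0)]   # 90 days x 6 samples per hour
q4_hour0 = counts[(2010, 4, 0)]   # 92 days x 6 samples per hour
```

For a non-leap year this gives 540 samples per hourly bin in Q1 and slightly more (546 or 552) in the other quarters, in line with the approximate figure quoted above.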
With the aim of providing a preliminary example of the proposed methodology, three hours within the fourth quarter (Q4) of year 2010 are analyzed. Figure 4 depicts the empirical density functions and their corresponding Weibull-mixture estimations. Additionally, AIC and BIC values are displayed in Table 1. As can be seen, different numbers of components are selected for each hour, as a consequence of the diversity of the empirical density profiles. In this case, both AIC and BIC lead to the same number of components. These three examples offer an overview of the heterogeneity of the aggregated wind power distribution for different hours of an arbitrary day. The results point to the suitability of a Weibull mixture, rather than a single Weibull probability function, to characterize in detail the PDFs for certain hours of the daily aggregated wind power generation, particularly when large-scale aggregated wind power is considered. The number of components can be highly dependent on the high-density bins at medium and large wind power values. This is the case shown in Figure 4b, where two local maxima can be identified. As described in Section 2, the aggregation of wind power over large geographic areas and wind power variability can be identified as important causes of the distribution features.
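An empirical density function of the kind compared with the fitted mixtures can be approximated by a normalized histogram. The sketch below is a generic illustration: the number of bins, the [0, 1] range for normalized power and the sample values are assumptions, and the paper does not state how its empirical densities were computed.

```python
def empirical_density(samples, n_bins=20, lo=0.0, hi=1.0):
    """Histogram density estimate on [lo, hi]; bar heights integrate to 1."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        if lo <= x <= hi:
            # The right edge hi is assigned to the last bin.
            idx = min(int((x - lo) / width), n_bins - 1)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside [lo, hi]")
    return [c / (total * width) for c in counts]

# Hypothetical normalized wind power samples (per unit of installed capacity).
samples = [0.05, 0.12, 0.33, 0.35, 0.80, 1.00]
density = empirical_density(samples, n_bins=10)
area = sum(d * 0.1 for d in density)
```

Dividing the counts by the total number of samples times the bin width is what makes the histogram comparable with a fitted PDF on the same axis.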

Discussion of Seasonal Weibull-Parameters
An extension of the previous analysis is based on discussing the hourly pattern evolution of the estimated parameters (shape, scale, and weights), comparing the AIC and BIC model selection. Therefore, as an example of a quarter for a given year, the hourly wind power distribution for the entire fourth quarter (Q4) of the year 2010 is discussed. Figure 5 shows the estimated weights, scale, and shape parameters for the AIC and BIC approaches to estimate the optimal number of components. The number of Weibull mixture components is associated with the number of relative maximum values (peaks in the distribution), which are identified based on the empirical density function profiles. In addition, the scale parameter gives information about the position of each Weibull component, while the shape parameter is associated with the slope of the peaks.
Regarding the shape parameter, it is smaller, and thus closer to 1, when a dominating component (in terms of weight in the mixture) can be identified. This fact is particularly relevant for the one-component cases. This smaller value of the shape parameter is associated with a positively skewed distribution (probability mass closer to 0), which can be observed when the data can be fitted more accurately with a single Weibull, see Figure 4c. When more components are estimated, the mixture itself allows for more flexibility in the skewness of the resulting distribution, and each component usually presents a larger shape parameter (almost symmetric distribution).
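The link between the shape parameter and skewness noted above can be checked with the standard moment formulas of the Weibull distribution. The sketch below is illustrative; the function name is an assumption, and the skewness does not depend on the scale parameter, so only the shape is passed.

```python
import math

def weibull_skewness(shape):
    """Skewness of a Weibull distribution (independent of the scale)."""
    g1 = math.gamma(1.0 + 1.0 / shape)   # first raw moment (scale = 1)
    g2 = math.gamma(1.0 + 2.0 / shape)   # second raw moment
    g3 = math.gamma(1.0 + 3.0 / shape)   # third raw moment
    var = g2 - g1 ** 2
    # Third central moment divided by the cubed standard deviation.
    return (g3 - 3.0 * g1 * var - g1 ** 3) / var ** 1.5

skew_shape_1 = weibull_skewness(1.0)     # exponential case
skew_shape_2 = weibull_skewness(2.0)     # Rayleigh case
skew_shape_36 = weibull_skewness(3.6)    # nearly symmetric case
```

A shape of 1 gives the strongly right-skewed exponential case, the skewness decreases as the shape grows, and it is close to zero for shapes around 3.6, which matches the qualitative description of nearly symmetric components with larger shape parameters.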
Considering first the AIC-based solution (Figure 5a,b), the density bins are fitted with tri-Weibull mixtures during the night-time hours (from Hour 0 to Hour 5). Furthermore, Figure 5a,b shows that the winter-spring demand peak hour, around 8 p.m., is also fitted with a tri-Weibull mixture. This is in line with the influence of wind power curtailment on the overall distribution shape, since these hours correspond to relatively high levels of wind power curtailment and have noticeably different distribution shapes.
The weight of the third component is not available for other periods of the day. Consequently, the wind power generation distribution is characterized through a single Weibull density function. The scale parameter increases during the afternoon and evening hours, giving rise to second and third well-separated Weibull components. For the rest of the hours, this parameter remains at similar values for the main component of the mixture, independently of the other Weibull components. For hours with medium power demand, bi-Weibull mixtures are suggested to characterize the aggregated wind power generation histogram.
As an example of the influence of imposed variability on the PDF, the hours with highly variable power and curtailments, i.e., those times that usually contain large and sudden wind generation changes, normally present large additional peaks in their probability functions and are thus better characterized by bi- and tri-Weibull mixtures. The number of additional components can then be associated with the characteristics and the quantity of the curtailment periods. For the rest of the hours, either no wind curtailments are applied or curtailment actions are applied in a less drastic manner, and the generated wind power distribution is properly modeled by a single Weibull density.
When BIC is selected as the model selection criterion (Figure 5c,d), more conservative results in terms of the number of components are obtained. This is to be expected because the BIC penalizes the number of components significantly, as detailed in Section 3. Nevertheless, in hours with high wind power curtailments, the distribution profiles are more accurately characterized with three Weibull components, presenting scale and shape parameters similar to those of the previous approach. This result confirms the relevance and necessity of a mixture of distributions to characterize and estimate the aggregated wind power generation density function for real power system data. Even though a more conservative criterion is used, the shape of the density function and the existence of peaks call for several Weibull components.

Comparison of Optimal Number of Components in the Fitted Weibull-Mixture
This section focuses on estimating the optimal number of components for the Weibull mixture, considering both AIC and BIC approaches, for all years and quarters. Subsequently, all years from 2007 to 2012 are considered and the hourly pattern of the selected number of components for the fitted Weibull-mixture model is discussed for each quarter. Both AIC and BIC are thus considered and the corresponding results are summarized in Figure 6, where a color code is proposed to identify the optimal number of Weibull-mixture components for each hour of the day. From this analysis, Weibull mixtures with two and three components are relatively common, as discussed in Section 4.2. The number of Weibull components depends on the analyzed year and quarter, as well as on the season of the year. As shown in the previous section, curtailment actions clearly affect the evolution of the aggregated wind power generation and consequently the corresponding optimal Weibull mixture, though they are not the only factors influencing the multi-modality of the aggregated wind power PDF. In the context of the Spanish power system, until 2009, larger wind power curtailments were applied due to important grid limitations at the distribution level. Since the end of 2009, wind power curtailments have been programmed in real time based on the scheduled mix of generation and according to the "Non-integrable wind power excess" as defined by the Spanish transmission system operator within the operational procedure 3.7 [44]. Curtailments typically start between 9 p.m. and 1 a.m. and finish between 5 a.m.
and 8 a.m., depending on the season and the operating conditions. This fact contributes strongly to the need for additional Weibull components to properly characterize the distributions during high wind periods, especially in spring and autumn. Tables 2 and 3 summarize the relative frequencies for each year of the optimal number of Weibull components according to AIC and BIC, respectively. As previously discussed, the BIC tends to favor a smaller number of Weibull components compared to AIC. Nevertheless, when using the BIC approach, around 66% of the aggregated hourly wind power generation distributions should be characterized by using more than one Weibull component. To complete this analysis, a comparison between the number of Weibull components selected by AIC and BIC is illustrated in Figure 7. In most cases, there is no difference in the optimal number of components between the two criteria. In fact, around 70% of the estimations give the same number of components with both approaches. Therefore, the proposed solution provides a suitable and accurate estimation of the aggregated wind power PDF by using a mixture of a simple and reliable distribution: the Weibull. The use of this alternative approach with historical data provides a tool based on probabilistic methodology for the applications previously described: reserve estimation, forecasting, etc. In addition, this solution can also be applied to other aggregated wind power datasets, obtaining more accurate mixtures to fit the corresponding PDF. The percentage of each mixture would vary according to the wind speed variability, curtailments or smoothing effects due to the aggregated power output from geographically dispersed wind farms.
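The summary statistics discussed above (relative frequencies of the selected number of components and the share of bins where AIC and BIC agree) can be computed as in the following sketch. The selections listed are hypothetical placeholders for a handful of hourly bins, not the results reported in this paper, and the function names are illustrative.

```python
from collections import Counter

def relative_frequencies(selected):
    """Relative frequency of each selected number of components."""
    counts = Counter(selected)
    total = len(selected)
    return {c: counts[c] / total for c in sorted(counts)}

def agreement_share(aic_selected, bic_selected):
    """Share of bins where both criteria select the same number of components."""
    same = sum(1 for a, b in zip(aic_selected, bic_selected) if a == b)
    return same / len(aic_selected)

# Hypothetical selections for eight hourly bins (not data from the paper).
aic_selected = [3, 3, 2, 1, 2, 2, 3, 1]
bic_selected = [3, 1, 2, 1, 2, 1, 3, 1]

bic_freq = relative_frequencies(bic_selected)
agreement = agreement_share(aic_selected, bic_selected)
multi_component_share = sum(f for c, f in bic_freq.items() if c > 1)
```

Applied to the full set of (year, quarter, hour) bins, the same two functions would yield frequency tables and an agreement percentage of the kind summarized in the text.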

Conclusions
This paper proposed and evaluated a Weibull mixture as a solution to estimate PDFs for aggregated wind power generation. The smoothing effects of geographical dispersion, wind power aggregation and operational actions carried out by the TSO affect the estimation of the PDFs for different hours of the day, and therefore different Weibull mixture components should be estimated. To determine the optimal number of Weibull components, two well-known information-based criteria have been used: the BIC and AIC approaches. A comparison between both criteria is carried out for different hours of the day and seasons. The Weibull mixtures estimated using both criteria mostly yield similar results, providing the same number of components for around 70% of the hours and quarters analyzed. Consequently, the selection of AIC or BIC is not critical in this case.
Specific Weibull mixture parameters (estimated weight, scale, and shape parameters) are also discussed in detail for different periods and operational considerations. Aggregated wind power production over several years from the Spanish power system is used to provide an extensive analysis of the proposed Weibull-mixture approach. The results show that over 66% of the aggregated hourly wind power generation PDFs require more than one Weibull component to be accurately characterized. More specifically, 44% are modeled by bi-Weibull distributions and 22% correspond to tri-Weibull distributions. This is in clear contrast with the single Weibull component, which is typically proposed for the characterization of wind speed distributions and which, according to the results, should not be extended to the characterization of aggregated wind power generation.

Figure 3. Example of Weibull mixture estimation for aggregated wind power generation (Bayesian information criterion (BIC) and Akaike information criterion (AIC) approaches).

Figure 5. Number of components (according to the rough-enhanced-Bayesian finite mixture modeling (REBMIX) algorithm), estimated weights, scale and shape parameters for Q4 of year 2010 using wind power generation data. The size of the plotted points is proportional to the estimated weights of the corresponding fitted mixture of Weibull distributions. (a) Scale parameter λ, AIC approach; (b) shape parameter β, AIC approach; (c) scale parameter λ, BIC approach; and (d) shape parameter β, BIC approach.

Figure 6. AIC and BIC approaches: optimal number of components for the Weibull mixture model according to the hour of the day for Q1 through Q4 for the years 2007-2012 using wind power generation data. (a) AIC approach; and (b) BIC approach.

Figure 7. Weibull mixture model divided into quarters from aggregated wind power generation data (2007-2012).Hours with similar optimal number of components for AIC and BIC.
Figure 2 shows the influence of the month and season on the aggregated wind power generation in Spain from 2007 to 2012. February 2010 and March 2008 were the months with the highest wind power generation (over 36% of installed capacity in those months, 19.1 GW and 15.1 GW, respectively). On the other hand, April 2007, May 2008, August 2009 and September 2011 showed the lowest monthly production (near 15% of installed capacity), illustrating both strong inter-annual and intra-annual variability.

Table 1. For year 2010, 4th quarter and three different hours of the day, values of the information criteria for the estimated mixture models with c = 1, 2 and 3 components and the estimated weights for the optimum mixture model of aggregated wind power generation.

Table 2. Relative frequencies of the optimal number of Weibull components (AIC approach) using aggregated wind power generation data.

Table 3. Relative frequencies of the optimal number of Weibull components (BIC approach) using aggregated wind power generation data.