Multivariate Weibull Distribution for Wind Speed and Wind Power Behavior Assessment

The goal of this paper is to show how to derive the multivariate Weibull probability density function from the multivariate Standard Normal one and to show its applications. Having Weibull distribution parameters and a correlation matrix as input data, the proposal is to obtain a precise multivariate Weibull distribution that can be applied in the analysis and simulation of wind speeds and wind powers at different locations. The main advantage of the distribution obtained, over those generally used, is that it is defined by the classical parameters of the univariate Weibull distributions and the correlation coefficients and all of them can be easily estimated. As a special case, attention has been paid to the bivariate Weibull distribution, where the hypothesis test of the correlation coefficient is defined.


Introduction
The Weibull distribution is a continuous probability distribution that was described by Waloddi Weibull in 1951 [1]. It can be applied in a wide range of fields such as survival analysis, reliability engineering or weather forecasting, to mention just a few [1][2][3][4][5]. Mathematically, it can be written as a function of three parameters, i.e., location parameter γ, scale parameter C and shape parameter k. The location parameter gives information about the minimum value of the set, so it can be set to 0 in the case of wind speed sets.
When more than one Weibull-described variable is being considered and the dependence among them has certain relevance, the so-called multivariate distribution [2,3] has to be applied. The multivariate distribution describes the outcome of a number of variables at the same time. Therefore, if the simultaneous behavior of a number of dependent variables, each of them described by a Weibull distribution, is being evaluated, a multivariate Weibull distribution needs to be used. Obviously, when the variables are independent, they can be evaluated separately.
So far, most of the multivariate Weibull expressions for the Cumulative Distribution Function (CDF) or the Probability Density Function (PDF) are based on models [4,5] that approximately describe the joint probability distribution of a group of variables. The most noteworthy one was introduced by Marshall and Olkin in 1967 [6], but others are also interesting, such as those presented by Lee [7], Roy and Mukherjee [8], Crowder [9], Lu and Bhattacharyya [10] and Patra and Dey [11]. Bivariate Weibull models have also been developed based on the following Copula functions [4]: Farlie-Gumbel-Morgenstern, Clayton, Ali-Mikhail-Haq, Gumbel-Hougaard, Gumbel-Barnett, Nelsen Ten. However, most of these models can only be defined completely by estimating parameters that have no direct interpretation, i.e., in most cases those parameters have no direct relationship to the univariate Weibull model parameters or to the correlation coefficients, which are used as a measure of the dependence.
In this paper, a model is proposed for the multivariate Weibull PDF, based on the classic parameters used in the definition of a univariate Weibull model and on the correlation coefficients among the marginal distributions. It develops the change of variables from Normal to Weibull used in [12,13]. It is applied to wind speed and wind power, as there seems to be some agreement on the importance of the Weibull distribution for the modeling and simulation of these in wind turbines or wind farms [14][15][16]. Therefore, this paper presents the simultaneous behavior of wind speed and wind power at two or more locations. The bivariate case is analyzed on its own due to its importance, and the correlation coefficient inference procedure is established.
The structure of the paper is as follows: Section 2 derives the multivariate Weibull PDF from the Standard Normal one and outlines its application to wind speed and wind power, Section 3 deals with the bivariate Weibull PDF and Section 4 states the conclusions.

Multivariate Weibull Distribution
The proposed multivariate Weibull distribution is obtained by means of a Normal to Weibull change of variables. The procedure to derive it is as follows: (1) The key point of the procedure is to define a change of variable from a Standard Normal to a Weibull distributed one. This transformation must be differentiable and the inverse function has to exist. The Standard Normal distributed variable is created in order to use its known features and transfer them to the Weibull one; (2) Establish as many changes as the number of variables, n, considering the different Weibull parameters for each case; (3) Obtain the multivariate Weibull PDF from the multivariate Standard Normal PDF applying the change of variable.
Finally, the relationship between the correlation coefficients in the multivariate Standard Normal PDF and in the Weibull one has to be checked in order to establish a multivariate Weibull PDF that depends on marginal Weibull parameters and the correlation coefficients between pairs of Weibull variables.

Normal to Weibull Change of Variables
In order to define a change of variables from a Normal distributed variable to a Weibull one, the Probability Integral Transform is applied [17], which relies on obtaining uniformly distributed variables. The uniformly distributed variables derived from the Normal and Weibull ones are then equated and a relationship between them is established.
The CDF of a univariate Weibull distribution [2][3][4] with scale parameter C and shape parameter k is defined in Equation (1):

$$F_u(u) = 1 - \exp\left[-\left(\frac{u}{C}\right)^{k}\right] \qquad (1)$$

where exp( ) is the exponential function; u is the Weibull variable; and $F_u(u)$ is the CDF of u.
On the other hand, the CDF of the univariate Normal distribution [2][3][4] with mean value µ and standard deviation σ is defined in Equation (2):

$$F_x(x) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right] \qquad (2)$$

where erf( ) is the error function, defined in [18]; x is the Normal variable; and $F_x(x)$ is the CDF of x.
As mentioned earlier, the Probability Integral Transform states that variables from any given continuous distribution can be converted into variables having a uniform distribution, i.e., applying its CDF to a variable provides a new variable, uniformly distributed. So, $y_u = F_u(u)$ and $y_x = F_x(x)$ are uniformly distributed variables.
Both variables $y_u$ and $y_x$ can be matched in order to establish a relationship between a Weibull and a Normal distributed variable. The whole process can be understood in two steps: first, the conversion of a Normal distributed variable into a Uniform one, and then the conversion of this one into a Weibull one. Therefore, if both CDFs are matched, then the Weibull distributed variable, u, can be expressed as a function of the Normal one, x, as in Equation (3):

$$u = C\left\{-\log\left[\frac{1}{2} - \frac{1}{2}\,\mathrm{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right]\right\}^{1/k} \qquad (3)$$

where log( ) is the natural logarithm. Thus, from a Normal distributed variable with given parameters (µ, σ), a Weibull one can be derived, with the desired parameters (C, k). Notice that this transformation can be applied to other types of variables to obtain an equation that relates two variables with different distributions.
In order to derive further results, the transformation given in Equation (3) will be referred to as ntw(x;C,k), where x is a Standard Normal variable (µ = 0 and σ = 1) and the Weibull parameters are C and k. For a single variable, $x_i$, the notation is expressed in Equation (4):

$$u_i = \mathrm{ntw}(x_i; C_i, k_i) = C_i\left\{-\log\left[\frac{1}{2} - \frac{1}{2}\,\mathrm{erf}\left(\frac{x_i}{\sqrt{2}}\right)\right]\right\}^{1/k_i} \qquad (4)$$

where $C_i$ and $k_i$ are the parameters corresponding to the ith variable.
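The inverse transformation is also needed and denoted as ntw⁻¹(u;C,k), where u is the Weibull variable. For a single variable, $u_i$, that function is shown in Equation (5):

$$x_i = \mathrm{ntw}^{-1}(u_i; C_i, k_i) = \sqrt{2}\;\mathrm{erf}^{-1}\left(1 - 2\exp\left[-\left(\frac{u_i}{C_i}\right)^{k_i}\right]\right) \qquad (5)$$

where erf⁻¹( ) is the inverse of the error function [18]. The function ntw⁻¹(u;C,k) is shown in Figure 1 for various values of the parameter C. The derivative of ntw( ) is denoted as ntwʹ(x;C,k) and expressed in Equation (6), also for a single variable.

$$\mathrm{ntw}'(x_i; C_i, k_i) = \frac{C_i}{k_i}\left\{-\log\left[\frac{1}{2} - \frac{1}{2}\,\mathrm{erf}\left(\frac{x_i}{\sqrt{2}}\right)\right]\right\}^{\frac{1}{k_i} - 1}\frac{\exp\left(-x_i^2/2\right)}{\sqrt{2\pi}\left[\frac{1}{2} - \frac{1}{2}\,\mathrm{erf}\left(\frac{x_i}{\sqrt{2}}\right)\right]} \qquad (6)$$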
In Equation (6) the derivative of the error function is used [18], which is shown in Equation (7):

$$\frac{d}{dx}\,\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\exp\left(-x^2\right) \qquad (7)$$

By using Equations (4)-(6), the change of variables can be extended to multiple variables.
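The change of variables described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes the closed forms that follow from equating the Weibull and Standard Normal CDFs, the function names `ntw`, `ntw_inv` and `ntw_prime` simply mirror the notation of the text, and the standard library's `statistics.NormalDist` is used in place of an explicit inverse error function.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist(0.0, 1.0)  # Standard Normal: mu = 0, sigma = 1


def ntw(x, C, k):
    """Map a Standard Normal value x to a Weibull(C, k) value."""
    # 1 - Phi(x), which equals 1/2 - erf(x / sqrt(2)) / 2.
    survival = 0.5 * math.erfc(x / math.sqrt(2.0))
    return C * (-math.log(survival)) ** (1.0 / k)


def ntw_inv(u, C, k):
    """Map a Weibull(C, k) value u > 0 back to a Standard Normal value."""
    weibull_cdf = -math.expm1(-((u / C) ** k))  # 1 - exp(-(u/C)^k)
    return _STD_NORMAL.inv_cdf(weibull_cdf)


def ntw_prime(x, C, k):
    """Derivative of ntw with respect to x."""
    survival = 0.5 * math.erfc(x / math.sqrt(2.0))
    return (C / k) * (-math.log(survival)) ** (1.0 / k - 1.0) \
        * _STD_NORMAL.pdf(x) / survival
```

For instance, `ntw(0.0, C, k)` returns the Weibull median, C·(log 2)^(1/k), because the median of the Standard Normal variable is mapped onto the median of the Weibull one.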

Multivariate Normal to Weibull Change
In order to broaden the transformation to several variables, the multivariate Standard Normal distribution has to be considered. Its PDF, when the covariance matrix is positive definite, is shown in Equation (8):

$$f_x(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n \det(\Sigma)}}\exp\left(-\frac{1}{2}\,\mathbf{x}^t\,\Sigma^{-1}\,\mathbf{x}\right) \qquad (8)$$

where x is a vector formed by several Standard Normal variables; n is the number of them; det( ) means the determinant of a matrix; Σ is the covariance matrix, $\Sigma = (\mathrm{Var}_{ij})$, i,j = 1,…,n, with $\mathrm{Var}_{ii} = 1$ and $\mathrm{Var}_{ij} = \rho_{ij}$ (i ≠ j); $\rho_{ij}$ is the correlation coefficient between variables $x_i$ and $x_j$; $\Sigma^{-1}$ means inversion of matrix Σ; and $\mathbf{x}^t$ means transposition of vector x. By using Equation (8), the PDF corresponding to the multivariate Weibull distribution is obtained through Equation (9):

$$f_u(\mathbf{u}) = f_x(\mathbf{x})\,\left|\det(J)\right| \qquad (9)$$

where u is the vector formed by several Weibull variables; J is the Jacobian matrix; and | | means absolute value. The elements of the Jacobian matrix are shown in Equation (10):

$$J_{ij} = \frac{\partial x_i}{\partial u_j} = \begin{cases} \dfrac{1}{\mathrm{ntw}'(x_i; C_i, k_i)} & i = j \\ 0 & i \neq j \end{cases} \qquad (10)$$

Thus, the Jacobian matrix has non-zero elements only in its diagonal. The determinant of this matrix is obtained through Equation (11):

$$\det(J) = \prod_{i=1}^{n}\frac{1}{\mathrm{ntw}'(x_i; C_i, k_i)} \qquad (11)$$

where Π means the product of a sequence of terms featuring the i index.
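The multivariate Weibull PDF of a group of variables is shown in Equation (12) as a function of the multivariate Standard Normal PDF and ntw( ).

$$f_u(\mathbf{u}) = f_x(\mathbf{x})\prod_{i=1}^{n}\frac{1}{\mathrm{ntw}'(x_i; C_i, k_i)}, \qquad x_i = \mathrm{ntw}^{-1}(u_i; C_i, k_i) \qquad (12)$$

Equation (12) can be expressed as in Equation (13) if Equation (8) is taken into account.

$$f_u(\mathbf{u}) = \frac{1}{\sqrt{(2\pi)^n \det(\Sigma)}}\exp\left(-\frac{1}{2}\,\mathbf{x}^t\,\Sigma^{-1}\,\mathbf{x}\right)\prod_{i=1}^{n}\frac{1}{\mathrm{ntw}'(x_i; C_i, k_i)}, \qquad x_i = \mathrm{ntw}^{-1}(u_i; C_i, k_i) \qquad (13)$$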
Notice that Equation (13) depends on two parameters per variable ($C_i$ and $k_i$) and one for every pair of them ($\rho_{ij}$). However, the correlation coefficient ρ between pairs of variables is established according to Standard Normal distributed variables, so the relationship between the Normal and the Weibull correlation coefficients has to be obtained. By means of a numerical approach based on the Cholesky decomposition (see Appendix), an interval can be obtained for each parameter where the correlation coefficients in Equation (13) can be considered according to Weibull variables, i.e., where the elements of the covariance matrix, Σ, in Equation (13) correspond to the correlation coefficients between pairs of variables $u_i$ and $u_j$. The intervals are shown in Equation (14): Therefore, in many cases the parameters used in Equation (13) are defined by the behavior of the group of Weibull variables.
Even though Equation (13) seems a trifle complex, it should be emphasized that, when introducing it in a software application, its complexity does not depend on the number of variables, n. Moreover, some models referred to in Section 1 serve only the bivariate case and those applied to the multivariate one do not consider the correlation coefficients, which are the classical measurements of dependence.
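To illustrate that the same code handles any number of variables, the multivariate Weibull PDF can be sketched as follows. This is a hedged sketch under stated assumptions, not the authors' software: the density is taken as the multivariate Standard Normal PDF multiplied by the Jacobian of the Weibull to Normal change, the helper names (`cholesky`, `weibull_pdf`, `multivariate_weibull_pdf`) are hypothetical, and each diagonal Jacobian term 1/ntwʹ is written in the equivalent form f(u_i)/φ(x_i), the ratio of the marginal Weibull PDF to the Standard Normal PDF. A Cholesky factor is used so that no explicit matrix inverse is needed.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist(0.0, 1.0)


def cholesky(sigma):
    """Lower triangular L with sigma = L L^t (sigma must be positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def weibull_pdf(u, C, k):
    """Univariate Weibull PDF with scale C and shape k."""
    return (k / C) * (u / C) ** (k - 1.0) * math.exp(-((u / C) ** k))


def multivariate_weibull_pdf(u, C, k, sigma):
    """Joint PDF at the point u (all u_i > 0) for marginal parameters C, k."""
    n = len(u)
    # x_i = ntw^{-1}(u_i; C_i, k_i): Weibull value mapped to Standard Normal.
    x = [_STD_NORMAL.inv_cdf(-math.expm1(-((ui / Ci) ** ki)))
         for ui, Ci, ki in zip(u, C, k)]
    L = cholesky(sigma)
    # Solve L w = x by forward substitution; then x^t sigma^{-1} x = |w|^2.
    w = []
    for i in range(n):
        w.append((x[i] - sum(L[i][m] * w[m] for m in range(i))) / L[i][i])
    quad = sum(wi * wi for wi in w)
    det_sigma = math.prod(L[i][i] ** 2 for i in range(n))
    normal_pdf = math.exp(-0.5 * quad) / math.sqrt((2.0 * math.pi) ** n * det_sigma)
    # |det J| = prod_i dx_i/du_i = prod_i weibull_pdf(u_i) / phi(x_i).
    jacobian = 1.0
    for ui, Ci, ki, xi in zip(u, C, k, x):
        jacobian *= weibull_pdf(ui, Ci, ki) / _STD_NORMAL.pdf(xi)
    return normal_pdf * jacobian
```

With an identity matrix as `sigma` the function reduces to the product of the marginal Weibull PDFs, which is a convenient sanity check for the independent case.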

Multivariate Wind Speed Distribution
According to [14][15][16] the cumulative behavior of the wind speed at location i, $v_i$, can be described by a Weibull distribution with parameters $C_i$ and $k_i$, and the relationship between every two distributions can be described by its correlation coefficient [19][20][21][22]. So, the multivariate wind speed PDF, or multilocation wind speed PDF, for a group of n variables ($v_1$,…,$v_n$), with Weibull parameters $C_i$ and $k_i$ and correlation matrix $R_v$ referred to the pairs of Normally distributed variables, is shown in Equation (15): where ntw⁻¹( ) and ntwʹ( ) are defined in Equations (5) and (6) respectively. On the other hand, in most cases the Weibull parameters of the wind speed distributions lie in the intervals expressed in Equation (14), so, as explained in Section 2.1, $R_v$ can also represent the correlation matrix of the $v_i$ variables.

Multivariate Wind Power Distribution
The relationship between wind speed and wind power [14,16] is defined by Equation (16):

$$P_i = \frac{1}{2}\,d_i\,A_i\,v_i^3 \qquad (16)$$

where $P_i$ is the power contained in an airstream that is flowing through a surface of area $A_i$ and $d_i$ is the air density at location i. Therefore, applying a change of variables, $P_i$ can be described by a Weibull distribution of parameters $C_i'$ and $k_i'$ [14], expressed in Equation (17):

$$C_i' = \frac{1}{2}\,d_i\,A_i\,C_i^3, \qquad k_i' = \frac{k_i}{3} \qquad (17)$$

where $C_i$ and $k_i$ are the Weibull parameters of the wind speed at location i. Therefore, as defined in Equation (13), the multivariate wind power PDF is expressed in Equation (18) as a function of the parameters $C_i'$ and $k_i'$.
And in Equation (19), as a function of the parameters of the wind speed distribution.
In both cases, Equations (18) and (19), the matrix $R_P$ contains the correlation coefficients corresponding to the pairs of normally distributed variables. In most cases the parameters of the Weibull distributions defined in Equation (17) lie outside the intervals expressed in Equation (14), so, as explained in Section 2.1, $R_P$ does not represent the correlation matrix of the $P_i$ variables.
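The change from wind speed to wind power parameters can be checked numerically. The sketch below assumes the usual kinetic power relation P = ½·d·A·v³, under which a Weibull(C, k) wind speed gives a Weibull power with scale ½·d·A·C³ and shape k/3; the function names and the numerical values in the usage note are hypothetical and chosen only for illustration.

```python
import math


def wind_power(v, air_density, area):
    """Power of an airstream of speed v through a surface of the given area."""
    return 0.5 * air_density * area * v ** 3


def power_weibull_params(C, k, air_density, area):
    """Weibull parameters (C', k') of wind power from those of wind speed."""
    return 0.5 * air_density * area * C ** 3, k / 3.0


def weibull_cdf(value, scale, shape):
    """Univariate Weibull CDF."""
    return 1.0 - math.exp(-((value / scale) ** shape))
```

For a hypothetical site with C = 8 m/s, k = 2, air density 1.225 kg/m³ and unit area, `weibull_cdf(7.0, 8.0, 2.0)` and `weibull_cdf(wind_power(7.0, 1.225, 1.0), *power_weibull_params(8.0, 2.0, 1.225, 1.0))` agree, since both describe the same event. The shape parameter drops from 2 to about 0.67, which is consistent with the remark that the wind power parameters usually fall outside the intervals of Equation (14).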

Bivariate Weibull Distribution
In many cases the bivariate Weibull distribution is sought in order to describe the wind speed or wind power behavior in a pair of locations. Due to its importance, it is developed here as a particular case.

Bivariate Weibull Distribution Applied to Wind Speed
Equation (9) specifically for n = 2 is shown in Equation (20): And the bivariate Standard Normal PDF is expressed in Equation (21): By using Equations (20) and (21) the bivariate Weibull PDF Equation (22) is obtained as a function of $C_1$, $k_1$, $C_2$, $k_2$ and ρ, which stands for the correlation coefficient between $x_1$ and $x_2$ but, as has been stated above, can be considered as the correlation coefficient between $v_1$ and $v_2$.
In order to simplify Equation (22), as it depends on $x_1$ and $x_2$, their relationships with $v_1$ and $v_2$ are shown in Equation (23): As stated, in most cases the Weibull parameters that define the wind speed behavior lie in the intervals given in Equation (14), so the Normal and wind speed correlation coefficients can be considered equal. The bivariate wind speed PDF for several values of ρ is shown in Figures 2-4.

Correlation Coefficient Inference
In order to perform the correlation coefficient inference between two variables ($u_1$, $u_2$), the sample correlation coefficient has to be tested with a hypothesis [23][24][25]. The sample value, r, is obtained using Equation (24), once the Weibull variables are changed to Normal ones ($y_1$, $y_2$).

$$r = \frac{\sum_j \left(y_{1j} - \bar{y}_1\right)\left(y_{2j} - \bar{y}_2\right)}{\sqrt{\sum_j \left(y_{1j} - \bar{y}_1\right)^2 \sum_j \left(y_{2j} - \bar{y}_2\right)^2}} \qquad (24)$$

where $y_{ij}$ is the jth sample value of the variable $y_i$ and $\bar{y}_i$ is the sample mean of the variable $y_i$.
The hypotheses to be checked in this case are the following:

$$H_0: \rho = \rho_0, \qquad H_1: \rho \neq \rho_0$$

where $\rho_0$ is a known value that corresponds to a bivariate Normal distribution, which needs to be tested. In order to do so, a new variable z is obtained from the sample correlation coefficient r, as expressed in Equation (25):

$$z = \frac{1}{2}\log\left(\frac{1 + r}{1 - r}\right) \qquad (25)$$

According to [23][24][25], the variable Z is Normal distributed with the parameters given in Equation (26):

$$\mu_Z = \frac{1}{2}\log\left(\frac{1 + \rho_0}{1 - \rho_0}\right), \qquad \sigma_Z = \frac{1}{\sqrt{n - 3}} \qquad (26)$$

where n is the number of elements in the sample.
Depending on the significance level, α, a confidence interval, CI = [$z_{min}$, $z_{max}$], is established, whose limits are expressed in an implicit way in Equation (27):

$$F_z(z_{min}) = \frac{\alpha}{2}, \qquad F_z(z_{max}) = 1 - \frac{\alpha}{2} \qquad (27)$$

where $F_z$( ) is the CDF of the variable Z, according to the parameters of Equation (26). So, if z, obtained through Equation (25), belongs to the CI, the Null Hypothesis $H_0$ can be accepted with significance level α, and if not, it cannot be accepted.
If H 0 is accepted, the correlation coefficient corresponding to the bivariate Weibull distribution has to be obtained.
According to previous sections, the correlation coefficients corresponding to the bivariate Normal distribution and to the Weibull one, when it represents a pair of wind speed variables, are approximately the same, so if H 0 is accepted, it can be said that ρ 0 can be taken as an estimation of the correlation coefficient in a bivariate Weibull distribution with significance level of α.
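The inference procedure of this section can be sketched with the standard library alone. The sketch assumes that the change from r to z is the usual Fisher transformation, with Z Normal distributed around the transformed ρ₀ and with standard deviation 1/√(n − 3); the function names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_correlation(y1, y2):
    """Sample correlation coefficient r of two equally long series."""
    m1 = sum(y1) / len(y1)
    m2 = sum(y2) / len(y2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(y1, y2))
    var1 = sum((a - m1) ** 2 for a in y1)
    var2 = sum((b - m2) ** 2 for b in y2)
    return cov / math.sqrt(var1 * var2)


def fisher_z(r):
    """Fisher transformation of a correlation coefficient, -1 < r < 1."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def correlation_test(r, rho0, n, alpha=0.05):
    """Return z, the interval (z_min, z_max) and whether H0 can be accepted."""
    z = fisher_z(r)
    z_dist = NormalDist(mu=fisher_z(rho0), sigma=1.0 / math.sqrt(n - 3))
    z_min = z_dist.inv_cdf(alpha / 2.0)
    z_max = z_dist.inv_cdf(1.0 - alpha / 2.0)
    return z, (z_min, z_max), z_min <= z <= z_max
```

As a usage example, `correlation_test(0.6202, 0.6012, 1000)` gives z ≈ 0.7253 and an interval of approximately [0.633, 0.757], so the null hypothesis is accepted for those inputs at α = 0.05.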

Bivariate Weibull Distribution Applied to Wind Power
According to Equation (19), the bivariate wind power PDF is expressed in Equation (28): where $x_1$ and $x_2$ are shown in Equation (29): The relationship between both parameters is shown in Figure 5.

Case Study
As a case study, it can be assessed if a certain model of correlation coefficient between a pair of locations can be accepted in order to estimate its value as a function of the distance between them.
The main features of the behavior of the wind in Galicia, in the Northwest of Spain, are that during winter the winds blow from the Southwest and are very constant and powerful, and during the summer the winds normally blow softly from the Northeast. There are a great number of meteorological stations spread throughout Galicia [26]. If wind speed data series are collected from them, the Weibull distribution parameters for each location can be obtained. Moreover, the correlation coefficients of each pair can be derived according to Equation (24).
In order to estimate the correlation coefficient for locations with a low number of simultaneous sample values of wind speed measurements, the relationship between the correlation coefficient and the distance can be analyzed [27]. The great-circle distance between two locations (i and j) can be obtained through Equation (30), as a function of their geographical coordinates.

$$d_{ij} = R_E \cdot \arccos\left[\sin(lat_i)\sin(lat_j) + \cos(lat_i)\cos(lat_j)\cos(lon_i - lon_j)\right] \qquad (30)$$

where $d_{ij}$ is the distance between locations i and j; $R_E$ is the Earth's radius; $lat_i$ and $lon_i$ are the latitude and longitude coordinates of location i; sin( ) and cos( ) are the sine and cosine functions; and arccos( ) is the inverse of the cosine function. Therefore, including all the possible pairs (distance, correlation coefficient) in Galicia and by means of the least squares method, the relationship in Equation (31) is derived.
$$\rho = a \cdot dist + b \qquad (31)$$

where ρ is the correlation coefficient; dist is the distance between locations in km; and a and b are parameters obtained for each case. Here, a = −0.0007 km⁻¹ and b = 0.6589. So, if the correlation coefficient between a location not included in the previous analysis (Coto Muiño) and another one that is included (Melide) needs to be estimated, all that has to be done is to apply Equation (30) between both locations, and then Equation (31), after which a value of $\rho_0$ = 0.6012 is obtained.
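The two steps of this estimate, distance first and correlation coefficient second, can be sketched as follows. The sketch assumes coordinates given in degrees, a mean Earth radius of 6371 km (a value not stated in the text) and a linear distance model with the fitted values a = −0.0007 km⁻¹ and b = 0.6589; the function names are hypothetical, and no station coordinates are included because they are not given here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat_i, lon_i, lat_j, lon_j):
    """Great-circle distance in km between two points given in degrees."""
    lat_i, lon_i, lat_j, lon_j = map(math.radians, (lat_i, lon_i, lat_j, lon_j))
    cos_angle = (math.sin(lat_i) * math.sin(lat_j)
                 + math.cos(lat_i) * math.cos(lat_j) * math.cos(lon_i - lon_j))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_KM * math.acos(cos_angle)


def correlation_from_distance(dist_km, a=-0.0007, b=0.6589):
    """Linear least squares model: rho = a * dist + b."""
    return a * dist_km + b
```

The arccos form is sensitive to rounding for very short distances; for closely spaced stations a haversine formulation would be a more robust choice, although that is a swapped-in alternative and not the expression used in the text.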
Moreover, by utilizing simultaneous sample data (n = 1000) from both locations, the sample correlation coefficient can be obtained through Equation (24), r = 0.6202, and the change suggested in Equation (25) applied, to obtain z = 0.7253.
Considering α = 0.05 and Z ~ N(0.6951, 0.0317), the CI obtained is CI = [0.6330, 0.7571]. Therefore, as explained in the previous sections, as $z_{min}$ < z < $z_{max}$, $\rho_0$ = 0.6012 can be accepted as the correlation coefficient of the bivariate Weibull distributions corresponding to the wind speed data of those locations. The results are shown in Table 1.

Conclusions
In this paper, a Normal to Weibull change of variables has been defined. It should be noticed that the process can be applied to any type of variables, even inversely. The multivariate Weibull PDF has been obtained and justified, depending on the classic parameters of a single variable and the correlation coefficients between pairs of them. It improves on former approaches, which mainly consist of models based on parameters that have no direct relationship with the univariate parameters and the usual dependence measurement. The function proposed can be easily implemented in a software application regardless of the number of variables. The bivariate case seems a bit complex compared to other models, but it uses the correlation coefficient between both variables. From the point of view of n variables, each defined by a Weibull distribution with given correlation coefficients between pairs, the PDF proposed is not an approximation; it provides exact results. Additionally, the application of the multivariate Weibull PDF to wind speed and wind power has been explained and derived. The bivariate case for both has also been specified due to its relevance, and some figures are given for clarification purposes. Moreover, the inference of the correlation coefficient in the bivariate wind speed distribution is explained and applied to a particular case.

Appendix
The decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose is called the Cholesky decomposition. A Hermitian matrix is a square matrix with complex entries that is equal to its conjugate transpose.
Therefore, given Ω, Hermitian and positive-definite, the Cholesky decomposition consists of obtaining L, fulfilling Equation (32):

$$\Omega = L\,L^{*} \qquad (32)$$

where $L^{*}$ means the conjugate transpose of the matrix L. The Cholesky decomposition is mainly used for the numerical solution of linear equations, linear least squares problems, non-linear optimization or in Kalman filters. Here it is utilized in another application: the Monte Carlo [28][29][30] simulation.
Given X, a matrix of uncorrelated series of samples, and Ω, the desired correlation matrix for these series, Equation (33) is applied in order to obtain Y, a matrix of correlated series of samples according to Ω, where L is the result of the Cholesky decomposition of Ω.
Moreover, if X is a matrix of series of samples where each of these series follows a Normal distribution, the resulting matrix in Equation (33), Y, is also a matrix of series of samples that follows a Normal distribution, which can easily be demonstrated. Equation (33) has the shape shown in Equation (34).
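A Monte Carlo experiment of this kind can be sketched as follows. It is an illustrative sketch rather than the authors' procedure: it assumes real, symmetric correlation matrices, correlates each vector of independent Standard Normal draws by multiplying it by the Cholesky factor L, and then maps every component to a Weibull value through the Normal to Weibull change. All function names and the parameter values in the usage note are hypothetical.

```python
import math
import random


def cholesky(omega):
    """Lower triangular L with omega = L L^t (real, positive-definite omega)."""
    n = len(omega)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(omega[i][i] - s)
            else:
                L[i][j] = (omega[i][j] - s) / L[j][j]
    return L


def ntw(x, C, k):
    """Standard Normal value x mapped to a Weibull(C, k) value."""
    return C * (-math.log(0.5 * math.erfc(x / math.sqrt(2.0)))) ** (1.0 / k)


def sample_correlation(a, b):
    """Sample correlation coefficient of two equally long series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a)
                           * sum((q - mb) ** 2 for q in b))


def simulate(omega, weibull_params, n_samples, seed=0):
    """Return (normal_series, weibull_series), one list per variable."""
    rng = random.Random(seed)
    L = cholesky(omega)
    n = len(omega)
    normal_series = [[] for _ in range(n)]
    weibull_series = [[] for _ in range(n)]
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # uncorrelated draws
        y = [sum(L[i][m] * x[m] for m in range(i + 1)) for i in range(n)]
        for i, (C, k) in enumerate(weibull_params):
            normal_series[i].append(y[i])
            weibull_series[i].append(ntw(y[i], C, k))
    return normal_series, weibull_series
```

Calling `simulate([[1.0, 0.6], [0.6, 1.0]], [(8.0, 2.0), (10.0, 2.2)], 20000)` and applying `sample_correlation` to each pair of series allows the Normal and the Weibull correlation coefficients to be compared for a given choice of C and k, which is the kind of comparison the numerical approach relies on.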

Figure 1. Standardized Normal variable as a function of a Weibull one, with various values of the scale parameter C.

Figure 5. Relationship between Standard Normal and wind power correlation coefficients.

Table 1. Results of the case study.