Joint Probabilistic Modeling of Wind Speed and Wind Direction for Wind Energy Analysis: A Case Study in Humansdorp and Noupoort

: South Africa has great potential for considering wind energy as an alternative resource. The climatology allows for signiﬁcant wind energy production. An accurate joint description of the wind speed (linear) and wind direction (circular) characteristics is important for wind farm development. In this paper, a bivariate class of ﬂexible joint probability density functions of wind speed and wind direction for the use in wind energy analysis is presented. This joint model accounts for bimodality, skewness, and a dependency structure between the wind speed and wind direction. For the joint probabilistic description of the wind speed and wind direction, special cases of this bivariate class are evaluated, namely the semi-parametric Möbius model on the disc, the Möbius distribution on the disc, and the Beta type III Möbius distribution on the disc. These three special cases are applied to wind speed and wind direction data recorded every ten minutes at two locations in South Africa. Evaluation of the models is based on three different information criteria and normalized deviation. Overall, the semi-parametric model is superior to the parametric models based on the performance measures. The wind energy potential at the two locations is evaluated using the semi-parametric model.


Introduction
In South Africa, fossil fuels are the main form of energy supply. This results in emissions causing local and global environmental complications. Fossil fuels are not sustainable and, therefore, it is of importance to investigate renewable energy resources. Eskom (the electricity supply commission of SA) is a South African electricity public utility that generates approximately 95% of electricity used in South Africa. It is also well known that Eskom's current fleet of power stations is old and suffers from reliability problems, resulting in frequent load shedding (a controlled blackout to respond to unplanned events to protect the electricity power system from a total blackout), with a negative impact on the South African economy. Renewable resources also have inherent constraints; for example, solar power is limited to the availability of the sun rays and is costly. Considering the climatology of South Africa, wind energy stands out as a viable option. From an environmental impact perspective, wind energy is superior in comparison to other energy sources [1]. As reported by [1], even though wind energy is popular due to the cost efficiency, there are some unfavorable environmental impacts of wind energy, such as the noise produced by the windmills, its impact on the ecology (bird hits), etc. To overcome these negative impacts, the design of the wind farms is of utmost importance.

Site Location and Wind Data
As of 2019, in South Africa, there are 33 wind farms: 22 fully operational and 11 in construction (based on the South African Wind Energy Association https://sawea.org.za/). In order to assess the competence of the proposed models, two stations situated in South Africa, namely Humansdorp and Noupoort, have been selected. These locations already have a fully operational wind farm; however, they were chosen due to the availability of data. The wind speed and wind direction provided by the Wind Atlas for South Africa project (WASA) (http://wasadata.csir.co.za/wasa1/ WASAData (accessed 29 August 2019)) were recorded every 10 min at 20 and 60 m heights. For each location, the main characteristics were reported by the WASA (http://stel-apps.csir.co.za/wasa-data/ docs/WASA1Station_and_Site_Description_Report_April2014.pdf (accessed 29 August 2019)); these characteristics include a photographic documentation of mast design, installation, and surroundings, as well as elevation maps that have been constructed for each site from Shuttle Radar Topography Mission (SRTM) 3 arc-second data. These reported images provide an overall view of the location and terrain. Figure 1 shows the locations of the stations and Table 1 provides the geographical details for the locations.  In Table 2, information on the quantity and quality of the measured wind data for the two locations is given. The period of utilized data, the total number of observations, the number of expected data, and absent data are presented, respectively. The record contained no calm data, as the average of the wind speed over a ten-minute interval was recorded. In Table 3, the descriptive statistics for the wind speed for the two locations are given; specifically, the maximum (Max), mean, standard deviation (std), skewness, and kurtosis (skewness is a measure of the asymmetry, and kurtosis measures the peakedness of the distribution). In Table 4, the values of the main circular statistics [22] of the wind direction for the two locations are given. Specifically, the mean resultant length (ω), mean direction (θ ), circular variance (V Θ ), circular standard deviation (ν Θ ), circular dispersion (δ), circular skewness (s), and circular kurtosis ( k) are shown.  Table 3, Noupoort has the highest mean wind speed with the value of 7.91 m/s, while Humansdorp has the lowest mean wind speed with the value of 6.01 m/s. Since the skewness is positive for both the stations, we can conclude that the distributions are skewed to the right. The wind direction data for the Humansdorp location seems to be more concentrated based on theω, as seen in Table 4. Based on theθ, the Humansdorp location is dominated by winds blowing from the West-Northwesterly (for θ ∈ [1.77; 2.16]) sector while for Noupoort the dominant wind direction is North Westerly (for θ ∈ [1.57; 2.36]).

Models
Based on the assumption that the air density (air density is dependent on altitude, humidity, and temperature; we consider the air density to be independent of the wind characteristics and be equal to a constant value of 1.225 kg/m 3 ), ρ, is constant, the wind power probability density function is expressed as [4] The wind power density defined in (1) is used to evaluate the wind energy at the two locations. Consider the following joint PDF for the wind speed, X, and the wind direction, Θ, namely the General Möbius distribution on the disc (GM) [23], with a ∈ [0, 1), µ ∈ (−π, π] and x ∈ [0, 1], θ ∈ (−π, π], g(·) is a continuous function with support [0, 1], and where g (k) (·) denotes the derivatives of g(·) around zero. We denote this class as (X, θ) ∼ GM (Ψ, g) where the set of parameters is denoted by Ψ with Ψ = (a, µ). The parameters can be interpreted as a controlling the skewness, or the asymmetry for the length from the center of the disc, and µ controlling the orientation of the distribution.
Using the results of [23], we present the following three special models of (2), namely (i) a semi-parametric Möbius model on the disc (SPM), (ii) the Möbius distribution on the disc (MD), and (iii) the Beta type III Möbius distribution on the disc (BM), to jointly model the wind speed and wind direction for the wind data observed at Humansdorp and Noupoort. The parametric special cases are obtained by considering the form of the function g(·) to be as given in Table 5. By substituting the form of g(·) in (2), we obtain the joint PDF of the proposed models, as given in Table 6. Table 5. Form of g(·) considered for the parametric special cases.

Model g(w)
Parameter Conditions The parameters in the MD and BM models can be interpreted as γ controlling the steepness of the concentration and β contributing to the concentration by adding bimodality. As the value of β increases (β > 1), the model introduces another mode in the form of an antimode. The behavior of the parameters is discussed in [23]. Table 6. Proposed joint probability density functions (PDFs) used for the joint modeling of wind speed and wind direction.

Distribution Probability Density Function
Domain of

Fitting of the System of Wind Distributions
For the estimation of the parameters of the SPM model, the computational stages for estimating g(·) and the parameters are given by Algorithm 1. For the parameter estimation of the MD and BM models, maximum likelihood (ML) estimation is performed. Since closed-form expressions could not be obtained for the parameters, numerical optimization of the ML is performed using the "optimization" package available in the R software.
A general algorithm for estimating the parameters of the SPM model is given in [23]. In this paper, a specific case of the general algorithm was used and is described in Algorithm 1.
Take note that ,in Algorithm 1 Step 1, the kernel function is estimated using the argument x 2 . This was chosen as it is the argument before the Möbius transformation and, since the transformation applies to the variable, the function of g(·) can be estimated using the original argument.
In Algorithm 1, the starting values for a and µ are obtained using a randomized technique of simulating random numbers for a ∈ [0, 1) and µ ∈ [−π, π) and selecting the set of parameters,Ψ, that maximize the likelihood function.

Evaluation of Goodness-of-Fit
In this paper, four goodness-of-fit metrics were applied to evaluate the models. The Akaike information criterion (AIC) estimates the relative amount of information lost by a given model. The Bayesian information criterion (BIC) is widely used for model selection and is similar to the AIC. The Hannan-Quinn information criterion (HQC) is similar to the BIC and is strongly consistent. These three measures differ in terms of the amount of model complexity penalization. The two measures BIC and HQC penalize more severely than AIC [24]. The BIC attempts to find the true model among the set of candidates; however, this is not the case for the AIC [25]. The conclusion made using the AIC and BIC will only differ when the AIC chooses a larger model (more complex model) than the BIC. From a statistical viewpoint, information criteria are commonly used when assessing the accuracy Algorithm 1 Maximum likelihood (ML) estimation algorithm for semi-parametric Möbius model on the disc (SPM) model.
Step 1. Given the observations x 1 , x 2 , . . . , x n , we estimate g(·) in (2) using kernel density estimation as follows:ĝ where w i = x 2 i , h is the bandwidth, and K(·) is the kernel function. For this paper, we assume a Gaussian kernel with restricted support of x ∈ [0, 1] and use a plug-in bandwidth selection technique to obtain the optimal h value.
Step 2. For given a and µ, compute the value of the argument of g(·) in (2), i.e., compute the value Using the function estimated in Step 1,ĝ(·), calculate the corresponding function ofĝ(q i ).
Step 4. Repeat Steps 2 and 3 using an optimization technique to achieve convergence for the parameters a and µ. The final estimated values are those that maximize the likelihood function. of a probabilistic model [26]. These three strict criteria are considered most significant in measuring performance accuracy of distributions. These three metrics are defined in Table 7, where p is the number of estimated parameters in the model, n the total number of data points, andL the maximum value of the likelihood function for a specific model. Table 7. Evaluated goodness-of-fit metrics.

AIC
2p − 2 ln(L) BIC p ln(n) − 2 ln(L) HQC 2p ln[ln(n)] − 2 ln(L) Remark 1. It is appropriate to compare parametric with semi-parametric models via AIC or BIC. We can motivate this two ways. The process in this paper of implementing kernel density estimation on the unknown function g(·) and then maximum likelihood estimation on the model is similar to that in [27], where they used the AIC and BIC for model selection in a semi-parametric context. Secondly, from the profile likelihood viewpoint [28], by substituting the estimated g() into the density function (Equation (2)), we obtain a parametric form for which we can then apply maximum likelihood estimation as in [29][30][31].
The fourth goodness-of-fit measure is the normalized deviation, which is used for circular-linear models [4]. The normalized deviation is based on the difference between the observed, N ij , and expected, np (e) ij , number of data points. The expression for the normalized deviation, d ij , is given by where σ (e) ij is the normalizing factor and the expected standard deviation. Normalized deviation values close to zero indicate a model of good fit.

Remark 2.
On the use of AIC, BIC, and HQC in order to compare parametric and semi-parametric models, one must make some regularity assumptions. As such, we will be assuming the estimation of the nonparametric component g(·) using the kernel density estimation is a finite-dimensional problem in our case. Once we estimate the nonparametric component, we apply the plugin approach into the likelihood (as outlined in Algorithm 1) and then maximize the log likelihood to obtain the estimate of parameters; indeed, this a maximization over a restricted parameter space. There is no guarantee that using this method will provide good estimates for the parameters. Therefore, in order to make sure the estimates fulfill satisfactory conditions, we need to use other measures in addition to the aforementioned information criteria. As such, we computed the normalized deviation for a valid comparison.

Influence of Wind Speed and Wind Direction Data
The bivariate relationship between the wind speed and wind direction is measured by the linear-circular correlation coefficient, r x,θ , defined in [22] as where r xc = cor(x, cos θ), r xs = cor(x, sin θ), and r cs = cor(cos θ, sin θ). The correlation coefficients for the two locations are given in Table 8, emphasizing the need for a joint model with an embedded correlation structure. Subsequently, in Figures 2 and 3, the wind speed and wind direction for the two locations are given to assist the practitioner in choosing an appropriate model. Figures 2a-d and 3a-d illustrate the wind speed and wind direction histograms, respectively. From the histograms, the bimodality in the wind direction and the skewness in the wind speed can be seen. Figures 2e-h and 3e-h illustrate the wind speed and wind direction boxplots, respectively.    Figures 2g,h and 3g,h were constructed using the methodology proposed by [32]. The circular boxplots display the skewness present in the wind direction. The windrose diagrams in Figures 2i and 3j illustrate the intensity of the wind speed in the specific wind directions.

Results and Discussion
The results from estimating the joint wind speed and wind direction distributions are presented and discussed in this section. In Table 9, the parameter estimates and the performance measures for the proposed models are given. The following inferences can be drawn from Table 9: 1.
For the Humansdorp location at 20 m, the SPM model performs the best overall. The BM model performs the best from the parametric models.

2.
For the Humansdorp location at 60 m, the SPM model performs the best overall. The MD model performs the best from the parametric models.

3.
For the Noupoort location at 20 m, the SPM model performs the best overall. The BM model performs the best from the parametric models. 4.
For the Noupoort location at 60 m, the SPM model performs the best overall. The BM model performs the best from the parametric models. 5.
The SPM model outperforms overall based on the performance measures. This model has the advantage of limited distributional assumptions. However, it is worth noting that the selection of the bandwidth plays an important role in statistical analysis. 6.
The BM model performs the best from the parametric models based on the performance measures. This is a result of the model's flexibility in capturing bimodality and skewness present in the data. The presence of bimodality and skewness can be seen in the data plots (see Figure 2 and Figure 3).
For comparison purposes, it is important to note that, since our model is fitted using the maximum-likelihood method (for the parameters), we can use the AIC and/or BIC, which are based on the likelihood method. The squared error loss can be used to refine the aforementioned criteria measures as discussed in [33].
In Table 10, a summary of the normalized deviations for the parametric models for each location is provided. Figure 4 illustrates the normalized deviation plots for the parametric models for each location. Using this metric from Table 10, we can conclude that the MD model has a better fit for the Humansdorp location and the BM model fits better for the Noupoort location, as the median of the normalized deviation is closer to zero.     For the wind power density, (1), we substitute the proposed parametric base models f X,Θ (x, θ) (joint model of wind speed and wind direction) to evaluate the wind energy. Based on the parametric models of best fit in Table 9, the wind power density is plotted in Figure 5 for the estimated parameter values given in Table 9.  The intensity plots for the estimated wind power density (1) for when the base model is considered as the SPM model (model of best fit) are given in Figure 6 for various values of θ and x.   Figure 6 illustrates that the estimated wind power density is maximal when the wind is blowing in a North Westerly direction (for θ between 1.57 to 2.36, given that −π is the North direction) and when the wind speed is low to moderate. It can also be noted that at the Humansdorp location a height of 20 m displays a stronger wind power density compared to the height of 60 m, and at the Noupoort location, a height of 60 m displays a stronger wind power density compared to the height of 20 m.

Conclusions
In this paper, we propose a class of joint probability models for the wind speed and wind direction data at Humansdorp and Noupoort. This class accounts for an embedded correlation structure between the wind speed and wind direction and additionally captures the skewness and bimodality present in the data. Based on the circular-linear correlation coefficient, the need to account for dependency between the wind speed and wind direction is emphasized. The wind energies at the two locations were evaluated using the SPM model (best performing model) to determine the wind energy potential at the different heights for the two locations. One of the reviewers pointed out that a comparisons between the models proposed in this paper and existing joint wind speed and wind direction models are inappropriate due to the differences in the definitions of the manifolds. Considering the situation surrounding the supply of electricity in South Africa, the need for investigating and investing in alternative renewable energy sources is of utmost importance to the socioeconomic development of the country.