1. Introduction
The global climate emergency has given rise to great interest in the energy transition, which should contribute to mitigating CO2 emissions caused by energy generation using conventional sources. Renewable energies play a significant role in this transition. It is necessary to estimate accurately the energy obtained from these sources to achieve this transition.
Global electricity demand decreased in 2020 but grew by 6% in 2021, one of the most significant annual growth rates since the 2010 monetary crisis [1]. Electricity consumption is expected to average 2.7% annual growth during the 2022–2024 period [1]. Considering the potential future demand, the high costs of conventional energy resources, and their environmental impact, interest in alternative energy sources has increased, further raising the need to evaluate the technical characteristics of these sources [2]. In the case of wind energy, estimations are made using a methodology based on medium-term average values (of the order of minutes) [3]. This methodology specifies a sampling frequency of 1 or 2 Hz to obtain mean velocities every 10 min. However, because power is directly proportional to the cube of the wind speed, instantaneous speeds above the ten-minute mean contribute less to the power calculated from the mean value than they would if instantaneous power were calculated. Thus, the power obtained from the mean wind speed is lower than the cumulative power obtained from the instantaneous wind speed values. In other words, the methodology smooths out wind variations of the order of seconds (short-term variability) and, consequently, part of the wind power generation, which results in an underestimation of the wind resource [4].
Renewable energy has several advantages over fossil fuels, such as the availability of renewable resources to implement distributed generation systems, the accessibility and modularity of its technologies, and the potential for each user to generate their own energy [5]. Wind energy is one of the most widely used renewable sources, and by 2021, it contributed an installed capacity of 840 GW to the global electricity system [6]. Small wind energy (defined as wind energy that uses wind turbines with swept areas of less than 200 m² [3]), or low-power wind energy, is a novel contribution to the electricity system, and the use of this technology is expected to increase, complementing photovoltaic systems in distributed generation. As of the end of 2019, mini-wind power had an installed capacity of 1.72 GW worldwide [7]. Small wind installations have particular features, such as their capacity to provide energy in a distributed manner, feasible operation with moderate winds, small site requirements, and suitable integration in urban, semi-urban, industrial, and agricultural environments, and they are often used to generate energy close to the points of consumption [8].
Implementing wind energy requires accurate resource assessment [9,10], which can be achieved through a probability density function describing wind behavior. Typically, the Weibull probability density function is used for this task; however, other functions may better represent different wind regimes. According to Chang [11] and Cheng [12], the wind speed distribution for a particular location determines the available wind energy and the performance of an energy conversion system. Therefore, determining the function that best represents the wind regime at a location will contribute to a better estimation. Several studies have used different probability density functions, such as Weibull, Gamma, Rayleigh, Beta, log-normal, and their combinations [11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29]. According to Wais [13,14], the two-parameter Weibull distribution is commonly used to model wind speed distribution in the wind industry, but it may not always be sufficient for evaluating available wind energy. Wais suggests that the three-parameter Weibull distribution is advantageous because it fits certain wind patterns better than the typical Weibull distribution. The literature also states that combined functions tend to have better statistical behavior than methods using a single function [19]. These combined distributions can be more efficient than single-function distributions for some wind regimes; their main drawbacks are their complexity and the computational time required to estimate the associated parameters [24]. In this regard, it is crucial to select the probability density function(s) appropriate to the data measured in the study area to obtain reliable energy estimations [9,11]; as stated by Cheng [12], the analysis must also consider high-frequency wind data samples, i.e., short-term wind variability must be taken into account.
Probability density functions are characterized by their statistical parameters of shape and scale, which can be obtained by different numerical methods; among the most used are the Maximum Likelihood Method (ML), Method of Moments (MM), Justus Empirical Method (EJ), Lysen Empirical Method (EL), Energy Pattern Method (EP), Graphical Method (GP), Standard Deviation Method (SD) and Modified Maximum Likelihood Method (MLM) [30]. Researchers use these methods to compare how well statistical models fit measured data [30,31,32,33,34,35,36,37,38,39]. Using monthly data, Tizgui et al. [32] found that EJ and MLM achieve a better estimation for the Weibull PDF, while Bilir et al. [19] consider EJ to be more accurate for annual estimation with hourly data and the Weibull PDF.
To evaluate the wind resource, the International Standard IEC 61400-12-1 [3] establishes that wind potential estimates must be obtained from ten-minute averages of wind velocity recorded with a specific sampling frequency (e.g., 1 Hz). The averaging time, or stationarity period, defined as the time interval in which the wind velocity can be considered statistically constant, plays an important role in estimating wind potential. As the stationarity period increases, the wind variability recorded in the time series decreases, which lowers the energy estimate. Rodriguez-Hernandez et al. [8] estimate an energy difference of 17% between stationarity periods of 1 and 10 min, while Arredondo et al. [4] find that the available energy density using 600 s (10 min) as the stationarity period results in underestimations of 1%, 8%, 10%, 13.7%, 19.4%, and 22.7% relative to stationarity periods of 300, 60, 30, 5, 1, and 0.1 s, respectively.
Consequently, improving the methodology for estimating wind energy by considering short-term wind variability, the probability density function, and the appropriate parameter estimation method would help to increase the certainty of the energy that can be generated, thus contributing to the penetration of wind energy in the residential and commercial sectors. Therefore, this study aimed to determine the methodology that best estimates the energy obtained from a small wind turbine. This paper is structured as follows: Section 2 describes data collection and analysis, the determination of the probability density functions, the calculations for energy estimation and the statistical tests to validate such estimates; Section 3 presents the results obtained; and Section 4 presents the study’s conclusions.
2. Methodology
The data used in this study were measured during 2017, 2018, and 2019 using a Campbell Scientific CSAT3 sonic anemometer, which records the orthogonal components $u_x$, $u_y$, and $u_z$ of wind speed. The wind speed magnitude was obtained using the equation

$$v = \sqrt{u_x^{2} + u_y^{2}} \qquad (1)$$

where $u_x$ and $u_y$ are the easterly and northerly velocity components, respectively, and $v$ is the velocity magnitude in the horizontal plane.
The anemometer can record at sampling frequencies from 1 to 100 Hz, with measurement errors of ±8.0 cm/s and ±4.0 cm/s in the horizontal and vertical components, respectively [40]. The instrument was placed 20 m above ground level at the UABC Engineering Institute in Mexicali B.C., which is geographically located at coordinates N and W. Mexicali’s climate is classified as dry desert (BW), with extreme summer (average maximum temperature between 41 °C and 43 °C) and winter (average maximum temperature between 11 °C and 13 °C) temperatures. The average annual temperature is between 21 °C and 23 °C [41]. In 2022, the population of Mexicali was 1,049,792 residents [42], and its electricity consumption was 4736.29 GWh until 2016 [43]. A sampling frequency of 10 Hz was used in this study, yielding 864,000 data points per day, equivalent to more than $3 \times 10^{8}$ data points per year. From the original time series, series of average values were obtained every 5 s, 30 s, 60 s and 600 s, which is equivalent to recording the wind with those averaging times.
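As a minimal sketch of how such averaged series could be derived, the Python snippet below block-averages a 10 Hz record into the stationarity periods used here and compares the mean cubed speed of each averaged series with that of the raw record. The function names and the synthetic Gaussian series are illustrative assumptions, not the authors' processing code or measured data.

```python
import random


def block_average(samples, block_size):
    """Average consecutive, non-overlapping blocks of `block_size` samples."""
    n_blocks = len(samples) // block_size  # a trailing partial block is dropped
    return [
        sum(samples[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]


def mean_cubed_speed(series):
    """Mean of v**3, which is proportional to the mean wind power density."""
    return sum(v ** 3 for v in series) / len(series)


SAMPLING_HZ = 10  # sampling frequency reported in the study
random.seed(0)
# Synthetic one-hour record at 10 Hz (illustrative values, not measured data).
raw = [max(0.0, random.gauss(5.0, 1.5)) for _ in range(SAMPLING_HZ * 3600)]

for period_s in (5, 30, 60, 600):
    averaged = block_average(raw, period_s * SAMPLING_HZ)
    ratio = mean_cubed_speed(averaged) / mean_cubed_speed(raw)
    print(f"{period_s:>4} s: {len(averaged):>5} values, cubed-speed ratio = {ratio:.3f}")
```

Because the cube is a convex function for non-negative speeds, the ratio printed for each averaged series cannot exceed 1 when the blocks cover the whole record, which is the mechanism behind the underestimation discussed in the Introduction.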
2.1. Probability Density Functions and Methods for Statistical Parameter Estimation
Four different probability density functions (Weibull, Gamma, Rayleigh, and a combination of the three) were used to describe the data’s statistical behavior. The PDFs take on a wide variety of shapes depending on the value of the shape parameter $k$ and the level of stretch or squeeze indicated by the scale parameter $c$. The $k$ and $c$ values for a dataset are unique, but there are diverse methods to determine them depending on the PDF used, as explained in the following subsections.
2.1.1. Weibull Probability Density Function
The Weibull probability density function was used to characterize the wind resource because it reliably describes wind behavior in different regions [27]:

$$f(v) = \frac{k}{c}\left(\frac{v}{c}\right)^{k-1} \exp\left[-\left(\frac{v}{c}\right)^{k}\right] \qquad (2)$$

The two parameters, shape $k$ and scale $c$, were determined by the empirical methods of Justus, Lysen, and the energy pattern.
Empirical Justus Method (EJ)
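In this method, parameters $k$ and $c$ are calculated using the following expressions

$$k = \left(\frac{\sigma}{\bar{v}}\right)^{-1.086} \qquad (3)$$

$$c = \frac{\bar{v}}{\Gamma\left(1 + 1/k\right)} \qquad (4)$$

where $\bar{v}$ is the mean wind speed and $\sigma$ is the standard deviation, while $\Gamma$ is the Gamma function [44].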
Empirical Lysen Method (EL)
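In this method, parameter $k$ is calculated by means of Equation (3), while parameter $c$ is calculated from the mean wind speed $\bar{v}$ using the following expression [45]

$$c = \bar{v}\left(0.568 + \frac{0.433}{k}\right)^{-1/k}$$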
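The two empirical methods can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names are hypothetical, the population standard deviation is used (the text does not say which estimator was applied), and the coefficients are those of the widely published Justus and Lysen formulas.

```python
import math
import statistics


def weibull_justus(speeds):
    """Empirical Justus method: k from sigma/mean, c from the Gamma function."""
    mean_v = statistics.fmean(speeds)
    sigma = statistics.pstdev(speeds)  # assumption: population standard deviation
    k = (sigma / mean_v) ** -1.086
    c = mean_v / math.gamma(1.0 + 1.0 / k)
    return k, c


def weibull_lysen(speeds):
    """Empirical Lysen method: same k as Justus, closed-form expression for c."""
    mean_v = statistics.fmean(speeds)
    sigma = statistics.pstdev(speeds)
    k = (sigma / mean_v) ** -1.086
    c = mean_v * (0.568 + 0.433 / k) ** (-1.0 / k)
    return k, c


sample = [3.0, 4.0, 5.0, 6.0, 7.0]  # illustrative speeds in m/s, not measured data
print("Justus:", weibull_justus(sample))
print("Lysen: ", weibull_lysen(sample))
```

Both functions return the same shape parameter; only the scale parameter differs, and for typical values of $k$ the two scale estimates are close because the Lysen expression approximates the Gamma-function term.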
Energy Pattern Factor Method (EP)
In this method, it is necessary to determine the energy pattern factor $E_{pf}$, on which the shape factor depends; the scale factor is then obtained from the shape factor and the mean wind speed. Factor $E_{pf}$ is the ratio between the total power available in the wind and the power corresponding to the cube of the average wind speed [46,47]:

$$E_{pf} = \frac{\overline{v^{3}}}{\bar{v}^{3}}, \qquad k = 1 + \frac{3.69}{E_{pf}^{2}}$$
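A short Python sketch of the energy pattern factor method follows. The function name is hypothetical, the shape-parameter relation is the commonly published one, and the use of the Gamma-function relation for the scale parameter is an assumption, since the text does not make explicit which scale equation was applied.

```python
import math
import statistics


def weibull_energy_pattern(speeds):
    """Energy pattern factor method for the Weibull shape and scale parameters."""
    mean_v = statistics.fmean(speeds)
    mean_v3 = statistics.fmean([v ** 3 for v in speeds])
    epf = mean_v3 / mean_v ** 3           # mean of cubes over cube of the mean
    k = 1.0 + 3.69 / epf ** 2             # commonly published EPF relation
    c = mean_v / math.gamma(1.0 + 1.0 / k)  # assumption: Gamma-function relation
    return epf, k, c


sample = [3.0, 4.0, 5.0, 6.0, 7.0]  # illustrative speeds in m/s, not measured data
epf, k, c = weibull_energy_pattern(sample)
print(f"Epf = {epf:.3f}, k = {k:.3f}, c = {c:.3f}")
```

A perfectly steady wind gives $E_{pf} = 1$; the factor grows as the speeds become more variable, which lowers the estimated shape parameter.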
2.1.2. Rayleigh Probability Density Function
The Rayleigh probability density function is a special form of the Weibull distribution, in which the shape parameter always equals 2, and only the dispersion parameter $\sigma$ (standard deviation) is used [23]:

$$f(v) = \frac{v}{\sigma^{2}} \exp\left(-\frac{v^{2}}{2\sigma^{2}}\right)$$
2.1.3. Gamma Probability Density Function
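The shape and scale parameters of the Gamma PDF, $k$ and $c$, are obtained using the methods of moments and maximum likelihood [48]:

$$f(v) = \frac{v^{k-1}}{c^{k}\,\Gamma(k)} \exp\left(-\frac{v}{c}\right)$$

Method of Moments (MM)
In the Method of Moments, the parameters are obtained from the mean wind speed $\bar{v}$ and the standard deviation $\sigma$ as follows [48]:

$$k = \frac{\bar{v}^{2}}{\sigma^{2}}, \qquad c = \frac{\sigma^{2}}{\bar{v}}$$

Maximum Likelihood Method (ML)
In this method, the parameters are obtained as follows:

$$k = \frac{1 + \sqrt{1 + 4D/3}}{4D}, \qquad c = \frac{\bar{v}}{k}$$

where $D$ is given by [48]

$$D = \ln \bar{v} - \frac{1}{n}\sum_{i=1}^{n} \ln v_i$$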
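The following Python sketch illustrates both estimators for the Gamma parameters. It assumes that the maximum likelihood estimate is the closed-form approximation in terms of $D$ usually attributed to Thom; the function names are hypothetical, and calm (zero) readings are assumed to have been removed because the logarithm is undefined for them.

```python
import math
import statistics


def gamma_moments(speeds):
    """Method of moments: match the sample mean and variance."""
    mean_v = statistics.fmean(speeds)
    var_v = statistics.pvariance(speeds)  # assumption: population variance
    shape = mean_v ** 2 / var_v
    scale = var_v / mean_v
    return shape, scale


def gamma_ml_approx(speeds):
    """Approximate maximum likelihood estimate based on D (Thom-type formula)."""
    mean_v = statistics.fmean(speeds)
    mean_log = statistics.fmean([math.log(v) for v in speeds])  # requires v > 0
    d = math.log(mean_v) - mean_log  # D > 0 unless all speeds are identical
    shape = (1.0 + math.sqrt(1.0 + 4.0 * d / 3.0)) / (4.0 * d)
    scale = mean_v / shape
    return shape, scale


sample = [3.0, 4.0, 5.0, 6.0, 7.0]  # illustrative speeds in m/s, not measured data
print("MM:", gamma_moments(sample))
print("ML:", gamma_ml_approx(sample))
```

Both estimators reproduce the sample mean as the product of shape and scale; they differ in how the spread of the data enters the shape parameter.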
2.1.4. PDF Mix
The PDF Mix was conceived as a function that describes the statistical behavior of the wind accurately in each speed interval, an objective that cannot be achieved with a single typical density. To this end, from the Weibull, Gamma, and Rayleigh distributions obtained before, the PDF with the best performance is selected in each speed interval. The resulting distribution is a piecewise function, continuous within each interval.
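One possible reading of this selection rule is sketched below in Python. The criterion for "best performance" is not specified in the text, so the smallest absolute difference between the expected and observed relative frequency in each class is used here as an assumption; the function name, the labels, and the numbers are illustrative only.

```python
def select_mix(observed, candidates):
    """Pick, for each speed class, the candidate closest to the observed frequency.

    observed   -- list of observed relative frequencies, one per class
    candidates -- dict mapping a PDF label to its list of expected frequencies
    Returns a list of (label, expected_frequency) tuples, one per class.
    """
    selection = []
    for i, y in enumerate(observed):
        # Assumed criterion: smallest absolute error in this class.
        best = min(candidates, key=lambda name: abs(candidates[name][i] - y))
        selection.append((best, candidates[best][i]))
    return selection


observed = [0.20, 0.50, 0.30]  # illustrative histogram, not measured data
candidates = {
    "Weibull": [0.25, 0.45, 0.20],
    "Gamma": [0.21, 0.60, 0.31],
    "Rayleigh": [0.10, 0.50, 0.40],
}
mix = select_mix(observed, candidates)
print(mix)
```

Note that frequencies selected class by class need not sum exactly to one; whether and how the resulting function is renormalized is not stated in the text.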
2.2. Energy Estimation with PDF
The annual energy estimate for each year (2017, 2018 and 2019) and each time series (0.1, 5, 30, 60 and 600 s) was calculated using the PDFs and the power curve of a wind turbine. First, the probability $p_i$ that the wind speed fell in the i-th interval [$a$, $b$] was calculated using the equation

$$p_i = \int_{a}^{b} f(v)\,dv = F(b) - F(a)$$

where $f(v)$ is the PDF used, while $a$ and $b$ are the lower and upper bounds, respectively, of the i-th class of the velocity frequency histogram, $v_i$ is the mean value of each class, and the function $F$ is the cumulative probability function given by equation

$$F(v) = \int_{0}^{v} f(x)\,dx$$

It is necessary to point out that $a$ and $b$ belong to the set of natural numbers such that $b - a$ is the histogram class size.
The interval probability found represents the percentage of the time, in the complete series, in which wind speed occurred. This percentage is converted to hours ($h_i$) by multiplying it by the total number of hours in the time series.
As a second step, the power curve of the wind turbine was used to determine the power that, according to the manufacturer, the wind turbine delivers when the wind is flowing at speed $v_i$. Then, the two previous values were multiplied (duration times power) to obtain the energy delivered by the small wind turbine with the wind flowing at wind speed $v_i$. This procedure was repeated for each of the $N$ intervals or classes. Finally, the estimation of the total energy generated per year $E_{est}$ was obtained by summing the estimated energies in each speed interval as indicated by the equation

$$E_{est} = \sum_{i=1}^{N} h_i\,P(v_i)$$

where $h_i$ is the number of hours associated with the i-th interval and $P$ is the power corresponding to the mean interval velocity.
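The procedure can be sketched in Python as follows. This is an illustration, not the authors' implementation: the Weibull parameters are arbitrary, and the power curve is a hypothetical cubic ramp between an assumed cut-in speed of 1 m/s and an assumed rated speed of 12 m/s, standing in for the manufacturer's datasheet curve.

```python
import math


def weibull_cdf(v, k, c):
    """Cumulative Weibull probability F(v) for shape k and scale c."""
    return 1.0 - math.exp(-((v / c) ** k))


def estimate_energy_wh(cdf, power_curve_w, bin_edges, total_hours):
    """Sum hours-in-class times power at the class midpoint over all classes."""
    energy_wh = 0.0
    for a, b in zip(bin_edges[:-1], bin_edges[1:]):
        probability = cdf(b) - cdf(a)       # probability of the class [a, b]
        hours = probability * total_hours   # time spent in the class
        v_mid = 0.5 * (a + b)               # mean value of the class
        energy_wh += hours * power_curve_w(v_mid)
    return energy_wh


def toy_power_curve_w(v):
    """Hypothetical 200 W power curve (assumed shape, not the datasheet)."""
    if v < 1.0:
        return 0.0
    if v >= 12.0:
        return 200.0
    return 200.0 * (v / 12.0) ** 3


edges = list(range(0, 51))  # 1 m/s classes from 0 to 50 m/s
annual_wh = estimate_energy_wh(
    lambda v: weibull_cdf(v, 2.0, 6.0),  # illustrative parameters
    toy_power_curve_w,
    edges,
    8760.0,  # hours in a non-leap year
)
print(f"Estimated annual energy: {annual_wh / 1000.0:.1f} kWh")
```

Passing the cumulative function as an argument keeps the routine independent of the PDF, so the same loop could be reused for the Gamma, Rayleigh, or Mix distributions.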
Comparison between Estimated and Generated Energy
A hypothetical 200 W small wind turbine was used to evaluate the performance of the different statistical models in estimating energy. To compute the power from the wind speed measurements obtained with the ultrasonic anemometer, we used the power curve provided by the manufacturer in the datasheet. The wind turbine can operate at low wind speeds, starting from 1 m/s up to a survival speed of 50 m/s. Typically, a power curve is only strictly valid for a subset of all atmospheric conditions, known as the inner range [49]. The outer range is the complementary subset of atmospheric conditions where wind turbines also operate. The associated uncertainty, which arises from wind spatial variability, was not analyzed in this study.
The generated energy $E_{gen}$ was obtained considering the instantaneous power corresponding to each of the $n$ values of wind speed in each series using the following equation

$$E_{gen} = \sum_{j=1}^{n} P(v_j)\,\Delta t$$

where $\Delta t$ is the time corresponding to the stationarity period, and $P$ is the instantaneous power delivered by the wind turbine when the wind has the speed $v_j$.
Energy comparisons were performed for each measured year and each stationarity period to determine which PDF resulted in a more accurate estimation compared to the energy produced by a wind turbine. The estimation error percentage was obtained from the estimated and generated energies using the expression [32]

$$\mathrm{Error}\,(\%) = \frac{E_{est} - E_{gen}}{E_{gen}} \times 100$$
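A minimal Python sketch of the reference calculation and of the error percentage is given below. The function names are hypothetical, and the sign convention (negative values meaning underestimation) is an assumption, since the text does not state whether the error is signed or absolute.

```python
def generated_energy_wh(speeds, power_curve_w, period_s):
    """Sum the power at each recorded speed times the stationarity period."""
    dt_hours = period_s / 3600.0  # convert the stationarity period to hours
    return sum(power_curve_w(v) for v in speeds) * dt_hours


def estimation_error_percent(estimated_wh, generated_wh):
    """Relative error of the PDF-based estimate (assumed signed convention)."""
    return 100.0 * (estimated_wh - generated_wh) / generated_wh


# Six ten-minute values with a constant 100 W output amount to 100 Wh in one hour.
reference_wh = generated_energy_wh([5.0] * 6, lambda v: 100.0, 600)
print(reference_wh)
print(estimation_error_percent(90.0, 100.0))
```

With this convention, an estimate of 90 Wh against 100 Wh of generated energy corresponds to an error of -10%, i.e., an underestimation.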
2.3. Statistical Tests
The performance of the probabilistic models obtained was evaluated using the statistical tests described below, where $y_i$ is the relative frequency of the observed velocity values, $\bar{y}$ is the mean relative frequency, $x_i$ is the expected frequency calculated with the theoretical distributions, and $N$ is the number of classes.
2.3.1. Coefficient of Determination ($R^2$)
The coefficient of determination is a measure of the relationship between a predicted probability density function and measured data. Mathematically, it is obtained as follows [50]:

$$R^{2} = \frac{\sum_{i=1}^{N}(y_i - \bar{y})^{2} - \sum_{i=1}^{N}(y_i - x_i)^{2}}{\sum_{i=1}^{N}(y_i - \bar{y})^{2}}$$

Its maximum value is 1, so the closer it is to 1, the better the fit.
2.3.2. Chi-Square ($\chi^2$)
The chi-square test is a simple and common goodness-of-fit test. It essentially compares a data histogram with the probability density function. The closer the result is to 0, the more accurate the fit is considered [32]:

$$\chi^{2} = \sum_{i=1}^{N} \frac{(y_i - x_i)^{2}}{x_i}$$

2.3.3. Nash–Sutcliffe Efficiency Coefficient (NSEC)
The efficiency coefficient is another way to determine the accuracy of a prediction model; it is computed between the values of the probability density function and the relative frequency of measured values. As with $R^2$, the closer to 1, the more accurate the model is considered [51]:

$$\mathrm{NSEC} = 1 - \frac{\sum_{i=1}^{N}(y_i - x_i)^{2}}{\sum_{i=1}^{N}(y_i - \bar{y})^{2}}$$

2.3.4. Root Mean Square Error (RMSE)
The root mean square error estimates the accuracy of the method by comparing the estimated values with the actual values. The closer the value is to zero, the more accurate the method [36]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - x_i)^{2}}$$

2.3.5. Mean Square Error (MSE)
The mean square error is the mean of the squared differences between the estimated values and the true values. As with the RMSE, the closer to zero the value, the more accurate the result [36]:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_i - x_i)^{2}$$

2.3.6. Mean Absolute Error (MAE)
The mean absolute error is an absolute measure of the difference between two variables. It is the average of the absolute errors between the frequency of each PDF and the relative frequency of the measured data. The closer to zero, the better the result [36]:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - x_i\right|$$

2.3.7. Mean Absolute Percentage Error (MAPE)
The mean absolute percentage error is a relative measure that indicates the percentage error between the PDF and the relative frequency of the measured data. As with MAE, the lower the MAPE value, the more accurate the result [52]:

$$\mathrm{MAPE} = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{y_i - x_i}{y_i}\right|$$
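The goodness-of-fit measures of this section can be collected in a single Python function, sketched below using the textbook form of each statistic. The paper's exact definitions may differ; in particular, the coefficient of determination is computed here in its residual-sum-of-squares form, which coincides numerically with the Nash–Sutcliffe expression. The function name and the sample frequencies are illustrative.

```python
import math


def fit_metrics(observed, expected):
    """Goodness-of-fit statistics between observed and expected frequencies.

    Both lists must have the same length and strictly positive entries,
    because chi-square divides by the expected and MAPE by the observed values.
    """
    n = len(observed)
    y_mean = sum(observed) / n
    ss_res = sum((y - x) ** 2 for y, x in zip(observed, expected))
    ss_tot = sum((y - y_mean) ** 2 for y in observed)
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,    # residual-sum-of-squares form
        "chi2": sum((y - x) ** 2 / x for y, x in zip(observed, expected)),
        "NSEC": 1.0 - ss_res / ss_tot,  # same expression as R2 in this form
        "RMSE": math.sqrt(mse),
        "MSE": mse,
        "MAE": sum(abs(y - x) for y, x in zip(observed, expected)) / n,
        "MAPE": 100.0 / n * sum(abs((y - x) / y) for y, x in zip(observed, expected)),
    }


observed = [0.20, 0.50, 0.30]  # illustrative histogram, not measured data
expected = [0.25, 0.45, 0.30]
for name, value in fit_metrics(observed, expected).items():
    print(f"{name}: {value:.4f}")
```

A perfect fit gives 1 for the two efficiency-type coefficients and 0 for all the error measures, which matches the interpretation given for each test above.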
4. Conclusions
This study used four probability density functions to estimate the energy that a small wind turbine installed for domestic use in a desert city in Northwest Mexico can generate. When the energy calculated from the wind turbine power curve was used as a reference, the results indicated that the accuracy of the energy estimates decreases as the stationarity period increases because the averaging process neglects short-term wind variability.
On the other hand, using different numerical methods to calculate the shape and scale parameters leads to different shapes of the probability density functions, resulting in differences in estimated energy. In general, these differences are lower than those obtained when using a PDF in different stationarity periods. This means that the short-term temporal variability of the wind represents a higher uncertainty than that associated with the statistical models used in the energy estimate, except for the RSD function. However, the combined effect of both aspects causes the highest uncertainty.
Statistical modeling of the wind data showed that the globally most used distribution to describe the behavior of the wind, PDF WEJ, is not the best in the study area. Instead, the W, GMM, and Mix PDFs have, in general, lower errors, which is why they are considered better options for energy estimation in this region. The comparison between the estimated energy and the energy calculated from the wind turbine power curve confirms the above. Moreover, based on the analysis of the seven statistical models, we can infer that an inaccurate depiction of the statistical behavior of the data at high velocities leads to a severe underestimation of the energy, as is the case with the PDF RSD.
The above conclusion highlights the importance of selecting the probability density function and the numerical method a priori to determine the shape and scale parameters, to be used in the feasibility analysis of a small wind energy project.
In this regard, the use of the Weibull probability distribution as a probabilistic model and ten-minute data to estimate energy generation, established by the International Standard IEC 61400-12-1 [3], leads to unreliable evaluations as a result of the underestimation of the resource [54]. This results in a lower penetration of small wind energy in locations such as Mexicali, where electricity consumption is above the national average due to its intense hot season. Therefore, increasing the reliability of energy estimates using small wind turbines will increase the viability of small wind energy projects due to greater certainty, promoting greater penetration of this renewable source, particularly in the residential and commercial sectors.