In this work, two statistical methods are used to represent the uncertainty of renewable energy resources. They include the following:
2.1. Introducing Uncertainty Using CDF Split Method
The uncertainty of resource availability is often modeled by segmenting the standard normal cumulative distribution function (CDF), with a mean of zero and a standard deviation of one, into five regions of equal probability. Each region is represented by its central point, which serves as the representative sample [30,31]. To incorporate this into simulations, the climate data are adjusted to create five distinct datasets, corresponding to these representative points on the CDF. This adjustment is achieved by scaling all variables in the climate data by a constant factor derived from the respective CDF point. However, this approach to accounting for uncertainty in solar irradiation is relatively simplistic. It can lead to unrealistic scenarios, such as excessively high solar irradiance values, which may not accurately reflect real-world conditions [30]. As a result, this method fails to accurately capture the year-to-year variability in climate data. Therefore, it is essential to determine the true distribution of resource data using hourly climate data from the past 30 years. Preliminary research suggests that this approach better reflects resource uncertainty. For instance, over the last 30 years, solar irradiation in a given hour has ranged from 512 W/m² to 926 W/m². The proposed statistical model, referred to as the real-CDF approach, provides a range of values from 629 W/m² to 904 W/m², which more closely represents the actual variation in the data. In contrast, if a standard normal distribution (normal-CDF) is used, the model would show values between 761 W/m² and 864 W/m², reflecting a narrower range with less fluctuation. This narrower range does not accurately represent the true variation in climate data.
Data on resource availability, such as solar irradiation, ambient temperature, and wind speed at 12 p.m. (midday), are extracted from the past 30 years. A cumulative distribution function (CDF) is then plotted for each filtered dataset. These CDF plots represent daily resource uncertainty over the past 30 years, resulting in 365 individual CDF graphs. The average of these 365 CDFs is calculated to generate an overall average CDF plot, which summarizes the variation in resource availability. This approach forms the basis of the real-CDF methodology. Additionally, this procedure helps reduce the computational burden associated with generating 8760 individual CDF plots to represent hourly resource availability throughout the year. However, averaging all 8760 hourly CDF plots into a single CDF would be inaccurate, as solar irradiation data include zero values during night-time hours.
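The averaging step can be sketched in Python as follows. This is a minimal illustration, not the authors' procedure: it assumes that each day's midday values are first standardized to Z-scores so that the daily CDFs share a common x-axis, and that the daily empirical CDFs are averaged point by point on a fixed grid. The function names and the tiny two-day dataset are hypothetical placeholders.

```python
from bisect import bisect_right
from statistics import mean, stdev


def z_scores(samples):
    """Standardize one day's midday values (assumes a non-zero spread)."""
    m, s = mean(samples), stdev(samples)
    return [(x - m) / s for x in samples]


def empirical_cdf(samples, grid):
    """Fraction of samples less than or equal to each grid value."""
    ordered = sorted(samples)
    n = len(ordered)
    return [bisect_right(ordered, g) / n for g in grid]


def average_cdf(daily_samples, grid):
    """Point-by-point average of the daily empirical CDFs on a common grid."""
    curves = [empirical_cdf(z_scores(day), grid) for day in daily_samples]
    return [mean(column) for column in zip(*curves)]


# Placeholder data: two "days" with four yearly midday values each.
daily_samples = [[500, 600, 700, 800], [3, 4, 5, 6]]
grid = [-2.0, 0.0, 2.0]
avg_curve = average_cdf(daily_samples, grid)
```

In the study itself, `daily_samples` would hold 365 lists of 30 midday values each; restricting the input to midday values avoids the zero night-time irradiation that would distort an average over all 8760 hours.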
Figure 2 depicts the solar irradiation CDF graph at 12 p.m. (midday) on 1 January, using data from the previous 30 years.
Figure 3 depicts the average CDF plot, which is the average value of 365 daily CDFs of solar irradiation based on 365 midday real-CDF samples.
Figure 3 also shows how to identify the five coordinates that correspond to the five CDF regions. To divide the average real CDF into five equal-probability regions, the y-axis is divided into five ranges of cumulative probability, namely, (0–0.2), (0.2–0.4), (0.4–0.6), (0.6–0.8), and (0.8–1.0), and the central points of these regions, 0.1, 0.3, 0.5, 0.7, and 0.9, are marked as the representative samples [31,32]. The corresponding x-axis points are then identified using MATLAB R2020b tools. The average solar irradiation and wind speed of each hour are calculated using hourly climate data from the last 30 years: the ith-hour values from all 30 years are summed, and the total is divided by 30. The result is termed the average climate data. These average climate data are then modified in the simulation study based on the five distinct x-coordinate values (multipliers), yielding five distinct datasets. This is performed by multiplying all resource variables in the climate data by a constant factor. For example, in Figure 3, a CDF of 0.1 corresponds to an x-coordinate of −1.688; if SD is the standard deviation corresponding to the ith hour of solar irradiation, the multiplication factor in the ith hour is (1 + ((−1.688) × SD)). The SD is calculated for each hour in a year (using 30 different data points), for a total of 8760 SDs.
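The lookup and scaling steps can be sketched in Python as below. This is a hedged illustration rather than the MATLAB implementation: the x-coordinate is read from a tabulated CDF by linear interpolation, and the SD in the factor (1 + x × SD) is assumed here to be expressed as a fraction of the hourly mean, since the text does not state how SD is normalized. All names and numbers in the example are hypothetical.

```python
from statistics import mean, stdev


def inverse_cdf(grid, cdf, p):
    """x-coordinate at which a piecewise-linear CDF reaches probability p."""
    for k in range(len(grid) - 1):
        c0, c1 = cdf[k], cdf[k + 1]
        if c0 <= p <= c1 and c1 > c0:
            return grid[k] + (p - c0) * (grid[k + 1] - grid[k]) / (c1 - c0)
    raise ValueError("p lies outside the range covered by the CDF")


def hourly_mean_and_relative_sd(yearly_profiles):
    """Per-hour mean and SD (as a fraction of the mean) across all years."""
    means, rel_sds = [], []
    for hour_values in zip(*yearly_profiles):
        m = mean(hour_values)
        means.append(m)
        # Assumption: SD is normalized by the mean; zero-mean hours stay zero.
        rel_sds.append(stdev(hour_values) / m if m > 0 else 0.0)
    return means, rel_sds


def scaled_profile(means, rel_sds, x):
    """Apply the hourly factor (1 + x * SD) to the average climate data."""
    return [m * (1 + x * sd) for m, sd in zip(means, rel_sds)]


# Placeholder CDF table and three "years" of a three-hour profile.
x_low = inverse_cdf([-2, -1, 0, 1, 2], [0.0, 0.2, 0.5, 0.8, 1.0], 0.1)
profiles = [[0, 400, 800], [0, 500, 900], [0, 600, 1000]]
means, rel_sds = hourly_mean_and_relative_sd(profiles)
low_dataset = scaled_profile(means, rel_sds, x_low)
```

Under this normalization assumption, the scaled value for each hour equals the hourly mean shifted by x standard deviations, so repeating the call with the five x-coordinates yields the five datasets.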
Figure 4 shows the wind speed CDF graph at 12 p.m. (midday) on 1 January, using data from 1989 to 2019.
Figure 5 illustrates the average CDF graph for wind speed, derived from the 365 daily CDFs of wind speed based on midday CDF samples. The Z-score represents the number of standard deviations by which a value lies from the mean of the reference population. In this study, the reference populations are wind speed and solar irradiation. The CDF graphs for ambient temperature and load demand are plotted in the same manner, and their average CDFs are determined.
In the real-CDF approach, the CDF curve is divided into five regions with equal probability, resulting in five distinct datasets. This means that variables such as solar irradiation, wind speed, and load demand each have five possible values for every iteration of the simulation. To select the solar irradiation value for the ith hour (where i represents any hour of the year, from 1 to 8760), the algorithm randomly selects one value from the five possible values. Additionally, solar irradiation, wind speed, and load demand are treated as independent variables. For example, during the ith hour, the MATLAB algorithm may select the minimum value of solar irradiation, the median value of wind speed, and the maximum value of load demand, each from their respective five possible values. As a result, the selection of resource and load demand values is a random process for each hour.
To select variables independently, a MATLAB algorithm is developed around three independent variables: solar irradiation, wind speed, and load demand, each with five possible values. This results in a total of 125 (5 × 5 × 5) permutations. The algorithm randomly selects variable values from these permutations and computes the loss of load frequency (LOLF) for all possible combinations. Ambient temperature is also included in this study, but it is not treated as an independent variable. Instead, it follows the solar irradiation climate data. This means that when the MATLAB algorithm selects a multiplication factor for solar irradiation (for example, a multiplication factor equivalent to a CDF of 0.9), the same multiplication factor is applied to the ambient temperature.
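The enumeration of the 125 combinations can be sketched in Python as follows. This is an illustrative sketch, not the authors' MATLAB algorithm: the dictionary keys and function name are hypothetical, and each scenario stores only the CDF level chosen for each variable, with ambient temperature copying the solar irradiation level as described above.

```python
import random
from itertools import product

# Central points of the five equal-probability CDF regions.
LEVELS = (0.1, 0.3, 0.5, 0.7, 0.9)


def build_scenarios(levels=LEVELS):
    """All combinations of levels for solar, wind, and load (5**3 = 125)."""
    scenarios = []
    for solar, wind, load in product(levels, repeat=3):
        scenarios.append({
            "solar": solar,
            "wind": wind,
            "load": load,
            "temperature": solar,  # temperature is tied to the solar level
        })
    return scenarios


scenarios = build_scenarios()

# Random selection of one combination, e.g. for a given hour.
rng = random.Random(0)  # fixed seed only to make the sketch repeatable
picked = rng.choice(scenarios)
```

Enumerating the full product guarantees that every combination is evaluated once, while the random draw mirrors the hour-by-hour selection described in the text.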
Using the following approach, the simulation results from the 125 permutations (125 scenarios) will be converted into probabilities. The probability of a specific event occurring is calculated by dividing the number of scenarios where the event occurs by the total number of scenarios (125). For example, if the event of achieving 99% reliability occurs in 10 out of the 125 scenarios, the probability is 8% (calculated as 10 divided by 125, multiplied by 100).
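The conversion from scenario counts to a probability can be written as a short Python sketch. The function name is hypothetical, and the reliability values below are placeholders chosen only to reproduce the 10-out-of-125 example; they are not results from the study.

```python
def event_probability(scenario_results, event):
    """Percentage of scenarios in which the event occurs."""
    hits = sum(1 for result in scenario_results if event(result))
    return 100.0 * hits / len(scenario_results)


# Placeholder reliabilities (%): 10 scenarios reach 99%, the other 115 do not.
reliabilities = [99.0] * 10 + [98.0] * 115
probability = event_probability(reliabilities, lambda r: r >= 99.0)
```

Passing the event as a function keeps the same routine usable for other thresholds or for other indicators such as LOLF.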
2.2. Introducing Uncertainty Using Confidence Interval
The average hourly values of solar irradiation, ambient temperature, and wind speed are calculated using climate data from the past 30 years. When three different confidence levels (99.99%, 99%, and 95%) are selected, the data ranges for resource and load demand will vary, resulting in three distinct spreads or deviations from the mean. A higher confidence level, such as 99.99%, will produce a wider range (spread) than the lower confidence levels. This larger range better reflects the true variation in the data. Consequently, the primary objective of this study is to evaluate the reliability of the SMG system by considering these three confidence intervals.
Calculating the mean value from the past 30 years of hourly load demand data is impractical, as the number of customers in the system can change over time, leading to fluctuations in demand. Therefore, the load demand data from the most recent year (2019) are used, with a 5% standard deviation applied. Using the MATLAB rand function, 30 different random load demands are then generated to account for variability in the system.
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (1)$$
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \qquad (2)$$
where $\bar{x}$ is the mean value, $x_i$ is the value of the ith variable, n is the total number of variables, and $s$ is the sample’s standard deviation. The mean and standard deviation of solar irradiance, temperature, and wind speed can be computed using (1) and (2), and the confidence interval then follows as in (3):
$$CI = \bar{x} \pm t \times SE, \qquad SE = \frac{s}{\sqrt{n}} \qquad (3)$$
where $SE$ is the standard error of the mean, $CI$ is the confidence interval, and $t$ is the T-score at the given degrees of freedom and level of significance, α. The level of significance can be calculated for the specific confidence interval selected. For instance, if a confidence interval of 95% is chosen, then the level of significance becomes 0.05 (= 1 − 0.95). An x% confidence interval is a rule-based interval that covers the true value x% of the time under simulated conditions [32]. When the population standard deviation is unknown, a T-score is used. In this study, we use hourly climate data from the last 30 years as a sample, with the true population’s standard deviation unknown.
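The confidence bounds for one hour can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' MATLAB code: the interval is taken as two-sided, the sample size is 30 (29 degrees of freedom), and the T-scores are standard table values passed in by the caller. The value for 99.99% is not included and would need to be looked up. The function name and the sample values are hypothetical placeholders.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided T-scores for 29 degrees of freedom (standard table values).
T_SCORES_DF29 = {0.95: 2.045, 0.99: 2.756}


def confidence_bounds(samples, t_score):
    """Lower and upper bounds: mean -/+ t * (s / sqrt(n))."""
    n = len(samples)
    x_bar = mean(samples)
    standard_error = stdev(samples) / sqrt(n)  # stdev uses the n - 1 divisor
    margin = t_score * standard_error
    return x_bar - margin, x_bar + margin


# Placeholder sample: 30 yearly values for one hour (not measured data).
samples = [700 + 10 * (k % 5) for k in range(30)]
bounds_95 = confidence_bounds(samples, T_SCORES_DF29[0.95])
bounds_99 = confidence_bounds(samples, T_SCORES_DF29[0.99])
```

Applying the function hour by hour gives the lower and upper bounds within which the weather parameters are later sampled; the 99% bounds are wider than the 95% bounds because the T-score is larger.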
The MATLAB algorithm is designed to randomly select the hourly weather parameters within the range defined by the upper and lower bounds, which are determined using the chosen confidence interval. To generate the random values, a uniform distribution is used to ensure an even probability across the specified range:
$$r = a + (b - a) \times rand$$
where r is the value of the variable, a is the lower bound, b is the upper bound, and rand is the random function that generates random values between 0 and 1.
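A uniform draw between the lower and upper bounds can be sketched in Python as below, using the usual form r = a + (b − a) × rand, with Python's `random.random()` standing in for the MATLAB rand function. The function name and the bounds in the example are hypothetical placeholders.

```python
import random


def sample_within_bounds(a, b, rng=random):
    """Uniform draw between lower bound a and upper bound b."""
    return a + (b - a) * rng.random()


# Placeholder bounds for one hour; a fixed seed keeps the sketch repeatable.
rng = random.Random(42)
lower, upper = 700.0, 740.0
draws = [sample_within_bounds(lower, upper, rng) for _ in range(1000)]
```

Every draw falls between the two bounds with equal probability density, which matches the even probability across the range described in the text.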