This section presents the methods designed and applied in this paper regarding energy generation, forecasting, and the statistical approach used.
The main idea of this work is that three years of wind measurements at a prospective wind site are not necessary to obtain an accurate forecast, as discussed in the last section. An up-to-date forecasting method can reduce this time significantly and bring a set of benefits.
The main benefit of the proposed procedure is its suitability to represent the intrinsic uncertainty of wind behavior. Both input and output uncertainties are represented. The input uncertainties are captured by fuzzy variables, which group similar wind speed behaviors into clusters, and the output uncertainties are expressed as forecasting intervals, instead of a point forecast, to represent the wind energy stochasticity. Next, the whole procedure is detailed.
2.2. Fuzzy Time Series
Time series are data sets that represent the behavior of one or more variables over time, in which successive observations of a variable are not independent of each other [38]. These variables exhibit groups of behavior that repeat from time to time and can thus be used to estimate similar situations in the future.
The Fuzzy Time Series (FTS) concept was first proposed by Song and Chissom [39], aiming at forecasting time series using linguistic values rather than numeric values as data inputs. They proposed to partition a time series into regular intervals, creating sets of linguistic functions that define groups of behavior in each of those intervals. Then, a mathematical connection is determined that links a past group of behaviors to the next value in the time series. Further, the membership of new elements of the series is cross-checked against this structure, each new element is fitted into the respective group pattern, and a new output is calculated.
To set the FTS fundamentals, let $Y(t)$, $t = \ldots, 0, 1, 2, \ldots$, a subset of $\mathbb{R}$, be the universe of discourse by which fuzzy sets $f_i(t)$ are defined, where $t$ is time. Each fuzzy set $f_i(t)$ can equally represent a numeric function or a linguistic one such as "low", "medium", "high", .... The fuzzy sets $f_i(t)$ map a partition of the universe of discourse, representing the variable's behavior. Thereafter, $F(t)$ is called a FTS defined on $Y(t)$ if $F(t)$ is a collection of fuzzy sets $\{f_1(t), f_2(t), \ldots\}$. If there exists a fuzzy relationship $R(t-1, t)$ such that
$$F(t) = F(t-1) \circ R(t-1, t),$$
where $\circ$ is an arithmetic operator, then $F(t)$ is said to be caused by $F(t-1)$. The relationship between $F(t)$ and $F(t-1)$ can be denoted by $F(t-1) \rightarrow F(t)$. Now assume $F(t-1) = A_i$ and $F(t) = A_j$; a Fuzzy Logical Relationship (FLR) can be defined as $A_i \rightarrow A_j$, where $A_i$ is called the left-hand side (LHS) and $A_j$ the right-hand side (RHS) of the fuzzy logical relationship, respectively [39]. The FTS simplest form consists of the six steps presented in Algorithm 1.
Over time, many improvements have been proposed to each of the steps of [39]. The universe of discourse partition has been one of the main research fields in FTS, since it affects the forecast performance [40], and it is indeed still an open issue [41]. Huang [42] first realized it, proposing the distribution-based and average-based approaches to define the interval sizes in the model of [39], improving the interval fit. Many techniques have been applied to this purpose, such as the ant colony algorithm [43], the imperialist algorithm [44], particle swarm optimization [45] and genetic algorithms [46]. A further approach applies clustering techniques to the universe of discourse partition, such as fuzzy c-means [47], the Gath-Geva cluster [48] and granular information [49]. Some works developed a clustering approach directly in the data history fuzzification, dismissing the creation of intervals in the universe of discourse [50,51]. The latter works argued that clustering without intervals is a better approach than trying to find the best partition of the universe of discourse [34].
Algorithm 1: Fuzzy Time Series [39]
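As a concrete illustration of those six steps, the sketch below implements the simplest interval-based FTS in the spirit of [39] (with a Chen-style group defuzzification). The uniform partition, the seven-interval default, and the midpoint averaging are illustrative assumptions, not the model proposed in this paper.

```python
import numpy as np

def fts_forecast(series, n_intervals=7):
    """Minimal interval-based FTS: partition the universe of discourse,
    fuzzify the observations, build FLR groups A_i -> {A_j, ...}, and
    forecast one step ahead from the last fuzzified value."""
    lo, hi = min(series) - 1e-9, max(series) + 1e-9
    edges = np.linspace(lo, hi, n_intervals + 1)        # step 1: partition
    mids = (edges[:-1] + edges[1:]) / 2
    # step 2: fuzzify each observation to its interval index
    labels = np.clip(np.digitize(series, edges) - 1, 0, n_intervals - 1)
    # steps 3-4: collect fuzzy logical relationship groups
    groups = {}
    for i, j in zip(labels[:-1], labels[1:]):
        groups.setdefault(i, set()).add(j)
    # steps 5-6: forecast = mean of the midpoints of the RHS group
    last = labels[-1]
    rhs = groups.get(last, {last})
    return float(np.mean([mids[j] for j in sorted(rhs)]))

wind = [5.1, 5.9, 6.3, 5.7, 6.8, 7.2, 6.5, 5.9, 6.1]  # hourly wind speeds (m/s)
print(fts_forecast(wind))
```

The forecast is always a midpoint average, so it stays inside the observed range; this is the point-forecast baseline that the clustering procedure below replaces.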
Thus, regarding the universe of discourse partition, this work implements a clustering approach. The procedure consists of splitting the input variables into a set of clusters. The clusters are chunks of data that represent a characteristic behavior of the inputs, and they substitute for the universe of discourse partition of the FTS. Further, the weighted linear contribution of each cluster is used to map the output. This linear combination behaves as a FLR. As improvements, a subtractive clustering method is used to define the number of clusters and to automatically tune the cluster centers. The number of clusters is validated with the Bezdek index.
This paper proposes a procedure that can be split into three stages: (i) data processing with parameter tuning; (ii) training; (iii) testing.
The data processing stage implements a subtractive clustering (SC) method [52] to calculate the number of clusters $c$ and the most likely center values $v_i^r$ for each input data set $X^r = \{x_1^r, x_2^r, \ldots, x_N^r\}$, where $r$ is the number of inputs and $N$ is the number of observations. The matrix $V$ stores the vectors of cluster prototype centers of data set $r$; there is a set of $c$ centers for each input variable. The SC algorithm is a single-pass method for estimating the number of clusters used by the FCM, as well as determining the initial centers at near-optimal values, helping the FCM convergence.
Each point $x_j^r$ in the row $r$ of $X$ is considered as a potential cluster center of the $r$th input [35]. The potential $P_j$ of data point $x_j^r$ is cross-checked with the $N-1$ other possible points and is defined as [35]
$$P_j = \sum_{k=1}^{N} \exp\!\left(-\frac{4}{r_a^2}\,\lVert x_j - x_k \rVert^2\right),$$
where $r_a$ and $r_b$ are the cluster radius in data space and the cluster radius penalty, respectively. Then, let $x_1^{*}$ and $P_1^{*}$ be the first cluster center and its respective potential. The potential is revised for each data point by using [35]:
$$P_j \leftarrow P_j - P_1^{*}\exp\!\left(-\frac{4}{r_b^2}\,\lVert x_j - x_1^{*} \rVert^2\right).$$
$x_1^{*}$ then becomes $v_1$, the first cluster center of the vector $V$. An amount of potential is subtracted from each data point as a function of its distance from the first cluster center. The data near the first cluster center will have a greatly reduced potential and, therefore, will be unlikely to be selected as the next cluster center. At last, the optimal number of clusters $c$ is validated with the Bezdek partition coefficient, as follows [36]:
$$V_{PC}(c) = \frac{1}{N}\sum_{i=1}^{c}\sum_{j=1}^{N} u_{ij}^{2},$$
where $u_{ij}$ is the membership of element $x_j$ to cluster $i$. The optimal number of clusters is given by [36]:
$$c^{*} = \underset{2 \le c \le N-1}{\arg\max}\; V_{PC}(c).$$
Algorithm 2 summarizes this process.
Algorithm 2: Subtractive Clustering Algorithm
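A minimal sketch of this stage for a single input variable. The radii ($r_a = 0.5$, $r_b = 1.5\,r_a$ on data normalized to $[0,1]$) and the residual-potential stopping ratio are illustrative assumptions following Chiu's usual defaults, not the values tuned in this paper; the resulting $c$ would then be validated with the Bezdek index as described above.

```python
import numpy as np

def subtractive_clustering(x, ra=0.5, rb=0.75, stop_ratio=0.15):
    """Subtractive clustering on a 1-D series: every point is a candidate
    center; repeatedly accept the highest-potential point and subtract its
    influence, until the residual potential falls below a fraction of the
    first potential. Returns centers in normalized [0, 1] coordinates."""
    x = np.asarray(x, float)
    x = (x - x.min()) / (x.max() - x.min())        # normalize to [0, 1]
    d2 = (x[:, None] - x[None, :]) ** 2            # pairwise squared distances
    P = np.exp(-4.0 * d2 / ra**2).sum(axis=1)      # potential of each point
    centers, p_first = [], P.max()
    while True:
        k = int(P.argmax())
        if P[k] < stop_ratio * p_first:            # low residual potential: stop
            break
        centers.append(x[k])
        # penalize potential around the newly accepted center (radius rb)
        P = P - P[k] * np.exp(-4.0 * (x - x[k]) ** 2 / rb**2)
    return np.array(centers)

speeds = [4.2, 4.4, 4.3, 7.9, 8.1, 8.0, 12.2, 12.0, 12.4]  # three regimes
print(len(subtractive_clustering(speeds)))
```

The single pass makes SC cheap enough to run on the full history, which is why it is a convenient initializer for the FCM stage that follows.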
The number of clusters and the center values calculated with Algorithm 2 are inputs to the FTS algorithm. The next stage is training. Thus, let $Y = \{y_1, y_2, \ldots, y_N\}$ be the FTS output data corresponding to the inputs $X$. Then $x_j$ becomes the $j$th input data vector and $y_j$ is its corresponding output. Thus, using Fuzzy C-means, the matrix $X$ is grouped into the $c$ calculated clusters. This is done by minimizing $J_m$ [53]:
$$J_m = \sum_{i=1}^{c}\sum_{j=1}^{N} u_{ij}^{m}\, d_{ij}^{2},$$
where $u_{ij}$ is an element of $U$ and represents the membership degree of the $j$th data vector in the $i$th cluster; $m$ is a parameter which determines the fuzziness of the resulting clusters; $v_i$ is the center of the $i$th cluster calculated by the SC in Algorithm 2; and $d_{ij}$ is the distance between the input elements and the cluster centers, weighted by the input covariance norm matrix $A$. The distance is calculated as $d_{ij}^{2} = (x_j - v_i)^{T} A\,(x_j - v_i)$, and the norm matrix is calculated as [53]:
$$A = \left[\operatorname{cov}(X)\right]^{-1}.$$
The $J_m$ minimization is done by an iterative algorithm. Thus, $J_m$ is rewritten as Equation (8) by the Lagrange multiplier method, and in each repetition the values of $U$ and $V$ are updated [34].
The final cluster centers are defined when changes in $U$ and $V$ lead to insignificant improvements. Thereafter, Equation (9) defines the membership function for the $q$th variable in the $i$th cluster [34]:
$$u_{ij} = \left[\sum_{k=1}^{c}\left(\frac{d_{ij}}{d_{kj}}\right)^{\frac{2}{m-1}}\right]^{-1}.$$
Function (10) calculates the weighted contribution of each cluster for each bonded input $x_j$ and its respective output [34]:
$$h_{ij} = \frac{u_{ij}}{\sum_{k=1}^{c} u_{kj}}.$$
The output forecast $\hat{y}_j$ (11) is the weighted linear combination of the inputs, where [34]:
$$\hat{y}_j = \sum_{i=1}^{c} h_{ij}\,\theta_i,$$
and $\theta_i$ is the weight parameter of the linear combination for each input, considering $h_{ij}$ such that $\sum_{i=1}^{c} h_{ij} = 1$. The error $E$ in (12) is minimized to calculate the $\theta_i$ values:
$$E = \sum_{j=1}^{N}\left(y_j - \hat{y}_j\right)^{2}.$$
This leads to the matrix of weighted contributions $H$, used to design the set of $N$ Equations (14) [34]:
$$y_j = \mathbf{h}_j^{T}\boldsymbol{\theta} + e_j,$$
for each $j = 1, \ldots, N$. Thereafter, $\boldsymbol{\theta}$ can be solved by minimizing the error $\mathbf{e}$. Then, $\boldsymbol{\theta}$ is calculated from (15), where $H^{+}$ is the pseudo-inverse of $H$ [34]:
$$\boldsymbol{\theta} = H^{+}\mathbf{y}.$$
Algorithm 3 summarizes the procedure.
Algorithm 3: Clustering Fuzzy Time Series Forecast
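The training stage can be sketched as below. This is a simplified reading of the procedure under stated assumptions: a plain Euclidean distance stands in for the covariance-weighted norm $A$, the cluster contributions $H$ are the FCM memberships, and $\theta = H^{+}\mathbf{y}$ is solved with the pseudo-inverse; the function names `train_cluster_fts` and `predict` are hypothetical, not from the paper.

```python
import numpy as np

def _memberships(X, V, m):
    """FCM membership matrix U (N x c); rows sum to 1."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1) + 1e-12
    U = d2 ** (-1.0 / (m - 1.0))
    return U / U.sum(axis=1, keepdims=True)

def train_cluster_fts(X, y, centers, m=2.0, n_iter=50):
    """FCM refinement of the SC centers, then least-squares weights for
    the cluster-weighted linear combination (theta = pinv(H) @ y)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    V = np.asarray(centers, float)             # (c, r) initial centers from SC
    for _ in range(n_iter):
        U = _memberships(X, V, m)              # membership update
        V = (U.T ** m @ X) / (U.T ** m).sum(axis=1, keepdims=True)  # center update
    H = _memberships(X, V, m)                  # matrix of weighted contributions
    theta = np.linalg.pinv(H) @ y              # pseudo-inverse solution
    return V, theta

def predict(X, V, theta, m=2.0):
    """Forecast = membership-weighted linear combination H @ theta."""
    return _memberships(np.asarray(X, float), V, m) @ theta

# toy usage: two regimes, each mapping to a distinct output level
X = [[0.0], [0.1], [0.9], [1.0]]
y = [1.0, 1.0, 3.0, 3.0]
V, theta = train_cluster_fts(X, y, centers=[[0.05], [0.95]])
print(predict(X, V, theta))
```

Because each row of $H$ sums to 1, the forecast interpolates between the per-cluster weights $\theta_i$, which is what lets the clusters play the role of the FLR groups.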
Algorithm 3 yields two outputs: a set of equations that represent the wind behavior, and the forecast set of wind power generation. The forecast values are examined and used to find the most likely values and the pessimistic generation periods along the year. Thus, a statistical analysis is performed on the forecast. The analysis can be represented graphically, with time on the x-axis and generated power on the y-axis. Then, limits are drawn for the critical values, the most unlikely values, and the risk generation periods in each part of the year.
Given a chosen significance level $\alpha$ and a moving window $W_k$ with $s$ samples, two particular situations are of interest here: the critical (cutoff) values above which the values satisfy the chosen criterion of occurrence probability [55], and the lowest values in the last $s$ points of a moving window $W_k$, $k = 1, \ldots, n$, where $n$ is the number of windows. These two calculations indicate, among the forecast values, the most likely values and the worst-case region in the last windows. For example, let $\alpha = 0.01$; this means choosing a 99% probability of having the forecast values above the cutoff in a one-sided z-test given the past $s$ samples, so 1% of the values are expected to fall under the cutoff. In the region under the cutoff, the most cautious value is the lowest value in each past $s$ samples (i.e., the minimum value in the past window), which represents a conservative forecast. The region between the cutoff curve and this cautious forecast curve is the risk taken in further decisions in the energy trade. Lower $\alpha$ values and longer sample windows are likely to decrease the area between the cutoff and the conservative forecast. The lengths of these two variables carry helpful information about the data seasonality, useful both to those who sell and those who buy the energy.
The procedure is summarized in Algorithm 4:
Algorithm 4: Determining confidence areas in the forecasting
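A sketch of Algorithm 4's two computations, assuming a normal-theory one-sided cutoff per moving window ($\bar{w} - z_{\alpha}\,\sigma_w$) and the window minimum as the conservative bound; the function name and the defaults $s = 24$, $\alpha = 0.01$ are illustrative.

```python
import numpy as np
from statistics import NormalDist

def confidence_areas(forecast, s=24, alpha=0.01):
    """For every moving window of the past s forecast samples, return the
    one-sided cutoff (values stay above it with probability 1 - alpha)
    and the conservative bound (the window minimum)."""
    f = np.asarray(forecast, float)
    z = NormalDist().inv_cdf(1.0 - alpha)          # one-sided z critical value
    cutoff, conservative = [], []
    for t in range(s, len(f) + 1):
        w = f[t - s:t]
        cutoff.append(w.mean() - z * w.std(ddof=1))
        conservative.append(w.min())
    return np.array(cutoff), np.array(conservative)

# toy forecast with a seasonal shape; the gap between the two curves is
# the risk region discussed above
power = 10 + 2 * np.sin(np.linspace(0, 20, 200))
cut, cons = confidence_areas(power)
risk = np.abs(cons - cut).sum()                    # discrete proxy for the area
```

Shrinking $\alpha$ pushes the cutoff down while the window minimum stays put, so the risk region narrows, matching the discussion above.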
The next section puts forward a case study for the proposed procedure.