1. Introduction
Wind power capacity has become a critical issue in recent years, as the shift from fossil fuels to renewable energy sources has become an essential demand of developed societies. Both the U.S. and the European Union are making significant efforts to transform their economies into green ones, while simultaneously requiring ever greater amounts of energy to cover energy-intensive sectors such as computing and internet infrastructure [1,2]. For example, in 2023, global wind power capacity was reported to have reached 1047 GW [3]. Accurate wind prediction is a major challenge for operators of Energy Management Systems (EMS), since reliable predictions are crucial inputs to an EMS's fundamental operating functions [4].
In recent years, there has been growing interest in developing reliable methods for wind power forecasting, primarily driven by the increased availability of relevant data [5,6,7,8,9]. For short-term predictions, traditional methods, such as the autoregressive moving average (ARMA) model, have been explored [10,11]. Their advantage is simplicity in model building; however, they perform less effectively when wind power variations are irregular. The key challenges in wind power data are nonlinearity, nonstationarity, and high dimensionality. To tackle these challenges, Machine Learning- and Computational Intelligence-based models stepped into the spotlight: in an attempt to handle the data effectively, a two-stage method was proposed in [12], where wind time-series data are first processed using wavelet decomposition, followed by a prediction phase implemented by an adaptive wavelet neural network (AWNN). Another neural network approach, based on radial basis functions, was presented in [13], with promising results. As far as the fuzzy pillar of Computational Intelligence is concerned, a number of static fuzzy systems were proposed as more interpretable alternatives to the black-box behavior of neural networks [14,15].
Additional studies include traditional Machine Learning methods, such as support vector machines (SVM), support vector regression (SVR), and random forests (RF) [16,17]. In an attempt to mine the information in temporal dependencies, data-driven Machine Learning methods were proposed, including lagged-ensemble Machine Learning [18,19] and dynamic principal component regression [20], which have shown satisfactory results.
Further improvements were accomplished with the introduction of Gaussian Process Regression (GPR) [21], a nonparametric kernel-based model, as well as Ensemble Learning approaches (Boosted Trees (BT), Bagged Regression Trees, and eXtreme Gradient Boosting (XGBoost)) [22,23,24]. These developments have firmly established Machine Learning as a key contributor to the field [25,26,27].
Over the past decade, Deep Learning has been revolutionizing Computational Intelligence, enriching its application fields and taking advantage of the explosion in the computational resources of modern computing systems. Consequently, Deep Learning predictors became a key tool for wind power prediction. All the established deep models (Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), recurrent neural networks, Convolutional Neural Networks (CNNs), and transformers) [28,29,30,31,32,33,34], as well as alternative ones like Temporal Collaborative Attention and Deep Graph Attention Networks [35,36], have been proposed to address this demanding problem, and considerable advances have been accomplished.
One of the most significant issues in time-series prediction problems is the selection of an appropriate input vector from an extensive pool of candidate inputs. In the case of wind power generation, where dependencies on temporal and spatial factors such as weather conditions, climate variables, and seasonal patterns exist, the feature selection step is crucial for the predictor to be effective. From this perspective, the Dynamic Neuro-fuzzy Wind Predictor (DNFWP) is proposed as a wind power predictor. The model has the form of a typical fuzzy system, with neural components serving as the fuzzy rules' consequent parts. These are three-layer neural networks, with a hidden layer consisting of neurons with internal output feedback, thus introducing dynamics to the overall model. No external feedback connections exist, maintaining the classic structure of fuzzy systems, with the rules operating separately from each other and interconnected through the defuzzification part. A single input is used, in an attempt to alleviate the aforementioned feature selection issues, as well as to examine whether the time series can be identified without using external variables linked to weather conditions.
Dynamic Resilient Propagation (DRPROP) is selected as the tuning algorithm. It is a learning scheme that aims to alleviate the disadvantages of methods based on gradient descent, and it takes into consideration the recurrent connections of the proposed predictor.
The rest of this paper is organized into four sections: Section 2 outlines the architecture of DNFWP, while Section 3 illustrates the model-building process. Section 4 hosts two experimental examples and a comparative analysis with well-known forecasters, highlighting the proposed model's characteristics and forecasting capability. Finally, Section 5 summarizes the conclusions.
2. The Structure of Dynamic Neuro-Fuzzy Wind Predictor
The fuzzy rule base is the core of a fuzzy system with m inputs and a single output, comprising rules expressed as conditional statements:

IF $x_1(k)$ is $A_1$ AND $\ldots$ AND $x_m(k)$ is $A_m$ THEN $y(k)$ is $B$, (1)

where $\mathbf{x}(k) = \left[x_1(k), \ldots, x_m(k)\right]$ is the input vector, k represents the sample (time step) index, and $A_j$ and $B$ correspond to the fuzzy sets of the j-th input and the output, respectively.
The introduction of the Takagi–Sugeno–Kang (TSK) fuzzy rule [37], in which the consequent part is expressed as a linear combination of the input variables, marked a significant advancement over classic fuzzy systems by enhancing their learning capabilities. The rule can be considered a subsystem of the overall fuzzy system, operating in the region where the membership function of the premise part reaches high degrees of membership. In such regions, the consequent part is responsible for modeling the desired output sequence. From this perspective, the linear polynomial in the consequent can be replaced by any continuous and differentiable function, $f(\cdot)$, offering greater modeling potential. Direct implementations of this idea are fuzzy polynomial systems and the CANFIS model, with a wide variety of applications [38,39,40]. Stemming from the fact that the inclusion of feedback connections has proven effective for dynamic systems, several fuzzy models have been developed with recurrent components in their consequents [41,42,43].
From this perspective, and taking into consideration that neural networks are universal approximators (particularly recurrent networks, which are effective in capturing a model's dynamics [44,45,46]), the present approach integrates a recurrent neural network into the consequent part of each fuzzy rule. This type of fuzzy rule was initially proposed in [46] for modeling telecommunications data and later demonstrated effective results in electric load forecasting [47]. The operation of DNFWP, with a rule base comprising R fuzzy rules, is described as follows:
The fuzzy rules have premise parts, which include the fuzzy sets that correspond to the inputs and are linked via the AND operator, as depicted in Equation (1). Therefore, the premise part of the l-th rule can be regarded as an m-dimensional hypercell:

$$A^{(l)} = A_1^{(l)} \times A_2^{(l)} \times \cdots \times A_m^{(l)}, \quad l = 1, \ldots, R, \qquad (2)$$

composed of m single-dimensional membership functions, for which the Gaussian type is selected:

$$\mu_{A_i^{(l)}}\left(x_i(k)\right) = \exp\left(-\frac{\left(x_i(k) - m_i^{(l)}\right)^2}{2\left(\sigma_i^{(l)}\right)^2}\right). \qquad (3)$$

The hypercell in Equation (2) determines the degree of fulfillment of the l-th rule, i.e., the degree to which the particular rule is activated by the input vector $\mathbf{x}(k)$. The Gaussian membership function has two adaptable parameters: the mean value, $m_i^{(l)}$, and the standard deviation, $\sigma_i^{(l)}$, regarding the i-th input axis. Thus, each rule has two parameter vectors for the premise part: $\mathbf{m}^{(l)} = \left[m_1^{(l)}, \ldots, m_m^{(l)}\right]$ and $\boldsymbol{\sigma}^{(l)} = \left[\sigma_1^{(l)}, \ldots, \sigma_m^{(l)}\right]$. The degree of fulfillment is calculated as the algebraic product of the membership functions:

$$\mu^{(l)}\left(\mathbf{x}(k)\right) = \prod_{i=1}^{m} \mu_{A_i^{(l)}}\left(x_i(k)\right). \qquad (4)$$

The premise part is static and is fed with the input vector at the current time step, k.
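To make the premise-part computation concrete, the following is a minimal NumPy sketch of the Gaussian membership functions and the algebraic-product degree of fulfillment; the function names and the example parameter values are illustrative, not taken from the paper:

```python
import numpy as np

def gaussian_mf(x, mean, sigma):
    """Single-dimensional Gaussian membership function."""
    return np.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))

def firing_strengths(x, means, sigma):
    """Degree of fulfillment of each rule for input sample x.

    means: array of shape (R, m), one Gaussian center per rule and input axis.
    sigma: common standard deviation (assumed shared here for simplicity).
    Returns R firing strengths (algebraic product over the m input axes).
    """
    return np.prod(gaussian_mf(x[None, :], means, sigma), axis=1)

# Example: 3 rules, single input (m = 1), as in DNFWP
means = np.array([[0.2], [0.5], [0.8]])
mu = firing_strengths(np.array([0.5]), means, sigma=0.15)
```

An input lying exactly on a rule's center activates that rule fully, while rules with distant centers fire only weakly, which is what lets each consequent network specialize in its own operating region.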
The consequent part of each rule is a three-layer neural network, with H neurons in the hidden layer.
- ➢ The input layer is static and simply forwards the weighted input vector to the hidden layer, as shown in Figure 1. The synaptic weights are noted as $w_{ji}^{(l)}$, $j = 1, \ldots, H$, $i = 1, \ldots, m$, $l = 1, \ldots, R$.
- ➢ The output layer is also static, containing a single neuron that receives the weighted outputs of the hidden layer's neurons. The synaptic weights connecting the hidden layer with the output layer are noted as $v_j^{(l)}$, $j = 1, \ldots, H$, $l = 1, \ldots, R$, while $b^{(l)}$ are the weights of the bias terms. The activation function of the output layer's neuron is the hyperbolic tangent, $\tanh(\cdot)$. Noting $o_j^{(l)}(k)$ as the output of the j-th neuron of the hidden layer, the rule's output is given by the following:

$$y^{(l)}(k) = \tanh\left(\sum_{j=1}^{H} v_j^{(l)} o_j^{(l)}(k) + b^{(l)}\right). \qquad (5)$$
- ➢ The hidden layer comprises neurons with unit output feedback connections, as presented in Figure 1. This particular type of internal feedback is called local output feedback and does not include external feedback connections of the rule's output neuron or of the model's output (global output feedback). The bias terms are noted as $b_j^{(l)}$, $j = 1, \ldots, H$, $l = 1, \ldots, R$, and the feedback weights are noted as $d_j^{(l)}$. The outputs of the hidden neurons are extracted as follows:

$$o_j^{(l)}(k) = f\left(\sum_{i=1}^{m} w_{ji}^{(l)} x_i(k) + d_j^{(l)} o_j^{(l)}(k-1) + b_j^{(l)}\right), \quad j = 1, \ldots, H, \qquad (6)$$

where $f(\cdot)$ denotes the hidden neurons' activation function.
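As an illustration of the consequent computation described above, one rule's recurrent network can be sketched as follows; the hidden activation is assumed to be tanh here (the paper specifies tanh only for the output neuron), and all names are illustrative:

```python
import numpy as np

def rule_consequent_forward(x_seq, W_in, w_fb, b_hid, v_out, b_out):
    """Forward pass of one rule's recurrent consequent network.

    x_seq : (N, m) input sequence.
    W_in  : (H, m) input-to-hidden weights.
    w_fb  : (H,) unit local-output-feedback weights.
    b_hid : (H,) hidden biases.
    v_out : (H,) hidden-to-output weights.
    b_out : scalar output bias.
    """
    H = W_in.shape[0]
    o_prev = np.zeros(H)            # hidden outputs at step k-1
    y = np.empty(len(x_seq))
    for k, x in enumerate(x_seq):
        # each hidden neuron feeds back only its OWN previous output (local output feedback)
        o = np.tanh(W_in @ x + w_fb * o_prev + b_hid)
        y[k] = np.tanh(v_out @ o + b_out)   # static output neuron
        o_prev = o
    return y
```

Note that the only state carried across time steps is the vector of the hidden neurons' previous outputs; no external or global feedback path exists.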
The DNFWP's output is the output of the defuzzification part. This part is static, since it averages the rules' outputs, weighted by the values of the respective degrees of fulfillment:

$$\hat{y}(k) = \frac{\sum_{l=1}^{R} \mu^{(l)}\left(\mathbf{x}(k)\right) y^{(l)}(k)}{\sum_{l=1}^{R} \mu^{(l)}\left(\mathbf{x}(k)\right)}. \qquad (7)$$

The block diagram of DNFWP is presented in Figure 2.
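The defuzzification step is a plain weighted average, which can be sketched as (illustrative helper, names not from the paper):

```python
import numpy as np

def defuzzify(firing, rule_outputs):
    """Fuzzy-mean defuzzification: rules' outputs weighted by their degrees of fulfillment."""
    firing = np.asarray(firing, dtype=float)
    rule_outputs = np.asarray(rule_outputs, dtype=float)
    return float(np.dot(firing, rule_outputs) / firing.sum())
```

A rule that fires strongly dominates the average, while weakly activated rules contribute proportionally less.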
As can be seen from Figure 1 and Figure 2, the proposed architecture is a locally recurrent fuzzy neural network, belonging to a larger group of recurrent neuro-fuzzy models [48,49]. The particular architecture, however, is clearly distinguished from models with external output feedback [50] or recurrent premise parts [51]: in those cases, the feedback connections interconnect the parts of the fuzzy system in multiple ways, altering both the local-identification approach of the classic TSK model and its interpretability. On the contrary, the proposed scheme retains the discrete connectivity of the rules, which can be regarded as local subsystems cooperating via the defuzzification part. At each rule, the hypercell formed by the respective fuzzy sets along each input axis constitutes an operating region, in which the neural network of the consequent part applies the potential of its recurrent nature in order to identify the temporal dependencies of the wind power time series. Since the fuzzy rules overlap through their premise parts, the recurrent consequent parts cooperate intrinsically and provide an enhanced input–output mapping.
The nature and enhanced modeling capabilities of recurrent neural networks with local output feedback were explored in [44,45], following their initial application in a fuzzy system described in [52]. That implementation, however, was computationally intensive, since it included (a) complex synapses in the recurrent neurons, feeding the input of the neuron with multiple delayed output values, and (b) dynamic neurons at the output layer of the antecedent part of each rule as well. In contrast, [46] demonstrated that even simple neural networks using unit feedback in the hidden neurons can effectively perform system identification tasks with much lower computational demands. Therefore, the proposed DNFWP aims to serve as an economical wind power predictor, especially when compared to Deep Learning-based predictors, as will be illustrated in the sequel.
3. The Model-Building Process
The model-building process aims at determining the structure of DNFWP, including (a) the number and kind of inputs, (b) the number of fuzzy rules and their operating regions, and (c) the size of the neural networks’ hidden layers. It also includes the consequent parameter tuning, where the weights of the neural networks form the consequent parameter vector.
As far as input selection is concerned, in the present work, wind power modeling based on a single input is attempted; therefore, there is no need to employ a feature selection mechanism. The issue of forming the fuzzy rule base is closely related to the aim of producing an economical predictor with a reduced size compared to the Computational Intelligence-based systems existing in the literature. Therefore, the FCM clustering algorithm is applied [53,54] in order to identify the areas of discrete concentrations of data and, consequently, determine the most appropriate centers of the fuzzy sets pertaining to the premise parts of the rules. In this way, a moderate and concise rule base is built. According to FCM, if R is the size of the rule base (i.e., the number of clusters), then for a single-dimensional dataset (as is the case for DNFWP) containing N samples, each input sample, $x(k)$, does not belong to a single cluster but to all clusters, up to a certain degree, $u_l(k)$, where l is the cluster index:

$$0 \le u_l(k) \le 1, \quad \sum_{l=1}^{R} u_l(k) = 1, \quad k = 1, \ldots, N. \qquad (8)$$
The cluster centers are extracted by the following equation:

$$m^{(l)} = \frac{\sum_{k=1}^{N} \left(u_l(k)\right)^{c} x(k)}{\sum_{k=1}^{N} \left(u_l(k)\right)^{c}}, \quad l = 1, \ldots, R, \qquad (9)$$

where c > 1 is a scale parameter controlling the fuzziness introduced to the algorithm.
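For reference, the standard FCM iteration for the single-dimensional case can be sketched as follows; the linspace initialization, iteration count, and the small 1e-12 guard against zero distances are implementation choices, not values from the paper:

```python
import numpy as np

def fcm_1d(x, R, c=2.0, iters=100):
    """Standard Fuzzy C-Means on 1-D data; returns cluster centers and memberships."""
    x = np.asarray(x, dtype=float)
    centers = np.linspace(x.min(), x.max(), R)   # simple deterministic initialization
    for _ in range(iters):
        d = np.abs(x[None, :] - centers[:, None]) + 1e-12     # (R, N) distances
        # membership of sample k in cluster l: u_lk = 1 / sum_j (d_lk / d_jk)^(2/(c-1))
        u = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2.0 / (c - 1.0)), axis=1)
        uc = u ** c
        centers = uc @ x / uc.sum(axis=1)                     # fuzzy-weighted means
    return centers, u
```

On well-separated data the centers settle close to the crisp cluster means, which then serve as the means of the premise-part Gaussians.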
The centers in Equation (9) constitute the mean values of the Gaussian-type membership functions of the premise part's fuzzy sets. The standard deviations are considered common to all rules comprising the fuzzy rule base and are derived as follows [55]:
As mentioned above, FCM is based on the assumption that the number of clusters is determined in advance, meaning that the size of the fuzzy rule base must be set beforehand. To do so, the Davies–Bouldin cluster validity index [56] is applied as an internal validation measure, taking into consideration both the compactness of the clusters and the separation of each cluster from the others. The lower the value of the index, the higher the compactness and separation of the resulting clusters.
The aforementioned model-building scheme results in the determination of the size of the rule base and the extraction of the values of the fitting parameters of the premise part's fuzzy sets. As far as the size of the recurrent neural networks in the consequent parts of the fuzzy rules is concerned, the number of hidden neurons is determined by trial and error, while the consequent parameters remain to be tuned.
Due to the feedback connections of the neural networks that form the consequent parts of the fuzzy rules, there exist temporal relations that need to be taken into consideration by the training algorithm. Therefore, consequent parameter tuning is performed using the DRPROP training method introduced in [46], which is based on Resilient Propagation (RPROP) [57]; RPROP was devised for static structures, but it has been modified here to apply to dynamic structures of fuzzy rules with internal feedback. As will be shown in the sequel, it overcomes the problems inherent to standard gradient-based training algorithms.
Let w be any of the consequent synaptic weights in Equations (5) and (6), and let E be the error function built from the errors between the actual wind power values, $y_d(k)$, and the predictor's output. The Mean Square Error (MSE) is selected as E:

$$E = \frac{1}{N} \sum_{k=1}^{N} \left(y_d(k) - \hat{y}(k)\right)^2. \qquad (11)$$
$\frac{\partial E}{\partial w}(t)$ and $\frac{\partial E}{\partial w}(t-1)$ are the partial derivatives of E with respect to w at the present and the previous iterations, t and t − 1, respectively. At each iteration, each weight is updated by an amount $\Delta_w(t)$, called the weight update factor (WUF):

$$w(t+1) = w(t) - \operatorname{sign}\!\left(\frac{\partial E}{\partial w}(t)\right) \Delta_w(t), \qquad (12)$$

using the update scheme of Table 1:
Table 1. The DRPROP weight update scheme.
- (a) Initialize the WUF for all the consequent weights: $\Delta_w(0) = \Delta_0$.
- Repeat
- (b) For each w, compute $\frac{\partial E}{\partial w}(t)$.
- (c) For each w, update its step size:
  (c.1) If $\frac{\partial E}{\partial w}(t) \cdot \frac{\partial E}{\partial w}(t-1) > 0$, then $\Delta_w(t) = \min\left(\eta^{+} \Delta_w(t-1),\; \Delta_{\max}\right)$.
  (c.2) Else if $\frac{\partial E}{\partial w}(t) \cdot \frac{\partial E}{\partial w}(t-1) < 0$, then $\Delta_w(t) = \max\left(\eta^{-} \Delta_w(t-1),\; \Delta_{\min}\right)$; if $E(t) > E(t-1)$, set $\frac{\partial E}{\partial w}(t) = 0$.
  (c.3) Else $\Delta_w(t) = \Delta_w(t-1)$.
- Until convergence
It becomes evident that in DRPROP the magnitude of the error gradient does not directly affect the weight update; thus, the disadvantages of gradient-based methods, whose steps are tied to the gradient values, do not appear in this learning method. Moreover, each WUF is calculated based on the behavior of the sign of the error gradient, allowing a more rapid and efficient search of the weight space. In the initial stages of training, where the reduction in the error value is usually significant, the gradients do not change sign in consecutive iterations, indicating that large updates are appropriate. Step (c.1) is activated and the WUF increases by the factor $\eta^{+} > 1$, thus accelerating learning. At later stages, step (c.2) takes charge, decreasing the WUF by the factor $\eta^{-} < 1$, with the aim of avoiding oscillations. Additionally, the WUF has an upper bound, $\Delta_{\max}$, in order to avoid missing minima, and a lower bound, $\Delta_{\min}$, to prevent learning from stalling.
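The sign-based update logic can be sketched for a single weight as follows; the hyperparameter values are typical RPROP defaults, not values from the paper:

```python
import numpy as np

ETA_PLUS, ETA_MINUS = 1.2, 0.5      # WUF increase/decrease factors
DELTA_MAX, DELTA_MIN = 50.0, 1e-6   # upper/lower bounds on the WUF

def rprop_step(w, grad, grad_prev, delta):
    """One RPROP-style update of a single weight and its weight update factor."""
    s = grad * grad_prev
    if s > 0:                                   # same gradient sign: accelerate
        delta = min(delta * ETA_PLUS, DELTA_MAX)
    elif s < 0:                                 # sign change: a minimum was overshot
        delta = max(delta * ETA_MINUS, DELTA_MIN)
        grad = 0.0                              # suppress an update in the next iteration
    w = w - np.sign(grad) * delta               # only the SIGN of the gradient matters
    return w, grad, delta

# Minimizing E(w) = w^2 (gradient 2w) starting from w = 5:
w, grad_prev, delta = 5.0, 0.0, 0.1
for _ in range(100):
    w, grad_prev, delta = rprop_step(w, 2.0 * w, grad_prev, delta)
```

Note that the gradient magnitude never enters the step size, which is exactly the property the paragraph above attributes to DRPROP.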
Since the output layer of the network is static, the derivatives are easily extracted using the standard chain rule, where $\tanh'(\cdot)$ denotes the derivative of the output neuron's activation function with respect to its argument.
Due to the feedback connections of the hidden layer's neurons, their operation must be unfolded in time. Thus, ordered derivatives are calculated [58], using Lagrange multipliers [59]. In the following, the ordered derivatives for the single-input case are calculated: Equation (18) is a backward difference equation, subject to the boundary condition given in Equation (19). Equation (19) is evaluated first, and then the Lagrange multipliers are extracted via Equation (18), proceeding backward in time through the remaining steps. In Equations (15)–(19), $f'(\cdot)$ denotes the derivative of the hidden neurons' activation function with respect to its argument.
From the above, it is concluded that the parameter tuning of the consequent weights is an iterative procedure, where at each iteration, firstly, the updated WUFs are calculated through the DRPROP algorithm. Next, the new values of the weights are extracted by Equation (12), and the MSE is derived and compared to a predefined threshold. If the threshold is attained, the learning ends successfully; otherwise, the algorithm proceeds to the next iteration.
Table 2 hosts the DNFWP’s parameters.
In conclusion, the process of building DNFWP can be divided into two phases, as shown in the flowchart of Figure 3: in the first phase, a data cleansing scheme is applied to the available raw data, where missing and irregular values are handled and the whole dataset is split into a training set and a testing set. The first part of the second phase is dedicated to forming the rule base and calculating the fitting parameters of the premise parts' fuzzy sets, while in the second part, the DRPROP algorithm tunes the weights of the consequent recurrent neural networks.
5. Conclusions
A recurrent fuzzy neural model for wind power prediction has been presented. The predictor integrates fuzzy logic with recurrent neural networks in the consequent part of each fuzzy rule. Unlike conventional approaches, DNFWP employs a single-input framework and relies on its dynamic internal feedback structure to capture the temporal dependencies inherent in wind power data.
DNFWP was applied to two datasets—a smaller, high-resolution Kaggle dataset and a significantly larger Greek national dataset—demonstrating effective forecasting accuracy in both cases. The model's performance was evaluated using three metrics (RMSE, MAE, and R²), and DNFWP delivered results comparable to or better than state-of-the-art Machine Learning and Deep Learning approaches, including LSTM and GRU models. Moreover, it achieved this while requiring significantly fewer parameters, thus offering an alternative with reduced complexity.
Key contributions and findings of this study include the following:
- The elimination of the feature selection stage: due to the single-input architecture, no feature selection is needed, thus simplifying preprocessing without sacrificing prediction quality.
- Effective modeling of temporal dependencies using localized recurrent neural networks embedded in the consequent parts of the fuzzy rules. The strong generalization performance on both moderate and large-scale datasets confirms the robustness of DNFWP across diverse real-world scenarios.
- An overall modular structure that retains the interpretability and transparency of fuzzy systems, while benefiting from the approximation power of neural networks.
In conclusion, DNFWP constitutes a promising tool for wind power prediction, particularly in applications requiring low computational complexity, interpretability, and reliable short-term forecasts without the need for extensive feature engineering. Future research could include the establishment of an automated model-building process using evolutionary computation algorithms and a multiple-step-ahead prediction strategy.