1. Introduction
Wind power capacity has become a critical issue in recent years, as the shift from fossil fuels to renewable energy sources has become an essential demand of developed societies. Both the U.S. and the European Union are making significant efforts to transform their economies into green ones, while simultaneously requiring ever greater amounts of energy to cover energy-intensive sectors such as computing and internet infrastructure [1,2]. For example, in 2023, global wind power capacity was reported to have reached 1047 GW [3]. Accurate wind prediction is a major challenge for operators of Energy Management Systems (EMS), since reliable predictions are crucial inputs to an EMS's fundamental operating functions [4].
In recent years, there has been growing interest in developing reliable methods for wind power forecasting, primarily driven by the increased availability of relevant data [5,6,7,8,9]. For short-term predictions, traditional methods, such as the autoregressive moving average (ARMA) model, have been explored [10,11]. Their advantage is simplicity in model building; however, they perform less effectively when wind power variations are irregular. The key challenges in wind power data are nonlinearity, nonstationarity, and high dimensionality. To tackle these challenges, Machine Learning- and Computational Intelligence-based models stepped into the spotlight: in an attempt to handle the data effectively, a two-stage method was proposed in [12], where wind time-series data are first processed using wavelet decomposition, followed by a prediction phase implemented by an adaptive wavelet neural network (AWNN). Another neural network approach, based on radial basis functions, was presented in [13], with promising results. As far as the fuzzy pillar of Computational Intelligence is concerned, a number of static fuzzy systems were proposed as more interpretable alternatives to the black-box behavior of neural networks [14,15].
Additional studies include traditional Machine Learning methods, such as support vector machines (SVM), support vector regression (SVR), and random forests (RF) [16,17]. In an attempt to mine the information in temporal dependencies, data-driven Machine Learning methods were proposed, including lagged-ensemble Machine Learning [18,19] and dynamic principal component regression [20], which have shown satisfactory results.
Further improvements were accomplished with the introduction of Gaussian Process Regression (GPR) [21], a nonparametric kernel-based model, as well as Ensemble Learning approaches (Boosted Trees (BT), Bagged Regression Trees, and eXtreme Gradient Boosting (XGBoost)) [22,23,24]. These developments have firmly established Machine Learning as a key contributor to the field [25,26,27].
Over the past decade, Deep Learning has been revolutionizing Computational Intelligence, enriching its application fields and taking advantage of the explosion in the computational resources of modern computing systems. Consequently, Deep Learning predictors became a key tool for wind power prediction. All the established deep models (Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), recurrent neural networks, Convolutional Neural Networks (CNNs), and transformers) [28,29,30,31,32,33,34], as well as alternative ones like Temporal Collaborative Attention and Deep Graph Attention Networks [35,36], have been proposed to address this demanding problem, and considerable advances have been accomplished.
One of the most significant issues in time-series prediction problems is the selection of an appropriate input vector from an extensive pool of candidate inputs. In the case of wind power generation, where dependencies on temporal and spatial factors such as weather conditions, climate variables, and seasonal patterns exist, the feature selection step is crucial for the predictor to be effective. From this perspective, the Dynamic Neuro-fuzzy Wind Predictor (DNFWP) is proposed as a wind power predictor. The model has the form of a typical fuzzy system, with neural components serving as the fuzzy rules' consequent parts. These are three-layer neural networks, with a hidden layer consisting of neurons with internal output feedback, thus introducing dynamics to the overall model. No external feedback connections exist, maintaining the classic structure of fuzzy systems, with the rules operating separately from each other and interconnected through the defuzzification part. A single input is used, in an attempt to alleviate the aforementioned feature selection issues, as well as to examine whether the time series can be identified without using external variables linked to weather conditions.
Dynamic Resilient Propagation (DRPROP) is selected as the tuning algorithm. It is a learning scheme that aims to alleviate the disadvantages of methods based on gradient descent, and it takes into consideration the recurrent connections of the proposed predictor.
The rest of this paper is organized into four sections: Section 2 outlines the architecture of DNFWP, while Section 3 illustrates the model-building process. Section 4 hosts two experimental examples and a comparative analysis with well-known forecasters, highlighting the proposed model's characteristics and forecasting capability. Finally, Section 5 summarizes the conclusions.
2. The Structure of Dynamic Neuro-Fuzzy Wind Predictor
The fuzzy rule base is the core of a fuzzy system with m inputs and a single output, comprising rules expressed as conditional statements:

IF $x_1(k)$ is $A_1$ AND $\ldots$ AND $x_m(k)$ is $A_m$ THEN $y(k)$ is $B$, (1)

where $\mathbf{x}(k) = \left[x_1(k), \ldots, x_m(k)\right]$ is the input vector, k represents the sample (time step) index, and $A_j$ and $B$ correspond to the fuzzy sets of the j-th input and the output, respectively.
The introduction of the Takagi–Sugeno–Kang (TSK) fuzzy rule [37], in which the consequent part is expressed as a linear combination of the input variables, marked a significant advancement over classic fuzzy systems by enhancing their learning capabilities. The rule can be considered a subsystem of the overall fuzzy system, operating in the region where the membership function of the premise part reaches high degrees of membership. In such regions, the consequent part is responsible for modeling the desired output sequence. From this perspective, the linear polynomial in the consequent can be replaced by any continuous and differentiable function, $f(\cdot)$, offering greater modeling potential. Direct implementations of this idea are fuzzy polynomial systems and the CANFIS model, with a wide variety of applications [38,39,40]. Stemming from the fact that the inclusion of feedback connections has proven effective for dynamic systems, several fuzzy models have been developed with recurrent components in their consequents [41,42,43].
From this perspective, and taking into consideration that neural networks are universal approximators (particularly recurrent networks, which are effective in capturing a model's dynamics [44,45,46]), the present approach integrates a recurrent neural network into the consequent part of each fuzzy rule. This type of fuzzy rule was initially proposed in [46] for modeling telecommunications data and later demonstrated effective results in electric load forecasting [47]. The operation of DNFWP, with a rule base comprising R fuzzy rules, is described as follows:
The fuzzy rules have premise parts, which include the fuzzy sets that correspond to the inputs and are linked via the AND operator, as depicted in Equation (1). Therefore, the premise part of the l-th rule can be regarded as an m-dimensional hypercell:

$$A^{(l)} = A_1^{(l)} \times A_2^{(l)} \times \cdots \times A_m^{(l)}, \quad l = 1, \ldots, R, \qquad (2)$$

composed of m single-dimensional membership functions, for which the Gaussian type is selected:

$$\mu_{A_i^{(l)}}\left(x_i(k)\right) = \exp\left(-\frac{\left(x_i(k) - m_i^{(l)}\right)^2}{2\left(\sigma_i^{(l)}\right)^2}\right). \qquad (3)$$

The hypercell in Equation (2) determines the degree of fulfillment of the l-th rule, i.e., the degree to which the particular rule is activated by the input vector $\mathbf{x}(k)$. The Gaussian membership function has two adaptable parameters: the mean value, $m_i^{(l)}$, and the standard deviation, $\sigma_i^{(l)}$, regarding the i-th input axis. Thus, each rule has two parameter vectors for the premise part: $\mathbf{m}^{(l)} = \left[m_1^{(l)}, \ldots, m_m^{(l)}\right]$ and $\boldsymbol{\sigma}^{(l)} = \left[\sigma_1^{(l)}, \ldots, \sigma_m^{(l)}\right]$. The degree of fulfillment is calculated as the algebraic product of the membership functions:

$$\mu^{(l)}\left(\mathbf{x}(k)\right) = \prod_{i=1}^{m} \mu_{A_i^{(l)}}\left(x_i(k)\right). \qquad (4)$$

The premise part is static and is fed with the input vector at the current time step, k.
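To make the premise-part computation concrete, the following is a minimal NumPy sketch of the Gaussian membership functions and the algebraic-product degree of fulfillment; the function names and the example parameter values are illustrative, not taken from the paper:

```python
import numpy as np

def gaussian_mf(x, mean, sigma):
    """Single-dimensional Gaussian membership function."""
    return np.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))

def firing_strengths(x, means, sigma):
    """Degree of fulfillment of each rule for input sample x.

    means: array of shape (R, m), one Gaussian center per rule and input axis.
    sigma: common standard deviation (assumed shared here for simplicity).
    Returns R firing strengths (algebraic product over the m input axes).
    """
    return np.prod(gaussian_mf(x[None, :], means, sigma), axis=1)

# Example: 3 rules, single input (m = 1), as in DNFWP
means = np.array([[0.2], [0.5], [0.8]])
mu = firing_strengths(np.array([0.5]), means, sigma=0.15)
```

An input lying exactly on a rule's center activates that rule fully, while rules with distant centers fire only weakly, which is what lets each consequent network specialize in its own operating region.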
The consequent part of each rule is a three-layer neural network, with H neurons in the hidden layer.
- ➢ The input layer is static and simply forwards the weighted input vector to the hidden layer, as shown in Figure 1. The synaptic weights are noted as $w_{ji}^{(l)}$, $j = 1, \ldots, H$, $i = 1, \ldots, m$, $l = 1, \ldots, R$.
- ➢ The output layer is also static, containing a single neuron that receives the weighted outputs of the hidden layer's neurons. The synaptic weights connecting the hidden layer with the output layer are noted as $v_j^{(l)}$, $j = 1, \ldots, H$, $l = 1, \ldots, R$, while $b^{(l)}$ are the weights of the bias terms. The activation function of the output layer's neuron is the hyperbolic tangent, $\tanh(\cdot)$. Noting $o_j^{(l)}(k)$ as the output of the j-th neuron of the hidden layer, the rule's output is given by the following:

$$y^{(l)}(k) = \tanh\left(\sum_{j=1}^{H} v_j^{(l)} o_j^{(l)}(k) + b^{(l)}\right). \qquad (5)$$
- ➢ The hidden layer comprises neurons with unit output feedback connections, as presented in Figure 1. This particular type of internal feedback is called local output feedback and does not include external feedback connections of the rule's output neuron or of the model's output (global output feedback). The bias terms are noted as $b_j^{(l)}$, $j = 1, \ldots, H$, $l = 1, \ldots, R$, and the feedback weights are noted as $d_j^{(l)}$. The outputs of the hidden neurons are extracted as follows:

$$o_j^{(l)}(k) = f\left(\sum_{i=1}^{m} w_{ji}^{(l)} x_i(k) + d_j^{(l)} o_j^{(l)}(k-1) + b_j^{(l)}\right), \quad j = 1, \ldots, H, \qquad (6)$$

where $f(\cdot)$ denotes the hidden neurons' activation function.
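As an illustration of the consequent computation described above, one rule's recurrent network can be sketched as follows; the hidden activation is assumed to be tanh here (the paper specifies tanh only for the output neuron), and all names are illustrative:

```python
import numpy as np

def rule_consequent_forward(x_seq, W_in, w_fb, b_hid, v_out, b_out):
    """Forward pass of one rule's recurrent consequent network.

    x_seq : (N, m) input sequence.
    W_in  : (H, m) input-to-hidden weights.
    w_fb  : (H,) unit local-output-feedback weights.
    b_hid : (H,) hidden biases.
    v_out : (H,) hidden-to-output weights.
    b_out : scalar output bias.
    """
    H = W_in.shape[0]
    o_prev = np.zeros(H)            # hidden outputs at step k-1
    y = np.empty(len(x_seq))
    for k, x in enumerate(x_seq):
        # each hidden neuron feeds back only its OWN previous output (local output feedback)
        o = np.tanh(W_in @ x + w_fb * o_prev + b_hid)
        y[k] = np.tanh(v_out @ o + b_out)   # static output neuron
        o_prev = o
    return y
```

Note that the only state carried across time steps is the vector of the hidden neurons' previous outputs; no external or global feedback path exists.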
The DNFWP's output is the output of the defuzzification part. This part is static, since it averages the rules' outputs, weighted by the values of the respective degrees of fulfillment:

$$\hat{y}(k) = \frac{\sum_{l=1}^{R} \mu^{(l)}\left(\mathbf{x}(k)\right) y^{(l)}(k)}{\sum_{l=1}^{R} \mu^{(l)}\left(\mathbf{x}(k)\right)}. \qquad (7)$$

The block diagram of DNFWP is presented in Figure 2.
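The defuzzification step is a plain weighted average, which can be sketched as (illustrative helper, names not from the paper):

```python
import numpy as np

def defuzzify(firing, rule_outputs):
    """Fuzzy-mean defuzzification: rules' outputs weighted by their degrees of fulfillment."""
    firing = np.asarray(firing, dtype=float)
    rule_outputs = np.asarray(rule_outputs, dtype=float)
    return float(np.dot(firing, rule_outputs) / firing.sum())
```

A rule that fires strongly dominates the average, while weakly activated rules contribute proportionally less.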
As can be seen from Figure 1 and Figure 2, the proposed architecture is a locally recurrent fuzzy neural network, belonging to a larger group of recurrent neuro-fuzzy models [48,49]. The particular architecture, however, is clearly distinguished from models with external output feedback [50] or recurrent premise parts [51]: in those cases, the feedback connections interconnect the parts of the fuzzy system in multiple ways, altering both the local-identification approach of the classic TSK model and its interpretability. On the contrary, the proposed scheme retains the discrete connectivity of the rules, which can be regarded as local subsystems cooperating via the defuzzification part. At each rule, the hypercell formed by the respective fuzzy sets along each input axis constitutes an operating region, in which the neural network of the consequent part applies the potential of its recurrent nature in order to identify the temporal dependencies of the wind power time series. Since the fuzzy rules overlap through their premise parts, the recurrent consequent parts cooperate intrinsically and provide an enhanced input–output mapping.
The nature and enhanced modeling capabilities of recurrent neural networks with local output feedback were explored in [44,45], following their initial application in a fuzzy system described in [52]. That implementation, however, was computationally intensive, since it included (a) complex synapses in the recurrent neurons, feeding the input of the neuron with multiple delayed output values, and (b) dynamic neurons at the output layer of the antecedent part of each rule as well. In contrast, [46] demonstrated that even simple neural networks using unit feedback in the hidden neurons can effectively perform system identification tasks with much lower computational demands. Therefore, the proposed DNFWP aims to serve as an economical wind power predictor, especially when compared to Deep Learning-based predictors, as will be illustrated in the sequel.
3. The Model-Building Process
The model-building process aims at determining the structure of DNFWP, including (a) the number and kind of inputs, (b) the number of fuzzy rules and their operating regions, and (c) the size of the neural networks’ hidden layers. It also includes the consequent parameter tuning, where the weights of the neural networks form the consequent parameter vector.
As far as input selection is concerned, in the present work, wind power modeling based on a single input is attempted; therefore, there is no need to employ a feature selection mechanism. The issue of forming the fuzzy rule base is closely related to the aim of producing an economical predictor with a reduced size compared to the Computational Intelligence-based systems existing in the literature. Therefore, the FCM clustering algorithm is applied [53,54] in order to identify the areas of discrete concentrations of data and, consequently, determine the most appropriate centers of the fuzzy sets pertaining to the premise parts of the rules. In this way, a moderate and concise rule base is built. According to FCM, if R is the size of the rule base (i.e., the number of clusters), then for a single-dimensional dataset (as is the case for DNFWP) containing N samples, each input sample, $x(k)$, does not belong to a single cluster but to all clusters, up to a certain degree, $u_l(k)$, where l is the cluster index:

$$0 \le u_l(k) \le 1, \quad \sum_{l=1}^{R} u_l(k) = 1, \quad k = 1, \ldots, N. \qquad (8)$$
The cluster centers are extracted by the following equation:

$$m^{(l)} = \frac{\sum_{k=1}^{N} \left(u_l(k)\right)^{c} x(k)}{\sum_{k=1}^{N} \left(u_l(k)\right)^{c}}, \quad l = 1, \ldots, R, \qquad (9)$$

where c > 1 is a scale parameter controlling the fuzziness introduced to the algorithm.
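For reference, the standard FCM iteration for the single-dimensional case can be sketched as follows; the linspace initialization, iteration count, and the small 1e-12 guard against zero distances are implementation choices, not values from the paper:

```python
import numpy as np

def fcm_1d(x, R, c=2.0, iters=100):
    """Standard Fuzzy C-Means on 1-D data; returns cluster centers and memberships."""
    x = np.asarray(x, dtype=float)
    centers = np.linspace(x.min(), x.max(), R)   # simple deterministic initialization
    for _ in range(iters):
        d = np.abs(x[None, :] - centers[:, None]) + 1e-12     # (R, N) distances
        # membership of sample k in cluster l: u_lk = 1 / sum_j (d_lk / d_jk)^(2/(c-1))
        u = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2.0 / (c - 1.0)), axis=1)
        uc = u ** c
        centers = uc @ x / uc.sum(axis=1)                     # fuzzy-weighted means
    return centers, u
```

On well-separated data the centers settle close to the crisp cluster means, which then serve as the means of the premise-part Gaussians.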
The centers in Equation (9) constitute the mean values of the Gaussian-type membership functions of the premise part's fuzzy sets. The standard deviations are considered common to all rules comprising the fuzzy rule base and are derived as follows [55]:
As mentioned above, FCM is based on the assumption that the number of clusters is determined in advance, meaning that the size of the fuzzy rule base must be set beforehand. To do so, the Davies–Bouldin cluster validity index [56] is applied as an internal validation measure, taking into consideration both the compactness of the clusters and the separation of each cluster from the others. The lower the value of the index, the higher the compactness and separation of the resulting clusters.
The aforementioned model-building scheme results in the determination of the size of the rule base and the extraction of the values of the fitting parameters of the premise part's fuzzy sets. As far as the size of the recurrent neural networks in the consequent parts of the fuzzy rules is concerned, the number of hidden neurons is determined by trial and error, while the consequent parameters remain to be tuned.
Due to the feedback connections of the neural networks that form the consequent parts of the fuzzy rules, there exist temporal relations that need to be taken into consideration by the training algorithm. Therefore, consequent parameter tuning is performed using the DRPROP training method introduced in [46], which is based on Resilient Propagation (RPROP) [57]; RPROP was devised for static structures, but it has been modified here to apply to dynamic structures of fuzzy rules with internal feedback. As will be shown in the sequel, it overcomes the problems inherent to standard gradient-based training algorithms.
Let w be any of the consequent synaptic weights in Equations (5) and (6), and let E be the error function built from the errors between the actual wind power values, $y_d(k)$, and the predictor's output. The Mean Square Error (MSE) is selected as E:

$$E = \frac{1}{N} \sum_{k=1}^{N} \left(y_d(k) - \hat{y}(k)\right)^2. \qquad (11)$$
$\frac{\partial E}{\partial w}(t)$ and $\frac{\partial E}{\partial w}(t-1)$ are the partial derivatives of E with respect to w at the present and the previous iterations, t and t − 1, respectively. At each iteration, each weight is updated by an amount $\Delta_w(t)$, called the weight update factor (WUF):

$$w(t+1) = w(t) - \operatorname{sign}\!\left(\frac{\partial E}{\partial w}(t)\right) \Delta_w(t), \qquad (12)$$

using the update scheme of Table 1:
Table 1. The DRPROP weight update scheme.
- (a) Initialize the WUF for all the consequent weights: $\Delta_w(0) = \Delta_0$.
- Repeat
- (b) For each w, compute $\frac{\partial E}{\partial w}(t)$.
- (c) For each w, update its step size:
  (c.1) If $\frac{\partial E}{\partial w}(t) \cdot \frac{\partial E}{\partial w}(t-1) > 0$, then $\Delta_w(t) = \min\left(\eta^{+} \Delta_w(t-1),\; \Delta_{\max}\right)$.
  (c.2) Else if $\frac{\partial E}{\partial w}(t) \cdot \frac{\partial E}{\partial w}(t-1) < 0$, then $\Delta_w(t) = \max\left(\eta^{-} \Delta_w(t-1),\; \Delta_{\min}\right)$; if $E(t) > E(t-1)$, set $\frac{\partial E}{\partial w}(t) = 0$.
  (c.3) Else $\Delta_w(t) = \Delta_w(t-1)$.
- Until convergence
It becomes evident that in DRPROP the magnitude of the error gradient does not directly affect the weight update; thus, the disadvantages of gradient-based methods, whose steps are tied to the gradient values, do not appear in this learning method. Moreover, each WUF is calculated based on the behavior of the sign of the error gradient, allowing a more rapid and efficient search of the weight space. In the initial stages of training, where the reduction in the error value is usually significant, the gradients do not change sign in consecutive iterations, indicating that large updates are appropriate. Step (c.1) is activated and the WUF increases by the factor $\eta^{+} > 1$, thus accelerating learning. At later stages, step (c.2) takes charge, decreasing the WUF by the factor $\eta^{-} < 1$, with the aim of avoiding oscillations. Additionally, the WUF has an upper bound, $\Delta_{\max}$, in order to avoid missing minima, and a lower bound, $\Delta_{\min}$, to prevent learning from stalling.
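The sign-based update logic can be sketched for a single weight as follows; the hyperparameter values are typical RPROP defaults, not values from the paper:

```python
import numpy as np

ETA_PLUS, ETA_MINUS = 1.2, 0.5      # WUF increase/decrease factors
DELTA_MAX, DELTA_MIN = 50.0, 1e-6   # upper/lower bounds on the WUF

def rprop_step(w, grad, grad_prev, delta):
    """One RPROP-style update of a single weight and its weight update factor."""
    s = grad * grad_prev
    if s > 0:                                   # same gradient sign: accelerate
        delta = min(delta * ETA_PLUS, DELTA_MAX)
    elif s < 0:                                 # sign change: a minimum was overshot
        delta = max(delta * ETA_MINUS, DELTA_MIN)
        grad = 0.0                              # suppress an update in the next iteration
    w = w - np.sign(grad) * delta               # only the SIGN of the gradient matters
    return w, grad, delta

# Minimizing E(w) = w^2 (gradient 2w) starting from w = 5:
w, grad_prev, delta = 5.0, 0.0, 0.1
for _ in range(100):
    w, grad_prev, delta = rprop_step(w, 2.0 * w, grad_prev, delta)
```

Note that the gradient magnitude never enters the step size, which is exactly the property the paragraph above attributes to DRPROP.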
Since the output layer of the network is static, the derivatives are easily extracted using the standard chain rule, where $\tanh'(\cdot)$ denotes the derivative of the output neuron's activation function with respect to its argument.
Due to the feedback connections of the hidden layer's neurons, their operation must be unfolded in time. Thus, ordered derivatives are calculated [58], using Lagrange multipliers [59]. In the following, the ordered derivatives for the single-input case are calculated: Equation (18) is a backward difference equation, subject to the boundary condition given in Equation (19). Equation (19) is evaluated first, and then the Lagrange multipliers are extracted via Equation (18), proceeding backward in time through the remaining steps. In Equations (15)–(19), $f'(\cdot)$ denotes the derivative of the hidden neurons' activation function with respect to its argument.
From the above, it is concluded that the parameter tuning of the consequent weights is an iterative procedure, where at each iteration, firstly, the updated WUFs are calculated through the DRPROP algorithm. Next, the new values of the weights are extracted by Equation (12), and the MSE is derived and compared to a predefined threshold. If the threshold is attained, the learning ends successfully; otherwise, the algorithm proceeds to the next iteration.
Table 2 hosts the DNFWP’s parameters.
In conclusion, the process of building DNFWP can be divided into two phases, as shown in the flowchart of Figure 3: in the first phase, a data cleansing scheme is applied to the available raw data, where missing and irregular values are handled and the whole dataset is split into a training set and a testing set. The first part of the second phase is dedicated to forming the rule base and calculating the fitting parameters of the premise parts' fuzzy sets, while in the second part, the DRPROP algorithm tunes the weights of the consequent recurrent neural networks.
5. Conclusions
A recurrent fuzzy neural model for wind power prediction has been presented. The predictor integrates fuzzy logic with recurrent neural networks in the consequent part of each fuzzy rule. Unlike conventional approaches, DNFWP employs a single-input framework and relies on its dynamic internal feedback structure to capture the temporal dependencies inherent in wind power data.
DNFWP was applied to two datasets—a smaller, high-resolution Kaggle dataset and a significantly larger Greek national dataset—demonstrating effective forecasting accuracy in both cases. The model's performance was evaluated using three metrics (RMSE, MAE, and R²), and DNFWP delivered results comparable to or better than state-of-the-art Machine Learning and Deep Learning approaches, including LSTM and GRU models. Moreover, it achieved this while requiring significantly fewer parameters, thus offering an alternative with reduced complexity.
Key contributions and findings of this study include the following:
- The elimination of the feature selection stage: due to the single-input architecture, no feature selection is needed, thus simplifying preprocessing without sacrificing prediction quality.
- Effective modeling of temporal dependencies using localized recurrent neural networks embedded in the consequent parts of the fuzzy rules. The strong generalization performance on both moderate and large-scale datasets confirms the robustness of DNFWP across diverse real-world scenarios.
- An overall modular structure that retains the interpretability and transparency of fuzzy systems, while benefiting from the approximation power of neural networks.
In conclusion, DNFWP constitutes a promising tool for wind power prediction, particularly in applications requiring low computational complexity, interpretability, and reliable short-term forecasts without the need for extensive feature engineering. Future research could include the establishment of an automated model-building process using evolutionary computation algorithms and a multiple-step-ahead prediction strategy.