An Improved Wind Power Forecasting Model Considering Peak Fluctuations

Yang, Shengjie; Tang, Jie; Ye, Lun; Liu, Jiangang; Zhao, Wenjun

doi:10.3390/electronics14153050


by Shengjie Yang ¹, Jie Tang ¹,*, Lun Ye ², Jiangang Liu ¹ and Wenjun Zhao ¹

¹ School of Computer Science, Hunan University of Technology and Business, Changsha 410205, China

² State Grid Hunan Electric Power Company Limited Economic & Technical Research Institute, Changsha 410004, China

* Author to whom correspondence should be addressed.

Electronics 2025, 14(15), 3050; https://doi.org/10.3390/electronics14153050

Submission received: 1 July 2025 / Revised: 24 July 2025 / Accepted: 29 July 2025 / Published: 30 July 2025

(This article belongs to the Special Issue Digital Intelligence Technology and Applications)


Abstract

Wind power output sequences exhibit strong randomness and intermittency. Traditional single forecasting models struggle to capture the internal features of these sequences and are highly susceptible to interference from high-frequency noise, and their predictive accuracy remains notably poor at the peaks, where the power curve changes abruptly. To address the poor fitting at peaks, a short-term wind power forecasting method based on an improved Informer model is proposed. First, a temporal convolutional network (TCN) is introduced to enhance the model’s ability to capture regional segment features along the temporal dimension, enlarging the model’s receptive field so that it can handle wind power fluctuations under varying environmental conditions. Next, a discrete cosine transform (DCT) is employed for adaptive modeling of frequency dependencies between channels, converting the time series data into frequency domain representations to extract their frequency features. These frequency domain features are then weighted using a channel attention mechanism to improve the model’s ability to capture peak features and resist noise interference. Finally, the Informer generative decoder is used to output the power prediction results. This enables the model to simultaneously leverage neighboring temporal segment features and long-range inter-temporal dependencies for future wind power prediction, thereby substantially improving the fitting accuracy at power-curve peaks. Experimental results validate the effectiveness and practicality of the proposed model; compared with other models, the proposed approach reduces MAE by 9.14–42.31% and RMSE by 12.57–47.59%.

Keywords: wind power forecasting; Informer; frequency attention mechanism; dilated convolution

1. Introduction

Driven by the continued implementation of China’s ‘dual carbon’ strategy, the traditional power system is experiencing significant structural changes and a transformative shift [1]. In this context, wind energy, as a clean and renewable resource, is increasingly becoming a strategic priority in the development of new energy across many countries [2]. However, owing to the inherent influence of geographical and climatic conditions, wind power generation exhibits considerable uncertainty, resulting in notable fluctuations in output when connected to the power grid [3]. This will bring huge challenges to the real-time balance, risk management, and resource allocation of the power grid. Therefore, accurate and timely prediction of the power changes in wind power generation is of great significance for building a new type of power system [4].

Short-term wind power forecasting is mainly used to predict the power output of wind farms or wind farm clusters over a relatively short future period. The forecast horizon usually ranges from a few hours to a few days, and the time resolution of the prediction results is 15 min [5]. At present, traditional wind power forecasting methods fall into two main categories: physical methods and statistical methods [6]. Physical methods feed raw meteorological data such as wind speed, direction, temperature, and pressure into Numerical Weather Prediction (NWP) to predict wind speed and direction; the power curve of the wind turbine is then used to calculate the wind power [7]. However, physical methods require a significant amount of computing resources and have long prediction times. Statistical methods mainly analyze historical data to construct the relationship between wind power and meteorological characteristics for prediction. Common statistical methods include the Autoregressive Moving Average [8], the Autoregressive Integrated Moving Average [9], and the Kalman Filter [10]. With the continuous development of artificial intelligence technology, many scholars have begun to apply machine learning methods to wind power forecasting. Common methods include Random Forest [11], Support Vector Machine [12], Multilayer Perceptron [13], and Gradient Boosting Tree [14]. However, shallow machine learning algorithms lack the ability to capture highly nonlinear relationships in time series data, resulting in poor performance when making large-scale predictions [15,16,17].

With the continuous deepening of artificial intelligence research [18,19,20,21], some researchers have applied deep learning to power prediction and achieved good results. Ref. [22] uses long short-term memory neural networks for short-term prediction of wind power. Ref. [23] uses Convolutional Neural Networks (CNN) to extract features and form predicted feature vectors; a Gated Recurrent Unit (GRU) is then used to establish a nonlinear relationship between historical feature data and power time series data to improve the model’s prediction performance. Ref. [24] combines Variational Mode Decomposition (VMD) and a Long Short-Term Memory Network (LSTM) to decompose raw power data with strong nonlinearity into more stable subsequences, thereby improving the prediction accuracy of the model. Although LSTM and GRU recurrent neural networks perform well in processing temporal data, their serial structure limits the speed of data processing. As the time span of the input sequence increases, the model faces exploding or vanishing gradients, which weakens its ability to capture long-term dependencies [25,26].

In recent years, the Transformer model proposed by the Google team has made breakthrough progress in multiple fields such as natural language processing and computer vision. Its core lies in the self-attention mechanism and the encoder–decoder structure. However, Transformers face issues such as quadratic time complexity and high memory consumption in practical applications. Ref. [27] proposed the Informer model to compensate for the shortcomings of the Transformer in long-sequence time series prediction tasks and demonstrated good prediction performance. Ref. [28] proposed the MGWO Informer model, which uses an improved gray wolf optimization algorithm to optimize the parameters of the Informer model and enhance its accuracy. Ref. [29] uses feature engineering to mine independent features related to wind power from raw data and then uses an Informer model to handle the long-term prediction of wind power series. The Informer models in the above literature can extract feature relationships between time points, but do not address the sharp power fluctuations generated during unstable wind conditions. Ref. [30] proposed the GGInformer model, which introduces genetic algorithms to extract features from multidimensional time series data, reducing the impact of redundant features on subsequent predictions and improving prediction accuracy. Ref. [31] proposed the CNN Informer model, which uses two-dimensional convolution to extract temporal features and trend information, and uses the Informer model to predict power data. These Informer-based models only analyze and model the time domain features of historical power, without considering the important influence of frequency domain features, which limits further improvement in model accuracy.

Existing studies have primarily aimed at improving overall forecasting accuracy and demonstrate satisfactory performance during periods of sustained power increase or decrease. However, the predictive fit at the peaks, where the power curve exhibits abrupt changes, remains largely unsatisfactory. This shortcoming may stem from the models’ deficiencies in modeling long-range feature dependencies, their inability to capture short-range temporal segment characteristics, or their neglect of the significant impact of frequency domain information. In response to these issues, this article proposes a short-term wind power forecasting model that improves the Informer model with a temporal convolutional network (TCN) and a frequency domain attention mechanism (FAM). First, temporal segment features of the historical data are extracted through the TCN and concatenated with the Informer encoder embedding layer to enlarge the receptive field of the model. Second, FAM is introduced to extract frequency domain features from the encoder output: the time series data are converted into frequency domain representations, and channel descriptors computed through a channel attention mechanism are used to weight the extracted frequency domain features, minimizing the interference of noise in the input data and improving the model’s sensitivity to outliers. Finally, the extracted features are input into the Informer decoder, which combines the sparse self-attention mechanism and the self-attention distillation operation to improve the prediction speed of the model. In this procedure, the temporal segment features provided by the TCN enable the model to cope with sudden power variations within short intervals, allowing timely reactions at the inflection points of the power curve. The Informer’s sparse self-attention mechanism and self-attention distillation allow the model to retain long-range temporal trends, thereby offering an overall estimate of future power evolution based on historical information. The incorporation of the FAM further equips the model to exploit frequency domain features rather than relying solely on temporal ones, thereby capturing inter-channel dependencies that might be overlooked in the time domain and enhancing the stability of wind power forecasting. To validate these claims, a series of ablation and comparative experiments were conducted. The results indicate that the proposed model reduces MAE by 9.14–42.31% and RMSE by 12.57–47.59% relative to other models.

2. Related Work

2.1. Temporal Convolutional Network

The temporal convolutional network (TCN) is a variant of convolutional neural networks specifically designed to efficiently model and process time series data [32]. The integration of causal convolution, dilated convolution, and residual connections enables traditional convolutional neural networks to more effectively handle time series tasks.

Among them, causal convolution ensures that only information before the current time point is used when processing data, thereby preserving causal relationships. However, capturing long-term dependencies with causal convolution alone typically requires deeper networks or larger convolution kernels. By introducing dilated convolution, the TCN expands the receptive field of the convolution kernel without increasing the number of parameters, while maintaining the dimensionality of the output features. Its basic structure is shown in Figure 1.

Residual connections enable the network to transmit information between different layers, effectively alleviating the problems of gradient vanishing and exploding in deep networks, thereby improving their stability and training efficiency. Figure 2 shows the residual structural units of TCN.
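The interplay of causality and dilation described above can be illustrated with a minimal NumPy sketch. It is not the TCN layer used in this article: the function names `causal_dilated_conv1d` and `receptive_field` are hypothetical, the input is a single channel, and the receptive-field formula assumes one convolution per dilation level.

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation=1):
    """Single-channel causal dilated convolution (illustrative only).

    y[t] = sum_j kernel[j] * x[t - j * dilation], with x treated as zero
    before the start of the sequence, so y[t] never sees future samples.
    """
    x = np.asarray(x, dtype=float)
    pad = (len(kernel) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left padding only keeps it causal
    y = np.zeros_like(x)
    for j, w in enumerate(kernel):
        start = pad - j * dilation           # tap j looks j * dilation steps back
        y += w * xp[start:start + len(x)]
    return y                                 # same length as the input

def receptive_field(kernel_size, dilations):
    """Receptive field of a stack with one causal convolution per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)

y = causal_dilated_conv1d([1, 2, 3, 4, 5], [1.0, 1.0], dilation=2)
print(y)                              # each output adds the sample two steps back
print(receptive_field(3, [1, 2, 4]))  # grows with the dilations, not the kernel
```

With kernel size 3 and dilations 1, 2, 4, three layers already cover 15 time steps, while the number of weights per layer stays at 3.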

2.2. Channel Attention Mechanism

The channel attention mechanism improves feature representation in convolutional neural networks by adaptively adjusting the importance of individual channels [33]. Squeeze-and-Excitation (SE) blocks simulate inter-channel dependencies and recalibrate feature maps using global information, thereby enhancing the network’s representational capacity. As shown in Figure 3, the process involves two stages: squeeze and excitation. In the context of time series signals, Global Average Pooling (GAP) is applied along the time dimension during the squeeze operation to produce channel descriptors, as described by the following equation:

$$Z_c = \mathrm{GAP}(x_c) = \frac{1}{L_S} \sum_{i=1}^{L_S} x_c(i) \tag{1}$$

where $x_c$ is the original feature map; $c$ and $L_S$ represent the channel and the time series length, respectively; and the scalar $Z_c$ is the $c$-th element of $Z$. The purpose of the excitation step is to capture inter-channel dependencies via a nonlinear activation function, formulated as follows:

$$att = \sigma(f_2\, \delta(f_1 Z)) \tag{2}$$

where $f_1$ and $f_2$ are two fully connected layers; $att \in \mathbb{R}^{C}$ is the learned attention vector, which is multiplied channel-wise with the original feature map to re-scale each channel; and $\delta(\cdot)$ and $\sigma(\cdot)$ refer to the ReLU and Sigmoid activation functions, respectively.
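The squeeze and excitation steps of Equations (1) and (2) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the fully connected layers $f_1$ and $f_2$ are reduced to bias-free weight matrices `w1` and `w2`, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squeeze(x):
    """Eq. (1): global average pooling over time; x has shape (C, L_S)."""
    return x.mean(axis=1)

def se_channel_attention(x, w1, w2):
    """Eq. (2) followed by channel-wise rescaling.

    w1 has shape (H, C) and w2 has shape (C, H); both stand in for the
    fully connected layers f1 and f2 (biases omitted in this sketch).
    """
    z = squeeze(x)                               # channel descriptors Z, shape (C,)
    att = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # ReLU, then Sigmoid
    return x * att[:, None], att                 # re-scale every channel

x = np.arange(12, dtype=float).reshape(3, 4)     # 3 channels, 4 time steps
out, att = se_channel_attention(x, np.zeros((2, 3)), np.zeros((3, 2)))
print(att)   # untrained (all-zero) weights give a neutral weight per channel
```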

2.3. Mish Activation Function

To address the problem that some neurons can no longer be activated under the ReLU activation function, Misra proposed the Mish activation function [34]. The function graph is shown in Figure 4.

The Mish activation function has a smoother curve, and its smooth and non-monotonic characteristics can further reduce the impact of gradient explosion on the model, while also making the model converge faster. The function expression is as follows:

$$\mathrm{Mish}(x) = x \cdot \tanh(\ln(1 + e^{x})) \tag{3}$$
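As a small illustration of Equation (3), the following NumPy sketch evaluates Mish using `np.logaddexp(0, x)` for the softplus term $\ln(1 + e^{x})$, which avoids overflow for large inputs. The function name `mish` is chosen here for illustration.

```python
import numpy as np

def mish(x):
    """Mish(x) = x * tanh(ln(1 + e^x)), Eq. (3)."""
    x = np.asarray(x, dtype=float)
    softplus = np.logaddexp(0.0, x)   # ln(e^0 + e^x), stable for large |x|
    return x * np.tanh(softplus)

xs = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(mish(xs))
```

Unlike ReLU, the output for negative inputs is small but non-zero, and the curve dips below zero before returning toward it, which is the non-monotonic behavior mentioned above.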

2.4. Informer

The Informer model consists of an encoder and a decoder. To reduce model parameters and improve computation speed, Informer builds on the traditional Transformer model and incorporates conventional convolution and pooling operations into its architecture [35]. The structure of Informer is shown in Figure 5. Its main improvements over the Transformer are as follows:

1. ProbSparse self-attention mechanism

The encoder of the Informer model adopts a multi-head sparse self-attention mechanism instead of the traditional multi-head self-attention mechanism of the Transformer model. By restricting the attention computation to a small set of dominant queries, it reduces time complexity and memory usage, allowing the model to handle longer input sequences effectively.

2. Self-attention distillation

To better handle the input of long time series and solve the problem of information redundancy, Informer adopts the self-attention distillation method, which extracts the main features of the sequence through convolution and pooling operations and reduces the spatial complexity of the input. It greatly reduces the number of network layers and parameters, and improves the robustness of the layer stacking part, enabling the model to more effectively handle long-term dependencies of time-series data.

3. Generative decoder

The Informer model uses a standard decoder structure consisting of two identical multi-head probabilistic sparse self-attention layers. To overcome the slow prediction speed of traditional decoders on long temporal tasks, Informer adopts a generative decoder. The decoder receives a long sequence input in which the target elements are filled with zeros, calculates the weighted attention combination of the feature map, and applies a masked multi-head probabilistic sparse self-attention mechanism. The mask prevents each position from attending to subsequent positions, so the whole output sequence can be generated in one forward pass rather than step by step. The final output is then obtained through a fully connected layer.
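The idea behind the sparse self-attention in item 1 can be sketched as follows. This is a simplified, single-head, unmasked illustration and not the implementation used in this article: it assumes the max-minus-mean query sparsity measure described in the Informer paper, and for clarity it computes all query-key scores, whereas Informer samples keys to keep the cost low. The function names are hypothetical.

```python
import numpy as np

def softmax(s, axis=-1):
    s = s - s.max(axis=axis, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=axis, keepdims=True)

def probsparse_attention(Q, K, V, u):
    """Simplified sparse attention: only the u most 'active' queries attend.

    Q: (L_Q, d), K: (L_K, d), V: (L_K, d_v). Queries whose score
    distribution is close to uniform fall back to the mean of V.
    """
    L_Q, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                         # (L_Q, L_K)
    sparsity = scores.max(axis=1) - scores.mean(axis=1)   # max-mean measure
    top = np.argsort(-sparsity)[:u]                       # dominant queries
    out = np.tile(V.mean(axis=0), (L_Q, 1))               # default for lazy queries
    out[top] = softmax(scores[top], axis=1) @ V           # full attention for top-u
    return out, top

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(6, 4)), rng.normal(size=(8, 4)), rng.normal(size=(8, 3))
out, top = probsparse_attention(Q, K, V, u=2)
print(out.shape, sorted(top.tolist()))
```

Setting `u` equal to the number of queries recovers ordinary softmax attention, which makes the sketch easy to check against a dense reference.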

Figure 5. Schematic diagram of Informer structure.

3. Methodology

3.1. Analysis of Abnormal Event Characteristics

In wind power forecasting, it is difficult to completely avoid the impact of abnormal events or unexpected situations, such as meteorological disasters (typhoons, thunderstorms, etc.), equipment failures, or wind turbine shutdowns. Prediction models are usually trained on historical data, but these abnormal events may change the state pattern of wind power, making it difficult for the model to determine whether the current power value is noise or an outlier, and thereby significantly reducing prediction accuracy at the peaks and valleys of the power curve. The introduction of TCN enables the model to extract features of the time segments near an observation point and thus to determine whether that point is an outlier or part of a changing pattern under the current conditions. Yet relying solely on the analysis of adjacent temporal segments still leaves the model vulnerable to failure. The model must therefore also use historical data from more distant points as a reference, which improves its ability to detect power-curve transitions promptly. Given Informer’s outstanding performance in long-sequence forecasting, we adopt it to establish dependencies among long-range temporal points, giving the model both neighboring-segment and long-span inter-temporal feature associations and consequently refining the fitting accuracy at power-curve peaks.

3.2. Frequency Domain Feature Analysis

Wind power data can rarely be kept free of noise. During power collection, random fluctuations in wind speed and direction, errors in measurement equipment, power fluctuations in grid operation, and aging of system equipment all inevitably introduce noise into the original data. Frequency domain analysis is an effective means of processing and separating noise [36]. For example, when wind power is disturbed by noise, its waveform becomes chaotic. The noise usually appears as random high-frequency signals in the time domain and masks the key trends and fluctuation characteristics of wind power, so power information is hard to distinguish from noise in the time domain but can be clearly distinguished in the frequency domain. Therefore, this article introduces a frequency domain attention mechanism. Converting the raw power information into a frequency domain representation separates the noise and thereby improves the prediction accuracy of the model.

At present, existing frequency information extraction methods are mainly based on the Fourier transform (FT). However, because of the periodicity of its basis functions, the Fourier transform cannot faithfully represent non-stationary signals or signal transients, and thus introduces high-frequency noise. Therefore, this article uses the Discrete Cosine Transform (DCT) for frequency information mining. Because the DCT extends the signal symmetrically rather than periodically, it avoids the jump discontinuities at the sequence boundaries that arise with the Fourier transform and is more stable when processing non-stationary signals. The basis function of the one-dimensional DCT (1D-DCT) is as follows:

$$B_l^i = \cos\left(\frac{\pi l}{L_S}\left(i + \frac{1}{2}\right)\right) \tag{4}$$

Thus, the 1D-DCT is represented as

$$f_{1d}^{l} = \sum_{i=0}^{L_S-1} x_i^{1d} B_l^i \tag{5}$$

s.t. $l \in \{0, 1, \dots, L_S - 1\}$, where $f_{1d}^{l}$ is the 1D-DCT frequency spectrum; $x^{1d} \in \mathbb{R}^{L_S}$ is the input; and $L_S$ is the length of $x^{1d}$. Then, up to constant normalization factors, the inverse 1D-DCT can be written as

$$x_i^{1d} = \sum_{l=0}^{L_S-1} f_{1d}^{l} B_l^i \tag{6}$$
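Equations (4) to (6) can be checked numerically with a short NumPy sketch. The function names are hypothetical. The forward transform follows Equation (5) as written (unnormalized); the inverse adds the constant factors $1/L_S$ for $l = 0$ and $2/L_S$ for $l \geq 1$, which are the standard scaling needed for an exact round trip with this basis and are omitted from the compact form of Equation (6).

```python
import numpy as np

def dct_basis(L_s):
    """B[l, i] = cos(pi * l / L_s * (i + 1/2)), Eq. (4)."""
    l = np.arange(L_s)[:, None]   # frequency index (rows)
    i = np.arange(L_s)[None, :]   # time index (columns)
    return np.cos(np.pi * l / L_s * (i + 0.5))

def dct_1d(x):
    """Unnormalized 1D-DCT, Eq. (5): f[l] = sum_i x[i] * B[l, i]."""
    x = np.asarray(x, dtype=float)
    return dct_basis(len(x)) @ x

def idct_1d(f):
    """Inverse of dct_1d: Eq. (6) with the normalization factors restored."""
    f = np.asarray(f, dtype=float)
    L_s = len(f)
    scale = np.full(L_s, 2.0 / L_s)
    scale[0] = 1.0 / L_s
    return dct_basis(L_s).T @ (scale * f)

x = np.array([0.3, -1.2, 2.5, 0.0, 4.1, -0.7])
print(np.allclose(idct_1d(dct_1d(x)), x))   # round trip recovers the input
print(np.round(dct_1d(np.ones(8)), 6))      # a constant signal has only l = 0
```

A slowly varying signal concentrates its energy in the low-index coefficients, while random high-frequency noise spreads into the high-index ones, which is what makes the separation discussed above possible.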

3.3. Frequency Attention Mechanism

The overall structure diagram of FAM is shown in Figure 6. First, FAM divides the input features into $n$ subgroups along the channel dimension as $[V^{0}, V^{1}, \dots, V^{n-1}]$. Subsequently, each subgroup is processed from low to high frequency by its corresponding DCT frequency component, and each individual channel is processed by the same frequency component, resulting in

$$Freq^{i} = DCT_j(V^{i}) = \sum_{l=0}^{L_S-1} V_{:,l}^{i} B_j^{l} \tag{7}$$

where $j$ is the 1D frequency component index corresponding to $V^{i}$, $l$ runs over the time steps, and $Freq^{i} \in \mathbb{R}^{L_S}$ is the $L_S$-dimensional vector after the discrete cosine transform. The entire frequency domain channel vector can be obtained through a stacking operation:

$$Freq = \mathrm{stack}([Freq^{0}, Freq^{1}, \dots, Freq^{n-1}]) \tag{8}$$

where $Freq \in \mathbb{R}^{C \times L_S}$ is the frequency domain attention vector. After obtaining the frequency domain attention vector, the model can learn attention weights through the channel attention SE block. The entire frequency domain attention mechanism framework can be written as

$$F_{c\text{-}att} = \sigma(f_2\, \omega(f_1(Freq))) \tag{9}$$

where $\omega$ and $\sigma$ represent the Mish and Sigmoid activation functions, respectively. Through the above steps, each channel feature interacts with each frequency component, comprehensively obtaining important temporal information from the frequency domain, which can enhance the diversity of feature extraction in the network.
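One possible reading of Equations (7) to (9) is sketched below in NumPy. Several points are assumptions rather than the implementation used in this article: every channel is transformed with the full set of DCT components and the rows are stacked into $Freq$, the layers $f_1$ and $f_2$ are bias-free matrices `w1` and `w2` acting along the frequency axis, and the resulting weights rescale the input element-wise. All function names are hypothetical.

```python
import numpy as np

def mish(x):
    return x * np.tanh(np.logaddexp(0.0, x))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dct_rows(x):
    """Apply the unnormalized 1D-DCT to every channel; x has shape (C, L_S)."""
    L_s = x.shape[-1]
    l = np.arange(L_s)[:, None]
    i = np.arange(L_s)[None, :]
    B = np.cos(np.pi * l / L_s * (i + 0.5))   # B[l, i], frequency x time
    return x @ B.T                            # Freq, shape (C, L_S)

def frequency_attention(x, w1, w2):
    """Eq. (9): sigmoid(f2(mish(f1(Freq)))), then rescale the input.

    w1 has shape (H, L_S) and w2 has shape (L_S, H).
    """
    freq = dct_rows(x)                        # stacked spectra, Eq. (8)
    att = sigmoid(mish(freq @ w1.T) @ w2.T)   # weights in [0, 1], shape (C, L_S)
    return x * att, att

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))                  # 4 channels, 16 time steps
w1 = 0.1 * rng.normal(size=(8, 16))
w2 = 0.1 * rng.normal(size=(16, 8))
out, att = frequency_attention(x, w1, w2)
print(out.shape, att.shape)
```

Compared with the squeeze step of Equation (1), which keeps only the average of each channel, the DCT retains all frequency components, so the learned weights can respond to high-frequency content as well as the mean level.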

3.4. TCN–FAM–Informer

Due to various factors affecting wind speed, including short-term meteorological changes, seasonal variations, terrain, etc., wind power has significant non-stationarity and is prone to rapid fluctuations in the short term. Therefore, it is necessary to conduct regularity analysis on power data while reducing the impact of noise on the model.

The Informer model introduces a sparse self-attention mechanism and a self-attention distillation operation into the original encoder–decoder structure, which can effectively avoid the gradient explosion problem while capturing long-term temporal dependencies. Compared with the Transformer model, it is more suitable for time series prediction tasks. However, the Informer model can only extract feature information between individual time points and does not capture the features of temporal segments. To address this shortcoming, TCN is introduced to enhance the extraction and processing of information dependencies and key features within time segments. In addition, to make the model better suited to the irregular features of wind power sequences, a frequency domain attention mechanism is introduced to convert time-series data into frequency domain representations, extract frequency information, effectively separate and reduce the interference of high-frequency noise, and enhance the robustness of the model. The basic framework of the model is shown in Figure 7. First, the raw wind power data, after feature selection, passes through a TCN layer to extract information within neighboring temporal segments, and is then fed into the Informer encoder. Here, the data traverse two encoder layers to capture long-range temporal dependencies. Next, the encoder’s output is forwarded to the FAM layer, which assigns frequency-weighted information at the channel level. Finally, the FAM output enters the decoder, undergoes Informer’s masked multi-head probSparse self-attention, and yields the final prediction via a fully connected layer.

3.5. Model Prediction Process

The prediction process of the model in this article is shown in Figure 8. The specific steps are as follows:

Step 1: Use the Pearson correlation coefficient to perform correlation analysis on the original dataset: calculate the correlation coefficient between the power and each meteorological feature, retain the meteorological features that are strongly correlated with the power data as the final model inputs, and split the data into a training set and a validation set.

Step 2: Establish the TCN–FAM–Informer model and adjust the model training parameters. Train the model using the training set data to optimize the model parameters and save the best parameters during the training process.

Step 3: Input the test set data into the trained TCN–FAM–Informer model, output the predicted values of the test set, and perform inverse normalization on them.

Step 4: Select Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Coefficient of Determination (R-Squared, R²) as evaluation metrics to assess the predictive accuracy of our model.

4. Results

4.1. Data Preprocessing

The dataset uses historical wind power data and meteorological data from a wind farm in China as raw data, with a time span of 1 January 2019 to 31 December 2020, a sampling interval of 15 min, and 96 sampling points per day.

This experiment uses the 64,320 samples from 1 January 2019 to 31 October 2020 as the training set and the 5856 samples from November to December 2020 as the testing set. The data are scaled with min–max normalization:

$$Y = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \tag{10}$$

where $Y$ is the normalized value of the wind power or meteorological data, $X$ is the original wind power or meteorological data value, and $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of the wind power or meteorological data, respectively.
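Equation (10) and the split sizes can be reproduced with a short sketch. The text does not state which part of the data supplies $X_{\min}$ and $X_{\max}$; the example assumes they are taken from the training set, and all function names and sample values are hypothetical. The inverse function corresponds to the inverse normalization applied to the predictions in Step 3.

```python
from datetime import date
import numpy as np

def samples_between(start, end, per_day=96):
    """Number of 15 min samples from start to end, both days inclusive."""
    return ((end - start).days + 1) * per_day

def fit_min_max(train):
    """Column-wise minimum and maximum (assumed here to come from training data)."""
    return train.min(axis=0), train.max(axis=0)

def min_max_normalize(x, x_min, x_max):
    """Eq. (10); assumes x_max > x_min for every column."""
    return (x - x_min) / (x_max - x_min)

def inverse_min_max(y, x_min, x_max):
    """Map normalized values back to the original scale."""
    return y * (x_max - x_min) + x_min

print(samples_between(date(2019, 1, 1), date(2020, 10, 31)))   # training samples
print(samples_between(date(2020, 11, 1), date(2020, 12, 31)))  # testing samples

train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])     # toy values only
x_min, x_max = fit_min_max(train)
print(min_max_normalize(train, x_min, x_max))
```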

Many types of meteorological factors affect wind power, and their degree of influence varies. As a result, wind power data often exhibits strong volatility. To investigate the importance of the different factors for wind power, the Pearson correlation coefficient was used to conduct correlation analysis on the dataset, and input variables were selected accordingly to reduce the complexity of the model and improve its predictive performance. The relevant formula is

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}\, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}} \tag{11}$$

where $n$ is the number of data points, $x_i$ and $y_i$ are the values of the $i$-th data point for variables $x$ and $y$, respectively, $\bar{x}$ and $\bar{y}$ are the average values of the data, and the value of the correlation coefficient $r$ is between −1 and 1. The correlation analysis results are shown in Table 1.

Table 1 shows that the correlation coefficients of factors such as wind direction at 10 m and 30 m are lower than 0.5, indicating that their influence on power is relatively weak. Therefore, they are not used as model feature inputs. The final inputs for this model are the wind speeds at 10 m, 30 m, 50 m, and hub height, whose correlation coefficients are greater than 0.5.
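The selection rule can be sketched as follows. The feature names and sample values are invented for illustration and are not taken from Table 1, and comparing the absolute value of $r$ with the 0.5 threshold is an assumption, since the text only states that coefficients below 0.5 are discarded.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient, Eq. (11)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / (np.sqrt((dx ** 2).sum()) * np.sqrt((dy ** 2).sum()))

def select_features(features, power, threshold=0.5):
    """Keep the features whose |r| with the power series exceeds the threshold."""
    return [name for name, values in features.items()
            if abs(pearson_r(values, power)) > threshold]

# Toy data with hypothetical feature names.
power = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "wind_speed_hub": [2.0, 4.0, 6.0, 8.0, 10.0],    # perfectly correlated
    "wind_direction_10m": [1.0, -1.0, 0.0, -1.0, 1.0],  # uncorrelated
}
print(select_features(features, power))
```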

4.2. Preparation Before the Experiment

The hardware configuration used in this experiment is an Intel i5-13600KF CPU (Intel, Santa Clara, CA, USA) and an NVIDIA GeForce RTX 4060 Ti GPU with 8 GB of video memory (NVIDIA Corporation, Santa Clara, CA, USA). The deep learning framework is PyTorch 1.12.0, and the operating system is Windows 10.

4.3. Model Evaluation Indicators

To verify the superiority of this model over existing models, this paper uses MAE, RMSE, and R² to evaluate the prediction accuracy of the proposed network model. The calculation formulas are as follows:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \tag{12}$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}} \tag{13}$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}}{\sum_{i=1}^{n} (y_i - \bar{y})^{2}} \tag{14}$$

where $y_i$ represents the true value, $\hat{y}_i$ represents the predicted value, $\bar{y}$ represents the average of the true values, and $n$ represents the number of samples in the test set. The closer the values of MAE and RMSE are to 0, the better the predictive performance of the model. The closer the value of R² is to 1, the better the predictive performance of the model [37,38,39].
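The three metrics of Equations (12) to (14) are straightforward to compute; the following NumPy sketch uses a small made-up example rather than data from the experiments, and the function names are chosen for illustration.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error, Eq. (12)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root mean square error, Eq. (13)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.sqrt(np.mean((y - y_hat) ** 2))

def r2(y, y_hat):
    """Coefficient of determination, Eq. (14)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]               # one prediction is off by 2
print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

Because RMSE squares the errors before averaging, a single large miss at a peak raises it more than it raises MAE, which is why both metrics are reported.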

4.4. Hyperparameter Optimization

Because hyperparameter selection affects the learning ability of the model [40], cross-validation is employed to systematically explore the hyperparameter space, with multiple runs averaged to reduce random experimental error. Model performance under each hyperparameter combination was assessed using MAE and RMSE; the configuration yielding the lowest values was deemed optimal, and the corresponding hyperparameters are presented in Table 2.

4.5. Result and Analysis

The overall prediction results of the model are shown in Figure 9. The results indicate that the proposed model copes effectively with the unstable characteristics of wind power data and still fits well at peaks and valleys. To further validate the effectiveness and superiority of the proposed model, ablation and comparative experiments were designed, and the errors between the different models and the true values are presented in the following two sections. To make the curves easier to compare, 60 consecutive data points (15 h) of test-set predictions are extracted for display.

4.5.1. Ablation Experiment

To verify the impact of the TCN layer and FAM layer on prediction performance in our model, experiments were conducted to compare the TCN–Informer (T–Informer), FAM–Informer (F–Informer), and Informer models. The results of the ablation experiment are shown in Figure 10.

As can be seen from the prediction graph, the TCN–FAM–Informer model fits the true value curve more closely than F–Informer, T–Informer, and Informer, and its prediction performance is better. The prediction results of the Informer model fluctuate greatly, making it difficult to maintain good stability during prediction; in particular, when wind power fluctuates continuously, it cannot effectively fit the peaks of the curve. The TCN layer in T–Informer enlarges the receptive field of the model, effectively improving its stability. F–Informer fits better than Informer and can still follow the rough trend of the original data when wind power fluctuates at high frequencies, which to some extent improves the model’s ability to capture key features. The evaluation indicators for the predicted results of the ablation experiments are shown in Table 3.

The results show that the proposed model reduces MAE by 10.41%, 9.14%, and 14.09% compared with the T–Informer, F–Informer, and Informer models, respectively, and reduces RMSE by 15.39%, 12.57%, and 21.06%, respectively. Compared with the other ablation models, the proposed model achieves significantly higher prediction accuracy.

Based on the above experimental results, it is not difficult to see that the model proposed in this paper exhibits better anti-interference performance in the face of rapid power fluctuations, can adapt to sharp changes in power data, and still shows good fitting effect at power peak points. This indicates that the introduction of TCN and FAM in the model has improved the feature mining efficiency of the Informer model for temporal information, making it easier for the model to determine the trend of wind power fluctuations based on the current environment, while reducing the interference of noise in temporal data on the model.

4.5.2. Comparative Experiment

To further substantiate the efficacy of the proposed model in wind power forecasting, this paper selects several advanced deep learning hybrid models in the field of wind power forecasting as benchmarks for comparison. A comparative analysis was conducted among the TCN–FAM–Informer model, the Convolutional Neural Network Long Short-Term Memory Network Attention (CNN-LSTM-A) model, the Convolutional Neural Network Gated Recurrent Unit Network Attention (CNN-GRU-A) model, the TCN-FEM-Transformer (TF–Transformer) model, and the VMD–Informer model. The experimental results are depicted in Figure 11, and the evaluation metrics for the wind power forecasting results of these models are summarized in Table 4.

As shown in Figure 11, both the proposed model and the TF–Transformer model fit the true values well. By contrast, the CNN-LSTM-A and CNN-GRU-A models show significant vertical deviations from the true curve, with CNN-LSTM-A performing worst. This is attributable to its recurrent structure, which makes it difficult to capture long-range temporal information in long sequences and leaves it prone to problems such as vanishing gradients. The CNN-GRU-A model has smaller errors than CNN-LSTM-A, but the overall trend of its predictions is similar. Although it alleviates the vanishing-gradient problem to some extent, it still struggles to capture effective features from long time series. When wind power fluctuates, the predictions of the TCN–FAM–Informer and TF–Transformer models track the real power more closely than those of the CNN-LSTM-A and CNN-GRU-A models. This indicates that Transformer-type models can effectively extract features from distant historical data owing to the parallel encoder–decoder structure of the network. In addition, the multi-head attention mechanism allows the multidimensional feature information of the power data to be mined fully, which yields better trend fitting. As shown in Table 4, the proposed model reduces MAE by 42.31% and 37.55% relative to the CNN-LSTM-A and CNN-GRU-A models, respectively, and reduces RMSE by 47.59% and 41.68%, respectively.

Compared with the TF–Transformer model, the TCN–FAM–Informer model reduces MAE and RMSE by 12.99% and 17.44%, respectively, as shown in Table 4, a clear advantage in both metrics. The TCN–FAM–Informer adopts a sparse self-attention mechanism and a self-attention distilling operation. This avoids the overfitting that arises in traditional Transformer models when the global self-attention mechanism focuses too heavily on randomly fluctuating historical data, and it further improves how correlations among abnormal or peak observation points are modeled. The proposed model consistently outperforms the TF–Transformer model, especially at special points such as maxima and minima, where it keeps the error level stable.
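To make the sparse self-attention idea concrete, the following is a simplified sketch of the query-selection rule used by ProbSparse attention in the original Informer paper: each query is scored by the maximum minus the mean of its scaled dot products with the keys, only the top-u queries receive full softmax attention, and the remaining queries fall back to the mean of the values. It is not the authors' implementation. It omits the key sampling and masking of the original, and the function names and the `factor` parameter are assumptions for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def probsparse_attention(Q, K, V, factor=1.0):
    """Simplified, unmasked ProbSparse-style attention on lists of vectors.

    Returns (outputs, active_query_indices). `factor` is a hypothetical
    constant that scales the number of active queries, u = ceil(factor * ln L_Q).
    """
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    scores = [[dot(q, k) * scale for k in K] for q in Q]

    # Sparsity measurement per query: max score minus mean score.
    sparsity = [max(row) - sum(row) / len(row) for row in scores]

    # Keep only the u queries with the largest sparsity measurement.
    u = min(len(Q), max(1, math.ceil(factor * math.log(len(Q)))))
    order = sorted(range(len(Q)), key=lambda i: sparsity[i], reverse=True)
    active = order[:u]

    # Inactive ("lazy") queries output the mean of V.
    mean_v = [sum(col) / len(V) for col in zip(*V)]
    outputs = [list(mean_v) for _ in Q]

    # Active queries get ordinary softmax attention over all keys.
    for i in active:
        weights = softmax(scores[i])
        outputs[i] = [
            sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))
        ]
    return outputs, sorted(active)
```

In this sketch, queries whose scores are nearly uniform across the keys carry little information and are not attended individually, which is one way to read the claim that sparse attention avoids over-weighting randomly fluctuating history.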

To provide a more comprehensive evaluation of the model’s performance, the training and inference times of all baseline models were recorded and are presented in Table 5. In terms of both training and inference time, the proposed model outperforms the TF–Transformer and VMD–Informer models, exhibiting lower computational costs. Although the proposed model requires slightly longer training time compared to CNN-LSTM-A and CNN-GRU-A due to its increased number of parameters, it achieves faster inference. This efficiency is attributed to the generative decoder of the Informer architecture, which enables the model to output multi-step forecasts in a single forward pass during long-sequence prediction, thereby reducing overall inference time.
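The single-pass property of the generative decoder can be illustrated with a toy sketch. In the original Informer design, the decoder input is a start token (a slice of known history) concatenated with placeholder values for the target positions, and all targets are produced in one forward pass. The code below is an assumption-laden illustration: `CountingModel` is a hypothetical stand-in that merely repeats the last known value and counts forward passes. It is not a neural network and not the authors' model.

```python
def build_decoder_input(history, label_len, pred_len, placeholder=0.0):
    """Start token (last `label_len` known values) followed by placeholders."""
    if not 1 <= label_len <= len(history):
        raise ValueError("label_len must be between 1 and len(history)")
    if pred_len < 1:
        raise ValueError("pred_len must be at least 1")
    start_token = list(history[len(history) - label_len:])
    return start_token + [placeholder] * pred_len


class CountingModel:
    """Hypothetical stand-in that counts forward passes."""

    def __init__(self):
        self.calls = 0

    def full(self, dec_in, label_len):
        # One pass fills every placeholder position at once.
        self.calls += 1
        last_known = dec_in[label_len - 1]
        return dec_in[:label_len] + [last_known] * (len(dec_in) - label_len)

    def step(self, window):
        # One pass predicts only the next value.
        self.calls += 1
        return window[-1]


def forecast_generative(model, history, label_len, pred_len):
    dec_in = build_decoder_input(history, label_len, pred_len)
    out = model.full(dec_in, label_len)  # single forward pass
    return out[-pred_len:]


def forecast_autoregressive(model, history, pred_len):
    window = list(history)
    preds = []
    for _ in range(pred_len):  # one forward pass per predicted step
        nxt = model.step(window)
        preds.append(nxt)
        window.append(nxt)
    return preds
```

With a prediction length of `pred_len`, the generative path calls the model once and the step-by-step path calls it `pred_len` times, which is the source of the inference-time saving described above.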

4.5.3. Supplementary Analysis

To highlight the advantages of this study in peak fitting, this paper further demonstrates the prediction performance of various models when wind power fluctuates due to abrupt changes in wind speed. As shown in Figure 12, the wind power generally shows an upward trend followed by a downward trend, with a brief downward fluctuation at the peak of the power curve. As illustrated in the figure, our model can quickly respond to power changes and exhibits good fitting performance at the peak. In contrast, the other comparative models, although able to capture the general trend of power changes, fail to respond promptly and effectively when the power trend changes. Therefore, their fitting performance at the peak is poor, and they may even exhibit a lag effect.
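The lag effect mentioned above can be quantified in a simple way. The sketch below is not part of the paper's evaluation: it searches for the time shift at which a predicted series best aligns with the actual series, using MAE as the alignment criterion. The function name and the `max_lag` parameter are hypothetical.

```python
def estimate_lag(actual, predicted, max_lag):
    """Return the shift k in [0, max_lag] that minimises the MAE between
    predicted[t + k] and actual[t]. A result of k > 0 suggests that the
    prediction trails the actual series by about k time steps."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    best_lag, best_err = 0, float("inf")
    for k in range(max_lag + 1):
        n = len(actual) - k  # overlap length shrinks as the shift grows
        if n <= 0:
            break
        err = sum(abs(predicted[t + k] - actual[t]) for t in range(n)) / n
        if err < best_err:  # strict '<' keeps the smallest lag on ties
            best_lag, best_err = k, err
    return best_lag


# Toy peak: the "lagging" prediction is the actual curve delayed by 2 steps.
actual = [0.0, 1.0, 2.0, 5.0, 9.0, 5.0, 2.0, 1.0, 0.0, 0.0, 0.0]
lagging = [0.0, 0.0] + actual[:-2]
print(estimate_lag(actual, lagging, max_lag=4))
```

Because larger shifts are scored on shorter overlaps, the estimate is only indicative for short windows; it is meant as a diagnostic around a single peak rather than a replacement for the error metrics in Table 4.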

5. Conclusions

5.1. Summary

This study tackles the critical challenge of accurately forecasting wind power, especially in capturing the peaks and troughs of power curves, where traditional models often struggle due to the inherent volatility and randomness of wind power data. The main research gap identified lies in existing models’ limited ability to effectively handle sharp fluctuations and noise interference, which results in suboptimal prediction accuracy. To address this issue, we propose a novel short-term wind power forecasting model based on the TCN–FAM–Informer architecture.

The proposed model integrates multiple modules to enhance both prediction accuracy and stability. First, a temporal convolutional network (TCN) is employed to extract features from adjacent time segments of the input data, enabling the model to make timely and effective judgments when abrupt changes in wind power occur. Second, to mitigate potential model failure that may arise from using a single architecture, the Informer is incorporated to capture long-range temporal dependencies. This significantly expands the receptive field, allowing the model to rely on historical patterns to better recognize sudden power fluctuations and improve its fitting performance at peak values. Finally, to further ensure model stability, a Frequency Attention Mechanism (FAM) is introduced to transform the time-series data into the frequency domain. This helps distinguish between abnormal noise and critical power features, enabling the model to consider both temporal and frequency domain information, thereby enhancing its overall forecasting capability.
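As an illustration of why a frequency-domain view helps separate noise from the underlying power trend, the sketch below computes an orthonormal DCT-II and measures how much of a signal's energy sits in its lowest-frequency coefficients. It covers only the transform step: the channel-attention weighting of FAM is not reproduced here, and the helper names are assumptions rather than the authors' code.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a real sequence (pure Python, O(n^2))."""
    n = len(x)
    coeffs = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs


def low_frequency_energy_share(x, keep):
    """Fraction of total energy carried by the first `keep` DCT coefficients."""
    coeffs = dct2(x)
    total = sum(c * c for c in coeffs)
    if total == 0.0:
        return 0.0
    return sum(c * c for c in coeffs[:keep]) / total


# A smooth ramp (trend-like) versus an alternating sequence (noise-like).
smooth = [t / 15.0 for t in range(16)]
jitter = [1.0 if t % 2 == 0 else -1.0 for t in range(16)]

print(low_frequency_energy_share(smooth, keep=4))
print(low_frequency_energy_share(jitter, keep=4))
```

The smooth series concentrates almost all of its energy in the first few coefficients, whereas the rapidly alternating series places almost none there. A mechanism that weights frequency components can therefore, in principle, emphasise trend-related components over high-frequency noise.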

Through a series of ablation and comparative experiments, the proposed TCN–FAM–Informer model has demonstrated superior performance. Compared with the ablation variants and the benchmark models, our model achieves a significant reduction in prediction errors, with the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) decreasing by 9.14–42.31% and 12.57–47.59%, respectively. These results underscore the model’s effectiveness in capturing both short-term and long-term dependencies in wind power data, as well as its robustness to noise interference.

5.2. Limitations and Future Directions

However, this study has certain limitations. First, it focuses on the effectiveness of model training and the optimization of peak prediction. Improving the reliability of the input data is also a promising direction in wind power forecasting, and this study leaves room for improvement in that respect. Second, the experiments use a relatively complete dataset and do not address cases in which a large amount of data is missing. This is constrained by the nature of the wind power dataset itself. If wind farm datasets with distinct characteristics become available for analysis, they would help to assess and enhance the model’s generalization capability.

Future research could be extended in two main directions. First, multiple or enhanced feature selection algorithms could be introduced to analyze and screen the model’s input features more effectively. By improving the reliability of the input data, this approach would allow the model to take into account the correlations between feature variables, helping to refine the original dataset while reducing its dimensionality. Second, future studies could validate the model’s effectiveness on additional datasets with distinct characteristics, when such data are available. For example, scenarios involving substantial data loss due to equipment failures in aging wind farms, or continuous extreme weather conditions in special environments, present unique challenges for wind power forecasting. Investigating the model’s performance under these conditions would be valuable and may offer important insights for improving its robustness and adaptability in practical applications.

Author Contributions

S.Y.: methodology, validation, funding acquisition. J.T.: writing—original draft preparation, writing—review and editing, software, conceptualization. L.Y.: software, data curation. J.L.: formal analysis, data curation. W.Z.: formal analysis, data curation. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by The National Social Science Fund of China (No. 23BGL234).

Data Availability Statement

The data presented in this research can be accessed upon request to the corresponding author.

Conflicts of Interest

Author Lun Ye was employed by the company State Grid Hunan Electric Power Company Limited Economic & Technical Research Institute, Changsha, China. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Wang, H.; Guo, D.; Wang, L.; Zhou, T.; Jia, C.; Liu, Y. A novel frequency sparse downsampling interaction transformer for wind power forecasting. Energy 2025, 326, 136199.
Bashir, T.; Wang, H.; Tahir, M.; Zhang, Y. Wind and solar power forecasting based on hybrid CNN-ABiLSTM, CNN-transformer-MLP models. Renew. Energy 2025, 239, 122055.
Dong, Y.; Zhou, B.; Zhang, H.; Yang, G.; Ma, S. A deep time-frequency augmented wind power forecasting model. Renew. Energy 2025, 239, 123550.
Hanifi, S.; Liu, X.; Lin, Z.; Lotfian, S. A critical review of wind power forecasting methods—Past, present and future. Energies 2020, 13, 3764.
Liu, X.; Zhou, J. Short-term wind power forecasting based on multivariate/multi-step LSTM with temporal feature attention mechanism. Appl. Soft Comput. 2024, 150, 111050.
Liu, W.; Mao, Z. Short-term photovoltaic power forecasting with feature extraction and attention mechanisms. Renew. Energy 2024, 226, 120437.
Qiu, H.; Shi, K.; Wang, R.; Zhang, L.; Liu, X.; Cheng, X. A novel temporal–spatial graph neural network for wind power forecasting considering blockage effects. Renew. Energy 2024, 227, 120499.
Singh, A.; Kumar, R.S.; Bajaj, M.; Khadse, C.B.; Zaitsev, I. Machine learning-based energy management and power forecasting in grid-connected microgrids with multiple distributed energy sources. Sci. Rep. 2024, 14, 19207.
Jonkers, J.; Avendano, D.N.; Van Wallendael, G.; Van Hoecke, S. A novel day-ahead regional and probabilistic wind power forecasting framework using deep CNNs and conformalized regression forests. Appl. Energy 2024, 361, 122900.
Louka, P.; Galanis, G.; Siebert, N.; Kariniotakis, G.; Katsafados, P.; Pytharoulis, I.; Kallos, G. Improvements in wind speed forecasts for wind power prediction purposes using Kalman filtering. J. Wind. Eng. Ind. Aerodyn. 2008, 96, 2348–2362.
Abuella, M.; Chowdhury, B. Random forest ensemble of support vector regression models for solar power forecasting. In Proceedings of the 2017 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), Washington, DC, USA, 23–26 April 2017; pp. 1–5.
Zendehboudi, A.; Baseer, M.A.; Saidur, R. Application of support vector machine models for forecasting solar and wind energy resources: A review. J. Clean. Prod. 2018, 199, 272–285.
Zhou, X.; Wu, J.; Liang, W.; Wang, K.I.K.; Yan, Z.; Yang, L.T.; Jin, Q. Reconstructed graph neural network with knowledge distillation for lightweight anomaly detection. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 11817–11828.
Li, C.; Zhu, D.; Hu, C.; Li, X.; Nan, S.; Huang, H. ECDX: Energy consumption prediction model based on distance correlation and XGBoost for edge data center. Inf. Sci. 2023, 643, 119218.
Yang, M.; Ju, C.; Huang, Y.; Guo, Y.; Jia, M. Short-term power forecasting of wind farm cluster based on global information adaptive perceptual graph convolution network. IEEE Trans. Sustain. Energy 2024, 15, 2063–2076.
Zhou, X.; Li, Y.; Liang, W. CNN-RNN based intelligent recommendation for online medical pre-diagnosis support. IEEE/ACM Trans. Comput. Biol. Bioinform. 2020, 18, 912–921.
Yao, L. Global exponential convergence of neutral type shunting inhibitory cellular neural networks with D operator. Neural Process. Lett. 2017, 45, 401–409.
Pan, Z.; Wang, Y.; Cao, Y.; Gui, W. VAE-based interpretable latent variable model for process monitoring. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 6075–6088.
Zhou, X.; Liang, W.; Wang, K.I.K.; Wang, H.; Yang, L.T.; Jin, Q. Deep-learning-enhanced human activity recognition for internet of healthcare things. IEEE Internet Things J. 2020, 7, 6429–6438.
Zhou, X.; Xu, X.; Liang, W.; Zeng, Z.; Yan, Z. Deep-learning-enhanced multitarget detection for end–edge–cloud surveillance in smart IoT. IEEE Internet Things J. 2021, 8, 12588–12596.
Jiang, L.; Hu, R.; Wang, X.; Tu, W.; Zhang, M. Nonlinear prediction with deep recurrent neural networks for non-blind audio bandwidth extension. China Commun. 2018, 15, 72–85.
Liu, B.; Zhao, S.; Yu, X.; Zhang, L.; Wang, Q. A novel deep learning approach for wind power forecasting based on WD-LSTM model. Energies 2020, 13, 4964.
Zhao, S.; Xu, Z.; Zhu, Z.; Liang, X.; Zhang, Z.; Jiang, R. Short and long-term renewable electricity demand forecasting based on CNN-Bi-GRU model. IECE Trans. Emerg. Top. Artif. Intell. 2025, 2, 1–15.
Ghanbari, E.; Avar, A. Short-term wind power forecasting using the hybrid model of multivariate variational mode decomposition (MVMD) and long short-term memory (LSTM) neural networks. Electr. Eng. 2025, 107, 2903–2933.
Li, G.; Ding, C.; Zhao, N.; Wei, J.; Guo, Y.; Meng, C.; Huang, K.; Zhu, R. Research on a novel photovoltaic power forecasting model based on parallel long and short-term time series network. Energy 2024, 293, 130621.
Yang, S. A novel study on deep learning framework to predict and analyze the financial time series information. Future Gener. Comput. Syst. 2021, 125, 812–819.
Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 11106–11115.
Wang, J.; Liang, Y.; Liu, X. An Ultra-Short-Term Wind Power Prediction Method Based on MGWO-VMD-Informer. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4872799 (accessed on 21 June 2025).
Wei, H.; Wang, W.-S.; Kao, X.-X. A novel approach to ultra-short-term wind power prediction based on feature engineering and informer. Energy Rep. 2023, 9, 1236–1250.
Ren, S.-Q.; Song, W. Feature extraction and prediction of multidimensional time series based on GGInformer model. Comput. Eng. Sci. 2024, 46, 590.
Wang, H.-K.; Song, K.; Cheng, Y. A hybrid forecasting model based on CNN and Informer for short-term wind power. Front. Energy Res. 2022, 9, 788320.
Zhang, Z.; Wang, J.; Wei, D.; Xia, Y. An improved temporal convolutional network with attention mechanism for photovoltaic generation forecasting. Eng. Appl. Artif. Intell. 2023, 123, 106273.
Limouni, T.; Yaagoubi, R.; Bouziane, K.; Guissi, K.; Baali, E.H. Accurate one step and multistep forecasting of very short-term PV power using LSTM-TCN model. Renew. Energy 2023, 205, 1010–1024.
Misra, D. Mish: A self regularized non-monotonic activation function. arXiv 2019, arXiv:1908.08681.
Liang, W.; Chen, X.; Huang, S.; Xiong, G.; Yan, K.; Zhou, X. Federal learning edge network based sentiment analysis combating global COVID-19. Comput. Commun. 2023, 204, 33–42.
Jiang, L.; Yu, S.; Wang, X.; Wang, C.; Wang, T. A new source-filter model audio bandwidth extension using high frequency perception feature for IoT communications. Concurr. Comput. Pract. Exp. 2020, 32, e4638.
Zhao, J.; Nguyen, H.; Nguyen-Thoi, T.; Asteris, P.G.; Zhou, J. Improved Levenberg–Marquardt backpropagation neural network by particle swarm and whale optimization algorithms to predict the deflection of RC beams. Eng. Comput. 2022, 38 (Suppl. S5), 3847–3869.
Jiang, W.; Lv, S.; Wang, Y.; Chen, J.; Liu, X.; Sun, Y. Computational experimental study on social organization behavior prediction problems. IEEE Trans. Comput. Soc. Syst. 2020, 8, 148–160.
Jiang, W.; Ye, F.; Liu, W.; Liu, X.; Liang, G.; Xu, Y.; Tan, L. Research on Prediction Methods of Prevalence Perception under Information Exposure. Comput. Mater. Contin. 2020, 65, 3.
Feng, Y.; Qin, Y.; Zhao, S. Correlation-split and Recombination-sort Interaction Networks for air quality forecasting. Appl. Soft Comput. 2023, 145, 110544.

Figure 1. Extended convolutional network structure.

Figure 2. TCN residual structure unit.

Figure 3. Channel attention mechanism module structure.

Figure 4. Functional images of Mish and ReLU.

Figure 6. Schematic diagram of FAM.

Figure 7. TCN–FAM–Informer model structure.

Figure 8. Flow chart of TCN–FAM–Informer prediction.

Figure 9. Overall prediction results of the model.

Figure 10. Results of ablation experiment.

Figure 11. Results of comparative experiment.

Figure 12. Results of supplementary analysis experiment.

Table 1. Pearson correlation coefficient.

Factor	Pearson Correlation
Wind speed at height of 10 m	0.762343
Wind direction at height of 10 m	−0.269798
Wind speed at height of 30 m	0.766938
Wind direction at height of 30 m	−0.316535
Wind speed at height of 50 m	0.764895
Wind direction at height of 50 m	−0.342492
Wind speed at the height of wheel hub	0.767563
Wind direction at the height of wheel hub	−0.315980
Air temperature	−0.329322
Atmosphere	0.023260
Relative humidity	−0.211927

Table 2. Experimental parameter settings.

Parameter	Settings
Input dimension	5
Basic learning rate	0.0001
Batch size	128
Training rounds	50
Dropout coefficient	0.1
Fully connected network dimension	2048
Number of encoder blocks	2
Number of decoder blocks	1

Table 3. Results of various prediction methods in ablation experiment.

Model	MAE	RMSE	R²
TF–Informer	2.4325	3.0783	0.9873
T–Informer	2.7151	3.6384	0.9822
F–Informer	2.6773	3.5208	0.9815
Informer	2.8316	3.8993	0.9786

Table 4. Results of various prediction methods in comparative experiment.

Model	MAE	RMSE	R²
TF–Informer	2.4325	3.0783	0.9873
CNN-LSTM-A	4.2168	5.8746	0.9457
CNN-GRU-A	3.8953	5.2785	0.9526
TF–Transformer	2.7956	3.7285	0.9797
VMD–Informer	2.9154	4.0117	0.9779

Table 5. Comparison of training and inference efficiency across models.

Model	Training Time	Inference Time
TF–Informer	409 s	43 s
CNN-LSTM-A	373 s	47 s
CNN-GRU-A	357 s	45 s
TF–Transformer	551 s	51 s
VMD–Informer	698 s	67 s

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

