1. Introduction
In the context of global efforts to achieve carbon neutrality and peak carbon emissions, the development of sustainable and low-emission energy systems has gained widespread international attention. As a result, wind energy has seen a significant rise in its contribution to electricity generation networks [1]. Nevertheless, the extensive integration of wind-based generation poses considerable operational challenges for power grids [2]. Accurate wind power forecasting, which uses historical weather patterns and generation records to minimise the destabilising effects of renewable energy integration, enhances grid reliability and economic efficiency [3]. Such predictive capability plays a critical role in ensuring grid stability and optimising wind farm operations [4].
At this stage, wind power prediction faces two main challenges [5]: (1) historical wind farm data contain many interacting, redundant features, and feeding them directly into a prediction model inflates the prediction error and degrades model performance; (2) traditional prediction models lack the adaptability and robustness needed to fit the nonlinear relationship between wind power data and output power. According to the mathematical model employed, wind power prediction methods fall into three categories [6]: physical models, statistical models, and learning models. Physical models solve physics equations to simulate wind field behaviour and combine the results with power curves to make predictions; they are complex to build and of limited use for short-term prediction [7]. Statistical models analyse historical operation data to establish a mapping between historical data and power, and use this mapping to predict future power. However, these methods have a limited ability to capture the deep features of wind power data, and their generalisation performance needs improvement [8,9,10]. With the development of artificial intelligence, learning models based on deep learning have received wide attention in wind power prediction owing to their stronger data mining ability and their capacity for continuous learning and correction [11].
To address these two challenges, researchers have used various signal decomposition algorithms to analyse the relationships between signals in different frequency bands. Study [12] introduced a wind power forecasting approach that combines empirical mode decomposition (EMD) with radial basis function (RBF) neural networks. The experimental findings demonstrated that EMD preprocessing significantly enhances prediction performance: the RMSE of the traditional RBF neural network was 17.04, while that of the EMD-RBF network was 11.61, a 31.87% reduction after EMD decomposition. Nevertheless, empirical mode decomposition suffers from inherent mode-mixing limitations. To address this issue, reference [13] employed variational mode decomposition (VMD) for signal preprocessing before feeding the resulting subcomponents into an enhanced gated recurrent unit network for wind generation forecasting. The results showed that VMD effectively avoids modal overlap, with an average error reduction of over 50% compared to single LSTM and single GRU prediction models, and of approximately 40% compared to conventional multi-dimensional VMD-GRU models. However, because the preset VMD parameters strongly affect decomposition performance, the lack of sound evaluation criteria to guide parameter setting hinders the application of VMD in power prediction [14]. In addition, applying various deep learning algorithms to wind power prediction is a research hotspot. Study [15] used Long Short-Term Memory (LSTM) networks to address the inherent issues of neural networks, such as getting stuck in local minima and gradient vanishing, achieving an accuracy of 99.63% and reducing the RMSE by 26.17% compared to traditional backpropagation (BP) models. Study [16] used gated recurrent units (GRUs) to reduce the number of parameters and the computational cost by controlling information flow and state updates; the prediction interval coverage probability (PICP) improved to 96.40%, outperforming traditional models (SVM: 94.72%, KELM: 96.11%, ANN: 95.67%), while the prediction interval width was reduced by 58.5% to 73.9%. Convolutional neural networks (CNNs) have received increasing attention since the AlexNet model won the image classification competition in 2012 [17]. Study [18] implemented a CNN-based deep learning framework to forecast wind energy output, reducing prediction errors by 2% to 4% compared to traditional models. However, wind power is affected by several factors jointly, so the model inputs are multivariate time series; traditional CNNs handle such high-dimensional data well, but their parameter tuning is slow.
To address the shortcomings of existing methods in feature extraction and model parameter optimisation, and to provide a more efficient solution for ultra-short-term wind power forecasting, this study proposes a wind power forecasting method based on feature fusion and an improved convolutional neural network. First, kernel principal component analysis (KPCA) is applied to the historical meteorological dataset to extract effective meteorological kernel principal components. Then, dynamic mode decomposition (DMD) is employed to extract modal features from the historical power data. The meteorological kernel principal components are fused with the power modal features to form a new sample dataset. This combination incorporates the meteorological factors influencing power output and addresses the inability of single features to reflect the multifactor coupling relationships in wind power comprehensively. Subsequently, the snow ablation optimiser (SAO) is used to optimise the CNN hyperparameters, overcoming the reliance of traditional methods on empirically defined parameters. Additionally, a self-attention mechanism is introduced to enhance the global modelling capability of the CNN, significantly improving prediction accuracy.
2. Fundamental Theory
2.1. Kernel Principal Component Analysis
As an enhanced variant of conventional principal component analysis, kernel PCA (KPCA) effectively addresses nonlinear feature extraction challenges through kernel-based transformation. Its idea is to use a nonlinear mapping function to project the samples of the original data into a high-dimensional feature space, which is then analysed by PCA; finally, feature selection is achieved by transforming the dot-product operation into a kernel calculation in the original space [19,20]. The main operation formulas are as follows:

$$K_{ij} = k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$$

$$\tilde{K} = K - \mathbf{1}_M K - K \mathbf{1}_M + \mathbf{1}_M K \mathbf{1}_M$$

where $M$ represents the number of samples, $\mathbf{1}_M$ denotes the $M \times M$ matrix whose entries all equal $1/M$, and $\tilde{K}$ represents the centred kernel matrix.
After the conversion of the above formulas, it is possible to extract the principal components using the general PCA method, after which the projection of a data point on the eigenvectors is calculated to obtain the kernel principal components of the point.
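To make the procedure concrete, the following is a minimal sketch in Python, assuming scikit-learn's KernelPCA with an RBF kernel. The paper's MATLAB implementation and its kernel parameter c = 20,000 do not map one-to-one onto the gamma used here; the placeholder data and all parameter values are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import KernelPCA

# Stand-in for the M x 11 matrix of meteorological samples
X_met = np.random.rand(2976, 11)
X_std = StandardScaler().fit_transform(X_met)

# RBF kernel PCA; gamma plays the role of the kernel parameter (assumed value)
kpca = KernelPCA(kernel="rbf", gamma=1e-4, n_components=11)
scores = kpca.fit_transform(X_std)

# Keep the leading components whose cumulative contribution (relative to the
# retained components) exceeds 85%, as in Section 4.2
contrib = kpca.eigenvalues_ / kpca.eigenvalues_.sum()
n_keep = np.searchsorted(np.cumsum(contrib), 0.85) + 1
kernel_pcs = scores[:, :n_keep]   # e.g., y1..y4 in the paper
```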
2.2. Dynamic Mode Decomposition
Peter Schmid proposed dynamic mode decomposition (DMD) in 2008, and it has since been widely used in the study of various nonlinear dynamical systems, such as the construction and analysis of hydrodynamic systems. Because of its superior data mining capability, it has also been applied to data dimensionality reduction and time series prediction in recent years. It is based on the idea of fitting a nonlinear problem by a system of multiple linear equations [21,22,23].
Consider a set of observations containing m data points. In this paper, we use the Hankel matrix to transform the observations into a higher-dimensional space; it is constructed as follows:

$$H = \begin{bmatrix} y_1 & y_2 & \cdots & y_{m-d+1} \\ y_2 & y_3 & \cdots & y_{m-d+2} \\ \vdots & \vdots & \ddots & \vdots \\ y_d & y_{d+1} & \cdots & y_m \end{bmatrix}$$

where $m$ represents the number of power points and $d$ represents the dimension of the delay embedding: time-shifted copies of the scalar time series $y$ are stacked on top of each other to form the Hankel matrix $H$.
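A minimal NumPy sketch of this delay embedding (the function name and toy series are illustrative):

```python
import numpy as np

def hankel_embed(y: np.ndarray, d: int) -> np.ndarray:
    """Stack d time-shifted copies of the scalar series y into a d x (m-d+1) Hankel matrix."""
    m = len(y)
    return np.stack([y[i : m - d + 1 + i] for i in range(d)])

y = np.arange(8.0)          # toy power series with m = 8 points
H = hankel_embed(y, d=3)    # shape (3, 6)
```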
A snapshot of the data $X = [x_1, x_2, \ldots, x_m]$ is constructed, and the purpose of DMD is to extract important dynamic information from these data. By constructing a sliding matrix, the following two matrices can be defined from $X$:

$$X_1 = [x_1, x_2, \ldots, x_{m-1}], \qquad X_2 = [x_2, x_3, \ldots, x_m]$$

The Koopman operator idea is that there always exists a matrix $A$ such that the next-moment state can be represented by the previous-moment state, that is, $x_{k+1} = A x_k$. Thus, the relationship between $X_1$ and $X_2$ can be expressed as:

$$X_2 = A X_1$$

$x_m$ in $X_2$ can be represented by a weighted linear combination of the $x_i$ column vectors of $X_1$:

$$x_m = a_1 x_1 + a_2 x_2 + \cdots + a_{m-1} x_{m-1} + r = X_1 a + r$$

where $a$ represents the weight vector and $r$ represents the residual vector. The relationship between $X_1$ and $X_2$ can then be rewritten as:

$$X_2 = A X_1 = X_1 S + r e_{m-1}^{T}$$

where $e_{m-1}$ represents the unit vector; the optimal eigenvalues and eigenvectors are found by minimising the residual matrix $r$, so that the system prediction is closest to the target. $S$ represents the low-dimensional approximation (companion) matrix, whose eigenvalues approximate those of $A$. The weight vector $a$ is obtained by least squares:

$$a = \left(X_1^{T} X_1\right)^{-1} X_1^{T} x_m$$
Since the matrix $S$ may be ill-conditioned and difficult to solve directly, it is common to solve its similarity matrix $\tilde{S}$ instead, which is obtained by applying Singular Value Decomposition (SVD) to the data matrix $X_1$ and projecting $X_2$:

$$X_1 = U \Sigma V^{*}, \qquad \tilde{S} = U^{*} X_2 V \Sigma^{-1}$$

The eigenvalues $\lambda_k$ of the similarity matrix $\tilde{S}$ contain the dynamic characteristics of the system state evolution, and their magnitudes reflect how the system changes within the time step $\Delta t$. They satisfy:

$$\tilde{S} w_k = \lambda_k w_k$$

The dynamic modes of the system can be characterised by the $k$th eigenvector $\phi_k$ of the matrix $A$. Based on the equivalence property of similar matrices, the following relationship exists:

$$\phi_k = U w_k$$
A complete reconstruction of the system state is possible based on the eigenvectors $\phi_k$ and eigenvalues $\lambda_k$. The cross-section data of the system at any moment can be obtained by the eigenvalue transformation:

$$x_k = \Phi \Lambda^{k-1} b = \sum_{i} b_i \phi_i \lambda_i^{k-1}$$

where $b_i$ represents the initial amplitude of mode $i$; $\Phi$ represents the matrix whose columns are the eigenvectors $\phi_i$; $\Lambda$ represents the diagonal matrix with diagonal elements $\lambda_i$; and $b$ represents the vector consisting of the $b_i$.
DMD thus captures a nonlinear dynamic situation with low-dimensional dynamic features, and its dynamic modes reflect the state of the system at any given moment.
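The steps above condense into a short NumPy sketch — a reconstruction of the standard exact-DMD recipe under the notation of this section, with the truncation rank as an illustrative choice:

```python
import numpy as np

def dmd(X: np.ndarray, rank: int):
    """DMD of a snapshot matrix X = [x1 ... xm] (columns are states)."""
    X1, X2 = X[:, :-1], X[:, 1:]

    # SVD of X1 and rank-r truncation
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]

    # Low-dimensional similarity matrix S_tilde = U* X2 V Sigma^-1
    S_tilde = U.conj().T @ X2 @ Vh.conj().T @ np.diag(1.0 / s)

    # Eigenvalues carry the dynamics; modes are lifted back via Phi = U W
    lam, W = np.linalg.eig(S_tilde)
    Phi = U @ W

    # Modal initial amplitudes b from the first snapshot: x1 ~ Phi b
    b = np.linalg.lstsq(Phi, X[:, 0], rcond=None)[0]
    return Phi, lam, b

def reconstruct(Phi, lam, b, k):
    """State at step k (0-based): x_k = Phi Lambda^k b."""
    return Phi @ (lam**k * b)
```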
2.3. Snow Ablation Optimisation Algorithm
The snow ablation optimiser (SAO) is a new physics-based optimisation algorithm inspired by the sublimation and melting behaviour of snow in nature [24,25]. Its algorithmic process is divided into the following four steps:
Initialisation phase: The population is initialised; a random batch of particles is generated and divided equally into two subpopulations:

$$Z_i = L + \theta \times (U - L), \qquad \theta \sim U(0, 1)$$

where $L$ and $U$ denote the lower and upper bounds of the search space.
Exploration phase: The first subpopulation performs the positional update of the exploration phase, i.e., the sublimation process, in which water molecules show a highly dispersed character; because of the irregularity of the motion, this stochastic process is modelled with Brownian motion, which searches for potential optimal solutions in the space with dynamic, uniform step sizes. At this stage, $N_a$ individuals of the population are randomly selected to form the subpopulation $Z(t)$, and each individual's position is updated as follows:

$$Z_i(t+1) = \mathrm{Elite}(t) + BM_i(t) \otimes \left[\theta_1 \times \left(G(t) - Z_i(t)\right) + (1 - \theta_1) \times \left(\bar{Z}(t) - Z_i(t)\right)\right]$$

where $\mathrm{Elite}(t)$ represents an elite individual, randomly selected from the set $\{G(t), Z_{\mathrm{second}}(t), Z_{\mathrm{third}}(t), \bar{Z}_{c}(t)\}$; $G(t)$, $Z_{\mathrm{second}}(t)$, and $Z_{\mathrm{third}}(t)$ represent the best, second-best, and third-best individuals in the current $Z(t)$; $\bar{Z}_{c}(t)$ represents the individual at the centre of mass of the top 50% of individuals by fitness value; $BM_i(t)$ is a vector of random numbers based on a Gaussian distribution, denoting Brownian motion; $\otimes$ denotes entry-wise multiplication; $\bar{Z}(t)$ denotes the individual at the position of the centre of mass of $Z(t)$; and $\theta_1$ denotes a random number in $(0, 1)$.
Exploitation phase: The second subpopulation performs the positional update of the exploitation phase: during melting, the water molecules no longer exhibit highly dispersed characteristics but instead explore around the current local optimal solution. The classical 'degree-day method' is used as the snow-melting model:

$$M = \left(0.35 + 0.25 \times \frac{e^{t/t_{\max}} - 1}{e - 1}\right) \times T(t), \qquad T(t) = e^{-t/t_{\max}}$$

At this stage, individuals are more likely to explore potential optimal regions based on the centre of mass of the population and the local optimal solution. In the exploitation phase, the remaining individuals in the population are reconstituted into a subpopulation $Z(t)$, which is updated by the following equation:

$$Z_i(t+1) = M \times G(t) + BM_i(t) \otimes \left[\theta_2 \times \left(G(t) - Z_i(t)\right) + (1 - \theta_2) \times \left(\bar{Z}(t) - Z_i(t)\right)\right]$$

where $M$ represents the snowmelt rate and $\theta_2$ represents a random number in $(-1, 1)$ used for inter-individual communication.
Dual-population mechanism: As the number of iterations increases, the size $N_a$ of the sublimation (exploration) subpopulation gradually increases, raising the weight of exploration, while the size $N_b$ of the melting (exploitation) subpopulation gradually decreases, avoiding over-localised exploitation and thereby promoting the search for the globally optimal solution. While the population sizes are positive and the iteration count has not reached its maximum, the sizes evolve as:

$$N_a \leftarrow N_a + 1, \qquad N_b \leftarrow N_b - 1, \qquad N_a + N_b = N$$
In meta-heuristic algorithms, it is important to balance global search against local exploitation; in the snow analogy, steam can come either from snow sublimating directly or from snow first melting into water that then vaporises. Over time, the algorithm gradually shifts from the irregular, highly dispersive motion of the early iterations to a deeper exploration of the solution space. The dual-population mechanism reflects this search strategy while guaranteeing both exploration and exploitation capability. The dual-population mechanism is given in Algorithm 1, and the pseudocode of the SAO algorithm in Algorithm 2.
Algorithm 1: Dual-population mechanism
1: Initialisation: t = 0, tmax, Na = Nb = N/2, where N denotes the population size
2: while (t < tmax) do
3:   if Na < N then
4:     Na = Na + 1, Nb = Nb − 1
5:   end if
6:   t = t + 1
7: end while
Algorithm 2: Snow ablation optimiser (SAO)
1: Initialisation: the swarm Zi (i = 1, 2, …, N), t = 0, tmax, Na = Nb = N/2
2: Fitness evaluation
3: Record the current best individual G(t)
4: while (t < tmax) do
5:   Calculate the snowmelt rate M through the degree-day model
6:   Randomly divide the whole population into two subpopulations of sizes Na and Nb
7:   for each individual do
8:     Update the individual's position through the exploration or exploitation rule
9:   end for
10:  Fitness evaluation
11:  Update G(t)
12:  t = t + 1
13: end while
14: Return G(t)
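Purely as an illustration, Algorithms 1 and 2 can be combined into the following Python sketch; the degree-day coefficients and update rules follow the published SAO formulation and should be read as assumptions rather than the authors' exact implementation.

```python
import numpy as np

def sao(f, lb, ub, N=50, t_max=200, seed=0):
    """Minimise f over [lb, ub] with a simplified snow ablation optimiser."""
    rng = np.random.default_rng(seed)
    dim = len(lb)
    Z = lb + rng.random((N, dim)) * (ub - lb)      # initialisation
    fit = np.apply_along_axis(f, 1, Z)
    Na = Nb = N // 2

    for t in range(t_max):
        # Degree-day snowmelt rate (assumed coefficients)
        T = np.exp(-(t + 1) / t_max)
        M = (0.35 + 0.25 * (np.exp((t + 1) / t_max) - 1) / (np.e - 1)) * T

        order = np.argsort(fit)
        G = Z[order[0]]                            # current best individual
        elite_pool = [Z[order[0]], Z[order[1]], Z[order[2]],
                      Z[order[: N // 2]].mean(axis=0)]
        centroid = Z.mean(axis=0)

        idx = rng.permutation(N)
        for i in idx[:Na]:                         # exploration: Brownian motion
            BM = rng.normal(size=dim)
            elite = elite_pool[rng.integers(4)]
            th1 = rng.random()
            Z[i] = elite + BM * (th1 * (G - Z[i]) + (1 - th1) * (centroid - Z[i]))
        for i in idx[Na:]:                         # exploitation: melting
            BM = rng.normal(size=dim)
            th2 = 2 * rng.random() - 1
            Z[i] = M * G + BM * (th2 * (G - Z[i]) + (1 - th2) * (centroid - Z[i]))

        Z = np.clip(Z, lb, ub)
        fit = np.apply_along_axis(f, 1, Z)
        if Na < N:                                 # dual-population schedule
            Na, Nb = Na + 1, Nb - 1

    return Z[np.argmin(fit)], fit.min()
```

For the model in this paper, the objective f would wrap a full CNN training and validation run, with the kernel size, kernel number, and learning rate as the decision variables.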
2.4. Convolutional Neural Network
The fundamental structure of convolutional neural networks comprises five primary components: input processing units, feature extraction layers (convolutional blocks), dimensionality reduction modules (pooling operations), classification networks (fully connected layers), and final output nodes. These architectures leverage three key principles—(1) localised receptive fields for spatial feature detection, (2) parameter sharing mechanisms for computational efficiency, and (3) subsampling techniques for hierarchical abstraction—collectively enabling parameter optimisation, accelerated training convergence, and reduced model complexity. These advancements have contributed substantially to improved forecasting capabilities in renewable energy systems, particularly for wind generation prediction tasks in recent research.
The convolutional layer is the most important part of a CNN, and its inputs and outputs are connected by weights and biases [26]. The input-output correspondence of the convolutional layer is as follows:

$$Y = f\left(W \otimes X + B\right)$$

where $X$ represents the input; $f$ represents the excitation (activation) function; $W$ represents the convolution kernel; $\otimes$ represents the convolution operation; and $B$ represents the output bias.
After the convolution operation, the feature map is pooled, taking the mean or maximum value over a certain range; pooling effectively reduces the model parameters and helps avoid overfitting.
The fully connected layer is located behind the convolution and pooling layers, and its function is to integrate the features extracted by them:

$$y_k = f\left(w_k x + b_k\right)$$

where $y_k$ is the output; $k$ denotes the $k$th fully connected layer; $w_k$ represents the connection weights; $x$ is the feature map unfolded into a one-dimensional vector; and $b_k$ is the bias.
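A toy NumPy forward pass may make the two formulas concrete (a 1-D convolution stands in for the 2-D case; all values are illustrative):

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

x = np.array([0.2, 1.0, -0.3, 0.7, 0.5, -0.1])  # input sequence X
w = np.array([0.5, -0.25, 0.1])                 # convolution kernel W
b = 0.05                                        # bias B

# Convolutional layer: Y = f(W conv X + B); reversing w makes np.convolve a cross-correlation
conv = relu(np.convolve(x, w[::-1], mode="valid") + b)

# Max pooling over windows of size 2 with stride 2
pool = conv.reshape(-1, 2).max(axis=1)

# Fully connected layer: y_k = f(w_k x + b_k) on the flattened features
Wk = np.full((1, pool.size), 0.3)
yk = relu(Wk @ pool + 0.1)
```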
2.5. Improved Convolutional Neural Network
In this study, the improved CNN-based wind power prediction model was built in MATLAB R2023a, and the overall process can be divided into three parts:
(1) Input layer.
In the input layer, the dimension of the data input is defined, and the features are represented as an ‘image’ with a certain length and a height and width of 1. In essence, the sequence data are processed by a 1 × 1 convolution kernel; then, a sequence folding layer is established to convert the data into a pseudo two-dimensional form that is suitable for CNN processing, so as to facilitate subsequent 2D convolution operations.
(2) Convolution module.
This module contains a convolutional layer, a batch normalisation layer, an activation function layer, a Dropout layer, and a maximum pooling layer.
As shown in Figure 1, a hierarchical optimisation architecture is introduced to optimise the network parameters. SAO acts as an external optimiser during the optimisation of the model's structural parameters, with the convolution kernel size, kernel number, and learning rate as its objectives; it runs before model training to determine the network hyperparameters. Adam is specified as the optimisation algorithm in trainingOptions: starting from the SAO-determined initial learning rate, it updates the neuron weights and biases, optimising the specific weight values of the convolution kernels and the parameters of the fully connected layers through backpropagation. The synergy between the two both escapes local optima more efficiently than traditional grid search or successive parameter tuning and accelerates convergence through the adaptive learning rate.
A batch normalisation layer is added after the convolutional layer to accelerate training and stabilise the training gradient. A ReLU activation function follows to improve training efficiency and construct a nonlinear mapping between the input features and the output target. The nonlinearly transformed data then pass through a Dropout layer before the maximum pooling unit; Dropout regularisation temporarily discards neuron outputs at random with a 10% probability (p = 0.1) to prevent neurons from relying too heavily on local features. The maximum pooling window is (2, 1) with a stride of 2; placing this lightweight regularisation before pooling retains more valid features for downsampling while avoiding the amplification of noise by pooling.
This module gradually reduces the sequence length while extracting local temporal features to enhance the model’s robustness.
(3) Output module.
This module contains sequence unfolding, a flattening step, a fully connected layer, a self-attention layer, and an output layer. Sequence unfolding and flattening restore the collapsed 2D sequence to its original structure and flatten the multi-dimensional features into vectors. After a fully connected layer compresses the feature dimensions, the self-attention mechanism is introduced to capture the global dependencies of the sequence, enhance global interactions, and improve the model's ability to model long time-series relationships, compensating for the CNN's limited local receptive field. Finally, a fully connected layer maps the result to the target dimension, and the regression layer outputs the prediction and computes the loss.
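The paper builds this network in MATLAB R2023a; purely for illustration, an equivalent layer stack can be sketched in PyTorch as follows. Layer sizes follow the paper where stated (dropout p = 0.1, pooling stride 2) and are otherwise assumed.

```python
import torch
import torch.nn as nn

class ImprovedCNN(nn.Module):
    """Conv -> BN -> ReLU -> Dropout(0.1) -> MaxPool -> FC -> self-attention -> FC."""
    def __init__(self, n_features: int, n_filters: int = 32, k: int = 6, d_model: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, n_filters, kernel_size=k, padding="same"),
            nn.BatchNorm1d(n_filters),
            nn.ReLU(),
            nn.Dropout(p=0.1),                     # drop 10% of activations
            nn.MaxPool1d(kernel_size=2, stride=2),
        )
        self.fc1 = nn.Linear(n_filters, d_model)   # compress the feature dimension
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.out = nn.Linear(d_model, 1)           # regression output

    def forward(self, x):                           # x: (batch, seq_len, n_features)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)  # (batch, seq_len/2, n_filters)
        h = self.fc1(h)
        h, _ = self.attn(h, h, h)                   # self-attention over time steps
        return self.out(h[:, -1, :])                # predict from the last position

model = ImprovedCNN(n_features=7)                   # e.g., 4 kernel PCs + 3 DMD modes (assumed)
y_hat = model(torch.randn(8, 16, 7))                # (batch=8, window=16) -> (8, 1)
```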
4. Case Study Analysis
4.1. Data Source
This research utilised operational data collected from a 200 MW wind farm located in Hami, Xinjiang, comprising both power generation records and meteorological measurements. The dataset spans the entire month of January 2019, with 15 min resolution measurements yielding 2976 temporal samples. The facility consists of 133 wind turbines, each rated at 1.5 MW capacity.
The accompanying meteorological observations include eleven distinct parameters: wind velocity and direction recorded at four different heights (10 m, 30 m, 50 m, and 70 m) from the anemometer tower, along with atmospheric pressure, ambient temperature, and relative humidity measurements.
4.2. KPCA Dimensionality Reduction
Wind power prediction has many influencing factors. Besides the historical power data, which have the greatest influence, the meteorological data measured at wind farms also affect the power prediction; because these meteorological variables are correlated, their dimensionality must be reduced to remove redundant information. Using MATLAB R2023a, KPCA dimensionality reduction was performed on the meteorological data with the kernel parameter set to c = 20,000. The eleven meteorological features were fused, the kernel principal components with a cumulative contribution rate of more than 85% were selected, and, finally, four kernel principal components were extracted, denoted y1, y2, y3, and y4. The reduced data are shown in Table 1, and the cumulative contribution rates of the features are shown in Table 2.
From Table 2, it can be seen that the first four feature vectors constitute kernel principal components that reflect the meteorological information. Their cumulative contribution reaches 92.535% of the total variance, indicating that the four extracted kernel principal components capture the vast majority of the information in the 11 features.
4.3. DMD Decomposed Power Data
The Hankel matrix constructed from the historical power data was used to extract data snapshots with a sliding time window with a sliding step of 1. The sample data construction is shown in Figure 3.
The constructed data snapshots were modally decomposed to obtain the dynamic modes of the system. The order of decomposition corresponds to the amount of information contained in the data, and the first feature extracted is also called the dominant feature. The ten modes after decomposition are shown in Figure 4, and the calculated energy share of each modal signal is shown in Figure 5.
As can be seen from Figure 5, the first three of the ten modal signals decomposed by DMD account for a large proportion of the original signal energy, reaching a cumulative 85.83%.
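One common way to compute such an energy share, sketched below from the outputs of the dmd() function in Section 2.2, is via the mode amplitudes; the paper does not specify its exact energy metric, so this proxy is an assumption:

```python
import numpy as np

# Phi, lam, b as returned by the dmd() sketch; X_snapshots is a placeholder name
Phi, lam, b = dmd(X_snapshots, rank=10)

energy = (np.abs(b) * np.linalg.norm(Phi, axis=0)) ** 2   # per-mode energy proxy
share = energy / energy.sum()
top3 = np.sort(share)[::-1][:3].sum()                     # cumulative share of top 3 modes
```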
4.4. Experimental Comparison of Predictive Model Performance
To systematically evaluate the effectiveness of the proposed model, multiple performance metrics were used to compare the model with benchmark prediction methods. The experimental design employed a control variable method to assess the contribution of each module to the overall system performance.
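The metrics reported in the tables that follow (RMSE, MAE, MAPE, and R2) can be computed as below, assuming their standard definitions:

```python
import numpy as np

def metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    err = y_pred - y_true
    ss_res = np.sum(err**2)
    ss_tot = np.sum((y_true - y_true.mean())**2)
    return {
        "RMSE": np.sqrt(np.mean(err**2)),
        "MAE": np.mean(np.abs(err)),
        "MAPE": np.mean(np.abs(err / y_true)),   # assumes no zero-power samples
        "R2": 1.0 - ss_res / ss_tot,
    }
```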
During model development and validation, the dataset was divided into a training set (80%) and a test set (20%). A rolling multi-step prediction method was used to generate wind power predictions for the next four hours, thereby enabling multi-step performance evaluation.
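A minimal sketch of this rolling scheme, assuming the 15-min resolution of Section 4.1 (so 4 h ahead corresponds to 16 steps); model.predict, the window length, and the handling of exogenous features are illustrative:

```python
import numpy as np

def rolling_forecast(model, history: np.ndarray, exog: np.ndarray, steps: int = 16):
    """Roll the model forward 16 steps (4 h at 15-min resolution),
    feeding each prediction back into the input window."""
    window = history.copy()
    preds = []
    for k in range(steps):
        x = np.concatenate([window[-16:], exog[k]])   # power window + meteorological PCs
        y_next = float(model.predict(x[None, :]))
        preds.append(y_next)
        window = np.append(window, y_next)            # feed back for the next step
    return np.array(preds)
```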
4.4.1. Prediction Under Different Decomposition Algorithms
To assess the effectiveness of the proposed decomposition method for power data preprocessing, a comparative analysis was conducted against alternative decomposition techniques. The evaluation employed a standard CNN architecture as the baseline prediction framework. As shown in Figure 6 and Table 3, the reported error metrics are averaged over five independent experimental runs.
The results show that, compared with the undecomposed method, the prediction errors of the models under the four decomposition methods are all reduced, which verifies the effectiveness of signal decomposition in improving prediction results. Among them, the DMD-based model outperforms the others on all indices; its prediction curve tracks the actual power fluctuations more closely, and its RMSE is 31.9% lower than that of the undecomposed model, indicating that the DMD algorithm effectively improves prediction accuracy.
4.4.2. Prediction Under Different Master Forecasting Models
In order to verify the influence of the main prediction model on prediction performance, RNN, LSTM, GRU, and CNN models were compared. The prediction results are shown in Figure 7, and Table 4 lists the prediction error metrics of the different models, again averaged over five trials. The results show that the CNN model performs best, with the highest R2 (0.98909), indicating the strongest fitting ability, and the lowest MAE (4.7742) and RMSE (7.9954), indicating the lowest prediction error and the best stability. Although its MAPE (0.3331) is not optimal, it remains within an acceptable range, and the CNN is still the best choice when all indicators are considered together.
4.4.3. Prediction Under Different Optimisation Algorithms
This study proposes an enhanced prediction framework integrating dynamic mode decomposition (DMD) with convolutional neural networks, augmented by several technical innovations. First, to overcome the challenges of manual hyperparameter optimisation in CNN architectures, the SAO algorithm automatically determines optimal configurations for (1) convolution kernel dimensions (range 2-6), (2) filter quantities (8-128 units), and (3) learning rates ($10^{-4}$ to $10^{-2}$). Second, a self-attention module is incorporated to improve temporal dependency modelling in extended sequence forecasting. Finally, a dropout layer (with probability 0.1) is implemented to enhance the model's generalisation capability by mitigating overfitting risks. The experiments show that SAO-CNN optimisation yields the following best parameters: 32 convolution kernels of size 6 and a learning rate of 0.00381847.
Figure 8 shows the fitness curves of the SAO optimisation process. Figure 9 and Table 5 show the prediction results and error comparison under the different optimisation algorithms; the results are averaged over five trials.
The experimental results demonstrate significant improvements in model performance through parameter optimisation. A comparative analysis reveals that the SAO algorithm outperforms alternative optimisation methods in identifying optimal CNN configurations. Specifically, the SAO-optimised CNN achieves remarkable predictive performance, exhibiting (1) the highest correspondence with actual power measurements (R2 = 99.358%), (2) a 39.8% reduction in the RMSE compared to the baseline CNN architecture, and (3) minimal deviation between predicted and observed power curves. These findings collectively indicate that the SAO-based optimisation approach substantially enhances forecasting precision.
4.4.4. Projections for Different Wind Farms
In order to show that the proposed prediction model generalises well, data from a wind farm in Inner Mongolia were used for prediction. The data processing, meteorological factor extraction, and historical power data decomposition were the same as in the preceding experiments. The input matrix was constructed and fed into the prediction model. The comparison shows that the proposed model is 2.7% more accurate than the basic CNN model, and its RMSE is 30.6% lower, demonstrating good prediction performance as well as generalisation ability. The specific results are shown in Figure 10 and Table 6, again averaged over five trials.
5. Conclusions
This research introduces a novel wind power forecasting system that integrates multi-source feature fusion with optimised CNN components to improve temporal prediction performance. The case study analysis reveals three principal conclusions:
(1) Decomposing the actual power data by DMD can effectively retain the information of the original power sequence and improve the accuracy of prediction.
(2) Optimising the CNN parameters by SAO can adaptively determine the optimal network parameter combination. Compared with the commonly used parameter setting methods, it overcomes the randomness of the empirical settings and has higher accuracy when applied to prediction.
(3) The constructed improved CNN prediction model, thanks to the fused features' ability to retain data information and the CNN's strong data mining ability, improves the fitting ability of the model, reduces prediction error, and demonstrates better prediction performance. Predicting the power of different wind farms further illustrates the model's good generalisation performance.
The predictive capability of wind power forecasting models is fundamentally dependent on both the quality of operational data from wind farms and the configuration of model parameters. Therefore, future research should focus on how to deeply mine the original data information and how to optimise the prediction model performance to improve stability and generalisability and better support the online prediction and application of wind power. In response to some of the issues identified in this study, we will conduct further research, specifically including the following:
(1) Conducting additional ablation experiments on the key components of the proposed model, removing the KPCA module, DMD module, SAO module, and self-attention mechanism module one by one to calculate the contribution of each module to the prediction results.
(2) Preprocessing the annual data (a total of 35,040 sampling points) from the two wind farms and extracting merged features. Seasonal predictions will be made for the entire year to validate the applicability of the proposed model under different seasonal conditions. Extreme conditions will be incorporated into the experiments to test model performance. Further model improvements will be made based on the characteristics of different seasons.
(3) Conducting a more detailed analysis of computational efficiency, including training time and resource usage. The model’s performance will be tested in real-world scenarios by integrating the prediction model into the wind farm’s SCADA system via a program, enabling real-time data reception and online predictions, and evaluating the prediction performance.