Short-Term Wind Power Prediction Based on Multi-Feature Domain Learning

Xue, Yanan; Yin, Jinliang; Hou, Xinhao

doi:10.3390/en17133313


by Yanan Xue, Jinliang Yin * and Xinhao Hou

School of Electrical Engineering and Automation, Tianjin University of Technology, Tianjin 300384, China

* Author to whom correspondence should be addressed.

Energies 2024, 17(13), 3313; https://doi.org/10.3390/en17133313

Submission received: 29 May 2024 / Revised: 30 June 2024 / Accepted: 3 July 2024 / Published: 5 July 2024

(This article belongs to the Special Issue Advances in AI Methods for Wind Power Forecasting and Monitoring)


Abstract

Wind energy is a key component of renewable energy, and its penetration in the power grid has increased in recent years. In this context, accurate and reliable short-term wind power prediction is particularly important for the real-time scheduling and operation of power systems. However, many deep learning-based methods rely on the relationship between wind speed and wind power to build a prediction model. These methods tend to consider only temporal features and ignore the spatial and frequency domain features of the wind power variables, resulting in poor prediction accuracy. In addition, existing power forecasts are often made at the wind farm level, without considering the impact of individual turbines on the forecast. Therefore, this paper proposes a wind power prediction model based on multi-feature domain learning (MFDnet). First, the model captures the similarity between turbines using their latitude, longitude, and wind speed, and constructs a group of turbines with similar features as input based on the nearest neighbor algorithm. On this basis, the Seq2Seq framework is used to fuse, by weighting, the high-frequency features extracted by the discrete wavelet transform (DWT) with temporal and spatial features across multiple feature domains. Finally, the validity of the model is verified with data from a wind farm in the U.S. The results show that the model outperforms other wind farm power prediction algorithms overall, and reduces MAE by 25.5% and RMSE by 20.6% compared to the baseline persistence model when predicting wind power for the next hour.

Keywords: short-term wind power prediction; multi-domain feature fusion; wavelet transform; similarity; deep learning; wind turbines

1. Introduction

The world is facing challenges such as climate change, resource shortages, and environmental pollution. Gradually replacing traditional fossil fuel power generation with renewable energy has become a development trend [1], and sustainable development has become the guiding principle of the global energy transformation [2]. In this transformation, wind energy, as a clean and renewable energy source, plays a vital role. The popularization and utilization of wind energy can help reduce environmental pollution and carbon emissions and promote sustainable development. With the advancement of technology and industry, countries are actively developing and utilizing wind energy and integrating wind power into the power grid to promote the development of clean energy. However, the volatility and stochastic nature of wind power pose significant challenges for the power system when it is incorporated into the grid. Therefore, accurate short-term wind power prediction is particularly important. Such predictions not only help improve grid dispatching efficiency and reduce operating costs but also enhance the security, reliability, and controllability of the system [3,4,5,6].

According to the time scale, researchers categorize wind power prediction into ultra-short-term prediction (less than 30 min), short-term prediction (30 min to 6 h), medium-term prediction (6 h to 1 day), and long-term prediction (1 day to 7 days). These types of predictions play different roles in the actual operation of the power system. Short-term prediction not only affects real-time dispatch but also has an important impact on power generation plans. This is because it can guide the formulation of power generation plans several hours in advance, enabling more efficient utilization of generation resources. Accurate short-term forecasts ensure stable grid operation and rational allocation of generation resources, which is of profound significance for achieving efficient operation and reliable power supply for the power system. Therefore, accurate short-term forecasting has become the focus of researchers’ attention in recent years.

In recent decades, there have been three main categories of wind power prediction (WPP) methods: physical methods, statistical methods, and artificial intelligence methods.

The physical approach describes the physical relationship between weather conditions, topography, wind speed, and wind turbine power, using the resulting numerical weather prediction (NWP) as input to the wind power prediction model without the need for historical wind power data. Statistical models that analyze the time series of historical data can directly describe the link between wind speed and wind power generation as predicted by the NWP without taking into account the physical characteristics of the generation system. For example, Christos Stathopoulos et al. explored the problem of wind power prediction through numerical and statistical prediction models and verified that accurate wind power prediction can be achieved under the condition of reliable local environmental data [7]. Michael Milligan et al. developed a class of autoregressive moving average (ARMA) models applied to wind speed and wind output [8]. Xiaosheng Peng et al. proposed a data mining-based regional power prediction method to optimize the input parameters, which is highly superior to traditional prediction methods [9]. P. Lakshmi Deepak proposed an improved linear regression algorithm that overcomes the limitations of ridge regression, achieving better wind power forecasting [10].

The AI method makes future wind power predictions by learning the relationship between past weather conditions and the power output generated from past time series. Unlike statistical methods based on explicit statistical analysis, the AI method excels in characterizing the nonlinear and highly complex relationships between the input data (NWP forecasts and output power). Thus, it can achieve better prediction results in scenarios involving short-term wind power prediction. For example, Jianwu Zeng et al. proposed a short-term WPP model based on support vector machines to predict the wind speed. Then they utilize the power–wind speed characteristics of the wind turbine generator to predict wind power, which provides better prediction accuracy for both ultra-short-term and short-term WPPs [11]. Guoqing An et al. proposed an Adaboost algorithm combined with a particle swarm optimization extreme learning machine (PSO-ELM) in conjunction with a wind power prediction model [12]. Bowen Zhou et al. studied weather forecast data as one of the inputs to the long short-term memory (LSTM) network model to realize the on-site prediction of wind farm power [13]. Jie Hao et al. proposed an improved random forest short-term prediction model based on hierarchical output power, which adopts Poisson resampling instead of random forest’s bootstrap to improve the training speed of the random forest algorithm [14]. Weisi Deng et al. proposed a short-term WPP method for windy weather based on wind speed interval segmentation and TimeGAN, which improves the accuracy of short-term wind power prediction [15]. Md Alamgir Hossain et al. developed a prediction modeling framework consisting of LSTM, complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), and monarch butterfly optimization (MBO) algorithms, which has low computation time and satisfactory performance [16]. 
Wei Fan et al. proposed an A-GRU-S2S model based on the sequence-to-sequence GRU architecture, which eliminates the model’s dependency on temporal distance information, thereby effectively predicting the ultra-short-term power output of wind farms [17]. Jiaqiu Hu et al. introduced a forecasting model based on Neural Prophet, which enhances the accuracy of wind power prediction and the integration capability of renewable energy during cold wave conditions [18]. Nanyang Zhu et al. proposed GGNet, a granularity-based GNN with better performance than the state-of-the-art (SoTA) method [19]. The above studies have shown that prediction models based on artificial intelligence methods can achieve high-resolution wind power prediction. However, most of them still suffer from several drawbacks, such as:

Most of them rely on the relationship between wind speed and wind power to build the prediction models, and lack consideration for additional features such as time features, spatial features, and so on.
There is a lack of consideration of individual turbines when measuring wind farm power.

In summary, a short-term wind power prediction model based on multi-feature domain learning (MFDnet) is proposed in this paper. The method leverages the complementary characteristics of temporal, spatial, and frequency domain information. It integrates wind speed, latitude, and longitude data from different turbines along with time information, and uses the high-pass filtering characteristics of the wavelet transform to capture the rapidly changing components of the signal, which improves the model’s sensitivity to short-term signal changes. Additionally, a similarity matrix is introduced to design a spatial similarity nearest neighbor algorithm (SSNN) that reduces the dependence of wind power prediction on wind speed. The main contributions of this paper are:

The idea of integrating multiple feature domains is introduced into wind power prediction, which improves the prediction accuracy through the complementary characteristics of temporal, spatial and frequency domains.
An SSNN is proposed to obtain correlation information between multiple turbines using the latitude and longitude and the historical wind speed of different turbines, thus reducing the uncertainty propagated by earlier forecasting approaches that depend on wind speed prediction.
We propose a new wind power forecasting model that makes an individual wind power prediction for each turbine in a wind farm and then combines these predictions into an overall wind farm forecast. On the selected dataset, the model outperforms other wind farm power prediction algorithms in overall performance, and reduces the MAE by 25.5% and the RMSE by 20.6% compared to the baseline persistence model when predicting wind power for the next 1 h.

The rest of the paper is structured as follows: Section 2 focuses on the dataset and the specific methods of data processing. Section 3 provides a detailed description of the proposed algorithmic model. Section 4 focuses on experimental validation. Finally, Section 5 summarizes the entire paper.

2. Data Processing

The dataset utilized in this paper is sourced from [20], originating from a flat terrain inland wind farm located in the United States. It encompasses hourly wind speeds and wind power data for 200 turbines randomly selected over the period of 2010 to 2011. The dataset covers the time period from 9 January 2010 to 31 August 2011, with a temporal resolution of 60 min. The overall ratio of anomalous and missing data is less than 2%. The statistical summary of the dataset is presented in Table 1, which includes measures such as mean, standard deviation (Std), minimum value (Min), and maximum value (Max).

Additionally, the dataset incorporates hourly wind speed and direction measurements from three meteorological masts. Notably, it includes the relative coordinates (latitude and longitude) of the 200 turbines. Because the exact location of the wind farm is confidential, the provided coordinates are the real positions shifted by a constant, so the layout remains consistent with the actual arrangement. Therefore, this offset has no effect on the prediction accuracy of the algorithm proposed in this paper [3].

2.1. Feature Selection

Feature selection plays a pivotal role in enhancing the accuracy and reliability of wind power prediction models. In this section, we conduct various correlation analyses to uncover the inherent relationships within the data in our dataset, laying the groundwork for our feature selection process. Given the paramount importance of meteorological factors in wind power prediction, our initial focus is on examining the correlation between wind speed and wind power.

To quantify this correlation, we employ the Pearson correlation coefficient, a widely-used statistical metric for assessing the strength and direction of linear relationships between two variables. Ranging from −1 to 1, the Pearson correlation coefficient provides insights into how changes in one variable correspond to changes in another. A positive correlation coefficient indicates a positive relationship, where an increase in one variable corresponds to an increase in the other. Conversely, a negative correlation coefficient signifies an inverse relationship, with an increase in one variable associated with a decrease in the other. A correlation coefficient near 0 suggests little to no linear relationship between the variables.

The formula for calculating the Pearson correlation coefficient is as follows:

$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_{i} - \bar{x})(y_{i} - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_{i} - \bar{x})^{2} \sum_{i=1}^{n} (y_{i} - \bar{y})^{2}}} \tag{1} $$

where $x_{i}$ and $y_{i}$ are the wind power and wind speed at the $i$-th moment, $\bar{x}$ and $\bar{y}$ denote the mean values of wind power and wind speed, respectively, and $n$ is the length of the time series (h).

Specifically, we selected turbines from the 200 available in increments of 10, which yields the turbines numbered 1, 10, 20, 30, ..., 190, and 200, for a total of 21 turbines. For each of these 21 turbines, the Pearson correlation between wind power and wind speed at the same moments was computed. The results are presented in Figure 1. The correlation coefficients for the different turbines consistently range from 0.90 to 1.00. This indicates a strong correlation between wind speed and wind power in this dataset, justifying the use of wind speed as a correlated factor for wind power prediction.
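As an illustration of Equation (1), the following minimal sketch computes the Pearson correlation coefficient for one turbine. The series are synthetic placeholders rather than the dataset used in this paper, and the function name `pearson_r` and the toy linear relation between speed and power are assumptions made only for the example.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length 1-D series (Equation (1))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Synthetic stand-in for one turbine's hourly records (assumed values, not real data).
rng = np.random.default_rng(0)
wind_speed = rng.uniform(3.0, 12.0, size=500)                     # m/s
wind_power = 0.1 * wind_speed + rng.normal(0.0, 0.02, size=500)   # toy relation plus noise

r = pearson_r(wind_power, wind_speed)
print(f"Pearson r between wind power and wind speed: {r:.3f}")
```

Because the coefficient is symmetric in its two arguments, the order in which wind power and wind speed are passed does not change the result.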

Additionally, it is essential to assess the temporal autocorrelation of the wind power and wind speed series. Figure 2 illustrates the Pearson autocorrelation coefficients of the wind power series at different lags, and Figure 3 illustrates those of the wind speed series.

It can be seen that the autocorrelation coefficient decreases sharply over the first few lags, and that it remains greater than 0.3 when the lag does not exceed 6 h. This indicates strong autocorrelation in both the wind power and the wind speed data at short lags, so wind speed can be used as a feature input for short-term wind power prediction.
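The lagged autocorrelation discussed here can be sketched as the Pearson correlation between a series and a shifted copy of itself. The example below uses a synthetic first-order autoregressive series as a stand-in for hourly wind power; the coefficient 0.9, the series length, and the function name `lagged_autocorr` are assumptions for illustration only.

```python
import numpy as np

def lagged_autocorr(series, lag):
    """Pearson correlation between a series and itself shifted by `lag` steps."""
    s = np.asarray(series, dtype=float)
    if lag == 0:
        return 1.0
    return float(np.corrcoef(s[:-lag], s[lag:])[0, 1])

# Synthetic AR(1) series standing in for hourly wind power (not real data).
rng = np.random.default_rng(1)
n = 2000
power = np.zeros(n)
for t in range(1, n):
    power[t] = 0.9 * power[t - 1] + rng.normal()

# Autocorrelation for lags of 0 to 12 h.
acf = [lagged_autocorr(power, lag) for lag in range(13)]
for lag, value in enumerate(acf):
    print(f"lag {lag:2d} h: {value:.3f}")
```

Plotting `acf` against the lag gives a curve of the same kind as those referred to in Figures 2 and 3, although the numerical values of this toy series are unrelated to the paper's results.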

Based on the aforementioned analysis, wind speed, time, latitude, and longitude have been selected as inputs for the prediction task in this study. Wind speed is a key factor in modeling wind power, while time and the latitude and longitude coordinates introduce temporal and spatial features, respectively.

In this paper, the timestamp is formatted to retain the hour, enabling its use as a feature input for the model. As demonstrated earlier, wind speed and wind power data tend to exhibit temporal correlation; that is, wind speed and wind power at the current time may be correlated with data from previous time periods. By incorporating time as a feature into the model, this temporal correlation can be more effectively considered, thereby enhancing the prediction accuracy of the model.

Furthermore, this study utilizes latitude and longitude as spatial feature inputs, enabling the model to capture the correlations among turbine spatial nodes [21].

Therefore, in the next section, we propose a novel approach that exploits these correlations.

2.2. Data Preprocessing

To ensure the quality of the raw data and the accuracy of the prediction, missing values in the input wind power and wind speed data are first filled using linear interpolation, which can be expressed as:

$$ y = y_{0} + \frac{(x - x_{0})(y_{1} - y_{0})}{x_{1} - x_{0}} \tag{2} $$

where $x_{0}$, $x_{1}$ are timestamps and $y_{0}$, $y_{1}$ are the corresponding wind power or wind speed magnitudes.

Additionally, to mitigate the impact of differing scales and distributions of features [22], this paper employs Min-Max normalization to scale the wind power and wind speed to the range from 0 to 1, as shown in Equation (3). The outcome of this process is normalized wind power and wind speed data, which improves the processing efficiency of the prediction model.

$$ x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{3} $$

where $x_{min}$ is the minimum value of the data, $x_{max}$ is the maximum value of the data, and $x_{norm}$ is the normalized data.
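The two preprocessing steps in Equations (2) and (3) can be sketched as follows. This is a minimal illustration on a toy series; the function names and the sample values are assumptions, and NaN is assumed to mark a missing reading.

```python
import numpy as np

def fill_missing_linear(t, y):
    """Fill NaN entries of y by linear interpolation over timestamps t (Equation (2))."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    known = ~np.isnan(y)
    filled = y.copy()
    # np.interp evaluates the piecewise-linear function through the known points.
    filled[~known] = np.interp(t[~known], t[known], y[known])
    return filled

def min_max_normalize(x):
    """Scale x to the range [0, 1] (Equation (3))."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)

hours = np.arange(6)
power = np.array([0.0, 2.0, np.nan, np.nan, 8.0, 10.0])   # toy series with a 2 h gap

filled = fill_missing_linear(hours, power)   # gaps at hours 2 and 3 become 4.0 and 6.0
normalized = min_max_normalize(filled)
print(filled)
print(normalized)
```

Two edge cases are left out of this sketch: `np.interp` holds the nearest known value for gaps at the very start or end of a series, and a constant series would make the denominator in Equation (3) zero.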

3. Methods and Principles

3.1. Spatial Similarity Nearest Neighbor Algorithm

There exists a spatio-temporal correlation among neighboring turbines, where turbines in adjacent areas often share similar air density, air pressure, and humidity conditions [23]. Our approach attempts to perform wind power prediction at the turbine level, and turbines with similar conditions are favorable for wind power prediction. Turbines with high similarity exhibit highly similar wind power forecasting results. By aggregating these highly similar turbines into a turbine ensemble and using this ensemble for training, it is possible to mitigate the impact of individual turbine biases on model accuracy [24]. In this paper, we propose a nearest neighbor algorithm based on this idea.

Traditionally, nearest neighbor algorithms [25] determine a threshold for the similarity measure and assign targets to corresponding groups using the nearest neighbor rule. In conventional nearest neighbor-based grouping algorithms, the spatial distances between targets and the center of each group are typically calculated, and these distance values are used to perform the grouping. Rather than using spatial distance as the sole metric, our approach considers both the spatial distance between turbines and the similarity of their wind speeds. Specifically, we first calculate the Euclidean distance between different turbines from their latitude and longitude values to ascertain the spatial correlation among them, as given by Equation (4):

$$ E(a, b) = \sqrt{(x_{b} - x_{a})^{2} + (y_{b} - y_{a})^{2}} \tag{4} $$

where $(x_{a}, y_{a})$ and $(x_{b}, y_{b})$ are the coordinates of turbines $a$ and $b$. The smaller $E(a, b)$ is, the closer the two turbines are in space, and the closer their wind power is expected to be. After obtaining the spatial relationship between the different turbines, we use the cosine similarity to calculate the similarity of wind speeds at the same historical moments for the different turbines, as shown in Equation (5):

$$ S(A, B) = \frac{A \cdot B}{\|A\| \cdot \|B\|} \tag{5} $$

where $S(A, B)$ denotes the cosine similarity of wind speeds at the same historical moments for turbine A and turbine B, “·” denotes the dot product of vectors, and $\|\cdot\|$ denotes the norm of a vector.

The closer $S(A, B)$ is to 1, the more similar the wind speed sequences of turbines A and B are, and the closer their wind power is. Combining the spatial distance and the wind speed similarity obtained from Equations (4) and (5), the spatial similarity is obtained by subtracting the former from the latter; see Equation (6).

$$ SS(A, B) = S(A, B) - E(a, b) \tag{6} $$

where the larger $SS(A, B)$ is, the more similar turbines A and B are to each other.

The resulting $SS(A, B)$ is used as a metric to find the k-nearest neighbors of each target wind turbine. The obtained turbine group and the corresponding wind speed data for the turbines are combined as features, which are merged with temporal features and fed into the model. The framework of the method proposed in this paper is illustrated in Figure 4. The selected data are divided into training, validation, and test datasets in a 6:2:2 ratio. The spatial distance between turbines is calculated from the latitude and longitude of the individual turbines and combined with wind speed data by the spatial similarity nearest neighbor algorithm to identify the k turbines most similar to each turbine. The data corresponding to these turbines are integrated to form turbine ensembles, which serve as inputs to the model.
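The grouping procedure of Equations (4) to (6) can be sketched as below. This is an illustrative reading of the text, not the authors' implementation: the function names, the toy coordinates, and the toy wind speeds are assumptions. The text does not state whether the distance is rescaled before the subtraction or whether the target turbine counts as a member of its own group, so the sketch uses the raw distance and excludes the target.

```python
import numpy as np

def spatial_similarity(coords, wind_speed):
    """SS matrix of Equation (6).

    coords:     (n_turbines, 2) relative coordinates
    wind_speed: (n_turbines, n_hours) historical wind speeds
    """
    coords = np.asarray(coords, dtype=float)
    ws = np.asarray(wind_speed, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))        # Euclidean distance, Equation (4)
    norms = np.linalg.norm(ws, axis=1)
    cos = (ws @ ws.T) / np.outer(norms, norms)      # cosine similarity, Equation (5)
    return cos - dist                               # spatial similarity, Equation (6)

def k_most_similar(ss, k):
    """Indices of the k turbines with the largest SS for each target turbine."""
    ss = np.array(ss, dtype=float)
    np.fill_diagonal(ss, -np.inf)                   # assumption: a turbine is not its own neighbor
    return np.argsort(-ss, axis=1)[:, :k]

# Toy farm: turbines 0 and 1 are close with similar wind, as are turbines 2 and 3.
coords = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
wind_speed = [[5.0, 6.0, 7.0, 8.0],
              [5.1, 6.1, 6.9, 8.2],
              [9.0, 3.0, 8.0, 2.0],
              [9.2, 2.9, 8.1, 2.1]]

ss = spatial_similarity(coords, wind_speed)
neighbors = k_most_similar(ss, k=1)
print(neighbors.ravel())
```

In this toy layout each turbine selects the other member of its pair. With real data, the wind speed rows of the selected turbines would then be stacked to form the turbine ensemble described above.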

It is worth mentioning that this paper differs from other prediction methods in that wind speed is indirectly input to the wind power prediction model as a feature for the following reasons:

Considering wind speed as one of the important factors affecting the results of wind power prediction, small changes in wind speed may lead to significant changes in wind power [26]. The wind power prediction model still needs to take into account the impact caused by wind speed on the prediction.
To eliminate the bias in wind power prediction due to wind speed prediction errors.
To reduce the parameters for model training and improve the prediction efficiency.

3.2. Wavelet Transform-Based Frequency Domain Feature Extraction

The wavelet transform plays a crucial role in wind power prediction [27]. Compared with the traditional Fourier transform, it offers superior time resolution and allows for a more flexible selection of mother wavelet types, thereby enhancing signal analysis effectiveness and prediction accuracy. While the traditional Fourier transform may sacrifice time information when processing signals in the frequency domain, the wavelet transform retains both frequency and time information, making it more adept at analyzing intermittent and highly variable waveforms. There are two main types of wavelet transforms: discrete wavelet transform (DWT) and continuous wavelet transform (CWT) [28]. CWT can comprehensively capture all information within a given time-series signal, but it is computationally complex and challenging to implement [29]. In contrast, DWT utilizes discrete sampling and is more suitable for practical time series signal processing. Moreover, DWT effectively reduces computational complexity and mitigates the issue of information redundancy that may arise with CWT. Therefore, this paper opts for DWT, which is defined by the following Equation (7):

$$ W(m, n) = 2^{-m/2} \sum_{t=0}^{T} f(t)\, \psi\left(\frac{t - n \cdot 2^{m}}{2^{m}}\right) \tag{7} $$

where $2^{m}$ is the scale parameter, $n$ is the position parameter, $\psi\left(\frac{t - n \cdot 2^{m}}{2^{m}}\right)$ is the wavelet basis function, and $T$ is the length of the signal $f(t)$. Common wavelet basis functions include the Haar, Daubechies, Symlet, and Coiflet wavelets. Among them, the Haar wavelet is a square-wave function with the following mathematical formula:

$$ \psi(t) = \begin{cases} 1, & 0 \leq t < \frac{1}{2} \\ -1, & \frac{1}{2} \leq t < 1 \\ 0, & \text{otherwise} \end{cases} \tag{8} $$

where $\psi(t)$ denotes the Haar wavelet basis function, which is defined on the interval $[0, 1)$. When $t$ is in the interval $[0, \frac{1}{2})$, the wavelet function takes the value of 1, which indicates a positive spike; when $t$ falls in the interval $[\frac{1}{2}, 1)$, the wavelet function takes the value of $-1$, which indicates a negative spike; in other intervals, the wavelet function takes the value of 0, which means no spike. In this paper, this wavelet basis function is chosen as the basis function of the wavelet transform, and the specific reasons will be explained in Section 4.5 (Experiments).
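To make the role of the Haar wavelet concrete, the sketch below performs a single-level Haar decomposition by hand. It is a minimal illustration under stated assumptions: the function name, the toy signal, and the padding rule for odd-length input are not taken from the paper, and the number of decomposition levels used in MFDnet is not specified in this section.

```python
import numpy as np

def haar_dwt_level1(signal):
    """Single-level Haar DWT with orthonormal scaling.

    Returns (approx, detail): the low-frequency and high-frequency coefficients.
    """
    x = np.asarray(signal, dtype=float)
    if x.size % 2:
        x = np.append(x, x[-1])          # assumption: repeat the last sample for odd lengths
    first, second = x[0::2], x[1::2]
    approx = (first + second) / np.sqrt(2.0)   # pairwise average: low-pass part
    detail = (first - second) / np.sqrt(2.0)   # pairwise difference (+1, -1 as in Equation (8))
    return approx, detail

# Toy hourly power series with abrupt changes inside the second and third pairs.
power = np.array([4.0, 4.0, 5.0, 9.0, 9.0, 3.0, 3.0, 3.0])
approx, detail = haar_dwt_level1(power)
print("approx:", approx)
print("detail:", detail)
```

The detail coefficients are zero wherever the signal is flat within a pair and large where it changes abruptly, which is the high-pass behavior that the model exploits. In practice a library routine such as `pywt.dwt(signal, 'haar')` from PyWavelets would normally be used instead of a hand-written transform.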

3.3. Relevant Models

3.3.1. Causal Convolutional Neural Network

Convolutional neural networks (CNNs) were initially developed for image recognition tasks. However, when it comes to time series data, researchers often lean towards recurrent neural networks (RNNs) or LSTM networks. Despite their effectiveness in various applications, RNNs and LSTMs encounter challenges such as high training memory requirements and prolonged training times. These issues become particularly pronounced when handling extensive sequential data, which is often the case in wind power prediction tasks. Therefore, causal convolution is introduced for the wind power prediction task [30].

Causal convolution is a specialized variant of the convolution operation designed for handling sequential data. It distinguishes itself by utilizing only current and preceding values from the input sequence to compute the output, excluding any information from future time steps. This is shown in Equation (9):

$$ p(x \mid \theta) = \prod_{t=0}^{N-1} p\big(x(t+1) \mid x(0), \dots, x(t), \theta\big) \tag{9} $$

where $N$ denotes the number of samples. Causal convolution predicts $x(t+1)$ using $x(0), x(1), \cdots, x(t)$, which allows the network to learn how the target moment depends on the preceding values without using information from future time steps. Figure 5 illustrates the structure of causal convolution.
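The causality constraint can be illustrated with a one-dimensional convolution that pads the input on the left only, so that each output depends on the current and earlier samples. The sketch below is a plain NumPy illustration; the kernel weights and the input are arbitrary assumptions, and only the kernel size of 3 follows the experimental settings reported in Section 4.2.

```python
import numpy as np

def causal_conv1d(x, kernel):
    """y[t] = sum_j kernel[j] * x[t - j], with x treated as zero before t = 0."""
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    k = kernel.size
    padded = np.concatenate([np.zeros(k - 1), x])    # left padding only
    # The window padded[t:t + k] holds x[t - k + 1] ... x[t]; reverse it to align with kernel[j].
    return np.array([np.dot(kernel, padded[t:t + k][::-1]) for t in range(x.size)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([0.5, 0.3, 0.2])                   # arbitrary weights, kernel size 3
y = causal_conv1d(x, kernel)
print(y)

# Changing a future sample leaves all earlier outputs untouched.
x_changed = x.copy()
x_changed[4] = 100.0
y_changed = causal_conv1d(x_changed, kernel)
print(np.allclose(y[:4], y_changed[:4]))
```

The second part of the example checks the defining property directly: modifying the last input sample alters only the last output.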

3.3.2. Deep-Learning-Based GRU

The gated recurrent unit (GRU) stands as a variant of RNN, similar to LSTM, renowned for its capacity to capture long-term dependencies [31]. Despite its simpler structure and fewer training parameters compared to LSTM, GRU exhibits comparable performance across various tasks. Illustrated in Figure 6, the GRU architecture comprises an update gate and a reset gate, governing the flow of information and memory retention. GRU, similar to LSTM, adeptly learns and retains both short-term and long-term correlations, enhancing its sequential data processing capabilities. The update gate modulates the incorporation of new input data with past memories, while the reset gate regulates the retention of historical information, facilitating the dynamic adjustment of information retention and forgetting. This gating mechanism endows GRU with greater flexibility in learning diverse sequential data patterns while achieving comparable performance with fewer parameters than LSTM. In the context of wind power prediction, characterized by intricate time dependencies among multiple variables, GRU enables the adaptive adjustment of historical information utilization and responsiveness to current inputs, thus enhancing the model’s capability in processing such sequential data.
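The gating mechanism described above can be written out for a single time step. The sketch below uses one common formulation of the GRU update; the sign convention for the update gate differs between references, and the weights, layer sizes, and inputs are random placeholders rather than values from this paper.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, p):
    """One GRU time step: returns the new hidden state."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])             # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])             # reset gate
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])  # candidate state
    # The update gate blends the previous state with the candidate state.
    return (1.0 - z) * h + z * h_cand

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4                     # assumed sizes for the example
params = {}
for gate in "zrh":
    params["W" + gate] = rng.normal(scale=0.5, size=(n_hidden, n_in))
    params["U" + gate] = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
    params["b" + gate] = np.zeros(n_hidden)

h = np.zeros(n_hidden)
inputs = rng.normal(size=(6, n_in))       # six time steps of placeholder features
for x_t in inputs:
    h = gru_step(x_t, h, params)
print(h)
```

Each step needs three weight blocks, compared with four in an LSTM cell, which is the source of the parameter saving mentioned above.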

3.4. Wind Power Prediction Model with Multi-Feature Domain Learning

In this study, the Seq2Seq method serves as the foundational architecture of the model. The encoder component comprises two pathways: one harnesses a GRU network to glean spatio-temporal features, while the other extracts frequency-domain features via DWT. Subsequently, features acquired from both pathways are weighted, fused, and output as depicted below:

$$ \mathrm{Enc}_{o} = \alpha \cdot E_{s} + \beta \cdot E_{f} \tag{10} $$

where $\mathrm{Enc}_{o}$ denotes the encoder output, $E_{s}$ denotes the output of the spatio-temporal data after the GRU, $E_{f}$ denotes the frequency domain features after DWT processing, and $\alpha$ and $\beta$ are the corresponding fusion weights. In Section 4.6, we design experiments on the weight values in detail.
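Equation (10) amounts to an element-wise weighted sum of two feature arrays of the same shape. The following sketch illustrates this; the weight values 0.7 and 0.3, the array contents, and the function name are hypothetical, and the text does not state whether the two weights are constrained to sum to 1.

```python
import numpy as np

def fuse_encoder_outputs(e_s, e_f, alpha, beta):
    """Weighted fusion of spatio-temporal and frequency-domain features (Equation (10))."""
    e_s = np.asarray(e_s, dtype=float)
    e_f = np.asarray(e_f, dtype=float)
    if e_s.shape != e_f.shape:
        raise ValueError("both feature arrays must have the same shape")
    return alpha * e_s + beta * e_f

e_s = np.array([[0.2, 0.4], [0.6, 0.8]])   # placeholder GRU features
e_f = np.array([[1.0, 0.0], [0.0, 1.0]])   # placeholder DWT-branch features
enc_o = fuse_encoder_outputs(e_s, e_f, alpha=0.7, beta=0.3)   # hypothetical weights
print(enc_o)
```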

In the decoder section, we introduce a feedforward module composed of two causal convolutional layers. This feedforward module embraces a causal convolutional structure and integrates a ReLU activation function, tailored to capture the nonlinear relationships among various factors, such as wind speed and wind power, thereby enhancing the model’s prediction accuracy. Towards the end of the feedforward module, we implement residual connections to preserve input information, which is then fused with processed data to facilitate the retention of historical information by the model. With this architecture, our model adeptly synthesizes spatio-temporal and frequency domain features, incorporating the nonlinear relationships of wind speed and wind power for more precise wind power predictions. The architecture of the model is illustrated in Figure 7.

After preprocessing the one-dimensional SCADA data, it is fed into the STD-Encoder for both frequency domain and time domain feature extraction. In the frequency domain, discrete wavelet transform (DWT) is used to extract high-frequency signals, which are then processed by a gated recurrent unit (GRU). The resulting features are weighted and combined with those from the time domain. The decoder processes the information from the encoder and ultimately completes the prediction task.

Additionally, MFDnet utilizes the GRU-based Seq2Seq framework, which offers several advantages over other prediction methods:

The Seq2Seq framework improves the input flexibility of the model and increases the generalizability of the model.
GRU demonstrates superior capability in managing long-term dependencies while mitigating the problem of vanishing gradients.
The utilization of fewer model parameters results in decreased computational demands and hardware requirements.

4. Experiments

4.1. Evaluation Indicators

In this paper, the coefficient of determination ($R^{2}$), mean absolute error (MAE), and root mean square error (RMSE) are employed as evaluation metrics for wind power forecasts. $R^{2}$ is a statistical measure used to assess the goodness of fit of a regression model. MAE assesses the average prediction error, while RMSE accentuates larger errors by squaring these deviations. The calculation of these metrics is outlined below:

$$ R^{2} = 1 - \frac{\sum_{t=1}^{n} (y_{true} - y_{pred})^{2}}{\sum_{t=1}^{n} (y_{true} - y_{aver})^{2}} \tag{11} $$

$$ MAE = \frac{1}{n} \sum_{t=1}^{n} |y_{true} - y_{pred}| \tag{12} $$

$$ RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_{true} - y_{pred})^{2}} \tag{13} $$

where $n$ is the number of sampling points, $y_{pred}$ is the predicted value, $y_{true}$ is the true power, and $y_{aver}$ is the mean of the true power values.
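The three metrics in Equations (11) to (13) can be computed directly, as in the minimal sketch below. The function names and the four-point toy series are assumptions for illustration and are unrelated to the results in this paper.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination, Equation (11)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def mae(y_true, y_pred):
    """Mean absolute error, Equation (12)."""
    return float(np.mean(np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))))

def rmse(y_true, y_pred):
    """Root mean square error, Equation (13)."""
    return float(np.sqrt(np.mean((np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2)))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])

print(r2_score(y_true, y_pred))   # 0.9
print(mae(y_true, y_pred))        # 0.25
print(rmse(y_true, y_pred))       # about 0.3536
```

For this toy case the squared errors sum to 0.5 and the total sum of squares is 5, giving $R^{2} = 0.9$; the RMSE exceeds the MAE because the two nonzero errors are squared before averaging.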

4.2. Relevant Work

The experiments in this paper were conducted with the following hardware and software setup: an NVIDIA GeForce RTX 3060 Laptop GPU, an AMD Ryzen 7 5800H CPU with a clock speed of 3.2 GHz, and PyCharm 2021 as the software environment. The model parameters include a dropout rate of 0.1 for the causal convolution, a convolution kernel size of 3, and Haar as the mother wavelet for the DWT. The experimental settings include 300 iterations, a learning rate of 0.001, the Adam optimizer, and a nearest neighbor parameter k of 5, which sets the number of turbines in each turbine ensemble to 5.

Unlike other wind power prediction tasks that are based on a single wind farm, the dataset used in our experiments contains 200 different turbines. Considering that evaluating the combined prediction error of these 200 turbines as the overall prediction deviation for the wind farm might lead to compensating errors, we average the prediction errors of the 200 turbines to assess the accuracy of the wind farm’s power prediction.
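The compensating-error effect mentioned above can be seen in a two-turbine toy case, sketched below with made-up numbers. One turbine is over-predicted and the other under-predicted by the same amount, so the error of the summed farm output vanishes even though every individual forecast is wrong.

```python
import numpy as np

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))))

# Rows are turbines, columns are hours (toy values, not real data).
y_true = np.array([[1.0, 2.0, 3.0],
                   [3.0, 2.0, 1.0]])
y_pred = np.array([[1.5, 2.5, 3.5],    # turbine 0 over-predicted by 0.5
                   [2.5, 1.5, 0.5]])   # turbine 1 under-predicted by 0.5

# Error of the aggregated farm output: the individual errors cancel.
farm_level_mae = mae(y_true.sum(axis=0), y_pred.sum(axis=0))

# Average of the per-turbine errors, as used in this paper: no cancellation.
per_turbine_mae = float(np.mean([mae(t, p) for t, p in zip(y_true, y_pred)]))

print(farm_level_mae)    # 0.0
print(per_turbine_mae)   # 0.5
```

Averaging the per-turbine errors therefore gives a stricter picture of forecast quality than scoring only the aggregated output.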

4.3. Comparison with Traditional Methods

In this paper, we compare our proposed model with a diverse range of classical models and state-of-the-art deep learning-based approaches. These comparisons encompass fundamental deep learning models such as multilayer perceptron (MLP) [32], RNN [33], GRU [34], LSTM [24], and LSTM-LSTM [35], alongside deep learning models specifically tailored for wind power prediction, such as DST [23] and STAN [36]. In addition, we use the standard persistence model (denoted as PM in Table 2) as the benchmark throughout the paper. We used the coefficient of determination ($R^{2}$), mean absolute error (MAE) and root mean square error (RMSE) as evaluation metrics. To make the evaluation more reliable, all models underwent three rounds of testing within the same environment, and the results were averaged. The outcomes of these experiments are presented in Table 2, where the highest $R^{2}$ and lowest MAE or RMSE for each hour are highlighted for clarity.
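For reference, a persistence model in its usual form simply carries the last observed value forward over the whole forecast horizon. The sketch below assumes that standard definition; the paper does not spell out its own PM implementation, and the function name and sample history are hypothetical.

```python
import numpy as np


def persistence_forecast(history, horizon):
    """Repeat the last observed value for every step of the horizon."""
    history = np.asarray(history, dtype=float)
    return np.full(horizon, history[-1])


# Hypothetical hourly power observations; forecast the next 6 hours.
history = np.array([0.42, 0.45, 0.47])
forecast = persistence_forecast(history, horizon=6)
print(forecast)
```

Because it needs no training, this baseline is a common sanity check: a learned model is only useful where it beats the carried-forward value.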

The experimental results indicate that the RMSE of the RNN is slightly lower than that of our proposed model at the 1 h forecasting horizon. Nevertheless, our proposed method achieves smaller MAE and RMSE at most time intervals, supporting its predictive capability for wind power forecasting. To show the evaluated values of the different forecasting models more clearly at different time scales, bar charts are used for visualization, as shown in Figure 8, Figure 9 and Figure 10.

In Figure 8, Figure 9 and Figure 10, a visual comparison of each model’s performance at different time points is presented. It is evident that the curve representing our model exhibits significant enhancement over other methods from 1 to 3 h. However, as time progresses, the superiority of our proposed model diminishes compared to other models. This phenomenon can be attributed to the model’s selection of the high-frequency component of the original signal during processing of frequency-domain information with GRU in the encoder. Typically, this component contains rapid variations or detailed information, corresponding to sudden changes in wind power characteristics. Moreover, high-frequency information also includes noise and other interference, posing challenges for the model in long-term feature extraction of wind power. Nevertheless, it is noteworthy that our model still demonstrates improvements compared to other methods, indicating its superior noise resistance and generalization capability. In Figure 11, the forecast performance of a wind farm comprising 200 turbines is shown, indicating that the actual results are closely aligned with the model’s predictions.

In summary, we attribute the superior performance of our method in short-term wind power prediction tasks to the following possible factors:

The Seq2Seq framework captures deeper temporal dependencies than simple prediction methods such as LSTM or GRU.
The introduction of DWT to extract frequency domain information improves the utilization of information by fusing spatio-temporal and frequency domain features.
The use of the embedding method, which introduces individual turbine identities as features, enables the model to learn the differences between turbines and predict based on these differences, leading to a more accurate prediction of future power generation.

4.4. K Tuning Experiment

This section examines the impact of the value of k on the accuracy of the prediction task within the SSNN. Figure 12 shows the autocorrelation of wind speeds for 20 turbines at lag times of 1 to 4 h, revealing strong correlations over short periods. This means that the wind resources available to these turbines are similar, so the number of turbines in each turbine group should be kept small. Moreover, since the autocorrelation coefficients of the turbines are all close to each other, the number of turbines constituting each turbine group should be approximately the same. Therefore, the number of turbines in each group in this paper is fixed to the same constant k. To verify the influence of k on the accuracy of the final prediction task, we varied k over the integers from 1 to 10, with all other conditions of the test remaining the same. The final result is the average over three trials, as shown in Table 3, Table 4 and Table 5.
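A lagged autocorrelation of the kind plotted for the turbines can be sketched as the Pearson correlation between a series and a shifted copy of itself. The hourly sampling, the synthetic sinusoidal wind speed and the function name are assumptions made for illustration; the paper does not state how its coefficients were computed.

```python
import numpy as np


def autocorrelation(series, lag):
    """Pearson correlation between the series and itself shifted by `lag` steps."""
    x = np.asarray(series, dtype=float)
    if not 0 < lag < x.size:
        raise ValueError("lag must be between 1 and len(series) - 1")
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]


# Synthetic, slowly varying wind speed (hypothetical, one sample per hour).
t = np.arange(200)
speed = 6.0 + 2.0 * np.sin(2.0 * np.pi * t / 100.0)

for lag in range(1, 5):
    print(lag, round(autocorrelation(speed, lag), 3))
```

For a slowly varying signal like this one, the coefficient stays high at short lags and decays as the lag grows.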

At most horizons, $R^{2}$ is closest to 1 and MAE and RMSE are at or near their minimum when $k = 5$, indicating the best overall performance. Prediction accuracy generally decreases as k deviates from this value ($k = 5$). This outcome can be attributed to the methodology employed in this paper, where turbine groups derived from the k-nearest neighbors serve as feature inputs for model training. During this training process, the turbine group data act as mutual constraints, ensuring that the wind power of a turbine group remains similar. When k is small, these turbines fail to exert significant constraints, resulting in poorer predictions. Conversely, when k exceeds the optimal value, a larger number of unrelated turbines are included, leading to misjudgments by the model and subsequent declines in prediction accuracy.
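To make the role of k concrete, the sketch below groups turbines by nearest neighbors over latitude, longitude and mean wind speed. It is a simplified stand-in, not the paper's SSNN: the standardization step, the Euclidean distance, the choice to count the turbine itself among its k members, and the six-turbine feature table are all assumptions.

```python
import numpy as np


def turbine_groups(features, k):
    """For each turbine, return indices of its k most similar turbines (itself included)."""
    features = np.asarray(features, dtype=float)
    # Standardize each column so that no single feature dominates the distance.
    # (Assumes no column is constant, otherwise the division would fail.)
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    # Pairwise Euclidean distances between all turbines.
    diff = z[:, None, :] - z[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sort each row by distance; the turbine itself has distance 0 and comes first.
    return np.argsort(dist, axis=1, kind="stable")[:, :k]


# Hypothetical turbines: columns are latitude, longitude, mean wind speed.
features = np.array([
    [40.00, -100.00, 6.0],
    [40.01, -100.01, 6.1],
    [40.02, -100.02, 6.2],
    [41.00, -101.00, 8.0],
    [41.01, -101.01, 8.1],
    [41.02, -101.02, 8.2],
])
groups = turbine_groups(features, k=3)
print(groups)
```

With a larger k than the natural cluster size, each group would start to pull in turbines from the other cluster, which mirrors the accuracy loss discussed above.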

4.5. Effects of Mother-Wavelet Selection

This paper presents an algorithm utilizing the Haar wavelet basis functions within the wavelet transform to extract high- and low-frequency components. The DWT, however, admits various wavelet basis functions, such as the Daubechies (db) wavelet, Symlet (sym) wavelet, and Coiflet (coif) wavelet. The selection of different mother wavelets can influence the processing of time series data to some extent [37]. To delve deeper into the implications of mother wavelet selection, we conducted a case study involving different types of wavelet basis functions. These functions can be represented in a general form:

\begin{matrix} ψ (t) = \sum_{k = 0}^{2 N - 1} h_{k} ψ (2 t - k) \end{matrix}

(14)

where $h_{k}$ are the filter coefficients and $ψ (t)$ is the wavelet basis function; the choice of $N$ determines its specific properties and support length.
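For the Haar wavelet, one level of the DWT reduces to scaled pairwise sums (low-frequency approximation) and pairwise differences (high-frequency detail). The hand-written NumPy sketch below illustrates this under the orthonormal scaling by $1/\sqrt{2}$; it is not the paper's implementation, and the function names and sample signal are hypothetical.

```python
import numpy as np


def haar_dwt(signal):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    x = np.asarray(signal, dtype=float)
    if x.size % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-frequency component
    detail = (even - odd) / np.sqrt(2.0)   # high-frequency component
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt and recover the original samples."""
    approx = np.asarray(approx, dtype=float)
    detail = np.asarray(detail, dtype=float)
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


# Hypothetical power samples: flat first pair, a jump inside the second pair.
signal = np.array([4.0, 4.0, 2.0, 6.0])
approx, detail = haar_dwt(signal)
print(approx, detail)
print(haar_idwt(approx, detail))
```

The detail coefficient is zero where neighboring samples are equal and nonzero where they jump, which is why the high-frequency branch responds to sudden changes and also to noise.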

In this paper, we selected commonly used wavelet basis functions, namely sym2, Haar, db2, and coif1. Each of these wavelets possesses distinct advantages. For instance, the Daubechies wavelet has compact support and orthogonality, making it proficient at capturing both short-term and long-term features within the signal. Symlet is characterized by greater symmetry compared to other wavelets, exhibiting minimal asymmetry and the highest number of vanishing moments for a given compact support, thus effectively handling asymmetric signals [38]. Coiflet wavelets are particularly well-suited for analyzing transient, time-varying signals, making them ideal for processing non-smooth signals and signals with pronounced singularities [39].

To assess the impact of various mother wavelets on the prediction outcomes, three alternative wavelet basis functions were employed in lieu of the Haar wavelet for experimentation. Subsequently, the prediction performance metrics of $R^{2}$, MAE and RMSE were evaluated and compared. The results are illustrated in Figure 13, Figure 14, Figure 15, Figure 16, Figure 17, Figure 18, Figure 19, Figure 20, Figure 21, Figure 22, Figure 23 and Figure 24.

The results indicate that the MAE and RMSE values obtained using the Haar-based DWT are consistently lower than those achieved with the other mother wavelets, and the $R^{2}$ value is higher than those of the other mother wavelets. Therefore, it can be concluded that the widely adopted Haar wavelet basis function is more suitable for wind power prediction tasks in this context.

4.6. Weighted Design Experiment

In Section 3.2, a weighted summation method is proposed to integrate information from both the frequency and spatio-temporal domains. This weighted approach is designed to regulate the influence of each domain on the final prediction results. As the parameter $β$ increases, high-frequency features in the frequency domain may exert a greater influence on the results. However, this also amplifies the interference from noise present in the high-frequency signals, thereby affecting the prediction more significantly. Conversely, as $β$ decreases, features processed by the encoder tend to align more closely with the long-term trend.

To enhance the rigor of the weighting design, this section conducts experiments with various weighting parameter configurations, with $R^{2}$, MAE and RMSE selected as the evaluation metrics. Figure 25, Figure 26 and Figure 27 present the results of these experiments. Nine different weight combinations are compared in this paper, each summing to 1. It is observed that at the 1-hour mark, the combination of $α$ = 0.9 and $β$ = 0.1 demonstrates superior prediction performance compared to the other weight combinations. Over time, the impact of different weights on the prediction results diminishes. Beyond the 3-hour mark, the disparities among the prediction results under different weight configurations become negligible, as shown in Figure 26 and Figure 27.

Consequently, this paper selects the weights $α$ = 0.9 and $β$ = 0.1. These parameters are deemed optimal for the weighted summation method.
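The weighted summation can be sketched as a convex combination of the two feature streams. This is an illustration under assumptions: it takes α as the weight on the spatio-temporal features and β as the weight on the frequency-domain features, and it assumes the nine combinations are spaced in steps of 0.1; neither detail is confirmed in this section, and the function name and toy vectors are hypothetical.

```python
import numpy as np


def fuse(spatio_temporal, frequency, alpha, beta):
    """Weighted sum of the two feature streams; the weights must sum to 1."""
    if not np.isclose(alpha + beta, 1.0):
        raise ValueError("alpha and beta must sum to 1")
    return alpha * np.asarray(spatio_temporal) + beta * np.asarray(frequency)


# Assumed grid of nine (alpha, beta) pairs: (0.1, 0.9), (0.2, 0.8), ..., (0.9, 0.1).
weight_grid = [(a / 10, (10 - a) / 10) for a in range(1, 10)]

# Toy feature vectors standing in for the encoder outputs.
spatio_temporal = np.array([1.0, 1.0])
frequency = np.array([0.0, 2.0])

for alpha, beta in weight_grid:
    print(alpha, beta, fuse(spatio_temporal, frequency, alpha, beta))
```

A small β keeps the fused representation close to the spatio-temporal stream, which limits how much high-frequency noise reaches the decoder.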

4.7. Ablation Experiment

Based on the aforementioned experimental outcomes, our proposed model demonstrates superior performance on the public dataset. To delve into the validity of SSNN, DWT frequency domain processing, and the feedforward module within our proposed model, we conducted an ablation experiment outlined in Table 6. It is worth mentioning that we chose the Seq2Seq framework with both encoder and decoder as GRU as the baseline.

The experimental results are presented in Table 7, Table 8 and Table 9. Ablation experiments CM1, CM2, and CM3 demonstrate that the full MFDnet outperforms variants that use a single module alone. Conversely, experiments CM4, CM5, and CM6 reveal that SSNN performs the best in our model, followed by the feedforward module, and lastly, DWT. Further analysis combined with CM2 results suggests a mismatch issue between DWT and the feedforward module. This mismatch arises from the feedforward module learning noise characteristics. Integrating SSNN into the model helps alleviate this misalignment by enhancing similarity features between turbines, thereby mitigating the noise effects introduced by DWT on the final prediction results to some extent. This is corroborated by the relatively poor performance of CM1 alone.

Comparing the MAE and RMSE results of the baseline and MFDnet, this study concludes the following: MFDnet exhibits significant improvements in wind power prediction compared to the GRU-based Seq2Seq framework. Specifically, MFDnet improves $R^{2}$ by 10.4%, reduces MAE by 4.1% and RMSE by 3.6% at the 1-hour mark, and improves $R^{2}$ by 5.5%, reduces MAE by 1.6% and RMSE by 1.7% on average across the 6-h prediction task.
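Percentages of this kind are relative changes with respect to the baseline. The sketch below recomputes the 1-hour figures from the rounded values printed in Table 7 and Table 8; the helper name is hypothetical. Because the tables are rounded to three decimals, the MAE and RMSE percentages obtained this way can differ slightly from those quoted in the text, which were presumably computed from unrounded results.

```python
def relative_change(baseline, proposed):
    """Change of `proposed` relative to `baseline`, in percent."""
    return (proposed - baseline) / baseline * 100.0


# 1 h values as printed in Table 7 (R^2) and Table 8 (MAE).
r2_gain = relative_change(0.817, 0.902)    # positive: R^2 went up
mae_cut = -relative_change(0.128, 0.123)   # sign flipped: reported as a reduction

print(f"R^2 improvement: {r2_gain:.1f}%")
print(f"MAE reduction:   {mae_cut:.1f}%")
```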

5. Conclusions

In this research, we developed a short-term wind power forecasting model that integrates multiple feature domains to enhance the accuracy of short-term wind power predictions. The crux of the model involves utilizing data from wind turbines within a wind farm to predict the wind farm’s power output. At the input stage, the model captures inter-turbine correlation features along with spatial and temporal features through a spatial similarity-based nearest neighbor algorithm, which forms the input to the model. By introducing a strategy of multi-feature domain fusion, the spatial, temporal, and frequency domain features are designed to complement each other. On the feature extraction end, a feedforward module captures nonlinear relationships, enabling the model to improve its adaptability in wind power forecasting by analyzing both long-term and short-term dependencies in sequential data. Comprehensive testing on an open wind power dataset has shown that the MFDnet model possesses significant efficacy, outperforming other advanced models such as STAN and DST in two distinct evaluation metrics for short-term wind power forecasting tasks. Additionally, compared to the persistence model, MFDnet achieved an average reduction of 33.7% in MAE and 22.3% in RMSE, while $R^{2}$ improves by 4.8%.

Future work can be summarized in two main directions. First, given more data, the prediction could be extended from a single wind farm to the power forecasts of multiple wind farms, thereby increasing the model’s generalizability. Second, the trend and fluctuation details of wind power predictions could be considered further, refining the proposed forecasting method to improve accuracy and achieve ultra-short-term wind power prediction.

Author Contributions

Methodology, X.H.; software, X.H. and Y.X.; validation, Y.X. and J.Y.; formal analysis, J.Y. and Y.X.; data curation, Y.X.; writing—original draft, Y.X., X.H. and J.Y.; writing—review and editing, J.Y., Y.X. and X.H.; funding acquisition, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by Tianjin University of Technology under Project YBXM2310.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Li, H.; Liu, L.; He, Q. A Spatiotemporal Coupling Calculation Based Short-Term Wind Farm Cluster Power Prediction Method. IEEE Access 2023, 11, 131418–131434.
2. Sun, Y.; Yang, J.; Zhang, X.; Hou, K.; Hu, J.; Yao, G. An Ultra-short-term wind power forecasting model based on EMD-EncoderForest-TCN. IEEE Access 2024. Available online: https://ieeexplore.ieee.org/document/10460523 (accessed on 5 March 2024).
3. Li, M.; Yang, M.; Yu, Y.; Li, P.; Wu, Q. Short-Term Wind Power Forecast Based on Continuous Conditional Random Field. IEEE Trans. Power Syst. 2024, 39, 2185–2197.
4. Zhu, J.; Su, L.; Li, Y. Wind power forecasting based on new hybrid model with TCN residual modification. Energy AI 2022, 10, 100199.
5. Yang, M.; Wang, D.; Xu, C.; Dai, B.; Ma, M.; Su, X. Power transfer characteristics in fluctuation partition algorithm for wind speed and its application to wind power forecasting. Renew. Energy 2023, 211, 582–594.
6. Wang, D.; Yang, M.; Zhang, W. Wind Power Group Prediction Model Based on Multi-Task Learning. Electronics 2023, 12, 3683.
7. Stathopoulos, C.; Kaperoni, A.; Galanis, G.; Kallos, G. Wind power prediction based on numerical and statistical models. J. Wind. Eng. Ind. Aerodyn. 2013, 112, 25–38.
8. Milligan, M.; Schwartz, M.; Wan, Y. Statistical Wind Power Forecasting Models: Results for U.S. Wind Farms; Preprint. 2003. Available online: https://api.semanticscholar.org/CorpusID:11737140 (accessed on 1 May 2003).
9. Peng, X.; Chen, Y.; Cheng, K.; Zhao, Y.; Wang, B.; Che, J.; Wen, J.; Lu, C.; Lee, W. Wind Power Prediction for Wind Farm Clusters Based on the Multi-feature Similarity Matching Method. In Proceedings of the 2019 IEEE Industry Applications Society Annual Meeting, Baltimore, MD, USA, 29 September–3 October 2019; pp. 1–11.
10. Deepak, P.L.; Ramkumar, G.; Sajiv, G. Improved Wind Power Generation Prediction through Novel Linear Regression over Ridge Regression. In Proceedings of the 2024 2nd International Conference on Advancement in Computation & Computer Technologies (InCACCT), Gharuan, India, 2–3 May 2024; pp. 46–50.
11. Zeng, J.; Qiao, W. Support vector machine-based short-term wind power forecasting. In Proceedings of the 2011 IEEE/PES Power Systems Conference and Exposition, Phoenix, AZ, USA, 20–23 March 2011; pp. 1–8.
12. An, G.; Jiang, Z.; Cao, X.; Liang, Y.; Zhao, Y.; Li, Z.; Dong, W.; Sun, H. Short-Term Wind Power Prediction Based On Particle Swarm Optimization-Extreme Learning Machine Model Combined With Adaboost Algorithm. IEEE Access 2021, 9, 94040–94052.
13. Zhou, B.; Ma, X.; Luo, Y.; Yang, D. Wind Power Prediction Based on LSTM Networks and Nonparametric Kernel Density Estimation. IEEE Access 2019, 7, 165279–165292.
14. Hao, J.; Zhu, C.; Guo, X. Wind Power Short-Term Forecasting Model Based on the Hierarchical Output Power and Poisson Re-Sampling Random Forest Algorithm. IEEE Access 2020, 9, 6478–6487.
15. Deng, W.; Dai, Z.; Liu, X.; Chen, R.; Wang, H.; Zhou, B.; Tian, W.; Lu, S.; Zhang, X. Short-Term Wind Power Prediction Based on Wind Speed Interval Division and TimeGAN for Gale Weather. In Proceedings of the 2023 International Conference on Power Energy Systems and Applications (ICoPESA), Nanjing, China, 24–26 February 2023; pp. 352–357.
16. Hossain, M.A.; Gray, E.; Lu, J.; Islam, M.R.; Alam, M.S.; Chakrabortty, R.; Pota, H.R. Optimized Forecasting Model to Improve the Accuracy of Very Short-Term Wind Power Prediction. IEEE Trans. Ind. Inform. 2023, 19, 10145–10159.
17. Fan, W.; Miao, L.; An, Y.; Chen, D.; Zhong, K. Wind Farm Ultra Short Term Power Prediction Considering Attention Mechanism and Historical Data of Wind Farm Internal Units. In Proceedings of the 2024 6th Asia Energy and Electrical Engineering Symposium (AEEES), Chengdu, China, 28–31 March 2024; pp. 1121–1125.
18. Hu, J.; Meng, W.; Tang, J.; Zhuo, Y.; Rao, Z.; Sun, S. Short-Term Wind Power Prediction under Cold Wave Weather Conditions based on Neural Prophet. In Proceedings of the 2024 7th International Conference on Advanced Algorithms and Control Engineering (ICAACE), Shanghai, China, 1–3 March 2024; pp. 1262–1266.
19. Zhu, N.; Wang, Y.; Yuan, K.; Yan, J.; Li, Y.; Zhang, K. GGNet: A novel graph structure for power forecasting in renewable power plants considering temporal lead-lag correlations. Appl. Energy 2024, 364, 123194.
20. Ding, Y. Data Science for Wind Energy; Chapman & Hall/CRC Press: Boca Raton, FL, USA, 2019; Available online: https://api.semanticscholar.org/CorpusID:199102989 (accessed on 24 May 2019).
21. Yu, C.; Yan, G.; Yu, C.; Zhang, Y.; Mi, X. A multi-factor driven spatiotemporal wind power prediction model based on ensemble deep graph attention reinforcement learning networks. Energy 2023, 263, 126034.
22. Kim, Y.; Kim, M.K.; Fu, N.; Liu, J.; Wang, J.; Srebric, J. Investigating the Impact of Data Normalization Methods on Predicting Electricity Consumption in a Building Using different Artificial Neural Network Models. Sustain. Cities Soc. 2024, 105570.
23. Li, J.; Armandpour, M. Deep Spatio-Temporal Wind Power Forecasting. In Proceedings of the ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23–27 May 2022; pp. 4138–4142.
24. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
25. Xiong, Z.; Wang, C.; Zhao, Y.; Deng, D.; Liu, H. A Grouping Method of Sea Battlefield Targets Based on The Improved Nearest Neighbor Algorithm. In Proceedings of the 2022 IEEE International Conference on Unmanned Systems (ICUS), Guangzhou, China, 28–30 October 2022; pp. 298–302.
26. Erdem, E.; Shi, J. ARMA based approaches for forecasting the tuple of wind speed and direction. Appl. Energy 2011, 88, 1405–1414.
27. Percival, D.B.; Walden, A.T. Wavelet Methods for Time Series Analysis. In Cambridge Series in Statistical and Probabilistic Mathematics, 4th ed.; Cambridge University Press: Cambridge, UK, 2000.
28. Ahn, E.; Hur, J. A short-term forecasting of wind power outputs using the enhanced wavelet transform and arimax techniques. Renew. Energy 2023, 212, 394–402.
29. Zhang, W.; Lin, Z.; Liu, X. Short-term offshore wind power forecasting—A hybrid model based on Discrete Wavelet Transform (DWT), Seasonal Autoregressive Integrated Moving Average (SARIMA), and deep-learning-based Long Short-Term Memory (LSTM). Renew. Energy 2022, 185, 611–628.
30. Yang, Y.; Zhang, H.; Peng, S.; Su, S.; Li, B. Wind Power Probability Density Prediction Based on Quantile Regression Model of Dilated Causal Convolutional Neural Network. Chin. J. Electr. Eng. 2023, 9, 120–128.
31. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555.
32. Velasco, L.C.P.; Serquiña, R.P.; Zamad, M.S.A.A.; Juanico, B.F.; Lomocso, J.C. Week-ahead Rainfall Forecasting Using Multilayer Perceptron Neural Network. Procedia Comput. Sci. 2019, 161, 386–397.
33. Liu, M.; Ding, L.; Bai, Y. Application of hybrid model based on empirical mode decomposition, novel recurrent neural networks and the ARIMA to wind speed prediction. Energy Convers. Manag. 2021, 233, 113917.
34. Liu, X.; Lin, Z.; Feng, Z. Short-term offshore wind speed forecast by seasonal ARIMA—A comparison against GRU and LSTM. Energy 2021, 227, 120492.
35. Hu, Z.; Gao, Y.; Ji, S.; Mae, M.; Imaizumi, T. Improved multistep ahead photovoltaic power prediction model based on LSTM and self-attention with weather forecast data. Appl. Energy 2024, 359, 122709.
36. Fu, X.; Gao, F.; Wu, J.; Wei, X.; Duan, F. Spatiotemporal Attention Networks for Wind Power Forecasting. In Proceedings of the 2019 International Conference on Data Mining Workshops (ICDMW), Beijing, China, 8–11 November 2019; pp. 149–154.
37. Moradi, M. Wavelet transform approach for denoising and decomposition of satellite-derived ocean color time-series: Selection of optimal mother wavelet. Adv. Space Res. 2022, 69, 2724–2744.
38. Shen, Y.; Sun, J.; Yang, X.; Liu, P. Symlets Wavelet Transform based Power Management of Hybrid Energy Storage System. In Proceedings of the 2020 IEEE 4th Conference on Energy Internet and Energy System Integration (EI2), Wuhan, China, 30 October–1 November 2020; pp. 2556–2561.
39. Patwary, R.; Datta, S.; Ghosh, S. Harmonics and interharmonics estimation of a passive magnetic fault current limiter using Coiflet Wavelet transform. In Proceedings of the 2011 International Conference on Communication and Industrial Application, Kolkata, India, 26–28 December 2011; pp. 1–6.

Figure 1. Correlation between wind speed and wind power.

Figure 2. Autocorrelation of wind power at different lags.

Figure 3. Autocorrelation of wind speed at different lags.

Figure 4. Framework of proposed method.

Figure 5. The structure of causal convolution.

Figure 6. The structure of GRU.

Figure 7. The architecture of the model.

Figure 8. $R^{2}$ for different methods.

Figure 9. MAE for different methods.

Figure 10. RMSE for different methods.

Figure 11. Comparison of actual data and best model predictions.

Figure 12. Correlation of wind speeds for different turbines at different lag times.

Figure 13. $R^{2}$ for sym2.

Figure 14. $R^{2}$ for Haar.

Figure 15. $R^{2}$ for db2.

Figure 16. $R^{2}$ for coif1.

Figure 17. MAE for sym2.

Figure 18. MAE for Haar.

Figure 19. MAE for db2.

Figure 20. MAE for coif1.

Figure 21. RMSE for sym2.

Figure 22. RMSE for Haar.

Figure 23. RMSE for db2.

Figure 24. RMSE for coif1.

Figure 25. $R^{2}$ with different weights.

Figure 26. MAE with different weights.

Figure 27. RMSE with different weights.

Table 1. The statistical summary of the dataset.

Typology	Train				Test
Typology	Mean	Std	Min	Max	Mean	Std	Min	Max
Wind power	7.152	0.292	6.897	19.9	7.023	0.267	0.08728	19.6
Wind speed	0.476	0.467	2.7	8.3	0.523	0.421	3.5	7.1
Wind direction	203.413	14.71189	63.56	291.7	254.531	104.4253	163	358.4

Table 2. Comparison of experimental results.

Times (h)	Metrics	PM	MLP	RNN	GRU	LSTM	STAN	DST	LSTM-LSTM	Ours
1	$R^{2}$	−0.174	0.717	0.885	0.877	0.812	0.824	0.877	0.856	0.902
	MAE	0.220	0.130	0.123	0.130	0.138	0.131	0.128	0.136	0.123
	RMSE	0.268	0.174	0.165	0.172	0.184	0.188	0.172	0.181	0.166
2	$R^{2}$	0.670	0.730	0.854	0.810	0.798	0.796	0.771	0.817	0.848
	MAE	0.178	0.161	0.155	0.157	0.170	0.153	0.155	0.160	0.152
	RMSE	0.203	0.199	0.208	0.205	0.217	0.222	0.205	0.209	0.199
3	$R^{2}$	0.930	0.718	0.848	0.950	0.886	0.686	0.847	0.777	0.718
	MAE	0.128	0.182	0.178	0.176	0.190	0.174	0.174	0.177	0.172
	RMSE	0.155	0.231	0.230	0.226	0.238	0.247	0.226	0.228	0.221
4	$R^{2}$	0.900	0.735	0.788	0.774	0.881	0.688	0.774	0.789	0.739
	MAE	0.131	0.202	0.195	0.190	0.205	0.192	0.189	0.191	0.187
	RMSE	0.159	0.256	0.270	0.241	0.255	0.264	0.241	0.243	0.237
5	$R^{2}$	0.590	0.693	0.729	0.882	0.789	0.781	0.881	0.819	0.681
	MAE	0.363	0.215	0.210	0.203	0.217	0.204	0.203	0.203	0.201
	RMSE	0.384	0.278	0.271	0.254	0.267	0.275	0.254	0.255	0.250
6	$R^{2}$	0.580	0.701	0.711	0.772	0.617	0.642	0.672	0.675	0.629
	MAE	0.369	0.223	0.220	0.213	0.226	0.218	0.213	0.213	0.213
	RMSE	0.389	0.301	0.282	0.263	0.277	0.284	0.263	0.264	0.261

Table 3. $R^{2}$ of different values of k.

K\Time (h)	1	2	3	4	5	6
k = 1	0.743	0.702	0.755	0.675	0.646	0.509
k = 2	0.743	0.661	0.726	0.652	0.627	0.492
k = 3	0.789	0.691	0.717	0.652	0.622	0.486
k = 4	0.819	0.636	0.732	0.660	0.627	0.497
k = 5	0.902	0.848	0.718	0.739	0.681	0.629
k = 6	0.819	0.713	0.705	0.639	0.604	0.474
k = 7	0.819	0.752	0.711	0.644	0.611	0.480
k = 8	0.789	0.772	0.722	0.650	0.618	0.486
k = 9	0.819	0.753	0.713	0.646	0.621	0.486
k = 10	0.834	0.760	0.711	0.638	0.615	0.480

The bold values represent the highest $R^{2}$ for each time interval.

Table 4. MAE of different values of k.

K\Time (h)	1	2	3	4	5	6
k = 1	0.132	0.158	0.176	0.191	0.203	0.213
k = 2	0.132	0.156	0.174	0.189	0.203	0.215
k = 3	0.129	0.158	0.175	0.189	0.203	0.214
k = 4	0.132	0.156	0.174	0.189	0.202	0.213
k = 5	0.123	0.152	0.172	0.187	0.201	0.213
k = 6	0.127	0.153	0.171	0.187	0.202	0.214
k = 7	0.127	0.153	0.172	0.187	0.202	0.214
k = 8	0.126	0.152	0.170	0.186	0.201	0.213
k = 9	0.126	0.153	0.171	0.187	0.201	0.213
k = 10	0.126	0.152	0.171	0.187	0.201	0.213

The bold values represent the lowest MAE for each time interval.

Table 5. RMSE of different values of k.

K\Time (h)	1	2	3	4	5	6
k = 1	0.175	0.206	0.227	0.242	0.254	0.264
k = 2	0.175	0.204	0.223	0.238	0.251	0.261
k = 3	0.172	0.202	0.222	0.238	0.250	0.260
k = 4	0.170	0.205	0.224	0.239	0.251	0.262
k = 5	0.166	0.199	0.221	0.237	0.250	0.261
k = 6	0.170	0.201	0.220	0.236	0.248	0.258
k = 7	0.170	0.201	0.221	0.237	0.249	0.259
k = 8	0.169	0.199	0.219	0.235	0.248	0.259
k = 9	0.170	0.200	0.221	0.237	0.250	0.260
k = 10	0.169	0.200	0.221	0.236	0.249	0.259

The bold values represent the lowest RMSE for each time interval.

Table 6. Ablation protocol.

Method	SSNN	DWT	Feedforward Module
Baseline	−	−	−
Comparison method 1 (CM1)	✓	−	−
Comparison method 2 (CM2)	−	✓	−
Comparison method 3 (CM3)	−	−	✓
Comparison method 4 (CM4)	✓	✓	−
Comparison method 5 (CM5)	✓	−	✓
Comparison method 6 (CM6)	−	✓	✓
Proposed Method	✓	✓	✓

Table 7. $R^{2}$ for ablation experiments.

Times	Baseline	CM1	CM2	CM3	CM4	CM5	CM6	Proposed Method
1 h	0.817	0.861	0.827	0.831	0.864	0.826	0.769	0.902
2 h	0.790	0.805	0.803	0.810	0.820	0.797	0.819	0.848
3 h	0.625	0.696	0.701	0.699	0.711	0.699	0.703	0.718
4 h	0.713	0.691	0.688	0.693	0.697	0.694	0.687	0.739
5 h	0.689	0.662	0.675	0.675	0.678	0.671	0.670	0.681
6 h	0.645	0.615	0.630	0.623	0.630	0.623	0.623	0.629

The bold values indicate the highest $R^{2}$ achieved in each time interval compared to the baseline and other comparison methods (CM1 through CM6).

Table 8. MAE for ablation experiments.

Times	Baseline	CM1	CM2	CM3	CM4	CM5	CM6	Proposed Method
1 h	0.128	0.127	0.126	0.127	0.127	0.125	0.129	0.123
2 h	0.155	0.153	0.153	0.153	0.155	0.153	0.154	0.152
3 h	0.174	0.172	0.172	0.172	0.173	0.172	0.172	0.172
4 h	0.189	0.188	0.188	0.188	0.189	0.188	0.189	0.187
5 h	0.203	0.202	0.202	0.203	0.202	0.202	0.204	0.201
6 h	0.213	0.214	0.214	0.216	0.214	0.214	0.217	0.213

The bold values indicate the lowest MAE achieved in each time interval compared to the baseline and other comparison methods (CM1 through CM6).

Table 9. RMSE for ablation experiments.

Times	Baseline	CM1	CM2	CM3	CM4	CM5	CM6	Proposed Method
1 h	0.172	0.170	0.168	0.170	0.169	0.168	0.171	0.166
2 h	0.203	0.201	0.200	0.201	0.201	0.200	0.201	0.199
3 h	0.224	0.221	0.221	0.221	0.222	0.221	0.221	0.221
4 h	0.240	0.237	0.237	0.237	0.238	0.237	0.236	0.237
5 h	0.253	0.249	0.251	0.251	0.251	0.250	0.250	0.250
6 h	0.263	0.259	0.261	0.260	0.261	0.260	0.260	0.261

The bold values indicate the lowest RMSE achieved in each time interval compared to the baseline and other comparison methods (CM1 through CM6).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Xue, Y.; Yin, J.; Hou, X. Short-Term Wind Power Prediction Based on Multi-Feature Domain Learning. Energies 2024, 17, 3313. https://doi.org/10.3390/en17133313

