1. Introduction
The world is facing challenges such as climate change, resource shortages, and environmental pollution. Gradually replacing traditional fossil fuel power generation with renewable energy has become a development trend [1], and sustainable development has become the guiding principle of the global energy transformation [2]. In this transformation, wind energy, as a clean and renewable energy source, plays a vital role. The popularization and utilization of wind energy can help reduce environmental pollution and carbon emissions and promote sustainable development. With the advancement of technology and industry, countries are actively developing and utilizing wind energy and integrating wind power into the power grid to promote the development of clean energy. However, the volatility and stochastic nature of wind power pose significant challenges for the power system when it is incorporated into the grid. Therefore, accurate short-term wind power prediction is particularly important. Such predictions not only help improve grid dispatching efficiency and reduce operating costs but also enhance the security, reliability, and controllability of the system [3,4,5,6].
According to the time scale, researchers categorize wind power prediction into ultra-short-term prediction (less than 30 min), short-term prediction (30 min to 6 h), medium-term prediction (6 h to 1 day), and long-term prediction (1 day to 7 days). These types of predictions play different roles in the actual operation of the power system. Short-term prediction not only affects real-time dispatch but also has an important impact on power generation plans. This is because it can guide the formulation of power generation plans several hours in advance, enabling more efficient utilization of generation resources. Accurate short-term forecasts ensure stable grid operation and rational allocation of generation resources, which is of profound significance for achieving efficient operation and reliable power supply for the power system. Therefore, accurate short-term forecasting has become the focus of researchers’ attention in recent years.
In recent decades, there have been three main categories of wind power prediction (WPP) methods: physical methods, statistical methods, and artificial intelligence methods.
The physical approach describes the physical relationship between weather conditions, topography, wind speed, and wind turbine power, using the resulting numerical weather prediction (NWP) as input to the wind power prediction model without the need for historical wind power data. Statistical models that analyze the time series of historical data can directly describe the link between wind speed and wind power generation as predicted by the NWP without taking into account the physical characteristics of the generation system. For example, Christos Stathopoulos et al. explored the problem of wind power prediction through numerical and statistical prediction models and verified that accurate wind power prediction can be achieved under the condition of reliable local environmental data [
7]. Michael Milligan et al. developed a class of autoregressive moving average (ARMA) models applied to wind speed and wind output [
8]. Xiaosheng Peng et al. proposed a data mining-based regional power prediction method to optimize the input parameters, which is highly superior to traditional prediction methods [
9]. P. Lakshmi Deepak proposed an improved linear regression algorithm that overcomes the limitations of ridge regression, achieving better wind power forecasting [
10].
The AI method predicts future wind power by learning, from historical time series, the relationship between past weather conditions and the corresponding power output. Unlike statistical methods based on explicit statistical analysis, the AI method excels in characterizing the nonlinear and highly complex relationships between the input data (NWP forecasts) and the output power. Thus, it can achieve better prediction results in short-term wind power prediction scenarios. For example, Jianwu Zeng et al. proposed a short-term WPP model that uses support vector machines to predict the wind speed and then applies the power–wind speed characteristics of the wind turbine generator to predict wind power, which provides better prediction accuracy for both ultra-short-term and short-term WPPs [
11]. Guoqing An et al. proposed an Adaboost algorithm combined with a particle swarm optimization extreme learning machine (PSO-ELM) in conjunction with a wind power prediction model [
12]. Bowen Zhou et al. studied weather forecast data as one of the inputs to the long short-term memory (LSTM) network model to realize the on-site prediction of wind farm power [
13]. Jie Hao et al. proposed an improved random forest short-term prediction model based on hierarchical output power, which adopts Poisson resampling instead of random forest’s bootstrap to improve the training speed of the random forest algorithm [
14]. Weisi Deng et al. proposed a short-term WPP method for windy weather based on wind speed interval segmentation and TimeGAN, which improves the accuracy of short-term wind power prediction [
15]. Md Alamgir Hossain et al. developed a prediction modeling framework consisting of LSTM, complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), and monarch butterfly optimization (MBO) algorithms, which has low computation time and satisfactory performance [
16]. Wei Fan et al. proposed an A-GRU-S2S model based on the sequence-to-sequence GRU architecture, which eliminates the model’s dependency on temporal distance information, thereby effectively predicting the ultra-short-term power output of wind farms [
17]. Jiaqiu Hu et al. introduced a forecasting model based on Neural Prophet, which enhances the accuracy of wind power prediction and the integration capability of renewable energy during cold wave conditions [
18]. Nanyang Zhu et al. proposed GGNet, a granularity-based GNN with better performance than the state-of-the-art (SoTA) method [
19]. The above studies have shown that prediction models based on artificial intelligence methods can achieve high-resolution wind power prediction. However, most of them still suffer from several drawbacks, such as:
Most of them rely on the relationship between wind speed and wind power to build the prediction models, and lack consideration for additional features such as time features, spatial features, and so on.
There is a lack of consideration of individual turbines when measuring wind farm power.
To address these drawbacks, this paper proposes a short-term wind power prediction model based on multi-feature domain learning (MFDnet). The method leverages the complementary characteristics of temporal, spatial, and frequency domain information. It integrates wind speed, latitude, and longitude data from different turbines along with time information, and it uses the high-pass filtering characteristics of the wavelet transform to capture the rapidly changing components of the signal, which improves the model’s sensitivity to short-term signal variations. Additionally, a similarity matrix is introduced to design a spatial similarity nearest neighbor algorithm (SSNN) that reduces the dependence of wind power prediction on wind speed. The main contributions of this paper are:
The idea of integrating multiple feature domains is introduced into wind power prediction, which improves the prediction accuracy through the complementary characteristics of temporal, spatial and frequency domains.
An SSNN is proposed to obtain correlation information between multiple turbines using historical latitude and longitude information and historical wind speed information from different turbines, thus reducing the uncertainty transferred from earlier wind-speed-dependent forecasts.
We propose a new wind power forecasting model that performs individual wind power predictions for each turbine in a wind farm and then combines the predictions into an overall wind farm forecast. On the selected dataset, the model outperforms competing wind farm power prediction algorithms in overall performance, and it reduces the MAE by 25.5% and the RMSE by 20.6% when predicting the wind power for the next 1 h compared to the baseline persistence model.
The rest of the paper is structured as follows: Section 2 focuses on the dataset and the specific methods of data processing. Section 3 provides a detailed description of the proposed algorithmic model. Section 4 focuses on experimental validation. Finally, Section 5 summarizes the entire paper.
2. Data Processing
The dataset utilized in this paper is sourced from [20] and originates from a flat-terrain inland wind farm located in the United States. It encompasses hourly wind speed and wind power data for 200 randomly selected turbines. The dataset covers the period from 9 January 2010 to 31 August 2011, with a temporal resolution of 60 min. The overall ratio of anomalous and missing data is less than 2%. The statistical summary of the dataset is presented in Table 1, which includes the mean, standard deviation (Std), minimum value (Min), and maximum value (Max).
Additionally, the dataset incorporates hourly wind speed and direction measurements from three meteorological masts. Notably, it includes the relative coordinates (latitude and longitude) of the 200 turbines. Because the exact location of the wind farm is confidential, the provided coordinates are the real positions shifted by an added constant, so the layout remains consistent with the actual arrangement. Therefore, this offset has no effect on the prediction accuracy of the algorithm proposed in this paper [3].
2.1. Feature Selection
Feature selection plays a pivotal role in enhancing the accuracy and reliability of wind power prediction models. In this section, we conduct various correlation analyses to uncover the inherent relationships within the data in our dataset, laying the groundwork for our feature selection process. Given the paramount importance of meteorological factors in wind power prediction, our initial focus is on examining the correlation between wind speed and wind power.
To quantify this correlation, we employ the Pearson correlation coefficient, a widely-used statistical metric for assessing the strength and direction of linear relationships between two variables. Ranging from −1 to 1, the Pearson correlation coefficient provides insights into how changes in one variable correspond to changes in another. A positive correlation coefficient indicates a positive relationship, where an increase in one variable corresponds to an increase in the other. Conversely, a negative correlation coefficient signifies an inverse relationship, with an increase in one variable associated with a decrease in the other. A correlation coefficient near 0 suggests little to no linear relationship between the variables.
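The formula for calculating the Pearson correlation coefficient is as follows:
$$r = \frac{\sum_{i=1}^{n} (P_i - \bar{P})(v_i - \bar{v})}{\sqrt{\sum_{i=1}^{n} (P_i - \bar{P})^2}\,\sqrt{\sum_{i=1}^{n} (v_i - \bar{v})^2}} \tag{1}$$
where $P_i$ and $v_i$ are the wind power and wind speed at the $i$-th moment, $\bar{v}$ and $\bar{P}$ denote the mean values of wind speed and wind power, respectively, and $n$ is the length of time (h).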
Specifically, we sampled turbine numbers from 1 to 200 with an increment of 10. This resulted in a series of turbines numbered 1, 10, 20, 30, ..., 190, and 200, totaling 21 turbines. For each of these 21 turbines, the Pearson correlation between wind power and wind speed at the same moment was computed. The final results are presented in Figure 1. It can be observed that the correlation coefficients of the different turbines consistently range from 0.90 to 1.00. This indicates a strong correlation between wind speed and wind power in this dataset, justifying the use of wind speed as an input factor for wind power prediction.
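As an illustration of the correlation test described above, the following minimal sketch computes the Pearson coefficient between a wind power series and a wind speed series for one turbine. The function name `pearson` and the sample values are hypothetical and are not taken from the dataset.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: product of the root sums of squared deviations.
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)

# Made-up hourly values for a single turbine (illustrative only).
wind_speed = [3.0, 4.5, 6.0, 7.5, 9.0, 8.0]
wind_power = [0.05, 0.15, 0.35, 0.60, 0.90, 0.70]
r = pearson(wind_power, wind_speed)
```

Repeating this calculation per turbine would give one coefficient for each of the sampled turbines.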
Additionally, it is essential to assess the temporal autocorrelation of the wind power and the wind speed series. Figure 2 illustrates the Pearson autocorrelation coefficients of the wind power series with different lags. Figure 3 illustrates the Pearson autocorrelation coefficients of the wind speed series with different lags.
It can be seen that the autocorrelation coefficient decreases sharply over the first few lags and remains greater than 0.3 for lags of up to 6 h. This indicates that both the wind power and the wind speed data are strongly autocorrelated over short lags, so wind speed can be used as a feature input for short-term wind power prediction.
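The lagged autocorrelation used in this analysis can be sketched as the Pearson correlation between a series and a copy of itself shifted by a given lag. The helper names `_pearson` and `autocorr` and the sample series are hypothetical.

```python
import math

def _pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)

def autocorr(series, lag):
    """Correlation between series[t] and series[t + lag] (lag in samples)."""
    if lag < 1 or lag > len(series) - 2:
        raise ValueError("lag must leave at least two overlapping samples")
    return _pearson(series[:-lag], series[lag:])

# Made-up hourly wind speeds (illustrative only); lags are in hours.
speeds = [5.0, 5.4, 6.1, 6.0, 5.2, 4.1, 3.9, 4.4, 5.6, 6.3, 6.8, 6.2]
coefficients = {lag: autocorr(speeds, lag) for lag in range(1, 7)}
```

Plotting `coefficients` against the lag would give a curve of the kind discussed for Figure 2 and Figure 3.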
Based on the aforementioned analysis, wind speed, time, latitude, and longitude have been selected as inputs for the prediction task in this study. Wind speed is the key factor for modeling wind power, while time introduces temporal features and latitude and longitude introduce spatial features.
In this paper, the timestamp is formatted to retain the hour, enabling its use as a feature input for the model. As demonstrated earlier, wind speed and wind power data exhibit temporal correlation; that is, wind speed and wind power at the current time may be correlated with data from previous time periods. By incorporating time as a feature, the model can take this temporal correlation into account more effectively, thereby enhancing its prediction accuracy.
Furthermore, this study utilizes latitude and longitude as spatial feature inputs, enabling the model to capture the correlations among turbine spatial nodes [21].
Therefore, in the subsequent section, we propose a novel approach for capturing these temporal and spatial correlations.
2.2. Data Preprocessing
To ensure the quality of the raw data and the accuracy of the prediction, missing values in the input wind power and wind speed data are filled first. This paper uses linear interpolation, which can be expressed as:
$$y = y_0 + \frac{(t - t_0)(y_1 - y_0)}{t_1 - t_0} \tag{2}$$
where $t_0$, $t_1$ are timestamps, $y_0$, $y_1$ are the corresponding wind power or wind speed magnitudes, and $y$ is the interpolated value at a timestamp $t$ between $t_0$ and $t_1$.
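A minimal sketch of this gap-filling step is given below. It assumes evenly spaced (hourly) samples, marks missing values with `None`, and leaves leading or trailing gaps unfilled; the function name `fill_missing_linear` and these conventions are assumptions made for illustration.

```python
def fill_missing_linear(values):
    """Fill interior None gaps by linear interpolation between valid neighbours.

    Samples are assumed to be evenly spaced, so list indices act as timestamps.
    Gaps at the very start or end have only one neighbour and are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for t0, t1 in zip(known, known[1:]):
        y0, y1 = values[t0], values[t1]
        for t in range(t0 + 1, t1):
            # y = y0 + (t - t0) * (y1 - y0) / (t1 - t0)
            filled[t] = y0 + (t - t0) * (y1 - y0) / (t1 - t0)
    return filled

# Made-up wind speed readings with two missing hours (illustrative only).
raw = [4.0, None, None, 7.0, 6.5]
clean = fill_missing_linear(raw)
```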
Additionally, to mitigate the impact of differing scales and distributions of features [22], this paper employs Min-Max normalization to scale the wind power and wind speed to a range between 0 and 1, as shown in Equation (3). The outcome of this process is normalized wind power and wind speed data, which improves the processing efficiency of the prediction model.
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{3}$$
where $x$ is the original data, $x_{\min}$ is the minimum value of the data, $x_{\max}$ is the maximum value of the data, and $x'$ is the normalized data.
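The Min-Max scaling of Equation (3) can be sketched as follows. The function names are hypothetical, and returning the minimum and maximum so that predictions can be mapped back to physical units is a design choice of this sketch rather than something stated in the paper.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; also return the min and max used for scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values], lo, hi

def denormalize(scaled, lo, hi):
    """Invert min_max_normalize using the stored min and max."""
    return [s * (hi - lo) + lo for s in scaled]

# Made-up wind power values (illustrative only).
power = [2.0, 4.0, 6.0]
scaled, p_min, p_max = min_max_normalize(power)
restored = denormalize(scaled, p_min, p_max)
```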
4. Experiments
4.1. Evaluation Indicators
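In this paper, the coefficient of determination ($R^2$), mean absolute error (MAE), and root mean square error (RMSE) are employed as evaluation metrics for wind power forecasts. $R^2$ is a statistical measure used to assess the goodness of fit of a regression model. MAE assesses the average prediction error, while RMSE accentuates larger errors by squaring these deviations. The calculation of these metrics is outlined below:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
where $n$ is the number of sampling points, $\hat{y}_i$ is the predicted value, $y_i$ is the true power, and $\bar{y}$ is the mean of the true power.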
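For reference, the three metrics can be computed with the standard definitions as in the sketch below. The function names and the sample values are hypothetical.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the true values are constant")
    return 1.0 - ss_res / ss_tot

# Made-up true and predicted power values (illustrative only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
scores = (r2(y_true, y_pred), mae(y_true, y_pred), rmse(y_true, y_pred))
```

In this toy example a single large error of 2 gives an MAE of 0.5 but an RMSE of 1.0, which shows how RMSE weights large deviations more heavily.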
4.2. Relevant Work
The experiments in this paper are conducted with the following hardware and software setup: an NVIDIA GeForce RTX 3060 Laptop GPU, an AMD Ryzen 7 5800H CPU with a clock speed of 3.2 GHz, and PyCharm 2021 as the software environment. For the proposed MFDnet model, the parameters include a dropout rate of 0.1 for the causal convolution, a convolution kernel size of 3, and Haar as the mother wavelet for the DWT. The experiment settings include 300 iterations, a learning rate of 0.001, the Adam optimizer, and a nearest neighbor value of k = 5, which sets the number of turbines in each turbine ensemble to 5.
Unlike other wind power prediction tasks that are based on a single wind farm, the dataset used in our experiments contains 200 different turbines. Considering that evaluating the combined prediction error of these 200 turbines as the overall prediction deviation for the wind farm might lead to compensating errors, we average the prediction errors of the 200 turbines to assess the accuracy of the wind farm’s power prediction.
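The compensating-error effect mentioned above can be illustrated with a small sketch: when one turbine is over-predicted and another is under-predicted, the error of the summed farm output can vanish even though every individual prediction is wrong. The function names and the two-turbine data are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error of one series."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def aggregate_mae(true_by_turbine, pred_by_turbine):
    """MAE of the summed farm output (turbine errors may cancel out)."""
    farm_true = [sum(step) for step in zip(*true_by_turbine)]
    farm_pred = [sum(step) for step in zip(*pred_by_turbine)]
    return mae(farm_true, farm_pred)

def mean_turbine_mae(true_by_turbine, pred_by_turbine):
    """Average of the per-turbine MAEs (no cancellation between turbines)."""
    errors = [mae(t, p) for t, p in zip(true_by_turbine, pred_by_turbine)]
    return sum(errors) / len(errors)

# Two made-up turbines over two hours: one over-predicted, one under-predicted.
true_power = [[1.0, 2.0], [3.0, 4.0]]
pred_power = [[2.0, 3.0], [2.0, 3.0]]
cancelled = aggregate_mae(true_power, pred_power)
averaged = mean_turbine_mae(true_power, pred_power)
```

Here the aggregate error is zero while the averaged per-turbine error is 1.0, which is why averaging per-turbine errors gives a stricter assessment.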
4.3. Comparison with Traditional Methods
In this paper, we evaluate our proposed model by comparing it with a diverse range of classical models and state-of-the-art deep learning-based approaches. These comparisons encompass fundamental deep learning models such as multilayer perceptron (MLP) [32], RNN [33], GRU [34], LSTM [24], and LSTM-LSTM [35], alongside deep learning models specifically tailored for wind power prediction, such as DST [23] and STAN [36]. In addition, we use the standard persistence model (denoted as PM in Table 2) as the benchmark for all comparisons. We used the coefficient of determination ($R^2$), mean absolute error (MAE) and root mean square error (RMSE) as evaluation metrics. To reduce the effect of run-to-run variation, all models underwent three rounds of testing within the same environment, and the results were averaged. The outcomes of these experiments are presented in Table 2, where the highest $R^2$ and lowest MAE or RMSE for each hour are highlighted for clarity.
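For context, the persistence benchmark simply assumes that the power at a future time equals the most recent observation. A minimal sketch follows; the function name `persistence_forecast` and the sample series are hypothetical, and the horizon is expressed in samples (hours for this dataset).

```python
def persistence_forecast(series, horizon):
    """Pair each target series[t + horizon] with the forecast series[t]."""
    if horizon < 1 or horizon >= len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    forecasts = series[:-horizon]   # last known value at forecast time
    targets = series[horizon:]      # value actually observed `horizon` steps later
    return targets, forecasts

# Made-up hourly power values (illustrative only), 2 h ahead.
power = [1.0, 2.0, 3.0, 4.0, 5.0]
targets, forecasts = persistence_forecast(power, 2)
```

The resulting pairs can be scored with the same metrics as any other model, which makes the persistence model a convenient lower bar.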
The experimental results indicate that, although the RMSE of the RNN is lower than that of our proposed model at the 1 h forecasting horizon, our proposed method achieves smaller MAE and RMSE at most time intervals, which supports the superior predictive capability of our model for wind power forecasting. To show the evaluated values of the different forecasting models more clearly at different time scales, bar charts have been employed for visualization, as shown in Figure 8, Figure 9 and Figure 10.
Figure 8, Figure 9 and Figure 10 present a visual comparison of each model’s performance at different forecasting horizons. It is evident that our model exhibits significant enhancement over other methods from 1 to 3 h. However, as the horizon lengthens, the superiority of our proposed model diminishes compared to other models. This phenomenon can be attributed to the model’s selection of the high-frequency component of the original signal when processing frequency-domain information with the GRU in the encoder. Typically, this component contains rapid variations or detailed information, corresponding to sudden changes in wind power characteristics. Moreover, high-frequency information also includes noise and other interference, which makes it harder for the model to extract long-term features of wind power. Nevertheless, it is noteworthy that our model still demonstrates improvements compared to other methods, indicating its superior noise resistance and generalization capability. Figure 11 shows the forecast performance for a wind farm comprising 200 turbines, indicating that the model’s predictions are closely aligned with the actual results.
In summary, we attribute the superior performance of our method in short-term wind power prediction tasks to the following possible factors:
The Seq2Seq framework captures deeper temporal dependencies than simple prediction methods such as LSTM or GRU.
The introduction of DWT to extract frequency domain information improves the utilization of information by fusing the idea of spatio-temporal and frequency domain features.
The use of the embedding method, which introduces individual turbine identities as features, enables the model to learn the differences between turbines and predict based on these differences, leading to a more accurate prediction of future power generation.
4.4. K Tuning Experiment
This section delves into examining the impact of the value of k on the accuracy of the prediction task within the SSNN.
Figure 12 shows the autocorrelation of wind speeds for 20 turbines at lag times of 1 to 4 h, revealing strong correlations over short periods. This means that the wind resources of these turbines are similar, so the number of turbines in each turbine group should be kept small. Since the autocorrelation coefficients of the turbines are all close to each other, the number of turbines in each group should also be approximately the same. Therefore, the number of turbines per group in this paper is set to the same constant k. To verify the influence of the value of k on the accuracy of the final prediction task, this paper conducted the following test: k was varied over the positive integers from 1 to 10, with all other test conditions remaining the same. The final result is the average over three trials, as shown in Table 3, Table 4 and Table 5.
$R^2$ is closest to 1, and both MAE and RMSE are minimized, when k = 5, indicating optimal performance. Moreover, there is a discernible decrease in prediction accuracy as k deviates from this optimal value. This outcome can be attributed to the methodology employed in this paper, where turbine groups derived from the k-nearest neighbors serve as feature inputs for model training. During this training process, the turbine group data act as mutual constraints, ensuring that the wind power of a turbine group remains similar. When k is small, these turbines fail to exert significant constraints, resulting in poorer predictions. Conversely, when k exceeds the optimal value, a larger number of unrelated turbines are included, leading to misjudgments by the model and subsequent declines in prediction accuracy.
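To make the role of k concrete, the sketch below selects, for each turbine, the k most similar other turbines from a precomputed similarity matrix. This is only an illustration of k-nearest-neighbor grouping: the function name `top_k_neighbors`, the made-up matrix, and the choice to exclude the turbine itself from its group are all assumptions, and the construction of the similarity matrix in the SSNN is not reproduced here.

```python
def top_k_neighbors(similarity, k):
    """For each turbine i, return the indices of its k most similar turbines.

    `similarity` is a square matrix (list of lists) where a larger value means
    a more similar pair. The turbine itself is excluded (an assumption).
    """
    n = len(similarity)
    if not 1 <= k <= n - 1:
        raise ValueError("k must be between 1 and n - 1")
    groups = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Sort candidates from most to least similar to turbine i.
        others.sort(key=lambda j: similarity[i][j], reverse=True)
        groups.append(others[:k])
    return groups

# Made-up symmetric similarity matrix for four turbines (illustrative only).
sim = [
    [1.0, 0.9, 0.2, 0.5],
    [0.9, 1.0, 0.3, 0.4],
    [0.2, 0.3, 1.0, 0.8],
    [0.5, 0.4, 0.8, 1.0],
]
groups = top_k_neighbors(sim, 2)
```

With a small k only the closest turbines enter each group, while a large k starts to admit weakly related turbines, which matches the trade-off discussed above.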
4.5. Effects of Mother-Wavelet Selection
This paper presents an algorithm utilizing the Haar wavelet basis functions within the wavelet transform to extract high- and low-frequency components. Actually, the DWT includes various wavelet basis functions, such as the Daubechies (db) wavelet, Symlet (sym) wavelet, and Coiflet (coif) wavelet. The selection of different mother wavelets can influence the processing of time series data to some extent [
37]. To delve deeper into the implications of mother wavelet selection, we conducted a case study involving different types of wavelet basis functions. These functions can be represented in a general form:
where
are the filter coefficients and
is the wavelet basis function; the optional
N determines its specific properties and support length.
In this paper, we selected commonly used wavelet basis functions, namely sym2, Haar, db2, and coif1. It is noteworthy that each of these wavelets possesses distinct advantages. For instance, the Daubechies wavelet has compact support and orthogonality, which makes it proficient in capturing both short-term and long-term features within the signal. Symlet is characterized by greater symmetry compared to other wavelets, exhibiting minimal asymmetry and the highest number of vanishing moments for a given compact support, thus effectively handling asymmetric signals [38]. Coiflet wavelets are particularly well-suited for analyzing transient, time-varying signals, making them suitable for processing non-smooth signals and signals with pronounced singularities [39].
To assess the impact of various mother wavelets on the prediction outcomes, three alternative wavelet basis functions were employed in lieu of the Haar wavelet for experimentation. Subsequently, the prediction performance metrics $R^2$, MAE and RMSE are evaluated and compared. The results are illustrated in Figures 13 to 24.
The results indicate that the MAE and RMSE values obtained using the Haar-based DWT are consistently lower than those achieved with all other mother wavelets, and the $R^2$ value is higher than those of the other mother wavelets. Therefore, it can be concluded that the widely adopted Haar wavelet basis function is more suitable for wind power prediction tasks in this context.
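To show what a Haar decomposition extracts, the following sketch implements a single-level Haar DWT and its inverse in plain Python. It assumes an even-length signal and the orthonormal scaling by the square root of 2; sign conventions for the detail coefficients differ between libraries, and the function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(signal):
    """Single-level Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / SQRT2 for a, b in pairs]   # low-frequency part
    detail = [(a - b) / SQRT2 for a, b in pairs]   # high-frequency part
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the original samples."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal

# Made-up hourly power values (illustrative only).
power = [1.0, 1.0, 3.0, 5.0, 4.0, 2.0]
low, high = haar_dwt(power)
rebuilt = haar_idwt(low, high)
```

The detail coefficients are zero wherever two neighbouring samples are equal and grow with abrupt changes, which is why the high-frequency component reflects rapid fluctuations in the signal.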
4.6. Weighted Design Experiment
In Section 3.2, a weighted summation method is proposed to integrate information from both the frequency and spatio-temporal domains. This weighted approach is designed to regulate the influence of each domain on the final prediction results. As the weight assigned to the frequency domain increases, high-frequency features in the frequency domain may exert a greater influence on the results. However, this also amplifies the interference from noise present in the high-frequency signals, thereby affecting the prediction more significantly. Conversely, as this weight decreases, features processed by the encoder tend to align more closely with the long-term trend.
To enhance the rigor of the weighting design, this section conducts experiments with various weighting parameter configurations, with $R^2$, MAE and RMSE selected as the evaluation metrics.
Figure 25,
Figure 26 and
Figure 27 present the results of these experiments. Nine different weight combinations are compared in this paper, all summing to 1. It is observed that at the 1-hour mark, the combination of
= 0.9 and
= 0.1 demonstrates superior prediction performance compared to other models. Over time, the impact of different weights on the prediction results diminishes. Beyond the 3-hour mark, the disparities among the prediction results under different weight configurations become negligible, as shown in
Figure 26 and
Figure 27.
Consequently, this paper selects the weights = 0.9 and = 0.1. These parameters are deemed optimal for the weighted summation method.
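The weighted summation studied in this experiment can be sketched generically as a convex combination of two feature vectors. The function name, the argument names `w_freq` and `w_st`, and the grid of nine weight pairs in steps of 0.1 are assumptions made for illustration; the sketch does not state which domain receives which weight in the paper.

```python
def weighted_fusion(freq_features, st_features, w_freq, w_st):
    """Element-wise weighted sum of frequency and spatio-temporal features."""
    if len(freq_features) != len(st_features):
        raise ValueError("feature vectors must have the same length")
    if abs(w_freq + w_st - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return [w_freq * f + w_st * s for f, s in zip(freq_features, st_features)]

# Nine candidate weight pairs that each sum to 1 (assumed step of 0.1).
weight_grid = [(round(0.1 * i, 1), round(1.0 - 0.1 * i, 1)) for i in range(1, 10)]

# Made-up feature vectors (illustrative only).
fused = weighted_fusion([1.0, 2.0], [3.0, 4.0], 0.5, 0.5)
```

A tuning loop would evaluate the model once per pair in `weight_grid` and keep the pair with the best validation metrics.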
4.7. Ablation Experiment
Based on the aforementioned experimental outcomes, our proposed model demonstrates superior performance on the public dataset. To delve into the validity of SSNN, DWT frequency domain processing, and the feedforward module within our proposed model, we conducted an ablation experiment outlined in
Table 6. It is worth mentioning that we chose the Seq2Seq framework with both encoder and decoder as GRU as the baseline.
The experimental results are presented in Table 7, Table 8 and Table 9. Ablation experiments CM1, CM2, and CM3 demonstrate that the full MFDnet outperforms the variants that use individual modules alone. In addition, experiments CM4, CM5, and CM6 reveal that SSNN contributes the most in our model, followed by the feedforward module, and lastly, DWT. Further analysis combined with the CM2 results suggests a mismatch between DWT and the feedforward module. This mismatch arises from the feedforward module learning noise characteristics. Integrating SSNN into the model helps alleviate this mismatch by enhancing similarity features between turbines, thereby mitigating, to some extent, the noise effects introduced by DWT on the final prediction results. This is corroborated by the relatively poor performance of CM1 alone.
Comparing the $R^2$, MAE and RMSE results of the baseline and MFDnet, this study concludes that MFDnet exhibits significant improvements in wind power prediction compared to the GRU-based Seq2Seq framework. Specifically, MFDnet improves $R^2$ by 10.4% and reduces MAE by 4.1% and RMSE by 3.6% at the 1-hour mark, and it improves $R^2$ by 5.5% and reduces MAE by 1.6% and RMSE by 1.7% on average across the 6-h prediction task.
5. Conclusions
In this research, we developed a short-term wind power forecasting model that integrates multiple feature domains to enhance the accuracy of short-term wind power predictions. The crux of the model involves utilizing data from wind turbines within a wind farm to predict the wind farm’s power output. At the input stage, the model captures inter-turbine correlation features along with spatial and temporal features through a spatial similarity-based nearest neighbor algorithm, which forms the input to the model. By introducing a strategy of multi-feature domain fusion, the spatial, temporal, and frequency domain features are designed to complement each other. On the feature extraction end, a feedforward module captures nonlinear relationships, enabling the model to improve its adaptability in wind power forecasting by analyzing both long-term and short-term dependencies in sequential data. Comprehensive testing on an open wind power dataset has shown that the MFDnet model is effective, outperforming other advanced models such as STAN and DST in two distinct evaluation metrics for short-term wind power forecasting tasks. Additionally, compared to the persistence model, MFDnet achieved an average reduction of 33.7% in MAE and 22.3% in RMSE, while $R^2$ improved by 4.8%.
Future work can be summarized in two main directions. Firstly, provided that more data can be obtained, we plan to extend the prediction from a single wind farm to multiple wind farms, thereby increasing the model’s generalizability. Secondly, we plan to further consider the trend and fluctuation details of wind power, refining the proposed forecasting method to improve accuracy and achieve ultra-short-term wind power prediction.