All experiments were conducted using Python 3.8, MATLAB 2018a, and the PyTorch 1.8 deep learning framework on a machine equipped with an AMD Ryzen 7 7735H 3.20 GHz CPU and an NVIDIA RTX 4060 8 GB GPU.
4.4.1. Comparative Experimental Analysis of Optimization Algorithms
To evaluate the effectiveness of the proposed enhancement, the northern goshawk optimization algorithm (NGO), grey wolf optimization algorithm (GWO), whale optimization algorithm (WOA), DBO, and MSDBO were each used to optimize the ICEEMDAN parameters; their fitness curves and statistics (optimal values, means, and standard deviations) are detailed in
Table 2. For a fair and efficient comparison, each algorithm was configured with 8 individuals, a maximum of 10 iterations, a lower bound of [0.1, 30], an upper bound of [0.3, 60], and a dimension of 2. The sample entropy of the final IMFN component, derived from the wind power decomposition via ICEEMDAN, served as the fitness function (with the residual term defined as the IMFN component). The maximum number of iterations for ICEEMDAN was set at 500, and combinations of the NSTD and NE parameters were explored.
Figure 8 illustrates the optimization iteration process for the five algorithms.
The comparative analysis presented in
Figure 8 indicates that NGO starts with the highest fitness value, suggesting that its global search capability is limited and prone to local optima. Conversely, GWO starts with a lower fitness value but shows weaker convergence than WOA and DBO as iterations continue. Among the convergence curves, MSDBO stands out with the fastest convergence rate and the highest optimization accuracy. Additionally, a closer look at the data in
Table 2 reveals that MSDBO significantly outperforms WOA and DBO in terms of optimum, mean, and standard deviation, which is attributed to the SPM chaotic mapping's ability to quickly locate the global optimal solution. Through its iterations, MSDBO efficiently finds the lowest fitness value thanks to the incorporation of the Levy flight strategy and adaptive t-distribution mutation, which enhance its global search and convergence capabilities. In summary, with its unique design and algorithmic strategy, MSDBO achieves high convergence accuracy while maintaining convergence speed, and effectively balances global and local search. In addition, since every optimization algorithm uses as its fitness function the sample entropy of the final IMF component of the ICEEMDAN decomposition, the parameter combinations are optimized directly on noisy and uncertain data; combined with the above results, this demonstrates that MSDBO also exhibits strong noise resistance and adaptability.
4.4.2. Experimental Analysis of Quadratic Decomposition Reconstruction Strategy
The parameters of the ICEEMDAN method are all set manually, and setting them blindly prevents the algorithm from achieving its best performance [
38]. Since the decomposition quality of ICEEMDAN depends on the parameters NSTD and NE, MSDBO is introduced to search over their combinations, adaptively determining the optimal settings and improving the ability of MSDBO-ICEEMDAN to extract features while rejecting noise in the wind power. The MSDBO parameters are initialized with eight individuals and a maximum of 10 iterations, and the sample entropy of the last IMFN component after ICEEMDAN decomposition of the wind power is taken as the fitness function. A smaller fitness value indicates lower complexity of the last IMFN component, which helps prevent the modeling difficulty caused by excessive stacking of IMF components. The ICEEMDAN parameter combination of NSTD and NE has a lower bound of [0.1, 30] and an upper bound of [0.3, 60], and the maximum number of ICEEMDAN iterations is 500. The final optimal combination of NSTD and NE is determined through iterative optimization and updating, and ICEEMDAN is reconfigured with the NSTD set to 0.23 and the NE set to 60. ICEEMDAN with this optimal parameter combination is then applied to the historical wind power, generating 14 IMF components. The spectrum is shown in
Figure 9, with the first row representing the historical wind power.
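The parameter search described above can be sketched as follows. This is a minimal, illustrative stand-in: the real MSDBO update rules (SPM chaotic initialization, Levy flight, adaptive t-distribution mutation) and the ICEEMDAN-based fitness are not reproduced here; a simple best-guided random perturbation and a toy fitness surface take their place, with only the population size, iteration budget, and parameter bounds taken from the text.

```python
import numpy as np

rng = np.random.default_rng(42)
LB = np.array([0.1, 30.0])   # lower bounds for (NSTD, NE), from the text
UB = np.array([0.3, 60.0])   # upper bounds for (NSTD, NE)
POP, ITERS = 8, 10           # population size and iteration budget, from the text

def fitness(params):
    # Placeholder: in the paper this is the sample entropy of the last IMF
    # produced by ICEEMDAN(nstd, ne); here a smooth toy surface stands in.
    nstd, ne = params
    return (nstd - 0.23) ** 2 + ((ne - 60.0) / 60.0) ** 2

# Uniform initialization replaces the SPM chaotic mapping of the real MSDBO.
pop = LB + rng.random((POP, 2)) * (UB - LB)
best = min(pop, key=fitness)
for _ in range(ITERS):
    # Stand-in update: random perturbation biased toward the current best
    # (the real MSDBO uses Levy flight and adaptive t-distribution mutation).
    step = 0.3 * (best - pop) + 0.05 * (UB - LB) * rng.standard_normal((POP, 2))
    pop = np.clip(pop + step, LB, UB)          # keep candidates inside the bounds
    cand = min(pop, key=fitness)
    if fitness(cand) < fitness(best):          # best-so-far never worsens
        best = cand
nstd_opt, ne_opt = best
```

The optimized pair `(nstd_opt, ne_opt)` would then be used to reconfigure ICEEMDAN before the final decomposition.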
Since the components generated by ICEEMDAN have the property that their mean values are approximately 0, the frequency of each component decreases sequentially, and the components are mutually independent, this paper uses the t-test to check the zero-mean hypothesis for each IMF component via a two-sided test: the null hypothesis H0: the mean of the IMF component = 0; the alternative hypothesis H1: the mean of the IMF component ≠ 0. Taking a significance level of 0.05, when the p-value is greater than the chosen significance level, the null hypothesis H0 is accepted; otherwise, the alternative hypothesis H1 is accepted.
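The zero-mean test above can be sketched with SciPy's one-sample t-test; `zero_mean_test` is an illustrative helper name, not from the paper.

```python
import numpy as np
from scipy import stats

def zero_mean_test(imf, alpha=0.05):
    """Two-sided one-sample t-test of H0: mean(imf) == 0.

    Returns the p-value and whether H0 is rejected at level alpha
    (i.e., whether the component's mean differs significantly from 0).
    """
    t_stat, p_value = stats.ttest_1samp(np.asarray(imf, dtype=float), popmean=0.0)
    return p_value, p_value < alpha
```

Applied to each IMF component in turn, this yields the p-values reported in Table 3.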
In general, when calculating sample entropy, m is set to 1 or 2, with 2 being the more common choice because it better captures the dynamic properties of the time series; we therefore set m to 2. Common values for r range from 0.1 to 0.25 times the standard deviation of the time series. When r is large, more vector pairs are considered similar and more information is lost; when r is small, the estimate of the system's statistical properties becomes unreliable, because many genuinely similar vector pairs may be excluded due to small differences. We therefore set r to 0.2 times the standard deviation of the time series.
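With m = 2 and r = 0.2·std fixed as above, sample entropy can be sketched as follows. This is a straightforward NumPy implementation; the paper's own implementation may differ in details such as how boundary templates are counted.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn(m, r) with tolerance r = r_factor * std(x).

    SampEn = -ln(A / B), where B counts template pairs of length m and
    A counts template pairs of length m + 1 within Chebyshev distance r.
    """
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)
    n = len(x)

    def count_matches(k):
        # All overlapping templates of length k.
        templates = np.array([x[i:i + k] for i in range(n - k + 1)])
        count = 0
        for i in range(len(templates)):
            # Chebyshev distance from template i to all later templates.
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.sum(d < r)
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf
```

A more irregular series yields a higher value, which is why minimizing the last IMF's sample entropy favors a smooth, low-complexity residual component.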
At the significance level of 0.05, the
t-test is conducted for the difference in means of each IMF component and the respective sample entropy is calculated, as shown in
Table 3 and
Table 4, where it can be seen that at IMF6, the
p-value is less than 0.05 for the first time, which is significant, and at IMF11, the
p-value is less than 0.05 for the second time, which is also significant. Meanwhile, the monotonically decreasing sample entropy shown in
Figure 10 indicates that the complexity of the IMF components decreases monotonically, and hence so does their frequency. Therefore, these two components are taken as the demarcation points between frequency bands: IMF1–IMF5 are the high-frequency band components, IMF6–IMF10 the middle-frequency band components, and IMF11–IMF14 the low-frequency band components.
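The partition rule above — split at the first and second IMFs whose zero-mean p-value drops below 0.05 — can be expressed as a small helper. `split_bands` is an illustrative function assuming at least two significant components; it is not code from the paper.

```python
def split_bands(p_values, alpha=0.05):
    """Split IMF indices into (high, mid, low) frequency bands.

    The boundaries are the first and second components whose zero-mean
    t-test p-value falls below alpha, as described in the text.
    """
    significant = [i for i, p in enumerate(p_values) if p < alpha]
    first, second = significant[0], significant[1]  # assumes >= 2 significant IMFs
    indices = list(range(len(p_values)))
    return indices[:first], indices[first:second], indices[second:]
```

With p-values falling below 0.05 at IMF6 and IMF11 (indices 5 and 10), this reproduces the IMF1–IMF5 / IMF6–IMF10 / IMF11–IMF14 split.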
The high-frequency, middle-frequency, and low-frequency band IMF components are thus obtained from the sample entropy and the t-test of the means. Because the middle-frequency band components after the first decomposition have moderate complexity and little noise, they are not reconstructed or otherwise processed but are fed directly into the model as features; this retains the important feature information of the original signal in that band and ensures that the model can fully learn its key information. The high-frequency band components, by contrast, still contain high complexity and residual noise; they are therefore summed into a single high-frequency IMF component, which is then subjected to a secondary decomposition.
The value of K in VMD influences the effectiveness of the decomposition and is typically determined by expert judgment, usually ranging from 5 to 9. Here, we assess K by calculating the central frequency of the IMF components: the optimal K is reached when the central frequency of the last IMF component stabilizes. As shown in
Table 5, when K is set to 8 and 9, the central frequency of the last IMF component is closest to that of the preceding one, indicating stability. We therefore choose K = 8 to generate a new set of IMF components; the resulting IMF spectrum is shown in
Figure 11, with the first row showing the historical wind power. This secondary decomposition further refines the structure of the IMF components so that high-frequency features and noise can be separated at a smaller scale, reducing the interference of noise with model performance, lowering the modeling difficulty, and improving prediction accuracy. The low-frequency band IMF components, whose influence on the prediction results is minimal, are summed and reconstructed to simplify the modeling process and reduce computation.
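The central-frequency criterion can be sketched as follows. `center_frequency` computes a spectral centroid (VMD implementations usually return center frequencies directly, so this is a generic stand-in), and the `cf_last` values used to illustrate the K-selection rule are hypothetical numbers, not those of Table 5.

```python
import numpy as np

def center_frequency(mode, fs=1.0):
    """Amplitude-weighted mean (centroid) frequency of one IMF component."""
    spectrum = np.abs(np.fft.rfft(mode - np.mean(mode)))  # drop the DC offset
    freqs = np.fft.rfftfreq(len(mode), d=1.0 / fs)
    return np.sum(freqs * spectrum) / np.sum(spectrum)

# Hypothetical last-mode central frequencies for candidate K values
# (illustrative numbers only; see Table 5 for the actual results):
cf_last = {5: 0.31, 6: 0.38, 7: 0.42, 8: 0.45, 9: 0.46}
ks = sorted(cf_last)
# Choose the smallest K whose last-mode frequency barely changes at K + 1,
# i.e., the point where the central frequency has stabilized.
chosen_k = next(k for k in ks[:-1] if abs(cf_last[k + 1] - cf_last[k]) < 0.02)
```

Under these illustrative values, `chosen_k` is 8, mirroring the selection made from Table 5.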
To verify the effectiveness of the quadratic decomposition reconstruction strategy, we designed a series of models with different quadratic decomposition reconstruction strategies to compare with ours (our model), and analyzed the five error metrics MAE, MSE, RMSE, MAPE, and R2. The statistical results of the error metric evaluations are shown in
Table 6: model A (no secondary decomposition or reconstruction; Nons-DCTransformer directly predicts the different frequency band IMF components obtained from the primary decomposition above), model B (no secondary decomposition; both the high-frequency and low-frequency band IMF components are reconstructed while the middle-frequency band IMFs are not, and Nons-DCTransformer then predicts these components), model C (the high-frequency band IMF components are reconstructed, the middle-frequency band IMF components are not, and the low-frequency band IMF components are reconstructed after the primary decomposition and then decomposed a second time), and model D (both the high-frequency and low-frequency band components are reconstructed and then decomposed a second time, the middle-frequency band IMF components are not reconstructed, and Nons-DCTransformer predicts the above components).
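For reference, the five evaluation metrics can be computed as follows. This is a standard NumPy sketch (the `evaluate` helper is illustrative); MAPE assumes the actual power values are nonzero.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute the five evaluation metrics used in the comparison tables."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mape = np.mean(np.abs(err / y_true)) * 100.0   # assumes y_true has no zeros
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}
```

Lower MAE, MSE, RMSE, and MAPE indicate smaller errors, while R2 closer to 1 indicates a better fit.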
To compare the prediction efficacy of different models more intuitively and efficiently, we chose radargrams for in-depth analysis. From
Figure 12, we can see that different decomposition and reconstruction methods have a large impact on prediction performance, and that reconstructing the high-frequency band IMF components, or applying a secondary decomposition to the reconstructed high-frequency component, effectively improves prediction accuracy. In terms of the evaluation metrics, ours reduces the MAE, MSE, RMSE, and MAPE relative to model A by 31.81%, 56.34%, 33.86%, and 27.63%, respectively, while the R2 improves by 0.92%. Relative to model B, the MAE, MSE, RMSE, and MAPE decrease by 41.24%, 68.27%, 43.71%, and 32.67%, while the R2 improves by 1.64%. Reconstructing the low-frequency band IMF components, or applying a secondary decomposition to the reconstructed components, has a negligible effect on model performance: models B and C are almost identical across the MAE, MSE, RMSE, MAPE, and R2 metrics. Meanwhile, ours presents results extremely close to those of model D on these metrics, yet still leads on all of them. In summary, to improve model performance while simplifying the experimental procedure, we selected the secondary decomposition of the reconstructed high-frequency IMF component and the reconstruction of the low-frequency band IMF components as our final modeling strategy.
4.4.3. Experimental Analysis of Ablation
In this paper, ablation experiments were designed to evaluate the performance gain contributed by each module in ours, comparing this model with model 1 (Transformer), model 2 (Nons-Transformer), model 3 (Nons-DCTransformer), model 4 (ICEEMDAN-Nons-DCTransformer), and model 5 (MSDBO-ICEEMDAN-Nons-DCTransformer); we also compared models constructed with different input features. The results of the evaluation metrics are shown in
Table 7 and
Figure 13a,b.
Table 7 clearly shows that the benchmark model 1 has the highest MAE. In contrast, our model achieves reductions in MAE, MSE, and RMSE of 66.43%, 88.09%, and 65.49%, respectively, while R2 increases by 5.87%. Additionally, when observing the fit between the prediction curve and the actual curve in
Figure 14, it is evident that our model is more adept at capturing the intricate relationships within the data, leading to enhanced overall prediction performance compared to the benchmark model.
In the ablation experiments, we set the learning rate to 0.0001, the batch size to 32, the hidden layer size to 512, the dropout to 0.1, the output size to 8, the model dimensionality to 512, the number of encoder layers to 2, the number of decoder layers to 1, and the number of training epochs to 10; the mapping layers unique to the Nons-Transformer architecture use 2 layers with dimensions 256 and 256.
A comparison between model 1 and model 2 reveals that the MAE, MSE, and RMSE are reduced by 17.27%, 9.88%, and 5.11%, respectively, while R2 is improved by 0.75%. Replacing the attention mechanism of the Transformer with the Des attention mechanism effectively alleviates the over-stationarization problem caused by the direct stationarization of historical wind power data in the Transformer and its variants, thereby improving prediction performance.
When comparing model 2 and model 3, the incorporation of dilation causal convolution within the encoder not only prevents information leakage but also introduces a dilation factor that enhances the model’s sensitivity to local features across varying time scales, resulting in improved accuracy. This is evidenced by reductions in MAE, MSE, and RMSE by 10.26%, 9.54%, and 4.88%, respectively, alongside a 0.53% increase in R2.
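A dilated causal convolution of the kind described can be sketched in PyTorch as follows; `DilatedCausalConv1d` and its hyperparameters are illustrative, not the paper's exact layer configuration. Left-padding by `(kernel_size - 1) * dilation` is what prevents information leakage from future time steps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalConv1d(nn.Module):
    """Causal 1-D convolution with dilation: the output at time t depends
    only on inputs at times <= t, so no future information leaks in."""

    def __init__(self, channels, kernel_size=3, dilation=2):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                      # x: (batch, channels, time)
        x = F.pad(x, (self.left_pad, 0))       # pad the past side only
        return self.conv(x)
```

Increasing the dilation factor widens the receptive field without extra parameters, which is what lets the encoder pick up local features across varying time scales.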
The comparison between model 3 and model 4 demonstrates that the inclusion of ICEEMDAN in the predictive model significantly decreases prediction errors, with reductions of 26.38%, 60.00%, and 36.60% for MAE, MSE, and RMSE, respectively, and an improvement in R2 by 3.16%. This enhancement is attributed to ICEEMDAN’s capability to decompose historical wind data into IMF components across various frequency ranges, thereby enabling the predictive model to predict wind power at multiple frequency scales, which ultimately increases accuracy and reduces error.
A comparison between model 4 and model 5 shows that MSDBO-optimized ICEEMDAN yields a performance improvement over direct decomposition of the historical wind power without optimization: the MAE, MSE, and RMSE are reduced by 6.78%, 8.16%, and 4.16%, respectively, and R2 is improved by 0.21%. This demonstrates the effectiveness of the chosen NSTD and NE combination and underscores the importance of ICEEMDAN parameter optimization.
A comparison between model 5 and ours shows that reconstructing the low-frequency band IMF component streamlines the experimental process, while the secondary decomposition of the reconstructed high-frequency IMF component using VMD further removes noise and reduces modeling difficulty; as a result, the MAE, MSE, and RMSE are reduced by 23.46%, 43.99%, and 25.18%, respectively, and R2 is improved by 0.82%, effectively improving model performance.
By comparing ours constructed with seven input features versus six, we find that input features with low correlation lead to model redundancy. Selecting key input features using the Spearman and Kendall correlation coefficients reduces the prediction error: the MAE, MSE, and RMSE are reduced by 13.92%, 13.92%, and 16.00%, respectively, and R2 improves by 0.30%.
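Feature screening with the two rank correlations can be sketched as follows; the `select_features` helper and its 0.5 threshold are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

def select_features(X, y, names, threshold=0.5):
    """Keep features whose Spearman and Kendall correlations with the
    target both exceed a threshold (0.5 here is a hypothetical cutoff)."""
    kept = []
    for j, name in enumerate(names):
        rho, _ = spearmanr(X[:, j], y)     # rank correlation (monotonic)
        tau, _ = kendalltau(X[:, j], y)    # concordance-based rank correlation
        if abs(rho) > threshold and abs(tau) > threshold:
            kept.append(name)
    return kept
```

Weakly correlated features are dropped before training, which is what removes the redundancy noted above.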
In summary, Nons-Transformer effectively solves the over-stationarization problem that harms model performance under the direct-stationarization design of the Transformer and its variants; the dilated causal convolution added to the Nons-Transformer encoder improves the perception of local features; ICEEMDAN reduces noise, effectively lowers the complexity of the historical WT data, and enhances data readability; and the proposed MSDBO method both exploits the theoretical basis of the ICEEMDAN process and significantly improves the decomposition quality. The Spearman and Kendall correlation coefficients screen out the most critical input features, reducing redundancy and improving prediction accuracy. In addition, considering that the high-frequency band IMF components still retain residual noise with high complexity and modeling difficulty, while the low-frequency band components barely affect the prediction, the high- and low-frequency bands identified via sample entropy and the t-test of means are reconstructed and the high-frequency component is decomposed a second time, which effectively improves prediction accuracy while streamlining the experimental process. The ablation analysis shows that every module of ours contributes a performance gain, and together these modules significantly improve the accuracy and robustness of ultra-short-term wind power prediction.
4.4.4. Comparative Experimental Analysis
To assess the validity and stability of ours, we designed twelve comparative experiments; the fitting diagrams are shown in
Figure 15a,b. The comparison includes single model a (BP), model b (CNN), model c (LSTM), model d (GRU), model e (BiLSTM), and model f (TCN), and combined model g (CNN-LSTM), model h (EMD-Nons-DCTransformer), model i (VMD-Nons-DCTransformer), model j (WOA-ICEEMDAN-VMD-Nons-DCTransformer), model k (DBO-ICEEMDAN-VMD-Nons-DCTransformer), and model l (MSDBO-ICEEMDAN-EMD-Nons-DCTransformer).
Table 8 details the results of each evaluation metric, and
Figure 16 demonstrates the MAE metrics of different models, providing us with a rich basis for comparative analysis.
During the comparison, it was observed that model a exhibited relatively weak predictive capability, which can be attributed to its limited feature extraction, restricting its applicability in wind power prediction. Although CNNs perform strongly in image processing, their effectiveness in time-series prediction is less pronounced. Commonly used time-series models such as LSTM, BiLSTM, GRU, and TCN may also fall short of the anticipated predictive outcomes due to insufficient sensitivity to local features. Furthermore, ours outperformed models h and i in terms of decomposition methodology, with reductions in MAE, MSE, and RMSE of 21.97%, 47.92%, and 27.86% compared to model h, and 18.38%, 25.75%, and 13.84% compared to model i, respectively; the corresponding R2 values improved by 1.77% and 0.62%. Ours also demonstrated a significant advantage over the EMD quadratic decomposition algorithm used in model l, as evidenced by substantial reductions in MAE, MSE, and RMSE, alongside notable improvements in R2.
Specifically, ours achieved reductions in MAE, MSE, and RMSE of 20.82%, 33.89%, and 18.69% compared to model j, and 10.88%, 18.80%, and 9.90% compared to model k, while improving R2 by 0.40% and 0.20%, respectively. This marked advantage is primarily due to the MSDBO-ICEEMDAN decomposition algorithm's exceptional pattern-separation capability, facilitated by the precise optimization of the NSTD and NE parameter combination, which effectively mitigates noise and enhances the quality of the decomposition results. Moreover, the dilated causal convolution and the Des attention mechanism further strengthen local feature extraction and address the over-stationarization of the sequence that would otherwise degrade model performance. In summary, our proposed Nons-DCTransformer ultra-short-term wind power prediction model based on the secondary decomposition reconstruction strategy successfully integrates the advantages of MSDBO-ICEEMDAN primary decomposition, the secondary decomposition reconstruction strategy, and the Nons-DCTransformer model. This integration enables the model to capture both the short-term and long-term dependencies of the wind power sequence more accurately, significantly improving prediction accuracy. The result not only verifies the validity and stability of the model but also offers new research ideas and methods for the field of wind power prediction.
4.4.5. Experimental Analysis of Relatively Stable and Sharply Fluctuating Wind Power
In wind farms, meteorological instability and other factors cause dramatic changes in wind power, leading to sharp rises and falls; accurately predicting wind power under these conditions is one of the problems that must be solved. Historical wind power data with sharp rises and falls are selected from the wind farm to assess the generalization of the proposed combined model to different wind power situations; this set comprises 200 test data points outside the training set, as shown in
Figure 17, with a time interval of 15 min. Additionally, a segment of relatively stable historical wind power data from the wind farm is selected, also comprising 200 test data points outside the training set, as shown in
Figure 18, with a time interval of 15 min. The models are compared through experimental analysis of these two different historical power series, together with the meteorological data at the corresponding moments, to verify their robustness. The power prediction fitting diagrams of the two stages are shown in
Figure 19a,b and
Figure 20a,b, respectively, and the evaluation metrics are shown in
Table 9 and
Table 10.
(1) Dramatically fluctuating wind power sequence
By observing
Figure 19a,b, it can be clearly seen that under the scenario of wind power experiencing drastic rises and falls, for example when the time-series data lie in the interval from 0 to 150, ours demonstrates a very high degree of fit: its prediction curve almost overlaps the historical wind power curve and closely follows the actual trend.
Further, when the historical wind power is in the critical phase of sharp rise and fall, especially in the range of time-series data from 150 to 200, ours still maintains excellent performance. Compared to the other models, it not only captures the sharp changes in wind power more accurately but also demonstrates significant advantages in prediction accuracy and stability.
To summarize, ours shows remarkable predictive ability and robustness in dealing with the challenges of complex and variable wind power with sharp rises and falls, providing strong support for accurate wind power prediction.
(2) Relatively stable wind power sequence
An examination of
Figure 20a,b reveals that, in scenarios characterized by relatively stable wind power, ours exhibits exceptional accuracy, with its prediction curves closely aligning with the actual wind power output curves, thereby indicating a high degree of fit.
In this specific dataset for prediction, ours outperforms other comparative models in terms of wind power prediction capabilities. It not only significantly surpasses them in prediction accuracy but also demonstrates a consistent level of stability and reliability throughout the prediction process, thereby ensuring precise wind power predictions.
In conclusion, ours also shows commendable performance in scenarios involving more stable wind power. Its accurate predictive capabilities and high stability offer substantial support for the reliable operation and effective management of the wind power sector.