4.2. Data Description
The present study employs data obtained from a Longyuan wind farm, collected from October to December 2024. The data comprise the power output measured at 15 min intervals, yielding 96 sampling points per day. The dataset is divided into a training set and a testing set at a ratio of 8:2. The primary objective of the study is short-term (one-step-ahead) wind power forecasting: the model predicts the next 15 min interval from historical data. This is formulated as a regression problem in which the features consist of the preceding 24 h window of wind power and meteorological data, and the target output is the wind power at the subsequent 15 min timestep. The initial wind power data collection is shown in
Figure 7.
Figure 8 illustrates the feature-label mapping and the temporal relationship.
Figure 9 depicts the sliding window approach, which generates successive training samples. The training and testing datasets are split chronologically, ensuring that the training data comes before the test data. This prevents future information from leaking into the training phase, which is crucial for time series forecasting.
Figure 7 displays the raw wind power time series collected from the Longyuan wind farm. The horizontal axis represents time (sampled at 15 min intervals), while the vertical axis shows the corresponding power output values (in kW or MW). The figure clearly illustrates the pronounced non-stationarity, volatility, and intermittency of wind power output, attributable to the random nature of meteorological factors such as wind speed and direction, and highlights the challenge of forecasting directly from the raw sequence. Consequently, it underscores the necessity of employing Variational Mode Decomposition (VMD) for signal preprocessing to reduce sequence complexity, as adopted in this paper.
Figure 8 clearly delineates the forecasting task of this study in the form of a sequence diagram. The diagram explicitly labels:
- Historical Window: a 24 h period (comprising 96 sampling points at 15 min intervals), containing the wind power output and associated meteorological characteristics within this timeframe.
- Forecasting Point: the 15 min time point immediately following the historical window (i.e., the 97th point), representing the target value to be predicted by the model.
- Timeline: denoted by “t − 95” to “t” for the 96 consecutive time points within the historical window, and “t + 1” for the future time point to be forecasted.
This diagram visually illustrates the single-step-ahead forecasting configuration described herein: utilizing the past 24 h of data to forecast the power output 15 min ahead, thereby establishing the temporal correspondence between features (X) and labels (y).
Figure 9 illustrates how a sliding window approach constructs a sample set for model training and testing from the original time series. The figure shows the window sliding rightward along the time axis at a fixed length (24 h), advancing one time step (15 min) per slide. The data within each window constitutes an input sample (features), while the power value at the immediately subsequent time point serves as that sample’s output label. This approach transforms a long time series into a sequence of supervised learning samples suitable for training machine learning models such as Support Vector Machines (SVMs). The diagram further illustrates the principle of temporal integrity in dataset construction: training and test sets are partitioned chronologically to prevent data leakage when the model predicts future data.
The division between training and test sets is strictly conducted in chronological order, ensuring that training data precedes test data temporally. This partitioning method prevents the leakage of future information during the training phase, adhering to the fundamental principles of time series forecasting.
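The windowing and chronological splitting described above can be sketched as follows; `make_samples` and `chrono_split` are illustrative helper names of ours, not code from the original study:

```python
def make_samples(series, window=96):
    """Build one-step-ahead supervised samples:
    X = the previous 24 h (96 points at 15 min intervals), y = the next point."""
    X, y = [], []
    for t in range(window, len(series)):
        X.append(series[t - window:t])  # the preceding `window` points
        y.append(series[t])             # the immediately following point
    return X, y

def chrono_split(X, y, train_ratio=0.8):
    """Chronological 8:2 split: every training sample precedes every test
    sample in time, so no future information leaks into training."""
    cut = int(len(X) * train_ratio)
    return X[:cut], y[:cut], X[cut:], y[cut:]
```

In practice the feature window would also carry the meteorological variables alongside power; the sketch keeps a single series for brevity.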
4.3. Outlier Handling
During the acquisition of wind power data, sensor malfunctions and power-related failures can lead to erroneous or missing data [33]. Consequently, outliers are an inevitable part of the dataset, and they can severely affect the accuracy of power forecasting. Identifying abnormal wind power data involves detecting and analyzing missing values, duplicate values, and outliers [34,35].
The quartile method has been introduced as a data analysis technique that employs the quantiles of the data to identify outliers (see
Figure 10 for a visual representation of the principle).
Q2 and IQR denote the median and the interquartile range, respectively, and the interquartile range is calculated as in Equation (28):

IQR = Q3 − Q1 (28)

where Q1 and Q3 denote the first and third quartiles, respectively. Data values exceeding the upper limit or falling below the lower limit are considered outliers. The upper and lower limits are calculated as in Equations (29) and (30):

Upper limit = Q3 + 1.5 × IQR (29)

Lower limit = Q1 − 1.5 × IQR (30)
Outliers are regarded as anomalous data points. Considering that wind power exhibits a continuous variation pattern with a high correlation among three consecutive points, the mean of the preceding and succeeding values is adopted for data imputation [36,37].
Implementation to Prevent Data Leakage: The quartiles (Q1, Q3) and the resulting upper and lower limits are calculated solely from the training dataset. This ensures that the criteria for identifying outliers are derived independently, without any information from the test set. The same calculated bounds are then applied to screen the entire dataset (including the test set) for consistency, but the test set does not influence the bound determination.
Considering that wind power exhibits a continuous variation pattern with a high correlation among consecutive points, detected outliers are imputed using the mean of the immediately preceding and succeeding valid values within the same dataset partition (training or test set). For edge cases (e.g., the first or last point being an outlier), a simple linear interpolation from neighboring valid points is used.
Clarification on Temporal Integrity: This imputation method is applied independently within the training and testing phases. During model training, only historical data (the training set) is available. Any outlier in the training set is replaced using the mean of its adjacent values within the training set, which are chronologically prior and posterior to it. This process does not utilize any future information from the test set. During the testing phase, if an outlier is identified in the test set input features (based on the pre-calculated bounds from the training set), its imputation similarly relies only on adjacent values within the test set sequence itself. This strict separation guarantees that the model’s training process is not contaminated by future or test information, adhering to fundamental time-series forecasting principles.
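A minimal sketch of this train-only quartile screening and neighbour-mean imputation follows. The function names are ours, `statistics.quantiles` (with its default exclusive method) stands in for whatever quartile estimator the study used, and edge points fall back to the nearest valid value rather than full linear interpolation:

```python
from statistics import quantiles

def fit_bounds(train):
    # Q1, Q3 and IQR come from the training set only (cf. Equations (28)-(30)),
    # so the test set never influences the outlier criteria.
    q1, _, q3 = quantiles(train, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def impute_outliers(series, lower, upper):
    """Replace out-of-bound points with the mean of the nearest valid
    neighbours on each side, within a single dataset partition."""
    clean = list(series)
    valid = [lower <= v <= upper for v in clean]
    for i, ok in enumerate(valid):
        if ok:
            continue
        prev = next((clean[j] for j in range(i - 1, -1, -1) if valid[j]), None)
        nxt = next((clean[j] for j in range(i + 1, len(clean)) if valid[j]), None)
        if prev is not None and nxt is not None:
            clean[i] = (prev + nxt) / 2  # mean of preceding and succeeding values
        else:
            clean[i] = prev if prev is not None else nxt  # edge-case fallback
    return clean
```

Because `valid` is frozen before the loop, an imputed value is never itself reused as a neighbour for another outlier.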
4.5. Data Decomposition
The selection of the optimal number of decomposed modes, K, is of paramount importance for the effectiveness of VMD. The present study employs a comprehensive multi-criteria decision-making approach, determining the optimal K value by integrating center frequency stability analysis with sample entropy assessment. With regard to the VMD, it is important to note that training and testing are conducted separately. During the training phase, VMD is applied solely to historical wind power sequences within the training set, with the resulting modal components (IMFs) and their statistical characteristics (such as center frequency and sample entropy) derived entirely from the training data. Test set data is excluded from the decomposition process, thereby fundamentally preventing test information leakage from the decomposition stage into the model training phase.
4.5.1. Center Frequency Stability Analysis
The center frequency analysis examines the stabilization of the last component’s frequency across different K values, as presented in
Table 4. When the center frequency of the final IMF stabilizes, it indicates thorough decomposition without generating redundant noise-dominant modes.
As shown in
Table 4, when K ≥ 4, the variation in the center frequency of the last IMF becomes significantly smaller, indicating that the decomposition is already thorough. However, a critical observation emerges at K = 8: the center frequency difference between the last two IMFs (IMF7 and IMF8) decreases dramatically to 0.001 Hz, compared to 1.576 Hz at K = 7. This minimal frequency separation at K = 8 suggests potential modal aliasing or the decomposition of noise into separate modes, indicating over-decomposition.
4.5.2. Sample Entropy Analysis for Decomposition Adequacy Assessment
To quantitatively evaluate decomposition quality and avoid subjective judgment, Sample Entropy (SE) analysis was introduced. Sample Entropy measures the complexity and regularity of time series: lower SE values indicate more regular, predictable sequences. The SE distributions for different K values are presented in
Table 5.
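Sample Entropy can be computed as below. This is a generic textbook implementation (template length m = 2, tolerance r = 0.2 × standard deviation by default), not the study's exact code:

```python
import math

def sample_entropy(x, m=2, r=None):
    """SampEn = -ln(A/B), where B counts template pairs of length m within
    tolerance r (Chebyshev distance) and A counts the same for length m+1.
    Lower values indicate a more regular, predictable sequence."""
    n = len(x)
    if r is None:
        mean = sum(x) / n
        r = 0.2 * math.sqrt(sum((v - mean) ** 2 for v in x) / n)

    def count_matches(length):
        # Use n - m templates for both lengths, per the standard definition.
        templates = [x[i:i + length] for i in range(n - m)]
        hits = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    hits += 1
        return hits

    b, a = count_matches(m), count_matches(m + 1)
    return float("inf") if a == 0 or b == 0 else -math.log(a / b)
```

A perfectly periodic series scores near zero, while an irregular series scores higher, which is exactly the property exploited when comparing decompositions.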
4.5.3. Comprehensive Decision-Making for K = 7 Selection
The selection of K = 7 as the optimal decomposition mode number is based on the following multi-faceted analysis:
- (1)
Sample Entropy Marginal Benefit Analysis
The average sample entropy (excluding the last component, typically representing residual noise) shows a consistent and significant reduction from K = 2 to K = 7, with reduction rates ranging from 21.5% to 24.6%. This indicates substantial gains in signal regularity with each additional mode. However, from K = 7 to K = 8, the reduction rate drops sharply to 15.6%, signaling diminishing returns and suggesting that additional modes beyond K = 7 contribute minimally to signal decomposition while potentially capturing noise.
- (2)
Avoidance of Over-Decomposition
At K = 8, IMF8 exhibits an extremely low sample entropy value of 0.01. In information theory, such near-zero entropy values indicate sequences with minimal information content, likely representing noise decomposed into artificial modes. This phenomenon, combined with the negligible center frequency difference between IMF7 and IMF8 (0.001 Hz), strongly suggests over-decomposition at K = 8.
- (3)
Computational Efficiency Balance
While VMD complexity increases with K, the performance improvement from K = 7 to K = 8 is marginal. The significant computational cost increase does not justify the minimal gain in decomposition quality, making K = 7 the practical optimum for balancing accuracy and efficiency.
- (4)
Center Frequency Distribution Rationality
At K = 7, all IMFs maintain distinct center frequency separations (minimum difference of 1.576 Hz between IMF6 and IMF7), effectively avoiding modal aliasing. In contrast, K = 8 shows nearly identical frequencies for the last two modes, indicating frequency overlap and potential information redundancy.
- (5)
Reconstruction Accuracy Sufficiency
Empirical testing showed that the reconstruction error at K = 7 was 0.14%, while at K = 8 it was 0.13%, an improvement of only 0.01 percentage points. This negligible enhancement confirms that K = 7 provides sufficient decomposition accuracy for subsequent forecasting tasks.
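The decision rules above can be combined into a simple selector. The entropy and frequency-gap values used below are illustrative stand-ins that mimic the reported reduction rates, not the actual entries of Tables 4 and 5, and the function is our sketch of the multi-criteria procedure:

```python
def select_k(avg_se, freq_gap, min_reduction=0.20, min_gap=0.01):
    """Accept each additional mode while (a) the average sample entropy still
    drops by at least `min_reduction` relative to K-1 and (b) the centre
    frequencies of the last two IMFs stay at least `min_gap` Hz apart
    (guarding against over-decomposition and modal aliasing)."""
    ks = sorted(avg_se)
    best = ks[0]
    for k in ks[1:]:
        reduction = (avg_se[k - 1] - avg_se[k]) / avg_se[k - 1]
        if reduction < min_reduction or freq_gap[k] < min_gap:
            break
        best = k
    return best
```

With values mirroring the paper's pattern (21–25% entropy reductions up to K = 7, a 15.6% drop at K = 8, and a 0.001 Hz gap at K = 8), either criterion alone rejects K = 8 and the selector returns 7.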
4.5.4. Final VMD with K = 7
Based on the above comprehensive analysis considering decomposition adequacy, computational efficiency, and avoidance of over-decomposition, this study selects K = 7 as the optimal VMD mode number. The VMD results for the training historical data with K = 7 are presented in
Figure 12.
As illustrated in
Figure 12, the wind power sub-series obtained through VMD with K = 7 exhibit distinct regularity and periodicity while effectively capturing the variation trends of the original data. Each sub-series highlights local characteristics more clearly, providing a stable foundation for subsequent forecasting models.
4.7. Model Testing
In order to evaluate the prediction performance of the model proposed in this paper, we predict the test set data and compare the results with those of other models. Specifically, the same data are first fed into the other models to construct the comparison baselines. The pure SVM model and the SVM augmented with the VMD strategy are then compared with the VMD-IDBO-SVM model, which incorporates the additional improvement strategies. The relevant parameter settings are shown in
Table 6. The comparison plots of the prediction results of the SVM and VMD-IDBO-SVM models are shown in
Figure 13, and the results of the error comparison are shown in
Figure 14. The 3D prediction performance scatter plot is shown in
Figure 15, the regression scatter plot is shown in
Figure 16, and the correlation coefficient heat map is shown in
Figure 17. In order to validate the robustness of the key parameter selection, preliminary sensitivity analyses were conducted on the VMD mode number K and the IDBO population size. With all other conditions held constant, varying K between 5 and 9 produced less than 5% variation in the average RMSE on the validation set. This finding suggests that the model maintains consistent performance around the optimal value of K ≈ 7. The final parameter settings for each model employed in this study are summarized in
Table 6. Specifically, the VMD-IDBO-SVM model utilizes K = 7 modes for VMD. The IDBO algorithm parameters adhere to the settings outlined in
Table 3. The SVM’s C and γ parameters are independently optimized by IDBO for each IMF sub-sequence rather than being fixed values, demonstrating the model’s adaptive capability. All preprocessing steps in this study—including outlier handling, normalization, feature selection, and signal decomposition—were performed independently on the training set. The test set was used solely for final performance evaluation. This rigorous data isolation strategy ensures the reliability and unbiasedness of model performance assessment, eliminating the possibility of overfitting due to data leakage.
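The under-5% RMSE variation claim corresponds to a relative-range calculation such as the following; the RMSE values shown are hypothetical placeholders, not the study's measurements:

```python
def relative_rmse_range(rmse_by_k):
    """Spread of validation RMSE across K values, relative to the mean RMSE."""
    vals = list(rmse_by_k.values())
    mean = sum(vals) / len(vals)
    return (max(vals) - min(vals)) / mean

# Hypothetical validation RMSEs for K = 5..9, centred near the optimum K = 7
rmse_by_k = {5: 4.31, 6: 4.22, 7: 4.13, 8: 4.18, 9: 4.27}
```

A result below 0.05 would support the paper's statement that performance is stable around K = 7.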
The figure compares the prediction results of the traditional Support Vector Machine (SVM) model and the VMD-IDBO-SVM model proposed in this paper on the test set, with sample points on the horizontal axis and wind power values on the vertical axis. The red or solid lines (predictions of the VMD-IDBO-SVM model) lie closer to the true-value curves than the black or dashed lines (predictions of the traditional SVM model). Especially in regions with sharp fluctuations, the VMD-IDBO-SVM model still tracks the real changes well, which indicates its strong fitting ability and adaptability.
The graph below illustrates the prediction error (i.e., residuals) of the SVM in comparison to the VMD-IDBO-SVM model on the test set. The error value is indicative of the discrepancy between the predicted value and the true value. The graph reveals that the error fluctuation range of VMD-IDBO-SVM is more limited and the distribution is more concentrated, suggesting that its prediction results are more stable and that the error control effect is superior to that of the traditional SVM model.
The plot illustrates the relationship among the values predicted, the true measurements, and other salient features (e.g., time or wind speed) in a three-dimensional scatter format. The closer the distribution of points is to the diagonal plane, the higher the prediction accuracy is. The point set of the VMD-IDBO-SVM model is more tightly clustered around the diagonal line, suggesting that it maintains high prediction consistency in different dimensions.
The figure shows a two-dimensional scatter plot of predicted versus true values with a regression line. Ideally, the points are evenly distributed on both sides of the regression line and close to the diagonal. The more concentrated point set and the regression-line slope closer to 1 for the VMD-IDBO-SVM model suggest better linear fitting ability and prediction accuracy.
This map illustrates the Pearson correlation coefficients among the characteristic variables and the wind power output. Darker colors (e.g., blue) indicate stronger positive correlations, while lighter colors (e.g., pink) indicate stronger negative correlations. The figure shows that the wind speed at different heights is highly correlated with the power output, which verifies the reasonableness of the feature selection.
The proposed model is further evaluated by comparing the predictions of models built with individual strategies: the original SVM, VMD-SVM, VMD-DBO-SVM, and VMD-IDBO-SVM. The single-strategy prediction comparison graph is shown in
Figure 18, and a locally enlarged view of this comparison is shown in
Figure 19.
The figure compares the prediction results of several single-strategy models, including SVM, VMD-SVM, VMD-DBO-SVM, and VMD-IDBO-SVM. VMD-IDBO-SVM performs best over the whole time series, especially at peaks and valleys, where its predictions are closer to the true values, suggesting better overall performance than the other single-strategy models.
The figure is a locally enlarged version of
Figure 18, highlighting the details of the prediction for a particular time interval. It is evident that VMD-IDBO-SVM still tracks the true value accurately in the fast-change interval, while the other models show lag or bias, further validating its local fitting ability.
To examine the optimization capability of IDBO within the combined model, combined models incorporating different optimization algorithms are compared. SSA, PSO, and GWO are each added to the VMD-SVM model to form three prediction models, which are compared with the combined prediction model proposed in this study under identical iterative optimization conditions. The test set prediction results are presented in
Figure 20. A local enlargement of the test results is shown in
Figure 21.
The figure compares the predictive performance of different optimization algorithms (SSA, PSO, GWO, IDBO) integrated with the VMD-SVM model. The horizontal axis represents the sample indices, while the vertical axis denotes the power output values. The prediction curves generated by the VMD-IDBO-SVM model demonstrate the closest alignment with the true value curves, indicating that the IDBO algorithm exhibits a distinct advantage in parameter optimization.
This figure is a partial enlargement of
Figure 20, which highlights the details of the prediction in a certain high volatility interval. the VMD-IDBO-SVM still closely follows the actual values at the inflection point, while the other combined models show different degrees of deviation, which proves their robustness in dealing with the non-stationary sequences.
From
Figure 20 and
Figure 21, it can be seen that the differing optimization abilities of the various algorithms also affect the accuracy of the prediction models. The VMD-IDBO-SVM model fits the actual values most closely and tracks them most accurately at abrupt inflection points. The prediction models are evaluated using three indexes, the results of which are shown in
Table 7.
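The three evaluation indexes reported in Table 7 (MAE, RMSE, and R²) can be computed as follows; these are the standard textbook definitions, implemented here as a sketch rather than the study's own code:

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot
```

Lower MAE and RMSE and an R² closer to 1 indicate better predictions, which is how the models in Table 7 are ranked.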
As illustrated in
Table 7, which compares the prediction performance of the different models on the test set. The results show a stepwise improvement in model performance. The traditional SVM model has the lowest prediction accuracy (MAE = 36.236, RMSE = 43.302, R² = 0.704) because no decomposition or optimization strategy is introduced. After the introduction of VMD (VMD-SVM), the errors are significantly reduced (MAE = 12.161, RMSE = 15.059) and R² improves to 0.813, verifying the key role of VMD in reducing data non-stationarity and volatility and providing a more stable basis for subsequent prediction. The further introduction of optimization algorithms yields continued improvement: VMD-SSA-SVM, VMD-PSO-SVM, and VMD-GWO-SVM improve progressively, while the VMD-DBO-SVM model demonstrates strong competitiveness (MAE = 6.357, RMSE = 7.993, R² = 0.921). The VMD-IDBO-SVM model presented in this paper outperforms all the other models examined, reducing MAE and RMSE substantially to 3.315 and 4.130, respectively, while attaining an R² of 0.985. This result fully verifies the significant advantages of the IDBO algorithm over the other optimization algorithms in parameter optimization and global search capability, as well as the effectiveness of the proposed hybrid modeling framework in improving the accuracy and stability of short-term wind power forecasts.