3.2. Data Analysis
To fully exploit the information contained in the data, this paper extends the original grid frequency time series into a multidimensional feature matrix based on phase space reconstruction theory and a date–time feature extraction method. This process characterizes the latent information in the frequency series more comprehensively, providing a solid data foundation for constructing a high-precision prediction model.
3.2.1. Determination of Phase Space Parameters
In this paper, the average mutual information method is used to plot the mutual information between the original grid frequency sequence and its delayed counterpart as a function of the delay time (see Figure 3). The horizontal axis is the delay time, τ, and the vertical axis is the mutual information value, reflecting the degree of correlation between the original sequence and the sequence delayed by τ. When determining the optimal delay time, the first local minimum of the mutual information curve is usually selected: at this point, data redundancy is effectively reduced while the dynamic information of the sequence is maintained. If the curve has no apparent local minimum, the position where the mutual information value levels off is selected as the optimal delay time.
From Figure 3, it can be seen that as the delay time τ increases, the mutual information between the original and delayed sequences decreases rapidly, indicating that the information redundancy between data points is gradually reduced and their correlation weakens. When τ is small, the mutual information value is high, reflecting strong correlation between the sequences; this easily introduces excessive redundant information into the phase space reconstruction and obscures the system's dynamic features. As τ increases, the mutual information decreases and the information independence grows, which enriches the phase space structure. The red dashed line in the figure marks τ = 256, the first local minimum of the mutual information curve, so it is selected as the optimal delay time; this choice is verified by the local zoom-in diagram. A suitable delay time reduces redundant information in the reconstruction while preserving the system's dynamics. Based on phase space reconstruction theory and the mutual information analysis, τ = 256 is therefore determined to be the optimal delay time for the subsequent dynamics analysis and predictive modeling.
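The delay-time selection described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it estimates the average mutual information with a two-dimensional histogram (the bin count is an arbitrary choice) and returns the first local minimum of the curve.

```python
import numpy as np

def average_mutual_information(x, max_lag, bins=32):
    """Histogram estimate of I(x_t ; x_{t+tau}) for tau = 1 .. max_lag."""
    ami = np.empty(max_lag)
    for tau in range(1, max_lag + 1):
        joint, _, _ = np.histogram2d(x[:-tau], x[tau:], bins=bins)
        pxy = joint / joint.sum()
        px = pxy.sum(axis=1, keepdims=True)   # marginal of x_t
        py = pxy.sum(axis=0, keepdims=True)   # marginal of x_{t+tau}
        nz = pxy > 0
        ami[tau - 1] = np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz]))
    return ami

def optimal_delay(ami):
    """First local minimum of the AMI curve; falls back to the last lag."""
    for i in range(1, len(ami) - 1):
        if ami[i] < ami[i - 1] and ami[i] <= ami[i + 1]:
            return i + 1          # lags are 1-based
    return len(ami)
```

Applied to the grid frequency series, the returned lag would play the role of the τ = 256 found in Figure 3.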
After determining the delay time, this paper uses the Cao method to determine the optimal embedding dimension of the grid frequency time series. Figure 4 shows the curves of the two key metrics of the Cao method, E1 and E2, as functions of the embedding dimension.
The upper half of Figure 4 shows that E1 increases with the embedding dimension m and stabilizes after m = 4, indicating that the system's degrees of freedom have been sufficiently characterized and that further increases in the embedding dimension yield little benefit. The lower half shows the curve of E2 against m: E2 deviates from 1 at low dimensions and approaches 1 when m ≥ 10, which verifies that the sequence contains deterministic dynamics.
Combining the optimal delay time determined by the mutual information method and the analytical results of Cao’s method, this paper finally determines the embedding dimension of the grid frequency time series as m = 4.
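For reference, a compact sketch of Cao's method is given below, assuming the standard definitions of E1(m) and E2(m) with the Chebyshev norm and brute-force nearest neighbours; it is an illustration rather than the code used in this paper.

```python
import numpy as np

def cao_curves(x, tau, m_max):
    """E1(m) and E2(m) of Cao's method for m = 1 .. m_max."""
    E, Estar = [], []
    for m in range(1, m_max + 2):
        n = len(x) - m * tau                      # rows that stay valid in dim m+1
        Y = np.column_stack([x[j * tau : j * tau + n] for j in range(m)])
        # pairwise Chebyshev distances; self-matches excluded
        d = np.max(np.abs(Y[:, None, :] - Y[None, :, :]), axis=2)
        np.fill_diagonal(d, np.inf)
        nn = d.argmin(axis=1)
        dm = np.maximum(d[np.arange(n), nn], 1e-12)   # avoid division by zero
        # extra coordinate picked up when going from dimension m to m+1
        extra = np.abs(x[np.arange(n) + m * tau] - x[nn + m * tau])
        E.append(np.mean(np.maximum(dm, extra) / dm))  # mean of a(i, m)
        Estar.append(np.mean(extra))
    E, Estar = np.asarray(E), np.asarray(Estar)
    return E[1:] / E[:-1], Estar[1:] / Estar[:-1]      # E1(m), E2(m)
```

In practice, m is chosen where E1 saturates (m = 4 in Figure 4), with E2 inspected to confirm determinism.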
3.2.2. Extraction of Date and Time Information
In the preprocessing stage of grid time series data, reasonable extraction and encoding of temporal features are crucial to improving the prediction model's performance. Based on the one-month grid frequency data collected above, this paper extracts the following temporal features: week of month, day of month, day of week, is_weekend, hour, minute, and second. Among them, hour, minute, second, day of week, and day of month are discrete variables with strong periodicity, so sine and cosine encoding is used to reveal their cyclic patterns and avoid model misjudgement; is_weekend is a binary feature, directly encoded as 0/1 to distinguish weekdays from weekends; week of month is encoded with integers from 1 to 5, which makes it convenient for the model to capture cyclical changes within the month. Through diversified temporal features and appropriate encoding methods, the temporal patterns in the grid frequency series can be explored more comprehensively, effectively improving the accuracy of model analysis and prediction.
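The encodings described above can be illustrated as follows; the column names and the use of pandas are assumptions of this sketch, not taken from the paper.

```python
import numpy as np
import pandas as pd

def add_time_features(df, ts_col="timestamp"):
    """Cyclic (sin/cos), binary, and integer encodings mirroring the feature
    list in the text. Column names are illustrative."""
    t = pd.to_datetime(df[ts_col])
    df["is_weekend"] = (t.dt.dayofweek >= 5).astype(int)   # 0/1 flag
    df["week_of_month"] = (t.dt.day - 1) // 7 + 1          # integers 1..5
    for name, value, period in [
        ("hour", t.dt.hour, 24),
        ("minute", t.dt.minute, 60),
        ("second", t.dt.second, 60),
        ("day_of_week", t.dt.dayofweek, 7),
        ("day_of_month", t.dt.day - 1, 31),
    ]:
        # map each cyclic variable onto the unit circle
        df[f"{name}_sin"] = np.sin(2 * np.pi * value / period)
        df[f"{name}_cos"] = np.cos(2 * np.pi * value / period)
    return df
```

The sine/cosine pair ensures that, e.g., 23:59 and 00:00 are encoded as neighbouring points rather than opposite ends of a scale.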
This paper further analyzes the correlation between the grid frequency and the features extracted from phase space reconstruction and date–time information, as illustrated by the scatter plot in Figure 5. The horizontal axis represents the absolute value of Spearman's correlation coefficient (|r|) between each feature and the grid frequency, and the vertical axis is the corresponding significance level, quantified as −log10(p value), revealing both the degree of correlation and the statistical significance of each feature with respect to the target variable.
In Figure 5, the red scatter points represent features extracted from phase space reconstruction, while the blue points represent features obtained from date–time information. In time series forecasting, the Spearman correlation coefficient (|r|) is commonly used for feature screening. However, relying only on the correlation coefficient may discard features that, despite a low correlation coefficient, are statistically significant. Therefore, this paper also introduces the significance test (p-value) as an auxiliary criterion in feature screening.
In this paper, we take the −log10(p value) corresponding to the lowest correlation coefficient among the red scatter points as the threshold and filter the blue points, keeping only those date–time features whose −log10(p value) is not less than this threshold. This procedure finally retains “hour” as a valid time feature. Such double-criteria feature screening avoids information omission, improves the features' predictive ability and statistical reliability, and helps improve the model's performance in practical applications.
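The double-criteria screening can be sketched as below, under one reading of the threshold rule (the −log10(p) of the weakest phase-space feature is taken as the cut-off); function and variable names are illustrative.

```python
import numpy as np
from scipy.stats import spearmanr

def neglog10_p(feature, y):
    """-log10 of the Spearman significance level, clipped to avoid log(0)."""
    p = spearmanr(feature, y)[1]
    return -np.log10(max(p, 1e-300))

def screen_time_features(psr_feats, time_feats, y):
    """Keep date-time features whose -log10(p) reaches the threshold set by
    the weakest phase-space feature. Both arguments are dicts name -> array."""
    threshold = min(neglog10_p(f, y) for f in psr_feats.values())
    return [name for name, f in time_feats.items()
            if neglog10_p(f, y) >= threshold]
```

On the paper's data this filter would retain only the “hour” feature among the date–time candidates.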
To summarize, the final feature set selected in this paper contains the following: phase space reconstruction features with a delay time of 256 s and an embedding dimension of 4, as well as the “hour” feature in the date–time information.
3.3. Parameter Settings for DSCW Method
Based on the subset of features screened in the previous section, this paper employs a grid search algorithm to traverse the parameter space, performing model training and validation for every combination of sliding window size, step ratio, and attenuation factor λ. The parameter combination that minimizes the root mean square error (RMSE) is selected as the optimal configuration.
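A minimal sketch of this exhaustive search is shown below; `train_eval` stands in for the (unspecified) routine that trains the model for one parameter combination and returns its validation RMSE.

```python
import itertools
import numpy as np

def grid_search_dscw(train_eval, windows, step_ratios, lambdas):
    """Exhaustive search over the three DSCW hyper-parameters.
    `train_eval(window=..., step_ratio=..., lam=...)` must return a
    validation RMSE; the best (lowest-RMSE) combination is returned."""
    best, best_rmse = None, np.inf
    for w, s, lam in itertools.product(windows, step_ratios, lambdas):
        rmse = train_eval(window=w, step_ratio=s, lam=lam)
        if rmse < best_rmse:
            best, best_rmse = (w, s, lam), rmse
    return best, best_rmse
```

In the paper's experiments, this search settles on a 480 min window, a 0.1 step ratio, and λ = 0.1.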
Figure 6 and Figure 7 show the trend and distribution of the model RMSE under different combinations of sliding window length and step ratio, respectively. It can be intuitively observed that different parameter combinations significantly affect model performance.
According to the results in Figure 6 and Figure 7, the step ratio has the more pronounced effect on the model RMSE. The box plots in Figure 6 show that the RMSE rises overall as the step ratio increases, indicating that a smaller step ratio helps improve prediction accuracy. At larger step ratios, the prediction error and its volatility increase significantly and outliers become more frequent, indicating that an overly large step ratio harms model stability. The three-dimensional scatter plot in Figure 7 shows a clear relationship between RMSE and the combination of sliding window length and step ratio: overall, smaller step ratios combined with longer sliding windows yield lower RMSE values, indicating that jointly optimizing these two hyperparameters can effectively improve model performance.
Based on the results in Figure 6 and Figure 7, the step ratio is fixed at 0.1. Subsequently, this paper selects the optimal sliding window length and decay factor λ by comparing the RMSE of the model for different sliding window lengths and λ values at this step ratio. Figure 8 shows the effect of the λ value on the model RMSE under different sliding windows.
As can be seen from the figure, the RMSE of the model decreases significantly as λ is increased to 0.1 and then stabilizes for most sliding window lengths. This suggests that an appropriate fusion of historical feature weights helps improve the model's generalization ability. Further analysis shows that for shorter sliding windows (e.g., 60 min, 120 min), the RMSE is more sensitive to changes in λ, whereas for longer windows the RMSE is lower and relatively insensitive to λ. The data points circled with black outlines in the figure mark the best λ for each sliding window length. Overall, a reasonable setting of λ effectively reduces the prediction error, and the optimal λ differs somewhat across sliding window lengths.
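As a hedged illustration only: one plausible form of the decay-factor fusion is an exponential blend of the previous window's feature weights into the current ones. The exact update rule used in the paper is not specified here, so this sketch is an assumption.

```python
import numpy as np

def fuse_weights(history_w, current_w, lam=0.1):
    """Blend the previous window's feature weights into the current ones with
    decay factor lam (assumed exponential-smoothing form, not the paper's
    confirmed rule). lam = 0 ignores history; larger lam retains more of it."""
    history_w, current_w = np.asarray(history_w), np.asarray(current_w)
    return lam * history_w + (1.0 - lam) * current_w
```

Under this reading, λ = 0.1 means each window's weights carry a 10% contribution from the preceding window.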
Combined with the above analysis results, it can be seen that the sliding window size, step ratio, and attenuation factor, λ, all significantly affect the prediction performance of the model. When the sliding window is set to 480 min, the step ratio is 0.1, and the attenuation factor λ is 0.1, the model achieves the lowest RMSE and the best prediction performance on the validation set. Therefore, in this paper, the sliding window of 480 min, step ratio of 0.1, and λ value of 0.1 are finally selected as the optimal parameters of the model to improve the accuracy and stability of time series prediction.
3.5. Prediction Results of the DSCW-LightGBM Method
In this paper, a systematic comparative analysis is conducted with the proposed DSCW method as the core, and its performance is compared horizontally with four benchmark feature selection methods, including the following: a traditional feature screening method based on correlation coefficient analysis, a feature importance assessment method based on mutual information, a regularized feature selection method based on Lasso regression, and a method based on recursive feature elimination (RFE).
Table 2 lists the number of selected features under each method, which provides data support for the subsequent performance evaluation.
According to the statistical results in Table 2, the SCW method retains only five core features, showing its advantages in controlling feature redundancy and extracting effective information. In contrast, the Spearman and Lasso methods select 3 and 8 features, respectively; the mutual information (MI) method, because of its focus on mining nonlinear dependencies, retains 14 features, noticeably more than the preceding methods; and RFE retains all 15 features. The table makes the differences in feature counts between the selection methods immediately visible.
Using the control variable method, this paper investigates the effects of the sliding window size, step ratio coefficient, and attenuation factor λ on model performance under different feature selection strategies. Figure 10 and Figure 11 show, respectively, the distributions of these three key parameters and their effects on model performance for each feature selection method.
Based on the experimental results in Figure 10 and Figure 11, Table 3 gives the optimal values of each parameter under the different feature selection strategies. The table also lists the hyperparameter optimization results of the LightGBM model for the corresponding parameter combinations.
From the experimental parameter settings in Table 1 and Table 3, it can be seen that the sliding window length and step ratio coefficient are 480 min and 0.1, respectively, for all feature selection methods, and the feature sampling rate and learning rate of the LightGBM algorithm are set to 1.0 and 0.2, respectively. Regarding the maximum depth, both the Spearman and RFE methods use a larger value (9), while the Lasso method pairs a lower maximum depth (4) with a smaller number of leaves (15), enhancing its adaptability to high-dimensional features. The SCW method proposed in this paper adopts a compromise value of 7.
Based on the parameter configurations under the different feature selection strategies given in Table 1 and Table 3, the LightGBM model prediction results are analyzed. Table 4 lists the main performance metrics of the LightGBM model on the test set and the average computation time of the sliding window weights under each feature selection method. Figure 12 then visually compares the methods across the multidimensional evaluation metrics using radar charts.
According to the quantitative evaluation results in Table 4, the SCW method proposed in this paper shows a significant advantage in the regression task, with an RMSE as low as 1.799 × 10⁻³, an R² as high as 0.9924, and better MAE and MAPE than the other methods. In this table, the downward arrows indicate that a smaller value means better regression performance for that metric, while the upward arrow for R² means that a larger value is preferable. While the Spearman and Lasso methods show good fitting capability, their overall regression performance is slightly inferior to SCW. The MI and RFE methods suffer from increased model complexity due to higher feature dimensions, with weight computation times of 32.82 ms and 34.70 ms, respectively. The radar charts in Figure 12 confirm this picture: the SCW method performs in a balanced and outstanding manner on the core indexes such as R² and RMSE; the Spearman method has an advantage in computation time but is average on the other indexes; and MI and RFE differ minimally on error metrics such as MAE and MAPE, with comparable regression accuracy. Overall, the SCW method not only improves model accuracy but also enhances computational efficiency by effectively controlling the feature dimension, which verifies its feasibility and value in engineering applications.
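For completeness, the four evaluation metrics reported in the tables can be computed as follows (MAPE expressed in percent); this is a generic sketch of the standard definitions, not the paper's evaluation code.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, MAE, MAPE (%) and R^2 as used in the comparison tables."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    mape = float(np.mean(np.abs(err / y_true)) * 100.0)   # assumes y_true != 0
    r2 = float(1.0 - np.sum(err ** 2) / np.sum((y_true - np.mean(y_true)) ** 2))
    return rmse, mae, mape, r2
```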
Figure 13 further compares the time series fitting curves of the LightGBM model optimized with each of the five feature selection methods: SCW, Spearman, MI, Lasso, and RFE. The original sequences lie within the test sample interval, allowing the dynamic consistency between each method's predictions and the actual values to be assessed. From the figure, it can be seen that all five methods fit the overall trend and fluctuation range of the target series accurately, with the SCW method closest to the true value at most sample points, demonstrating strong dynamic adaptability.
To further evaluate the impact of different feature weight allocation strategies on the LightGBM model in grid frequency time series prediction, this paper designs three feature input schemes: unweighted input (LightGBM), static weight assignment (SCW-LightGBM), and dynamic weight assignment (D-SCW-LightGBM). All three are compared on the same dataset with the same training process.
Table 5 presents the results of the three methods on the key regression metrics (RMSE, MAE, MAPE).
Figure 14 then presents the fitting effect of the predicted sequences of each method to the real sequences on the validation set.
As can be seen from Table 5, all three methods achieve low error levels in the regression task, with D-SCW-LightGBM slightly outperforming the other two in terms of RMSE, MAE, and MAPE. The downward arrows in the table indicate that lower values correspond to better model performance for these metrics. This indicates that dynamically adjusting the feature weights can improve the model's prediction accuracy. Specifically, D-SCW-LightGBM adjusts the feature weights according to the correlation between the features within the sequence and the target, which makes the model more flexible in responding to changes in the sequence information. It therefore achieves the best prediction performance, with an RMSE of 1.799 × 10⁻³, outperforming both the statically weighted SCW-LightGBM (1.810 × 10⁻³) and the unweighted LightGBM model (2.088 × 10⁻³).
Based on the SHAP value analysis presented in Table 6, the D-SCW method consistently demonstrates an advantage in enhancing the importance of key lagged frequency features compared with both the static SCW weighting strategy and the unweighted LightGBM baseline. Notably, the SHAP values of core lagged frequency features (such as Freq_lag_1) are significantly higher than those of the other features, which aligns with the theoretical assumption in power systems that recent historical states primarily govern frequency dynamic responses. This empirical evidence substantiates the model's ability to capture the underlying physical mechanisms.
Figure 14 visualizes the sequence fitting of the three feature weighting schemes through time series curves. All three configurations, whether dynamically weighted, statically weighted, or unweighted, fit the overall fluctuation trend of the target variable well. However, the unweighted LightGBM model deviates noticeably from the actual values at some inflection points and in regions of large fluctuation. In contrast, SCW-LightGBM improves the stability of the fit through static weighting, while D-SCW-LightGBM enhances responsiveness to sudden changes through its dynamic weighting mechanism, and its fitted curve is closer to the actual values at trend turning points. In summary, feature weighting, and especially the dynamic adaptation mechanism, significantly improves the accuracy and adaptability of the LightGBM model in grid frequency time series prediction, particularly for non-stationary data.
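As an illustration of the dynamic step, per-window weights could be derived from the in-window correlation between each feature and the target. The normalization and the use of |Spearman r| alone are assumptions of this sketch; the paper's D-SCW method also incorporates statistical significance, which is omitted here for brevity.

```python
import numpy as np
from scipy.stats import spearmanr

def dynamic_feature_weights(X_win, y_win):
    """Per-window feature weights taken as normalised |Spearman r| with the
    target (one plausible reading of the correlation-based reweighting)."""
    w = np.array([abs(spearmanr(X_win[:, j], y_win)[0])
                  for j in range(X_win.shape[1])])
    return w / w.sum()   # weights sum to 1 within each window
```

Recomputing these weights for every sliding window is what lets the model track regime changes that a static weighting would miss.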
To further validate the effectiveness of the dynamic significance–correlation-based weighting method proposed in this paper for grid frequency prediction, the grid frequency time series dataset for February 2025 is used as a test sample for the LightGBM model.
Table 7 compares the proposed method's regression performance and computational efficiency with four feature selection strategies: Spearman, MI, Lasso, and RFE. By quantitatively assessing each method's prediction accuracy and the average computation time for the sliding window weights, their actual effectiveness in the prediction task can be objectively reflected. Figure 15 visually illustrates the differences in these metrics among the methods using a multi-axis bar chart, providing a clear basis for comprehensive evaluation.
According to the results in Table 7, the differences between the feature selection methods on the RMSE and MAE metrics are small and the overall error is low, with the SCW method performing best on these two indicators. The methods are also close on the MAPE indicator, and the R² values of all methods exceed 0.994, indicating a good model fit. However, there are clear differences in the efficiency of feature weight computation: the Spearman method has the shortest weight computation time at 11.22 ms, while the RFE and Lasso methods require more than 30 ms. Figure 15 further verifies these conclusions visually. The SCW method outperforms the other methods in prediction accuracy (RMSE, MAE) and goodness of fit (R²), and although its weight computation time is slightly higher than Spearman's, it is much lower than that of algorithms such as MI and RFE. In addition, because the Spearman method screens features based on correlation alone, its MAPE is slightly higher than that of SCW, further confirming the importance of statistical significance testing. In summary, by integrating correlation analysis with a statistical significance test, the SCW method balances prediction accuracy and computational efficiency in the grid frequency prediction task.
2), and although the weight calculation time is slightly higher than that of the Spearman method, it is much lower than that of the algorithms such as MI and RFE. In addition, the Spearman method makes its MAPE slightly higher than that of SCW due to the screening of features based on correlation only, further validating the importance of statistical significance testing. In summary, the SCW method balances prediction accuracy and computational efficiency in the grid frequency prediction task by integrating correlation analysis and a statistical significance test.