To evaluate the proposed spatio-temporal K-Modes clustering, ASTGCN point forecasting, and KDE interval forecasting, data from 20 PV plants in Jilin Province (615 MW total) are used. Power is sampled every 15 min; only 05:00–17:00 data are retained, with nighttime values set to zero. Records that are negative or exceed the installed capacity are removed, the resulting gaps are filled by cubic-spline interpolation, and each plant’s power is Min–Max normalized. The dataset spans 1 January 2017–31 December 2018 and contains plant locations, power, and NWP variables (cloud cover, irradiance, temperature, wind speed). The first eleven months of 2018 are used for training and the last month for testing; the forecast horizon is 24 h (96 points).
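The cleaning steps described above (range filtering, cubic-spline gap filling, Min–Max normalization) can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the function names are hypothetical, the spline uses natural boundary conditions (the text does not specify them), and sample indices stand in for the 15 min timestamps.

```python
from typing import Callable, List


def natural_cubic_spline(xs: List[float], ys: List[float]) -> Callable[[float], float]:
    """Return an evaluator for the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the quadratic coefficients.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution; c[0] = c[n] = 0 is the natural boundary condition.
    c = [0.0] * (n + 1)
    b = [0.0] * n
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x: float) -> float:
        # Find the segment containing x; points outside the data range
        # are extrapolated from the first or last segment.
        j = 0
        while j < n - 1 and x > xs[j + 1]:
            j += 1
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


def preprocess(power: List[float], capacity: float) -> List[float]:
    """Drop out-of-range samples, fill the gaps by spline, then Min-Max normalize."""
    valid = [i for i, p in enumerate(power) if 0.0 <= p <= capacity]
    if len(valid) < 2:
        raise ValueError("need at least two valid samples to interpolate")
    spline = natural_cubic_spline([float(i) for i in valid], [power[i] for i in valid])
    valid_set = set(valid)
    filled = [power[i] if i in valid_set else spline(float(i)) for i in range(len(power))]
    lo, hi = min(filled), max(filled)
    if hi == lo:
        return [0.0 for _ in filled]  # constant series: assumed convention
    return [(p - lo) / (hi - lo) for p in filled]
```

For example, `preprocess([0.0, 10.0, -5.0, 30.0, 40.0, 999.0, 60.0], capacity=100.0)` discards the negative and over-capacity samples, fills both gaps from the neighbouring valid points, and scales the result to [0, 1].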
5.1. Overall Evaluation of the Model
Table 2 lists the installed capacity and geographic coordinates of the 20 PV plants, together with the annual number of complex-fluctuation days at each site, obtained with the complex-fluctuation weather extraction method. Spatial correlation stems from plant locations: different coordinates lead to different simultaneous outputs. Temporal correlation is captured by the number of complex-fluctuation days, which reflects the intra-day variability of power; this count varies markedly across plants over a year.
When clustering with K-Modes, the choice of K directly affects forecast accuracy. If K is smaller than the true number of similarity groups, plants with distinct characteristics are forced into the same cluster; conversely, if K is larger, similar plants are split across different clusters.
Table 3 quantifies this impact on interval prediction quality: for K = 3, the cluster-averaged interval achieves the highest coverage while maintaining the narrowest bandwidth.
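To make the role of K concrete, the sketch below shows a minimal K-Modes loop with Hamming dissimilarity and per-attribute mode updates. K-Modes operates on categorical attributes, so the example assumes that coordinates and fluctuation-day counts have already been discretized into labels; the text does not state the encoding, and the feature labels, function names, and the choice of passing the initial modes explicitly are all illustrative assumptions. K is simply the number of initial modes supplied.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def hamming(a: Record, b: Record) -> int:
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(members: Sequence[Record]) -> Record:
    """Most frequent category of each attribute (ties: first seen wins)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))


def k_modes(records: Sequence[Record], initial_modes: Sequence[Record],
            max_iter: int = 100) -> Tuple[List[int], List[Record]]:
    """Alternate assignment and mode update until the labels stop changing."""
    modes = list(initial_modes)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            min(range(len(modes)), key=lambda k: hamming(r, modes[k]))
            for r in records
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == k]
            if members:  # keep the old mode if a cluster becomes empty
                modes[k] = column_modes(members)
    return labels, modes


# Hypothetical plants described by (latitude bin, longitude bin, fluctuation-day bin).
plants = [
    ("N", "E", "high"), ("N", "E", "high"), ("N", "W", "high"),
    ("S", "W", "low"), ("S", "W", "low"), ("S", "E", "low"),
]
labels, modes = k_modes(plants, initial_modes=[plants[0], plants[3]])
```

Repeating the run with a different number of initial modes and comparing the resulting interval metrics is one way to carry out the kind of K selection reported in Table 3.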
Figure 9 shows the spatial distribution of the 20 PV plants in Jilin Province and their grouping after K-Modes clustering. From Figure 9, it can be seen that the power output of each PV plant differs at the same point in time, which results from the different geographic locations of the plants. After K-Modes clustering, sub-cluster 2 contains only plants 1, 3, and 6, sub-cluster 3 contains only plants 4, 9, and 13, and the remaining plants belong to sub-cluster 1. This indicates that plants that are geographically close within a region do not necessarily share similar meteorological characteristics, which is attributed to the different internal structures of the PV plants. Therefore, PV clusters cannot be partitioned on spatial characteristics alone.
After partitioning the PV cluster, the ASTGCN model is used to forecast each sub-cluster individually, and the final cluster-wide prediction is obtained by summing the sub-cluster forecasts. To validate the proposed approach, it is compared with traditional machine-learning and deep-learning baselines using the error metrics defined below. In Figure 10, panels (a)–(f) display the prediction curves of the benchmark models, while panel (g) presents the proposed ASTGCN model, whose predicted curve tracks the actual values most closely, demonstrating the effectiveness of the proposed method.
To quantify the final forecasting results of each model, this paper employs four evaluation metrics: Root Mean Square Error (RMSE) [34], Mean Absolute Percentage Error (MAPE) [35], Mean Absolute Error (MAE) [36], and the coefficient of determination (R2) [37]. RMSE and MAE are scale-dependent and are expressed in the units of the power series, MAPE is a relative (percentage) error, and R2 indicates the proportion of variance in the observed values explained by the predictions. Generally, lower RMSE, MAE, and MAPE values together with a higher R2 signify superior predictive performance; the exact formulas are given in Appendix A (Definition of Error Metrics). The bar chart comparing the prediction errors of all models is presented in Figure 11.
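As a quick reference, the four point-forecast metrics can be computed as in the sketch below. It uses the textbook definitions, which may differ in detail from the formulas in Appendix A; in particular, skipping zero observations in MAPE is an assumption made here so that zero-power samples do not cause a division by zero.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the units of the series."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error, in the units of the series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error; zero observations are skipped (assumption)."""
    terms = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(terms) / len(terms)


def r2(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
scores = (rmse(y_true, y_pred), mae(y_true, y_pred),
          mape(y_true, y_pred), r2(y_true, y_pred))
# scores == (0.5, 0.25, 6.25, 0.8)
```

Because RMSE squares the errors before averaging, the single miss in the example weighs more heavily in RMSE (0.5) than in MAE (0.25).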
In accordance with the “Technical Requirements for Power Forecasting of Photovoltaic Power Stations,” this paper investigates prediction intervals for the PV cluster at confidence levels of 95%, 90% and 80%;
Figure 12 displays the cluster-wide forecasts generated by the proposed method under these three levels, and
Table 4 quantifies the corresponding evaluation metrics. At the 95% level, the upper bound is highest and the lower bound is lowest, so the greatest number of observed power points falls inside the interval, the coverage probability reaches 95.14%, and the bandwidth is also the widest at 24.44%, indicating the strongest ability to cover the true values and the highest reliability. When the confidence level is reduced to 90% the bounds tighten, a small fraction of observations is excluded, and both coverage and bandwidth decrease; likewise, the 80% level narrows the band still further. According to
Table 4, the 95% interval is the most reliable, yielding the smallest
RACE of 0.14% (1.94% and 2.36% lower in absolute terms than those at 90% and 80%, respectively), while conversely the 80% interval is the sharpest, with the minimum
PINAW of 15.99% and the highest composite skill score
CWC, demonstrating that it achieves the most compact representation of uncertainty while tolerating only a slight loss in coverage.
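The reliability and sharpness figures discussed above can be reproduced from the interval bounds with a few lines of code. The sketch below uses the common definitions of PICP (share of observations inside the interval) and PINAW (mean width divided by the range of the observations). The coverage error is taken as the absolute gap between PICP and the nominal confidence level, which is an assumption about how the RACE values are obtained; it is consistent with the reported 95.14% coverage and 0.14% error at the 95% level. CWC is omitted because several definitions are in use.

```python
from typing import Sequence


def picp(actual: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Prediction interval coverage probability, as a fraction in [0, 1]."""
    hits = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return hits / len(actual)


def pinaw(actual: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Mean interval width normalized by the range of the observations."""
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(actual)
    return mean_width / (max(actual) - min(actual))


def coverage_error(coverage: float, nominal: float) -> float:
    """Absolute gap between empirical and nominal coverage (assumed RACE analogue)."""
    return abs(coverage - nominal)


y = [1.0, 2.0, 3.0, 4.0]
lo_bound = [0.0, 1.0, 4.0, 3.0]
hi_bound = [2.0, 3.0, 5.0, 5.0]
coverage = picp(y, lo_bound, hi_bound)   # 3 of 4 observations are covered
width = pinaw(y, lo_bound, hi_bound)     # mean width 1.75 over a range of 3
gap = coverage_error(coverage, 0.80)
```

A wide interval raises PICP and PINAW together, which is why the two have to be read jointly when comparing confidence levels.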
Figure 13 presents the error probability histogram. The fitted density curve is smooth and closely matches the empirical distribution, indicating accurate residual density estimation and providing a reliable basis for confidence quantification. Overall, the proposed method satisfies the general principles of interval forecasting and delivers satisfactory performance across all three confidence levels.
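The step from a fitted residual density to a prediction interval can be sketched as follows. This is an illustration under several assumptions not stated in the text: a Gaussian kernel, a normal-reference rule-of-thumb bandwidth, residuals defined as actual minus forecast, and a central interval with equal tail probabilities. The function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def rule_of_thumb_bandwidth(residuals: Sequence[float]) -> float:
    """Normal-reference bandwidth h = 1.06 * sd * n^(-1/5) (assumed choice)."""
    return 1.06 * statistics.stdev(residuals) * len(residuals) ** (-1 / 5)


def kde_cdf(x: float, residuals: Sequence[float], h: float) -> float:
    """CDF of a Gaussian KDE: the average of normal CDFs centred on each residual."""
    return sum(
        0.5 * (1.0 + math.erf((x - r) / (h * math.sqrt(2.0)))) for r in residuals
    ) / len(residuals)


def kde_quantile(p: float, residuals: Sequence[float], h: float,
                 tol: float = 1e-9) -> float:
    """Invert the KDE CDF by bisection on a bracket well outside the data."""
    lo = min(residuals) - 10.0 * h
    hi = max(residuals) + 10.0 * h
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if kde_cdf(mid, residuals, h) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def prediction_intervals(point_forecast: Sequence[float], residuals: Sequence[float],
                         confidence: float) -> List[Tuple[float, float]]:
    """Shift each point forecast by the lower and upper residual quantiles."""
    h = rule_of_thumb_bandwidth(residuals)
    alpha = 1.0 - confidence
    q_lo = kde_quantile(alpha / 2.0, residuals, h)
    q_hi = kde_quantile(1.0 - alpha / 2.0, residuals, h)
    return [(f + q_lo, f + q_hi) for f in point_forecast]


# Hypothetical normalized residuals and a single point forecast.
errors = [-0.2, -0.1, 0.0, 0.1, 0.2]
band_95 = prediction_intervals([0.5], errors, 0.95)
band_80 = prediction_intervals([0.5], errors, 0.80)
```

Lowering the confidence level moves both quantiles toward zero, so the 80% band sits strictly inside the 95% band, in line with the behaviour described for Figure 12.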
5.2. Validity Verification Considering Spatio-Temporal Properties
This paper incorporates both temporal and spatial characteristics when clustering photovoltaic (PV) stations. Temporal features (the annual number of fluctuating days for each plant) and spatial features (geographical coordinates) are fed into the clustering algorithm. To validate this idea, the resulting interval forecasts are compared with those produced by three alternative strategies: Method 1 uses K-Modes driven solely by spatial attributes; Method 2 uses K-Modes driven solely by temporal attributes; and Method 3 applies K-Modes without considering either spatial or temporal information.
Figure 14 and
Table 5 jointly present the interval-forecast curves and evaluation metrics produced by K-Modes clustering when temporal and/or spatial attributes are selectively included. Compared with the three ablated alternatives, the proposed spatio-temporal strategy simultaneously achieves the highest coverage and the narrowest bandwidth: coverage rises by 0.84% and bandwidth falls by 4.81% relative to the space-only variant, and the gains are even larger (+4.01% coverage and −7.20% bandwidth) against the time-only variant. Relying on a single dimension (either the annual fluctuation-day count or the geographic coordinates) yields overly conservative envelopes because the resulting centroids fail to isolate homogeneous regimes: spatial-only grouping ignores cloud-advection-driven ramping signatures, while temporal-only grouping disregards longitude–latitude phase shifts, leaving substantial intra-cluster heterogeneity that must be absorbed by wider intervals. Consequently, both single-feature schemes are forced to sacrifice sharpness to maintain nominal coverage. The “neither-time-nor-space” baseline performs worst: during steep downward ramps, the interval cannot follow the drop, exposing up to 11% of true values and yielding the lowest
PICP (89.15%), a clear sign that the absence of guiding features erodes the model’s ability to recognize fluctuation patterns. In contrast, the simultaneous inclusion of fluctuation-day counts and geo-coordinates forces K-Modes to form centroids in a hybrid temporal–spatial metric space, producing clusters whose members share both similar meteorological statistics and correlated irradiance paths; the resulting residuals are nearly homoscedastic and approximately Gaussian, allowing KDE to adopt a smaller kernel width while still capturing the true error mass. Hence, the proposed spatio-temporal clustering not only captures power ramps more accurately and raises coverage but also delivers the thinnest prediction band, achieving the desirable “high coverage–low bandwidth” outcome without trading away reliability.
5.3. Validation of K-Modes Cluster Partitioning
After cluster partitioning, PV plants within a sub-cluster usually share similar meteorological characteristics and analogous generation processes; this homogeneity not only smooths the forecasting errors of the sub-cluster but also provides approximately identically distributed residual samples for subsequent kernel density estimation, thereby simultaneously improving the reliability and sharpness of the interval forecasts. To verify the effectiveness of the K-Modes partitioning method adopted in this paper, two comparative experiments are designed: the first performs interval prediction after K-Means clustering, and the second conducts interval prediction directly on the entire cluster without any partitioning. As shown in
Figure 15, the K-Means-based interval prediction encloses almost all actual power observations, but only by raising the upper bound, which inflates the bandwidth. Most points in the high-power segment are covered, yet the elevated upper bound widens the interval overall. This reflects the continuous distance metric, which merges stations with similar amplitudes but different fluctuation phases; the resulting residual distribution is multimodal and heavy-tailed, which compels KDE to widen the kernel width to maintain coverage. Consequently, K-Means clustering works better in high-power periods but slightly worse in low-power periods. When no clustering is applied and interval prediction is carried out on the entire cluster, fluctuations are captured less well; although a certain interval coverage is achieved, the bandwidth also increases. In contrast, the proposed method narrows the power interval while keeping the upper and lower bounds reasonably placed, so that more actual power points fall within the interval. It is therefore more accurate and reasonable than the traditional methods and enables KDE to capture the same probability mass within a narrower bandwidth.
Table 6 gives the interval prediction evaluation metrics for K-Modes clustering, K-Means clustering, and the unpartitioned single cluster at the three confidence levels. The comparison supports the statement above: the proposed method scores highest in reliability, sharpness, and skill. At the 95%, 90%, and 80% confidence levels, the interval coverage of K-Means clustering is 0.97%, 2.29%, and 0.56% lower than that of the proposed method, while its bandwidth is 2.9%, 2.67%, and 2.28% wider, respectively. Interval prediction without clustering performs worst of the three approaches, with the lowest interval prediction accuracy. This supports the necessity of clustering the PV plants when forecasting intervals for PV clusters.