1. Introduction
With the acceleration of the global energy transition, photovoltaic (PV) power, known for its clean and efficient nature, holds significant practical importance in the development of a green, low-carbon energy system [1]. Nevertheless, the inherent volatility, stochasticity, and intermittency associated with PV generation present substantial challenges to maintaining the secure and stable operation of power systems [2,3,4,5]. Consequently, the accurate prediction of PV power generation for upcoming periods not only helps power dispatching authorities comprehensively coordinate various adjustable energy resources and maintain safe, stable system operations, but also contributes to the full utilization of solar energy resources and the reduction of operating costs [6].
In this context, achieving a higher temporal resolution of PV forecasting, particularly at the 15 min level, has become increasingly important [7]. For example, in China, the official regulations for the grid connection and operation of electric power plants (https://hzj.nea.gov.cn/xxgk/zcfg/202401/t20240125_230766.html (accessed on 1 August 2025)) mandate that PV units report generation data at 15 min intervals. This regulatory requirement underscores the necessity of high-resolution forecasting, which is critical not only for the seamless integration of PV units into the grid but also for maintaining grid stability [8]. In addition, insufficient temporal resolution in forecasting can lead to an overestimation of the economic benefits associated with battery energy storage systems. Specifically, as battery operations rely on capturing short-term imbalances between PV generation and load, coarse-grained forecasts (e.g., 1 h) tend to obscure these transient fluctuations. This masks the true frequency of charging/discharging needs, inflates projections of the battery’s arbitrage and regulation capabilities, and thus distorts investment decisions [9]. Therefore, this study focuses on day-ahead photovoltaic power prediction at a 15 min resolution, given its potential advantages in enhancing both grid stability and the economic efficiency of energy storage operations.
Many scholars have conducted extensive research and proposed a range of methods for day-ahead photovoltaic power forecasting, where data-driven machine learning models have become the dominant paradigm due to their superior feature extraction and nonlinear approximation abilities [10].
As summarized in Table 1, typical examples include the multi-layer perceptron (MLP) [11] and its state-of-the-art variations such as DLinear [12], N-BEATS [13], and TimeMixer [14], along with convolutional neural networks (CNNs) [15] and recurrent neural networks (RNNs) [16,17]. Nevertheless, both MLP-based and CNN-based models have a limited receptive field, which restricts their ability to capture long-term patterns within PV power generation data [14,18]. RNN-based models, on the other hand, suffer from gradient vanishing or explosion. As an advanced subset of data-driven neural network models, Transformers use self-attention mechanisms to dynamically weigh the importance of different temporal positions within the input sequence, demonstrating exceptional performance [18,19]. For example, Tian et al. [19] use the Transformer model and combine photovoltaic and numerical meteorological data from Hebei province for ultra-short-term power forecasting. The results show that the Transformer model learns the relationships between weather features better than traditional models and outperforms them. Furthermore, Zhou et al. [20] propose the Informer model, based on the ProbSparse attention mechanism, which achieves lower computational complexity and memory usage and can handle long input sequences more efficiently. Nie et al. [21] propose the PatchTST model, which divides each time series into subseries-level patches and employs a channel-independent Transformer architecture to achieve efficient and accurate long-term time-series forecasting.
However, the aforementioned models are built on the nonlinear relationship between the most recent historical power data and the power of the target day. Because this predictive framework requires the most recent historical power as input, it encounters significant operational limitations in practice when monitoring infrastructure failures make recent historical power data unavailable [30,31]. In this context, similar-day analysis, which uses historical power datasets accumulated over long periods rather than relying solely on recent power data, has emerged as a reliable forecasting methodology. For example, Ye et al. [22] cluster historical days into seasonal and weather-type groups (e.g., sunny, rainy) based on key meteorological parameters (irradiance, temperature, and humidity). Subsequently, the power generation profile from the most similar historical day within the same group is adopted as the forecasting baseline, with Euclidean distance metrics employed to quantify similarity. Acharya et al. [23] classify historical days by PV power patterns and select primary as well as secondary weather variables via deviation analysis. Then, the closest group is chosen using the primary variables, and the search is refined within that group using the secondary variables to identify similar days. However, exhaustive pairwise comparisons incur high computational costs. Meanwhile, the classification-based approach to similar-day recognition requires a large training corpus, making it difficult for the method to effectively identify atypical days in special weather scenarios. Therefore, a more effective similar-day selection method is desirable.
On the other hand, while similar-day analysis provides an acceptable forecasting baseline for the target day, it captures only the general patterns of power generation and cannot accurately represent the subtle fluctuations that occur on the target day. Refining the forecasts derived from similar-day analysis is therefore essential to enhance predictive accuracy. For example, Gulin et al. [24] take predictions from the meteorological service as baselines and use an MLP to revise prediction sequences in real time according to recent error differentials between forecasts and the latest measured power data. Zhang et al. [25] employ a CNN to produce the forecasting baseline and design an error-correction module based on the hybridization of the wavelet transform (WT) and k-nearest neighbor (KNN) algorithms, which mainly accounts for historical prediction error patterns of the CNN model. However, existing correction methodologies rely predominantly on historical prediction error patterns specific to individual models, without giving sufficient consideration to the valuable target-day information embedded within meteorological forecast data.
In addition, although several researchers have noticed the effectiveness of multi-scale analysis in improving the prediction accuracy of power generation, the majority of existing studies merely incorporate multi-scale prediction results in a unidirectional manner, failing to account for the inter-scale relationships and characteristics. For example, Jiang et al. [26] use empirical mode decomposition (EMD) to decompose power data, and construct different LSTM neural network structures for the intrinsic mode functions of each frequency band. Li et al. [27] decompose the historical power data based on the fast iterative filtering decomposition (FIFD) method and use the echo state network with kernel extreme learning machine (ESN-KELM) to model the different components, respectively. More advanced methods include MSGNet [28], which utilizes frequency-domain decomposition to fuse multi-scale features by capturing cross-frequency correlations, and Pathformer [29], which employs attention mechanisms to adaptively integrate features across scales based on their predictive relevance. Although various models incorporate multi-scale designs, they often fail to simultaneously leverage information from different scales, derived from both past observations and typical generation modes [14,32].
To address these critical challenges, including missing data, insufficient similar-day prediction accuracy, and the limited ability of unidirectional fusion to exploit multi-scale prediction information, this study proposes a novel model, SN-Transformer-BiMixer, designed for day-ahead photovoltaic power prediction at a 15 min resolution. As shown in Figure 1, the model architecture comprises three core components. First, a Siamese network (SN) is introduced to identify similar days for the target day based on numerical weather prediction (NWP) data. By focusing on learning discriminative features between days rather than features specific to each day, the SN can efficiently select representative similar days and generate baseline power curves without large training datasets [33]. Second, a Transformer model is used to dynamically correct these baseline curves via its self-attention mechanism, enabling the capture of complex correlations among meteorological variables and PV data for better prediction accuracy. Finally, a “down-top + top-down” bidirectional mixer (BiMixer) module is designed to fuse prediction results across different scales, addressing the limitations of unidirectional fusion in utilizing multi-scale information. The key contributions of this research are summarized as follows:
(1) A Siamese network is introduced to identify multi-scale similar historical days for the day whose power is to be predicted, thus enhancing forecasting robustness, particularly when real-time power generation data are incomplete or missing.
(2) A Transformer-based correction framework is proposed to systematically refine preliminary predictions from similar-day matching. Furthermore, the designed “down-top + top-down” bidirectional mixer architecture enables comprehensive integration of power curve patterns across different temporal resolutions, substantially improving both forecast accuracy and reliability.
(3) Comprehensive experimental studies are conducted on real-world PV sites in China. The results demonstrate the superiority of the proposed model in terms of prediction accuracy and robustness.
The remaining parts of this paper are organized as follows. In Section 2, the proposed method is introduced in detail. Section 3 illustrates the experimental setup. Section 4 presents the experimental results and analysis. Finally, conclusions are drawn in Section 5.
2. Methods
Given the widespread absence of real-time historical data due to sensing-device or transmission failures, coupled with the multi-scale nature of PV generation, this study proposes a novel model (i.e., SN-Transformer-BiMixer) for day-ahead PV forecasting, as shown in Figure 1.
Specifically, the proposed SN-Transformer-BiMixer consists of three cooperating core modules: (1) The SN module, whose twin-branch structure with shared weights performs well in small-sample classification tasks, enables effective identification of days similar to the target forecast day. (2) The Transformer module, based on self-attention mechanisms, captures complex temporal dependencies in power data to refine the similar-day curves generated by the SN module, thereby modeling the relationships among NWP data, historical power generation, and the target forecast power. (3) The BiMixer module integrates prediction information across different temporal scales from the Transformer output, achieving complementary optimization of forecasting results through mutual calibration of multi-scale features.
The following sections provide a detailed description of each module within the proposed model, SN-Transformer-BiMixer.
2.1. Identification of Multi-Scale Similar Days by SN
The core of selecting similar historical days for PV power forecasting is to ensure the selected days closely match the target day’s features. However, PV power is influenced by uncertain meteorological factors like solar irradiance, temperature, and humidity. Meanwhile, actual production environments often have missing data issues. Traditional methods that select similar days from continuous time series rely on simple temporal correlation, which fails to handle these uncertainties and thus cannot achieve stable and accurate forecasting.
In addition, when traditional classification methods are applied to small-sample data, the number of samples in each category decreases as the number of categories increases, which reduces the accuracy of the classifier. To effectively solve the small-sample problem, Tolosana et al. [33] use a Siamese neural network model to learn the similarity between different samples, and then match samples of unknown categories. This method has been successfully applied to the field of online signature verification. The Siamese neural network is a special neural network structure composed of two twin sub-networks. The two sub-networks have the same structure and share weights, but have different inputs. The Siamese neural network can simultaneously learn the features of two input samples. By comparing and analyzing the differences and similarities between these features, it can explore the internal connections of the data and play a unique role in fields such as image matching, similar text recognition, and anomaly detection [34].
Therefore, in our study, the SN is applied to select similar historical days in the forecasting of PV power generation. Firstly, the power data and NWP data are down-sampled to generate both types of data at different scales. Notably, the NWP data encompass three critical meteorological features, namely surface horizontal radiation, diffuse radiation, and direct radiation, which have a strong correlation with photovoltaic power generation [35,36,37].
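The down-sampling step can be illustrated with a minimal sketch. It assumes block averaging as the down-sampling operator, which the text does not specify, and uses a synthetic 96-point day; the function name `downsample` and the sample values are hypothetical. The factors correspond to the 15 min, 30 min, 1 h, and 2 h scales mentioned in this paper.

```python
from statistics import mean

def downsample(series, factor):
    """Block-mean down-sampling: average each run of `factor` consecutive samples.

    Assumption: block averaging stands in for whatever operator the study uses.
    """
    if factor < 1 or len(series) % factor != 0:
        raise ValueError("series length must be a positive multiple of factor")
    return [mean(series[i:i + factor]) for i in range(0, len(series), factor)]

# Hypothetical day of 15 min data: 96 samples with a simple repeating pattern.
day_15min = [float(i % 8) for i in range(96)]

# factor 1 -> 15 min, 2 -> 30 min, 4 -> 1 h, 8 -> 2 h
scales = {factor: downsample(day_15min, factor) for factor in (1, 2, 4, 8)}
for factor, values in scales.items():
    print(f"{15 * factor:>3} min resolution: {len(values)} points per day")
```

The same function would be applied to each NWP channel so that the power and weather series stay aligned at every scale.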
For each scale, K-means clustering is carried out on the normalized power data (that is, the trend of the power curve) separately for the spring, summer, autumn, and winter seasons, and several samples are selected as “typical days”. Subsequently, the NWP data of the forecast day are input into the SN as the key factor in determining which typical pattern the forecast day belongs to. It is important to note that the power-based clustering and the meteorology-based classification remain theoretically decoupled. Clustering is performed solely on historical power curves to identify representative patterns (“typical days”), while the Siamese network operates in the meteorological feature space to learn similarity relationships that reflect these power-based types. This design avoids ungrounded fusion of feature spaces and maintains the physical causality between weather and photovoltaic output. Finally, the power curves of the matched typical days are combined in a weighted calculation, using the similarity as the weight, to obtain the baseline forecast of the PV power of the forecast day at this scale.
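One way to read the final weighted step is sketched below. The weighting rule is an assumption: distances between the forecast day and each typical day in the SN feature space are turned into weights with `exp(-distance)` and normalized. The text does not give the exact rule, and the names `similarity_weights` and `baseline_curve` are hypothetical.

```python
import math

def similarity_weights(distances):
    """Map embedding distances to weights that sum to 1 (closer day -> larger weight).

    Assumption: exp(-d) weighting; the study's actual rule is not stated.
    """
    scores = [math.exp(-d) for d in distances]
    total = sum(scores)
    return [s / total for s in scores]

def baseline_curve(typical_curves, distances):
    """Similarity-weighted average of the typical-day power curves."""
    weights = similarity_weights(distances)
    n_points = len(typical_curves[0])
    return [sum(w * curve[t] for w, curve in zip(weights, typical_curves))
            for t in range(n_points)]

# Two hypothetical normalized typical-day curves and their distances to the forecast day.
typical = [[0.0, 0.5, 1.0, 0.5], [0.0, 0.2, 0.4, 0.2]]
dists = [0.1, 1.2]
baseline = baseline_curve(typical, dists)
print([round(v, 3) for v in baseline])
```

Because the first typical day is much closer, the baseline stays near its curve while still being pulled slightly toward the second.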
Specifically, the SN structure adopted in this study is shown in Figure 2. The input data of the SN is a labeled sample pair of NWP data $(X_i, X_j, Y)$. In the actual production environment, there may be missing data in the PV power generation data. However, the NWP information can provide relatively complete data. Therefore, using the NWP data as the model input can avoid the problem of low prediction accuracy caused by missing data. Here, $X_i$ and $X_j$ represent the NWP data of the $i$-th day and the $j$-th day, respectively, and $Y$ represents the label indicating whether $X_i$ and $X_j$ belong to the same category. When $X_i$ and $X_j$ are samples under the same category, $Y = 1$; otherwise, $Y = 0$. $W$ represents the model parameters, and $D_W$ represents the distance measure between samples, and its expression is as follows:
$$D_W(X_i, X_j) = \left\| G_W(X_i) - G_W(X_j) \right\|_2$$
Here, $G_W(X_i)$ and $G_W(X_j)$ are the low-dimensional feature vectors obtained by applying the shared mapping function $G_W(\cdot)$ to the input data $X_i$ and $X_j$, respectively.
The loss function of the SN usually adopts the contrastive loss function, which is shown as follows:
$$L(W, Y, X_i, X_j) = \frac{1}{2}\, Y\, D_W^2 + \frac{1}{2}\,(1 - Y)\left[\max\left(0,\; m - D_W\right)\right]^2$$
Here, $m$ is the set threshold. The SN loss function realizes the classification learning of samples through the influence of the distance between samples on the loss value. When the samples belong to different categories ($Y = 0$) and the distance is less than the threshold $m$, the loss increases as the distance decreases, prompting the model to increase the distance between samples of different categories; when the samples are of the same category ($Y = 1$), the loss increases as the distance increases, driving the model to reduce the distance between samples of the same category. In this way, the model is guided to learn feature representations that can effectively distinguish samples of different categories, improving the classification accuracy and generalization ability.
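The behavior of the contrastive loss described above can be checked with a small sketch. It assumes the standard margin-based form with a Euclidean distance between the two branch outputs; the embeddings are made-up numbers, the same-category label is passed as a boolean, and the function names are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(emb_i, emb_j, same, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    same=True  : loss grows with the distance (pulls the pair together).
    same=False : loss is positive only while the distance is below `margin`
                 (pushes the pair apart until the margin is reached).
    """
    d = euclidean(emb_i, emb_j)
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

# Hypothetical 2-D embeddings of NWP days produced by the shared branch.
anchor = [0.0, 0.0]
near = [0.5, 0.0]
far = [3.0, 4.0]

print(contrastive_loss(anchor, near, same=True))   # small penalty: similar and close
print(contrastive_loss(anchor, near, same=False))  # penalty: dissimilar but inside margin
print(contrastive_loss(anchor, far, same=False))   # no penalty: dissimilar and far apart
```

A dissimilar pair that is already farther apart than the margin contributes nothing, so training effort concentrates on pairs that are still confusable.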
2.2. Correction of Multi-Scale Similar Curves by Transformer
Although the SN can provide the power prediction results for the forecast day, this model mainly focuses on the similarity of the shape of the power curve and fails to fully consider the magnitude of the power values as well as the inherent randomness and volatility of PV power generation.
In this context, this study constructs a power correction module. This module takes the prediction from the SN as the reference value and combines it with the NWP data to dynamically correct the prediction results, thereby improving the accuracy of the power output prediction under actual weather conditions. Specifically, the Transformer module is adopted as the post-correction module. It overcomes the difficulty that traditional recurrent or convolutional neural networks have with parallel computation in sequence processing, greatly improving training efficiency and significantly reducing training time. Meanwhile, it demonstrates good generalization ability in multi-task scenarios and can efficiently learn data features and optimize model performance in different scenarios [38]. The specific formulation is as follows:
$$\hat{P} = \mathrm{Transformer}\left(X, \tilde{P}\right)$$
Here, $X$ represents the NWP data, including three significant meteorological variables, namely surface horizontal radiation, diffuse radiation, and direct radiation. $\hat{P}$ represents the predicted value of the photovoltaic power generation corrected by the Transformer, and $\tilde{P}$ represents the initial prediction generated by the SN model. $\mathrm{Transformer}(\cdot)$ indicates that this study employs the Transformer module to optimize the initial predictions. The module’s training objective is to minimize the error between the true values and the corrected predictions $\hat{P}$.
The core of the Transformer module lies in the multi-head attention mechanism. Each single-head attention mechanism, which serves as the building block of multi-head attention, can be calculated as follows. Given a query matrix $Q$, a key matrix $K$, and a value matrix $V$, the attention output is computed using the scaled dot-product operation:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$
Here, dividing by $\sqrt{d_k}$ helps prevent the dot-product values from becoming excessively large, which could lead to extremely small gradients in the softmax function.
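A minimal, dependency-free sketch of single-head scaled dot-product attention follows. It is only meant to make the computation concrete: the matrices are tiny made-up examples, and a practical implementation would use a tensor library rather than nested lists.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def attention(q, k, v):
    """Return (output, weights) of scaled dot-product attention."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

# One query attending over two identical keys: both values get equal weight.
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [1.0, 0.0]]
v = [[2.0], [4.0]]
out, w = attention(q, k, v)
print(out, w)
```

Each row of the weight matrix sums to one, so the output for a query is a convex combination of the value rows.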
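Multi-head attention integrates multiple single-head attention mechanisms to capture diverse features of the input sequence. It is defined as follows:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right) W^{O}, \qquad \mathrm{head}_i = \mathrm{Attention}\left(Q W_i^{Q},\; K W_i^{K},\; V W_i^{V}\right)$$
Here, $W_i^{Q} \in \mathbb{R}^{d_{\mathrm{model}} \times d_k}$ is the query weight matrix for the $i$-th attention head, $W_i^{K} \in \mathbb{R}^{d_{\mathrm{model}} \times d_k}$ is the key weight matrix for the $i$-th head, $W_i^{V} \in \mathbb{R}^{d_{\mathrm{model}} \times d_v}$ is the value weight matrix for the $i$-th head, and $W^{O} \in \mathbb{R}^{h d_v \times d_{\mathrm{model}}}$ is the output projection matrix. $h$ represents the number of heads, $d_k$ denotes the dimension of queries/keys, $d_v$ denotes the dimension of values, and $d_{\mathrm{model}}$ is the model’s base dimension (input/output dimension of all sub-layers).
In the Transformer encoder, after the multi-head attention layer, there is a feed-forward neural network (FFN) composed of two linear layers with a ReLU activation function in between. The output of the FFN is calculated as follows:
$$\mathrm{FFN}(x) = \max\left(0,\; x W_1 + b_1\right) W_2 + b_2$$
Here, $W_1 \in \mathbb{R}^{d_{\mathrm{model}} \times d_{ff}}$ and $W_2 \in \mathbb{R}^{d_{ff} \times d_{\mathrm{model}}}$ are weight matrices, and $b_1$ and $b_2$ are bias vectors. $d_{ff}$ is the FFN’s intermediate dimension (typically $4 \times d_{\mathrm{model}}$ in standard Transformer designs).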
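The feed-forward sub-layer can be sketched in a few lines. This is an illustration under assumptions: the weights are random placeholders rather than trained parameters, the sizes (model dimension 4, intermediate dimension 16, three time steps) are chosen only to show the fourfold expansion, and the helper names are hypothetical.

```python
import random

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def ffn(x, w1, b1, w2, b2):
    """Position-wise FFN: ReLU(x W1 + b1) W2 + b2, applied to each row of x."""
    hidden = [[max(0.0, h + b) for h, b in zip(row, b1)] for row in matmul(x, w1)]
    return [[o + b for o, b in zip(row, b2)] for row in matmul(hidden, w2)]

random.seed(0)
d_model, d_ff = 4, 16  # intermediate dimension is 4 x the model dimension
w1 = [[random.uniform(-0.5, 0.5) for _ in range(d_ff)] for _ in range(d_model)]
w2 = [[random.uniform(-0.5, 0.5) for _ in range(d_model)] for _ in range(d_ff)]
b1, b2 = [0.0] * d_ff, [0.0] * d_model

x = [[random.uniform(-1.0, 1.0) for _ in range(d_model)] for _ in range(3)]  # 3 time steps
y = ffn(x, w1, b1, w2, b2)
print(len(y), len(y[0]))  # same shape as the input: time steps x model dimension
```

The layer expands each time step to the wider hidden dimension, applies ReLU, and projects back, so the output shape matches the input shape.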
By leveraging the Transformer to correct the initial predictions from the SN at each scale, we obtain the refined PV power prediction results $\{\hat{P}_0, \hat{P}_1, \ldots, \hat{P}_M\}$. Among them, $\hat{P}_m$ is the power prediction result with a time resolution of $\tau_m$.
2.3. Fusion of Multi-Scale Information by BiMixer
PV power is affected by multiple factors such as meteorological conditions and the position of the sun, and its variation characteristics are different at different time scales. It is difficult for a single scale to comprehensively capture the differentiated changes. The multi-scale method can process data at different scales simultaneously. For example, it can handle the rapid fluctuations of light intensity in a short period and the power changes caused by the transition of weather types over a long period, thus accurately capturing the variation characteristics of PV power [39].
Nevertheless, existing studies predominantly employ unidirectional paradigms for multi-scale prediction fusion. For instance, Chen et al. [29] introduce Pathformer, which utilizes attention mechanisms to directly summarize multi-scale information. Similarly, Zhu et al. [28] propose MSGNet, which applies frequency-domain decomposition to synthesize multi-scale features. In these methods, the attention mechanism relies on data-driven, unidirectional weight assignment, while frequency-domain fusion depends on a one-time domain transformation and concatenation. However, both approaches fail to effectively manage dynamic interactions or ensure global consistency across multiple scales, thereby limiting their ability to fully exploit the complementary information present across all scales [14,32].
Given this, this study approaches the problem from a multi-scale perspective. Based on the SN and the Transformer correction module described above, the corrected PV power prediction results of the forecast day at different scales are obtained. Subsequently, this study designs a bidirectional mixer module. This module is capable of fully integrating the information of prediction results at different scales, effectively addressing the shortcomings of unidirectional fusion methods. The structure of the bidirectional mixer module is shown in Figure 3.
The bidirectional mixer module designed in this study is composed of a “down-top” mixer module and a “top-down” mixer module, and the two sub-modules are finally combined through an MLP model. The input data of the bidirectional mixer module is the set of corrected multi-scale predictions $\{\hat{P}_0, \hat{P}_1, \ldots, \hat{P}_M\}$, ordered from the finest resolution $\tau_0$ to the coarsest resolution $\tau_M$. The specific analysis of the two modules is as follows.
2.3.1. “Top-Down” Mixer
The “top-down” mixer module, constructed using an MLP, is designed to transform low-resolution scale features into high-resolution ones, facilitating the progressive transformation of macroscopic-level information into more detailed representations. This hierarchical refinement process is formalized by the following equation:
$$\hat{P}^{td}_{M} = \hat{P}_{M}, \qquad \hat{P}^{td}_{m} = \hat{P}_{m} + \mathrm{MLP}^{td}_{m}\left(\hat{P}^{td}_{m+1}\right), \quad m = M-1, \ldots, 0$$
Here, $\mathrm{MLP}^{td}_{m}$ contains one hidden layer with a ReLU activation function. The “top-down” mixer module can uncover the key features hidden at the high-resolution scale, providing richer and more accurate information for photovoltaic power forecasting. Thereby, we obtain the “top-down” fusion results $\{\hat{P}^{td}_0, \ldots, \hat{P}^{td}_M\}$.
2.3.2. “Down-Top” Mixer
Conversely, the “down-top” mixer module primarily projects high-resolution features onto low-resolution ones. It efficiently filters redundant information from high-resolution data, emphasizing key features and enabling the model to capture overarching patterns of power changes from a macroscopic perspective. The specific equation is expressed as follows:
$$\hat{P}^{dt}_{0} = \hat{P}_{0}, \qquad \hat{P}^{dt}_{m} = \hat{P}_{m} + \mathrm{MLP}^{dt}_{m}\left(\hat{P}^{dt}_{m-1}\right), \quad m = 1, \ldots, M$$
Here, $\mathrm{MLP}^{dt}_{m}$ contains one hidden layer with a ReLU activation function. According to this module, new fusion results $\{\hat{P}^{dt}_0, \ldots, \hat{P}^{dt}_M\}$ are generated.
Based on the prediction results $\hat{P}^{td}_{m}$ obtained from the “top-down” mixer module and the prediction results $\hat{P}^{dt}_{m}$ obtained from the “down-top” mixer module, a linear layer is used to fuse the prediction results at each scale. Meanwhile, to retain the original prediction information, an MLP model is utilized to combine the fused result with the initial prediction results $\hat{P}_{m}$, and the prediction results on each scale are mapped to the time resolution of $\tau_0$. That is,
$$\hat{P}^{bi}_{m} = \mathrm{MLP}_{m}\left(\hat{P}_{m} + \mathrm{Linear}_{m}\left(\left[\hat{P}^{td}_{m},\; \hat{P}^{dt}_{m}\right]\right)\right), \quad m = 0, \ldots, M$$
Here, $\mathrm{MLP}_{m}$ has one hidden layer, and the activation function is ReLU. Thus, we obtain the bidirectional mixing power prediction results at each scale $\{\hat{P}^{bi}_0, \ldots, \hat{P}^{bi}_M\}$.
Finally, the PV power prediction for the forecast day is derived via the average weighting method.
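The data flow of the two mixing directions and the final averaging can be sketched as follows. This is a structural illustration only, under strong assumptions: the learned MLPs and the linear fusion layer are replaced by fixed operators (block averaging for fine-to-coarse, value repetition for coarse-to-fine, and a plain mean for blending), each scale is assumed to have half the length of the previous one, and all function names and numbers are hypothetical.

```python
def pool(x, factor=2):
    """Fine -> coarse by block averaging (stand-in for the learned "down-top" MLP)."""
    return [sum(x[i:i + factor]) / factor for i in range(0, len(x), factor)]

def repeat(x, factor=2):
    """Coarse -> fine by repeating values (stand-in for the learned "top-down" MLP)."""
    return [v for v in x for _ in range(factor)]

def blend(a, b):
    """Element-wise mean of two equally long sequences (stand-in for learned fusion)."""
    return [(p + q) / 2 for p, q in zip(a, b)]

def bimixer(preds):
    """preds[0] is the finest scale; each later scale has half as many points."""
    n = len(preds)
    # "Down-top" pass: carry fine-scale information up to the coarser scales.
    dt = [preds[0]]
    for m in range(1, n):
        dt.append(blend(preds[m], pool(dt[m - 1])))
    # "Top-down" pass: carry coarse-scale information down to the finer scales.
    td = [None] * n
    td[n - 1] = preds[n - 1]
    for m in range(n - 2, -1, -1):
        td[m] = blend(preds[m], repeat(td[m + 1]))
    # Fuse both directions per scale, map every scale to the finest grid, then average.
    fused = [blend(td[m], dt[m]) for m in range(n)]
    upsampled = [repeat(f, 2 ** m) for m, f in enumerate(fused)]
    return [sum(u[t] for u in upsampled) / n for t in range(len(preds[0]))]

# Hypothetical corrected predictions at three scales (8, 4, and 2 points).
preds = [
    [0.0, 0.2, 0.6, 1.0, 1.0, 0.6, 0.2, 0.0],
    [0.1, 0.8, 0.8, 0.1],
    [0.45, 0.45],
]
print([round(v, 3) for v in bimixer(preds)])
```

With learned mixers, each direction would output a trainable correction rather than a fixed resampling, but the order of the passes and the final mapping to the finest resolution would follow the same pattern.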
5. Conclusions
Accurate day-ahead PV power forecasting is of paramount importance for maintaining the safe and stable operation of the power grid. However, missing data are widespread in engineering applications and often cause common data-driven machine-learning models to produce large errors in photovoltaic power prediction. In this context, this study proposes a novel method called SN-Transformer-BiMixer, in which a Siamese neural network is introduced to identify days similar to the forecast day without large training datasets. Then, a Transformer and a bidirectional mixer are constructed to refine the similar-day curves derived from the SN for better accuracy.
Comprehensive experimental evaluations validate the superiority of the proposed method over existing approaches for day-ahead PV forecasting. Specifically, the day-ahead PV forecasts at 15 min resolution, generated via the SN-based similar-day selection method, achieve an RMSE of 3.152, substantially outperforming the state-of-the-art time-series model, TimesNet, which yields an RMSE of 4.015. After applying the Transformer-based correction, the RMSE of forecasts derived from similar days declines to 2.786, marking an 11.6% improvement in accuracy. Further incorporation of multi-scale information (2 h, 1 h, and 30 min) coupled with a bidirectional fusion of the intermediate predictions reduces the RMSE to 2.490, yielding an additional 10.63% gain. Overall, these findings provide compelling evidence that augmenting the SN-based similar-day selection framework with Transformer-based correction and bidirectional fusion significantly enhances the accuracy of day-ahead photovoltaic forecasting. Moreover, when benchmarked against the Transformer and Informer, our SN-Transformer-BiMixer method also demonstrates superior performance, achieving RMSE reductions of 4.52% relative to the Transformer and 11.32% relative to the Informer.
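The stage-by-stage gains quoted above can be recomputed from the reported RMSE values with a short script. The RMSE figures are taken from the text; the helper name `rel_reduction` is hypothetical, and small differences in the last digit are to be expected because the published RMSE values are themselves rounded.

```python
def rel_reduction(before, after):
    """Relative RMSE reduction in percent when moving from `before` to `after`."""
    return 100.0 * (before - after) / before

# RMSE values reported in the text.
rmse_timesnet = 4.015
rmse_sn = 3.152            # SN-based similar-day selection
rmse_corrected = 2.786     # after Transformer-based correction
rmse_fused = 2.490         # after multi-scale bidirectional fusion

steps = {
    "SN vs. TimesNet": rel_reduction(rmse_timesnet, rmse_sn),
    "correction vs. SN": rel_reduction(rmse_sn, rmse_corrected),
    "fusion vs. correction": rel_reduction(rmse_corrected, rmse_fused),
    "full model vs. SN": rel_reduction(rmse_sn, rmse_fused),
}
for name, value in steps.items():
    print(f"{name}: {value:.2f}% lower RMSE")
```

The two successive reductions compound rather than add, which is why the overall gain relative to the SN baseline is slightly smaller than the sum of the two step-wise percentages.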
Ablation experiments further identify the sources of this enhanced performance. Specifically, when selecting similar days, the SN method reduces the RMSE of prediction results by 27.04% compared with the MIC method, validating the effectiveness of the SN model in similar-day data selection. In the fusion of multi-scale prediction results, the BiMixer model reduces the RMSE by 6.24% compared with unidirectional fusion approaches and by 10.69% compared with the average weight method. It also outperforms the attention-based method (SN-Transformer-Attention) by reducing RMSE by 1.58% and the frequency-domain fusion model (SN-Transformer-Frequency) by lowering RMSE by 0.80%. These results reflect the superiority of the bidirectional fusion model over other fusion models. Importantly, the proposed method also exhibits robustness to data imperfections. Experiments on datasets containing missing data, conducted with zero-padding imputation for missing data, show stable prediction results, with the model achieving a notable 10.01% reduction in RMSE compared to the Informer model and 8.99% compared to the Transformer model, even under these challenging conditions.
Given the diverse and seasonally complex characteristics of PV power curves, the K-means clustering algorithm used in this study to select ‘typical days’ may ignore some site-specific, low-frequency but important patterns, such as bimodal curves on power-limited days. For future work, a manual screening process will be incorporated to more comprehensively identify certain types of power curves as ‘typical days’. Furthermore, further research will focus on the detection and simulation of missing data scenarios, with the aim of improving model robustness under realistic conditions. In addition, the impact of various loss factors on power generation, including second-order effects, spectral effects, and shading, will be considered to further improve the model’s accuracy in the future. The automatic detection and classification of faults will also be explored.