Day-Ahead Photovoltaic Power Forecasting Based on SN-Transformer-BiMixer

Huang, Xiaohong; Ding, Xiuzhen; Han, Yating; Sima, Qi; Li, Xiaokang; Bao, Yukun

doi:10.3390/en18164406


by Xiaohong Huang ¹, Xiuzhen Ding ², Yating Han ¹, Qi Sima ², Xiaokang Li ¹ and Yukun Bao ²,*

¹ Intelligence & Integrity Energy Technology Co., Ltd., Wuhan 430010, China

² Center for Modern Information Management, School of Management, Huazhong University of Science and Technology, Wuhan 430074, China

* Author to whom correspondence should be addressed.

Energies 2025, 18(16), 4406; https://doi.org/10.3390/en18164406

Submission received: 11 June 2025 / Revised: 8 August 2025 / Accepted: 11 August 2025 / Published: 19 August 2025

(This article belongs to the Special Issue New Progress in Electricity Demand Forecasting)


Abstract

Accurate forecasting of photovoltaic (PV) power is crucial for ensuring the safe and stable operation of power systems. However, the practical implementation of forecasting systems often faces challenges due to missing real-time historical power data, typically caused by sensor malfunctions or communication failures, which substantially hamper the performance of existing data-driven time-series forecasting techniques. To address these limitations, this study proposes a novel day-ahead PV forecasting approach based on similar-day analysis, i.e., SN-Transformer-BiMixer. Specifically, a Siamese network (SN) is employed to identify patterns analogous to the target day within a historical power dataset accumulated over an extended period, considering its superior ability to extract discriminative features and quantify similarities. By identifying similar historical days from multiple time scales using SN, a baseline generation pattern for the target day is established to allow forecasting without relying on real-time measurement data. Subsequently, a transformer model is used to refine these similar temporal curves, yielding improved multi-scale forecasting outputs. Finally, a bidirectional mixer (BiMixer) module is designed to synthesize similar curves across multiple scales, thereby providing more accurate forecast results. Experimental results demonstrate the superiority of the proposed model over existing approaches. Compared to Informer, SN-Transformer-BiMixer achieves an 11.32% reduction in root mean square error (RMSE). Moreover, the model exhibits strong robustness to missing data, outperforming the vanilla Transformer by 8.99% in RMSE.

Keywords: photovoltaic forecasting; deep learning; Siamese network; multi-scale; bidirectional mixer

1. Introduction

With the acceleration of the global energy transition, photovoltaic (PV) power, known for its clean and efficient nature, holds significant practical importance in the development of a green, low-carbon energy system [1]. Nevertheless, the inherent volatility, stochasticity, and intermittency associated with PV generation present substantial challenges to maintaining the secure and stable operation of power systems [2,3,4,5]. Consequently, the accurate prediction of PV power generation for upcoming periods not only helps power dispatching authorities comprehensively coordinate various adjustable energy resources and maintain safe, stable system operations, but also contributes to the full utilization of solar energy resources and the cost reduction of the operation [6].

In this context, achieving a higher temporal resolution of PV forecasting, particularly at the 15 min level, has become increasingly important [7]. For example, in China, the official regulations for the grid connection and operation of electric power plants (https://hzj.nea.gov.cn/xxgk/zcfg/202401/t20240125_230766.html (accessed on 1 August 2025)) mandate that PV units report generation data at 15 min intervals. This regulatory requirement underscores the necessity of high-resolution forecasting, which is not only critical for the seamless integration of PV units into the grid but also for maintaining grid stability [8]. In addition, insufficient temporal resolution in forecasting can also lead to an overestimation of the economic benefits associated with battery energy storage systems. Specifically, as battery operations rely on capturing short-term imbalances between PV generation and load, coarse-grained forecasts (e.g., 1 h) tend to obscure these transient fluctuations. This masks the true frequency of charging/discharging needs, inflates projections of the battery’s arbitrage and regulation capabilities, and thus distorts investment decisions [9]. Therefore, this study focuses on the day-ahead photovoltaic power prediction at a 15 min resolution, given its potential advantages in enhancing both grid stability and the economic efficiency of energy storage operations.

Many scholars have conducted extensive research and proposed a range of methods for day-ahead photovoltaic power forecasting, where data-driven machine learning models have become the dominant paradigm due to their superior feature extraction and nonlinear approximation abilities [10].

As summarized in Table 1, typical examples include the multi-layer perceptron (MLP) [11] and its state-of-the-art variations such as DLinear [12], N-BEATS [13], and TimeMixer [14], along with convolutional neural networks (CNNs) [15] and recurrent neural networks (RNNs) [16,17]. Nevertheless, both MLP-based and CNN-based models have a limited receptive field, which restricts their ability to capture long-term patterns within PV power generation data [14,18]. RNN-based models, on the other hand, suffer from gradient vanishing or explosion. As an advanced subset of data-driven neural network models, Transformers use self-attention mechanisms to weigh the importance of different temporal positions within the input sequence dynamically, demonstrating exceptional performance [18,19]. For example, Tian et al. [19] use the Transformer model and combine photovoltaic and numerical meteorological data in the Hebei province for ultra-short-term power forecasting. The results show that, compared with traditional models, the Transformer model can better learn the relationships between weather features and outperform traditional models. Furthermore, Zhou et al. [20] propose the Informer model, based on the ProbSparse attention mechanism, which achieves lower computational complexity and memory usage and can handle long input sequences more efficiently. Nie et al. [21] propose the PatchTST model, which divides time series into non-overlapping patches and employs a Transformer architecture with channel-individual attention to achieve efficient and accurate long-term time-series forecasting.

However, the aforementioned models were constructed based on the nonlinear relationship between the closest historical power data and that of the target day. This predictive framework, which necessitates the most recent historical power as input variables for forecast generation, encounters significant operational limitations in practical implementations when confronted with monitoring infrastructure failures that preclude access to recent historical power data [30,31]. In this context, similar-day analysis, which uses historical power datasets accumulated over long periods rather than relying solely on recent power data, has emerged as a reliable forecasting methodology. For example, Ye et al. [22] cluster historical days into seasonal and weather-type groups (e.g., sunny, rainy) based on key meteorological parameters (irradiance, temperature, and humidity). Subsequently, the power generation profile from the most similar historical day within the same group is adopted as the forecasting baseline, with Euclidean distance metrics employed to quantify similarity. Acharya et al. [23] classify historical days by PV power patterns and select primary as well as secondary weather variables via deviation analysis. Then, the closest group is chosen using primary variables, and refined within the group with secondary variables to identify similar days. However, exhaustive pairwise comparisons incur high computational costs. Meanwhile, the classification-based approach to similar-day recognition requires a large training corpus, making it difficult for the method to effectively identify atypical days in special weather scenarios. Therefore, a more effective similar-day selection method is desirable.

On the other hand, while similar-day analysis methodology provides an acceptable forecasting baseline for the target day, it can only broadly capture the general patterns of power generation, exhibiting limitations in accurately representing the various subtle fluctuations that manifest on the target day. The refinement of forecasts derived from similar-day analysis is also essential to enhance predictive accuracy. For example, Gulin et al. [24] take predictions from the meteorological service as baselines and use an MLP to revise prediction sequences in real-time according to recent error differentials between forecasts and the latest measured power data. Zhang et al. [25] employ a CNN to produce the forecasting baseline and design an error-correction module based on the hybridization of the wavelet transform (WT) and k-nearest neighbor (KNN) algorithms, which mainly accounts for historical prediction error patterns of the CNN model. However, existing correction methodologies rely predominantly on historical prediction error patterns specific to individual models, without giving sufficient consideration to the valuable target-day information embedded within meteorological forecast data.

In addition, although several researchers have noticed the effectiveness of multi-scale analysis in improving the prediction accuracy of power generation, the majority of existing studies merely incorporate multi-scale prediction results in a unidirectional manner, failing to account for the inter-scale relationships and characteristics. For example, Jiang et al. [26] use empirical mode decomposition (EMD) to decompose power data, and construct different LSTM neural network structures for the intrinsic mode functions of each frequency band. Li et al. [27] decompose the historical power data based on the fast iterative filtering decomposition (FIFD) method and use the echo state network with kernel extreme learning machine (ESN-KELM) to model the different components, respectively. More advanced methods include MSGNet [28], which utilizes frequency-domain decomposition to fuse multi-scale features by capturing cross-frequency correlations, and Pathformer [29], which employs attention mechanisms to adaptively integrate features across scales based on their predictive relevance. Although various models incorporate multi-scale designs, they often fail to simultaneously leverage information from different scales, derived from both past observations and typical generation modes [14,32].

To address critical challenges, including missing data, insufficient similar-day prediction accuracy, and the limitation of unidirectional fusion in effectively utilizing multi-scale prediction information, this study proposes a novel model, SN-Transformer-BiMixer, designed for day-ahead photovoltaic power prediction at a 15 min resolution. As shown in Figure 1, the model architecture comprises three core components. First, a Siamese network (SN) is introduced to identify similar days for the target day based on numerical weather prediction (NWP). By focusing on learning discriminative features between days rather than features specific to each day, the SN can efficiently select representative similar days and generate baseline power curves without large training datasets [33]. Second, a Transformer model is used to dynamically correct these baseline curves via its self-attention mechanism, enabling the capture of complex correlations among meteorological variables and PV data for better prediction accuracy. Finally, a “down-top + top-down” bidirectional mixer (BiMixer) module is designed to fuse prediction results across different scales, addressing the limitations of unidirectional fusion in utilizing multi-scale information. The key contributions of this research are summarized as follows:

A Siamese network is introduced to identify multi-scale similar historical days for the daily power to be predicted, thus enhancing forecasting robustness, particularly when processing incomplete or missing real-time power generation data.
A Transformer-based correction framework is proposed to systematically refine preliminary predictions from similar-day matching. Furthermore, the designed “down-top + top-down” bidirectional mixer architecture enables comprehensive integration of power curve patterns across different temporal resolutions, substantially improving both forecast accuracy and reliability.
Comprehensive experimental studies are conducted on real-world PV sites in China. The results demonstrate the superiority of the proposed model in terms of prediction accuracy and robustness.

The remaining parts of this paper are organized as follows. In Section 2, the proposed method is introduced in detail. Section 3 illustrates the experimental setup. Section 4 presents the experimental results and analysis. Finally, conclusions are drawn in Section 5.

2. Methods

Given the widespread absence of real-time historical data due to sensor or transmission failures, coupled with the multi-scale nature of PV generation, this study proposes a novel model (i.e., SN-Transformer-BiMixer) for day-ahead PV forecasting, as shown in Figure 1.

Specifically, the proposed SN-Transformer-BiMixer consists of three cooperating core modules, as follows: (1) The SN module, with its twin-branch structure and shared weights, performs well in small-sample classification tasks, enabling effective identification of days similar to the target forecast day. (2) The Transformer module, based on self-attention mechanisms, captures complex temporal dependencies in power data to refine the similar-day curves generated by the SN module, thereby modeling the relationships among NWP data, historical power generation, and the target forecasting power. (3) The BiMixer module integrates prediction information across different temporal scales from the Transformer output, so that the forecasts at different scales calibrate and complement one another.

The following sections provide a detailed description of each module within the proposed model, SN-Transformer-BiMixer.

2.1. Identification of Multi-Scale Similar Days by SN

The core of selecting similar historical days for PV power forecasting is to ensure the selected days closely match the target day’s features. However, PV power is influenced by uncertain meteorological factors like solar irradiance, temperature, and humidity. Meanwhile, actual production environments often have missing data issues. Traditional methods that select similar days from continuous time series rely on simple temporal correlation, which fails to handle these uncertainties and thus cannot achieve stable and accurate forecasting.

In addition, when traditional classification methods classify small-sample data, as the number of classification types increases, the number of samples in each category will decrease, resulting in a reduction in the accuracy of the classifier. To effectively solve the small-sample problem, Tolosana et al. [33] use a Siamese neural network model to learn the similarity between different samples, and then match samples of unknown categories. This method has been successfully applied to the field of online signature verification. The Siamese neural network is a special neural network structure, which is composed of two sub-networks with a Siamese relationship. The two sub-networks have the same structure and share weights, but have different inputs. The Siamese neural network can simultaneously learn the features of two input samples. By comparing and analyzing the differences and similarities between these features, it can explore the internal connections of the data and play a unique role in fields such as image matching, similar text recognition, and anomaly detection [34].

Therefore, in our study, the SN is applied to select similar historical days in the forecasting of PV power generation. Firstly, the power data and NWP data are down-sampled to generate two types of data at different scales. Notably, the NWP encompasses three critical meteorological features, as follows: surface horizontal radiation, diffuse radiation, and direct radiation, which have a strong correlation with photovoltaic power generation [35,36,37].
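The down-sampling step can be illustrated with a short sketch. The text does not state how the coarser scales are produced, so the block-averaging rule, the factor of 2 between scales, and the names `downsample` and `build_scales` are assumptions made purely for illustration.

```python
def downsample(series, factor):
    """Average consecutive blocks of `factor` points (assumed pooling rule)."""
    if len(series) % factor != 0:
        raise ValueError("series length must be divisible by factor")
    return [
        sum(series[i:i + factor]) / factor
        for i in range(0, len(series), factor)
    ]


def build_scales(series, num_scales=3, factor=2):
    """Return [t_0, t_1, ..., t_m]: the original curve plus coarser copies."""
    scales = [list(series)]
    for _ in range(num_scales - 1):
        scales.append(downsample(scales[-1], factor))
    return scales


# A toy daily curve with 96 points (15 min resolution).
day = [float(i % 8) for i in range(96)]
scales = build_scales(day)
print([len(s) for s in scales])  # [96, 48, 24]
```

With three scales, a 96-point day is reduced to 48 and then 24 points, which would correspond to 30 min and 1 h resolutions under these assumptions.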

For each scale, K-means clustering is carried out on the normalized power data (that is, the trend of the power curve) within each of the spring, summer, autumn, and winter seasons, and several samples are selected as “typical days”. Subsequently, the NWP data of the forecast day are input into the SN as the key factor in determining which typical pattern the predicted day belongs to. It is important to note that the power-based clustering and the meteorology-based classification remain theoretically decoupled. Clustering is performed solely on historical power curves to identify representative patterns (“typical days”), while the Siamese network operates in the meteorological feature space to learn similarity relationships that reflect these power-based types. This design avoids ungrounded fusion of feature spaces and maintains the physical causality between weather and photovoltaic output. Finally, a weighted calculation is carried out, with the similarity serving as the weight, to obtain the basic forecast result of the PV power of the forecast day at this scale.
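As a hedged illustration of the final weighting step, the sketch below combines the power curves of several typical days into one baseline curve. How distances are converted into weights is not specified in the text; the softmax over negative distances, the `temperature` parameter, and the function names are assumptions introduced here.

```python
import math


def similarity_weights(distances, temperature=1.0):
    """Turn SN distances into weights (softmax of negative distance; assumed)."""
    scores = [-d / temperature for d in distances]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def baseline_curve(typical_curves, distances):
    """Weighted average of typical-day power curves."""
    weights = similarity_weights(distances)
    length = len(typical_curves[0])
    return [
        sum(w * curve[t] for w, curve in zip(weights, typical_curves))
        for t in range(length)
    ]


# Three hypothetical typical days (4 points each) and their distances
# to the target day; a smaller distance means a more similar day.
typical = [[0.0, 0.5, 0.5, 0.0], [0.0, 1.0, 1.0, 0.0], [0.0, 0.2, 0.1, 0.0]]
dist = [0.2, 0.1, 1.5]
print(baseline_curve(typical, dist))
```

Any monotone mapping from distance to weight would serve the same purpose; the softmax is used here only because it yields positive weights that sum to one.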

Specifically, the SN structure adopted in this study is shown in Figure 2. The input data of the SN is a sample pair of NWP data $(Z_{i}, Z_{j}; Z_{ij})$. In the actual production environment, there may be missing data in the PV power generation data. However, the NWP information can provide relatively complete data. Therefore, using the NWP data as the model input can avoid the problem of low prediction accuracy caused by missing data. Here, $Z_{i}$ and $Z_{j}$ represent the NWP data of the i-th day and the j-th day, respectively. $Z_{ij}$ represents the label indicating whether $Z_{i}$ and $Z_{j}$ belong to the same category. When $Z_{i}$ and $Z_{j}$ are samples under the same category, $Z_{ij} = 1$; otherwise, $Z_{ij} = 0$. $\omega$ represents the model parameters, and $d_{ij}$ represents the distance measure between samples, and its expression is as follows:

$$d_{ij} = \sqrt{{(G_{\omega}(Z_{i}) - G_{\omega}(Z_{j}))}^{2}} . \tag{1}$$

Here, $G_{\omega}(Z_{i})$ and $G_{\omega}(Z_{j})$, respectively, represent the mapping functions that transform the input data $Z_{i}$ and $Z_{j}$ into low-dimensional feature vectors.

The loss function of the SN usually adopts the contrastive loss function, which is shown as follows:

$$L = \frac{1}{N} \sum_{i = 1}^{N} \left( Z_{ij} d_{ij}^{2} + (1 - Z_{ij}) \max{(\epsilon - d_{ij}, 0)}^{2} \right) . \tag{2}$$

Here, $\epsilon$ is the set threshold. The SN loss function realizes the classification learning of samples through the influence of the distance between samples of different categories on the loss value. When the samples belong to different categories ($Z_{ij} = 0$) and the distance is less than the threshold $\epsilon$, the loss increases as the distance decreases, prompting the model to increase the distance between samples of different categories; when the samples are of the same category ($Z_{ij} = 1$), the loss increases as the distance increases, driving the model to reduce the distance between samples of the same category. In this way, the model is guided to learn the feature representations that can effectively distinguish samples of different categories, improving the classification accuracy and generalization ability.
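The distance of Equation (1) and the contrastive loss of Equation (2) can be sketched in a few lines. The embeddings below are hand-written stand-ins for the outputs of the shared mapping $G_{\omega}$, and the `margin` argument plays the role of the threshold $\epsilon$; the function names and toy values are illustrative assumptions.

```python
import math


def euclidean_distance(u, v):
    """Eq. (1): distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(pairs, margin=1.0):
    """Eq. (2): mean contrastive loss over (embedding_i, embedding_j, label).

    label = 1 for a same-category pair and 0 otherwise.
    """
    total = 0.0
    for emb_i, emb_j, label in pairs:
        d = euclidean_distance(emb_i, emb_j)
        total += label * d ** 2 + (1 - label) * max(margin - d, 0.0) ** 2
    return total / len(pairs)


pairs = [
    ([0.0, 0.0], [0.3, 0.4], 1),  # same category: penalized by d^2
    ([0.0, 0.0], [0.3, 0.4], 0),  # different category, closer than the margin
    ([0.0, 0.0], [3.0, 4.0], 0),  # different category, beyond the margin
]
print(contrastive_loss(pairs))
```

The third pair contributes nothing to the loss because its distance already exceeds the margin, which matches the behavior described for dissimilar samples.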

2.2. Correction of Multi-Scale Similar Curves by Transformer

Although the SN can provide the power prediction results for the forecast day, this model mainly focuses on the similarity of the shape of the power curve and fails to fully consider the magnitude of the power values as well as the inherent randomness and volatility of PV power generation.

In this context, this study constructs a power correction module. This module takes the prediction from the SN as the reference value and combines it with the NWP data to dynamically correct the prediction results, thereby improving the accuracy of the power output prediction under actual weather conditions. Specifically, the Transformer module is adopted as the post-correction module. Compared with traditional recurrent or convolutional architectures for sequence processing, its self-attention mechanism is well suited to parallel computation, which improves training efficiency and reduces training time. Meanwhile, it demonstrates good generalization ability in multi-task scenarios and can efficiently learn data features and optimize model performance in different scenarios [38]. The specific formulation is as follows:

$$\hat{Y}^{T} = \mathrm{Transformer}(X) + \hat{Y}^{SN} . \tag{3}$$

Here, X represents the NWP data, including three significant meteorological variables, as follows: surface horizontal radiation, diffuse radiation, and direct radiation. $\hat{Y}^{T}$ represents the corrected predicted value of the photovoltaic power generation by the Transformer, and $\hat{Y}^{SN}$ represents the initial prediction generated by the SN model. Transformer(·) indicates that this study employs the Transformer module to optimize the initial predictions. Because the SN prediction is added back in Equation (3), training the module to minimize the error between the true values and $\hat{Y}^{T}$ amounts to having the Transformer learn the residual between the true values and the SN-generated predictions.

The core of the Transformer module lies in the multi-head attention mechanism. Each single-head attention mechanism, which serves as the building block of multi-head attention, can be calculated as follows. Given a query matrix Q, a key matrix K, and a value matrix V, the attention score is computed using the dot-product operation:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V . \tag{4}$$

Here, dividing by $\sqrt{d_{k}}$ helps prevent the dot-product values from becoming excessively large, which could lead to extremely small gradients in the softmax function.
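A minimal pure-Python sketch of the scaled dot-product attention in Equation (4) is given below. Matrices are represented as lists of rows, and the helper names are illustrative; a practical implementation would rely on a tensor library instead.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Eq. (4): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(attention(q, k, v))
```

Each output row is a convex combination of the rows of V, weighted by how strongly the corresponding query matches each key.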

Multi-head attention integrates multiple single-head attention mechanisms to capture diverse features of the input sequence. It is defined as follows:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_{1}, \dots, \mathrm{head}_{h}) W^{O}, \tag{5}$$

$$\mathrm{head}_{i} = \mathrm{Attention}(Q W_{i}^{Q}, K W_{i}^{K}, V W_{i}^{V}) . \tag{6}$$

Here, $W_{i}^{Q} \in \mathbb{R}^{d_{model} \times d_{k}}$ is the query weight matrix for the i-th attention head, $W_{i}^{K} \in \mathbb{R}^{d_{model} \times d_{k}}$ is the key weight matrix for the i-th head, $W_{i}^{V} \in \mathbb{R}^{d_{model} \times d_{v}}$ is the value weight matrix for the i-th head, and $W^{O} \in \mathbb{R}^{h d_{v} \times d_{model}}$ is the output projection matrix. h represents the number of heads, $d_{k}$ denotes the dimension of queries/keys, $d_{v}$ denotes the dimension of values, and $d_{model}$ is the model’s base dimension (input/output dimension of all sub-layers).
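To make the shapes in Equations (5) and (6) concrete, the following sketch runs multi-head attention on a toy input with $d_{model} = 2$, $h = 2$, and $d_{k} = d_{v} = 1$. The weight matrices are hand-picked placeholders rather than trained parameters, and all helper names are assumptions for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention for a single head."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head(q, k, v, heads, w_o):
    """Eqs. (5)-(6): `heads` is a list of (W_q, W_k, W_v) triples."""
    outputs = [
        attention(matmul(q, w_q), matmul(k, w_k), matmul(v, w_v))
        for w_q, w_k, w_v in heads
    ]
    # Concatenate the head outputs along the feature axis, then apply W^O.
    concat = [sum((head[t] for head in outputs), []) for t in range(len(q))]
    return matmul(concat, w_o)


x = [[1.0, 2.0], [3.0, 4.0]]      # two tokens, d_model = 2
pick_first = [[1.0], [0.0]]       # projects onto feature 0 (d_k = d_v = 1)
pick_second = [[0.0], [1.0]]      # projects onto feature 1
heads = [(pick_first,) * 3, (pick_second,) * 3]
w_o = [[1.0, 0.0], [0.0, 1.0]]    # shape (h * d_v) x d_model
out = multi_head(x, x, x, heads, w_o)
print(out)
```

The concatenated head outputs have width $h d_{v}$, which is why $W^{O}$ needs $h d_{v}$ rows to map the result back to $d_{model}$ columns.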

In the Transformer encoder, after the multi-head attention layer, there is a feed-forward neural network (FFN) composed of two linear layers with a ReLU activation function in between. The output of the FFN is calculated as follows:

$$\mathrm{FFN}(x) = \max(0, x W_{1} + b_{1}) W_{2} + b_{2} . \tag{7}$$

Here, $W_{1} \in \mathbb{R}^{d_{model} \times d_{ff}}$ and $W_{2} \in \mathbb{R}^{d_{ff} \times d_{model}}$ are weight matrices, and $b_{1} \in \mathbb{R}^{d_{ff}}$ and $b_{2} \in \mathbb{R}^{d_{model}}$ are bias vectors. $d_{ff}$ is the FFN’s intermediate dimension (typically $4 \times d_{model}$ in standard Transformer designs).
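The position-wise FFN of Equation (7), two linear maps with a ReLU (that is, an element-wise max with zero) in between, can be sketched as follows. The tiny weights are arbitrary illustrative values, and the toy intermediate dimension is far smaller than in a real model.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def ffn(x, w1, b1, w2, b2):
    """Eq. (7): max(0, x W1 + b1) W2 + b2, applied to each row of x."""
    hidden = [
        [max(0.0, h + b) for h, b in zip(row, b1)]
        for row in matmul(x, w1)
    ]
    return [[o + b for o, b in zip(row, b2)] for row in matmul(hidden, w2)]


x = [[1.0, -1.0]]                       # one position, d_model = 2
w1 = [[1.0, -1.0], [0.0, 1.0]]          # d_model x d_ff (toy d_ff = 2)
b1 = [0.0, 0.0]
w2 = [[1.0, 2.0], [3.0, 4.0]]           # d_ff x d_model
b2 = [0.5, 0.5]
print(ffn(x, w1, b1, w2, b2))  # [[1.5, 2.5]]
```

In this example the second hidden unit is negative before the activation and is therefore zeroed, so only the first row of `w2` contributes to the output.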

By leveraging the Transformer to correct the initial predictions from the SN, we obtain the refined PV power prediction results $\hat{Y}^{Transformer} = \{\hat{Y}_{t_{0}}^{T}, \hat{Y}_{t_{1}}^{T}, \dots, \hat{Y}_{t_{m}}^{T}\}$. Among them, $\hat{Y}_{t_{i}}^{T}$ is the power prediction result with a time resolution of $t_{i}$.

2.3. Fusion of Multi-Scale Information by BiMixer

PV power is affected by multiple factors such as meteorological conditions and the position of the sun, and its variation characteristics are different at different time scales. It is difficult for a single scale to comprehensively capture the differentiated changes. The multi-scale method can process data at different scales simultaneously. For example, it can handle the rapid fluctuations of light intensity in a short period and the power changes caused by the transition of weather types over a long period, thus accurately capturing the variation characteristics of PV power [39].

Nevertheless, existing studies predominantly employ unidirectional paradigms for multi-scale prediction fusion. For instance, Chen et al. [29] introduced Pathformer, which utilizes attention mechanisms to directly summarize multi-scale information. Similarly, Zhu et al. [28] propose MSGNet, which applies frequency-domain decomposition to synthesize multi-scale features. In these methods, the attention mechanism relies on data-driven, unidirectional weight assignment, while frequency-domain fusion depends on a one-time domain transformation and concatenation. However, both approaches fail to effectively manage dynamic interactions or ensure global consistency across multiple scales, thereby limiting their ability to fully exploit the complementary information present across all scales [14,32].

Given this, this study conducts research from a multi-scale perspective. Based on the SN and the Transformer correction module described above, the corrected PV power prediction results of the forecast day at different scales are obtained. Subsequently, this study designs a bidirectional mixer module. This module is capable of fully integrating the information of prediction results at different scales, effectively addressing the shortcomings of unidirectional fusion methods. The structure of the bidirectional mixer module is shown in Figure 3.

The bidirectional mixer module designed in this study is composed of a “down-top” mixer module and a “top-down” mixer module, and the two sub-modules are finally combined through an MLP model. The input data of the bidirectional mixer module is $\hat{Y}^{Transformer}$. The two modules are described as follows.

2.3.1. “Top-Down” Mixer

The “top-down” mixer module constructed using an MLP is designed to transform low-resolution scale features into high-resolution ones, facilitating the progressive transformation of macroscopic-level information into more detailed representations. This hierarchical refinement process is formalized by the following equation:

$$\text{for } i : (m - 1) \to 0 \text{ do}: \quad \hat{Y}_{t_{i}}^{T} = \hat{Y}_{t_{i}}^{T} + \mathrm{MLP}_{t_{i}}^{top\text{-}down}(\hat{Y}_{t_{i + 1}}^{T}) . \tag{8}$$

Here, the $\mathrm{MLP}_{t_{i}}^{top\text{-}down}(\cdot)$ contains one hidden layer with a ReLU activation function. The “top-down” mixer module can uncover the key features hidden at the high-resolution scale, providing richer and more accurate information for photovoltaic power forecasting. Thereby, we obtain the “top-down” fusion results $\hat{Y}^{top\text{-}down} = \{\hat{Y}_{t_{0}}^{td}, \hat{Y}_{t_{1}}^{td}, \dots, \hat{Y}_{t_{m}}^{td}\}$.
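The loop in Equation (8) can be sketched as below. The sketch reads the update as sequential, so each finer scale receives the already-updated coarser scale; this reading, the hand-set MLP weights (which simply repeat each coarse point twice), and the function names are assumptions for illustration only.

```python
def linear(x, w, b):
    """y = x W + b for a vector x, with W given as a list of rows."""
    return [
        sum(xi * wij for xi, wij in zip(x, col)) + bj
        for col, bj in zip(zip(*w), b)
    ]


def mlp(x, params):
    """One hidden layer with ReLU, as described for Eq. (8)."""
    w1, b1, w2, b2 = params
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)


def top_down_mix(preds, mlps):
    """Eq. (8): preds = [Y_t0, ..., Y_tm], ordered from fine to coarse.

    mlps[i] maps scale i + 1 onto the length of scale i.
    """
    mixed = [list(p) for p in preds]
    for i in range(len(preds) - 2, -1, -1):  # i = m - 1, ..., 0
        update = mlp(mixed[i + 1], mlps[i])
        mixed[i] = [a + b for a, b in zip(mixed[i], update)]
    return mixed


preds = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0]]      # t_0 (4 points), t_1 (2 points)
upsample = (
    [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],        # hidden layer: identity
    [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]], [0.0] * 4,  # repeat each point
)
print(top_down_mix(preds, [upsample]))  # [[3.0, 3.0, 3.0, 3.0], [2.0, 2.0]]
```

The coarsest scale is left unchanged, while every finer scale accumulates information propagated from the scales above it.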

2.3.2. “Down-Top” Mixer

Conversely, the “down-top” mixer module primarily projects high-resolution features onto low-resolution ones. It efficiently filters redundant information from high-resolution data, emphasizing key features and enabling the model to capture overarching patterns of power changes from a macroscopic perspective. The specific equation is expressed as follows:

$$\text{for } i : 1 \to m \text{ do}: \quad \hat{Y}_{t_{i}}^{T} = \hat{Y}_{t_{i}}^{T} + \mathrm{MLP}_{t_{i}}^{down\text{-}top}(\hat{Y}_{t_{i - 1}}^{T}) . \tag{9}$$

Here, the $\mathrm{MLP}_{t_{i}}^{down\text{-}top}(\cdot)$ contains one hidden layer with a ReLU activation function. According to this module, new fusion results $\hat{Y}^{down\text{-}top} = \{\hat{Y}_{t_{0}}^{dt}, \hat{Y}_{t_{1}}^{dt}, \dots, \hat{Y}_{t_{m}}^{dt}\}$ are generated.
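A matching sketch of the loop in Equation (9) is shown below. As before, the update is read as sequential, and the hand-set MLP weights (which average adjacent fine-scale points) and function names are illustrative assumptions rather than the trained model.

```python
def linear(x, w, b):
    """y = x W + b for a vector x, with W given as a list of rows."""
    return [
        sum(xi * wij for xi, wij in zip(x, col)) + bj
        for col, bj in zip(zip(*w), b)
    ]


def mlp(x, params):
    """One hidden layer with ReLU, as described for Eq. (9)."""
    w1, b1, w2, b2 = params
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)


def down_top_mix(preds, mlps):
    """Eq. (9): preds = [Y_t0, ..., Y_tm], ordered from fine to coarse.

    mlps[i - 1] maps scale i - 1 onto the length of scale i.
    """
    mixed = [list(p) for p in preds]
    for i in range(1, len(preds)):  # i = 1, ..., m
        update = mlp(mixed[i - 1], mlps[i - 1])
        mixed[i] = [a + b for a, b in zip(mixed[i], update)]
    return mixed


preds = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0]]      # t_0 (4 points), t_1 (2 points)
pool = (
    [[0.5, 0.0], [0.5, 0.0], [0.0, 0.5], [0.0, 0.5]], [0.0, 0.0],  # pair averages
    [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],                          # identity output
)
print(down_top_mix(preds, [pool]))  # [[1.0, 1.0, 1.0, 1.0], [3.0, 3.0]]
```

Here the finest scale is left unchanged and each coarser scale receives a compressed summary of the scale below it, mirroring the top-down pass in the opposite direction.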

Based on the prediction results $\hat{Y}^{top\text{-}down}$ obtained from the “top-down” mixer module and the prediction results $\hat{Y}^{down\text{-}top}$ obtained from the “down-top” mixer module, a linear layer is used to fuse the prediction results at each scale. Meanwhile, to capture prediction information, an MLP model is utilized to combine with the initial prediction results $\hat{Y}_{t_{i}}^{T}$, and the prediction results on each scale are mapped to the time resolution of $t_{0}$. That is,

$$\hat{Y}_{t_{i}}^{Mixing} = \mathrm{MLP}\left(\hat{Y}_{t_{i}}^{T} + \mathrm{Linear}(\hat{Y}_{t_{i}}^{td} + \hat{Y}_{t_{i}}^{dt})\right) . \tag{10}$$

Here, the $\mathrm{MLP}(\cdot)$ has one hidden layer, and the activation function is ReLU. Thus, we obtain the bidirectional mixing power prediction results at each scale $\hat{Y}^{Mixing} = \{\hat{Y}_{t_{0}}^{Mixing}, \hat{Y}_{t_{1}}^{Mixing}, \dots, \hat{Y}_{t_{m}}^{Mixing}\}$.

Finally, the PV power prediction for the forecast day is derived by averaging the mixing results over the $m + 1$ scales:

$$\hat{Y} = \frac{1}{m + 1} \sum_{i = 0}^{m} \hat{Y}_{t_{i}}^{Mixing} . \tag{11}$$
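The per-scale fusion of Equation (10) followed by the averaging of Equation (11) can be sketched as follows. The sketch divides by the number of scales being summed, which is an assumption about the intended normalization; the identity-like weights and the function names are likewise placeholders chosen so the arithmetic is easy to follow.

```python
def linear(x, w, b):
    """y = x W + b for a vector x, with W given as a list of rows."""
    return [
        sum(xi * wij for xi, wij in zip(x, col)) + bj
        for col, bj in zip(zip(*w), b)
    ]


def mlp(x, params):
    """One hidden layer with ReLU."""
    w1, b1, w2, b2 = params
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)


def fuse(y_t, y_td, y_dt, linears, mlps):
    """Eq. (10) per scale, then the average across scales of Eq. (11)."""
    mixed = []
    for i in range(len(y_t)):
        summed = [a + b for a, b in zip(y_td[i], y_dt[i])]
        w, b = linears[i]
        z = [a + c for a, c in zip(y_t[i], linear(summed, w, b))]
        mixed.append(mlp(z, mlps[i]))  # every output has the t_0 length
    n = len(mixed)
    return [sum(scale[t] for scale in mixed) / n for t in range(len(mixed[0]))]


eye2 = [[1.0, 0.0], [0.0, 1.0]]
y_t = [[1.0, 1.0], [2.0]]               # Transformer outputs at t_0 and t_1
y_td = [[0.5, 0.5], [1.0]]              # top-down results
y_dt = [[0.5, 0.5], [1.0]]              # down-top results
linears = [(eye2, [0.0, 0.0]), ([[1.0]], [0.0])]
mlps = [
    (eye2, [0.0, 0.0], eye2, [0.0, 0.0]),              # t_0 stays at length 2
    ([[1.0]], [0.0], [[1.0, 1.0]], [0.0, 0.0]),        # t_1 is mapped to length 2
]
print(fuse(y_t, y_td, y_dt, linears, mlps))  # [3.0, 3.0]
```

Because every scale is first mapped to the $t_{0}$ resolution, the final average is taken point by point over curves of equal length.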

3. Experimental Setup

3.1. Dataset

The real-world PV power generation records of a PV power station in Hebei Province, China, were used for experimental analyses. Apart from power generation data, the dataset also encompasses numerical weather prediction data with a time resolution of 15 min, covering the period from 4 February 2023 to 4 July 2024. The data were partitioned into training, validation, and test sets in a 7:2:1 ratio, as shown in Figure 4. Specifically, the training set starts on 4 February 2023; the validation set starts on 18 December 2023; and the test set starts on 4 May 2024. The training set is used to optimize the model parameters, the validation set guides the training process and prevents overfitting, while the test set evaluates the performance of the trained model on unseen data during applications.

In this study, the NWP data were obtained from the Xihe Energy Big Data Platform (www.xihe-energy.com (accessed on 1 August 2025)), encompassing key meteorological variables such as surface horizontal radiation, diffuse radiation, and direct radiation. Taking direct radiation as a representative example, Figure 5 illustrates the prediction accuracy at each time point, offering an intuitive visualization of performance fluctuations across different temporal segments. Additionally, a comprehensive statistical summary of prediction accuracy, covering historical irradiance at each time point and evaluated using RMSE, MAPE, and $R^{2}$, is presented in Table 2, providing quantitative details to complement the graphical insights.

3.2. Evaluation Metrics

To verify the prediction performance of the proposed model, the mean absolute percentage error (MAPE) and root mean square error (RMSE) are selected as the evaluation metrics, as they are popular and straightforward [40]. The details are as follows:

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{\hat{Y}_{i} - Y_{i}}{Y_{i}} \right| , \tag{12}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(\hat{Y}_{i} - Y_{i})}^{2}} . \tag{13}$$

Here, $Y_{i}$ and $\hat{Y}_{i}$ represent the actual value and the predicted value of the PV power generation on the target day i, respectively, and n is the number of evaluated samples.
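The two metrics in Equations (12) and (13) can be computed as in the sketch below. The text does not say how zero actual values (for example, at night) are treated in MAPE; skipping them, as done here, is an assumption, and the function names and toy data are illustrative.

```python
import math


def mape(actual, predicted):
    """Eq. (12), skipping points whose actual value is zero (assumed handling)."""
    terms = [abs((p - a) / a) for a, p in zip(actual, predicted) if a != 0]
    if not terms:
        raise ValueError("MAPE is undefined when all actual values are zero")
    return sum(terms) / len(terms)


def rmse(actual, predicted):
    """Eq. (13): root mean square error over all points."""
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


actual = [0.0, 2.0, 4.0]
predicted = [0.0, 1.0, 5.0]
print(mape(actual, predicted))  # 0.375
print(rmse(actual, predicted))
```

RMSE is expressed in the same unit as the power values, whereas MAPE is a relative measure and is therefore sensitive to small actual values.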

3.3. Hyperparameter Settings

To verify the superiority of the proposed prediction method, this study selects ten distinct models as baselines for performance comparison. These include MLP [11] and its state-of-the-art variations, namely DLinear [12], N-BEATS [13], and TimeMixer [14]. In addition, TimesNet [15], RNN [16], and LSTM [17] are selected as the CNN-based and RNN-based baselines. Furthermore, Transformer [19], Informer [20], and PatchTST [21], which are built on the attention mechanism, are also used for comparative analysis. The hyperparameter settings for each model are shown in Table 3.

For the proposed SN-Transformer-BiMixer model, the SN component consists of two 1D convolutional layers with 16 and 32 neurons, respectively, followed by a 128-neuron linear layer that maps NWP data at different scales to low-dimensional features. The Transformer module consists of two encoder layers with an embedding dimension of 512 and one decoder layer, with the feed-forward network also set to 512 neurons, thereby enabling effective correction of initial predictions through cross-modal feature interaction. The BiMixer architecture comprises two directional subnetworks. The “top-down” mixer module, structured as three fully connected layers of 24, 48, and 96 neurons, captures hierarchical dependencies from coarse-grained to fine-grained features across scales. Conversely, the “down-top” mixer module, which uses fully connected layers of 48, 24, and 12 neurons, transforms high-resolution scale features into low-resolution scale features. The outputs of both directional subnetworks are finally fed into a 96-neuron fully connected layer, which combines the two sub-modules.

All models are constructed based on the PyTorch-2.5.1+cu124 framework and are trained by the Adam optimizer with the early stopping strategy. The experiments are run on an NVIDIA RTX4060 16 GB GPU (Manufacturer: NVIDIA Corporation; Headquarters: Santa Clara, CA, USA).

4. Results and Analysis

4.1. Main Results

To offer a comprehensive overview of the proposed model, this section presents the experimental results for each stage of the SN-Transformer-BiMixer model. Specifically, Section 4.1.1 outlines the detailed construction process for the ‘typical day’ and provides a comparative analysis of the predictive performance achieved at various time-scales through the similar-day method based on SN. Section 4.1.2 compares and analyzes the predictive performance at different time-scales following the application of Transformer-based modifications. Section 4.1.3 then discusses how BiMixer can be used to generate the final 15 min resolution prediction results based on refined multi-scale outputs. These results are then compared with those from existing prediction models (e.g., MLP, DLinear, and TimeMixer).

4.1.1. Selecting Similar Days by SN

For the SN to effectively identify days exhibiting similar characteristics to the forecast day, the construction of an appropriate support set is essential. For simplicity and without loss of generality, the clustering method is used to generate support set samples.

Specifically, the historical PV power data are first organized into daily groups, each consisting of 96 time points. Next, all daily groups from 4 February 2023 to 4 February 2024 are categorized into four sub-datasets corresponding to their respective seasons (i.e., spring, summer, autumn, and winter). Then, for each sub-dataset, the normalized power data are clustered by the K-means algorithm, with the number of clusters determined using the silhouette coefficient method [41]. Compared with clustering on the original data, clustering on the normalized data removes the interference of dimensions and value ranges, unifies the variable scale, and emphasizes the shape of each daily curve, so that days with similar shapes are grouped into the same category [42]. For each cluster, the two samples closest to the centroid in Euclidean distance are selected as the “typical days” of that category, giving 40 “typical days” in total. The flowchart of this process is shown in Figure 6.
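The selection of two days per cluster can be sketched as below. This is an illustration under assumptions rather than the authors' implementation: cluster labels are taken as given (K-means itself is not reproduced), min-max scaling is assumed as the normalization, and all function names and toy curves are hypothetical. With two days per cluster, a total of 40 typical days would correspond to 20 clusters across the four seasons.

```python
import math


def min_max_normalize(curve):
    """Scale one daily curve to [0, 1] so that only its shape matters (assumed normalization)."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0 for _ in curve]
    return [(v - lo) / (hi - lo) for v in curve]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(curves):
    """Point-wise mean of a list of equally long curves."""
    return [sum(col) / len(curves) for col in zip(*curves)]


def typical_days(curves, labels, per_cluster=2):
    """Map each cluster label to the indices of the days closest to its centroid."""
    normalized = [min_max_normalize(c) for c in curves]
    selected = {}
    for label in sorted(set(labels)):
        members = [i for i, l in enumerate(labels) if l == label]
        center = centroid([normalized[i] for i in members])
        members.sort(key=lambda i: euclidean(normalized[i], center))
        selected[label] = members[:per_cluster]
    return selected


# Toy 4-point "days" with hypothetical cluster labels (real days have 96 points).
curves = [[0, 1, 2, 1], [0, 2, 4, 2], [0, 1, 3, 1],
          [5, 5, 0, 0], [4, 4, 0, 0], [3, 1, 0, 0]]
labels = [0, 0, 0, 1, 1, 1]
print(typical_days(curves, labels))
```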

As visualized in Figure 6, clustering by season (spring/summer/autumn/winter) extracts typical days with PV-generation-specific features: summer days reflect high-irradiance peaks (and weather-driven volatility such as thunderstorm-induced drops), whereas winter days show low but stable output under weak sunlight. These season-aware patterns embed physical insights into the SN module, enabling accurate multi-scale similarity matching.

Subsequently, a support set and supervised sample pairs are constructed based on the “typical days” and input into the SN to complete the training of the model. All the datasets are used as forecast days and matched with the “typical days” to form forecast-sample pairs, which are input into the trained SN to obtain the basic power prediction results for all the days in the dataset. The SN is used to identify similar days for the datasets of different time scales.
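The matching step can be illustrated with a small nearest-neighbour sketch. It assumes that the trained twin network has already mapped the NWP data of the forecast day and of each typical day to embedding vectors, and that similarity is measured by Euclidean distance between embeddings. Both points are assumptions, and the names `most_similar_day` and `support_set` as well as all values are hypothetical.

```python
import math


def embedding_distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar_day(query_embedding, support_set):
    """Return the (day_id, embedding, power_curve) entry closest to the query.

    support_set is a list of tuples, one per typical day.
    """
    if not support_set:
        raise ValueError("support set must not be empty")
    return min(support_set, key=lambda item: embedding_distance(query_embedding, item[1]))


# Hypothetical embeddings and 4-point power curves, for illustration only.
support_set = [
    ("typical_sunny", [0.9, 0.1], [0.0, 5.0, 9.0, 4.0]),
    ("typical_cloudy", [0.2, 0.8], [0.0, 2.0, 3.0, 1.0]),
]
day_id, _, baseline_curve = most_similar_day([0.25, 0.7], support_set)
print(day_id, baseline_curve)
```

The power curve of the matched typical day then serves as the baseline forecast, which is why no real-time power measurement of the forecast day is needed at this stage.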

As shown in Table 4, similar-day analysis on the low-resolution datasets yields better prediction performance than on the high-resolution datasets. For example, in terms of the RMSE metric, the prediction result for the 15 min resolution dataset is 3.152, while that for the 2 h resolution dataset is 2.856. This divergence arises because low-resolution NWP data inherently aggregate short-term fluctuations into coarser time units, thereby distilling daily-scale meteorological signatures that determine similarity between days. By contrast, high-resolution data preserve transient variations that act as noise in cross-day pattern matching, overwhelming the stable daily trends critical for similar-day identification. Consequently, when trained and applied to low-resolution data, the model demonstrates enhanced accuracy and stability.
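The aggregation effect described above can be sketched with simple block averaging. This section does not state how the coarser series are built, so mean aggregation is an assumption here, and the function name and the synthetic day are hypothetical.

```python
def downsample_mean(series, factor):
    """Average consecutive blocks of `factor` points (assumed aggregation rule)."""
    if factor <= 0 or len(series) % factor != 0:
        raise ValueError("series length must be a positive multiple of factor")
    return [sum(series[i:i + factor]) / factor for i in range(0, len(series), factor)]


# Synthetic 15 min day with 96 points, for illustration only.
day_15min = [float(i % 8) for i in range(96)]
scales = {
    "15min": day_15min,
    "30min": downsample_mean(day_15min, 2),
    "1h": downsample_mean(day_15min, 4),
    "2h": downsample_mean(day_15min, 8),
}
print({name: len(values) for name, values in scales.items()})
```

The resulting lengths of 96, 48, 24, and 12 points match the layer widths quoted for the BiMixer sub-modules.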

4.1.2. Correcting Forecasting Baselines by Transformer

Taking the prediction results derived from the SN as baselines for the target day, the Transformer corrects these values to generate the adjusted prediction results for each time scale. The corresponding performance metrics of the test sets across different time scales are presented in Table 5.

As shown in Table 5, the refinement conducted by Transformer improves the prediction performance of SN-derived forecast baselines across datasets of varying scales. Specifically, the RMSE is reduced by 11.6% for the 15 min dataset, 15.9% for the 30 min dataset, 27.9% for the 1 h dataset, and 22.0% for the 2 h dataset.
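As a quick check of how such percentages follow from the reported errors, the sketch below recomputes relative RMSE reductions from values quoted elsewhere in the text (3.152 for the SN baseline at 15 min resolution, 2.786 after Transformer correction, 4.015 for TimesNet, and 2.490 for the final model). The helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100.0


sn_rmse_15min = 3.152         # SN similar-day baseline, 15 min resolution
corrected_rmse_15min = 2.786  # after Transformer correction
timesnet_rmse = 4.015         # TimesNet baseline
final_rmse = 2.490            # SN-Transformer-BiMixer

print(round(relative_reduction(sn_rmse_15min, corrected_rmse_15min), 1))
print(round(relative_reduction(timesnet_rmse, final_rmse), 2))
```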

To intuitively illustrate the improvements in various metrics across different time scales after correction, the prediction results and corresponding metrics for each time scale are systematically visualized using radar charts.

As illustrated in Figure 7, the prediction results at each time scale exhibit substantial improvement after correction by the Transformer module. This outcome fully demonstrates the effectiveness and superiority of the Transformer module in refining prediction results.

Meanwhile, no single time scale is best on every metric. Taking the 1 h and 2 h datasets as examples, the 2 h predictions have a lower RMSE than the 1 h predictions but a higher MAPE. These differences in prediction performance across datasets of varying resolutions highlight the limitations of relying on a single data scale, underscoring the necessity of integrating multi-scale prediction results to improve forecasting accuracy.

4.1.3. Fusing Multi-Scale Forecasting Results by BiMixer

Building on the previously described SN and Transformer correction module, this study constructs a bidirectional mixer module to enable multi-scale prediction fusion across different data scales. The final prediction results are shown in Table 6, with the detailed analysis presented as follows:

As illustrated in Table 6, the proposed SN-Transformer-BiMixer model consistently demonstrates superior predictive accuracy relative to existing methodologies. Specifically, concerning RMSE, the proposed architecture achieves reductions of 27.00% compared to RNN, 40.01% compared to LSTM, and 37.98% compared to TimesNet implementations. Furthermore, when benchmarked against attention mechanism-based architectures, SN-Transformer-BiMixer exhibits significant performance enhancements, with RMSE reductions of 11.32% compared to the Informer model and 4.52% compared to the standard Transformer model. Most notably, the improvement is particularly pronounced when compared to PatchTST, with the proposed model achieving a 47.33% lower RMSE and a 74.86% lower MAPE, highlighting its superior ability to capture complex patterns in PV power prediction.

In addition, consistent with the practical application scenarios of day-ahead PV power prediction, forecasts are executed only once per day, with each prediction generating a sequence of 96 data points representing the complete diurnal cycle for the subsequent day. Consequently, the available supervised training dataset for the period spanning 4 February 2023 to 17 December 2023 comprises merely 317 samples, presenting a significant constraint on models that require a large number of samples for training. For example, several MLP-based models, e.g., MLP, N-BEATS, DLinear, and TimeMixer, exhibit poor prediction performance in this case. It is conceivable that one reason for the superiority of our proposed approach is its reduction of learning complexity through the identification of a general power generation pattern via SN, followed by the implementation of corrections by a Transformer.
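The sample count and the sequence length follow directly from the dates and the 15 min resolution, as the short calculation below shows. The variable names are illustrative.

```python
from datetime import date

train_start = date(2023, 2, 4)
train_end = date(2023, 12, 17)

# One forecast per day, counting both the first and the last day.
n_train_days = (train_end - train_start).days + 1

# Number of 15 min intervals in one day.
points_per_day = 24 * 60 // 15

print(n_train_days, points_per_day)
```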

Notably, all comparative models (including Transformer, Informer, N-BEATS, PatchTST, and so on) lack a typical day selection mechanism, while our model incorporates this strategy. This difference enables our approach to more effectively leverage meteorological consistency to mitigate the impact of limited samples, thereby highlighting the distinct advantage of typical day selection in enhancing prediction performance.

Furthermore, the comparative prediction errors of the evaluated models are visualized in Figure 8. Intuitively, SN-Transformer-BiMixer demonstrates remarkable superiority over existing methodologies with respect to both RMSE and MAPE performance metrics. Additionally, the specific calculation of the training time for each model can be found in Appendix A.

4.2. Ablation Study

4.2.1. Effectiveness Analysis of Similar-Day Method (SN)

To evaluate the effectiveness of similar days determined by the SN method in subsequent predictions, this study conducts a comparative study between the SN method and the maximal information coefficient (MIC) method [43], which is a commonly used feature selection method. The specific approach is to use the above methods to obtain the corresponding similar-day datasets and evaluate the prediction errors of the models constructed based on each dataset.

The SN-Transformer-BiMixer model retains its original architecture, utilizing similar-day data from the SN method as input. In contrast, the MIC-Transformer-BiMixer model modifies only the input source. It employs similar-day data derived from the MIC method while maintaining identical processing steps—power correction via the Transformer module and multi-scale prediction result fusion through the bidirectional mixer module. Both models share the same test environment and dataset division ratios (training/validation/test sets). Prediction results are presented in Table 7.

Table 7 illustrates that our model significantly outperforms the MIC-Transformer-BiMixer model. Specifically, compared with the model based on the MIC method, the prediction model constructed by screening similar days using the SN method has a 27.04% reduction in the RMSE value and a 49.34% reduction in the MAPE index. This indicates that the data of similar days determined by the SN method can more accurately reflect the variation law of PV power, and has significant advantages in improving the goodness of fit and prediction accuracy of the prediction model, providing more reliable data support for subsequent PV power prediction.

4.2.2. Effectiveness Analysis of Bidirectional Mixer Module (BiMixer)

To validate the performance of the proposed bidirectional mixer module, we select four baseline models for comparison, covering both simple and sophisticated unidirectional fusion strategies: the MLP unidirectional mixer module (denoted as SN-Transformer-MLP), which retains only the “top-down” sub-module of the bidirectional mixer; the average weighting module (denoted as SN-Transformer-Average), which applies average weighting to multi-scale prediction results at the same time step; a self-attention-based multi-scale fusion module inspired by Pathformer [29] (denoted as SN-Transformer-Attention); and a frequency-domain-based multi-scale fusion module inspired by MSGNet [28] (denoted as SN-Transformer-Frequency). All models adopt the SN method for similar-day selection and use the same Transformer architecture. The PV power prediction metrics obtained by these four multi-scale mixing approaches are presented in Table 8.
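The simplest of these baselines, average weighting, can be sketched as follows. The text does not specify how coarse-scale predictions are aligned with the 15 min grid, so repeating each coarse value over its time span is an assumption, and the function names are hypothetical.

```python
def upsample_repeat(series, factor):
    """Repeat every value `factor` times to reach a finer time grid (assumed alignment)."""
    return [value for value in series for _ in range(factor)]


def average_fusion(predictions, target_len=96):
    """Average multi-scale predictions at each time step of the target grid."""
    aligned = []
    for series in predictions:
        if not series or target_len % len(series) != 0:
            raise ValueError("each series length must divide target_len")
        aligned.append(upsample_repeat(series, target_len // len(series)))
    return [sum(column) / len(aligned) for column in zip(*aligned)]


# Toy example with a 2-point target grid, for illustration only.
print(average_fusion([[1.0, 3.0], [2.0]], target_len=2))
```

Unlike this fixed equal weighting, the learned mixer modules can weight each scale differently at each time step.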

Table 8 demonstrates that the proposed SN-Transformer-BiMixer model outperforms all four baseline models across both RMSE and MAPE metrics, confirming the effectiveness of the bidirectional fusion strategy. Specifically, compared to the MLP-based unidirectional mixer (SN-Transformer-MLP), it achieves a 6.24% reduction in RMSE and a 1.58% reduction in MAPE. When compared to the simple average weighting method (SN-Transformer-Average), the improvements are even more pronounced, with a 10.69% lower RMSE and 9.20% lower MAPE.

Notably, SN-Transformer-BiMixer also outperforms the two more sophisticated unidirectional fusion approaches. It reduces RMSE by 1.58% and MAPE by 0.37% compared to the attention-based method (SN-Transformer-Attention) and achieves a 0.80% lower RMSE and 5.38% lower MAPE than the frequency-domain fusion model (SN-Transformer-Frequency). These consistent improvements across all baselines collectively validate the superiority of the bidirectional mixer architecture in effectively integrating multi-scale prediction information, highlighting its ability to take advantage of both disentangled variations and complementary forecasting information from multi-scale series simultaneously.

4.3. Expanded Analysis of SN-Transformer-BiMixer in Data Missing Scenarios

To comprehensively evaluate the performance of the proposed day-ahead photovoltaic power prediction model based on SN-Transformer-BiMixer under missing data scenarios, a simulation experiment was conducted.

In real-world operations, missing data can arise from various factors, including sensor malfunctions, network transmission fluctuations, temporary shutdowns for maintenance, or operational halts due to insufficient demand. Given the complexity of anomaly detection [44] and the diverse origins of data gaps in practical settings, our analysis focuses exclusively on known missing periods, i.e., those for which the temporal boundaries of data loss are identifiable. To simulate the missing data scenarios, we implemented controlled masking of the test set data [45,46], with the simulated missing data presented in Figure 9.

In addition, since NaN values or unrecognized markers cannot be processed by the model, three common imputation strategies are adopted in this study. The first is zero-padding. As a widely used method [47], it preserves the original signal length while altering frequency amplitudes and introducing unintended high-frequency components. The second strategy is constant-value imputation, where all missing data points in this experiment are uniformly filled using the power data from the corresponding time periods on the first day of the test set. This approach assumes a stable baseline pattern at specific time slots, leveraging initial valid measurements as a consistent reference. The third strategy is time-dependent imputation, which adopts a more dynamic approach: missing data are filled with the real power values from the corresponding time periods of the previous day. This method accounts for potential short-term temporal correlations in power generation, aiming to reflect recent operational patterns more accurately. The following subsections evaluate the proposed approach under each of these three imputation methods.
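The three strategies differ only in the reference used to fill the gaps, which the following sketch makes explicit. Missing points are represented by `None`, and the function names and the 4-slot toy days are hypothetical. Real days contain 96 slots.

```python
MISSING = None


def fill_missing(day, reference):
    """Replace missing entries of `day` with the reference value at the same time slot."""
    if len(day) != len(reference):
        raise ValueError("day and reference must have the same length")
    return [ref if value is MISSING else value for value, ref in zip(day, reference)]


def zero_padding(day):
    """Fill gaps with zeros."""
    return fill_missing(day, [0.0] * len(day))


def constant_value(day, first_test_day):
    """Fill gaps with the same time slots of the first day of the test set."""
    return fill_missing(day, first_test_day)


def time_dependent(day, previous_day):
    """Fill gaps with the same time slots of the previous day."""
    return fill_missing(day, previous_day)


# Toy data, for illustration only.
first_test_day = [1.0, 2.0, 3.0, 4.0]
previous_day = [1.5, 2.5, 3.5, 4.5]
masked_day = [1.2, MISSING, MISSING, 4.2]

print(zero_padding(masked_day))
print(constant_value(masked_day, first_test_day))
print(time_dependent(masked_day, previous_day))
```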

4.3.1. Zero-Padding

The comparative performance metrics between our model and baseline counterparts based on the zero-padding strategy are presented in Table 9.

Table 9 demonstrates that our model exhibits significant advantages over other comparative models under the zero-padding scenario. Specifically, in terms of RMSE, the model achieves a 10.01% reduction compared to the attention-based Informer model and an 8.99% reduction relative to the Transformer model. For MAPE, the corresponding improvements are 5.47% and 18.81% reductions compared to Informer and Transformer, respectively.

To further demonstrate the superiority of our method in the data-missing scenario, this study selected a sample date (5 June 2024) with data loss from the experimental samples and conducted an in-depth analysis of its prediction performance. The results are presented in Figure 10. Notably, to ensure clarity and readability of the figure, only the Transformer and Informer models, which have relatively higher prediction accuracy, were included in the comparative analysis.

As shown in Figure 10, under the zero-padding imputation scenario for the data-missing prediction task, the method proposed in this study significantly outperforms the Transformer and Informer models. The prediction curve derived from our approach closely matches the true values in terms of both trend patterns and numerical scales, further highlighting the superiority of the proposed method.

4.3.2. Constant-Value Imputation

The comparative performance metrics between our model and baseline counterparts based on the constant-value imputation strategy are presented in Table 10.

As shown in Table 10, our proposed model maintains significant advantages over comparative models in the constant-value imputation scenario. Specifically, in terms of RMSE, our model outperforms the Transformer by 6.37% and the Informer by 17.09%. For MAPE, the improvements are even more notable—a 7.57% reduction compared to the Transformer and a 21.68% reduction relative to the Informer. These consistent gains highlight the model’s robustness in mitigating the biases introduced by constant value-filling of missing data.

4.3.3. Time-Dependent Imputation

The comparative performance metrics between our model and baseline counterparts in the time-dependent imputation scenario are presented in Table 11.

As shown in Table 11, our model remains superior to other baseline models. In terms of RMSE, our model achieves a 2.57% improvement over the Transformer and a 13.81% improvement over the Informer. For MAPE, while the reduction relative to the Transformer is marginal (5.07%), our model still outperforms the Informer by 10.72%. This stability across dynamic imputation strategies, where missing values are filled with time-adaptive estimates, underscores the model’s ability to adapt to varying data completeness.

In summary, across the three imputation strategies (zero-padding, constant-value imputation, and time-dependent imputation), our proposed SN-Transformer-BiMixer consistently outperforms all baseline models in prediction accuracy, confirming its strong applicability to data-missing scenarios.

This superiority stems primarily from the integration of the SN module, which leverages NWP data to identify days with meteorological conditions highly similar to the forecast day. By effectively mining and incorporating such similar-day data, particularly critical when power data is missing, the model mitigates the negative impacts of incomplete input. Whether facing static fills (zero or fixed values) or dynamic time-varying fills, the SN module’s ability to anchor predictions in meteorologically consistent historical patterns ensures robust performance, validating the model’s practical value in real-world scenarios where data completeness is often compromised.

4.4. NWP Sensitivity Analysis

To further investigate the impact of NWP prediction accuracy on the performance of our proposed model, we conducted a sensitivity analysis by introducing varying levels of noise into the NWP data. The noise was sampled from a Gaussian distribution with a mean of 0 and standard deviations corresponding to noise levels of ±10%, ±20%, ±30%, and ±40% relative to the original data magnitude. This approach enables a systematic evaluation of the model’s robustness under different levels of NWP prediction accuracy [48,49].
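A minimal sketch of such a perturbation is given below. It assumes that the standard deviation at each point equals the noise level times the magnitude of that point, which is one possible reading of "relative to the original data magnitude". The function name, the seed handling, and the sample values are hypothetical, and no clipping of negative values is applied.

```python
import random


def add_gaussian_noise(values, level, seed=0):
    """Add zero-mean Gaussian noise whose std is `level` times each value's magnitude."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    noisy = []
    for value in values:
        sigma = level * abs(value)  # assumption: per-point relative standard deviation
        noisy.append(value + rng.gauss(0.0, sigma))
    return noisy


# Hypothetical irradiance values, for illustration only.
nwp = [0.0, 100.0, 400.0, 250.0]
noisy_10 = add_gaussian_noise(nwp, 0.10, seed=42)
print(noisy_10)
```

Under this reading, night-time points with zero irradiance stay at zero, while daytime points are perturbed in proportion to their size.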

As shown in Table 12, the prediction performance of the proposed model shows a distinct degradation trend as NWP noise levels increase. Specifically, when noise levels rise from 0 to ±40%, RMSE increases significantly by 155.5%, while MAPE exhibits a more pronounced increase of 234.12%. This indicates that the model is more sensitive to significant input inaccuracies when evaluated by MAPE.

Notably, performance degradation accelerates once noise levels exceed ±20%; the RMSE growth rate increases from 52.22% (at ±20%) to 85.51% (at ±30%), and the MAPE increase becomes particularly marked (from 12.95% to 135.62%). This suggests the model maintains relatively stable performance under moderate noise levels (≤±20%) but becomes more vulnerable as input uncertainty increases.

In summary, the proposed model exhibits a degree of robustness to minor NWP inaccuracies, yet its reliability is markedly compromised by severe input noise. This underscores the importance of high-quality NWP data for achieving optimal model performance.

5. Conclusions

Accurate day-ahead PV power forecasting is of paramount importance for maintaining the safe and stable operation of the power grid. However, missing data are widespread in engineering applications and often cause common data-driven machine-learning models to produce large errors in photovoltaic power prediction. In this context, this study proposes a novel method called SN-Transformer-BiMixer. A Siamese neural network is introduced to identify days similar to the forecast day without requiring large training datasets. Then, a Transformer and a bidirectional mixer are constructed to refine the similar curves derived from the SN for better accuracy.

Comprehensive experimental evaluations validate the superiority of the proposed method over existing approaches for day-ahead PV forecasting. Specifically, the day-ahead PV forecasts at 15 min resolution, generated via the SN-based similar-day selection method, achieve an RMSE of 3.152, substantially outperforming the state-of-the-art time-series model, TimesNet, which yields an RMSE of 4.015. After applying the Transformer-based correction, the RMSE of forecasts derived from similar days declines to 2.786, marking an 11.6% improvement in accuracy. Further incorporation of multi-scale information (2 h, 1 h, and 30 min) coupled with a bidirectional fusion of the intermediate predictions reduces the RMSE to 2.490, yielding an additional 10.63% gain. Overall, these findings provide compelling evidence that augmenting the SN-based similar-day selection framework with Transformer-based correction and bidirectional fusion significantly enhances the accuracy of day-ahead photovoltaic forecasting. Moreover, when benchmarked against the Transformer and Informer, our SN-Transformer-BiMixer method also demonstrates superior performance, achieving RMSE reductions of 4.52% relative to the Transformer and 11.32% relative to the Informer.

Ablation experiments further identify the sources of this enhanced performance. Specifically, when selecting similar days, the SN method reduces the RMSE of prediction results by 27.04% compared with the MIC method, validating the effectiveness of the SN model in similar-day data selection. In the fusion of multi-scale prediction results, the BiMixer model reduces the RMSE by 6.24% compared with unidirectional fusion approaches and by 10.69% compared with the average weight method. It also outperforms the attention-based method (SN-Transformer-Attention) by reducing RMSE by 1.58% and the frequency-domain fusion model (SN-Transformer-Frequency) by lowering RMSE by 0.80%. These results reflect the superiority of the bidirectional fusion model over other fusion models. Importantly, the proposed method also exhibits robustness to data imperfections. Experiments on datasets containing missing data, conducted with zero-padding imputation for missing data, show stable prediction results, with the model achieving a notable 10.01% reduction in RMSE compared to the Informer model and 8.99% compared to the Transformer model, even under these challenging conditions.

Given the diverse and seasonally complex characteristics of PV power curves, the K-means clustering algorithm used in this study to select ‘typical days’ may ignore some site-specific, low-frequency but important patterns, such as bimodal curves on power-limited days. For future work, a manual screening process will be incorporated to more comprehensively identify certain types of power curves as ‘typical days’. Furthermore, further research will focus on the detection and simulation of missing data scenarios, with the aim of improving model robustness under realistic conditions. In addition, the impact of various loss factors on power generation, including second-order effects, spectral effects, and shading, will be considered to further improve the model’s accuracy in the future. The automatic detection and classification of faults will also be explored.

Author Contributions

Conceptualization, X.H. and Y.B.; methodology, X.D., Y.H. and Q.S.; software, X.L.; validation, X.H., Y.H. and X.L.; data curation, X.D. and Q.S.; writing—original draft preparation, X.D. and Q.S.; writing—review and editing, X.H., Y.H. and X.L.; supervision, Y.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (72242104), Intelligence & Integrity Energy Technology Co., Ltd. (JEPCC-KYXM-2024-040), and the Interdisciplinary Research Program of HUST (2024JCYJ020).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

Authors Xiaohong Huang, Yating Han, and Xiaokang Li were employed by the company, Intelligence & Integrity Energy Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The authors declare that this study received funding from Intelligence & Integrity Energy Technology Co., Ltd. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article, or the decision to submit it for publication.

Appendix A. Time Consumption Analysis

In this appendix, we present a time consumption analysis of various forecasting models to complement the prediction accuracy evaluations discussed in the main text. Understanding the computational efficiency, as reflected by time costs, is crucial for practical deployment, especially when balancing model performance and operational overhead. Figure A1 summarizes the time consumption of different models, including our proposed method, and we analyze the trade-offs between time efficiency and prediction performance.

As illustrated in Figure A1, model time consumption varies significantly, with our SN-Transformer-BiMixer incurring a time cost of 74.5, which is markedly lower than TimesNet (84.89). This discrepancy reflects architectural design choices—TimesNet employs highly complex structures to handle long-sequence dependencies, leading to elevated computational demands, whereas our model balances functional richness with practical efficiency. By focusing on targeted enhancements rather than overarching complexity, we maintain relative efficiency while addressing critical challenges like data incompleteness.

Figure A1 further demonstrates that MLP-based models (MLP: 13.57; DLinear: 15.62; N-BEATS: 21.68) possess superior time efficiency compared to attention-mechanism-driven architectures (Transformer: 57.08; Informer: 60.47). This efficiency advantage arises from MLP’s streamlined feature aggregation mechanisms, which circumvent the elevated computational overhead inherent to attention-based operations. This inherent efficiency of MLP architectures informed our architectural decisions—the SN and BiMixer modules are grounded in MLP architectures to capitalize on this efficiency, ensuring the added complexity of these components remains operationally viable without incurring excessive time overhead.

Figure A1. Time consumption of different models.

Our model’s time consumption (74.5) is approximately 30.5% higher than that of Transformer (57.08), a difference attributable to the integration of the SN and BiMixer modules. However, this moderate increase in time cost is offset by significant gains in prediction accuracy, as our method outperforms the Transformer across key metrics such as RMSE and MAPE, as validated in the main text. These results directly confirm the efficacy of the SN module and the BiMixer, demonstrating that the added complexity delivers tangible performance benefits.

Looking forward, opportunities exist to further optimize efficiency. One promising direction is exploring alternatives to the Transformer backbone, potentially replacing it with more lightweight architectures. Such modifications could reduce time consumption while preserving the advantages of the SN and BiMixer modules, refining the balance between efficiency and accuracy to better suit latency-sensitive applications.

References

1. Hang, B.; Dou, C.; Yuan, D.; Zhang, Z. Forecasting Strategy of Photovoltaic Generation Considering Multi-Factor Self-Fluctuation. Electr. Power Syst. Res. 2024, 234, 110495.
2. Zhao, H.; Zhu, D.; Yang, Y. Study on photovoltaic power forecasting model based on peak sunshine hours and sunshine duration. Energy Sci. Eng. 2023, 11, 4570–4580.
3. Han, H.; Jiang, X.; Zhang, S.; Wu, C.; Cao, S.; Zang, H.; Sun, G.; Wei, Z. A Risk-Based Scheduling Optimization Strategy with Explainability Enhanced Multi-Scenario Photovoltaic Forecasting. Electr. Power Syst. Res. 2025, 246, 111729.
4. Mayer, M.J.; Gróf, G. Extensive comparison of physical models for photovoltaic power forecasting. Appl. Energy 2021, 283, 116239.
5. Salamah, T.; Ramahi, A.; Alamara, K.; Juaidi, A.; Abdallah, R.; Abdelkareem, M.A.; Amer, E.C.; Olabi, A.G. Effect of dust and methods of cleaning on the performance of solar PV module for different climate regions: Comprehensive review. Sci. Total Environ. 2022, 827, 154050.
6. Wang, J.; Hu, W.; Xuan, L.; He, F.; Zhong, C.; Guo, G. TransPVP: A Transformer-Based Method for Ultra-Short-Term Photovoltaic Power Forecasting. Energies 2024, 17, 4426.
7. Mayer, M. Effects of the Meteorological Data Resolution and Aggregation on the Optimal Design of Photovoltaic Power Plants. Energy Convers. Manag. 2021, 241, 114313.
8. Chen, Y.; Xu, J. Solar and Wind Power Data from the Chinese State Grid Renewable Energy Generation Forecasting Competition. Sci. Data 2022, 9, 577.
9. Rus-Casas, C.; Gilabert-Torres, C.; Fernández-Carrasco, J. Optimizing Energy Management and Sizing of Photovoltaic Batteries for a Household in Granada, Spain: A Novel Approach Considering Time Resolution. Batteries 2024, 10, 358.
10. Wang, X.; Shen, Y.; Song, H.; Liu, S. Data Augmentation-Based Photovoltaic Power Prediction. Energies 2025, 18, 747.
11. Parvez, I.; Sarwat, A.; Debnath, A.; Olowu, T.; Dastgir, M.; Riggs, H. Multi-Layer Perceptron Based Photovoltaic Forecasting for Rooftop PV Applications in Smart Grid. In Proceedings of the 2020 SoutheastCon, Raleigh, NC, USA, 28–29 March 2020; pp. 1–6.
12. Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, Montréal, QC, Canada, 8–10 August 2023; pp. 11121–11128.
13. Oreshkin, B.; Carpov, D.; Chapados, N.; Bengio, Y. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. In Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020.
14. Wang, S.; Wu, H.; Shi, X.; Hu, T.; Luo, H.; Ma, L.; Zhang, J.; Zhou, J. TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024.
15. Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In Proceedings of the 11th International Conference on Learning Representations, Virtual, 25 April 2022.
16. Mellit, A.; Shaari, S. Recurrent neural network-based forecasting of the daily electricity generation of a Photovoltaic power system. In Proceedings of the International Conference on Ecological Vehicles and Renewable Energies, Grimaldi Forum, Monaco, 26–29 March 2009.
17. Jung, Y.; Jung, J.; Kim, B.; Han, S. Long Short-Term Memory Recurrent Neural Network for Modeling Temporal Patterns in Long-Term Power Forecasting for Solar PV Facilities: Case Study of South Korea. J. Clean. Prod. 2020, 250, 119476.
18. Cao, K.; Zhang, T.; Huang, J. Advanced Hybrid LSTM-Transformer Architecture for Real-Time Multi-Task Prediction in Engineering Systems. Sci. Rep. 2024, 14, 4890.
19. Tian, F.; Fan, X.; Wang, R.; Qin, H.; Fan, Y. A Power Forecasting Method for Ultra-Short-Term Photovoltaic Power Generation Using Transformer Model. Math. Probl. Eng. 2022, 2022, 9421400.
20. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115.
21. Nie, Y.; Nguyen, N.; Sinthong, P.; Kalagnanam, J. A time series is worth 64 words: Long-term forecasting with transformers. In Proceedings of the International Conference on Learning Representations, Virtual, 19–21 May 2023.
22. Ye, G.; Yang, J.; Xia, F.; Shao, F.; Xu, J.; Yang, Z.; Peng, W.; Zheng, Z. Short-Term Photovoltaic Output Power Prediction Based on Similar Day and Optimized BP Neural Network. Int. J. Low-Carbon Technol. 2024, 19, 766–772.
23. Acharya, S.; Wi, Y.M.; Lee, J. Day-Ahead Forecasting for Small-Scale Photovoltaic Power Based on Similar Day Detection with Selective Weather Variables. Electronics 2020, 9, 1117.
24. Gulin, M.; Pavlović, T.; Vašak, M. A One-Day-Ahead Photovoltaic Array Power Production Prediction with Combined Static and Dynamic on-Line Correction. Solar Energy 2017, 142, 49–60.
25. Zhang, R.; Li, G.; Bu, S.; Kuang, G.; He, W.; Zhu, Y.; Aziz, S. A Hybrid Deep Learning Model with Error Correction for Photovoltaic Power Forecasting. Front. Energy Res. 2022, 10, 948308.
26. Jiang, T.; Liu, Y. A Short-Term Wind Power Prediction Approach Based on Ensemble Empirical Mode Decomposition and Improved Long Short-Term Memory. Comput. Electr. Eng. 2023, 110, 108830.
27. Li, N.; Li, L.; Zhang, F.; Jiao, T.; Wang, S.; Liu, X.; Wu, X. Research on Short-Term Photovoltaic Power Prediction Based on Multi-Scale Similar Days and ESN-KELM Dual Core Prediction Model. Energy 2023, 277, 127557.
28. Zhu, Z.; Zhou, N.; Wang, Z.; Liang, J. MSGNet: A Multi-Feature Lightweight Learning Network for Automatic Modulation Recognition. IEEE Commun. Lett. 2024, 24, 2553–2557.
29. Chen, P.; Zhang, Y.; Cheng, Y.; Shu, Y.; Wang, Y.; Wen, Q.; Yang, B.; Guo, C. Pathformer: Multi-Scale Transformers with Adaptive Pathways for Time Series Forecasting. In Proceedings of the 12th International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024.
30. Brown, M.; Kros, J. Data Mining and the Impact of Missing Data. Ind. Manag. Data Syst. 2003, 103, 611–621.
31. Kim, T.; Ko, W.; Kim, J. Analysis and impact evaluation of missing data imputation in day-ahead PV generation forecasting. Appl. Sci. 2019, 9, 204.
32. Qi, S.; Xinze, Z.; Siyue, Y.; Liang, S.; Yukun, B. Multi-scale fused Graph Convolutional Network for multi-site photovoltaic power forecasting. Energy Convers. Manag. 2025, 333, 119773.
33. Tolosana, R.; Vera-Rodriguez, R.; Fierrez, J.; Ortega-Garcia, J. Exploring Recurrent Neural Networks for On-Line Handwritten Signature Biometrics. IEEE Access 2018, 6, 5128–5138.
34. Liu, X.; Zhou, Y.; Zhao, J.; Yao, R.; Liu, B.; Zheng, Y. Siamese Convolutional Neural Networks for Remote Sensing Scene Classification. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1200–1204.
35. Agbo, E.; Ettah, E.; Edet, C.; Ndoma, E. Characteristics of various radiative fluxes: Global, tilted, direct, and diffused radiation—A case study of Nigeria. Meteorol. Atmos. Phys. 2023, 135, 14.
36. Gueymard, C.A.; Lara-Fanego, V.; Sengupta, M.; Xie, Y. Surface albedo and reflectance: Review of definitions, angular and spectral effects, and intercomparison of major data sources in support of advanced solar irradiance modeling over the Americas. Solar Energy 2019, 182, 194–212.
37. Lohmann, G.M. Irradiance variability quantification and small-scale averaging in space and time: A short review. Atmosphere 2018, 9, 264.
38. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 20750–20762.
39. Ma, J.; Zhao, M.; Shen, W.; Yang, Z.; Yu, X.; Lu, S. Photovoltaic Power Prediction Based on NRS-PCC Feature Selection and Multi-Scale CNN-LSTM Network. Int. J. Web Serv. Res. 2024, 21, 1–15.
40. Koutsandreas, D.; Spiliotis, E.; Petropoulos, F.; Assimakopoulos, V. On the Selection of Forecasting Accuracy Measures. J. Oper. Res. Soc. 2022, 73, 937–954.
41. Řezanková, H. Different approaches to the silhouette coefficient calculation in cluster evaluation. In Proceedings of the 21st International Scientific Conference AMSE Applications of Mathematics and Statistics in Economics, Kutná Hora, Czech Republic, 29 August–2 September 2018.
42. Mohamad, I.; Usman, D. Standardization and Its Effects on K-Means Clustering Algorithm. Res. J. Appl. Sci. Eng. Technol. 2013, 6, 168–172.
43. Terwee, C.; Peipert, J.; Chapman, R.; Lai, J.S.; Terluin, B.; Cella, D.; Griffiths, P.; Mokkink, L. Minimal Important Change (MIC): A Conceptual Clarification and Systematic Review of MIC Estimates of PROMIS Measures. Qual. Life Res. 2021, 30, 3299–3303.
44. Benedetti, M.; Leonardi, F.; Messina, F.; Santoro, C.; Vasilakos, A. Anomaly Detection and Predictive Maintenance for Photovoltaic Systems. Neurocomputing 2018, 310, 59–68.
45. Li, D.; Wang, Y.; Wang, J.; Wang, C.; Duan, Y. Recent Advances in Sensor Fault Diagnosis: A Review. Sens. Actuators Phys. 2020, 309, 111990.
46. Liu, W.; Ren, C.; Xu, Y. PV Generation Forecasting with Missing Input Data: A Super-Resolution Perception Approach. IEEE Trans. Sustain. Energy 2021, 12, 1493–1496.
47. Liu, Z.; Xuan, L.; Gong, D.; Xie, X.; Liang, Z.; Zhou, D. A WGAN-GP Approach for Data Imputation in Photovoltaic Power Prediction. Energies 2025, 18, 1042.
48. Sangiorgio, M.; Dercole, F.; Guariso, G. Forecasting of noisy chaotic systems with deep neural networks. Chaos Solitons Fractals 2021, 153, 111570.
49. Liu, W.; Ren, C.; Xu, Y. Correntropy-Based Echo State Network with Application to Time Series Prediction. IEEE/CAA J. Autom. Sin. 2025, 12, 425–435.

Figure 1. The SN-Transformer-BiMixer framework.

Figure 2. The structure of SN. NWP: numerical weather prediction data; $G_{\omega}(\cdot)$: the mapping function; $d_{ij}$: Euclidean distance.

Figure 3. The structure of BiMixer. WA: weighted average.

Figure 4. Visualization of power data.

Figure 5. Visualization of NWP accuracy.

Figure 6. Visualization of typical day selection.

Figure 7. Improvements in RMSE and MAPE of the prediction results at various scales: (a) RMSE; (b) MAPE.

Figure 8. Visualization of prediction performance of different models.

Figure 9. Visualization of the test set under missing-data conditions with zero-padding.

Figure 10. Prediction performance of each model on a day with missing data under zero-padding.

Table 1. Taxonomy of recent research works.

Reference	Backbone	Similar-Day	Correction	Multi-Scales
[11]	MLP-based	×	×	×
[12]	MLP-based	×	×	×
[13]	MLP-based	×	×	×
[14]	MLP-based	×	×	×
[15]	CNN-based	×	×	×
[16]	LSTM-based	×	×	×
[17]	LSTM-based	×	×	×
[19]	Transformer-based	×	×	×
[20]	Transformer-based	×	×	×
[21]	Transformer-based	×	×	×
[22]	MLP-based	✓	×	×
[23]	LSTM-based	✓	×	×
[24]	MLP-based	✓	✓	×
[25]	CNN-based	×	✓	×
[26]	LSTM-based	×	×	✓
[27]	RNN-based	×	×	✓
[28]	MLP-based	×	×	✓
[29]	Transformer-based	×	×	✓
Our method	MLP-Transformer-based	✓	✓	✓

✓: considered by the work; ×: not considered by the work.

Table 2. Statistical information on the accuracy level of NWP.

Data	RMSE	MAPE	$R^{2}$
NWP	72.33	48.84	0.83
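The three statistics in Table 2 can be illustrated with a minimal sketch. It assumes the common textbook definitions of RMSE, MAPE, and $R^{2}$; the function names, the `eps` guard that skips near-zero targets in MAPE, and the toy values are illustrative assumptions and are not taken from the paper's implementation. MAPE is returned here as a fraction, so the scaling may differ from the values in the table.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def mape(y_true, y_pred, eps=1e-8):
    """Mean absolute percentage error, returned as a fraction.

    Assumption: targets with |t| <= eps (e.g., night-time zero output)
    are skipped to avoid division by zero.
    """
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred) if abs(t) > eps]
    return sum(ratios) / len(ratios)

def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy values for illustration only (not data from the paper).
y_true = [0.0, 2.0, 4.0, 2.0]
y_pred = [0.0, 1.0, 5.0, 2.0]
print(rmse(y_true, y_pred), mape(y_true, y_pred), r2(y_true, y_pred))
```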

Table 3. Hyperparameters of the models.

Model	Description of the Hyperparameters
MLP [11]	Two fully connected layers with 128 and 64 neurons, respectively.
DLinear [12]	MLP-based architecture with decomposition: one moving average kernel with window size 25 to decompose the raw data into trend and residual (seasonal) components, followed by two linear layers applied to each component, and summed to generate the forecasts.
TimeMixer [14]	MLP-based architecture with decomposition and multi-scale-mixing: 5 scales by downsampling the raw data with window size 2, followed by 3 stacked past-decomposable-mixing (PDM) blocks to mix past information across different scales and 1 future multi-predictor mixing (FMM) block to ensemble extracted multi-scale information and generate future predictions.
N-BEATS [13]	N-BEATS architecture: 3 stacked blocks with 128 neurons per layer, 512-dimensional hidden layer, and a forecast horizon-specific linear layer. Each block includes two dense sub-layers with ReLU activation.
RNN [16]	One RNN layer with 32 neurons, followed by one fully connected layer with 32 neurons.
LSTM [17]	Two LSTM layers with 32 neurons in each layer, followed by one dense layer.
TimesNet [15]	Two encoder layers with an embedding dimension of 32, each implemented as a TimesBlock [15] (the core block designed to capture temporal patterns in time series data). Each internal convolutional layer of the TimesBlock uses 5 convolutional kernels, and the encoder is followed by a feed-forward network.
Transformer [19]	Two encoder layers with embedding dimension 512 and one decoder layer; the feed-forward network has 512 neurons.
Informer [20]	Informer-based architecture: two encoder layers with embedding dimension 512, one decoder layer, 8-head attention, and a feed-forward network with 2048 neurons. The sampling factor is set to 5.
PatchTST [21]	PatchTST-based architecture: 6 Transformer encoder layers, embedding dimension 128, 4-head attention, feed-forward network dimension 512, and patch size set to 16.
SN-Transformer-BiMixer (the proposed model)	SN: At each time scale, the SN employs two 1D convolutional layers with 16 and 32 neurons, respectively, followed by a linear layer with 128 neurons.
	Transformer: Two encoder layers with embedding dimension 512 and one decoder layer; the feed-forward network has 512 neurons.
	BiMixer: The “top-down” mixer module consists of three MLP blocks, each containing a single fully connected layer, with 24, 48, and 96 neurons, respectively. The “down-top” mixer module consists of three MLP blocks, each containing a single fully connected layer, with 48, 24, and 12 neurons, respectively. Both mixer modules are followed by an additional MLP with a hidden layer of 96 neurons.
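As an illustration of one setting in Table 3, the moving-average decomposition described for DLinear (window size 25) can be sketched as below. This is a minimal sketch, not the baseline's actual code: the function name is hypothetical, and replicating the edge values so that the trend keeps the input length is an assumption about the padding.

```python
def moving_average_decompose(series, window=25):
    """Split a series into a trend (moving average) and a residual component."""
    if window % 2 == 0:
        raise ValueError("this sketch assumes an odd window size")
    half = (window - 1) // 2
    # Assumption: replicate the first and last values so the trend
    # has the same length as the input series.
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    trend = [sum(padded[i:i + window]) / window for i in range(len(series))]
    residual = [x - t for x, t in zip(series, trend)]
    return trend, residual

# Toy daily curve with 96 points (illustrative values only).
curve = [float(i % 24) for i in range(96)]
trend, residual = moving_average_decompose(curve)
print(len(trend), len(residual))
```

By construction, adding the two components back together recovers the input series, which is what allows a separate linear layer to be applied to each component before summing the forecasts.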

Table 4. Prediction accuracy of SN at different time scales.

Metric	2 h (scale₃)	1 h (scale₂)	30 min (scale₁)	15 min (scale₀)
RMSE	2.856	3.005	3.147	3.152
MAPE	0.741	0.867	0.851	0.833

scale₀: temporal resolution of the prediction task in the experiment; scale₃, scale₂, and scale₁: temporal resolutions introduced to improve target prediction accuracy.
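The four resolutions in Table 4 correspond to 96, 48, 24, and 12 points per day. The sketch below shows one way such a multi-scale view of a daily curve could be built. It assumes that each coarser scale is obtained by averaging non-overlapping pairs of samples; the aggregation rule and the function names are assumptions, since this section does not specify them.

```python
def downsample_by_mean(series, factor=2):
    """Average non-overlapping groups of `factor` consecutive samples."""
    if len(series) % factor != 0:
        raise ValueError("series length must be divisible by factor")
    return [sum(series[i:i + factor]) / factor
            for i in range(0, len(series), factor)]

def build_scales(day_curve, n_coarse=3):
    """Return [scale0, scale1, scale2, scale3], from finest to coarsest."""
    scales = [list(day_curve)]
    for _ in range(n_coarse):
        scales.append(downsample_by_mean(scales[-1]))
    return scales

# Toy 15 min curve for one day (96 points; illustrative values only).
day = [float(i) for i in range(96)]
print([len(s) for s in build_scales(day)])
```

The resulting lengths appear consistent with the BiMixer layer widths listed in Table 3.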

Table 5. Prediction accuracy after Transformer correction at different time scales (SN-Transformer).

Metric	2 h (scale₃)	1 h (scale₂)	30 min (scale₁)	15 min (scale₀)
RMSE	2.228	2.407	2.646	2.786
MAPE	0.730	0.724	0.768	0.830
RMSE Reduction	22.0%	19.9%	15.9%	11.6%
MAPE Reduction	1.35%	16.50%	9.75%	0.36%

scale₀: temporal resolution of the prediction task in the experiment; scale₃, scale₂, and scale₁: temporal resolutions introduced to improve target prediction accuracy; RMSE/MAPE Reduction: the percentage reduction of SN-Transformer relative to SN.
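The reduction rows can be checked with a short calculation. The sketch assumes the usual relative-reduction formula, (baseline − improved) / baseline; the function name is illustrative. The same formula appears to underlie the improvement percentages in the later comparison tables, for example the 11.32% RMSE improvement over Informer.

```python
def percent_reduction(baseline, improved):
    """Relative reduction of an error metric, in percent."""
    return (baseline - improved) / baseline * 100.0

# 2 h scale RMSE: SN (Table 4) vs. SN-Transformer (Table 5).
print(round(percent_reduction(2.856, 2.228), 1))  # 22.0

# 15 min scale RMSE: Informer vs. SN-Transformer-BiMixer (Table 6).
print(round(percent_reduction(2.808, 2.490), 2))  # 11.32
```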

Table 6. Comparison of prediction accuracy for day-ahead PV generation (15 min scale₀).

Model	RMSE	MAPE	RMSE Improvement	MAPE Improvement
MLP [11]	5.581	3.451	55.38%	76.55%
N-BEATS [13]	3.354	4.206	25.76%	80.76%
DLinear [12]	5.597	2.168	55.51%	62.68%
TimeMixer [14]	6.410	2.445	61.15%	66.91%
RNN [16]	3.411	1.088	27.00%	25.64%
LSTM [17]	4.151	1.231	40.01%	34.28%
TimesNet [15]	4.015	1.226	37.98%	34.01%
Transformer [19]	2.608	0.815	4.52%	0.73%
Informer [20]	2.808	0.851	11.32%	4.94%
PatchTST [21]	4.728	3.219	47.33%	74.86%
SN-Transformer-BiMixer	2.490	0.809	-	-

Table 7. Comparison of prediction accuracy for similar days selected by SN and MIC (15 min scale₀).

Model Variants	RMSE	MAPE	RMSE Improvement	MAPE Improvement
MIC-Transformer-BiMixer	3.413	1.597	27.04%	49.34%
SN-Transformer-BiMixer	2.490	0.809	-	-

Table 8. Comparison of prediction accuracy of different mixer methods (15 min scale₀).

Model Variants	RMSE	MAPE	RMSE Improvement	MAPE Improvement
SN-Transformer-MLP	2.566	0.822	6.24%	1.58%
SN-Transformer-Average	2.694	0.891	10.69%	9.20%
SN-Transformer-Attention	2.530	0.812	1.58%	0.37%
SN-Transformer-Frequency	2.510	0.855	0.80%	5.38%
SN-Transformer-BiMixer	2.490	0.809	-	-

Table 9. Comparison of prediction accuracy of different prediction models under zero-padding (15 min scale₀).

Model	RMSE	MAPE	RMSE Improvement	MAPE Improvement
MLP [11]	6.904	4.421	62.35%	81.54%
DLinear [12]	5.679	2.264	54.08%	63.95%
N-BEATS [13]	3.358	4.299	22.60%	81.02%
TimeMixer [14]	6.580	2.445	60.50%	66.62%
RNN [16]	6.421	2.444	59.52%	66.61%
LSTM [17]	6.386	2.580	59.30%	68.37%
TimesNet [15]	4.402	1.400	40.96%	41.71%
Transformer [19]	2.865	1.005	8.99%	18.81%
Informer [20]	2.888	0.895	10.01%	5.47%
PatchTST [21]	4.871	3.483	46.64%	76.57%
SN-Transformer-BiMixer	2.599	0.816	-	-

Table 10. Comparison of prediction accuracy of different prediction models under constant-value imputation (15 min scale₀).

Model	RMSE	MAPE	RMSE Improvement	MAPE Improvement
MLP [11]	6.849	7.567	64.36%	90.64%
DLinear [12]	5.603	2.158	53.22%	67.19%
N-BEATS [13]	3.371	4.289	26.10%	83.50%
TimeMixer [14]	6.415	2.439	62.09%	71.05%
RNN [16]	6.406	2.439	62.05%	71.05%
LSTM [17]	6.286	5.443	61.17%	87.00%
TimesNet [15]	4.003	1.227	39.02%	42.30%
Transformer [19]	2.607	0.766	6.37%	7.57%
Informer [20]	2.944	0.904	17.09%	21.68%
PatchTST [21]	4.721	3.169	48.29%	77.66%
SN-Transformer-BiMixer	2.441	0.708	-	-

Table 11. Comparison of prediction accuracy of different prediction models under time-dependent imputation (15 min scale₀).

Model	RMSE	MAPE	RMSE Improvement	MAPE Improvement
MLP [11]	6.849	7.567	62.90%	89.32%
DLinear [12]	5.530	2.253	54.05%	64.13%
N-BEATS [13]	3.361	4.274	24.40%	81.10%
TimeMixer [14]	6.381	2.540	60.18%	68.25%
RNN [16]	6.426	2.447	60.46%	67.00%
LSTM [17]	6.286	5.443	59.58%	85.16%
TimesNet [15]	3.948	1.267	35.64%	36.23%
Transformer [19]	2.608	0.767	2.57%	−5.07%
Informer [20]	2.948	0.905	13.81%	10.72%
PatchTST [21]	4.749	3.287	46.49%	75.42%
SN-Transformer-BiMixer	2.541	0.808	-	-
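Tables 9–11 compare three ways of handling missing historical power data. The sketch below illustrates what such strategies might look like. All three definitions are assumptions made for illustration: gaps are marked with `None`, constant-value imputation uses the mean of the observed samples, and time-dependent imputation copies the value from the same time slot one period earlier. The function names are hypothetical, and the exact rules used in the experiments are not given in this section.

```python
def zero_padding(series):
    """Replace every missing sample with 0."""
    return [0.0 if x is None else x for x in series]

def constant_value_imputation(series):
    """Replace missing samples with one constant (assumed: the observed mean)."""
    observed = [x for x in series if x is not None]
    fill = sum(observed) / len(observed)
    return [fill if x is None else x for x in series]

def time_dependent_imputation(series, period):
    """Replace a missing sample with the value one period earlier.

    Assumption: `period` is the number of samples per day, so the fill value
    comes from the same time of day on the previous day. Gaps in the first
    period fall back to 0.
    """
    out = list(series)
    for i, x in enumerate(out):
        if x is None:
            # Earlier entries are already filled, so out[i - period] is a number.
            out[i] = out[i - period] if i >= period else 0.0
    return out

# Toy series with a "day" of 4 samples (illustrative values only).
series = [1.0, 2.0, 3.0, 4.0, None, 2.0, None, 4.0]
print(zero_padding(series))
print(constant_value_imputation(series))
print(time_dependent_imputation(series, period=4))
```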

Table 12. Prediction accuracy under different noise levels (15 min scale₀).

Noise Level	RMSE	MAPE	Relative Change in RMSE	Relative Change in MAPE
0	2.490	0.809	-	-
±10%	3.603	0.824	44.71%	1.88%
±20%	3.790	0.912	52.22%	12.95%
±30%	4.619	1.906	85.51%	135.62%
±40%	6.362	2.703	155.52%	234.12%
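A noise level of ±10% in Table 12 can be read as a bounded relative perturbation, presumably applied to the NWP inputs, as the title of the NWP sensitivity analysis suggests. The sketch below shows one possible interpretation. It assumes multiplicative noise drawn uniformly from [−level, +level]; the noise distribution, the function name, and the seed handling are assumptions and not the paper's stated procedure.

```python
import random

def perturb_nwp(values, level, seed=0):
    """Scale each value by (1 + u), with u drawn uniformly from [-level, level]."""
    rng = random.Random(seed)  # fixed seed so the perturbation is reproducible
    return [v * (1.0 + rng.uniform(-level, level)) for v in values]

# Toy NWP irradiance values (illustrative only).
nwp = [100.0, 250.0, 400.0, 250.0, 100.0]
for level in (0.1, 0.2, 0.3, 0.4):
    noisy = perturb_nwp(nwp, level)
    print(level, [round(v, 1) for v in noisy])
```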

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
