A Hybrid Interval Prediction Framework for Photovoltaic Power Prediction Using BiLSTM–Transformer and Adaptive Kernel Density Estimation

Li, Laiyuan; Li, Zhibin

doi:10.3390/app16063023

by Laiyuan Li and Zhibin Li *

Faculty of Artificial Intelligence, Shanghai University of Electric Power, Shanghai 200090, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(6), 3023; https://doi.org/10.3390/app16063023

Submission received: 16 January 2026 / Revised: 14 March 2026 / Accepted: 17 March 2026 / Published: 20 March 2026

Abstract

Photovoltaic (PV) power forecasting is strongly influenced by volatility, randomness, and changing meteorological conditions, while conventional point forecasting provides limited uncertainty information for engineering use. This study proposes a hybrid interval forecasting framework for PV prediction. Similar-day clustering first segments weather data into distinct scenarios (sunny, cloudy and overcast) to reduce noise and redundant information within sequences, enhancing stability and thereby providing a more refined feature space for deep learning. A BiLSTM–Transformer model is then used as the core forecaster, taking multiple meteorological variables as multi-feature time-series inputs. BiLSTM captures bidirectional temporal dependencies, and the Transformer enhances long-range feature extraction via attention. To improve robustness and stability, the Alpha Evolution (AE) algorithm is applied for hyperparameter optimization, balancing global exploration and local refinement. For probabilistic forecasting, Adaptive Bandwidth Kernel Density Estimation (ABKDE) is employed to construct prediction intervals, where the local bandwidth is determined by minimizing a local error function to adapt to data density and error distribution. Case studies utilizing a full-year, 5 min high-resolution dataset from the DKASC station demonstrate that the proposed AE-BiLSTM–Transformer achieves highly accurate point forecasts across diverse weather conditions, reducing the RMSE by 81.85%, 76.99%, and 72.26% under sunny, cloudy, and overcast scenarios, respectively, compared to the baseline LSTM. ABKDE further produces reliable and compact intervals; at the 90% confidence level on sunny days, it achieves PICP = 0.921 with PINAW = 0.0378, reducing PINAW by 75.16% relative to conventional KDE while maintaining comparable coverage.

Keywords:

photovoltaic interval prediction; alpha evolution; bidirectional long short-term memory; transformer; adaptive bandwidth kernel density estimation

1. Introduction

With the large-scale integration of photovoltaic (PV) power into modern power grids, the safe and stable operation of power systems faces unprecedented challenges. PV output depends heavily on meteorological conditions and exhibits strong volatility, randomness, and non-stationarity, especially under cloudy or overcast scenarios [1]. If such irregular power fluctuations are integrated without reliable dispatching strategies, they threaten power supply reliability and may cause frequency drops or local over-voltage. Accurate and robust ultra-short-term forecasting of PV power generation therefore helps improve the safety and economic efficiency of grid dispatch. In addition, traditional prediction methods usually output only a single power value and ignore the fluctuation range at a given probability level, which limits their value for decision-making. In contrast, prediction intervals that carry probabilistic information characterize power fluctuations more comprehensively and are key to making grid dispatching more rigorous and flexible. In PV power forecasting research, the time scale is typically divided into four levels, each corresponding to distinct application scenarios and functions. Among these, ultra-short-term forecasting (based on hourly data) is primarily used for real-time dispatching and congestion relief in the power grid [2].

Short-term forecasting, with a prediction horizon of one to several days, is mainly concerned with optimizing unit commitment and power dispatch [3,4]. By contrast, medium- and long-term forecasting, with prediction horizons of multiple days and weeks, respectively, is applied more broadly to the operational management and maintenance of photovoltaic (PV) plants [5]. This temporal classification not only defines the degree of forecasting precision but also delivers tailored support for both grid operations and PV plant administration.

Prediction methods are generally categorized into three types: physical, statistical, and machine learning methods [6]. Physical methods establish a functional relationship between input characteristics and output power through mechanism analysis and mathematical modeling [7]. They are suitable for newly built power stations or scenarios with missing data and can directly reflect the impacts of weather conditions, installation orientation, and component performance [8]. However, they involve heavy computational loads, yield low accuracy, and are not applicable to ultra-short-term prediction [9]. Statistical methods achieve prediction by fitting the mapping relationship between input variables and output power. They are easy to apply without requiring specific design parameters of the system [10]. Auto-regression [11], ARMAX [12], and other approaches are commonly used for linear modeling, but they struggle to address nonlinear problems such as photovoltaic power prediction. With the development of artificial intelligence, neural networks and support vector machines have become mainstream methods. Zaremba et al. [13] adopted a recurrent neural network (RNN), which can preserve the sequential information of time series, for time-series prediction tasks. Variants of RNNs, such as the long short-term memory network (LSTM) [14] and gated recurrent unit (GRU) [15], have been widely applied in photovoltaic power prediction. However, the above methods have limited capability to capture long-range global dependencies in highly volatile scenarios. The bidirectional long short-term memory network (BiLSTM) is able to process input sequences in both forward and backward directions, which effectively captures bidirectional dependencies and provides more comprehensive contextual information [16]. Single models inherently have limitations, whereas hybrid models enhance robustness by integrating the advantages of different models. 
In [17], a CNN-BiLSTM hybrid model was proposed, which combines the feature extraction capability of the CNN with the bidirectional temporal dependency modeling of BiLSTM to predict photovoltaic power under varying weather conditions. However, the CNN still has drawbacks in long-sequence modeling. The Transformer relies on the self-attention mechanism rather than recurrence, which maintains performance even for long sequences and provides stronger global perception [18], and thus offers significant advantages over recurrent models such as the GRU [19]. Chen et al. [20] proposed an LSTM–Transformer hybrid model for maritime microwave channel prediction in complex marine environments. Experimental results on measured data show that this model achieved substantial performance improvements over single models and satisfied real-time requirements.

The complexity of hybrid models introduces a large number of hyperparameters (e.g., number of network layers, learning rate, number of attention heads). Additionally, due to the high volatility of photovoltaic power output, particularly under overcast and cloudy conditions, the hyperparameter space of the model contains numerous local extrema. Traditional manual hyperparameter tuning is inefficient and prone to falling into local optima, making automated optimization algorithms an effective strategy. The Grey Wolf Optimizer (GWO) [21] was adopted to automatically optimize the hyperparameters of BiLSTM, thereby improving prediction stability. Yu et al. [22] implemented automatic hyperparameter optimization of BiLSTM via the Whale Optimization Algorithm (WOA) and constructed a prediction model combined with an attention mechanism. The GWO, WOA, and other similar algorithms adopt leader-centered position update mechanisms and directly rely on absolute distance information between individuals, which may introduce search bias and increase the risk of premature convergence [23]. The improvement rate of these traditional swarm algorithms in reducing RMSE typically plateaus around 20–30% compared to unoptimized models.

Although the methods proposed in existing studies can improve the prediction accuracy of photovoltaic power from multiple perspectives, most of them focus on the power value prediction at specific time points and fail to fully reflect the volatility and uncertainty of photovoltaic output. Interval prediction quantifies the range of power fluctuations by constructing prediction intervals under different confidence levels, thereby rendering the prediction information more comprehensive and reliable [24]. The construction methods are divided into direct and indirect methods. The direct method establishes a probability distribution model based on statistical theory to directly obtain prediction intervals [25]. Jiang et al. [26] proposed a method for constructing intervals by adding or subtracting the fluctuation of prediction errors to or from point predictions. The non-fixed parameters are determined through multi-objective optimization, and the fluctuation of errors is reflected by multiplying the non-fixed parameters by the prediction errors, thus constructing adaptive prediction intervals for photovoltaic power generation.

The indirect method first estimates the probability distribution of point prediction errors, and then integrates the results onto deterministic prediction outputs to generate the final prediction intervals. Most indirect methods assume that errors follow a normal distribution or a Student’s t-distribution [24,27]. However, it is difficult to accurately assume the error distribution in engineering practice, which leads to unrealistic prediction intervals. Therefore, researchers have focused on fitting the distribution of data. For this purpose, kernel density estimation (KDE) is adopted to fit data distributions. KDE does not require a predefined distribution form and directly estimates the probability density function based on samples, thus being widely applied in interval prediction. Niu et al. [28] used KDE to estimate prediction intervals under different confidence levels after wind power point prediction. In the field of photovoltaic prediction, [22] compared KDE (with a Gaussian kernel) and normal distribution after photovoltaic point prediction, and the former achieved a narrower bandwidth and a higher coverage rate. The bandwidth affects the distribution fitting performance of KDE: an appropriate bandwidth can significantly improve the accuracy of error estimation. Compared with a fixed bandwidth, the use of an optimized bandwidth makes the resulting prediction intervals more reliable [29]. Zhou et al. [30] conducted wind power interval prediction based on LSTM-KDE, and optimized the bandwidth using the integrated mean squared error criterion, thus achieving prediction intervals with narrower width and a higher coverage rate. Despite advances in bandwidth optimization, considerable research gaps remain in the optimization strategies corresponding to different algorithms or criteria.

To address the aforementioned challenges in PV power forecasting, this study proposes a novel AE-BiLSTM–Transformer–ABKDE hybrid framework that systematically integrates adaptive optimization, deep temporal feature extraction, and probabilistic uncertainty quantification. The framework first employs the Alpha Evolution (AE) algorithm to automatically optimize hyperparameters. Through evolution path-driven basis vector adaptation, AE simultaneously incorporates both random step size and adaptive differential step size within a single alpha operator, using a decay factor α to dynamically balance exploration and exploitation. Therefore, compared to the GWO and WOA, which rely on swarm encirclement and spiral update mechanisms, AE can more stably extract and accumulate cross-generation information, thereby reducing the risk of premature convergence and entrapment in local optima. Building upon this optimization, the model architecture combines two key components: (1) a BiLSTM–Transformer module that synergizes bidirectional gating mechanisms with self-attention mechanisms to comprehensively capture short-term fluctuations and long-range temporal dependencies; (2) an Adaptive Bandwidth Kernel Density Estimation (ABKDE) mechanism that dynamically adjusts bandwidths based on local error distributions to construct tight and reliable prediction intervals. Finally, to validate the effectiveness of the proposed framework, datasets from the Desert Knowledge Australia Solar Centre (DKASC) were selected for comparative experiments using different algorithms to evaluate the predictive performance of the AE-BiLSTM–Transformer–ABKDE.

2. Methodology

2.1. Alpha Evolution Algorithm

The Alpha Evolution (AE) algorithm was proposed by Gao et al. in 2024 [31]. Its core comprises solely an “Alpha operator” that integrates basis vector adaptation, random step size attenuation, and adaptive differential step size mechanisms, without requiring additional search operators or complex meta-parameter settings like other optimization algorithms. The specific steps are as follows:

2.1.1. Initialization and Evolution Matrix

Randomly initialize N candidate solutions in a D-dimensional search space using the formula below:

X_{i} = lb + (ub - lb) \cdot \mathrm{rand}, \quad i = 1, 2, \dots, N

(1)

Herein, lb and ub define the search boundaries. To increase population diversity, the AE algorithm uses a sampling with replacement strategy to extract samples from matrix X to form an evolution matrix E. Each individual E_i competes with the candidate solution at its original position to retain the better one.
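As a rough illustration of Formula (1) and the sampling step, the following minimal sketch uses only the Python standard library. The scalar bounds, the per-dimension random draw, the `sphere` fitness function, and all function names are assumptions made for illustration; the source does not state when exactly the competition step runs, so `compete` is shown as a separate helper.

```python
import random

def init_population(n, dim, lb, ub, rng):
    """Formula (1): X_i = lb + (ub - lb) * rand, drawn independently per dimension (assumed)."""
    return [[lb + (ub - lb) * rng.random() for _ in range(dim)] for _ in range(n)]

def sample_evolution_matrix(X, rng):
    """Build E by drawing rows of X with replacement (same number of rows as X)."""
    return [list(rng.choice(X)) for _ in X]

def compete(X, E, fitness):
    """Position-wise competition: keep whichever of X_i and E_i has the lower fitness."""
    return [list(e) if fitness(e) < fitness(x) else list(x) for x, e in zip(X, E)]

def sphere(x):
    """Hypothetical fitness to minimize (stands in for a validation RMSE)."""
    return sum(v * v for v in x)

rng = random.Random(0)
X = init_population(n=6, dim=3, lb=-5.0, ub=5.0, rng=rng)
E = sample_evolution_matrix(X, rng)
survivors = compete(X, E, sphere)
```

Because rows are drawn with replacement, some candidates may appear several times in E while others are absent.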

2.1.2. Alpha Operator

The algorithm updates positions through the alpha operator as shown in Formula (2). This operator is driven by three functional components working together.

E_{i}^{t + 1} = W + β Δ r_{i} + α (L_{i} + E_{i}^{t} - W - R_{i}),

(2)

Each component is as follows:

Adaptive evolution of base vector W: The base vector guides the direction of evolution. Based on Formula (3), the algorithm switches between two evolution paths.

A is a D × D square matrix constructed by sampling with replacement D times from the candidate solutions in X.
B is a K × D matrix consisting of K candidate solutions selected from X by sampling without replacement, where K = [N × rand(0,1)] and [M] denotes rounding M to the nearest integer.
W_a and W_b are the evolution paths associated with matrices A and B, respectively.

Its learning rate c is adjusted linearly with the number of fitness evaluations (FEs) by Formula (4) to extract optimal guidance information from path accumulation.

W = \begin{cases} c_{a} W_{a}^{t} + (1 - c_{a}) \times \mathrm{diagonal}(A) = W_{a}^{t + 1}, & \text{if } \mathrm{rand}(0, 1) < 0.5 \\ c_{b} W_{b}^{t} + (1 - c_{b}) \times ω B = W_{b}^{t + 1}, & \text{else} \end{cases},

(3)

c_{a} = c_{b} = 1 - \mathrm{FEs} / \mathrm{MaxFEs},

(4)

ω_{i : K} = \frac{f (X_{i : K})}{\sum_{i = 1}^{K} f (X_{i : K})},

(5)

Here, i:K denotes the i-th individual in the sample set obtained after K draws.
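One possible reading of Formulas (3) to (5) is sketched below in plain Python. The guard that forces K to be at least 1, the strictly positive `shifted_sphere` fitness (so the weights in Formula (5) are well defined), and every function name are assumptions for illustration rather than details taken from the source.

```python
import random

def learning_rate(fes, max_fes):
    """Formula (4): c_a = c_b = 1 - FEs / MaxFEs, a linear decay from 1 to 0."""
    return 1.0 - fes / max_fes

def fitness_weights(fit_values):
    """Formula (5): each sampled fitness divided by the sum over the sample."""
    total = sum(fit_values)
    return [f / total for f in fit_values]

def update_base_vector(X, fitness, W_a, W_b, fes, max_fes, rng):
    """Formula (3): refresh one of the two evolution paths and use it as W."""
    n, dim = len(X), len(X[0])
    c = learning_rate(fes, max_fes)
    if rng.random() < 0.5:
        # Path A: diagonal of a D x D matrix whose rows are drawn with replacement.
        A = [rng.choice(X) for _ in range(dim)]
        target = [A[j][j] for j in range(dim)]
        W_a = [c * w + (1.0 - c) * t for w, t in zip(W_a, target)]
        return W_a, W_a, W_b
    # Path B: weighted combination of K rows drawn without replacement.
    k = max(1, round(n * rng.random()))  # assumption: force K >= 1
    B = rng.sample(X, k)
    omega = fitness_weights([fitness(b) for b in B])
    target = [sum(w * b[j] for w, b in zip(omega, B)) for j in range(dim)]
    W_b = [c * w + (1.0 - c) * t for w, t in zip(W_b, target)]
    return W_b, W_a, W_b

def shifted_sphere(x):
    """Hypothetical strictly positive fitness function."""
    return 1.0 + sum(v * v for v in x)

rng = random.Random(1)
X = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(6)]
W, W_a, W_b = update_base_vector(
    X, shifted_sphere, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], fes=500, max_fes=1000, rng=rng
)
```

Early in the run c is close to 1, so the path changes slowly; as FEs approaches MaxFEs the freshly sampled information dominates.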

Nonlinear random step size: The attenuation coefficient β follows the nonlinear exponential decrease rule in Formula (6). Combined with the random perturbation matrix ∆r in Formula (7), it provides the algorithm with strong global exploration ability in the early stages.

β = e^{\left(\ln \left(\frac{\mathrm{MaxFEs} - \mathrm{FEs}}{\mathrm{MaxFEs}}\right) - {\left(\frac{4 \, \mathrm{FEs}}{\mathrm{MaxFEs}}\right)}^{2}\right)},

(6)

Δ r = (ub - lb) \cdot (2 R_{1} R_{2} - R_{2}) \cdot S

(7)

In Formula (7), R₁ and R₂ are N × D real matrices generated by rand(0, 1), responsible for constructing the perturbation components; S is an N × D 0–1 random integer matrix used to construct perturbation patterns based on dimensional features.
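To make the decay in Formula (6) and the perturbation in Formula (7) concrete, here is a small standard-library sketch. It assumes scalar bounds and element-wise products, and it rewrites the exponential as remaining × exp(−(4·FEs/MaxFEs)²), which is algebraically identical and avoids taking the logarithm of zero at the last evaluation. Function names are illustrative.

```python
import math
import random

def beta(fes, max_fes):
    """Formula (6), using exp(ln(a) - b) = a * exp(-b) with a = (MaxFEs - FEs) / MaxFEs."""
    remaining = (max_fes - fes) / max_fes
    return remaining * math.exp(-((4.0 * fes / max_fes) ** 2))

def random_step(n, dim, lb, ub, rng):
    """Formula (7): (ub - lb) * (2*R1*R2 - R2) * S with element-wise products (assumed)."""
    step = []
    for _ in range(n):
        row = []
        for _ in range(dim):
            r1, r2 = rng.random(), rng.random()  # entries of R1 and R2
            s = rng.randint(0, 1)                # entry of the 0-1 matrix S
            row.append((ub - lb) * (2.0 * r1 * r2 - r2) * s)
        step.append(row)
    return step

schedule = [beta(fes, 1000) for fes in range(0, 1001, 100)]
delta_r = random_step(n=4, dim=3, lb=-5.0, ub=5.0, rng=random.Random(0))
```

Since 2·R1·R2 − R2 = R2·(2·R1 − 1) lies strictly between −1 and 1, each perturbation entry is bounded in magnitude by ub − lb, and β shrinks it quickly as evaluations accumulate.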

Self-adaptive step size: In the differential term α (L_i + E_i^t − W − R_i), the reference individuals L_i and R_i denote individuals that are relatively better and worse than E_i, respectively, and together form an evolution gradient. As a control vector, α adjusts through dimensional differentiation to improve the local exploitation precision in the later stages.

2.1.3. Boundary Constraint and Selection

To address the problem of solutions exceeding boundaries during the update process, the algorithm uses the distance halving method in Formula (8) to force individuals back into the valid search space. Finally, a greedy selection strategy compares fitness values before and after the update. This ensures that superior genes are passed directly to the next generation for stable convergence.

E_{i, j} = \begin{cases} \frac{E_{i, j} + ub}{2}, & E_{i, j} > ub \\ \frac{E_{i, j} + lb}{2}, & E_{i, j} < lb \end{cases}

(8)

To make the AE algorithm workflow easier to understand, Figure 1 presents the flowchart of the algorithm.

2.2. Bidirectional Long Short-Term Memory (BiLSTM)

Traditional LSTM networks are constrained by their unidirectional processing mode, making it difficult to fully capture internal dependencies within sequences during the forward modeling process of time series, which limits their ability to capture the full trajectory of photovoltaic output throughout the day. Bidirectional Long Short-term Memory (BiLSTM) can learn bidirectional temporal features. By integrating the learning outcomes from these two directions to generate the final output, the model fully accounts for the inherent temporal characteristics of sequences. In the application of PV prediction, the forward layer captures the historical accumulation of meteorological factors, while the backward layer provides reverse contextual constraints (such as the natural decline of solar irradiance in the evening). This allows the model to accurately capture the daily pattern of PV output, thereby improving the accuracy of power prediction. The structure of the model is shown in Figure 2.

2.3. Transformer

The Transformer is a network based on the attention mechanism. The original Transformer architecture consists of both an encoder and a decoder; time-series forecasting tasks typically do not require the sequence-to-sequence generation capability of a decoder. Therefore, this study employs only the Transformer encoder as the global feature extraction module to process the sequential hidden states generated by the BiLSTM. Its modified structure is shown in Figure 3.

The encoder features a multi-layer architecture, with each layer comprising a multi-head self-attention mechanism and a feedforward neural network. The self-attention mechanism captures global relationships within the input sequence, ensuring that the entire sequence’s context is reflected at every position, while the feedforward neural network further processes this information. The core of the Transformer model is the multi-head attention mechanism, which divides Query, Key, and Value into multiple “heads” for parallel processing. The attention calculation method for each head is as follows:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{D_{k}}}\right) V,

(9)

where D_k is the dimension of the key; Q, K, and V represent Query, Key, and Value respectively. For the multi-head attention mechanism, the calculation formula is as follows:

\begin{array}{l} \mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_{1}, \mathrm{head}_{2}, \dots, \mathrm{head}_{H}) W_{o} \\ \mathrm{head}_{i} = \mathrm{Attention}(Q W_{i}^{Q}, K W_{i}^{K}, V W_{i}^{V}) \end{array},

(10)

Among these, W_o denotes the multi-head attention weight matrix, and W_{i}^{Q}, W_{i}^{K}, and W_{i}^{V} are the projection matrices for Query, Key, and Value, respectively.
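The single-head computation in Formula (9) can be illustrated with a dependency-free sketch. The toy matrices and helper names are assumptions; a real model would use a tensor library and would apply the learned projections of Formula (10) before this step for each head.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def attention(Q, K, V):
    """Formula (9): softmax(Q K^T / sqrt(D_k)) V, returning outputs and weights."""
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, K_T)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, V), weights

# Hypothetical toy sequence: three time steps, key dimension 2.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out, weights = attention(Q, K, V)
```

Each row of `weights` sums to one, so every output position is a convex combination of all value vectors. This is what lets each time step draw on the whole sequence.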

In PV power prediction, the self-attention mechanism effectively captures long-range dependencies inherent in daily solar cycles. Concurrently, the multi-head mechanism allows different attention heads to track distinct physical phenomena, such as global baseline trends and short-term fluctuations. Furthermore, the model’s parallel computing enhances training efficiency, while positional encoding preserves the crucial temporal order of feature sequences.

The model architecture for this study was predefined: a single-layer BiLSTM with 20 hidden units was employed for local temporal feature extraction, followed by a two-layer Transformer encoder where each layer utilizes a multi-head self-attention mechanism with 3 heads to integrate global features. For the optimized models, the number of hidden units, the number of self-attention heads, and other key hyperparameters were treated as tunable variables and automatically optimized using the optimization algorithm. The specific parameter search space and final optimized values will be detailed in the experimental section.

2.4. Adaptive Bandwidth Kernel Density Estimation

Kernel Density Estimation (KDE) is a non-parametric method for estimating probability density, which does not rely on prior assumptions about data distribution. It can directly uncover inherent statistical patterns from observed data. By constructing smooth density curves, this method reveals the distribution characteristics of random variables. It is particularly suitable for complex data analysis scenarios in practical engineering where the distribution form is unknown. In practical photovoltaic prediction, errors arise due to sudden changes in meteorological factors and the thermodynamic characteristics of the photovoltaic modules themselves, directly impacting power system dispatch. Overestimating PV power can lead to unanticipated power deficits and grid frequency drops, necessitating the costly deployment of spinning reserves. Conversely, underestimating may result in local over-voltage and subsequent solar curtailment, causing severe clean energy wastage [17]. Assuming the error sequence for photovoltaic power point prediction is [e₁, e₂, …, e_N], the probability density estimate at a point e can be expressed by Formula (11):

{\hat{f}}_{h} (e) = \frac{1}{N \cdot h} \sum_{i = 1}^{N} K (\frac{e - e_{i}}{h}),

(11)

where N is the sample size, h represents the bandwidth, and K(·) denotes the kernel function.

The accuracy of non-parametric kernel density estimation is jointly determined by the kernel function and bandwidth. Common kernel options include Gaussian, triangular, cosine, and Epanechnikov kernels. This study employs the Gaussian kernel; the expression is as follows:

K (e) = \frac{1}{\sqrt{2 π}} \exp (- \frac{e^{2}}{2}),

(12)
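As a quick illustration of Formulas (11) and (12), the sketch below evaluates a fixed-bandwidth Gaussian KDE on a grid. The error samples, the bandwidth value, and the grid are hypothetical and chosen only to make the example run.

```python
import math

def gaussian_kernel(u):
    """Formula (12): standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(e, errors, h):
    """Formula (11): fixed-bandwidth kernel density estimate at the point e."""
    n = len(errors)
    return sum(gaussian_kernel((e - e_i) / h) for e_i in errors) / (n * h)

errors = [-0.30, -0.10, -0.05, 0.00, 0.02, 0.08, 0.25]  # hypothetical point-forecast errors
h = 0.1                                                  # hypothetical fixed bandwidth
step = 0.01
grid = [-1.5 + step * k for k in range(301)]
density = [kde(e, errors, h) for e in grid]
area = sum(density) * step  # Riemann-sum check that the estimate integrates to about 1
```

The same bandwidth h is applied at every sample.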

However, since the bandwidth h in traditional kernel density estimation is fixed, the results become unsatisfactory when data distribution is uneven: high-density regions tend to be overly smoothed due to excessive bandwidth, resulting in a loss of detailed information; while low-density regions exhibit significant estimation fluctuations due to insufficient bandwidth, revealing inherent limitations. Under sunny conditions, photovoltaic output remains stable with a dense error distribution, requiring a narrow bandwidth to capture high-precision details. During cloudy or overcast conditions, power fluctuates sharply and irregularly, resulting in a sparse error distribution with high variance. A wider bandwidth should be employed to prevent erroneous probability density estimates [32]. To address this, Adaptive Bandwidth Kernel Density Estimation (ABKDE) is proposed. By establishing a relationship between bandwidth and local density, it enables adaptive adjustment of the bandwidth parameter, as shown in Equation (13), thereby more accurately capturing the multi-scale characteristics of the distribution.

{\hat{f}}_{h_{i}} (e) = \frac{1}{N \cdot h_{i} (e)} \sum_{i = 1}^{N} K (\frac{e - e_{i}}{h_{i} (e)}),

(13)

where h_i(e) is the bandwidth function, which relates to the local density information near the i-th sample point e_i. To determine the optimal bandwidth for each location, the local error function (LEF) is introduced to optimize the bandwidth at each point; the calculation formula is as follows:

L (e_{k}) = {\hat{f}}_{h_{i}} {(e_{k})}^{2} - 2 {\hat{f}}_{h_{i}} (e_{k}) f (e_{k}) + \frac{2}{\sqrt{2 π h_{i}}} f (e_{k}),

(14)

Among these, {\hat{f}}_{h_{i}} {(e_{k})}^{2} - 2 {\hat{f}}_{h_{i}} (e_{k}) f (e_{k}) is the bias term, measuring the precision of estimation, and \frac{2}{\sqrt{2 π h_{i}}} f (e_{k}) represents the variance term, which is inversely proportional to the bandwidth h_i, controlling the smoothness and variance of the estimation. The LEF comprehensively optimizes these terms: when h_i decreases, the bias term dominates optimization to enhance precision; when h_i increases, the variance term dominates to improve stability, thereby achieving a balance between bias and variance. ABKDE employs the golden section method to minimize the local error function, thereby determining the optimal bandwidth at different positions. This enables ABKDE to maintain flexibility and precision even when handling complexly distributed data. In this study, the initial search interval for the optimal bandwidth was set to [1 × 10⁻¹², 1]. The golden section search iterates until the relative width of the interval drops below 10⁻⁵, or until a maximum of 30 iterations is reached.
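The golden section search with the stated interval and stopping rules could look roughly like the sketch below. Two points are assumptions: the relative width is measured against the initial interval width, and `toy_objective` is a hypothetical stand-in for the local error function, since evaluating Formula (14) requires a reference density f that is not specified here.

```python
import math

def golden_section_minimize(objective, lo=1e-12, hi=1.0, rel_tol=1e-5, max_iter=30):
    """Minimize a unimodal objective on [lo, hi] by golden section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    width0 = hi - lo
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = objective(x1), objective(x2)
    iterations = 0
    while iterations < max_iter and (hi - lo) / width0 >= rel_tol:
        if f1 < f2:
            # Minimum lies in [lo, x2]; the old x1 becomes the new x2.
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = objective(x1)
        else:
            # Minimum lies in [x1, hi]; the old x2 becomes the new x1.
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = objective(x2)
        iterations += 1
    return 0.5 * (lo + hi), iterations

def toy_objective(h):
    """Hypothetical stand-in for the local error function of one sample."""
    return (h - 0.3) ** 2

h_opt, n_iter = golden_section_minimize(toy_objective)
```

Each iteration shrinks the interval by a factor of about 0.618 and reuses one earlier evaluation, so only one new objective evaluation is needed per step.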

2.5. Model Evaluation Metrics

For the results of point prediction, this paper adopts three metrics, R², MAE and RMSE, with their calculation formulas shown below:

R^{2} = 1 - \frac{\sum_{i = 1}^{N} {(p_{i} - p_{i}^{'})}^{2}}{\sum_{i = 1}^{N} {(p_{i} - \bar{p_{i}})}^{2}},

(15)

\mathrm{MAE} = \frac{1}{N} \sum_{i = 1}^{N} |p_{i} - p_{i}^{'}|,

(16)

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(p_{i} - p_{i}^{'})}^{2}},

(17)

where p_i denotes the measured PV power at time i, p_{i}^{'} represents the predicted power, and \bar{p_{i}} denotes the mean of the measured power.
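Formulas (15) to (17) translate directly into a few lines of Python. The sample values below are hypothetical and chosen so the results are easy to check by hand.

```python
import math

def r2_score(actual, predicted):
    """Formula (15): 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((p - q) ** 2 for p, q in zip(actual, predicted))
    ss_tot = sum((p - mean_actual) ** 2 for p in actual)
    return 1.0 - ss_res / ss_tot

def mae(actual, predicted):
    """Formula (16): mean absolute error."""
    return sum(abs(p - q) for p, q in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Formula (17): root mean squared error."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(actual, predicted)) / len(actual))

actual = [0.0, 1.0, 2.0, 3.0]      # hypothetical measured power
predicted = [0.0, 1.0, 2.0, 2.0]   # hypothetical forecasts
scores = (r2_score(actual, predicted), mae(actual, predicted), rmse(actual, predicted))
```

For these values the residual sum of squares is 1 and the total sum of squares is 5, giving R² = 0.8, MAE = 0.25, and RMSE = 0.5.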

For interval prediction, the following two metrics are used: the prediction interval coverage probability (PICP) and the prediction interval normalized average width (PINAW).

PICP represents the probability that the true power falls within the upper and lower bounds of the corresponding confidence interval prediction. The expression is as follows:

\mathrm{PICP} = \frac{1}{N} \sum_{i = 1}^{N} c_{i},

(18)

where c_i is a Boolean value. Assume that the prediction interval at time i under a certain confidence level is [L_i,U_i]. If the actual value falls within this interval, then c_i = 1; otherwise, c_i = 0.

PINAW measures the narrowness or width of the upper and lower bounds of the prediction interval. The formula is as follows:

\mathrm{PINAW} = \frac{1}{N R} \sum_{i = 1}^{N} (U_{i} - L_{i}),

(19)

R is the range between the maximum and minimum values of true power, used for normalization.
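A minimal sketch of Formulas (18) and (19) follows. It treats the interval as closed, which is an assumption, and the numbers are hypothetical.

```python
def picp(actual, lower, upper):
    """Formula (18): fraction of observations that fall inside [L_i, U_i]."""
    hits = sum(1 for y, lo, hi in zip(actual, lower, upper) if lo <= y <= hi)
    return hits / len(actual)

def pinaw(actual, lower, upper):
    """Formula (19): mean interval width divided by the range R of the true power."""
    value_range = max(actual) - min(actual)
    total_width = sum(hi - lo for lo, hi in zip(lower, upper))
    return total_width / (len(actual) * value_range)

actual = [1.0, 2.0, 3.0, 5.0]   # hypothetical measured power
lower = [0.5, 1.5, 3.2, 4.0]    # hypothetical lower bounds
upper = [1.5, 2.5, 4.2, 6.0]    # hypothetical upper bounds
coverage = picp(actual, lower, upper)
width = pinaw(actual, lower, upper)
```

Here three of four observations are covered (PICP = 0.75) and the widths sum to 5 over a range of 4, so PINAW = 5 / 16 = 0.3125. A good interval forecast keeps PICP at or above the nominal confidence level while keeping PINAW small.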

2.6. The Framework of the Interval Prediction Model

Based on the aforementioned methods, this paper proposes a novel photovoltaic power interval prediction model, AE-BiLSTM–Transformer–ABKDE, whose structure is shown in Figure 4. The main implementation process is as follows:

Outliers in the raw PV data are identified using the interquartile range (IQR) rule, and outliers and missing values are then replaced via linear interpolation, followed by Min-Max normalization. Subsequently, Pearson correlation analysis is applied to select key meteorological features from the initial variables to reduce input dimensionality, thereby reducing computational burden and improving the model’s generalization capability.
A hybrid neural network is constructed for point prediction. Specifically, a single-layer BiLSTM is employed first to process the input sequences, and its output sequential hidden states are subsequently fed into a two-layer Transformer. The BiLSTM layer is used to extract bidirectional local temporal variations, while the Transformer encoder captures long-range sequence dependencies via its multi-head self-attention mechanism. The synergistic structure compensates for the limitations of single models in handling highly volatile PV sequences.
To avoid the inefficiency and local optima associated with manual tuning, the Alpha Evolution (AE) algorithm is applied to automatically search for optimal hyperparameters (e.g., attention heads, hidden units, learning rate) using RMSE as the fitness function, ensuring the optimal network configuration and improving point prediction accuracy.
Based on the prediction errors generated by the point prediction model, the ABKDE method is employed to construct prediction intervals under given confidence levels. Instead of using a fixed bandwidth, ABKDE determines the optimal local bandwidth for each error point by minimizing the Local Error Function (LEF). The superiority of ABKDE over KDE is evaluated by comparing the prediction results of the two models using interval forecasting performance metrics.

Figure 4. The overall implementation process of this article.
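The last step of the process, turning an estimated error distribution into prediction intervals, is commonly done by adding error quantiles to the point forecast. The sketch below follows that indirect approach under several assumptions not stated in the source: errors are defined as actual minus predicted, a single fixed bandwidth is used for brevity (ABKDE would supply a bandwidth per sample), and the quantiles are found by bisection on the exact CDF of the Gaussian KDE. All names and values are illustrative.

```python
import math

def kde_cdf(e, errors, h):
    """CDF of a Gaussian KDE: the average of normal CDFs centred on each error."""
    scale = h * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((e - e_i) / scale)) for e_i in errors) / len(errors)

def kde_quantile(prob, errors, h, tol=1e-9):
    """Invert the KDE CDF by bisection on a bracket well outside the samples."""
    lo = min(errors) - 10.0 * h
    hi = max(errors) + 10.0 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, errors, h) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def prediction_intervals(point_forecast, errors, h, confidence=0.90):
    """Shift each point forecast by the lower and upper error quantiles."""
    tail = (1.0 - confidence) / 2.0
    q_lo = kde_quantile(tail, errors, h)
    q_hi = kde_quantile(1.0 - tail, errors, h)
    return [(p + q_lo, p + q_hi) for p in point_forecast]

errors = [-0.2, -0.1, 0.0, 0.1, 0.2]   # hypothetical validation errors
point_forecast = [1.0, 1.5, 2.0]       # hypothetical point forecasts
intervals = prediction_intervals(point_forecast, errors, h=0.05)
```

With this construction every interval has the same width.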

3. Results

3.1. Data Description

The experimental data for this study were collected from the solar power station operated by the Desert Knowledge Australia Solar Centre (DKASC). The dataset comprises 48,545 samples at a temporal resolution of 5 min, recorded from 1 January to 31 December 2017 between 7:00 AM and 6:00 PM daily, as the PV plant does not generate electricity at night. It includes eight meteorological variables, namely wind speed (WS), temperature (T, °C), relative humidity (RH), global horizontal radiation (GHR), diffuse horizontal radiation (DHR), wind direction (WD), radiation global tilted (RGT), and radiation diffuse tilted (RDT), together with the photovoltaic power (PV).

3.2. Data Processing

Photovoltaic systems are prone to unexpected events such as unplanned shutdowns or equipment failures during operation, leading to data gaps and increased volatility. Simultaneously, malfunctions in the measurement sensor system may also cause collected data to become abnormal. Therefore, data preprocessing is crucial. This paper calculates the reasonable range boundaries for each feature’s normal values using quartiles and interquartile ranges, thereby effectively identifying outliers. For detected outliers and missing values, linear interpolation is employed for replacement.

Since different features have distinct units, their magnitudes may vary significantly. This can severely impact the model’s prediction outcomes, so it is necessary to map the data into a consistent range, which is known as normalization. Min-Max normalization is adopted in this study, and its corresponding formula is shown as follows:

x_{i}^{'} = \frac{x_{i} - \min (x)}{\max (x) - \min (x)}

(20)
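The preprocessing described in this subsection (IQR screening, linear interpolation, and the Min-Max scaling of Formula (20)) might be sketched as follows. The 1.5 × IQR fence multiplier, the inclusive quartile method, the use of `None` for missing samples, and the edge handling are assumptions; the source does not specify them.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Lower and upper fences from quartiles; k = 1.5 is an assumed multiplier."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def mask_outliers(series, k=1.5):
    """Replace values outside the fences with None so they are treated as missing."""
    lo, hi = iqr_bounds([v for v in series if v is not None], k)
    return [v if v is not None and lo <= v <= hi else None for v in series]

def interpolate_gaps(series):
    """Fill None entries linearly between the nearest valid neighbours."""
    known = [i for i, v in enumerate(series) if v is not None]
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            filled[i] = series[right]   # leading gap: hold the first valid value (assumed)
        elif right is None:
            filled[i] = series[left]    # trailing gap: hold the last valid value (assumed)
        else:
            t = (i - left) / (right - left)
            filled[i] = series[left] + t * (series[right] - series[left])
    return filled

def min_max(values):
    """Formula (20): map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

raw = [10.0, 12.0, None, 16.0, 500.0, 20.0, 22.0]  # hypothetical feature with a gap and a spike
cleaned = interpolate_gaps(mask_outliers(raw))
scaled = min_max(cleaned)
```

In this toy series the spike at 500 falls outside the fences and is interpolated to 18, and the missing sample becomes 14, before scaling. Screening before scaling matters because a single extreme value would otherwise compress all normal readings toward zero.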

3.3. Correlation Analysis

To extract the key input features that affect photovoltaic power, this study employs the Pearson correlation coefficient to quantify the degree of correlation between variables, and the features with the greatest impact on power are selected. The Pearson correlation coefficient ρ describes the strength and direction of the correlation between two variables, as shown in Equation (21).

ρ_{x y} = \frac{\sum_{i = 1}^{n} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2}} \sqrt{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}}

(21)

where \bar{x} and \bar{y} are the average values of the data sets X and Y, respectively. Figure 5a shows the correlation between the meteorological features and PV power; PV power correlates most strongly with GHR and RGT. Meanwhile, the absolute values of the correlations of WS, T, and RH with PV all exceed 0.4, indicating moderate correlations. Conversely, DHR, WD, and RDT exhibit weaker correlations with power output. Therefore, GHR, RGT, WS, T, and RH are selected as input features for the prediction model. To check the Pearson-based feature selection, Spearman’s correlation coefficient was also computed. The results are shown in Figure 5b, revealing a high degree of consistency between the outcomes of the two methods.
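Equation (21) and a threshold-based selection can be sketched as below. Using 0.4 as a hard cut-off is an assumption inferred from the description above, and the feature columns are hypothetical toy data rather than DKASC measurements.

```python
import math

def pearson(x, y):
    """Equation (21): covariance divided by the product of the standard deviations."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (math.sqrt(var_x) * math.sqrt(var_y))

def select_features(features, target, threshold=0.4):
    """Keep features whose absolute correlation with the target exceeds the threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) > threshold]

pv = [0.0, 1.0, 2.0, 3.0, 4.0]            # hypothetical PV power
features = {
    "GHR": [0.0, 2.0, 4.0, 6.0, 8.0],     # perfectly positively correlated
    "RH": [4.0, 3.0, 2.0, 1.0, 0.0],      # perfectly negatively correlated
    "WD": [1.0, -1.0, 0.0, -1.0, 1.0],    # uncorrelated with pv
}
selected = select_features(features, pv)
```

Taking the absolute value keeps strongly negative relationships, which carry as much predictive information as positive ones.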

From the perspective of practical engineering applications, the selection of the aforementioned meteorological variables is not only statistically grounded but also carries clear physical significance. For instance, global horizontal irradiance (GHR) determines the fundamental trend of photovoltaic output, while rapidly moving clouds during overcast conditions can cause significant fluctuations in irradiance, imposing severe transient impacts on the power grid. Additionally, rising ambient temperatures (T) trigger the “thermal degradation effect” in photovoltaic modules, reducing photovoltaic conversion efficiency. Therefore, incorporating these variables as model inputs effectively captures the physical characteristics of real photovoltaic power plants under complex operating conditions, providing more reliable theoretical support for actual grid dispatch and reserve capacity allocation.

3.4. Similar Day Clustering

Similar day clustering analysis was performed on the complete dataset using actual power and weather data. Data from 365 days were divided into three categories, with the corresponding quantities as follows: sunny days: 201; cloudy days: 134; and overcast days: 30. Figure 6 shows the photovoltaic power curves of sampled dates for the three types of weather conditions.

3.5. Short-Term Photovoltaic Power Point Prediction

3.5.1. Training Process of Prediction Model

The dataset is split into a training set, a validation set and a test set at a ratio of 7:1:2. The input sequence consists of meteorological observations from two consecutive daytime periods (each sampled at 5 min resolution), and the trained model outputs the PV power sequence of the target daytime period in the test set. In the proposed framework, a single-layer BiLSTM with 20 hidden units in each direction first processes the input sequences to capture bidirectional local temporal variations. The resulting hidden states are fed into a two-layer Transformer encoder, in which each layer uses multi-head self-attention with three heads to capture long-range global dependencies. The final output is passed through a ReLU activation and a dropout layer (ratio = 0.1), and a fully connected layer outputs the predicted values. The model is trained with RMSE as the loss function using the Adam optimizer, with a maximum of 200 epochs, an initial learning rate of 1 × 10⁻⁴ and an L2 regularization coefficient of 1 × 10⁻⁵.
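To give a feel for the 7:1:2 split, the sketch below applies it to the day counts reported in Section 3.4. Several points are assumptions rather than statements from the paper: that the split is made on whole days, separately for each weather class, and in order without shuffling. The function name `split_7_1_2` and the use of integer indices in place of real daily records are likewise illustrative.

```python
def split_7_1_2(samples):
    """Split an ordered sequence into train/validation/test parts at 7:1:2."""
    n = len(samples)
    n_train = n * 7 // 10   # integer arithmetic avoids float rounding
    n_val = n // 10
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test


# Day counts per weather class, taken from Section 3.4.
DAYS_PER_CLASS = {"sunny": 201, "cloudy": 134, "overcast": 30}

sizes = {}
for weather, n_days in DAYS_PER_CLASS.items():
    train, val, test = split_7_1_2(list(range(n_days)))
    sizes[weather] = (len(train), len(val), len(test))
    print(weather, sizes[weather])
# sunny (140, 20, 41)
# cloudy (93, 13, 28)
# overcast (21, 3, 6)
```

Under these assumptions the overcast class would leave only a handful of days for validation and testing, which is worth keeping in mind when comparing metrics across weather classes.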

3.5.2. Comparison of Prediction Model

To verify the superiority of the proposed hybrid model (AE-BiLSTM–Transformer) in photovoltaic power forecasting, eight models, including the proposed one, are compared under sunny, cloudy and overcast conditions. All models use the same training set and test set, and components shared between models are configured with identical parameters. For the models that include an optimization algorithm (Models 6, 7 and 8), the initial parameters of the shared components are the same as in the other models, while the final parameters are determined by automatic search with the respective optimization algorithm. The parameters to be optimized are the number of heads in the self-attention mechanism, the number of hidden layer nodes in the BiLSTM, the initial learning rate and the L2 regularization coefficient, with search ranges of [2, 10], [10, 100], [1 × 10⁻⁵, 1 × 10⁻²] and [1 × 10⁻⁶, 1 × 10⁻²], respectively. The original training set was further divided into a training subset and a validation subset. During hyperparameter optimization, the fitness function was defined as the RMSE on the validation subset rather than on the training set, in order to reduce optimization bias and alleviate potential overfitting. After the optimal hyperparameters were obtained, the final model was retrained and evaluated on the independent test set. Because RMSE squares the prediction errors, it penalizes large deviations quadratically, which makes the fitness function sensitive to large forecast errors. The hyperparameter settings of the models are presented in Table 1.
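The search setup described above (four hyperparameters with bounded ranges, and validation RMSE as the fitness to minimize) can be sketched with a deliberately simple optimizer. The code below uses plain random search as a stand-in; it is not the Alpha Evolution algorithm, GWO, or WOA. Sampling the learning rate and L2 coefficient on a logarithmic scale is an assumption, and `toy_validation_rmse` is a hypothetical placeholder for "train the network with these settings and return the RMSE on the validation subset".

```python
import math
import random

# Search ranges as given in the text; the "int"/"log" tags are assumptions.
SEARCH_SPACE = {
    "num_heads": (2, 10, "int"),
    "hidden_units": (10, 100, "int"),
    "learning_rate": (1e-5, 1e-2, "log"),
    "l2_coefficient": (1e-6, 1e-2, "log"),
}


def sample_candidate(rng):
    """Draw one hyperparameter set from inside the search ranges."""
    candidate = {}
    for name, (lo, hi, kind) in SEARCH_SPACE.items():
        if kind == "int":
            candidate[name] = rng.randint(lo, hi)  # inclusive bounds
        else:
            value = 10 ** rng.uniform(math.log10(lo), math.log10(hi))
            candidate[name] = min(max(value, lo), hi)  # guard float round-off
    return candidate


def search(fitness, n_trials=50, seed=0):
    """Random search: keep the candidate with the lowest fitness value."""
    rng = random.Random(seed)
    best, best_score = None, math.inf
    for _ in range(n_trials):
        candidate = sample_candidate(rng)
        score = fitness(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score


def toy_validation_rmse(params):
    """Hypothetical stand-in for training a model and scoring it on validation data."""
    return (abs(math.log10(params["learning_rate"]) + 3.0)
            + abs(params["hidden_units"] - 40) / 100.0)


best_params, best_rmse = search(toy_validation_rmse, n_trials=200, seed=1)
print(best_params, best_rmse)
```

A population-based optimizer such as AE would replace the body of `search` while keeping the same contract: propose candidates inside `SEARCH_SPACE`, score each with the validation fitness, and return the best one for final retraining.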

3.5.3. Case 1: Sunny Days

Under sunny conditions, photovoltaic power output exhibits clear periodicity with smooth power curves, and all models demonstrate basic prediction capability. The prediction results of the eight models are shown in Figure 7. The power curve predicted by Model 8 is the closest to the actual power values: it best captures the trend dynamics of the original curve, and its predictions agree most closely with the actual values in trend direction. Table 2 presents the prediction errors of the models. Model 8 reaches an R² of 0.994, a 20% improvement over the basic LSTM model. The comparative experiments show that the hybrid architecture of BiLSTM and the Transformer is key to the performance improvement: compared with the single models (Models 1–3), Model 5 (BiLSTM–Transformer) improves R², MAE, and RMSE by 7.51–15.96%, 35.97–55.64%, and 37.92–50.99%, respectively. After introducing the AE optimization algorithm, Model 8 improves markedly on the unoptimized Model 5: by the values in Table 2, MAE falls from 0.283 to 0.089 (68.55%) and RMSE from 0.370 to 0.137 (62.97%). It also outperforms the models tuned with the other optimization algorithms (Models 6 and 7).
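The percentages quoted for the sunny-day case are relative changes between entries of Table 2 and can be reproduced with a few lines of arithmetic. The helper name `relative_change` is hypothetical; the numbers are copied from the rows for Models 1, 5, and 8 in Table 2.

```python
def relative_change(baseline, value):
    """Percent change of value relative to baseline (positive means an increase)."""
    return (value - baseline) / baseline * 100.0


# Rows of Table 2 (sunny days) for Models 1, 5, and 8.
TABLE_2 = {
    1: {"R2": 0.827, "MAE": 0.638, "RMSE": 0.755},
    5: {"R2": 0.959, "MAE": 0.283, "RMSE": 0.370},
    8: {"R2": 0.994, "MAE": 0.089, "RMSE": 0.137},
}

r2_gain_vs_lstm = relative_change(TABLE_2[1]["R2"], TABLE_2[8]["R2"])
rmse_cut_vs_lstm = -relative_change(TABLE_2[1]["RMSE"], TABLE_2[8]["RMSE"])
rmse_cut_vs_model5 = -relative_change(TABLE_2[5]["RMSE"], TABLE_2[8]["RMSE"])

print(round(r2_gain_vs_lstm, 2))     # 20.19
print(round(rmse_cut_vs_lstm, 2))    # 81.85
print(round(rmse_cut_vs_model5, 2))  # 62.97
```

Error metrics are reported as reductions, so the sign is flipped for MAE and RMSE, while R² is reported as an increase.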

3.5.4. Case 2: Cloudy Days

On cloudy days, photovoltaic power output exhibits intense short-term fluctuations with pronounced randomness and nonlinearity, leading to a general decline in the prediction performance of all models and to larger differences between them. Figure 8 presents the prediction results under this condition. Model 8 still leads: its prediction curve closely follows the actual values in both overall trend and local fluctuations. The evaluation metrics in Table 3 confirm the predictive capability of the AE-BiLSTM–Transformer model. Compared with the basic LSTM model, the proposed model achieves a 22.4% increase in R², an 84.4% reduction in MAE, and a 76.99% reduction in RMSE. Compared with the single BiLSTM and Transformer models, Model 5 achieves a 24.75% and 40.09% reduction in MAE, respectively. The AE optimization algorithm further amplifies these advantages: compared with Model 5, Model 8 reduces MAE by 72.74%; compared with Models 6 and 7, which are optimized by GWO and WOA, Model 8 reduces MAE by 70.53% and 59.15%, respectively.

3.5.5. Case 3: Overcast Days

Under overcast conditions, photovoltaic power output largely loses its periodicity and exhibits strong randomness and volatility, posing the greatest challenge to the prediction models. Figure 9 presents the prediction results of each model, from which it can be observed that the AE-BiLSTM–Transformer still achieves the best performance. Specifically, Figure 9b shows that during periods of high-frequency, irregular fluctuations, this model still fits the actual power curve well, demonstrating good robustness. Table 4 presents the evaluation metrics of the models. Compared with the basic LSTM model, the proposed model achieves a 47.58% increase in R², a 76.88% reduction in MAE, and a 72.26% reduction in RMSE. As a check on the contribution of each module, Model 5 (BiLSTM–Transformer) achieves a 14.02% increase in R² and a 23.86% reduction in RMSE relative to the single BiLSTM model. The AE optimization algorithm plays a crucial role: compared with Model 5, the MAE of Model 8 drops from 0.502 to 0.172, and the RMSE drops from 0.616 to 0.261. Prediction accuracy is also improved relative to Models 6 and 7.

Comparative experiments show that the model proposed in this paper performs optimally under all weather conditions, even during the high-frequency fluctuation period of overcast days, and its prediction curve can fit the actual values well, reflecting good robustness. These results demonstrate the practical potential of the proposed model in the field of photovoltaic power prediction.

To further validate the reliability of the proposed AE-BiLSTM–Transformer framework and demonstrate its practical improvements in prediction errors, a stability analysis was conducted. The proposed model underwent five independent simulation runs under three typical weather conditions. Statistical results, including the mean and standard deviation of the MAE, are summarized in Table 5.

3.6. Interval Prediction

The results in the previous section show that Model 8 achieves the best prediction performance, outperforming both the basic models and the other hybrid models on all three evaluation metrics under all three weather conditions. In practical engineering scenarios, however, point prediction alone cannot adequately address the uncertainties of system operation. Interval prediction is therefore commonly adopted, because it supplies both a numerical estimate and information about its reliability. It provides the grid operator with reliable prediction intervals and thereby leaves greater fault tolerance in dispatching decisions.

Based on the point prediction model constructed in the previous section, this section applies Adaptive Bandwidth Kernel Density Estimation (ABKDE) and conventional Kernel Density Estimation (KDE) and compares their interval prediction results at different confidence levels, so as to verify the effectiveness of ABKDE. Standard KDE is chosen as the baseline in order to control variables: the comparison isolates and quantifies the performance gain attributable to the adaptive bandwidth mechanism, without the confounding effects that models with fundamentally different architectures would introduce. According to Table 6 and Table 7, ABKDE attains a PICP at or above the nominal confidence level in all nine cases, while KDE does so in all cases except the 95% level on cloudy and overcast days (PICP = 0.932 and 0.929, respectively). Furthermore, in terms of PINAW, ABKDE yields smaller values under all conditions.
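As a rough illustration of how a kernel density estimate of the point-forecast errors can be turned into a prediction interval, the sketch below implements the fixed-bandwidth baseline only. It assumes a Gaussian kernel, the normal-reference rule-of-thumb bandwidth, errors defined as actual minus predicted power, and equal-tailed quantiles. None of these choices is confirmed by this section, all function names are hypothetical, and the adaptive bandwidth selection of ABKDE is not reproduced here.

```python
import math
import statistics


def normal_reference_bandwidth(errors):
    """Rule-of-thumb global bandwidth: 1.06 * sigma * n^(-1/5)."""
    return 1.06 * statistics.stdev(errors) * len(errors) ** (-1 / 5)


def kde_cdf(x, errors, h):
    """CDF of a Gaussian KDE: the average of normal CDFs centred on each error."""
    total = sum(0.5 * (1.0 + math.erf((x - e) / (h * math.sqrt(2.0))))
                for e in errors)
    return total / len(errors)


def kde_quantile(p, errors, h, tol=1e-8):
    """Invert the KDE CDF by bisection (the CDF is monotonically increasing)."""
    lo = min(errors) - 10.0 * h
    hi = max(errors) + 10.0 * h
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if kde_cdf(mid, errors, h) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def prediction_interval(point_forecast, errors, confidence=0.90):
    """Shift the point forecast by the lower and upper error quantiles."""
    h = normal_reference_bandwidth(errors)
    alpha = 1.0 - confidence
    lower = point_forecast + kde_quantile(alpha / 2.0, errors, h)
    upper = point_forecast + kde_quantile(1.0 - alpha / 2.0, errors, h)
    return lower, upper


# Hypothetical historical errors (actual minus predicted) and a new forecast.
historical_errors = [-0.3, -0.1, 0.0, 0.1, 0.3]
for level in (0.95, 0.90, 0.75):
    print(level, prediction_interval(3.2, historical_errors, level))
```

An adaptive variant would replace the single global bandwidth `h` with locally chosen bandwidths, so that the estimated density, and hence the interval, can be tighter where errors are concentrated and wider where they are sparse.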

For instance, at the 90% confidence level on sunny days, the PICP values of KDE and ABKDE are 0.9248 and 0.9210, respectively, both above 0.9, with KDE slightly higher. The corresponding PINAW values, however, are 0.153 and 0.038, a reduction of 75.16% (computed from the rounded values). This demonstrates that, while remaining valid, ABKDE covers the true values with a much narrower interval, that is, it yields sharper prediction intervals.
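The two interval metrics can be computed as in the sketch below. It assumes the commonly used definitions: PICP is the fraction of actual values that fall inside their intervals, and PINAW is the mean interval width divided by the range of the actual values. The paper's own definitions are given in Section 2.5 and should take precedence if they differ. The function names and toy data are illustrative; the last lines reproduce the PINAW reduction quoted above from the sunny-day, 90% entries of Tables 6 and 7.

```python
def picp(actual, lower, upper):
    """Prediction interval coverage probability: share of targets inside the interval."""
    hits = sum(1 for y, lo, hi in zip(actual, lower, upper) if lo <= y <= hi)
    return hits / len(actual)


def pinaw(actual, lower, upper):
    """Mean interval width normalized by the range of the actual values."""
    target_range = max(actual) - min(actual)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(actual)
    return mean_width / target_range


# Toy example: the third actual value falls just below its interval.
actual = [0.0, 2.0, 4.0, 3.0]
lower = [-0.5, 1.5, 4.2, 2.5]
upper = [0.5, 2.5, 5.0, 3.5]
print(picp(actual, lower, upper))             # 0.75
print(round(pinaw(actual, lower, upper), 4))  # 0.2375

# PINAW reduction of ABKDE relative to KDE (sunny days, 90% level).
reduction_rounded = (0.153 - 0.038) / 0.153 * 100.0
reduction_full = (0.153242 - 0.037815) / 0.153242 * 100.0
print(round(reduction_rounded, 2))  # 75.16
print(round(reduction_full, 2))     # 75.32
```

A high PICP alone is not sufficient, since arbitrarily wide intervals achieve it trivially; the two metrics are therefore read together, with PINAW compared only between methods whose PICP meets the nominal level.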

Thus, this paper adopts ABKDE as the interval prediction model and applies it under the three weather conditions at confidence levels of 95%, 90%, and 75%; the results are presented in Figure 10. The figures show that the true values largely fall within the respective prediction intervals. This indicates that ABKDE fits the error distribution well and provides useful interval information for practical power dispatching while avoiding excessively wide intervals, which can in turn reduce the cost of holding reserve capacity.

The reliability of the model can be further demonstrated by analyzing the correlation between meteorological factors and the forecast interval. Under clear conditions, high atmospheric transparency and predictable solar trajectories provide a stable physical baseline. ABKDE employs narrower bandwidths to generate compact intervals, indicating that power grids can achieve minimal reserve capacity during sunny periods. In contrast, the increased interval width observed during cloudy and overcast conditions stems from the random nature of cloud dynamics and fluctuations in irradiance. This variability leads to non-uniform error distribution. By adaptively widening intervals, the model addresses sudden physical risks of power shortages, thereby providing a “safety buffer” for power dispatch to manage frequency stability and prevent potential energy wastage.

To verify the repeatability of the proposed interval prediction method, five independent runs were carried out for different weather categories and confidence levels, and the corresponding mean and standard deviation of PICP and PINAW are listed in Table 8.

4. Conclusions and Discussion

4.1. Conclusions

Photovoltaic power prediction is characterized by strong volatility and randomness. In addition, photovoltaic power output varies significantly under different weather conditions. Meanwhile, point prediction results cannot address the uncertainties in practical prediction scenarios and have a low fault tolerance rate. Considering the above characteristics, this paper proposes a hybrid probabilistic prediction model based on similar-day clustering, namely AE-BiLSTM–Transformer–ABKDE. The performance of the model is verified through experiments on photovoltaic data from the DKASC, and the following conclusions are drawn:

By extracting bidirectional local temporal variations and global sequence dependencies, the BiLSTM–Transformer prediction model demonstrates high adaptability to power fluctuations. With AE optimization, this hybrid architecture attains an R² of at least 0.974 in all three scenarios (up to 0.994 on sunny days), outperforming the standalone BiLSTM and Transformer models and demonstrating robust feature representation capability.
The Alpha Evolution (AE) algorithm plays a key role in the hyperparameter optimization of the model. By virtue of the Alpha operator, AE achieves a balance between global exploration and local exploitation. Under cloudy conditions, the AE-optimized framework reduces MAE and RMSE by 72.74% and 63.52%, respectively, compared to the unoptimized baseline model.
By adaptively adjusting the bandwidth based on local data density, the ABKDE module quantifies the uncertainty of the prediction errors. At the 95% confidence level on sunny days, the proposed model achieves a prediction interval coverage probability (PICP) of 95.99%, meeting the reliability requirement, with a prediction interval normalized average width (PINAW) of 0.076, about half that of conventional KDE (0.156).

4.2. Discussions

The experimental results of this paper not only validate the predictive accuracy of the proposed model but also reveal its effective capture of the intrinsic physical mechanisms underlying photovoltaic power generation. The BiLSTM layer learns bidirectional temporal relationships: its forward layer reflects the historical cumulative effects of meteorological factors, while the backward layer accounts for characteristics such as the natural decay patterns of irradiance during sunset phases. Meanwhile, the self-attention mechanism of the Transformer encoder achieves precise tracking of diverse physical phenomena by identifying both global baseline trends and short-term fluctuations within long sequences. Photovoltaic power exhibits extreme volatility, particularly in cloudy and overcast conditions where output becomes highly random. This results in a hyperparameter search space containing numerous local optima. The Alpha operator in the AE algorithm balances global exploration and local refinement through a dynamic decay factor, enabling it to locate global optima more effectively than traditional algorithms. In interval forecasting, ABKDE effectively addresses the non-uniform distribution of photovoltaic prediction errors. By selecting a bandwidth based on error distribution characteristics, it generates reliable and compact prediction intervals. This approach reduces the need for excessive spinning reserve capacity in engineering applications, thereby lowering operational costs and preventing curtailment caused by forecasting deviations.

Although the proposed AE-BiLSTM–Transformer–ABKDE framework demonstrates high accuracy and reliability, its applicability boundaries and limitations must be explicitly acknowledged to guide future engineering practices.

First, the current experimental validation is based solely on a single site dataset from the DKASC and originates from a single year. The model’s generalization capability under different climatic conditions, geographic locations, and years remains unvalidated. Second, although L2 regularization and dropout strategies were employed during BiLSTM–Transformer network training, the high structural complexity and vast parameter space of this hybrid model may still lead to overfitting when applied to newly constructed PV plants with limited historical data or significantly different data distributions. Third, while the model demonstrates robustness under typical sunny, cloudy, and overcast weather scenarios, its sensitivity and prediction stability under extreme meteorological anomalies remain understudied. Furthermore, while a small number of missing values can be addressed using standard interpolation methods, the resilience of this framework in handling prolonged continuous data loss—such as 10 to 20 missing time steps caused by sensor failure—has not yet been comprehensively studied.

Therefore, to address the aforementioned limitations, our future research will focus on the following aspects. First, we will employ multi-site, multi-year datasets for cross-validation to ensure generalization capabilities. Additionally, to mitigate overfitting risks in data-scarce scenarios, we plan to incorporate transfer learning techniques and explore adaptive adjustments for lightweight networks. Moreover, we will validate the stability of predictive models under persistent data gaps across various meteorological conditions. Finally, future work will concentrate on applying predictive models to extreme weather scenarios, enhancing robustness under extreme conditions, and designing mechanisms capable of adapting to severe instantaneous anomalies.

Author Contributions

Conceptualization, L.L.; methodology, L.L.; software, L.L.; validation, L.L.; formal analysis, L.L.; investigation, L.L.; resources, Z.L.; data curation, L.L.; writing—original draft preparation, L.L.; writing—review and editing, Z.L.; visualization, L.L.; supervision, Z.L.; project administration, Z.L.; funding acquisition, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shanghai Key Laboratory of Power Station Automation Technology, grant number 13DZ2273800.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.


Figure 1. The flowchart of AE.

Figure 2. BiLSTM architecture.

Figure 3. Transformer encoder model structure.

Figure 5. Correlation between PV power and various meteorological factors.

Figure 6. Schematic representation of the three types of similar day.

Figure 7. Point prediction results for sunny day.

Figure 8. Point prediction results for cloudy day.

Figure 9. Point prediction results for overcast days.

Figure 10. Three kinds of weather ((a) sunny; (b) cloudy; (c) overcast) interval prediction based on ABKDE.

Table 1. Parameters setting of models.

No.	Model	Parameter	Value
1	LSTM	Number of hidden layer nodes	20
		Learning rate	1 × 10⁻⁴
		L2 regularization coefficient	1 × 10⁻⁵
2	BiLSTM	Number of hidden layer nodes	20
		Learning rate	1 × 10⁻⁴
		L2 regularization coefficient	1 × 10⁻⁵
3	Transformer	Number of heads in self-attention	3
		Learning rate	1 × 10⁻⁴
		L2 regularization coefficient	1 × 10⁻⁵
4	LSTM–Transformer	Number of heads in self-attention	3
		Number of hidden layer nodes	20
		Learning rate	1 × 10⁻⁴
		L2 regularization coefficient	1 × 10⁻⁵
5	BiLSTM–Transformer	Number of heads in self-attention	3
		Number of hidden layer nodes	20
		Learning rate	1 × 10⁻⁴
		L2 regularization coefficient	1 × 10⁻⁵
6	GWO-BiLSTM–Transformer	The initial hyperparameters are the same as above. The optimal parameters are obtained via the GWO algorithm.
7	WOA-BiLSTM–Transformer	The initial hyperparameters are the same as above. The optimal parameters are obtained via the WOA.
8	AE-BiLSTM–Transformer	The initial hyperparameters are the same as above. The optimal parameters are obtained via the AE algorithm.

Table 2. Comparison of the performance of 8 point prediction models for sunny days.

Model	R²	MAE	RMSE
1	0.827	0.638	0.755
2	0.892	0.442	0.596
3	0.870	0.459	0.654
4	0.912	0.377	0.537
5	0.959	0.283	0.370
6	0.965	0.241	0.340
7	0.971	0.204	0.309
8	0.994	0.089	0.137

Table 3. Comparison of the performance of 8 point prediction models for cloudy days.

Model	R²	MAE	RMSE
1	0.808	0.927	1.130
2	0.879	0.707	0.898
3	0.857	0.888	0.975
4	0.891	0.666	0.855
5	0.924	0.532	0.713
6	0.946	0.492	0.601
7	0.955	0.355	0.550
8	0.989	0.145	0.260

Table 4. Comparison of the performance of 8 point prediction models for overcast days.

Model	R²	MAE	RMSE
1	0.660	0.744	0.941
2	0.749	0.654	0.809
3	0.799	0.575	0.723
4	0.844	0.512	0.637
5	0.854	0.502	0.616
6	0.935	0.337	0.412
7	0.932	0.372	0.420
8	0.974	0.172	0.261

Table 5. Stability analysis of MAE over 5 independent runs across different weather conditions.

Weather Condition	Mean of MAE	Standard Deviation
Sunny	0.089	0.012
Cloudy	0.145	0.023
Overcast	0.172	0.028

Table 6. Interval prediction performance of ABKDE for 3 weather conditions.

Method	Weather	Confidence Level	PICP	PINAW
ABKDE	Sunny	95%	0.959850	0.076053
		90%	0.921053	0.037815
		75%	0.770677	0.025211
	Cloudy	95%	0.958647	0.119257
		90%	0.917293	0.085676
		75%	0.759398	0.036970
	Overcast	95%	0.954887	0.156623
		90%	0.917293	0.114989
		75%	0.770677	0.064833

Table 7. Interval prediction performance of KDE for 3 weather conditions.

Method	Weather	Confidence Level	PICP	PINAW
KDE	Sunny	95%	0.954887	0.155564
		90%	0.924812	0.153242
		75%	0.770677	0.027862
	Cloudy	95%	0.932331	0.132928
		90%	0.906015	0.115112
		75%	0.751880	0.038332
	Overcast	95%	0.928571	0.200867
		90%	0.917293	0.194959
		75%	0.766917	0.082032

Table 8. Stability analysis of PICP and PINAW of ABKDE over 5 independent runs across different weather conditions.

Weather	Confidence Level	Mean of PICP	Std Dev of PICP	Mean of PINAW	Std Dev of PINAW
Sunny	95%	0.959850	0.003	0.076053	0.006
	90%	0.921053	0.008	0.037815	0.008
	75%	0.770677	0.004	0.025211	0.002
Cloudy	95%	0.958647	0.007	0.119257	0.012
	90%	0.917293	0.012	0.085676	0.009
	75%	0.759398	0.006	0.036970	0.002
Overcast	95%	0.954887	0.004	0.156623	0.014
	90%	0.917293	0.011	0.114989	0.013
	75%	0.770677	0.009	0.064833	0.005

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
