1. Introduction
Solar energy, as a clean and renewable resource, is playing an increasingly vital role in the global energy structure. Faced with the serious challenges of climate change and environmental degradation, taking full advantage of sustainable energy resources and achieving green, low-carbon development has become a global consensus [1]. The Paris Agreement [2] and the “dual carbon” targets [3] push forward the reduction in greenhouse gas emissions and accelerate the integration of renewable energy. Under these circumstances, photovoltaic (PV) power generation has emerged as a key driver of the global energy transition, owing to its advantages of low carbon emissions, pollution-free operation, and abundant resource availability. Thus, the installation and integration of PV systems are growing rapidly [4,5,6]. However, the output of PV power generation is fluctuating and intermittent, as it is largely influenced by factors such as solar radiation, temperature, and humidity. This uncertainty poses significant challenges to grid stability and energy management. Therefore, accurate prediction of PV power generation is crucial for maximizing integration capacity, ensuring efficient power dispatch, and optimizing energy utilization [7,8,9].
Generally, photovoltaic power generation forecasting methods are classified into indirect methods [10] and direct methods [11,12]. The indirect forecasting method, also known as the physical forecasting method, is based on weather forecast information provided by meteorological stations. It requires data on solar irradiance, wind speed, humidity, and temperature, as well as the installation angle and location of the photovoltaic panels and the conversion efficiency of the solar cells. A physical model is constructed with the above parameters, and photovoltaic power generation for a specific time period can be calculated using this model and the relevant formulas [13,14]. Alam et al. [15] employed three broadband irradiance models to predict solar irradiance at four sites in India. The accuracy of each model was evaluated by comparing the calculated direct normal irradiance and global irradiance with reference and measured values. Tao et al. [16] combined transformer-based physical modeling with data-driven forecasting methods by introducing PV-related transformer variables (such as current and voltage) as key features and using physical parameters to model the relationship between environmental conditions and PV module output. Historically observed data and weather forecasting data were then used in conjunction with a data-driven temporal feature extraction network to achieve multi-step forecasting. In another study [17], a dynamic physical modeling method called PVPro was introduced to achieve high-accuracy short-term power forecasting. It adjusts the model parameters based on recent production data to convert environmental data into PV system output power. Singla et al. [18] and Arimatsu et al. [19] constructed approximate physical models of photovoltaic power generation and established equivalent circuit models of photovoltaic arrays. These models calculate the output power of photovoltaic systems based on the electrical characteristics of solar cells and environmental factors, combined with variables such as solar irradiance and temperature. Although the indirect forecasting method does not require detailed historical data to train a forecasting model, it relies heavily on detailed geographic information about the power station and accurate meteorological data. Moreover, the physical formulas carry inherent errors, which lead to poor resistance to interference and weak robustness [10].
Direct forecasting methods include statistical forecasting methods and artificial intelligence forecasting methods. Statistical forecasting methods use historical data on weather, solar radiation, etc., to perform curve fitting and establish an input–output mapping model for photovoltaic power generation forecasting [20]. In an experimental study [21], a method based on the Markov chain model was presented to predict the short-term output power of photovoltaic systems. By analyzing historical data from photovoltaic power stations, a mathematical statistical model was constructed, which effectively improved the accuracy of output power prediction under dynamic changes in the solar environment. Li et al. [22] proposed an ARMAX model, which combines historical power output data from photovoltaic systems with external meteorological factors to predict the short-term output power of grid-connected photovoltaic systems. Yoo et al. [23] introduced a federated learning algorithm based on fuzzy clustering, which improves the accuracy and flexibility of solar power forecasting by clustering data between distributed generators. Since statistical forecasting methods do not need to consider the installation conditions of photovoltaic panels or the conversion efficiency of photovoltaic arrays, they are simpler to implement than physical methods. However, most statistical forecasting methods are linear, which is unfavorable for long-term or large-scale photovoltaic power generation prediction. These models also rely on large amounts of valid historical data, which limits their prediction performance.
With the rapid development of artificial intelligence (AI) technologies, their applications in PV power forecasting have attracted extensive attention and research. Current forecasting methods include support vector machines (SVM) in machine learning [20], shallow neural network models, and deep learning models [24,25]. A BP neural network model was constructed [26] using historical power generation data from PV systems and related meteorological data to perform short-term power generation forecasting. The model utilized the strong nonlinear processing capability of a neural network to effectively address the complex influencing factors in PV power generation, thereby improving forecasting accuracy and model robustness. Khan et al. [27] employed an ensemble stacking model based on deep learning for short-term solar PV power forecasting. This method combined gradient boosting decision trees with an artificial neural network and a long short-term memory (LSTM) network. By stacking different forecasting models, the strengths of each model can be integrated, improving overall forecasting accuracy. Neshat et al. [28] proposed a hybrid recurrent network combining a deep residual learning network with a gated long short-term memory recurrent network. This method used residual learning to enhance the neural network’s ability to process time series data, while the synergistic architecture of gated recurrent units (GRU) and LSTM was used to capture complex nonlinear dynamic patterns.
However, accurately predicting PV power generation using AI models is highly dependent on initial parameter settings, and manually tuning these parameters is extremely time-consuming. Even after training, the model may still suffer from overfitting or underfitting, resulting in poor robustness. Therefore, the optimization of model parameters is an important research area. Reference [19] proposed an improved squirrel search algorithm with multiple strategies to optimize the kernel function parameters and penalty coefficients of the SVM. Experimental results show that the forecasting accuracy of the proposed model outperformed that of other traditional models. A hybrid model combining GRU and SVM for short-term PV power forecasting is introduced in [29]. The authors used GRU to process time series data and then employed SVM for final regression prediction, with the ant colony algorithm used to optimize the hyperparameters of the hybrid model. Alrashidi et al. [30] proposed a novel forecasting framework based on a hybrid data-driven model that integrates support vector regression and an artificial neural network with different metaheuristic optimization algorithms (social spider optimization, particle swarm optimization, and cuckoo search optimization) for comparison. The experimental results demonstrate that optimal choices of hyperparameters and structure play a crucial role in achieving accurate prediction results. It is worth noting that current research tends to focus on optimization using a single algorithm. The inherent deficiencies of a single algorithm, such as local convergence or premature convergence, may result in suboptimal final solutions.
In summary, to optimize hyperparameters and improve forecasting accuracy, a short-term PV power-prediction method is proposed based on a bidirectional long short-term memory (BiLSTM) network optimized by a hybrid genetic algorithm-adaptive multi-objective differential evolution (GA-AMODE) algorithm. The main contributions of this work are summarized as follows.
(1) Considering the PV power forecasting problem, a series of data preprocessing procedures is employed, including missing value handling, Min–Max normalization, principal component analysis (PCA), a sliding window mechanism, and Gaussian noise injection. PCA reduces data dimensions and assigns weights to features, reducing the impact of redundant features with low contributions to prediction accuracy. The injection of Gaussian noise enhances data robustness. Thus, the data have better quality and consistency for the subsequent power forecasting.
(2) The key parameters of the BiLSTM neural network are the number of neurons and the learning rate. GA has great global searching ability, which suits optimization in complex searching spaces, while AMODE shows better local searching ability through differential evolution and adaptive mechanisms. The integration of GA and AMODE achieves global searching and local refining simultaneously, which improves the learning performance of BiLSTM.
(3) Owing to its bidirectional characteristics, BiLSTM is suitable for tasks with strong temporal dependencies. With the hyperparameters optimized using GA-AMODE, better data training and learning capability is obtained. The proposed GA-AMODE-BiLSTM model can achieve accurate short-term PV power forecasting with a correlation coefficient of 0.990.
(4) Using R2, RMSE, and MAE as evaluation indexes, the forecasting accuracy of the proposed model is compared with that of other models. A T-test and a KS-test are performed in tandem to further validate the model’s generalization ability. The results indicate that the proposed method has superior forecasting performance.
The remainder of this paper is arranged as follows. The data preprocessing method is described in Section 2. The modeling of GA-AMODE-BiLSTM is elaborated in Section 3. The forecasting simulation results are presented and discussed in Section 4. The conclusion is given in Section 5.
2. Data Preprocessing
Data preprocessing is a crucial procedure for forecasting, directly related to the accuracy and stability of the model. According to the characteristics of the original data and the prediction requirements, a series of data preprocessing operations is adopted, including missing value handling, Min–Max normalization, PCA, sliding window, and Gaussian noise. These operations tackle missing data, unify data scales, reduce data dimensions and assign feature weights, capture local temporal patterns, and enhance data robustness, respectively. After preprocessing, the data have better quality and consistency, making them better suited to the subsequent BiLSTM prediction model.
2.1. Handling of Missing Values
In time series data, missing values can lead to inaccurate model training and distorted experimental results. “Fillna” is a commonly used method in data processing, especially for handling missing values in DataFrame or Series objects; it replaces missing values with a specified value or a computed result. Here, the “Fillna” method adopts the forward-fill strategy: the prior data in the time series are used to fill in missing values rapidly, thereby ensuring data continuity and consistency. The calculation equation is as follows:
$$x_t = \begin{cases} x_t, & \text{if } x_t \text{ is not missing} \\ x_{t-1}, & \text{if } x_t \text{ is missing} \end{cases} \quad (1)$$
where $x_t$ is the data at moment $t$. If the value is not missing, it remains unchanged; if it is missing, it is filled with the nearest preceding non-missing value ($x_{t-1}$ or an earlier value).
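As an illustration, the forward-fill operation can be performed in one line with pandas; the following minimal sketch uses hypothetical column names and values:

```python
import pandas as pd

# Hypothetical PV readings with gaps; column names are illustrative only
df = pd.DataFrame({
    "power_kw":   [3.2, None, 3.5, None, 3.8],
    "irradiance": [410.0, 415.0, None, 430.0, 440.0],
})

# Forward fill (the "ffill" strategy of fillna): each missing value is
# replaced by the nearest preceding non-missing value, as in Equation (1)
df_filled = df.ffill()
print(df_filled)
```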
2.2. Min–Max Normalization
Normalization prevents features with larger ranges from dominating model training. It can also accelerate the convergence speed and improve both training efficiency and model stability. All data are rescaled to a consistent dimension, ensuring that all feature values are within similar ranges.
The Min–Max normalization method adopted in this paper is a technique that linearly transforms the data into a specified range (usually [0, 1]). It matches the BiLSTM activation function requirements for the input data range. Importantly, it preserves the shape of the original data distribution, which is crucial for algorithms that rely on the relationships between data features. As shown in Equation (2),
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)} \quad (2)$$
wherein $x$ is the original data, $x'$ is the normalized value, and $\max(x)$ and $\min(x)$ are the maximum and minimum values in the dataset, respectively.
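A minimal sketch of Equation (2) and its inverse transform (needed later to map predictions back to the original power scale), with illustrative values:

```python
import numpy as np

def min_max_normalize(x: np.ndarray):
    """Linearly rescale each feature to [0, 1], as in Equation (2)."""
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    x_scaled = (x - x_min) / (x_max - x_min)
    # Keep min/max so predictions can be mapped back to physical units
    return x_scaled, x_min, x_max

data = np.array([[200.0, 15.0], [800.0, 30.0], [500.0, 22.5]])
scaled, lo, hi = min_max_normalize(data)
restored = scaled * (hi - lo) + lo   # inverse transform: x = x'(max - min) + min
```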
2.3. Principal Component Analysis
PCA is a commonly used dimensionality reduction method in data preprocessing. In time series forecasting, it can serve as an auxiliary tool to reduce noise. Since the original input data contain many correlated features, PCA can discard redundancy while retaining the important information in the original data, thereby improving the computational efficiency and performance of the model.
The basic idea of PCA is to find a new set of basis vectors, which are called “principal components.” These principal components are ordered based on the variance in the dataset. The first principal component captures the largest variance in the data, the second principal component captures the largest variance from the remaining portion, and so on. The calculation steps are as follows.
(1) Data centering
$$X' = X - \mathrm{mean}(X) \quad (3)$$
Wherein $X \in \mathbb{R}^{m \times n}$, $m$ is the number of samples, $n$ is the number of features, and $\mathrm{mean}(X)$ represents the mean value of each feature in the data matrix $X$.
(2) Calculating the covariance matrix
$$C = \frac{1}{m-1} X'^{T} X' \quad (4)$$
Calculate the eigenvalues and the corresponding eigenvectors of the covariance matrix $C$. Then, select the eigenvectors corresponding to the top $k$ largest eigenvalues to form a new feature space.
(3) Data projection onto the new feature space
$$Y = X' W \quad (5)$$
Wherein $W = [w_1, w_2, \ldots, w_k]$ is the matrix composed of the top $k$ eigenvectors, and $Y$ is the data after dimensionality reduction.
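The three steps can be sketched directly in NumPy; the sample data below are synthetic, and in practice a library implementation such as sklearn.decomposition.PCA would typically be used:

```python
import numpy as np

def pca(X: np.ndarray, k: int) -> np.ndarray:
    """Steps (1)-(3): center, form covariance, eigen-decompose, project."""
    X_c = X - X.mean(axis=0)              # (1) centering, Eq. (3)
    C = np.cov(X_c, rowvar=False)         # (2) covariance, Eq. (4)
    eigvals, eigvecs = np.linalg.eigh(C)  # eigendecomposition (symmetric C)
    order = np.argsort(eigvals)[::-1][:k] # indices of the top-k eigenvalues
    W = eigvecs[:, order]                 # principal components
    return X_c @ W                        # (3) projection, Eq. (5)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))   # 100 samples, 6 features
Y = pca(X, k=3)                 # reduced to 3 principal components
```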
2.4. Time Series Processing
The sliding window is a technique used for processing and modeling time series data, which effectively captures local patterns and changing trends in the sequence. The core idea is to divide the time series into multiple consecutive fixed-length windows, with each window serving as an independent input sample for model training or prediction. By adjusting the starting and ending points of the windows, data can be processed and analyzed efficiently. Given the short sampling intervals of the time series data in this paper, this technique enables the model to learn as much as possible about the trends and patterns in the series, thereby improving prediction accuracy. These steps are shown in Figure 1.
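A minimal sketch of the sliding window construction, assuming a univariate series and a one-step-ahead horizon:

```python
import numpy as np

def make_windows(series: np.ndarray, window: int, horizon: int = 1):
    """Split a time series into fixed-length input windows and targets."""
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])          # model input
        y.append(series[start + window + horizon - 1])  # value to predict
    return np.array(X), np.array(y)

power = np.sin(np.linspace(0, 20, 200))  # stand-in for a PV output series
X, y = make_windows(power, window=24)    # 24 past steps -> next step
print(X.shape, y.shape)                  # (176, 24) (176,)
```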
2.5. Gaussian Noise
Gaussian noise, also known as normal noise, is random noise that follows a normal (Gaussian) distribution, as shown in Equation (6). In the field of machine learning, Gaussian noise is often used for data augmentation. Owing to the uncertainty of solar power output, adding Gaussian noise to the input data when training a neural network for prediction can improve the model’s robustness and generalization; it prevents overfitting and enhances prediction accuracy.
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \quad (6)$$
Therein, $x$ is the random variable, $\mu$ is the mean value, and $\sigma^2$ is the variance.
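A minimal augmentation sketch following Equation (6); the noise level sigma is an assumed value and would be tuned to the normalized data scale:

```python
import numpy as np

def add_gaussian_noise(X: np.ndarray, sigma: float = 0.01, seed: int = 0):
    """Augment training inputs with zero-mean Gaussian noise, per Eq. (6)."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(loc=0.0, scale=sigma, size=X.shape)

X_train = np.random.rand(176, 24)      # e.g., normalized sliding windows
X_noisy = add_gaussian_noise(X_train)  # applied to the training data only
```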
3. Modeling of PV Power Forecasting
The PV prediction model based on GA-AMODE-BiLSTM efficiently combines the advantages of the BiLSTM network, the GA, and the AMODE algorithm. The modeling scheme not only performs efficient global searching, but also has strong local optimization capabilities. First, the BiLSTM network is a key component for time series data prediction. Compared to traditional LSTM, BiLSTM can learn from both past and future states, making it more suitable for tasks with strong temporal dependencies, such as photovoltaic power prediction. However, the performance of the BiLSTM model strongly depends on the selection of hyperparameters, such as the learning rate and the number of neurons in the hidden layers. These two parameters strongly affect the fitting ability, decision boundaries, and generalization ability of the neural network, and therefore need to be optimized. Thus, GA and AMODE are employed. The GA has strong global searching capabilities, which makes it suitable for finding optimal solutions in complex searching spaces; thus, it is used for global hyperparameter tuning. However, GA may overlook some excellent individuals during a more elaborate local search. The AMODE algorithm, well suited to bi-objective optimization tasks, has fast and accurate local searching capabilities; it can perform refined searches within the neighborhood of solutions through differential evolution and adaptive mechanisms. By jointly using GA and AMODE for hyperparameter tuning, the optimization efficiency of the BiLSTM model is significantly improved, ensuring that an optimal solution is found and further refined.
3.1. BiLSTM Neural Network
The Long Short-Term Memory (LSTM) neural network is a special type of recurrent neural network (RNN). It is primarily used for processing and predicting time series data. It addresses the issues of gradient vanishing and exploding that traditional RNNs face when dealing with long sequences. By designing a unique cell structure, LSTM can effectively remember long-term dependency information while ignoring irrelevant information. The structure of LSTM is shown in Figure 2.
The core of the LSTM network is the cell state, which controls the flow of information through gating mechanisms. As shown in Figure 2, it mainly consists of an input gate, a forget gate, and an output gate. The input gate updates the cell state primarily through the sigmoid activation function and tanh activation function layer, controlling whether the memory cell accepts the current information. This process is represented by Equations (7)–(9).
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \quad (7)$$
$$\sigma(x) = \frac{1}{1 + e^{-x}} \quad (8)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \quad (9)$$
wherein $i_t$ is the output of the input gate; $\sigma$ is the sigmoid function shown in Equation (8); $W$ is the weight coefficient matrix; $x_t$ is the input vector; $b$ is the bias term; $\tilde{C}_t$, the output of the tanh function shown in Equation (9), is the candidate state of the memory cell at the current time step; and $h_{t-1}$ is the output state of the neuron from the previous time step ($t-1$).
The value input to the forget gate is the current input and the previous hidden state. The function of the forget gate is to decide whether to retain or discard the information. This process can identify relevant feature information from the solar power output data and filter out irrelevant information, as shown in Equation (10).
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \quad (10)$$
wherein $f_t$ is the output of the forget gate.
The previously input information is contained in the hidden state. The output gate determines what information to output as the value of the next hidden state, as expressed in Equation (11).
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \quad (11)$$
wherein $o_t$ is the output of the output gate.
A traditional LSTM network can only predict the output at the current time step based on the information from previous time steps, thus only capturing unidirectional dependencies. In many tasks, the LSTM network may overlook crucial information from the future. The BiLSTM network can leverage both forward and backward context information. Thus, the current result is predicted not only from past states, but also from future states. Compared to LSTM, BiLSTM is more suitable for complex time series tasks involving long-term dependencies and asymmetric relationships. By capturing information from both ends of the sequence, the model accuracy is enhanced. The structure of the BiLSTM network is shown in Figure 3.
The core idea of BiLSTM is to process the input data both forward and backward through two separate LSTM networks. As shown in Figure 3, the update equations for the BiLSTM network are given by Equation (12).
$$\overrightarrow{h}_t = \mathrm{LSTM}(x_t, \overrightarrow{h}_{t-1}), \quad \overleftarrow{h}_t = \mathrm{LSTM}(x_t, \overleftarrow{h}_{t+1}), \quad y_t = W_{\overrightarrow{h}y}\overrightarrow{h}_t + W_{\overleftarrow{h}y}\overleftarrow{h}_t + b_y \quad (12)$$
wherein $\overrightarrow{h}_t$ is the output of the forward LSTM, $\overleftarrow{h}_t$ is the output of the backward LSTM, $y_t$ is the output of BiLSTM, $W_{\overrightarrow{h}y}$ and $W_{\overleftarrow{h}y}$ are the output layer connection weight matrices, and $b_y$ is the bias vector of BiLSTM.
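For concreteness, a minimal PyTorch sketch of a BiLSTM regressor of this kind is given below; the layer sizes, the last-time-step readout, and the variable names are illustrative assumptions rather than the paper’s exact architecture:

```python
import torch
import torch.nn as nn

class BiLSTMForecaster(nn.Module):
    """Bidirectional LSTM regressor; hidden_size and the training learning
    rate are the two hyperparameters later tuned by GA-AMODE."""
    def __init__(self, n_features: int, hidden_size: int):
        super().__init__()
        self.bilstm = nn.LSTM(input_size=n_features,
                              hidden_size=hidden_size,
                              batch_first=True,
                              bidirectional=True)
        # forward and backward hidden states are concatenated (cf. Eq. (12))
        self.head = nn.Linear(2 * hidden_size, 1)

    def forward(self, x):                # x: (batch, window, n_features)
        out, _ = self.bilstm(x)          # out: (batch, window, 2*hidden_size)
        return self.head(out[:, -1, :])  # predict from the last time step

model = BiLSTMForecaster(n_features=5, hidden_size=64)
y_hat = model(torch.randn(32, 24, 5))    # batch of 24-step windows -> (32, 1)
```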
3.2. Genetic Algorithm
GA is a heuristic searching algorithm based on the principles of natural selection and genetics, aimed at finding approximate solutions to optimization problems in complex search spaces. It gradually optimizes candidate solutions by simulating mechanisms such as selection, crossover, and mutation from the biological evolution process. The advantage of the genetic algorithm lies in its strong global searching capability, helping to avoid local optima in complex multi-modal searching spaces. The computational process of GA for hyperparameter tuning is as follows.
(1) Population initialization
The initial population is a set of solutions (individuals) that is randomly generated. If an individual is represented by a binary string of length $L$ and the population size is $N$, the initialization process can be expressed as follows:
$$p_{i,j} = \mathrm{round}(\mathrm{rand}(0,1)), \quad i = 1, 2, \ldots, N; \; j = 1, 2, \ldots, L \quad (13)$$
In the equation, $p_{i,j}$ represents the $j$-th gene of the $i$-th individual.
(2) Fitness evaluation
The hyperparameters are optimized by GA, and the mean squared error (MSE) of the validation data is calculated as the fitness value, as shown in Equation (14).
$$\mathrm{fitness} = \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \quad (14)$$
wherein $N$ is the number of samples in the validation set, and $y_i$ and $\hat{y}_i$ are the true values and predicted values of the validation set, respectively.
(3) Selection, crossover, mutation
The fitness value of each individual is calculated, and the population is sorted in ascending order of fitness values using the elitism selection strategy. The top 50% of individuals are then selected to pass on to the next generation.
The single-point crossover strategy is adopted for each hyperparameter (learning rate and number of neurons), as shown in Equation (15):
$$c_1^{(i)} = \begin{cases} p_1^{(i)}, & i \le k \\ p_2^{(i)}, & i > k \end{cases}, \qquad c_2^{(i)} = \begin{cases} p_2^{(i)}, & i \le k \\ p_1^{(i)}, & i > k \end{cases} \quad (15)$$
wherein $c_1^{(i)}$ and $c_2^{(i)}$ represent the gene values of offspring 1 and offspring 2 at the $i$-th gene position, respectively; $p_1^{(i)}$ and $p_2^{(i)}$ represent the gene values of parent 1 and parent 2 at the $i$-th gene position, respectively; $k$ is the randomly selected crossover point, satisfying $1 \le k < n$; and $n$ is the gene sequence length of each individual.
A dynamic mutation rate is used for the mutation procedure. The mutation rate decreases gradually as the number of generations increases, as shown in Equation (16):
$$\eta = \beta \left(1 - \frac{g}{g_s}\right) \quad (16)$$
wherein $\beta$ is the initial mutation rate, $\eta$ is the current mutation rate, and $g$ and $g_s$ are the current generation of the population and the total number of generations, respectively. When a random number is smaller than the mutation rate, the number of neurons and the learning rate of the individual will be randomly regenerated.
(4) Repeat steps 2 and 3: after the selection, crossover, and mutation operations, replace the old population with the new offspring individuals until the termination condition is met, i.e., the maximum number of iterations is reached.
Due to the relatively weak local searching capability of GA, it may suffer from premature convergence. As a result, combining GA with an optimization algorithm that has stronger local searching capability can yield a more accurate population, accelerate convergence, and prevent premature convergence and entrapment in local optima.
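A compact sketch of the GA loop described above, applied to the two BiLSTM hyperparameters; the search bounds, population size, and initial mutation rate are assumed values, and fitness is any callable returning the validation MSE of Equation (14):

```python
import random

BOUNDS = {"neurons": (16, 256), "lr": (1e-4, 1e-2)}  # assumed search ranges

def random_individual():
    return {"neurons": random.randint(*BOUNDS["neurons"]),
            "lr": random.uniform(*BOUNDS["lr"])}

def evolve(fitness, pop_size=20, generations=30, beta=0.3):
    pop = [random_individual() for _ in range(pop_size)]
    for g in range(generations):
        pop.sort(key=fitness)                    # ascending validation MSE
        elite = pop[:pop_size // 2]              # elitism: keep the top 50%
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = random.sample(elite, 2)
            # single-point crossover over the two genes (cf. Eq. (15))
            child = {"neurons": p1["neurons"], "lr": p2["lr"]}
            # dynamic mutation rate decaying with generation (cf. Eq. (16))
            if random.random() < beta * (1 - g / generations):
                child = random_individual()      # regenerate mutated genes
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)
```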
3.3. Adaptive Multi-Objective Differential Evolution
The AMODE algorithm is a novel intelligent optimization algorithm [31], primarily used for solving multi-objective optimization problems, especially two-objective optimization. It combines multi-objective optimization with differential evolution (DE) strategies [32], balancing global and local searching capabilities through adaptive mechanisms in the solution space. When dealing with two-objective optimization problems, the AMODE algorithm typically performs better in balancing convergence and solution diversity. For two-objective cases, the Pareto front is a two-dimensional curve; the algorithm can quickly find the distribution of solutions and effectively maintain diversity on the Pareto front. Additionally, its computational complexity is relatively low. The computational process of AMODE for hyperparameter tuning is as follows.
(1) Initialize the population and external archive
An initial population is randomly generated, with each individual representing a possible solution. Each individual is composed of multiple decision variables (genes). The external archive is used to store the non-dominated solutions (Pareto front) from the current population. The external archive can be initialized as an empty set or by filtering non-dominated solutions from the initial population and storing them. This archive is then used to maintain the solution archive and store Pareto optimal solutions during subsequent population updates. The population initialization is given by Equation (17).
$$X_i = X_{\min} + \mathrm{rand}(0,1) \cdot (X_{\max} - X_{\min}) \quad (17)$$
wherein $X_i$ is the decision variable vector of the $i$-th individual, $X_{\min}$ and $X_{\max}$ are the lower and upper limits of the decision variables, and $\mathrm{rand}(0,1)$ is a randomly generated number uniformly distributed in [0, 1].
(2) Differential Evolution
DE is a population-based global optimization algorithm. It is commonly applied to function optimization problems in continuous spaces. Each individual is updated through mutation, crossover, and selection. These steps are repeated until the termination condition is satisfied.
The mutation step involves a linear combination of individuals in the current population. The mutant
Vi for each individual is generated by Equations (18) and (19).
wherein
Xr1,
Xr2, and
Xr3 are three different individuals randomly selected from the population.
F is the scaling factor used to control the magnitude of mutation.
The crossover operation recombines the genes of the mutant individual with the current individual, thereby generating more diverse candidate solutions. Binomial crossover is used for each dimension to decide whether to retain the gene of the current individual or the mutant. The calculation formulas are shown in Equations (20) and (21).
$$U_{i,j} = V_{i,j}, \quad \text{if } \mathrm{rand}_j(0,1) \le CR \text{ or } j = j_{rand} \quad (20)$$
$$U_{i,j} = X_{i,j}, \quad \text{otherwise} \quad (21)$$
wherein $U_{i,j}$ is the value of the test individual $U_i$ at the $j$-th dimension; $V_{i,j}$ is the value of the mutant $V_i$ at the $j$-th dimension; $CR$ is the crossover probability, which is used to control the likelihood of inheriting genes from the mutant; and $j_{rand}$ is a randomly selected dimension that ensures that at least one dimension comes from the mutant $V_i$.
The selection operation retains the superior solutions, gradually improving the overall quality of the population. A greedy selection strategy is used to choose the individual with better fitness, from the current individuals and the test individuals, to proceed to the next generation, as shown in Equation (22).
$$X_i^{g+1} = \begin{cases} U_i, & \text{if } U_i \text{ dominates } X_i \\ X_i, & \text{otherwise} \end{cases} \quad (22)$$
wherein $X_i^{g+1}$ is the individual of the new generation, and $f(U_i)$ and $f(X_i)$ are the objective function values of the test individual $U_i$ and the current individual $X_i$, respectively. “$U_i$ dominates $X_i$” indicates that the test individual $U_i$ is no worse than $X_i$ on all objectives and is better on at least one objective.
(3) Pareto Front Maintenance
The AMODE algorithm uses an external archive to store the currently found non-dominated solutions, and it updates it periodically. After each new candidate solution is generated, it is compared to the solutions in the external archive to determine whether it belongs to the Pareto front. If the new solution is not dominated by any solution in the external archive, it is added to the archive. Additionally, to maintain the diversity of Pareto front solutions, AMODE uses crowding distance calculation to measure the distribution of the solution set, as shown in Equation (23).
$$d_i = \sum_{m=1}^{M} \frac{f_m(i+1) - f_m(i-1)}{f_m^{\max} - f_m^{\min}} \quad (23)$$
wherein $d_i$ is the crowding distance of individual $i$; $f_m(i)$ is the value of individual $i$ on the $m$-th objective, with the neighbors $i-1$ and $i+1$ taken after sorting the solutions on that objective; and $f_m^{\max}$ and $f_m^{\min}$ are the maximum and minimum values of the $m$-th objective, respectively.
If the number of solutions in the external archive exceeds the preset capacity limit, crowding distance is used for filtering: the solutions with smaller crowding distances, i.e., those in more crowded regions, are removed. Ultimately, the solution set in the external archive represents the Pareto optimal set. These solutions are non-dominated, meaning they cannot be outperformed by any other solution across all objectives.
Owing to the DE and adaptive mechanisms, AMODE has strong local search capability, enabling more efficient exploration within the neighborhood of solutions. For example, in DE, the mutation operation generates new candidate solutions through weighted differences. As shown in Equation (18), when F is small, the newly generated candidate solutions are closer to the current solutions, thus achieving a fine local search. Additionally, AMODE adjusts the parameters of differential evolution through an adaptive mechanism, automatically balancing the global and local search proportions at different search stages. This improves the chances of finding optimal solutions and avoids entrapment in local optima.
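A minimal sketch of one DE generation with Pareto-dominance selection, corresponding to Equations (18)–(22); the toy bi-objective at the end is illustrative only:

```python
import numpy as np

def de_step(pop, objectives, F=0.5, CR=0.9, rng=None):
    """One DE generation: mutation (Eq. (18)), binomial crossover
    (Eqs. (20)-(21)), and Pareto-dominance selection (Eq. (22))."""
    rng = rng or np.random.default_rng()
    n, dim = pop.shape
    new_pop = pop.copy()
    for i in range(n):
        r1, r2, r3 = rng.choice([j for j in range(n) if j != i],
                                size=3, replace=False)
        v = pop[r1] + F * (pop[r2] - pop[r3])   # mutant vector V_i
        mask = rng.random(dim) <= CR            # inherit from mutant w.p. CR
        mask[rng.integers(dim)] = True          # j_rand: >=1 gene from V_i
        u = np.where(mask, v, pop[i])           # trial vector U_i
        f_u, f_x = objectives(u), objectives(pop[i])
        # greedy selection: keep U_i if it dominates X_i on all objectives
        if all(a <= b for a, b in zip(f_u, f_x)) and \
           any(a < b for a, b in zip(f_u, f_x)):
            new_pop[i] = u
    return new_pop

# toy bi-objective: trade-off between f1 = x0^2 and f2 = (x0-2)^2 + x1^2
pop0 = np.random.default_rng(0).uniform(-2, 4, size=(30, 2))
pop1 = de_step(pop0, lambda x: (x[0] ** 2, (x[0] - 2) ** 2 + x[1] ** 2))
```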
3.4. GA-AMODE-BiLSTM Model
A GA-AMODE-based BiLSTM neural network model is established for short-term photovoltaic power output prediction. The process is illustrated in Figure 4. The photovoltaic power forecasting consists of the following steps.
(1) Initialization and data preprocessing: According to the original photovoltaic power data, divide the data into training, validation, and test sets, followed by the data preprocessing operations. Initialize the parameters of the GA-AMODE algorithm and BiLSTM, including the population size of the GA, the maximum numbers of iterations for the GA and the AMODE algorithm, the initial mutation rate, and the optimization ranges for the BiLSTM network’s learning rate and number of hidden-layer neurons.
(2) Optimization of hyperparameters: The number of neurons and the learning rate of the BiLSTM neural network are the optimization objectives. GA-AMODE is used to find the optimal hyperparameters.
(3) Photovoltaic power forecasting: The optimal hyperparameters are used in BiLSTM. The model is trained on the training and validation sets after data preprocessing.
(4) Forecasting evaluation and statistical test: The forecasting results are compared to the true values. The indexes are calculated and the forecasting accuracy is evaluated. The model’s generalization ability is tested using a statistical test.
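To make the hand-off in step (2) concrete, the following runnable mini-sketch performs a GA coarse search followed by a DE-style refinement over (number of neurons, learning rate). The objective is simplified to a single smooth surrogate standing in for "train a BiLSTM and return its validation MSE", and all bounds and rates are assumed values:

```python
import numpy as np

rng = np.random.default_rng(1)
LO, HI = np.array([16.0, 1e-4]), np.array([256.0, 1e-2])  # assumed bounds

def fitness(x):  # stand-in for "train BiLSTM, return validation MSE"
    return ((x[0] - 128) / 112) ** 2 + ((x[1] - 3e-3) / 5e-3) ** 2

# GA phase: random initialization, elitism, regeneration-style mutation
pop = LO + rng.random((20, 2)) * (HI - LO)
for g in range(30):
    pop = pop[np.argsort([fitness(p) for p in pop])]   # sort by fitness
    elite = pop[:10]
    children = elite[rng.integers(0, 10, 10)].copy()
    mutate = rng.random(10) < 0.3 * (1 - g / 30)       # decaying rate, Eq. (16)
    children[mutate] = LO + rng.random((int(mutate.sum()), 2)) * (HI - LO)
    pop = np.vstack([elite, children])

# DE refinement phase seeded with the GA population (cf. Eq. (18))
for _ in range(30):
    for i in range(len(pop)):
        r = rng.choice([j for j in range(len(pop)) if j != i], 3, replace=False)
        v = np.clip(pop[r[0]] + 0.5 * (pop[r[1]] - pop[r[2]]), LO, HI)
        u = np.where(rng.random(2) <= 0.9, v, pop[i])  # binomial crossover
        if fitness(u) < fitness(pop[i]):               # greedy selection
            pop[i] = u

best = min(pop, key=fitness)
print(f"best: neurons~{best[0]:.0f}, learning rate~{best[1]:.2e}")
```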
3.5. Model Evaluation and Statistical Testing
3.5.1. Evaluation Indexes
The prediction accuracy of the proposed model is evaluated using three indexes: R2, RMSE, and MAE. The calculation formulas are given by Equations (24)–(26).
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \quad (24)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \quad (25)$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \quad (26)$$
where $n$ is the total number of samples, $y_i$ and $\hat{y}_i$ are the true and predicted values, and $\bar{y}$ is the mean of the true values.
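The three indexes can be computed directly from the prediction residuals, as in the following sketch with illustrative values:

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """R2, RMSE, and MAE as in Equations (24)-(26)."""
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {"R2": 1 - ss_res / ss_tot,
            "RMSE": np.sqrt(np.mean(resid ** 2)),
            "MAE": np.mean(np.abs(resid))}

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(evaluate(y_true, y_pred))
```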
3.5.2. Statistical Testing
To further evaluate the model’s stability and generalization ability and to better understand its real-world performance in practical applications, a T-test to compare mean differences and a KS-test to compare distribution differences are employed. Both methods are used to conduct statistical tests on the predicted and actual values of the model. The calculation formulas are shown in Equations (27)–(29) for the T-test and in Equations (30)–(31) for the KS-test.
$$\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i, \quad d_i = \hat{y}_i - y_i \quad (27)$$
$$S_d = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (d_i - \bar{d})^2} \quad (28)$$
$$t = \frac{\bar{d}}{S_d / \sqrt{n}} \quad (29)$$
wherein $\bar{d}$ is the average difference between the predicted values and the actual values, $S_d$ is the standard deviation of these differences, and $n$ is the sample size. After computing the T-statistic, the $p$-value is obtained based on the degrees of freedom ($n-1$). If the $p$-value is less than 0.05 (commonly set as the significance level), it indicates a significant difference in means between the two datasets.
$$D = \max_x \left| F_1(x) - F_2(x) \right| \quad (30)$$
$$F(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x) \quad (31)$$
wherein $F_1(x)$ is the empirical distribution function of the predicted values and $F_2(x)$ is the empirical distribution function of the actual values. $D$ is the KS statistic, which represents the maximum absolute difference between the two empirical distribution functions at any point. $n$ is the sample size, $X_i$ represents the $i$-th data point in the sample, and $I(X_i \le x)$ is an indicator function whose value is 1 if $X_i \le x$; otherwise, the value is 0.
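Both tests are available in SciPy; the sketch below applies them to synthetic predicted/actual pairs (in the paper these would be the model outputs and the measured power):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
actual = rng.normal(10.0, 2.0, 500)
predicted = actual + rng.normal(0.0, 0.3, 500)  # small prediction error

# Paired T-test on the per-sample differences (Eqs. (27)-(29))
t_stat, t_p = stats.ttest_rel(predicted, actual)

# Two-sample KS-test on the empirical distributions (Eqs. (30)-(31))
ks_stat, ks_p = stats.ks_2samp(predicted, actual)

# p >= 0.05 in both tests: no significant difference in mean or distribution
print(f"T-test:  t={t_stat:.3f}, p={t_p:.3f}")
print(f"KS-test: D={ks_stat:.3f}, p={ks_p:.3f}")
```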