1. Introduction
With the construction of smart grids [1] and the large-scale integration of renewable energy [2], the evolving power system faces multiple challenges, including the continuous rise in electricity demand, the integration and utilization of renewable energy, and the urgent need to optimize system operation efficiency. Safe, reliable, and cost-effective power system operations are critical in global energy strategies, and accurate electricity demand forecasting plays a vital role in achieving these goals. The accuracy of power-load forecasting directly affects the economic efficiency and safety of power system dispatch decisions [3]. High-precision load forecasting is essential for guaranteeing the safe operation of the grid, maintaining system stability, and enhancing operational benefits [4]. Furthermore, accurate load forecasting provides reliable data support for grid scheduling, helping to prevent power shortages or surpluses, thereby supporting voltage and frequency stability and improving power supply quality.
Power-load forecasting accuracy depends on various factors, including socioeconomic variables such as population distribution and industrial energy efficiency, as well as environmental factors such as climate conditions [3,4]. Among these, complex and variable meteorological factors are key elements in load forecasting and have the greatest impact on prediction results [5]. Considering meteorological factors in forecasting can therefore improve prediction accuracy.
In recent years, the frequent occurrence of extreme weather events has induced non-stationary abrupt changes in load curves. Consequently, numerous researchers have been conducting in-depth studies on the impacts of meteorological factors on power load and the characterization of spatiotemporal relationships among factors influencing power load. Scientifically analyzing and capturing these spatiotemporal relationships is crucial for enhancing the accuracy and stability of load forecasting, as well as for providing rational scheduling strategies.
The complex working conditions faced by modern power dispatch pose higher requirements for load forecasting. To address this, academia and industry have continuously improved the accuracy and robustness of forecasting models through diverse methodological innovations. The accuracy of load forecasting is determined by data quality, model performance, and load characteristics, and these interrelated factors have a synergistic effect on the prediction results. Traditional artificial intelligence methods have limitations in handling massive time-series data and high-dimensional multi-scale features. Their inherent gradient issues and overfitting phenomena can reduce the reliability of power forecasting [
6]. Deep neural networks (such as CNN, LSTM, GRU, Transformer, etc.) have gradually become a new research hotspot in the field of improving power-load forecasting accuracy [
7]. Li et al. [
8] proposed a novel deep learning-based forecasting (DLSF) method, utilizing a CNN network model to accurately cluster input data and demonstrating its excellent performance in both accuracy and efficiency. Muzaffar et al. [
9] applied LSTM networks to power-load forecasting and compared them with traditional methods, demonstrating the significant potential of LSTM in improving forecasting accuracy. L’Heureux et al. [
10] optimized the model workflow, designed an N-dimensional space transformation layer, and innovated context feature extraction methods, ultimately developing an improved Transformer architecture specifically for load forecasting. However, it should be noted that a single intelligent forecasting model has inherent limitations. Traditional time-series analysis methods can lead to information loss when the input sequence is too long [
11]. Based on this, Graves [
12] proposed the bidirectional LSTM network, which can not only capture the temporal patterns of historical load data (such as daily periodic fluctuations) but also reversely learn the potential impact of future moments on the current state (such as pre-load adjustments before holidays), thus extracting temporal dependency features more comprehensively. Additionally, Graph Convolutional Networks (GCN) show marked superiority over conventional spatial feature extraction techniques, owing to their distinctive topological modeling proficiency. Chen et al. [
13] used a combination of GCN and LSTM to extract spatiotemporal features from large-scale data for power-load forecasting and validated the effectiveness of the model. Model specificity leads to boundary constraints on the engineering applicability of single models, making it urgent to develop adaptive hybrid modeling methods [
14].
Model ensemble-based forecasting methods provide an effective way to overcome the limitations of single models. This theoretical framework was first established by Bates and Granger in 1969, and their pioneering work laid the theoretical foundation for the subsequent development of combined forecasting techniques, continuously advancing research in this field [
15]. In power-load forecasting research, accurately capturing spatiotemporal features is key to improving prediction accuracy. Ma et al. [
16] proposed a novel combined model that integrates Convolutional Neural Networks (CNN) with Long Short-Term Memory (LSTM) networks, effectively extracting spatiotemporal features from the data and incorporating an attention mechanism to improve the accuracy of power-load forecasting. Bouktif et al. [
17] proposed combining RNN networks with LSTM networks and incorporating genetic algorithms (GA) and particle swarm optimization to improve the prediction accuracy of the model.
On the other hand, existing power-load forecasting models still suffer from the following limitations: (1) Inadequate capture of the spatiotemporal characteristics of data, which may degrade the model’s prediction accuracy. (2) A lack of adaptive mechanisms for dynamically allocating weights to base learners, resulting in a significant decline in prediction robustness during periods of load fluctuation (e.g., extreme weather events). (3) Challenges in quantifying the impact of model uncertainty on prediction outcomes. Although researchers have applied various algorithms to mitigate these drawbacks, gaps persist in the existing literature.
To address these limitations, this paper introduces a GCN-BiLSTM hybrid model within the AdaBoost framework, further enhanced through Bayesian optimization:
(1) Construct a graph structure with meteorological factors as nodes, where each node represents different meteorological elements such as temperature and humidity, and use the GCN layer to extract nonlinear spatial correlation features among these factors.
(2) The BiLSTM network is responsible for learning the temporal dynamic characteristics of load and meteorological factors.
(3) A Bayesian optimization method based on Markov Chain Monte Carlo (MCMC) sampling is used to achieve probabilistic hyperparameter tuning and uncertainty quantification of prediction results.
(4) An Adaboost ensemble mechanism is introduced to dynamically adjust the weights of multiple GCN-BiLSTM weak learners based on uncertainty obtained from Monte Carlo sampling, with a focus on enhancing prediction robustness under extreme weather events.
Finally, by comparing the GCN-BiLSTM-AB model with single BiLSTM, GCN-BiLSTM, and GCN-BiLSTM-Adaboost models, the proposed method is validated in terms of prediction accuracy, computational efficiency, and reliability of probabilistic outputs, providing a more reliable basis for power system scheduling decisions.
The structure of this paper is organized as follows:
Section 2 introduces the GCN-BiLSTM-AB hybrid model, which integrates Graph Convolutional Networks (GCN) for spatial feature extraction, Bidirectional Long Short-Term Memory (BiLSTM) networks for temporal learning, and a Bayesian-optimized AdaBoost framework.
Section 3 elaborates on the experimental setup, including data preprocessing procedures and evaluation metrics.
Section 4 assesses the model’s performance against baseline models using quantitative indicators (mean absolute error [MAE], mean absolute percentage error [MAPE], root mean square error [RMSE]) and robustness tests under extreme load conditions; it also includes an analysis of the graph construction method based on the Spearman correlation coefficient.
Section 5 summarizes the key findings, discusses their practical implications for power grid operations, and elaborates on the limitations of the proposed model as well as potential future improvement strategies.
2. Method
2.1. Model Framework
The power-load forecasting framework proposed in this paper is shown in
Figure 1. The model uses GCN to capture spatial dependencies and BiLSTM to extract long-term temporal dependencies in power system features. The GCN-BiLSTM model serves as the base learner, and AdaBoost integrates 10 such weak learners by dynamically adjusting their weights based on prediction errors. A Bayesian approach is introduced to quantify prediction uncertainty through Monte Carlo sampling, which also helps prevent overfitting. The weights of the weak learners are further refined according to their uncertainty estimates. Finally, the weighted aggregation of all 10 learners’ predictions yields the final output, enhancing both predictive stability and accuracy.
The workflow for short-term power-load forecasting consists of data preprocessing, adjacency matrix construction, and GCN-BiLSTM-AdaBoost prediction (with the introduction of Bayesian methods), described in steps (1)–(3) below.
Figure 1.
Model flow chart.
(1) Data preprocessing:
The input dataset comprises 9 feature columns, including load value, humidity, temperature, wind speed, atmospheric pressure, precipitation, visibility, water vapor pressure, and perceived temperature.
For handling missing values, the mean value of the respective column containing the missing data is employed for imputation.
Subsequently, the data are normalized, and a sliding window (with a 24-h window size) is constructed to predict the load value for the subsequent hour.
(2) Feature correlation analysis and adjacency matrix construction:
The Spearman correlation coefficient is employed to analyze the correlation between input features in the dataset. Within the adjacency matrix of the GCN, a correlation coefficient exceeding 0.8 is defined as an edge (i.e., a connection weight of 1) between two features, while all other correlations are assigned a weight of 0. This approach is used to construct the adjacency matrix.
(3) Prediction and model evaluation:
GCN-BiLSTM is used to capture the spatial and temporal dependency relationships in the dataset.
Dropout is used to randomly discard some neurons during the training phase to prevent model overfitting.
Adaboost is utilized to integrate the results of multiple GCN-BiLSTM weak predictors, with the weight of each weak predictor adjusted based on its prediction error and uncertainty.
Bayesian methods are realized via Monte Carlo Dropout to estimate the uncertainty of prediction outcomes.
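The preprocessing in step (1) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, min-max scaling is assumed because the text only says the data are "normalized", and all columns are kept in the input window (which columns actually feed the model is left to the caller).

```python
import numpy as np

def impute_column_means(data):
    """Replace NaNs in each column with that column's mean over observed values."""
    data = np.asarray(data, dtype=float).copy()
    col_means = np.nanmean(data, axis=0)
    rows, cols = np.where(np.isnan(data))
    data[rows, cols] = col_means[cols]
    return data

def min_max_normalize(data):
    """Scale each column to [0, 1] (assumed scheme); constant columns map to 0."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (data - lo) / span

def make_windows(data, window=24, target_col=0):
    """Pair each `window`-step slice with the target value of the following step."""
    inputs = np.stack([data[i:i + window] for i in range(len(data) - window)])
    targets = data[window:, target_col]
    return inputs, targets

# Toy data: 48 hourly rows, 9 columns (load in column 0), one missing value.
rng = np.random.default_rng(0)
raw = rng.random((48, 9))
raw[5, 2] = np.nan

clean = min_max_normalize(impute_column_means(raw))
X, y = make_windows(clean, window=24, target_col=0)
print(X.shape, y.shape)  # (24, 24, 9) (24,)
```

Each input sample holds 24 consecutive hours, and its target is the load value of the hour immediately after the window.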
2.2. GCN Method
The core idea of Graph Convolutional Networks (GCN) is to aggregate information from nodes and their neighboring nodes through graph convolution operations [
18]. In this study, the nodes within the GCN layer represent the 8-dimensional feature vectors corresponding to meteorological factors at each time step, while the edges between nodes denote the connectivity relationships among these features. The primary objective of the GCN is to extract the interdependencies between features in the dataset.
Graph data can be represented as $G = (X, E, A)$, where X represents the set of nodes, E represents the set of edges, and A is the adjacency matrix. This paper determines the connection relationships between nodes in the GCN by calculating the correlations between input features [19]; the detailed analysis process is presented in Section 4.1. Since the features show no obvious normal distribution or linear correlation, the Spearman correlation coefficient is chosen to analyze the relationships between them [20].
Assuming all negative correlation coefficients are taken as absolute values, a Spearman correlation coefficient absolute value less than 0.8 is typically interpreted as indicating weak or no correlation between two variables. Conversely, when the absolute value of the Spearman correlation coefficient is greater than or equal to 0.8, a connection relationship exists between the two nodes. The value of Spearman’s correlation coefficient is shown in
Figure 2.
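The thresholding rule above can be illustrated with a short sketch. It computes Spearman's coefficient as the Pearson correlation of average ranks and marks an edge where the absolute value is at least 0.8. The function names and the toy features are assumptions made for illustration; the diagonal is left at zero here because self-loops are typically added inside the GCN layer.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    start = 0
    while start < len(values):
        end = start
        while end + 1 < len(values) and sorted_vals[end + 1] == sorted_vals[start]:
            end += 1
        # Positions start..end (0-based) share the average of ranks start+1..end+1.
        ranks[order[start:end + 1]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks

def spearman_adjacency(features, threshold=0.8):
    """Binary adjacency: edge where |Spearman rho| >= threshold, no self-loops."""
    ranked = np.column_stack([average_ranks(col) for col in features.T])
    rho = np.corrcoef(ranked, rowvar=False)  # Pearson on ranks = Spearman
    adjacency = (np.abs(rho) >= threshold).astype(float)
    np.fill_diagonal(adjacency, 0.0)
    return adjacency, rho

# Toy features: column 1 is a monotone function of column 0, column 2 is its
# negation (negative correlation, kept via the absolute value), column 3 is noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = np.column_stack([x, x ** 3, -x, rng.normal(size=200)])
adjacency, rho = spearman_adjacency(demo)
print(adjacency)
```

Because the coefficient is rank-based, the nonlinear but monotone pair (columns 0 and 1) is still connected, which a linear correlation threshold might miss.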
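The GCN layer can be represented by the following equation:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right) \tag{1}$$
where $H^{(l)}$ is the input feature matrix, $W^{(l)}$ is the weight matrix, $\tilde{A} = A + I$ is the adjacency matrix with self-loops added, $\tilde{D}$ is the degree matrix of $\tilde{A}$, and $\sigma$ is the nonlinear activation function.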
GCN enhances spatial feature representation and can better capture the complex relationships between meteorological factors. The output of the GCN is passed to BiLSTM (Bidirectional Long Short-Term Memory) to capture the long-term dependencies in the time series. The structure of GCN is shown in
Figure 3.
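A single graph convolution of this kind can be sketched in a few lines of NumPy. The sketch assumes the standard symmetric normalization with self-loops and uses ReLU as the activation, which the text does not specify; `gcn_layer` and the toy graph are illustrative names and data only.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_tilde = A + np.eye(A.shape[0])       # add self-loops
    degree = A_tilde.sum(axis=1)           # node degrees including self-loops
    D_inv_sqrt = np.diag(degree ** -0.5)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt
    return np.maximum(A_hat @ H @ W, 0.0)  # ReLU is an assumed activation

# Toy graph: nodes 0 and 1 are connected, node 2 is isolated.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
H = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
W = np.eye(2)  # identity weights so the aggregation is easy to read

out = gcn_layer(H, A, W)
print(out)
```

With identity weights, the two connected nodes end up with the average of their features, while the isolated node keeps its own features, which shows how information only flows along edges of the adjacency matrix.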
2.3. BiLSTM Method
As noted in [21], Long Short-Term Memory (LSTM) was proposed by Hochreiter and Schmidhuber. By introducing gating mechanisms (input gate, forget gate, and output gate), LSTM addresses the vanishing gradient problem of traditional RNNs [22]. The core formulas are as follows:
(1) The forget gate determines what information needs to be forgotten from the cell state $C_{t-1}$ at time $t-1$, as shown in Equation (2). The forget gate reads the hidden layer state $h_{t-1}$ and input sequence $x_t$ at time t, and outputs a value between 0 and 1, where 1 means retaining all information, and 0 means discarding all information.
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{2}$$
where $f_t$ is the forget gate state at time t; $W_f$ and $b_f$ are the weight matrix and bias of the forget gate, respectively; and $\sigma$ is the sigmoid activation function.
(2) The input gate reads the input $x_t$ at time t and decides the information to be saved in the neurons. Then, the tanh layer generates a temporary state $\tilde{C}_t$ for the memory unit at time t. Finally, the cell state is updated to obtain the new cell state $C_t$, as shown in Equations (3)–(5).
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{3}$$
$$\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{4}$$
$$C_t = f_t \otimes C_{t-1} + i_t \otimes \tilde{C}_t \tag{5}$$
where $i_t$ is the input gate state at time t, which controls the amount of information from $x_t$ that is passed to $C_t$; $W_i$ and $b_i$ are the weight and bias of the input gate, respectively; $W_c$ and $b_c$ are the weight matrix and bias term of the cell state, respectively; tanh is the hyperbolic tangent activation function; and ⊗ is the Hadamard product.
(3) The output gate selects the important information to be output from the current state. The sigmoid layer first decides which part of the neuron state needs to be output. The neuron state to be output passes through the tanh layer and is multiplied by the output of the sigmoid layer to obtain the output value $h_t$, which is also the input value for the next hidden layer, as shown in Equations (6) and (7).
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{6}$$
$$h_t = o_t \otimes \tanh(C_t) \tag{7}$$
where $o_t$ is the output gate state at time t, and $W_o$ and $b_o$ are the weight matrix and bias term for the output gate, respectively.
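The three gates described in Equations (2)–(7) can be traced in a single time step of a standard LSTM cell. The sketch below uses conventional parameter names (`W_f`, `b_f`, and so on), which are assumptions chosen for illustration, and it applies each weight matrix to the concatenation of the previous hidden state and the current input.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    z = np.concatenate([h_prev, x_t])                       # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])        # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])        # input gate
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])    # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                      # cell state update
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])        # output gate
    h_t = o_t * np.tanh(c_t)                                # new hidden state
    return h_t, c_t

# Toy check with all-zero parameters: every gate evaluates to 0.5.
hidden, inputs = 3, 2
zero_params = {}
for gate in ("f", "i", "c", "o"):
    zero_params[f"W_{gate}"] = np.zeros((hidden, hidden + inputs))
    zero_params[f"b_{gate}"] = np.zeros(hidden)

h_t, c_t = lstm_step(np.ones(inputs), np.zeros(hidden), np.ones(hidden), zero_params)
print(c_t, h_t)
```

With zero parameters the forget gate keeps exactly half of the previous cell state and the candidate state contributes nothing, which makes the role of each term in the update easy to verify by hand.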
While unidirectional LSTM effectively leverages historical load data, it is important to note that it does not consider future information in its predictions.
Bidirectional Long Short-Term Memory (BiLSTM) network differs from traditional LSTM in that it simultaneously takes into account both the forward and backward information of the sequence, enabling a more comprehensive capture of long-term dependencies in the time series. The BiLSTM structure is shown in
Figure 5.
In this paper, BiLSTM captures long-distance dependencies between time steps, improving prediction accuracy. The final output is the concatenation of the forward and backward LSTM outputs:
$$\overrightarrow{h_t} = \mathrm{LSTM}\left(x_t, \overrightarrow{h}_{t-1}\right)$$
$$\overleftarrow{h_t} = \mathrm{LSTM}\left(x_t, \overleftarrow{h}_{t+1}\right)$$
$$y_t = W \cdot \left[\overrightarrow{h_t}; \overleftarrow{h_t}\right] + b$$
where $\overrightarrow{h_t}$ is the output (hidden state) of the forward LSTM at time step t, and $\overleftarrow{h_t}$ is the output (hidden state) of the backward LSTM at time step t. LSTM(·) represents the computation process of the unidirectional LSTM network; W and b are the weight matrix and bias term, respectively; and $x_t$ and $y_t$ represent the input and output data at time step t.
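The bidirectional pass can be sketched independently of the cell type. In the illustration below, `step` is a stand-in for an LSTM cell (a plain running sum is used in the toy example so the result can be checked by hand); the function names are hypothetical.

```python
import numpy as np

def run_direction(inputs, step, hidden_size, reverse=False):
    """Run a recurrent step over the sequence, optionally from last to first."""
    order = range(len(inputs) - 1, -1, -1) if reverse else range(len(inputs))
    h = np.zeros(hidden_size)
    outputs = [None] * len(inputs)
    for t in order:
        h = step(inputs[t], h)
        outputs[t] = h  # store at the original time index
    return np.stack(outputs)

def bidirectional(inputs, fwd_step, bwd_step, hidden_size):
    """Concatenate forward and backward hidden states at every time step."""
    forward = run_direction(inputs, fwd_step, hidden_size)
    backward = run_direction(inputs, bwd_step, hidden_size, reverse=True)
    return np.concatenate([forward, backward], axis=1)

# Toy recurrent step: a running sum, standing in for an LSTM cell.
def running_sum(x, h):
    return h + x

sequence = np.array([[1.0], [2.0], [3.0]])
states = bidirectional(sequence, running_sum, running_sum, hidden_size=1)
print(states)  # [[1. 6.] [3. 5.] [6. 3.]]
```

At each time step the first column summarizes everything up to that step and the second column summarizes everything from that step onward, which is the property the concatenated BiLSTM output relies on.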
2.4. AdaBoost Algorithm
Adaboost (Adaptive Boosting) is an ensemble learning method that builds a strong predictor by combining multiple weak predictors. The fundamental principle of Adaboost lies in iteratively training weak learners and modifying sample weights according to each learner’s performance, enabling subsequent learners to concentrate on previously misclassified instances. Ultimately, Adaboost integrates the results of all weak predictors through weighted voting to produce the final prediction results [
23,
24].
In this study, the GCN-BiLSTM hybrid model acts as a weak learner. The AdaBoost framework is introduced to integrate 10 such weak learners. Based on prediction results, it dynamically adjusts the sample weights within the weak learners and the weights of the weak learners themselves. This enhances the accuracy and robustness of the final predictions. The specific steps are as follows:
Step 1: Initialize sample weights, with each sample assigned an equal initial weight, $w_i = 1/m$, where m is the number of training samples (1200), and i refers to the i-th sample.
Step 2: Train the k-th weak learner using the current sample weights, and for each sample i, calculate the current prediction error from the true value and the predicted value.
Step 3: Calculate the weight adjustment factor weight_sum and accumulate the weights of samples with large errors:
Calculate the weight of the weak learner
using an exponential decay formula: The weight of the current weak learner is calculated, which is related to its performance in training, and weak learners with good performance will be assigned higher weights.
Step 4: Update sample weights. Samples with large prediction errors (exceeding the threshold of 0.3) are assigned higher weights so that they receive more attention in the next iteration.
For other samples, keep the weights unchanged:
Normalize the sample weights so that the sum of the sample weights is 1:
The weights of weak learners do not need to be normalized here, because a Bayesian method will be introduced in the subsequent testing phase to further adjust the weights of weak learners, and then the normalization of weak learner weights will be performed.
The structure of the algorithm is shown in
Figure 6.
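One boosting round of Steps 2 to 4 can be sketched as follows. Only the overall flow and the 0.3 threshold come from the text; the absolute-error definition, the `exp(-weight_sum)` learner weight, and the fixed `boost` multiplier are assumptions standing in for formulas that are not reproduced here, and `boosting_round` is a hypothetical name.

```python
import numpy as np

def boosting_round(y_true, y_pred, sample_weights, threshold=0.3, boost=2.0):
    """One round of the thresholded weight update (illustrative assumptions)."""
    errors = np.abs(y_true - y_pred)            # assumption: absolute error
    hard = errors > threshold                   # samples with large errors
    weight_sum = sample_weights[hard].sum()     # accumulated weight of hard samples
    learner_weight = np.exp(-weight_sum)        # assumption: stand-in exponential decay
    new_weights = sample_weights.copy()
    new_weights[hard] *= boost                  # assumption: fixed multiplier
    new_weights /= new_weights.sum()            # normalize so the weights sum to 1
    return learner_weight, new_weights

# Toy example with four samples and equal initial weights 1/m.
y_true = np.zeros(4)
y_pred = np.array([0.1, 0.5, 0.2, 0.0])
initial = np.full(4, 1.0 / 4)

alpha, updated = boosting_round(y_true, y_pred, initial)
print(alpha, updated)
```

Only the second sample exceeds the threshold, so its weight grows relative to the others after normalization, and a learner that leaves less weight on hard samples receives a larger learner weight.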
It is worth emphasizing that the AdaBoost algorithm employed in this study has been modified from the traditional AdaBoost framework, with the specific improvements outlined as follows:
(1) The proposed framework directly models spatiotemporal features using GCN-BiLSTM, which enables more effective capture of complex patterns compared to the decision trees or shallow models utilized in traditional AdaBoost.
(2) Unlike the traditional AdaBoost algorithm, which adjusts the weights of all misclassified samples uniformly, this study only increases the weights of samples with prediction errors exceeding 0.3. This modification allows subsequent weak learners to focus on learning critical hard-to-predict samples (e.g., peak electricity consumption periods, load mutations induced by extreme weather) while omitting weight adjustments for samples with minor errors. This avoids excessive model sensitivity to noise or non-significant fluctuations. It should be noted that the threshold value of 0.3 should be adjusted based on error tolerance criteria and data characteristics in practical applications.
(3) The introduction of an “uncertainty” mechanism for further refinement of weak learner weights will be elaborated in
Section 2.5.
In summary, these modifications to the traditional Adaboost algorithm enhance the model’s sensitivity to sudden changes and periodic variations in load, while mitigating overfitting during stable load periods.
2.5. Bayesian Method
Owing to data uncertainty (i.e., errors induced by inherent noise in the dataset) and model uncertainty (i.e., uncertainty stemming from insufficient training data or inadequate model interpretability of data patterns), this study incorporates Bayesian methods to quantify prediction uncertainty. It should be noted that the “prediction uncertainty” in this study specifically refers to the variance of predictions from each GCN-BiLSTM weak learner.
The Bayesian method is a probability-based statistical inference technique that calculates the posterior distribution for prediction and uncertainty estimation by incorporating prior distributions and observational data [
25]. In this study, the Bayesian method is realized using Monte Carlo Dropout. Monte Carlo Dropout is an approximation to Bayesian inference that estimates the uncertainty of predictions by repeatedly applying Dropout during the testing phase to simulate the posterior distribution of the model parameters [
26].
Specifically, the Bayesian approach is implemented in two distinct phases. During the training phase, Dropout is employed to randomly deactivate neurons in each forward pass, thereby mitigating overfitting. In the testing phase, each GCN-BiLSTM weak learner generates 100 stochastic predictions through Monte Carlo Dropout sampling, from which we compute both the mean prediction and the variance, the latter serving as the quantitative uncertainty measure for the weak learner.
Each weak learner adjusts its training-derived weight (computed during the AdaBoost training phase) based on its prediction uncertainty, as specified in Equation (18). Concretely, weak learners with higher predictive uncertainty have their weights attenuated, whereas those with more certain predictions receive larger weights. High uncertainty means high variance, which indicates that the weak learner’s predictions are unstable, so its weight in the ensemble is reduced.
To ensure the weights sum to unity, a normalization step is applied to the adjusted weights, as shown in Equation (19), yielding the final weak learner weights.
The final prediction is obtained by computing the weighted sum of the weak learners’ predictions.
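The uncertainty-based reweighting can be illustrated with simulated Monte Carlo samples. The adjustment rule used here, dividing each learner weight by one plus its average predictive variance, is an assumption chosen only because it lowers the weight of high-variance learners; it is not necessarily the form of Equation (18). The function name and the simulated data are likewise hypothetical.

```python
import numpy as np

def uncertainty_weighted_ensemble(mc_samples, learner_weights):
    """Combine learners from MC samples of shape (learners, passes, points)."""
    means = mc_samples.mean(axis=1)                    # mean prediction per learner
    variances = mc_samples.var(axis=1).mean(axis=1)    # average variance per learner
    adjusted = learner_weights / (1.0 + variances)     # assumption: inverse-variance style
    final_weights = adjusted / adjusted.sum()          # normalize to sum to 1
    prediction = final_weights @ means                 # weighted sum of mean predictions
    return prediction, final_weights, variances

# Simulate 100 stochastic passes for two learners: one stable, one unstable.
rng = np.random.default_rng(0)
truth = np.array([1.0, 2.0, 3.0])
stable = truth + rng.normal(scale=0.01, size=(100, 3))
noisy = truth + rng.normal(scale=1.0, size=(100, 3))
mc_samples = np.stack([stable, noisy])

prediction, final_weights, variances = uncertainty_weighted_ensemble(
    mc_samples, learner_weights=np.array([1.0, 1.0])
)
print(final_weights, prediction)
```

Although both learners start with the same training-derived weight, the unstable learner ends up with the smaller final weight, so the ensemble prediction stays close to the stable learner's mean.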
The advantages of introducing the Bayesian method are as follows:
(1) Quantifying uncertainty, improving model reliability.
(2) Introducing uncertainty weighting in AdaBoost, enhancing the stability of final predictions.
(3) Obtaining a more reasonable predictive distribution by calculating the mean and variance through multiple samplings.
4. Case Study
In this section, the advantages of the Spearman correlation coefficient-based graph construction method are verified by comparing it with several alternative graph construction methods, including k-nearest neighbors, learned graphs, and mutual information. Additionally, the effectiveness of the proposed model is validated from the following two perspectives:
(1) The prediction accuracy of the proposed model is compared with that of baseline models (LSTM, GRU, CNN-LSTM, GCN-LSTM, and CNN-BiLSTM) over one-day and one-week forecasting horizons. The results demonstrate its distinct advantage in prediction accuracy.
(2) By comparing the load forecasting performance of BiLSTM, GCN-BiLSTM, GCN-BiLSTM-Adaboost, and the proposed model over one-day and one-week horizons, the contributions of AdaBoost integration and Bayesian methods to enhancing the model’s prediction stability are validated.
The aforementioned forecasting performance is evaluated using three metrics: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE).
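For reference, the three metrics can be computed as in the sketch below, which uses their standard definitions with MAPE expressed in percent. The toy load values are made up for illustration.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    """Root mean square error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Illustrative values only.
actual = np.array([100.0, 200.0, 400.0])
forecast = np.array([110.0, 190.0, 380.0])

print(mae(actual, forecast))   # about 13.33
print(mape(actual, forecast))  # about 6.67
print(rmse(actual, forecast))  # about 14.14
```

MAE and RMSE are in the units of the load, whereas MAPE is scale-free, which is why the reported MAPE differences carry a percent sign while the other two do not.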
4.1. Comparison of Methods for Constructing Graph Structures
There are many methods for constructing graph structures [27], such as k-nearest neighbors (KNN) [28], learned graphs [29], and mutual information [30]. This paper compares the one-week prediction results of the Spearman correlation coefficient with those of these three graph construction methods, as shown in
Table 2, verifying the superiority of using the Spearman correlation coefficient for graph construction.
As can be seen from
Table 2, compared with KNN, the graph construction method using the Spearman correlation coefficient improves MAE, MAPE, and RMSE by 0.08, 0.14%, and 0.12, respectively. Compared with learned graphs, MAE, MAPE, and RMSE improve by 0.15, 0.29%, and 0.22, respectively. Compared with mutual information, MAE, MAPE, and RMSE improve by 0.08, 0.13%, and 0.11, respectively.
The k-nearest neighbors (KNN) method connects features solely based on the numerical similarity of meteorological data, which may easily introduce spurious correlations. In contrast, the Spearman correlation coefficient identifies genuine correlations among meteorological features through statistical relevance, and the resulting graph structure is more consistent with actual physical laws. This enables it to more accurately reflect the combined impact of meteorological factors on load.
Learned graphs require large volumes of data and complex training processes, making them prone to overfitting. By comparison, the Spearman-based approach directly constructs graph structures based on the inherent correlations of meteorological factors, eliminating the need for training. It features simple computation and yields stable, reliable results.
Mutual information is sensitive to data distribution and sample size, often leading to estimation biases in small datasets. In contrast, the Spearman correlation coefficient is computed via simple rank statistics and can reliably capture monotonic relationships, regardless of data distribution characteristics.
4.2. Hyperparameter Settings
The proposed GCN-BiLSTM-AB-based power-load forecasting model was implemented and optimized using the PyTorch framework in Python. All experiments in this study were conducted on a workstation equipped with an NVIDIA GeForce RTX 4060Ti GPU (Nvidia Corporation, Santa Clara, CA, USA), a 13th Generation Intel® Core™ i7 CPU (Intel Corporation, Santa Clara, CA, USA), and 32 GB of RAM. The Adam optimization algorithm was employed for network parameter updates, with an initial learning rate set to 0.001. The deep learning model was trained over 1800 epochs.
The detailed structural parameters of the model are presented in
Table 3. The input layer of the model accepts raw time-series data consisting of 24 time steps and 8 feature dimensions (24 × 8) for target prediction. Graph convolutional networks (GCN) are utilized for spatial feature extraction, while bidirectional long short-term memory (BiLSTM) networks are employed for temporal feature extraction. During model training, a Dropout rate of 0.2 was applied to mitigate overfitting. Finally, the feature sequence is transformed into a 1 × 512 vector, and AdaBoost integration is used to dynamically adjust weights, ultimately outputting the power-load forecast for the subsequent hour. The hyperparameter design of the aforementioned model is configured to fit the provided dataset and ensure the model’s prediction accuracy.
4.3. A Comparative Analysis with the Traditional Model
Figure 7 and
Figure 8 show the comparison of forecasting results between the proposed model and traditional power-load forecasting models, while
Figure 9 and
Figure 10 present the comparison of forecasting errors. The forecasting results are presented with hourly intervals.
From
Figure 7 and
Figure 8, it can be seen that the GCN-BiLSTM-AB model outperforms traditional power-load forecasting models in terms of prediction accuracy, whether forecasting daily load or the load for the upcoming week.
The results in
Figure 9 and
Figure 10 allow for a quantitative description of the forecasting performance. Based on MAE measurements, compared with LSTM, GRU, CNN-LSTM, GCN-LSTM, and CNN-BiLSTM, the prediction accuracy improved by 2.41 and 1.27 for one day, 0.92 and 0.87 for one week, 1.39 and 0.44, and 1.40 and 1.65, respectively. Based on MAPE measurements, compared with LSTM, GRU, CNN-LSTM, GCN-LSTM, and CNN-BiLSTM, the prediction accuracy improved by 3.63% and 2.80%, 1.42% and 1.73%, 2.35% and 0.95%, and 2.35% and 3.55%, respectively. Based on RMSE measurements, compared with LSTM, GRU, CNN-LSTM, GCN-LSTM, and CNN-BiLSTM, the prediction accuracy improved by 2.8370 and 1.73, 1.16 and 1.27, 1.62 and 0.59, and 1.83 and 2.18, respectively.
Therefore, the comparison results show that GCN-BiLSTM-AB can better capture the spatiotemporal features of the input data, leading to more accurate predictions.
4.4. Model Prediction Robustness Analysis
To verify the effectiveness of using AdaBoost for model integration and incorporating Bayesian methods to enhance prediction robustness, the proposed model is compared with BiLSTM, GCN-BiLSTM, and GCN-BiLSTM-Adaboost models.
Figure 11 shows the one-day forecasting result comparison between the proposed model and the BiLSTM, GCN-BiLSTM, and GCN-BiLSTM-Adaboost models.
Table 4 presents the forecasting errors, and
Figure 12 illustrates the one-week forecasting results of the proposed model.
As illustrated in
Figure 11, during periods of daily load abrupt changes, other models exhibit degraded forecasting performance. In contrast, the GCN-BiLSTM-AB model retains relatively accurate predictions even at load turning points.
It can also be observed that as the model evolves from the standalone BiLSTM, to GCN-BiLSTM, then to GCN-BiLSTM-Adaboost, and finally to the integration of Bayesian methods into GCN-BiLSTM-Adaboost, the three types of prediction errors consistently decrease. This trend indicates a gradual improvement in the model’s forecasting performance.
Figure 12 shows the weekly power-load forecasting results of the proposed model, as well as the prediction uncertainty within the 95% confidence interval. The model proposed in this study takes 5 min and 56 s to run. It can be seen that the weekly power-load forecasts align well with the actual values, enabling relatively accurate power-load prediction.
This confirms that the AdaBoost ensemble algorithm can dynamically adjust the weights of weak predictors, while the Bayesian method—implemented via Monte Carlo Dropout—effectively mitigates overfitting and quantifies prediction uncertainty. By incorporating uncertainty-based weighting into AdaBoost, the stability of final predictions is enhanced, enabling the model to retain accuracy even during abrupt power-load changes induced by events such as extreme weather.
The 95% confidence interval in
Figure 12 is the prediction uncertainty range generated by Monte Carlo Dropout sampling. Its statistical interpretation is that, under repeated sampling, the true value is expected to fall within this interval with 95% probability. By visualizing the confidence interval, the model provides not only point predictions but also a quantitative indicator of risk.
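One common way to obtain such a band from Monte Carlo samples is the mean plus or minus 1.96 sample standard deviations, which assumes the sampled predictions are approximately Gaussian at each time step. The sketch below follows that assumption; the text does not state how the interval was computed, and the function names are hypothetical.

```python
import numpy as np

def mc_interval(samples, z=1.96):
    """Per-point band from MC samples of shape (passes, points): mean +/- z * std."""
    mean = samples.mean(axis=0)
    std = samples.std(axis=0, ddof=1)  # sample standard deviation
    return mean - z * std, mean, mean + z * std

def empirical_coverage(y_true, lower, upper):
    """Fraction of true values that fall inside the band."""
    return np.mean((y_true >= lower) & (y_true <= upper))

# Two passes, two time steps: the first step varies, the second does not.
samples = np.array([[0.0, 10.0],
                    [2.0, 10.0]])
lower, mean, upper = mc_interval(samples)
coverage = empirical_coverage(np.array([1.0, 11.0]), lower, upper)
print(lower, upper, coverage)
```

Comparing the empirical coverage on held-out data with the nominal 95% level is a simple check of whether the band is too narrow or too wide.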
5. Conclusions
This paper proposes a GCN-BiLSTM-based power-load forecasting model, integrating models through the AdaBoost algorithm and incorporating Bayesian methods to enhance the robustness of the model’s predictions. The key findings of this study are:
(1) The BiLSTM network excels at extracting temporal dependency features, and the GCN network uniquely models topological relationships. The integrated GCN-BiLSTM network efficiently captures temporal and spatial features from input data, enhancing prediction accuracy.
(2) The Bayesian method is a probabilistic statistical inference approach capable of quantifying uncertainties in data and models. In this paper, the Bayesian method is implemented through Monte Carlo Dropout, which randomly drops neurons during the prediction process to prevent overfitting. Simultaneously, multiple sampling is used to calculate means and variances, yielding a more reasonable predictive distribution.
(3) The Adaboost algorithm uses GCN-BiLSTM as the weak learner, sets thresholds to dynamically adjust sample weights, incorporates “uncertainty” into the final weighted fusion of weak learners, and reduces the weights of predictions with higher uncertainty. These improvements not only retain the core idea of the Adaboost algorithm that gradually focuses on hard samples but also achieve more accurate weight allocation according to the characteristics of power-load forecasting, improving the prediction accuracy for key scenarios while enhancing the overall robustness of the model.
(4) The Spearman correlation coefficient balances efficiency (fast calculation), robustness (noise resistance), and interpretability (clear physical meaning) in power-load forecasting, and is particularly suitable for constructing graph structures based on statistical priors in multivariate time-series forecasting. By contrast, KNN, learned graphs, and mutual information have limited performance in practical applications due to issues such as overly strong assumptions, complex calculations, and poor stability. Compared with these three methods, the prediction results obtained by the graph construction method using the Spearman correlation coefficient show an average improvement in MAE, MAPE, and RMSE of 0.10, 0.19%, and 0.15, respectively.
The research results show that, compared to traditional power-load forecasting models, the proposed GCN-BiLSTM-AB hybrid network effectively captures spatiotemporal features of data, achieving more accurate predictions. Additionally, the network exhibits high robustness and generalization capabilities during sudden power-load fluctuations, ensuring stable predictions even in scenarios such as extreme weather events causing abrupt load changes. The predictive capabilities of GCN-BiLSTM-AB can assist power dispatch in making scientific and reasonable decisions, thereby reducing costs, improving efficiency, and enhancing the overall stability of the power system.
While the proposed GCN-BiLSTM-AB demonstrates superior performance, several limitations warrant further investigation: (1) The static graph construction may not capture dynamic feature interactions during extreme events; (2) Computational overhead increases linearly with the number of weak predictors (K = 10) and Monte Carlo samples (100).
Regarding the above limitations, future work will proceed as follows:
(1) Introducing dynamic graph neural networks (DGNN) or temporal attention mechanisms to enable the adjacency matrix to be adaptively adjusted over time;
(2) Replacing MC Dropout with Bayesian neural networks (BNNs) via Laplace approximation, which provides uncertainty estimates without iterative sampling;
(3) Implementing surrogate models (e.g., Gaussian processes) to approximate the posterior distribution of weak predictors, reducing the need for direct sampling;
(4) Adopting quantization-aware training to compress the ensemble model for edge deployment.
Additionally, multi-modal data fusion (e.g., grid topology maps) will be explored to enhance robustness under complex weather conditions.