1.1. General Introduction
With the evolution of new power systems, the integration of diversified loads, high-penetration renewable energy, and the inherent stochasticity and volatility of renewables have significantly increased the complexity of load patterns. Power load forecasting is a technical framework that scientifically estimates future electricity demand (including power and energy consumption) for specific timepoints or periods within a power system. It leverages multi-source information such as historical consumption data, meteorological conditions, and socio-economic activities, employing mathematical models and intelligent algorithms [1]. The core objectives are to optimize resource allocation, ensure grid security, and enhance energy efficiency by analyzing load variation patterns. Based on temporal scale, power load forecasting is categorized into long-term, medium-term, and short-term forecasting. Effective short-term power load forecasting (STLF) enables grid operators to proactively plan generation scheduling, rationally arrange reserve capacity, minimize economic losses caused by prediction deviations, and maximize the utilization of clean energy [2].
In recent years, significant progress has been made in system-level and regional-level power load forecasting research. Artificial intelligence methodologies, particularly machine learning (ML) and deep learning (DL), have facilitated the development of predictive models such as Support Vector Machines (SVM) [3], Random Forests (RF) [4], and Artificial Neural Networks (ANN) [5]. Nevertheless, ANN models exhibit inherent limitations: they struggle to capture intrinsic correlations within sequential data, often requiring manual extraction of temporal features from historical load data to establish input-output mappings.
Power load sequences are characterized by nonlinearity, non-stationarity, and dynamic evolution, meaning current outputs depend not only on immediate inputs but also on historical states. Manual feature engineering disrupts the temporal continuity of load data, while simplistic input-output mappings further compromise prediction accuracy.
1.2. Motivation and Gap
To address these challenges, DL algorithms have gained prominence in load forecasting. Sequential modeling techniques—including Recurrent Neural Networks (RNN) [6], Convolutional Neural Networks (CNN) [7], and Long Short-Term Memory networks (LSTM) [8], alongside emerging architectures such as Transformers and Graph Neural Networks (GNN) [9]—have been progressively integrated into this domain. RNNs, with their cyclic structure enabling implicit state transmission across time steps, significantly enhance the modeling of temporal dependencies, making them particularly effective for processing dynamic power load sequences.
Reference [10] explores machine learning applications in short-term load forecasting, emphasizing their superiority over traditional methods in handling nonlinear load patterns and complex factors such as weather and socioeconomic conditions. The article highlights deep learning models, particularly LSTM networks, for capturing temporal dependencies, and discusses hybrid approaches combining ML with statistical techniques to enhance accuracy. Key challenges include data quality, model interpretability, and computational efficiency. Practical implementation insights focus on feature selection, hyperparameter tuning, and ensemble methods. The study underscores ML’s potential to achieve high forecasting precision while addressing real-world constraints such as outlier sensitivity and non-stationary data. Emerging trends such as attention mechanisms and optimization algorithms are noted as future research directions. Reference [11] proposes a short-term load forecasting model that combines a Temporal Convolutional Network with channel and temporal attention mechanisms to capture nonlinear relationships between meteorological factors and load data. The model employs the Maximum Information Coefficient for feature selection and Fuzzy c-means with Dynamic Time Warping for clustering similar load patterns. Experimental results demonstrate enhanced accuracy and generalization compared with baseline methods, validating its effectiveness in handling complex load dynamics and improving grid operational efficiency. Reference [12] proposes a Transformer-based model for short-term load forecasting, leveraging the self-attention mechanism to capture long-range dependencies in load data. The model addresses limitations of traditional methods by effectively processing nonlinear and high-dimensional load patterns. Experimental results demonstrate superior accuracy compared with conventional approaches, highlighting its robustness in handling complex temporal variations. The study underscores the Transformer’s potential for improving grid operational efficiency through enhanced predictive performance.
Reference [13] proposes a hybrid model combining convolutional and recurrent neural networks for short-term load forecasting, effectively capturing spatial-temporal patterns in load data. The model demonstrates superior accuracy compared with traditional methods by integrating CNN’s feature extraction with RNN’s sequential modeling. Reference [14] proposes a Bagging-enhanced XGBoost model for extreme-weather identification and short-term load forecasting. The hybrid approach improves accuracy by integrating ensemble learning to handle weather-induced load volatility, outperforming traditional methods in robustness and predictive performance.
In recent years, emerging networks such as the Temporal Convolutional Network (TCN) and Graph Convolutional Network (GCN) have provided novel approaches for short-term load forecasting. TCN, as a temporal convolutional model, processes time series in parallel, mitigating the gradient explosion issues that affect recurrent sequential models; GCN effectively captures spatial dependencies in power grid topologies by aggregating the neighborhood information of nodes. Reference [15] proposes a hybrid model combining an improved Temporal Convolutional Network and DenseNet for short-term load forecasting. The enhanced TCN captures long-term dependencies through dilated convolutions, while DenseNet extracts hierarchical features via dense connections. The model integrates meteorological and historical load data, optimizing feature fusion to improve accuracy. Experimental results demonstrate superior performance over traditional methods, achieving lower prediction errors and robust generalization across diverse load patterns. The study highlights the effectiveness of deep learning architectures in handling nonlinear load dynamics for grid management. Reference [16] proposes a GCN-LSTM hybrid model for short-term load forecasting in new-type power systems, integrating spatial and temporal features from multiple influencing factors such as weather, regional topology, and historical load data. The model leverages GCN to capture non-Euclidean spatial correlations and LSTM to process temporal dependencies, enhancing prediction accuracy. Experimental results demonstrate superior performance over traditional methods, effectively addressing the challenges of diversified energy use and complex load dynamics in modern power systems.
Notably, existing hybrid load forecasting models—though advanced—still face three core limitations that hinder their performance in complex power grid scenarios: (1) Static spatial modeling: GCN-LSTM relies on fixed adjacency matrices to capture grid topology, failing to adapt to dynamic changes caused by equipment maintenance, load redistribution, or grid expansion; this static constraint leads to inaccurate spatial correlation modeling when load patterns shift. (2) Simplified temporal feature fusion: CNN-LSTM merely concatenates local features extracted by CNN and temporal features modeled by LSTM, ignoring the complementary nature of multi-scale temporal patterns (e.g., short-term load fluctuations from industrial shifts vs. long-term trends from daily consumption cycles); this results in suboptimal utilization of temporal information. (3) Rigid fusion mechanisms: TCN-DenseNet uses pre-defined static weights to integrate spatio-temporal features, unable to adjust feature contribution ratios according to real-time load characteristics (e.g., prioritizing spatial features during peak load periods with strong regional synergy); this rigidity reduces adaptability in diverse scenarios.
Zhou et al. [17] propose a short-term multi-energy load forecasting method built on a Transformer-based spatio-temporal graph neural network (STGNN). The model leverages the Transformer’s self-attention mechanism to capture long-range temporal dependencies across multi-energy load sequences (e.g., electricity, heat, gas) and incorporates graph neural network (GNN) structures to model spatial correlations between energy supply-demand nodes. By fusing the spatio-temporal features inherent in multi-energy systems, the method addresses the limitations of traditional models in handling cross-energy coupling and dynamic load interactions. Experimental validations demonstrate that the proposed framework outperforms baseline models (e.g., LSTM, GCN-LSTM) in prediction accuracy and robustness, especially in scenarios with highly volatile multi-energy loads, providing reliable technical support for integrated energy system operation and scheduling.
Recent studies on power load forecasting have focused on enhancing model performance through hybrid architectures and targeted optimizations: Wan et al. [18] proposed an attention-enhanced CNN-LSTM model for combined heat and power (CHP) short-term load forecasting; Liu et al. [19] developed the AC-BiLSTM model to improve short-term load prediction accuracy; Wang et al. [20] focused on minute-level ultra-short-term forecasting by leveraging time series data features; Wang et al. [21] introduced an LSTM-Informer model integrated with ensemble learning for long-term load forecasting. These works collectively advance load forecasting across different temporal scales (ultra-short-term to long-term) by integrating deep learning components such as CNN, LSTM, BiLSTM, and Informer with attention mechanisms or ensemble learning to enhance feature extraction and prediction robustness.
Our proposed GAT-CNN-LSTM model targets these limitations through three tailored improvements: First, we replace GCN with GAT to dynamically adjust node attention weights based on real-time feature similarity, enabling adaptive capture of spatial correlations and resolving the static topology constraint. Second, we design a hierarchical temporal feature extraction mechanism—multi-scale CNN kernels decompose local load fluctuations (e.g., 15-minute interval variations), while a bidirectional LSTM models long-term temporal trends (e.g., 24-hour cycles)—realizing more comprehensive temporal information mining than the simple concatenation in CNN-LSTM. Third, we introduce a gated fusion module that adaptively balances the contribution ratios of spatial (from GAT) and temporal (from CNN-LSTM) features, avoiding the information redundancy or loss caused by the rigid weight allocation in TCN-DenseNet. These improvements ensure our model overcomes the key shortcomings of existing hybrid approaches.
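The gated fusion idea can be illustrated with a minimal numpy sketch. This is a simplified, hypothetical version for exposition only: the feature width d, the weight matrix W_g, and the single-node setting are assumptions, and the full model would learn W_g jointly with the GAT and CNN-LSTM branches. A sigmoid gate g produces a per-dimension weight in (0, 1), so the fused vector is a convex combination of the spatial and temporal features rather than a fixed-weight sum.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(h_spatial, h_temporal, W_g, b_g):
    """Gated fusion sketch: a learned sigmoid gate g decides, per feature
    dimension, how much of the spatial branch vs. the temporal branch to
    keep: fused = g * spatial + (1 - g) * temporal."""
    g = sigmoid(np.concatenate([h_spatial, h_temporal]) @ W_g + b_g)
    return g * h_spatial + (1.0 - g) * h_temporal

rng = np.random.default_rng(1)
d = 16                                # hypothetical feature width
h_sp = rng.normal(size=d)             # e.g. GAT output for one grid node
h_tm = rng.normal(size=d)             # e.g. CNN-LSTM output for that node
W_g = rng.normal(scale=0.1, size=(2 * d, d))
b_g = np.zeros(d)

fused = gated_fusion(h_sp, h_tm, W_g, b_g)
```

Because g is computed from the concatenated features themselves, the contribution ratios shift with the current load characteristics, e.g., toward the spatial branch during peak periods with strong regional synergy, which is exactly the adaptivity the static weights of TCN-DenseNet lack.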