A Load Forecasting Model Based on Spatiotemporal Partitioning and Cross-Regional Attention Collaboration

Dou, Xun; Yang, Ruiang; Dou, Zhenlan; Zhang, Chunyan; Xu, Chen; Li, Jiacheng

doi:10.3390/su17188162


by Xun Dou ¹, Ruiang Yang ¹, Zhenlan Dou ², Chunyan Zhang ², Chen Xu ³ and Jiacheng Li ¹,*

¹ College of Electrical Engineering and Control Science, Nanjing Tech University, Nanjing 211816, China

² State Grid Shanghai Municipal Electric Power Company, Shanghai 200122, China

³ State Grid Integrated Energy Service Group Co., Ltd., Beijing 100052, China

* Author to whom correspondence should be addressed.

Sustainability 2025, 17(18), 8162; https://doi.org/10.3390/su17188162

Submission received: 31 July 2025 / Revised: 4 September 2025 / Accepted: 8 September 2025 / Published: 10 September 2025

(This article belongs to the Special Issue Energy Conservation Towards a Low-Carbon and Sustainability Future)


Abstract

With the advancement of new power system construction, thermostatically controlled loads represented by regional air conditioning systems are being extensively integrated into the grid, leading to a surge in the number of user nodes. This large-scale integration of new loads creates challenges for the grid, as the resulting load data exhibits strong periodicity and randomness over time. These characteristics are influenced by factors such as temperature and user behavior. At the same time, spatially adjacent nodes show similar and clustered electricity usage, creating complex spatiotemporal coupling features. These characteristics challenge traditional forecasting methods: when such models are scaled to many nodes, their high complexity and large number of parameters often lead to overfitting or the curse of dimensionality, which reduces both prediction accuracy and efficiency. To address this issue, this paper proposes a load forecasting method based on spatiotemporal partitioning and collaborative cross-regional attention. First, a spatiotemporal similarity matrix is constructed using the Shape Dynamic Time Warping (ShapeDTW) algorithm and an adaptive Gaussian kernel function based on the Haversine distance. Spectral clustering combined with the Gap Statistic criterion is then applied to adaptively determine the optimal number of partitions, dividing all load nodes in the power grid into several sub-regions with homogeneous spatiotemporal characteristics. Second, for each sub-region, a local Spatiotemporal Graph Convolutional Network (STGCN) model is built. By integrating gated temporal convolution with spatial feature extraction, the model accurately captures the spatiotemporal evolution patterns within each sub-region. On this basis, a cross-regional attention mechanism is designed to dynamically learn the correlation weights among sub-regions, enabling collaborative fusion of global features. Finally, the proposed method is evaluated on a multi-node load dataset.
The effectiveness of the approach is validated through comparative experiments and ablation studies (that is, by removing key components of the model to evaluate their contribution to the overall performance). Experimental results demonstrate that the proposed method achieves excellent performance in short-term load forecasting tasks across multiple nodes.

Keywords:

multi-node power load; spatiotemporal similarity matrix; spatiotemporal partitioning; cross-regional attention mechanism

1. Introduction

In recent years, with the transformation of energy consumption structures and the deepening of power market reforms, the demand for refined power load forecasting has become increasingly prominent [1]. Especially against the backdrop of accelerated urbanization, the interaction between the physical topology of the power grid and diverse user electricity consumption behaviors has become increasingly close. Load correlations are reflected not only in the physical adjacency determined by the grid topology but also in behavioral dependencies driven by similar lifestyles and work patterns. At the same time, these consumption behaviors are further influenced by factors such as temperature, forming dynamically evolving patterns. The interweaving of spatial correlations in the physical structure and temporal dependencies in behavior gives rise to complex spatiotemporal coupling characteristics [2]. Traditional methods are mostly limited to time series modeling of individual nodes, which makes it difficult to capture either the physical spatial correlations determined by the grid topology or the dependencies driven by similar consumption behaviors [3,4]. These multidimensional characteristics pose new challenges for forecasting approaches.

Currently, research on power system load forecasting can be divided into three main categories: statistical methods, machine learning methods, and deep learning methods [4]. Statistical methods include Auto-Regressive analysis [5], Exponential Smoothing Models [6], Kalman Filtering [7], Multiple Linear Regression [8], and Auto-Regressive Integrated Moving Average [9], among others. These methods effectively exploit the linear correlations in time series data and have relatively low computational complexity, making them suitable for load data with clear linear trends and seasonal characteristics [10]. In contrast, load forecasting methods based on machine learning and deep learning have been widely applied because they capture the complex nonlinear characteristics of power load data effectively [11]. Ref. [12] proposes a short-term wind power forecasting model for large-scale multi-wind-farm systems based on enhanced global temporal and spatial feature extraction, which addresses power fluctuations from grid-connected wind farms under high absolute prediction errors and thereby improves the stability of the power system. Ref. [13] proposes a short-term spatiotemporal load forecasting method based on Graph Neural Networks (GNN), which addresses a limitation of existing studies: data dependencies are insufficiently captured because the inputs are treated as sequences under Euclidean-space assumptions. Ref. [14] proposes a short-term load forecasting method for new power systems that combines Graph Convolutional Networks (GCN) with Long Short-Term Memory networks (LSTM) and takes multiple factors into account. This method effectively addresses the challenge of capturing both temporal dynamics and nonlinear relationships in multi-node load data. Ref. [15] proposes a dual-channel learner for the SpatioTemporal Graph Neural Network (STGNN) model that captures the complementary information of heterogeneous features, learns the unique contributions of different temporal components, and combines dynamic traffic signals with spatiotemporal embeddings in a data-driven manner, enabling the model to dynamically extract dependencies in specific spatiotemporal contexts. However, when neural network models are applied to large-scale multi-node load forecasting, increasing the number of nodes rapidly raises both the computational complexity and the risk of overfitting. It also makes spatiotemporal features harder to extract effectively, ultimately leading to a significant decline in forecasting performance.

To enhance the feasibility and effectiveness of multi-node joint forecasting, clustering-based node partitioning has become the mainstream solution for multi-node load forecasting. Ref. [16] proposes an integrated energy node clustering method based on multi-temporal-scale convolutional kernels, which groups similar load nodes to address the challenge of capturing complex patterns in load data across different time scales. Ref. [17] presents a short-term load forecasting approach tailored to electricity market conditions that integrates real-time electricity prices. This approach captures the correlation between electricity prices and load demand, leading to improved forecasting accuracy. Ref. [18] presents a short-term wind power forecasting method based on Numerical Weather Prediction (NWP) data using a Sequence-to-Sequence (Seq2Seq) model. This approach applies the K-means algorithm to cluster loads and then forecasts each cluster individually with the Seq2Seq model. Ref. [19] proposes a Graph Convolutional Recurrent Neural Network for short-term residential load forecasting, demonstrating the effectiveness of graph-based models in capturing both temporal and spatial dependencies. Similarly, various hybrid models combining GCN, LSTM, and Spatiotemporal Graph Convolutional Networks (STGCN) have been applied to multi-node load forecasting and outperform traditional statistical methods. Beyond single-region modeling, researchers have recognized the importance of cross-region collaboration in large-scale load forecasting. Ref. [20] proposes a collaborative energy management framework for interconnected regional integrated energy systems that explicitly considers spatiotemporal dependencies. Ref. [21] further introduces a multi-space collaboration framework for optimal model selection in load forecasting, showing that collaborative strategies can enhance the robustness of the entire system. Ref. [22] demonstrates that collaborative forecasting of multiple energy loads in integrated systems based on feature extraction and deep learning can significantly improve predictive accuracy and reliability. Although these studies achieve grouped forecasting of multi-node loads through clustering, their partitioning strategies often consider only a single temporal scale or static spatial relationships, making it difficult to fully capture the complex spatiotemporal coupling characteristics. Moreover, the lack of effective cross-regional interaction mechanisms turns each partition into an isolated information island, so the correlations between the divided regions are not captured.

Large-scale multi-node load forecasting therefore faces three challenges: excessive model complexity and a high risk of overfitting as the number of nodes grows rapidly; insufficient consideration of temporal characteristics and spatial correlations in existing regional partitioning strategies; and the lack of effective inter-regional association mechanisms after partitioning. To address them, this paper proposes a load forecasting method based on spatiotemporal partitioning and collaborative cross-regional attention. The main contributions of this study are twofold:

An improved spectral clustering partitioning method based on spatiotemporal comprehensive evaluation is constructed. By integrating the ShapeDTW algorithm with an adaptive spatial kernel function based on the Haversine distance, a comprehensive evaluation matrix is formed, which can simultaneously capture the similarity in load time series patterns and the geographical relationships among nodes. This evaluation matrix serves as the basis for spectral clustering, and the optimal number of partitions is adaptively determined by incorporating the Gap Statistic criterion. The proposed method provides a more discriminative partitioning foundation for multi-node load forecasting, significantly improving both partition quality and overall forecasting performance.
A regional collaborative forecasting method based on STGCN is constructed. A local STGCN is deployed in each sub-region to precisely extract intra-regional features. The cross-regional attention mechanism is then designed to achieve collaborative fusion of global information. By balancing the adaptability of regional models with the interaction of global features, the proposed method effectively improves the overall accuracy and stability of short-term load forecasting for multi-node systems.

2. Methodology

2.1. Overall Framework

In multi-regional power load forecasting tasks, traditional methods typically adopt a unified global modeling approach to capture the spatiotemporal correlations among all nodes. However, such methods have two notable limitations. First, as the regional scale increases, model complexity rises sharply, making it difficult to capture both global patterns and local details. Second, loads in different regions often exhibit significant spatial heterogeneity and temporal variability, which a single model cannot adequately represent. Moreover, simple regional partitioning methods that consider only spatial proximity neglect the temporal correlations among load sequences, potentially leading to partition results that do not align with the actual spatiotemporal evolution patterns. To overcome the aforementioned limitations, this paper proposes a load forecasting framework based on spatiotemporal partitioning and collaborative cross-regional attention, as illustrated in Figure 1. First, a comprehensive evaluation index is constructed based on temporal similarity and spatial proximity to achieve regional partitioning that reflects the spatiotemporal characteristics of the load. Second, the STGCN model is applied within each region to extract local features. At the same time, a cross-regional attention mechanism is introduced to learn global dependencies among regions, thereby alleviating the problem of information isolation between regional blocks. Finally, comprehensive comparative and ablation experiments are conducted on the load dataset. A thorough evaluation system is established using the Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and coefficient of determination (R²) to systematically validate the effectiveness of the proposed method. This method effectively improves the forecasting accuracy of multi-node loads in complex power grids by combining local and global spatiotemporal feature learning, while also reducing the complexity of the graph structure.

The framework is composed of three sequential modules that systematically process spatiotemporal data to achieve high-accuracy load forecasting. The first module is the Regional Division Module based on Spatiotemporal Similarity, which is responsible for preprocessing the raw multi-node load data and partitioning the network. This module first constructs a comprehensive spatiotemporal similarity matrix based on time-series shape similarity (via the ShapeDTW algorithm) and geospatial proximity (via the Haversine distance). Then, it utilizes spectral clustering and the Gap Statistic method to objectively determine the optimal number of partitions, dividing the entire network into several sub-regions with highly homogeneous internal characteristics.

The second module is the Partition Collaborative Prediction Module, which serves as the core of the framework. Each partitioned sub-region, conceptualized as a “space-time extension map,” is fed into a dedicated SpatioTemporal Graph Convolutional Network (STGCN) to extract its local spatiotemporal features. Simultaneously, to address the “information island” problem caused by partitioning, this module designs a cross-regional attention mechanism to learn the correlation weights between different partitions, thereby enabling the collaborative fusion of global features.

The third module is the Prediction Results and Evaluation Indexes Module, which displays and validates the model’s performance. Its “Forecasting Results” chart compares how closely the prediction curves of the proposed method and the baseline models fit the actual load curve. Its “Evaluation Indexes” section lists the formulas of the three standard metrics used to quantify accuracy, namely MAE, RMSE, and R², so that the effectiveness of the entire framework is validated in a rigorous and transparent manner.
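For reference, the three evaluation metrics can be computed as follows. This is a minimal NumPy sketch using the standard definitions of MAE, RMSE, and R²; the function name `evaluate` is illustrative and not taken from the paper.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for actual and forecast load values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                       # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))                # root mean square error
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    return mae, rmse, r2
```

For example, `evaluate([1, 2, 3], [1, 2, 4])` gives an MAE of 1/3, an RMSE of √(1/3), and an R² of 0.5.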

2.2. Regional Division Module Based on Spatiotemporal Similarity

2.2.1. Load Time Series Correlation Analysis Method Based on ShapeDTW

The traditional Dynamic Time Warping (DTW) algorithm is capable of finding the globally optimal alignment path between two time series. However, its main limitation lies in the insufficient consideration of local shape characteristics of the time series. As a result, the matching outcomes may sometimes lack practical interpretability. To address this issue, the ShapeDTW algorithm introduces local shape descriptions and incorporates structural information from the neighborhood of each time point into the dynamic programming process. Since power load sequences typically exhibit clear local characteristics, ShapeDTW is capable of capturing fine-grained patterns that are difficult for traditional DTW to handle. This allows for a more accurate representation of the true dynamic similarity, ultimately providing a solid basis for identifying nodes with similar consumption behaviors and performing reliable regional partitioning.

Given a univariate time series $T = (t_{1}, t_{2}, \dots, t_{L})^{\top} \in ℝ^{L}$, ShapeDTW encodes the structural information in the neighborhood of each time point $t_{i}$ using a shape descriptor $d_{i} \in ℝ^{m}$. In this way, the original real-valued sequence $T$ is transformed into a shape descriptor sequence $d = (d_{1}, d_{2}, \dots, d_{L})^{\top} \in ℝ^{L \times m}$ of the same length. ShapeDTW then aligns the transformed multivariate descriptor sequence $d$ using DTW. Finally, the alignment path of the shape descriptors is transferred to the original time series.

Given two univariate time series $P = (p_{1}, p_{2}, \dots, p_{L_{P}})^{\top} \in ℝ^{L_{P}}$ and $Q = (q_{1}, q_{2}, \dots, q_{L_{Q}})^{\top} \in ℝ^{L_{Q}}$, let $D^{P} = (d_{1}^{P}, d_{2}^{P}, \dots, d_{L_{P}}^{P})^{\top} \in ℝ^{L_{P} \times m}$ with $d_{i}^{P} \in ℝ^{m}$, and $D^{Q} = (d_{1}^{Q}, d_{2}^{Q}, \dots, d_{L_{Q}}^{Q})^{\top} \in ℝ^{L_{Q} \times m}$ with $d_{i}^{Q} \in ℝ^{m}$, denote their corresponding shape descriptor sequences. The ShapeDTW alignment is equivalent to solving the following optimization problem:

D_{ShapeDTW}(P, Q) = \min_{\tilde{w}^{P} \in \{0, 1\}^{l \times L_{P}},\ \tilde{w}^{Q} \in \{0, 1\}^{l \times L_{Q}}} \| \tilde{w}^{P} \cdot D^{P} - \tilde{w}^{Q} \cdot D^{Q} \|_{1, 2}

(1)

where $\tilde{w}^{P}$ and $\tilde{w}^{Q}$ are the warping matrices for $D^{P}$ and $D^{Q}$, respectively, $l$ is the length of the warping path, and $\| \cdot \|_{1, 2}$ denotes the $\ell_{1}/\ell_{2}$ matrix norm, i.e., the sum of the $\ell_{2}$ norms of the matrix rows. The minimizing warping matrices give the alignment, and the minimum value is the ShapeDTW distance. This multivariate time series alignment problem can be solved efficiently by dynamic programming with a time complexity of $O(L_{P} \times L_{Q})$. The corresponding temporal correlation matrix is defined as:

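R_{t}(i, j) = 1 - \frac{D_{ShapeDTW}(P_{i}, Q_{j})}{\max(D_{ShapeDTW})}

(2)

where $P_{i}$ and $Q_{j}$ denote the load series of nodes i and j, and the maximum is taken over all node pairs, so that $R_{t}(i, j) \in [0, 1]$.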
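The sketch below illustrates Eqs. (1) and (2) under stated assumptions. It uses the raw-subsequence shape descriptor (a window of m = 2·half_width + 1 samples centred on each point, one of the descriptors proposed for ShapeDTW); the paper does not say which descriptor it uses, so this choice, the window width, and all function names are assumptions. DTW runs on the descriptor sequences with the ℓ2 distance between descriptors as the local cost, so the accumulated path cost is the ℓ1/ℓ2 norm in Eq. (1).

```python
import numpy as np


def raw_subsequence_descriptors(x, half_width=2):
    """Assumed descriptor: d_i is the raw window of length m = 2*half_width + 1
    centred on t_i, with edges padded by repeating the end values."""
    x = np.asarray(x, dtype=float)
    padded = np.pad(x, half_width, mode="edge")
    m = 2 * half_width + 1
    return np.stack([padded[i:i + m] for i in range(len(x))])  # shape (L, m)


def dtw_cost(DP, DQ):
    """O(L_P * L_Q) dynamic programming; the local cost is the l2 distance
    between descriptors, so the optimal total is the l1/l2 norm of Eq. (1)."""
    LP, LQ = len(DP), len(DQ)
    cost = np.linalg.norm(DP[:, None, :] - DQ[None, :, :], axis=2)
    acc = np.full((LP + 1, LQ + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, LP + 1):
        for j in range(1, LQ + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[LP, LQ])


def shape_dtw(p, q, half_width=2):
    """ShapeDTW distance between two load series p and q."""
    return dtw_cost(raw_subsequence_descriptors(p, half_width),
                    raw_subsequence_descriptors(q, half_width))


def temporal_correlation_matrix(series, half_width=2):
    """R_t(i, j) = 1 - D(i, j) / max(D) over all node pairs, Eq. (2)."""
    n = len(series)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = shape_dtw(series[i], series[j], half_width)
    return 1.0 - D / D.max() if D.max() > 0 else np.ones((n, n))
```

The double loop is kept explicit for readability; for hundreds of nodes the pairwise distances would typically be computed with a compiled DTW implementation instead.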

2.2.2. Spatial Proximity

In power load forecasting, spatial proximity reflects the degree of geographical association between different nodes. This paper constructs a spatial proximity scoring matrix based on the latitude and longitude coordinates of nodes using a Gaussian kernel function, in order to quantify the spatial correlation among nodes.

Given two nodes i and j in a power system with latitude and longitude coordinates $(lat_{i}, lon_{i})$ and $(lat_{j}, lon_{j})$, respectively, the spatial proximity score $R_{s}(i, j)$ is derived from the great-circle distance between the nodes, computed with the Haversine formula. This formula accounts for the curvature of the Earth and is suitable for distances computed from latitude and longitude coordinates:

D_{geo} (i, j) = 2 r \arcsin (\sqrt{\sin^{2} (\frac{{lat}_{j} - {lat}_{i}}{2}) + \cos ({lat}_{i}) \cos ({lat}_{j}) \sin^{2} (\frac{{lon}_{j} - {lon}_{i}}{2})})

(3)

where r is the radius of the Earth (default value 6371 km), and lat and lon are expressed in radians.

To avoid setting the bandwidth parameter manually, the median of the pairwise distances between nodes is used as the adaptive bandwidth $σ_{s}$, which improves the adaptability of the model:

σ_{s} = \operatorname{median}(\{D_{geo}(i, j) \mid \forall i, j \in V, i \neq j\})

(4)

where V denotes the set of all nodes and $D_{geo}(i, j)$ is the Haversine distance between nodes i and j.

The spatial proximity matrix is finally obtained by mapping the geographical distances using a Gaussian kernel function:

R_{s} (i, j) = \exp (- \frac{D_{geo} {(i, j)}^{2}}{2 σ_{s}^{2}})

(5)

The spatial proximity score provides a reliable measure of spatial association for subsequent spatiotemporal joint clustering, and together with the temporal analysis based on ShapeDTW, forms a multi-dimensional evaluation basis for regional partitioning.
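A minimal NumPy sketch of Eqs. (3) to (5), assuming node coordinates are given in degrees and converted to radians internally; the function names are illustrative, not from the paper.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # r in Eq. (3)


def haversine_matrix(lat_deg, lon_deg, r=EARTH_RADIUS_KM):
    """Pairwise great-circle distances D_geo(i, j) in km, Eq. (3)."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))[:, None]  # column vector
    lon = np.radians(np.asarray(lon_deg, dtype=float))[:, None]
    dlat = lat.T - lat   # dlat[i, j] = lat_j - lat_i
    dlon = lon.T - lon
    a = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(lat.T) * np.sin(dlon / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))  # clip guards rounding


def spatial_proximity(lat_deg, lon_deg):
    """Return (R_s, sigma_s): Gaussian-kernel proximity with median bandwidth."""
    D = haversine_matrix(lat_deg, lon_deg)
    off_diag = D[~np.eye(len(D), dtype=bool)]     # pairs with i != j only
    sigma_s = np.median(off_diag)                 # adaptive bandwidth, Eq. (4)
    R_s = np.exp(-D ** 2 / (2 * sigma_s ** 2))    # Gaussian kernel, Eq. (5)
    return R_s, sigma_s
```

Two nodes one degree of latitude apart are about 111.19 km apart, and a pair whose distance equals the median bandwidth receives the score exp(−1/2) ≈ 0.61.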

2.2.3. Spectral Clustering

Spectral clustering, as a nonlinear clustering method based on graph theory, is capable of effectively handling complex spatiotemporal correlation patterns in power load forecasting. Traditional clustering algorithms such as K-means exhibit clear limitations when dealing with non-convex distributed data, whereas spectral clustering can uncover the underlying manifold structure of the data by projecting it into a feature space. In this paper, the spectral clustering method is employed to integrate the temporal similarity extracted by the ShapeDTW algorithm with the spatial proximity based on geographical coordinates, thereby achieving a rational regional partitioning of power load nodes. Specifically, the Gaussian kernel function is first used to transform the spatiotemporal similarity measures into edge weights in the graph, constructing a joint spatiotemporal similarity matrix. Then, eigen-decomposition is performed on the normalized graph Laplacian matrix, and the eigenvectors corresponding to the K smallest eigenvalues are selected to form a low-dimensional embedding space. Finally, K-means clustering is applied in this low-dimensional space to obtain the final regional partitioning results [23].

Given N nodes in a power system, the joint spatiotemporal similarity matrix $W \in ℝ^{N \times N}$ is defined as:

W(i, j) = λ \cdot R_{t}(i, j) + (1 - λ) \cdot R_{s}(i, j)

(6)

where λ is the weight coefficient of the temporal–spatial combination, set to 0.7 in this study; $R_{t}(i, j)$ is the temporal correlation matrix; and $R_{s}(i, j)$ is the spatial proximity matrix.

In spectral clustering, the construction of the objective function is a core step, which measures the similarity relationships among samples by optimizing graph partitioning criteria. The schematic diagram of spectral clustering is shown in Figure 2. In this paper, a graph theory-based approach is adopted to quantify the correlation strength among load nodes by constructing a joint spatiotemporal similarity matrix. The design of the objective function follows the normalized cut criterion, aiming to maximize intra-cluster similarity while minimizing inter-cluster connections. Its mathematical formulation is as follows:

\frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} \sum_{t = 1}^{k} W (i, j) {(h_{i t} - h_{j t})}^{2}

(7)

where $H \in ℝ^{n \times k}$ and $h_{it}$ is the element in the i-th row and t-th column of H, n is the number of nodes, and k is the number of clusters; $W(i, j)$ is the joint spatiotemporal similarity matrix defined above. Let D be a diagonal matrix whose diagonal elements are the average similarity values of each node, $d_{k} = \frac{1}{n} \sum_{i = 1}^{n} W(i, k)$. The above expression is minimized subject to the constraint $n H^{\top} D H = I_{k}$. Letting $F = \sqrt{n} D^{1/2} H$ transforms it into the following matrix form:

\frac{1}{n} \operatorname{tr}(F^{\top} L F)

(8)

where $L \in ℝ^{n \times n}$ is the Laplacian matrix, $L = I_{n} - D^{-1/2} W D^{-1/2}$, and F satisfies $F^{\top} F = I_{k}$. The optimal solution of this problem is given by the eigenvectors corresponding to the k smallest eigenvalues of L. Finally, the clustering indicator matrix is recovered as $H = \frac{1}{\sqrt{n}} D^{-1/2} F$, and the standard k-means algorithm is applied to the rows of H to obtain the final clustering results.
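The following NumPy sketch strings Eq. (6) and the spectral steps together. Assumptions: the degree matrix uses row sums instead of the row means in the text. This rescales D by n, which leaves the eigenvectors of L unchanged and cancels the $1/\sqrt{n}$ factor, so H is the same. The small k-means routine is a generic stand-in so the block runs without extra libraries, and all function names are hypothetical.

```python
import numpy as np


def joint_similarity(R_t, R_s, lam=0.7):
    """W = lam * R_t + (1 - lam) * R_s, Eq. (6)."""
    return lam * np.asarray(R_t, dtype=float) + (1.0 - lam) * np.asarray(R_s, dtype=float)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with k-means++ seeding; returns cluster labels."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum() if d2.sum() > 0 else None  # uniform if all points coincide
        centers.append(X[rng.choice(len(X), p=probs)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == c].mean(0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_partition(W, k, seed=0):
    """Normalized-Laplacian spectral clustering of the node similarity matrix W."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    F = eigvecs[:, :k]                    # eigenvectors of the k smallest eigenvalues
    H = d_inv_sqrt[:, None] * F           # H = D^{-1/2} F (row-sum degrees)
    return kmeans(H, k, seed=seed)
```

With two strongly connected groups of three nodes and weak cross links, `spectral_partition` assigns each group its own label.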

2.2.4. Optimal Partitioning with Gap Statistic

In spectral clustering, the choice of the preset number of clusters (k) directly affects the rationality of the partitioning. The traditional Elbow Method relies on subjective observation, whereas this paper adopts the Gap Statistic method, which objectively determines the optimal number of partitions by comparing with a statistical reference distribution.

The Gap Statistic is defined as:

{Gap}_{n} (k) = E_{n}^{*} (\log (d_{k})) - \log (d_{k})

(9)

Here, $E_{n}^{*}(\log(d_{k}))$ denotes the expectation of $\log(d_{k})$ under a random reference distribution. This expectation is estimated by generating random sample sets of the same size as the original dataset and performing the same clustering analysis on them; averaging the logarithms of the Laplacian eigenvalues obtained for these random sample sets gives an approximation of the expected value. The Gap Statistic is then calculated, and the k value corresponding to its maximum determines the optimal number of clusters [24,25]. This method uses the distribution of eigenvalues from random samples to evaluate the clustering structure of the original data, thereby providing a basis for selecting an appropriate number of clusters. The expectation is computed as:

E_{n}^{*} (\log d_{k}) = (\frac{1}{B}) \sum_{b = 1}^{B} \log (d_{k b}^{*})

(10)

Here, B is the number of random reference samples, and $d_{kb}^{*}$ is the k-th eigenvalue of the b-th random sample. The optimal k is selected using the mean w of these log values and the corresponding standard deviation term $s_{k}$:

w = (\frac{1}{B}) \sum_{b = 1}^{B} \log (d_{k b}^{*})

(11)

s_{k} = \sqrt{\frac{1 + B}{B}} \sqrt{\frac{1}{B} \sum_{b = 1}^{B} {(\log d_{k b}^{*} - w)}^{2}}

(12)

The smallest k that satisfies the following condition is selected as the optimal number of clusters:

{Gap}_{k} \geq {Gap}_{k + 1} - s_{k + 1}

(13)
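Once the statistic $d_{k}$ has been computed for the original data and for B reference samples at each candidate k, Eqs. (9) to (13) reduce to a few array operations. The sketch below takes those values as inputs because the text does not fully specify how the reference samples are generated (the classic Gap Statistic uses within-cluster dispersion, while the text ties $d_{k}$ to Laplacian eigenvalues). The function name and argument layout are assumptions, and the test values are made-up numbers chosen only to exercise the selection rule.

```python
import numpy as np


def gap_statistic(ks, obs_stat, ref_stats):
    """Evaluate Eqs. (9)-(13) for the candidate cluster numbers ks.

    obs_stat[i]     : statistic d_k of the original data for k = ks[i]
    ref_stats[i, b] : the same statistic for the b-th of B reference samples
    Returns (gap, s, k_opt).
    """
    log_obs = np.log(np.asarray(obs_stat, dtype=float))
    log_ref = np.log(np.asarray(ref_stats, dtype=float))
    B = log_ref.shape[1]
    w = log_ref.mean(axis=1)                                 # Eqs. (10)-(11)
    gap = w - log_obs                                        # Eq. (9)
    sd = np.sqrt(np.mean((log_ref - w[:, None]) ** 2, axis=1))
    s = np.sqrt((1 + B) / B) * sd                            # Eq. (12)
    for i in range(len(ks) - 1):
        if gap[i] >= gap[i + 1] - s[i + 1]:                  # Eq. (13)
            return gap, s, ks[i]
    return gap, s, ks[-1]                                    # rule never satisfied
```

Returning the largest candidate when no k satisfies Eq. (13) is a design choice of this sketch; widening the candidate range is the usual remedy in that case.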

2.3. Regional Collaborative Prediction Model Architecture

2.3.1. Subgraph Construction Based on Spatiotemporal Characteristics

Power load forecasting is a complex modeling problem with spatiotemporal coupling characteristics. Due to the evident spatial aggregation features of power consumption behavior, load nodes that are geographically close exhibit spatial correlations through both the physical connections in the power grid topology and the similarity in user electricity consumption patterns. This spatiotemporal coupling effect means that load forecasting must consider both the dynamic evolution over time and the spatial interdependencies. In this paper, multi-load forecasting within a region is modeled as a spatiotemporal graph node regression problem: each load node within the region is treated as a basic spatial node, with its historical load time series serving as node features. The edge weights of the graph are jointly determined by the power grid topological relationships and spatiotemporal similarities between nodes.

Based on the load node partitioning results obtained from the spatiotemporal clustering in the previous section, this section constructs a subgraph tailored to the characteristics of the power system, as shown in Figure 3. The graph structure within each subregion is denoted $G = (V, E)$ and provides a physical-data fusion representation, where the node set $V = \{v_{1}, v_{2}, \dots, v_{n}\}$ inherits the spatiotemporally similar nodes from the clustering division, and n is the number of such nodes within the subregion. The edge set $E = \{e_{1}, e_{2}, \dots, e_{m}\}$ is generated from the power grid topology and contains the edges between similar nodes, with m being the number of edges.

Figure 3 illustrates the data modeling process that transforms the real-world, geographically embedded power grid into a standardized graph structure suitable for the STGCN model. The left panel shows the 20 load nodes and their physical topological connections overlaid on a geographical map of Chongqing. The right panel shows the abstracted graph structure $G = (V, E)$, which removes the geographical context to represent the pure topological relationships. This abstract graph serves as the direct input to the model, enabling it to learn and capture the spatial dependencies between nodes.
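A minimal sketch of the subgraph construction, assuming the grid topology is available as a list of node-index pairs and the partition labels come from the spectral clustering step. Edges whose endpoints fall in different partitions are simply dropped here, on the assumption that inter-region interaction is left to the cross-regional attention mechanism; the function name is hypothetical.

```python
from collections import defaultdict


def build_subgraphs(labels, edges):
    """Split a grid graph into per-region subgraphs G = (V, E).

    labels : labels[v] is the partition index of node v
    edges  : iterable of (u, v) grid-topology edges
    Returns {region: (V, E)}: V lists global node ids, and E keeps only the
    intra-region edges, re-indexed to positions within V.
    """
    nodes = defaultdict(list)
    for v, region in enumerate(labels):
        nodes[region].append(v)
    local_index = {v: i for region in nodes for i, v in enumerate(nodes[region])}
    local_edges = defaultdict(list)
    for u, v in edges:
        if labels[u] == labels[v]:  # keep intra-region edges only
            local_edges[labels[u]].append((local_index[u], local_index[v]))
    return {region: (nodes[region], local_edges[region]) for region in sorted(nodes)}
```

For labels `[0, 0, 1, 1, 0]` and edges `[(0, 1), (1, 2), (2, 3), (4, 0)]`, region 0 keeps nodes `[0, 1, 4]` with local edges `[(0, 1), (2, 0)]`, and the cross-region edge `(1, 2)` is removed.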

2.3.2. Adjacency Matrix Generation

In power load forecasting, accurately modeling spatial dependencies between nodes is crucial for improving prediction accuracy. Traditional adjacency matrix construction methods often consider only a single spatial distance factor, neglecting the unique spatiotemporal coupling characteristics of load data. The propagation of power load is simultaneously influenced by both the physical topology of the power grid and the similarity in user electricity consumption patterns: geographically close nodes may exhibit similar load patterns due to overlapping power supply radii, while nodes that are farther apart but have similar consumption behaviors can also show correlations. Such complex spatial dependencies cannot be accurately captured by a single distance metric.

Therefore, for the nodes within each subregion, we construct a weighted adjacency matrix that reflects the spatiotemporal dependencies of the region by jointly considering the spatial proximity and the temporal similarity of load patterns. This matrix incorporates not only the geographical distance between nodes but also the similarity of their load curves quantified by the ShapeDTW algorithm. The elements of the intra-region adjacency matrix are expressed as:

A_{ij}^{local} = \begin{cases} \exp\left(-\frac{D_{geo}(i, j)^{2}}{2 σ_{s}^{2}}\right) \cdot \exp\left(-γ D_{ShapeDTW}^{2}(i, j)\right), & \text{if } j \in N_{k}(i) \\ 0, & \text{otherwise} \end{cases}

(14)

In the formula, $D_{geo}(i, j)$ is the Haversine geographical distance between nodes i and j; $σ_{s}$ is the adaptive bandwidth; $D_{ShapeDTW}$ is the time-series distance between load curves calculated with the ShapeDTW algorithm; and γ is the scaling factor, set to 1 in this study. The adjacency matrix is also sparsified: $N_{k}(i)$ denotes the k-nearest-neighbor set of node i based on the combined distance $D_{joint} = λ D_{geo} + (1 - λ) D_{ShapeDTW}$, and the weights of all non-neighbor nodes are set to zero to ensure the sparsity of the matrix.
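A minimal sketch of Eq. (14) with k-nearest-neighbor sparsification, assuming $D_{geo}$ and $D_{ShapeDTW}$ are precomputed n×n matrices for one subregion. The text does not state the λ used in $D_{joint}$ or whether the two distances are brought to a common scale before being combined, so `lam=0.7` (borrowed from Eq. (6)) is an assumption, and rescaling both distances first would likely be needed in practice. Here $N_{k}(i)$ excludes node i itself, and the resulting matrix need not be symmetric.

```python
import numpy as np


def local_adjacency(D_geo, D_shape, sigma_s, k, lam=0.7, gamma=1.0):
    """Sparse intra-region adjacency A_local, Eq. (14)."""
    D_geo = np.asarray(D_geo, dtype=float)
    D_shape = np.asarray(D_shape, dtype=float)
    n = len(D_geo)
    # Dense weights: spatial Gaussian kernel times temporal kernel
    dense = np.exp(-D_geo ** 2 / (2 * sigma_s ** 2)) * np.exp(-gamma * D_shape ** 2)
    D_joint = lam * D_geo + (1 - lam) * D_shape   # neighbor-selection distance
    A = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(D_joint[i])
        neighbours = [j for j in order if j != i][:k]   # N_k(i), self excluded
        A[i, neighbours] = dense[i, neighbours]         # all other weights stay 0
    return A
```

With k = 1, each row keeps a single nonzero weight pointing to the node's nearest neighbor under $D_{joint}$.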

Global spatial associations between different subregions are dynamically learned through a cross-region attention mechanism, avoiding information loss caused by traditional fixed connection methods. The inter-region connection matrix can be expressed as:

A_{m n}^{global} = softmax (\frac{(h_{m} W_{Q}) {(h_{n} W_{K})}^{⊤}}{\sqrt{d}})

(15)

Here, $Q_{m} = h_{m} W_{Q}$ is the query vector of the attention mechanism, representing the feature of subregion m; $K_{n} = h_{n} W_{K}$ is the key vector, representing the feature of subregion n; $h_{m}$ is the spatiotemporal feature of subregion m; and d is the feature dimension.
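A minimal NumPy sketch of Eq. (15), assuming each subregion is summarized by a single feature vector $h_{m}$ (for example by pooling its node features, which the text does not specify) and that the softmax normalizes over the key index n. $W_{Q}$ and $W_{K}$ would be learned during training; in the test they are random placeholders.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_region_attention(H_regions, W_Q, W_K):
    """A_global[m, n] = softmax_n((h_m W_Q)(h_n W_K)^T / sqrt(d)), Eq. (15).

    H_regions : (M, f) array, one feature vector h_m per sub-region
    W_Q, W_K  : (f, d) query and key projection matrices
    """
    Q = H_regions @ W_Q
    K = H_regions @ W_K
    d = Q.shape[1]
    return softmax(Q @ K.T / np.sqrt(d), axis=1)   # each row sums to 1
```

Because every row is a probability distribution over sub-regions, row m can be read as how strongly sub-region m attends to each of the others.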

2.3.3. STGCN-Attention Hybrid Model

The STGCN is specifically designed for handling load forecasting problems with spatiotemporal dependencies by integrating GCN and Temporal Convolutional Networks (TCN). Among these, the GCN layer is responsible for capturing spatial dependencies, while the TCN layer is tasked with extracting temporal patterns. The schematic of the STGCN principle is shown in Figure 4.

The GCN layer captures spatial dependencies through graph convolution operations:

H_{spatial} = σ ({\tilde{D}}^{- \frac{1}{2}} \tilde{A} {\tilde{D}}^{- \frac{1}{2}} H W_{G})

(16)

In the formula, $\tilde{A} = A + I$ is the adjacency matrix with added self-connections, $\tilde{D}$ is the corresponding degree matrix, $H$ is the input node feature matrix, $W_{G}$ is the trainable weight matrix, and $σ(\cdot)$ is the activation function. The TCN layer uses dilated causal convolutions to extract multi-scale temporal features:

H_{temporal} = \tanh (W_{1} * X) ⊙ σ (W_{2} * X)

(17)

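Here, ∗ denotes the dilated convolution operation and ⊙ is the Hadamard product. $X$ is the input sequence, and $W_{1}$ and $W_{2}$ are the convolution kernels of the filter and gate branches. σ is the sigmoid function, so the gating mechanism controls how much information flows through.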
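The two building blocks in Eqs. (16) and (17) are sketched below in NumPy for a single feature channel. The sketch makes three assumptions: σ in Eq. (16) is ReLU (per Table A3), σ in Eq. (17) is the sigmoid gate, and the causal convolution is zero-padded on the left. A practical STGCN would instead use multi-channel convolutions in a deep learning framework.

```python
import numpy as np


def gcn_layer(adj, h, w_g):
    """Eq. (16): symmetric-normalised propagation with self-loops, followed by ReLU."""
    a_tilde = adj + np.eye(adj.shape[0])  # A~ = A + I
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))  # diagonal of D~^(-1/2)
    a_norm = d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w_g, 0.0)


def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution: y[t] = sum_j w[j] * x[t - j * dilation], zero-padded on the left."""
    pad = (len(w) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([sum(w[j] * xp[pad + t - j * dilation] for j in range(len(w)))
                     for t in range(len(x))])


def gated_tcn(x, w1, w2, dilation):
    """Eq. (17): tanh(W1 * X) multiplied elementwise by sigmoid(W2 * X)."""
    filt = np.tanh(causal_dilated_conv(x, w1, dilation))
    gate = 1.0 / (1.0 + np.exp(-causal_dilated_conv(x, w2, dilation)))
    return filt * gate


adj = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
print(gcn_layer(adj, np.eye(3), np.eye(3)))
x = np.arange(8, dtype=float)
print(causal_dilated_conv(x, np.array([0.0, 1.0]), dilation=2))  # shifts x right by 2 steps
print(gated_tcn(x, np.array([0.5, 0.5]), np.array([1.0, -1.0]), dilation=1).round(3))
```

Because each output at time t depends only on inputs up to t, changing the last input value leaves all earlier outputs unchanged.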

However, traditional STGCN faces two major challenges when dealing with large-scale power grids: one is that the number of edges in the graph structure grows quadratically with the number of nodes, leading to a sharp increase in model training complexity; the other is that uniform connection weights fail to reflect the actual spatial correlation strength between regions. To address these issues, this paper proposes a hierarchical spatiotemporal graph convolution method, which solves the aforementioned problems through regional modeling and global feature enhancement based on an attention mechanism.

The hierarchical spatiotemporal graph convolution layer is expressed as:

H^{l + 1} = σ (A^{local} H^{l} W_{l} + A^{global} H^{l} W_{g})

(18)

Here, $H^{l}$ is the node feature matrix of the $l$-th layer and $A^{local}$ is the intra-partition adjacency matrix. $A^{global}$ is the inter-partition attention connection matrix. $W_{l}$ and $W_{g}$ are the learnable parameter matrices for the local and global feature transformations, respectively, and $σ(\cdot)$ is the ReLU activation function. The term $A^{local} H^{l} W_{l}$ ensures that message passing within a partition strictly follows the intra-partition topology and the similarity of regional load fluctuations encoded in $A^{local}$. The term $A^{global} H^{l} W_{g}$ uses the attention mechanism to learn potential long-range dependencies between partitions dynamically. This compensates for the inability of fixed, formula-based adjacency matrices to model complex spatial correlations. The attention mechanism diagram is shown in Figure 5.
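Eq. (18) applies $A^{local}$, which is defined at node level, and $A^{global}$, which Eq. (15) defines at sub-region level, to the same $H^{l}$. The inter-region weights therefore have to be mapped to node level first. This section does not state how that mapping is done. The sketch below uses one hypothetical option, `lift_to_nodes`, which spreads each region-to-region weight evenly over the nodes of the target region so that every row still sums to 1. All names and sizes are illustrative.

```python
import numpy as np


def lift_to_nodes(a_region, region_of):
    """Hypothetical lift of the M x M inter-region matrix to N x N node level.

    Node i receives a_region[r_i, r_j] / |r_j| from each node j, so each region's
    attention weight is shared equally among its nodes (an assumption, not from the paper).
    """
    p = np.eye(a_region.shape[0])[region_of]  # (N, M) one-hot region assignment
    sizes = p.sum(axis=0)  # number of nodes per region
    return p @ (a_region / sizes[None, :]) @ p.T


def hierarchical_layer(a_local, a_global_nodes, h, w_l, w_g):
    """Eq. (18): ReLU(A_local H W_l + A_global H W_g)."""
    return np.maximum(a_local @ h @ w_l + a_global_nodes @ h @ w_g, 0.0)


rng = np.random.default_rng(1)
region_of = np.array([0, 0, 1, 1, 1])  # 5 nodes in 2 sub-regions
a_region = np.array([[0.7, 0.3], [0.4, 0.6]])  # rows sum to 1, like a softmax output
a_global_nodes = lift_to_nodes(a_region, region_of)
a_local = np.zeros((5, 5))
a_local[0, 1] = a_local[1, 0] = 0.8  # intra-region edges only
a_local[2, 3] = a_local[3, 4] = a_local[4, 2] = 0.5
h = rng.normal(size=(5, 3))
out = hierarchical_layer(a_local, a_global_nodes, h, rng.normal(size=(3, 4)), rng.normal(size=(3, 4)))
print(out.shape)
```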

3. Examples Analysis

3.1. Experimental Setup

The sample data in this study were collected in Chongqing, China. Load data were recorded for 17,700 building air-conditioning users (commercial and residential) across the city from 2 to 8 September 2024 at a 15 min sampling interval. To enable modeling at the regional level, the user data were aggregated into 20 nodes, each representing a regional power supply substation. Each of the 20 nodes therefore represents the aggregated air-conditioning load of a specific region and contains 672 data points (7 days × 96 samples per day). The regional node distribution is shown in Figure 6. To evaluate the proposed method, load power is used as the sole input feature of the prediction model, and no meteorological or other features are included. The dataset is split into training, validation, and test sets in a 7:1:2 ratio, so that model fitting, hyperparameter tuning, and final evaluation use separate data.
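The text does not say whether the 7:1:2 split is applied to raw time steps or to model samples. A 10% slice of 672 steps contains only about 67 steps, which is shorter than the 96-step input window in Table 1. This sketch therefore assumes that input windows are built first and the resulting samples are then split in time order. The one-step forecast horizon and the random placeholder data are also assumptions, since the dataset is not public.

```python
import numpy as np


def make_windows(series, seq_len=96, horizon=1):
    """Slide a seq_len-step input window over a (T, nodes) array; target = next `horizon` steps."""
    n_samples = len(series) - seq_len - horizon + 1
    x = np.stack([series[s:s + seq_len] for s in range(n_samples)])
    y = np.stack([series[s + seq_len:s + seq_len + horizon] for s in range(n_samples)])
    return x, y


def chronological_split(n, ratios=(0.7, 0.1, 0.2)):
    """Index ranges for a train/validation/test split that preserves time order."""
    n_train, n_val = int(n * ratios[0]), int(n * ratios[1])
    return range(0, n_train), range(n_train, n_train + n_val), range(n_train + n_val, n)


load = np.random.default_rng(0).random((672, 20))  # placeholder: 7 days x 96 steps x 20 nodes
x, y = make_windows(load)
train_idx, val_idx, test_idx = chronological_split(len(x))
print(x.shape, y.shape, len(train_idx), len(val_idx), len(test_idx))
```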

The proposed model is implemented in Python 3.11 with the PyTorch 2.7.0 framework and runs on a machine with an Intel Core i5-13500HX CPU (2.50 GHz) and 8 GB of memory. The total training time is under 30 min. Because there is no well-established theoretical guidance for hyperparameter selection, the initial parameters were set empirically and then fine-tuned based on model performance during experimentation. The hyperparameters selected after multiple trials are summarized in Table 1.
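Table 1 lists a patience of 15 alongside 150 epochs. A common reading of this setting, assumed here, is early stopping on the validation loss. The following minimal pure-Python sketch implements that rule, with the Table 1 values collected in an illustrative `TrainConfig`.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Values from Table 1; field names are illustrative.
    seq_len: int = 96
    n_heads: int = 4
    epochs: int = 150
    batch_size: int = 24
    patience: int = 15
    learning_rate: float = 1e-3
    optimizer: str = "Adam"
    loss: str = "MSE"
    activation: str = "ReLU"


class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def stopping_epoch(val_losses, cfg):
    """Return the 0-based epoch at which training stops (or the last epoch run)."""
    stopper = EarlyStopping(cfg.patience)
    for epoch, loss in enumerate(val_losses[: cfg.epochs]):
        if stopper.step(loss):
            return epoch
    return min(len(val_losses), cfg.epochs) - 1


cfg = TrainConfig()
losses = [1.0, 0.9, 0.8] + [0.85] * 30  # improvement stops after epoch 2
print(stopping_epoch(losses, cfg))  # 17
```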

3.2. Evaluating Indicator

The model is evaluated with three metrics: MAE, RMSE, and R². Smaller MAE and RMSE values and an R² closer to 1 indicate a smaller deviation between predicted and actual values, and therefore higher prediction accuracy. The mathematical expressions are given in Equations (19)–(21).

M A E = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - y_{i}^{'} |

(19)

R M S E = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - y_{i}^{'})}^{2}}

(20)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - y_{i}^{'})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(21)

In the formula, $n$ is the number of data points, $y_{i}$ is the actual value, $y_{i}^{'}$ is the predicted value, and $\bar{y}$ is the mean of the actual values.
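Equations (19)–(21) translate directly into NumPy. The small example below uses made-up values only to exercise the functions.

```python
import numpy as np


def mae(y, y_pred):
    return float(np.mean(np.abs(y - y_pred)))  # Eq. (19)


def rmse(y, y_pred):
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))  # Eq. (20)


def r2(y, y_pred):
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # deviations from the mean of the actual values
    return float(1.0 - ss_res / ss_tot)  # Eq. (21)


y = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 5.0])
print(mae(y, y_pred), rmse(y, y_pred), r2(y, y_pred))  # 0.25 0.5 0.8
```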

3.3. Analysis of Regional Partitioning Rationality

Large-area, multi-node load forecasting faces two difficulties: high structural complexity of the graph and coupled spatiotemporal features that are hard to model. To address them, this paper proposes a load forecasting method that combines spatiotemporal partitioning with a cross-region attention mechanism. First, the ShapeDTW algorithm is used to compute the temporal similarity matrix of user load curves. It captures similarities in consumption patterns among users and is particularly effective at identifying phase shifts and local fluctuation patterns. Second, an adaptive Gaussian kernel based on the Haversine distance is constructed to model spatial proximity from user geographical locations. Then, spectral clustering is applied to the weighted fusion of the temporal similarity and spatial proximity matrices to partition the load nodes into regions. This approach reduces the complexity of the overall graph structure and ensures that users within the same partition share highly similar spatiotemporal characteristics. It thus provides a structurally sound and feature-consistent basis for the subsequent partition-based STGCN forecasting model.
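The spatial part of the similarity construction can be sketched in three steps. First, pairwise Haversine distances are computed with r = 6371 km. Second, an adaptive Gaussian kernel is applied, with σ_s set to the median distance (Table A3). Third, the matrices are fused as W = αR_t + (1 − α)R_s with α = 0.7. Which matrix α multiplies is an assumption. `r_t` is a hand-made placeholder, because the conversion of ShapeDTW distances into R_t belongs to an earlier section. Coordinates are taken in degrees and converted to radians.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # r in Table A3


def haversine_matrix(lat_deg, lon_deg):
    """Pairwise great-circle distances in km; inputs in degrees, converted to radians."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = np.sin(dlat / 2) ** 2 + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def spatial_proximity(d_geo):
    """Adaptive Gaussian kernel R_s; sigma_s is the median off-diagonal distance."""
    off_diag = ~np.eye(d_geo.shape[0], dtype=bool)
    sigma_s = np.median(d_geo[off_diag])
    return np.exp(-d_geo**2 / (2 * sigma_s**2))


def joint_similarity(r_t, r_s, alpha=0.7):
    """W = alpha * R_t + (1 - alpha) * R_s; assumes alpha weights the temporal term."""
    return alpha * r_t + (1 - alpha) * r_s


lat = np.array([0.0, 1.0, 0.0])
lon = np.array([0.0, 0.0, 1.0])
d_geo = haversine_matrix(lat, lon)
print(d_geo.round(2))  # one degree along a meridian is about 111.19 km
r_t = np.array([[1.0, 0.9, 0.2], [0.9, 1.0, 0.3], [0.2, 0.3, 1.0]])  # hypothetical temporal similarity
print(joint_similarity(r_t, spatial_proximity(d_geo)).round(3))
```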

Spectral clustering combined with spatiotemporal correlation analysis is applied to partition the electricity users in Chongqing into spatiotemporal regions. The Gap Statistic method is used to select the optimal number of partitions, based on the Gap value and its standard deviation S_k. The Gap Statistic results are shown in Table 2.
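Spectral clustering on the joint matrix W can be sketched in three steps: build the symmetric normalized Laplacian, take its k smallest eigenvectors as the indicator matrix H, and run k-means on the row-normalized rows of H. This follows the common normalized spectral clustering recipe. The authors' exact Laplacian variant and k-means initialization are not given in this section, so the deterministic farthest-point initialization below is an assumption chosen to keep the example reproducible.

```python
import numpy as np


def kmeans(x, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [x[0]]
    for _ in range(1, k):
        dist = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(dist)])  # farthest point from the current centres
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_centers = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(w, k):
    """Partition nodes from a symmetric similarity matrix W via the normalised Laplacian."""
    deg = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]  # L = I - D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    h = vecs[:, :k]  # indicator matrix H: eigenvectors of the k smallest eigenvalues
    h = h / np.linalg.norm(h, axis=1, keepdims=True)
    return kmeans(h, k)


# Two clear groups of three nodes, weakly linked to each other.
w = np.full((6, 6), 0.01)
w[:3, :3] = w[3:, 3:] = 1.0
print(spectral_clustering(w, k=2))
```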

In Table 2, the Gap statistic measures the difference in log-dispersion between the actual data and a reference distribution. Larger values indicate a more significant clustering structure. The S_k index characterizes the stability of the Gap value: a smaller S_k indicates lower dispersion in the reference distribution, which makes comparisons based on Gap more reliable. As shown in the table, k = 4 satisfies two optimality conditions simultaneously: (1) Gap(k) reaches its maximum among the tested values, 0.7977; (2) the corresponding S_k reaches its minimum, 0.1443. Dividing the service area into four regions therefore maximizes within-region spatiotemporal similarity while keeping the partitioning result stable. Figure 7 shows the optimal partitioning result for k = 4.
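The choice of k = 4 can be checked against Table 2. The sketch below applies the paper's criterion (largest Gap, smallest S_k). For comparison, it also applies the selection rule of the original Gap Statistic formulation by Tibshirani, Walther and Hastie: choose the smallest k with Gap(k) ≥ Gap(k+1) − s_{k+1}. S_k from Table 2 is used as s_k. Because only k ∈ {3, 4, 5} were evaluated, the rule returns the largest k if it never fires.

```python
gap = {3: 0.5779, 4: 0.7977, 5: 0.7515}  # Gap values from Table 2
s_k = {3: 0.1617, 4: 0.1443, 5: 0.2332}  # S_k values from Table 2


def paper_criterion(gap, s_k):
    """k with the largest Gap, and k with the smallest S_k."""
    return max(gap, key=gap.get), min(s_k, key=s_k.get)


def tibshirani_rule(gap, s_k):
    """Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}."""
    ks = sorted(gap)
    for k, k_next in zip(ks, ks[1:]):
        if gap[k] >= gap[k_next] - s_k[k_next]:
            return k
    return ks[-1]


print(paper_criterion(gap, s_k), tibshirani_rule(gap, s_k))  # (4, 4) 4
```

Both criteria select k = 4 on these values. At k = 3 the rule fails (0.5779 < 0.7977 − 0.1443), and at k = 4 it holds (0.7977 ≥ 0.7515 − 0.2332).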

To evaluate the effectiveness of the proposed spatiotemporal coupled partitioning method, a comparative experiment was designed with two partitioning methods. The first is traditional spectral clustering, which considers only spatial distance. The second is the proposed method, which integrates ShapeDTW temporal similarity with Haversine spatial distance. The number of partitions is set to k ∈ {3, 4, 5}, and the resulting forecasting performance is compared across these scenarios.

Table 3 compares the air-conditioning load forecasting performance at a typical node in a commercial and office area. The load curve shows a typical “double-peak and double-valley” daily pattern (morning peak: 09:00–13:30; evening peak: 15:00–19:30). The proposed Spatiotemporal Spectral Clustering (STSC) method performs best at the optimal partition number k = 4, with an RMSE of 831.6296 kW, an MAE of 605.7206 kW, and an R² of 0.9899. Compared with traditional spectral clustering (SC) at the same k, RMSE is reduced by 46.0%, MAE by 53.5%, and R² is improved by 2.16%. STSC also outperforms SC at every tested partition number (k = 3, 4, 5). For k = 3 and k = 5, RMSE is reduced by 49.0% and 55.2%, and MAE by 54.8% and 52.6%, respectively. As shown in Figure 8, the STSC method with k = 4 maintains high prediction accuracy even during the peak periods, where load fluctuations are most intense. This optimum agrees with the Gap Statistic analysis, in which the Gap value reached its maximum of 0.7977 and S_k its minimum of 0.1443 at k = 4. The agreement supports the effectiveness of the proposed spatiotemporal correlation-based partitioning method.
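As a quick consistency check, the relative changes quoted above can be recomputed from Table 3. The snippet applies (new − old)/old to the tabulated values and reports reductions as positive numbers.

```python
# (RMSE in kW, MAE in kW, R^2) from Table 3, keyed by the number of regions k.
sc = {3: (2108.3017, 1830.0889, 0.9421), 4: (1541.1333, 1301.5839, 0.9690), 5: (2320.9904, 1738.6011, 0.9298)}
stsc = {3: (1075.2232, 826.4382, 0.9849), 4: (831.6296, 605.7206, 0.9899), 5: (1040.4775, 823.7987, 0.9859)}


def relative_change(old, new):
    """Percentage change from old to new (negative means a reduction)."""
    return 100.0 * (new - old) / old


for k in (3, 4, 5):
    rmse_red = -relative_change(sc[k][0], stsc[k][0])
    mae_red = -relative_change(sc[k][1], stsc[k][1])
    r2_gain = relative_change(sc[k][2], stsc[k][2])
    print(f"k={k}: RMSE -{rmse_red:.1f}%, MAE -{mae_red:.1f}%, R2 +{r2_gain:.2f}%")
```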

Figure 9 compares the prediction performance of all 20 nodes under the different partitioning strategies. In this three-dimensional line chart, the x-axis is the node number (1–20) and the y-axis is the R² score. The z-axis separates the proposed method from traditional spectral clustering at each number of partitions. The averages are given in Table 4. For every number of partitions, the R² curves of the proposed method (red, green, and yellow lines) lie clearly above those of traditional spectral clustering (gray, blue, and purple lines), with an average improvement of 3.142%. At k = 4, the R² curve of the proposed method is the most stable. Its average R² is 0.698% higher than with 3 partitions and 0.641% higher than with 5 partitions. These results indicate that the proposed method achieves the best overall performance at the optimal partition number (k = 4) while keeping prediction accuracy stable across nodes. The complete per-region R² values are listed in Table A1 in Appendix A.

3.4. Prediction Performance Comparison

Comparative and ablation experiments were designed to evaluate the proposed STGCN model, which combines spatiotemporal partitioning with global feature enhancement, for multi-region short-term load forecasting. The experiments cover both the regional partitioning method and the global feature-enhanced prediction model. Table 5 and Figure 10 compare the load forecasting performance of the ablation variants and the baseline methods at Node 1. As the table shows, the proposed method achieves the best performance, indicating that both the spatiotemporal partitioning and the global attention mechanism improve forecasting accuracy in multi-region tasks. Introducing the STSC method (STGCN → STSC-STGCN) reduces RMSE and MAE by 27.75% and 32.71%, respectively. Adding the global attention mechanism (STSC-STGCN → proposed) reduces them by a further 29.56% and 33.83%, respectively. The STGCN model whose global features are enhanced by cross-region attention outperforms the single STGCN model, which validates the global feature enhancement strategy for partitioned loads. In addition, the proposed method reduces RMSE by an average of 53.66% and MAE by an average of 57.69% relative to the traditional single forecasting models (TCN, Informer, GCN, and STGCN), further confirming its overall effectiveness.
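The ablation and baseline percentages can likewise be recomputed from Table 5. The mappings used here are inferred from the table. The partitioning step is taken as STGCN → STSC-STGCN, and the attention step as STSC-STGCN → proposed. The single baselines are TCN, Informer, GCN, and STGCN. The recomputed values match the text to within about 0.01 percentage points; the RMSE gain for the partitioning step comes out at 27.74%.

```python
# (RMSE, MAE) in kW at Node 1, from Table 5.
table5 = {
    "TCN": (2048.8692, 1444.1106),
    "Informer": (2060.6376, 1506.0454),
    "GCN": (1554.0343, 1422.9149),
    "STGCN": (1633.9125, 1360.4150),
    "STSC-STGCN": (1180.6844, 915.4481),
    "Proposed": (831.6296, 605.7206),
}


def reduction(old, new):
    """Percentage reduction from old to new."""
    return 100.0 * (old - new) / old


# Ablation steps: adding STSC partitioning, then adding cross-region attention.
step_partition = [reduction(a, b) for a, b in zip(table5["STGCN"], table5["STSC-STGCN"])]
step_attention = [reduction(a, b) for a, b in zip(table5["STSC-STGCN"], table5["Proposed"])]

# Average reduction of the proposed model against the four single (unpartitioned) baselines.
baselines = ["TCN", "Informer", "GCN", "STGCN"]
avg_vs_single = [sum(reduction(table5[m][i], table5["Proposed"][i]) for m in baselines) / len(baselines)
                 for i in (0, 1)]

print([round(v, 2) for v in step_partition], [round(v, 2) for v in step_attention],
      [round(v, 2) for v in avg_vs_single])
```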

To evaluate the consistency of the model's prediction performance across individual nodes, comparative and ablation experiments were conducted on all 20 load nodes. The evaluation metrics are shown in Figure 11, and the R² values for each region are listed in Table A2 in Appendix A. The proposed method maintains stable R² across all nodes, within the range [0.9635, 0.9920]. The standard deviation of R² across regions is only 0.0081, one-fourth that of the STGCN method. Even at Node 18, where its performance is lowest, the proposed method outperforms all comparison methods except STSC-Informer (0.9697 vs. 0.9635 in Table A2). Removing the spatiotemporal partitioning mechanism reduces performance by more than 8% in Regions 2 and 17, and disabling the attention mechanism increases the fluctuation by 37% in Regions 7 and 19. These results indicate that combining the spatiotemporal partitioning strategy with the cross-region attention mechanism gives the proposed method better spatial uniformity and stability. The method thus offers a reliable solution for load forecasting in complex power grid environments.
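The spread of the proposed model's R² can be read directly from the Proposed column of Table A2, which is identical to the "STSC (4 Regions)" column of Table A1. The snippet below reports the column's minimum, maximum, lowest-scoring region, and mean. It also lists any method that scores higher than the proposed model at Node 18, using that row of Table A2.

```python
import statistics

# R^2 of the proposed model for Regions 1-20 (Proposed column of Table A2).
proposed_r2 = [0.9899, 0.9920, 0.9884, 0.9884, 0.9745, 0.9881, 0.9914, 0.9786, 0.9878, 0.9804,
               0.9879, 0.9699, 0.9898, 0.9863, 0.9783, 0.9775, 0.9915, 0.9635, 0.9888, 0.9793]

lowest = min(range(len(proposed_r2)), key=proposed_r2.__getitem__) + 1  # 1-based region id
print(min(proposed_r2), max(proposed_r2), lowest, round(statistics.mean(proposed_r2), 4))

# Row 18 of Table A2.
row18 = {"Proposed": 0.9635, "STSC-GCN-Attention": 0.9593, "STSC-STGCN": 0.9264, "STSC-GCN": 0.9519,
         "STSC-Informer": 0.9697, "STSC-TCN": 0.9485, "STGCN": 0.9316, "GCN": 0.9239,
         "Informer": 0.9196, "TCN": 0.9143}
better = [m for m, v in row18.items() if v > row18["Proposed"]]
print(better)
```

The mean reproduces the 0.9836 reported for STSC with 4 regions in Table 4.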

4. Conclusions

Traditional models for regional multi-node short-term load forecasting adopt a single global graph structure. Such approaches fail to capture the spatiotemporal heterogeneity of different regions and neglect inter-regional dependencies in load patterns, which leads to information isolation and prediction distortion between regions. To overcome these limitations, this paper proposes a load forecasting method based on spatiotemporal partitioning and cross-region attention mechanisms. A case study of 20 air-conditioning load nodes in Chongqing verifies the effectiveness of the proposed method and supports the following conclusions:

An integrated similarity matrix is constructed by combining the ShapeDTW-based temporal correlation matrix with the spatial proximity matrix. Spectral clustering is then applied, with the Gap Statistic adaptively determining the optimal number of partitions, to divide the load nodes into multiple sub-regions with high spatiotemporal similarity. This approach transforms a single complex graph into several simpler ones, reducing graph complexity while enhancing regional homogeneity. Comparative and ablation experiments confirm its effectiveness. Spectral clustering with the spatiotemporal partitioning strategy improves the average R² across the 20 nodes by 3.142% relative to traditional spectral clustering.
The synergy between partitioned STGCN and the cross-region attention mechanism addresses the problem of “information islands” and the insufficient modeling of inter-regional dependencies, significantly improving both prediction accuracy and stability. In comparative and ablation experiments, the proposed method improves the average R² across the 20 nodes by 3.563% relative to traditional models. Its performance is well balanced across nodes: R² ranges from 0.9635 to 0.9920, and the standard deviation is only one-fourth that of the traditional STGCN model. The regional partitioning strategy mitigates the negative impact of complex graph structures on local prediction accuracy. The cross-region attention module suppresses error propagation across regions, which keeps prediction performance balanced and stable across all nodes.

Author Contributions

Conceptualization, X.D., R.Y. and J.L.; methodology, R.Y.; software, X.D., R.Y. and J.L.; validation, Z.D., C.Z. and C.X.; formal analysis, X.D. and R.Y.; investigation, Z.D., C.Z. and C.X.; data curation, X.D. and R.Y.; writing—original draft preparation, X.D. and R.Y.; writing—review and editing, C.Z., C.X. and J.L.; visualization, X.D., R.Y. and J.L.; project administration, X.D. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was financially supported by the Science and Technology Projects from the State Grid Corporation of China (Research and application on multi-temporal adjustable load of normal marketization participating in grid interactive technology, No.:5400-202317586A-3-2-ZN).

Data Availability Statement

The datasets presented in this article are not readily available because the data are part of an ongoing study. Requests to access the datasets should be directed to the State Grid Corporation of China.

Conflicts of Interest

Authors Zhenlan Dou and Chunyan Zhang were employed by State Grid Shanghai Municipal Electric Power Company. Author Chen Xu was employed by State Grid Integrated Energy Service Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The authors declare that this study received funding from the Science and Technology Projects from the State Grid Corporation of China (Research and application on multi-temporal adjustable load of normal marketization participating in grid interactive technology, No.:5400-202317586A-3-2-ZN). The funder had the following involvement with the study: “provided the research data”.

Appendix A

Table A1. R² Metric for Validating Partition Rationality.

Region	SC (3 Regions)	STSC (3 Regions)	SC (4 Regions)	Proposed: STSC (4 Regions)	SC (5 Regions)	STSC (5 Regions)
1	0.9550	0.9601	0.9517	0.9899	0.9674	0.9810
2	0.9638	0.9844	0.9640	0.9920	0.9532	0.9882
3	0.9737	0.9849	0.9520	0.9884	0.9609	0.9905
4	0.9108	0.9473	0.9746	0.9884	0.9766	0.9761
5	0.9693	0.9883	0.9629	0.9745	0.9551	0.9707
6	0.9511	0.9870	0.9671	0.9881	0.9333	0.9332
7	0.9448	0.9911	0.9426	0.9914	0.9368	0.9851
8	0.9483	0.9682	0.9541	0.9786	0.9373	0.9890
9	0.9385	0.9895	0.9529	0.9878	0.9861	0.9778
10	0.9608	0.9819	0.9380	0.9804	0.9639	0.9721
11	0.9667	0.9523	0.9421	0.9879	0.9760	0.9785
12	0.9691	0.9927	0.9298	0.9699	0.9421	0.9786
13	0.9067	0.9823	0.9533	0.9898	0.9048	0.9913
14	0.9695	0.9783	0.9835	0.9863	0.9672	0.9634
15	0.9427	0.9795	0.9332	0.9783	0.8657	0.9898
16	0.9424	0.9793	0.9459	0.9775	0.9416	0.9907
17	0.8996	0.9818	0.9180	0.9915	0.8983	0.9307
18	0.9475	0.9875	0.9452	0.9635	0.9334	0.9797
19	0.9611	0.9357	0.9690	0.9888	0.9454	0.9879
20	0.9726	0.9798	0.9766	0.9793	0.9579	0.9886

Table A2. R² Metric for Ablation Study.

Region	Proposed	STSC-GCN-Attention	STSC-STGCN	STSC-GCN	STSC-Informer	STSC-TCN	STGCN	GCN	Informer	TCN
1	0.9899	0.9758	0.9800	0.9665	0.9692	0.9637	0.9553	0.9500	0.9378	0.9395
2	0.9920	0.9899	0.9245	0.9658	0.9517	0.9587	0.9700	0.9394	0.9558	0.8878
3	0.9884	0.9688	0.9440	0.8344	0.9477	0.9558	0.8629	0.9600	0.9669	0.9098
4	0.9884	0.9799	0.9202	0.9417	0.8714	0.9300	0.8940	0.9498	0.9744	0.8966
5	0.9745	0.9445	0.9237	0.8963	0.9608	0.9522	0.9592	0.9565	0.9575	0.8868
6	0.9881	0.9535	0.9457	0.9325	0.9186	0.9249	0.9496	0.9762	0.9433	0.9161
7	0.9914	0.9781	0.9815	0.9811	0.9495	0.9367	0.9905	0.9053	0.9571	0.9820
8	0.9786	0.9888	0.9617	0.9840	0.9427	0.9587	0.9685	0.9013	0.9485	0.9817
9	0.9878	0.9814	0.9798	0.9890	0.9392	0.9673	0.9627	0.9346	0.9505	0.9795
10	0.9804	0.9786	0.9735	0.9756	0.9689	0.9763	0.9695	0.9177	0.9600	0.9767
11	0.9879	0.9878	0.9798	0.9642	0.9835	0.9779	0.9539	0.9253	0.8903	0.9208
12	0.9699	0.9720	0.9823	0.9846	0.9822	0.9762	0.9800	0.9201	0.8688	0.9658
13	0.9898	0.9884	0.9530	0.9909	0.9566	0.9781	0.9890	0.9148	0.9016	0.9752
14	0.9863	0.9798	0.9751	0.9280	0.9828	0.9753	0.9145	0.9123	0.9072	0.9225
15	0.9783	0.9763	0.9248	0.9662	0.9650	0.9470	0.9621	0.9317	0.8932	0.9462
16	0.9775	0.9783	0.9039	0.9762	0.9759	0.9329	0.9747	0.8960	0.9652	0.9477
17	0.9915	0.9354	0.9187	0.9751	0.9717	0.9576	0.9629	0.9221	0.9325	0.9630
18	0.9635	0.9593	0.9264	0.9519	0.9697	0.9485	0.9316	0.9239	0.9196	0.9143
19	0.9888	0.9775	0.9731	0.9550	0.9729	0.9507	0.9592	0.9516	0.9361	0.9544
20	0.9793	0.9804	0.9535	0.9712	0.9673	0.9634	0.9214	0.9383	0.9355	0.8912

Appendix B

Table A3. Important symbol definitions.

Symbol	Definition
$ShapeDTW (\cdot)$	The Shape Dynamic Time Warping algorithm, used to calculate the structural similarity between two time series
$T = {(t_{1}, t_{2}, \dots, t_{L})}^{T}$	Univariate time series
$D_{geo} (i, j)$	The Haversine distance (great-circle distance) between node i and node j
r	The radius of the Earth, with a default value of 6371 km
$R_{t} (i, j)$	Temporal correlation matrix
$σ_{s}$	The adaptive bandwidth for the Gaussian kernel function, taken as the median of the distances between all nodes
$({lat}_{j}, {lon}_{j})$	The latitude and longitude coordinates of a node, expressed in radians
$R_{s} (i, j)$	The spatial proximity matrix
$W (i, j)$	Spatiotemporal joint similarity matrix
L	Laplacian matrix
H	Clustering indicator matrix
k	The preset number of clusters
α	The weight coefficient for the temporal-spatial combination, set to 0.7 in this paper
$Gap (k)$	The Gap Statistic, used to determine the optimal number of clusters
$E_{n}^{*} (\log (d_{k}))$	The expectation of $\log (d_{k})$, estimated by clustering random reference sample sets
B	The number of reference sample sets drawn to compute the Gap Statistic
$s_{k}$	The standard deviation of the Gap value, which characterizes its stability
$V = \{v_{1}, v_{2}, \dots, v_{n}\}$	Node set
n	The number of spatiotemporally similar nodes within a sub-region; also, the number of data points for evaluation metrics
m	The number of edges within a sub-region
$A_{m n}^{global}$	Inter-region (attention-based) connection matrix
$h_{m} W_{Q}$	Query vector in the attention mechanism
$H^{l + 1}$	Node feature matrix output by the hierarchical spatiotemporal graph convolutional layer
$σ (\cdot)$	The nonlinear activation function ReLU (in Eq. (17), σ denotes the sigmoid gate)

Table A4. Abbreviation definitions.

Abbreviation	Full Name
ShapeDTW	Shape Dynamic Time Warping
STGCN	Spatiotemporal Graph Convolutional Network
GCN	Graph Convolutional Network
TCN	Temporal Convolutional Network
GNN	Graph Neural Network
LSTM	Long Short-Term Memory
DTW	Dynamic Time Warping
STLF	Short-Term Load Forecasting
NWP	Numerical Weather Prediction
Seq2Seq	Sequence-to-Sequence
SC	Spectral Clustering
STSC	Spatiotemporal Spectral Clustering
MAE	Mean Absolute Error
RMSE	Root Mean Square Error
R²	Coefficient of Determination

References

Wang, W.; Chen, Y.; Xiao, C.; Yang, Y.; Yao, J. Design of short-term load forecasting method considering user behavior. Electr. Power Syst. Res. 2024, 234, 110529.
Zhu, L.; Liu, J.; Hu, C.; Zhi, Y.; Liu, Y. Analysis of Electricity Consumption Pattern Clustering and Electricity Consumption Behavior. Energy Eng. 2024, 121, 2639–2653.
Shang, Q.; Zhang, Q.; Ju, C.; Zhou, Q.; Yang, Z. A unified traffic flow prediction model considering node differences, spatio-temporal features, and local-global dynamics. Phys. A Stat. Mech. Its Appl. 2025, 667, 130554.
Wang, Y.; Hao, Y.; Zhao, K.; Yao, Y. Stochastic configuration networks for short-term power load forecasting. Inf. Sci. 2025, 689, 121489.
Wang, Q. The characteristic analysis and forecasting of mid-long term load based on spatial autoregressive model. J. Northeast. Dianli Univ. 2021, 41, 118–123.
Hu, Y.; Qu, B.; Wang, J.; Liang, J.; Wang, Y.; Yu, K.; Li, Y.; Qiao, K. Short-term load forecasting using multimodal evolutionary algorithm and random vector functional link network based ensemble learning. Appl. Energy 2021, 285, 116415.
Zhao, F.; Sun, B.; Zhang, C. Cooling, heating and electrical load forecasting method for CCHP system based on multivariate phase space reconstruction and Kalman filter. Proc. CSEE 2016, 36, 399–406.
Amral, N.; Ozveren, C.S.; King, D. Short term load forecasting using multiple linear regression. In Proceedings of the 2007 42nd International Universities Power Engineering Conference, Brighton, UK, 4–6 September 2007; IEEE: New York, NY, USA, 2008; pp. 1192–1198.
Mai, H.; Xiao, J.; Wu, X.; Chen, C. Research on ARIMA model parallelization in load prediction based on R language. Power Syst. Technol. 2015, 39, 3216–3220.
Li, Y.; Wang, H.; Huang, X.; Hao, J.; Lei, W.; Wang, Q. Short-term power load forecasting in distribution networks considering human comfort level. Front. Energy Res. 2025, 13, 1514755.
Wang, Y.; Chen, J.; Chen, X.; Zeng, X.; Kong, Y.; Sun, S. Short-term load forecasting for industrial customers based on TCN-LightGBM. IEEE Trans. Power Syst. 2021, 36, 1984–1997.
Huang, N.; Li, B.; Sun, H.; Wang, Y.; Cai, G.; Zhang, L. Short-term Prediction of Wind Power in Wide-area Multi-wind Farms with Enhanced Time-space Characteristics. Power Syst. Technol. 2025, 49, 3688–3698.
Wang, S. Short-Term Load Temporal-Spatial Forecasting Based on Graph Neural Networks. Master’s Thesis, Northeast Electric Power University, Jilin, China, 2023.
Chen, H.; Zhu, M.; Hu, X.; Wang, J.; Sun, Y.; Yang, J. Research on short-term load forecasting of new-type power system based on GCN-LSTM considering multiple influencing factors. Energy Rep. 2023, 9, 1022–1031.
Wang, P.; Feng, L.; Zhu, Y.; Wu, H. Hybrid spatial–temporal graph neural network for traffic forecasting. Inf. Fusion 2025, 118, 102978.
Cao, J.E.; Liu, C.; Chen, C.-L.; Qu, N.; Xi, Y.; Dong, Y.; Feng, R. A short-term load forecasting method for integrated community energy system based on STGCN. Electr. Power Syst. Res. 2024, 232, 110265.
Kong, X.; Zheng, F.; Zhijun, E.; Cao, J.; Wang, X. Short-term Load Forecasting Based on Deep Belief Network. J. Mod. Power Syst. 2018, 42, 133–139.
Zhang, Y.; Li, Y.; Zhang, G. Short-term wind power forecasting approach based on Seq2Seq model using NWP data. Energy 2020, 213, 118371.
Arastehfar, S.; Matinkia, M.; Jabbarpour, M.R. Short-term residential load forecasting using Graph Convolutional Recurrent Neural Networks. Eng. Appl. Artif. Intell. 2022, 116, 105358.
Zhao, W.; Chang, W.; Yang, Q. Collaborative energy management of interconnected regional integrated energy systems considering spatio-temporal characteristics. Renew. Energy 2024, 235, 121363.
Xian, H.; Che, J. Multi-space collaboration framework based optimal model selection for power load forecasting. Appl. Energy 2022, 314, 118937.
Wang, Z.; Duan, J.; Luo, F.; Qiu, X. Collaborative Forecasting of Multiple Energy Loads in Integrated Energy Systems Based on Feature Extraction and Deep Learning. Energies 2025, 18, 1048.
Zhao, J.; Itti, L. shapeDTW: Shape Dynamic Time Warping. Pattern Recognit. 2018, 74, 171–184.
Song, K.; Yao, X.; Nie, F.; Li, X.; Xu, M. Weighted bilateral K-means algorithm for fast co-clustering and fast spectral clustering. Pattern Recognit. 2021, 109, 107560.
Khan, I.K.; Daud, H.B.; Zainuddin, N.B.; Sokkalingam, R.; Naheed, N.; Janisar, A.A.; Inayat, A.; Rana, M.S. Standardization of expected value in gap statistic using Gaussian distribution for optimal number of clusters selection in K-means. Egypt. Inform. J. 2025, 30, 100701.

Figure 1. Overall flowchart.

Figure 2. Schematic diagram of spectral clustering.

Figure 3. Spatiotemporal information graph of load.

Figure 4. The schematic of the STGCN principle.

Figure 5. Attention mechanism diagram.

Figure 6. Regional Node Distribution Map.

Figure 7. Regional Partitioning Results.

Figure 8. Comparison of prediction performance under different partitioning strategies at a single node.

Figure 9. Prediction Performance Comparison Across Nodes.

Figure 10. Prediction Performance Schematic of Different Methods.

Figure 11. R² Metrics of 20 Regions for Various Methods.

Table 1. Parameter Settings.

Parameter Name	Parameter Value
Sequence length	96
Number of attention heads	4
Epoch	150
Batch size	24
Patience	15
Optimizer	Adam
Learning rate	0.001
Activation function	ReLU
Loss function	MSE

Table 2. Gap Statistic Results.

k	Gap	S_k
3	0.5779	0.1617
4	0.7977	0.1443
5	0.7515	0.2332

Table 3. Prediction performance metrics under different partitioning strategies at a single node.

Partitioning Strategy	Number of Divided Regions	RMSE (kW)	MAE (kW)	R²
SC	3	2108.3017	1830.0889	0.9421
	4	1541.1333	1301.5839	0.9690
	5	2320.9904	1738.6011	0.9298
Proposed (STSC)	3	1075.2232	826.4382	0.9849
	4	831.6296	605.7206	0.9899
	5	1040.4775	823.7987	0.9859

Table 4. Average R² Metrics Across 20 Nodes Under Different Partitioning Strategies.

Partitioning Strategy	Number of Divided Regions	Average R²
SC	3	0.9497
	4	0.9528
	5	0.9451
Proposed (STSC)	3	0.9766
	4	0.9836
	5	0.9771

Table 5. Performance Metrics of Predictions for Various Methods.

Method	RMSE (kW)	MAE (kW)	R²
TCN	2048.8692	1444.1106	0.9395
Informer	2060.6376	1506.0454	0.9378
GCN	1554.0343	1422.9149	0.9500
STGCN	1633.9125	1360.4150	0.9553
STSC-TCN	1540.0439	1071.3119	0.9637
STSC-Informer	1524.3477	1050.3312	0.9692
STSC-GCN	1343.7922	1021.9116	0.9665
STSC-STGCN	1180.6844	915.4481	0.9800
STSC-GCN-Attention	1302.1242	1001.2071	0.9758
Proposed (STSC-STGCN-Attention)	831.6296	605.7206	0.9899

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


MDPI and ACS Style

Dou, X.; Yang, R.; Dou, Z.; Zhang, C.; Xu, C.; Li, J. A Load Forecasting Model Based on Spatiotemporal Partitioning and Cross-Regional Attention Collaboration. Sustainability 2025, 17, 8162. https://doi.org/10.3390/su17188162
