Multi-Step-Ahead Electricity Price Forecasting Based on Temporal Graph Convolutional Network

Abstract: Traditional electricity price forecasting tends to adopt time-domain forecasting methods based on time series, which fail to make full use of the regional information of the electricity market and ignore the extra-territorial factors affecting electricity price within the region under cross-regional transmission conditions. In order to improve the accuracy of electricity price forecasting, this paper proposes a novel spatio-temporal prediction model that combines the graph convolutional network (GCN) and the temporal convolutional network (TCN). First, the model automatically extracts the relationships between price areas through the graph construction module. Then, the mix-jump GCN is used to capture the spatial dependence, and the dilated splicing TCN is used to capture the temporal dependence and forecast electricity price for all price areas. The results show that the model outperforms other models in both one-step forecasting and multi-step forecasting, indicating that the model has superior performance in electricity price forecasting.


Introduction
Over the past few decades, as electricity reforms have progressed, in many countries, electricity markets have shifted from traditional government monopolies to a deregulated and competitive market [1]. In a free competitive market, electricity can be traded like ordinary commodities, and its price can truly reflect the supply and demand situation in the market, and directly affect the interests of market players [2]. Consequently, accurate and effective forecasting of electricity price is of great importance for market entities to make decision plans and grasp market laws. For power generators, accurate forecasting of electricity prices allows them to develop reasonable bidding strategies to maximize revenue. For power sales companies, advance forecasting of electricity prices allows them to buy power at the lowest possible price. Market managers can better manage and optimize the electricity market by anticipating changes in electricity prices. However, how to accurately predict electricity price trends is still a problem that deserves more in-depth study [3], since the series is susceptible to geography, weather, and various other conditions, and is nonlinear and nonstationary in nature [4].
There are two main directions of research on electricity price forecasting. One is the market simulation forecasting method, which uses the mechanism of electricity price formation to simulate market transactions by forecasting the electricity supply and demand in the market to obtain the electricity price [5]. The other is data analysis forecasting, which is based on the assumption that electricity price series data are cyclical and regular, and analyzes and uses the past electricity price to achieve the forecast of the future electricity price. As mentioned above, electricity price is susceptible to other factors, and the huge data size of the current electricity market and the complex electrical connection between price areas make it difficult to apply market simulation forecasting methods to actual decision-making. Therefore, the data analysis forecasting method has become the main research direction of electricity price forecasting [6].
Due to the nonlinear and nonstationary nature of electricity price, statistical models have been criticized for their limitations in handling this type of data [14]. In recent years, emerging artificial intelligence algorithms have been widely used in the prediction of electricity price. For instance, Li et al. [15] forecasted electricity price based on a long short-term memory (LSTM) neural network, using a test period of 4 weeks. Aslam et al. [16] focused on the performance of a convolutional neural network (CNN) in medium-term electricity price forecasting, and showed that the CNN model performs well. Yang et al. [17] built an innovative model based on a deep neural network (DNN) for electricity price forecasting, using a test dataset spanning a month. Chen et al. [18] developed a bidirectional recurrent neural network (RNN) to forecast prices in the French market, and the proposed model was compared with the deep learning method and a regression method. Xiao et al. [19] used an innovative model based on an extreme learning machine (ELM) to implement day-ahead electricity price forecasting, and it was found that ELM is suitable for the day-ahead electricity price forecasting task.
The above improvements enhance the performance of the algorithm, but they are all based on a single time series data analysis and algorithm improvement in the time domain, ignoring the geospatial influence factors under cross-regional transmission conditions of a large grid [20]. In order to expand markets and increase market entities to enhance competition and promote the optimal allocation of resources, major economies are actively promoting cross-region and cross-border power markets, such as AEMO in Australia, PJM in the United States, and Nord Pool in Europe [21]. The increasing frequency of cross-region and cross-border power market transactions and the long-distance transmission of power have also introduced extraterritorial market entities to the region, which affects the electricity price in the region [22]. In other words, forecasting regional electricity price in the electricity market relies not only on the historical series of the region, but also on the influence of neighboring regional electricity price on it.
Mathematically speaking, this is multivariate time series forecasting, and one of its basic assumptions is that its variables are interdependent. However, the above time-domain-based approaches do not effectively capture the potential spatial dependence between price areas. Statistical methods, such as VAR and GARCH, although widely used for single time series forecasting due to their simplicity and interpretability, do not scale well to multivariate time series data because the model complexity of these methods increases at a high rate with the number of variables, and when there are more variables, the problem of over-fitting is encountered [23]. Deep learning-based methods are excellent for capturing nonlinear patterns, such as LSTNet [24] and TPA-LSTM [25], which use CNN to obtain local dependencies between variables and RNN to maintain long-term temporal dependencies. However, the interactions between variables are encapsulated into a global hidden state, which weakens the interpretability of the model.
A graph is a special data form that is widely used to describe power system topology. However, because graph data has a non-Euclidean structure, it has long been difficult to train ordinary neural networks on it. Recently, GCNs have been considered better able to handle graph data due to their local connectivity and combinatorial nature [26]. A GCN enables each node in the graph to extract information from surrounding nodes, allowing information to be propagated through the graph structure. From a graph perspective, the variables in a multivariate time series can be considered as nodes in a graph, which interact with each other through potential dependencies [27].
The performance of the prediction models including GCN is superior compared to the common method [28]. However, GCN still faces the following problems in implementing multivariate time series prediction tasks: (1) existing GCN methods need to be based on a pre-given graph structure; however, multivariate time series do not have an explicit graph structure. Hidden relationships between variables need to be mined from the data. (2) Even with available graph structures, existing GCN methods ignore the fact that manually predefined graph structures may not be optimal and should be optimized during training.
Based on the above analysis, a novel spatio-temporal prediction model is proposed to improve the forecasting accuracy of electricity price, termed as T-GCN. Firstly, the model extracts the graph adjacency matrix between variables based on multivariate time series data through a graph construction module. Next, the mix-jump GCN is used to capture the spatial dependence and the dilated splicing TCN is used to capture the temporal dependence. Finally, the output module converts the hidden states into the required output dimension to obtain the forecast sequence of electricity prices.
Based on the above research, the main innovations and contributions of this paper can be summarized in the following three aspects: (1) This paper uses GCN to forecast electricity prices in multiple price areas of the electricity market from the perspective of both time and space. (2) This paper proposes a novel graph construction module to capture the hidden spatial correlation between variables, which addresses the problems that multivariate time series have no predefined graph structure and that a predefined graph structure may not be optimal. (3) This paper develops a modified mix-jump GCN that avoids the gradient problem that often occurs with GCN. An improved dilated splicing TCN is also developed in order to capture multiple common temporal patterns.
The rest of this article is composed as follows. Section 2 describes in detail the mathematical principle of the prediction task and the proposed spatio-temporal prediction model used in this paper. After establishing the proposed model, in Section 3, the electricity price series of fifteen price areas from Nord Pool are collected for empirical research. Section 4 is the concluding remarks.

Electricity Price Series Modeling
Before presenting the network structure, we first analyze the nature of the AI network-based model that realizes electricity price forecasting. Suppose a known series of electricity price $x_0, \ldots, x_T$ is given as input, and we wish to predict some corresponding electricity price series $y_0, \ldots, y_N$ as output. Formally, the AI network that accomplishes the electricity price prediction task is any function $f$ that generates the mapping
$$\hat{y}_0, \ldots, \hat{y}_N = f(x_0, \ldots, x_T)$$
and satisfies the causal constraint that $\hat{y}_0, \ldots, \hat{y}_N$ depend only on the previously observed $x_0, \ldots, x_T$ and not on any "future" inputs. In the electricity price prediction task, the AI network uses learning methods, such as gradient descent, to iteratively update the parameters in the network $f$ based on historical data (i.e., the training set), with the goal of minimizing the expected loss between the mapped predicted electricity price and the actual electricity price, $L(y_0, \ldots, y_N, f(x_0, \ldots, x_T))$, thereby establishing a mapping relationship from input to output.
The electricity price in the power market undergoes changes in the time domain, which are subject to geographical factors, human life, and production laws, and reflect certain periodicity and regularity. Therefore, the mapping relationship of past electricity price (i.e., the training set) also holds for future electricity price (i.e., the test set), thus enabling the prediction of future electricity price. The intrinsic regularity of electricity prices in the electricity market provides the theoretical support for this AI network model.

Traditional Propagation Layer
The complex spatial dependence between different price areas in the electricity market is a key problem in electricity price forecasting. Traditional convolutional neural networks (CNN) cannot handle complex topologies reflecting a large-scale cross-regional transmission network, and thus cannot accurately capture spatial correlations. Recently, GCN, which can handle irregular graph structure data, has attracted extensive attention. A graph is formulated as $G = (V, E)$, where $V$ is the set of nodes, and $E$ is the set of edges. GCN propagates the implicit graph information using the structural information about the edge-vertex connections of the graph and the attribute information attached to the graph structure. The traditional GCN model uses the following layer-wise propagation rule:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$$
where $H^{(l)} \in \mathbb{R}^{n \times d}$ is the output of layer $l$; $n$ is the number of nodes in the graph $G = (V, E)$, and each node is represented by a $d$-dimensional feature vector; $A$ is the adjacency matrix of the undirected graph, and $\tilde{A} = A + I_N$, where $I_N$ is the identity matrix; $\tilde{D}$ is the degree matrix, with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$; $W^{(l)} \in \mathbb{R}^{d \times h}$ is the parameter to be trained; $h$ is the output dimension; and $\sigma(\cdot)$ denotes an activation function. GCN is concerned with the information within the kth-order neighbors centered at a node in the graph. A single-layer GCN can only extract the information of first-order neighbors. Information from a wider range of nodes in the graph can be extracted by stacking multiple GCN layers, as shown in Figure 1.
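To make the effect of stacking concrete, the following minimal sketch applies the normalized propagation $\sigma(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H W)$ with ReLU as the activation. The three-node path graph, the single feature per node, and the identity weight are illustrative assumptions and are not taken from the paper.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def normalized_adjacency(adj):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for an undirected graph."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]  # D~_ii = sum_j A~_ij
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(adj, h, w):
    """One propagation step with ReLU as the activation function."""
    z = matmul(matmul(normalized_adjacency(adj), h), w)
    return [[max(0.0, v) for v in row] for row in z]

# Hypothetical path graph 0 - 1 - 2; only node 0 carries a signal.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
H0 = [[1.0], [0.0], [0.0]]
W = [[1.0]]  # identity weight, chosen only for illustration

H1 = gcn_layer(A, H0, W)  # node 2 has not yet seen node 0
H2 = gcn_layer(A, H1, W)  # node 2 now receives node 0's signal
```

In this toy case node 2 is a second-order neighbor of node 0, so its state stays at zero after one layer and becomes positive only after the second layer.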

Mix-Jump Propagation Layer
GCN can merge a node's information with its neighbors' information. For each node, extracting the information of its multi-order neighbors requires stacking multiple graph convolutional layers. However, as the number of graph convolution layers increases, the hidden states of the nodes gradually converge to a single point, which means that some information of the nodes' original state will be lost [29]. Therefore, this paper proposes the mix-jump propagation layer to handle the information flow between graph nodes; it retains a part of the nodes' original state during propagation, so that the nodes' state maintains both locality and globality after propagation. The composition relationship between the graph convolution module and the mix-jump propagation layer is shown in Figure 2.
The proposed mix-jump layer consists of two parts: information propagation and information filtering. As shown in Figure 3, it first propagates information horizontally and then filters information vertically. In the information propagation part, $\alpha$ is the ratio of keeping the nodes' original state, and $H_{in}$ is the hidden states output by the preceding layer. Typically, not all neighborhood information is valuable; the information filtering part is used to filter out the unimportant information generated at each jump. In the information filtering part, $L$ is the depth of propagation, and $H_{out}$ represents the output of this layer.
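One form that is consistent with the definitions of $\alpha$, $H_{in}$, $L$, and $H_{out}$ is $H^{(l)} = \alpha H_{in} + (1 - \alpha) \tilde{A} H^{(l-1)}$ with $H^{(0)} = H_{in}$, followed by $H_{out} = \sum_{l=0}^{L} H^{(l)} W^{(l)}$. This is an assumption made for illustration, not necessarily the exact formulation of the paper. The sketch below implements that assumed form; the row-normalized adjacency `a_norm`, the identity weights, and the toy inputs are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mix_jump(a_norm, h_in, weights, alpha):
    """Assumed mix-jump layer with propagation depth L = len(weights) - 1.

    Propagation: H(l) = alpha * H_in + (1 - alpha) * A_norm @ H(l-1).
    Filtering:   H_out = sum over l of H(l) @ W(l).
    """
    h = h_in
    out = matmul(h, weights[0])  # contribution of H(0) = H_in
    for w in weights[1:]:
        propagated = matmul(a_norm, h)
        h = [[alpha * x0 + (1 - alpha) * p for x0, p in zip(row0, rowp)]
             for row0, rowp in zip(h_in, propagated)]
        filtered = matmul(h, w)
        out = [[o + v for o, v in zip(row_o, row_v)]
               for row_o, row_v in zip(out, filtered)]
    return out

# Hypothetical two-node graph with a row-normalized adjacency matrix.
a_norm = [[0.5, 0.5], [0.5, 0.5]]
h_in = [[1.0], [0.0]]
weights = [[[1.0]], [[1.0]], [[1.0]]]  # L = 2, identity filters

no_retention = mix_jump(a_norm, h_in, weights, alpha=0.0)
half_retention = mix_jump(a_norm, h_in, weights, alpha=0.5)
```

With `alpha=0.0` both nodes collapse to the same propagated state after the first hop, whereas with `alpha=0.5` the two nodes keep distinct states at every hop, which is the behavior the layer is meant to preserve.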

Graph Construction Module
Existing GCN methods depend on manually predefined graph structures to achieve time series prediction. However, in most cases, there is no explicit graph structure for the multivariate time series, and the spatial dependence between multivariate time series must be discovered from the data rather than provided as ground truth. Even if a graph structure is available, the manually predefined structure may not be optimal and should be updated during training [30]. To address this problem, and based on the finding in the literature [31] that graphs can be learned from the backpropagation of the loss function using gradient descent, this paper proposes a graph construction module. This module can model multivariate time series data, treat the variables in the multivariate time series as nodes in the graph, describe the relationships between the nodes using a graph adjacency matrix, and learn and update the internal graph structure simultaneously during the training process. The basic steps of the graph construction module are as follows. First, start with node embedding, i.e., the nodes are mapped to a low-dimensional feature space and represented as a matrix. Here, $U_1$ and $U_2$ represent randomly initialized node embeddings, which are learned during training; $\theta_1$ and $\theta_2$ are model parameters; tanh is the hyperbolic tangent function; and $\beta$ is the saturation rate of the activation function. Second, the graph adjacency matrix $A$ is generated from these embeddings, with ReLU, a rectified linear unit, regularizing the adjacency matrix. Finally, for each node, the $k$ nodes with the strongest spatial association with it are chosen as connected nodes, and the weights of the non-connected nodes are set to zero while the weights of the connected nodes are preserved. That is, for $i = 1, 2, \ldots, n$, the entries of row $i$ of $A$ indexed by nontopk(·) are set to zero, where $n$ is the number of nodes in the graph, and nontopk(·) returns the indices of the entries of a vector that are not among its $k$ largest values.
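The three steps can be sketched as follows. The concrete formulas are assumptions chosen to be consistent with the symbols defined above: $M_1 = \tanh(\beta U_1 \theta_1)$, $M_2 = \tanh(\beta U_2 \theta_2)$, and $A = \mathrm{ReLU}(\tanh(\beta (M_1 M_2^T - M_2 M_1^T)))$, followed by zeroing the non-top-$k$ entries of each row. The names `M1` and `M2`, the antisymmetric difference, and the random toy embeddings are hypothetical, and no gradient-based learning of the embeddings is shown.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def embed(u, theta, beta):
    """Step 1 (assumed form): M = tanh(beta * U @ theta)."""
    return [[math.tanh(beta * v) for v in row] for row in matmul(u, theta)]

def build_adjacency(u1, u2, theta1, theta2, beta, k):
    m1, m2 = embed(u1, theta1, beta), embed(u2, theta2, beta)
    s12, s21 = matmul(m1, transpose(m2)), matmul(m2, transpose(m1))
    n = len(u1)
    # Step 2 (assumed form): A = ReLU(tanh(beta * (M1 M2^T - M2 M1^T))).
    adj = [[max(0.0, math.tanh(beta * (s12[i][j] - s21[i][j])))
            for j in range(n)] for i in range(n)]
    # Step 3: keep the k largest weights in each row, zero the rest.
    for row in adj:
        keep = set(sorted(range(n), key=lambda j: row[j], reverse=True)[:k])
        for j in range(n):
            if j not in keep:
                row[j] = 0.0
    return adj

random.seed(0)
n_nodes, dim, k = 6, 4, 2  # toy sizes, not the paper's settings
U1 = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_nodes)]
U2 = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_nodes)]
T1 = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
T2 = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
A = build_adjacency(U1, U2, T1, T2, beta=3.0, k=k)
```

Under the assumed antisymmetric form, a positive weight from node i to node j implies a zero weight from j to i, so the learned graph is directed, and each row of `A` contains at most `k` nonzero entries.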

Temporal Convolution Module
The temporal dependence is another vital problem in electricity price forecasting. Recurrent neural networks (RNN) are models dedicated to sequence data; however, the architecture of RNN makes it prone to gradient explosion or gradient vanishing during training. LSTM [32] and GRU [33] are used to solve the above problems, but they have longer training times and more model parameters, and are prone to overfitting. Recently, temporal convolutional networks (TCN) [34] have been shown to perform significantly better than generic recurrent architectures, such as LSTM and GRU, in processing sequence data, and they exhibit longer memory than recurrent architectures with the same capacity. TCN captures the time dependence of sequence data through a one-dimensional convolutional filter. In order to capture associations between temporal patterns with different lengths and process long time series, this paper proposes a temporal convolution module made up of two dilated splicing layers. One layer is followed by a hyperbolic tangent activation function and the other by a sigmoid activation function, both of which act as gates to control the amount of information that can be passed to the next module. The composition relationship between the dilated splicing layer and the temporal convolution module is shown in Figure 4.

Splicing Architecture
Choosing the right convolutional kernel size is a critical step in building a convolutional network: a kernel that is too small cannot fully capture long-term temporal patterns, and one that is too large cannot represent short-term temporal patterns precisely. Therefore, the convolution in this paper is performed by applying the splicing architecture method, i.e., connecting the outputs of convolution filters with different kernel sizes. Time series typically have several common cycles, such as 7, 12, and 24. Therefore, the splicing architecture in the improved TCN consists of four convolution kernels of sizes 1 × 2, 1 × 3, 1 × 6, and 1 × 7. Combinations of these filters can capture the common cycles described above. For example, the combination of filters 1 × 7 and 1 × 6 can capture cycle 12, since applying an undilated 1 × 6 filter to the output of an undilated 1 × 7 filter covers 7 + 6 − 1 = 12 consecutive inputs.

Dilated Convolution
The receptive field of the convolutional network is linearly related to the network depth and the kernel size. In order to deal with long sequences, it is often necessary to use deeper networks or larger filters, which increases the complexity of the model. This paper adopts dilated convolution to limit this complexity.
Dilated convolution is a convolution that skips input values with a certain step size in order to obtain a larger receptive field [35]. It is equivalent to convolution with a larger filter that is obtained by dilating the original filter with zeros, but is significantly more efficient. Figure 5 shows dilated convolution with dilation factors 1, 2, 4, and 8. Note that dilated convolution with a dilation factor of 1 is equivalent to the standard convolution.
With only a few layers of dilated convolution, the network can have a large receptive field while maintaining its computational efficiency.
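The gain in receptive field can be checked with a short calculation. For stacked one-dimensional convolutions with stride 1, each layer with kernel size k and dilation factor d adds (k − 1) × d inputs to the receptive field. The sketch below compares four standard layers with four dilated layers using the dilation factors 1, 2, 4, and 8; the kernel size of 2 is an assumption made only for this example.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions, one per dilation."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Four layers with kernel size 2 (assumed), with and without dilation.
standard = receptive_field(2, [1, 1, 1, 1])
dilated = receptive_field(2, [1, 2, 4, 8])
```

With the same number of layers and parameters, the dilated stack sees 16 consecutive inputs instead of 5.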

Dilated Splicing Layer
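Formally, the structure of the modified TCN that incorporates splicing and dilated convolution is shown in Figure 6. For a 1D sequence input $x \in \mathbb{R}^{T}$ and filters consisting of $f_{1 \times 2} \in \mathbb{R}^{2}$, $f_{1 \times 3} \in \mathbb{R}^{3}$, $f_{1 \times 6} \in \mathbb{R}^{6}$, and $f_{1 \times 7} \in \mathbb{R}^{7}$, the modified TCN is expressed as follows:
$$z = \mathrm{concat}\left(x \star f_{1 \times 2},\ x \star f_{1 \times 3},\ x \star f_{1 \times 6},\ x \star f_{1 \times 7}\right)$$
Taking the output length of the largest filter as the standard, the outputs of the four filters are truncated to the same length and connected across the channel dimension. The dilated convolution operation $x \star f_{1 \times k}$ on element $t$ is defined as:
$$\left(x \star f_{1 \times k}\right)(t) = \sum_{s=0}^{k-1} f_{1 \times k}(s)\, x(t - d \times s)$$
where $k$ is the filter size, and $d$ is the dilation factor.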
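A minimal sketch of this layer is given below. It assumes that the dilated convolution on element t is the sum over s of f(s) × x(t − d × s), that only positions where every filter tap falls inside the input are computed, and that truncation keeps the most recent outputs of each filter. The all-ones filters and the toy input are hypothetical.

```python
def dilated_conv(x, f, d):
    """Dilated convolution without padding: out(t) = sum_s f[s] * x[t - d*s]."""
    k = len(f)
    first = d * (k - 1)  # earliest t whose taps all lie inside x
    return [sum(f[s] * x[t - d * s] for s in range(k))
            for t in range(first, len(x))]

def dilated_splicing(x, filters, d):
    """Apply every filter, truncate to a common length, stack as channels."""
    outputs = [dilated_conv(x, f, d) for f in filters]
    length = min(len(o) for o in outputs)  # set by the largest filter
    if length == 0:
        raise ValueError("input too short for the largest dilated filter")
    return [o[-length:] for o in outputs]  # keep the latest time steps

# Toy input of 20 hourly values and all-ones filters of sizes 2, 3, 6, 7.
x = [float(t) for t in range(20)]
filters = [[1.0] * k for k in (2, 3, 6, 7)]
channels = dilated_splicing(x, filters, d=2)
```

With `d=2` the 1 × 7 filter spans 13 inputs, so every channel is truncated to 20 − 12 = 8 time steps before the four channels are stacked.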

Residual Connections
Residual connections have been repeatedly shown to be important in maintaining the stability and improving the accuracy of TCN [36]. Formally, the residual block is defined as:
$$y = x + F(x)$$
where $x$ and $y$ are the input and output of the residual block, and $F$ is a series of transformations. This ensures that layers do not learn the entire transformation, but rather modifications to the identity mapping.

Model
As displayed in Figure 7, the proposed T-GCN model consists of a graph construction module, graph convolution modules, and temporal convolution modules. First, to discover latent spatial dependencies between price areas, the graph construction module learns the graph adjacency matrix through the loss function and gradient descent, and then feeds it into all the graph convolution modules. Then, graph convolution modules and temporal convolution modules are interleaved to capture the spatio-temporal correlation between multivariate time series data. Figure 7 illustrates the collaboration between graph convolution modules and temporal convolution modules. To improve the stability and accuracy of the model, residual modules are used to connect the output of the temporal convolution modules to the input of the output module. Finally, the output module converts the hidden states into the required output dimension.
We observe that multi-step forecasting generates much higher losses than one-step forecasting. Therefore, in order to improve the prediction accuracy, this paper uses a dedicated training algorithm for the multi-step prediction task. Based on the idea of "from easy to difficult", the algorithm starts by solving the simplest one-step prediction task, and the number of prediction steps gradually increases with the number of iterations until the model can complete the more difficult multi-step prediction. The details are shown in Algorithm 1; in its final steps, the stochastic gradient of the model parameters Ω is computed according to the loss L, and Ω is then updated according to these gradients and the learning rate ζ.
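The "from easy to difficult" idea can be sketched as a schedule that maps the iteration count to the number of forecast steps included in the loss. The function names, the `step_size` parameter (iterations spent on each horizon before it grows by one), and the use of mean absolute error as the loss are assumptions made for illustration; they are not taken from Algorithm 1.

```python
def horizon_at(iteration, step_size, max_horizon):
    """Number of forecast steps trained at a 1-based iteration index."""
    return min(max_horizon, 1 + (iteration - 1) // step_size)

def curriculum_loss(y_true, y_pred, horizon):
    """Mean absolute error over the first `horizon` forecast steps only."""
    errors = [abs(t - p) for t, p in zip(y_true[:horizon], y_pred[:horizon])]
    return sum(errors) / len(errors)

# Hypothetical schedule: one more step every 100 iterations, up to 12 steps.
schedule = [horizon_at(i, step_size=100, max_horizon=12)
            for i in (1, 100, 101, 5000)]

# Early in training only the first step contributes to the loss.
early = curriculum_loss([1.0, 2.0, 3.0], [1.0, 0.0, 0.0], horizon=1)
later = curriculum_loss([1.0, 2.0, 3.0], [1.0, 0.0, 0.0], horizon=2)
```

In a full training loop, the gradient of this truncated loss with respect to the parameters Ω would be computed and Ω updated with the learning rate ζ at every iteration.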

Data Collection
This paper uses 15 regional electricity price series from the Nordic electricity market for the empirical study. The Nordic electricity market is a regional electricity market that includes several countries, such as Denmark, Sweden, and Norway. Due to geographical and demographic factors, there is a mismatch between power resources and load in the Nordic region, with cheap hydropower concentrated in northern Norway, and expensive thermal power concentrated in Denmark and Finland. In addition, the northern part of the Nordic region is sparsely populated, and the load is low, whereas the load is mainly concentrated in the densely populated and industrialized southern region. At the same time, due to climatic factors, the generation capacity of hydropower also varies seasonally, so that during the high wet season, hydropower concentrated in the northern part of Norway is delivered to the southern part, whereas during the dry season, thermal power from Denmark is delivered to the northern part. The above factors objectively contribute to the formation of the Nordic regional market, which is divided into 15 price areas, corresponding to the distribution of price areas as shown in Figure 8, with significantly different electricity price series in different price areas. Therefore, the case used in this paper can effectively reflect the validity of the proposed model.
Mathematics 2022, 10, x FOR PEER REVIEW
The electricity price in the Nordic electricity market has 24 observations per day, i.e., the time interval between observations is one hour. In this paper, electricity price data for 15 price areas from 1 June 2018 to 31 August 2018, with a total of 2208 observations, are selected to demonstrate the usability of the proposed model. In addition, the first 80% of the data is defined as the training set, and the last 20% as the test set.
For market players in the electricity market, multi-step-ahead forecasting is more valuable than single-step-ahead forecasting. However, the accuracy of multi-step-ahead prediction is usually inferior to that of single-step-ahead prediction due to the accumulation of errors and the increase of uncertainty factors [37]. This study is devoted to building a new spatio-temporal forecasting model to achieve higher-accuracy multi-step-ahead electricity price forecasting.
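As a rough illustration of what a multi-step-ahead task looks like as data, the sketch below cuts one hourly price series into (history, future) pairs with a sliding window. The function name `make_windows` and the parameters `input_len` and `horizon` are hypothetical, and the toy series is invented; the paper does not state its window length, so this is only one plausible setup in which all future steps are predicted directly from the same history.

```python
def make_windows(series, input_len, horizon):
    """Cut one series into (history, future) training pairs.

    input_len and horizon are illustrative parameters, not values
    taken from the paper.
    """
    pairs = []
    last_start = len(series) - input_len - horizon
    for start in range(last_start + 1):
        split = start + input_len
        history = series[start:split]          # model input
        future = series[split:split + horizon]  # multi-step target
        pairs.append((history, future))
    return pairs


# Toy hourly prices (invented numbers), 4 hours of history, 3 steps ahead.
prices = [float(v) for v in range(10)]
pairs = make_windows(prices, input_len=4, horizon=3)
```

With `horizon=1` the same function yields the single-step-ahead task, so the two settings differ only in the length of the target.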
To reduce the training time, we normalize the input data to the interval [0, 1].
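The split and the normalization described above can be sketched as follows. This is a minimal sketch under assumptions: the helper names are hypothetical, the series is synthetic, and the scaling bounds are fitted on the training portion only, which the paper does not specify.

```python
def chronological_split(series, train_fraction=0.8):
    """Keep time order: the first part trains, the remainder tests."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def fit_min_max(train):
    """Scaling bounds, taken from the training data only (an assumption)."""
    return min(train), max(train)


def scale(values, lo, hi):
    """Map values to [0, 1] relative to the fitted bounds."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def unscale(values, lo, hi):
    """Invert scale() so forecasts can be reported in price units."""
    return [v * (hi - lo) + lo for v in values]


# Synthetic hourly series with the same length as the dataset (2208 points).
series = [30.0 + (t % 24) for t in range(2208)]
train, test = chronological_split(series)
lo, hi = fit_min_max(train)
train_scaled = scale(train, lo, hi)
test_scaled = scale(test, lo, hi)
```

When the bounds come from the training set alone, test values can fall slightly outside [0, 1]; forecasts should be passed through `unscale` before the error metrics are computed in price units.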

Evaluation Metrics
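This paper uses three common evaluation metrics to evaluate the effectiveness of the proposed model, namely, mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). The computational formulas of these three evaluation metrics are provided as follows:

$$\mathrm{MAE} = \frac{1}{MN}\sum_{i=1}^{N}\sum_{j=1}^{M}\left|y_i^j - \hat{y}_i^j\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{MN}\sum_{i=1}^{N}\sum_{j=1}^{M}\left(y_i^j - \hat{y}_i^j\right)^2}$$

$$\mathrm{MAPE} = \frac{1}{MN}\sum_{i=1}^{N}\sum_{j=1}^{M}\left|\frac{y_i^j - \hat{y}_i^j}{y_i^j}\right| \times 100\%$$

where $y_i^j$ and $\hat{y}_i^j$ represent the real electricity price value and the predicted value of the $j$th observation point in the $i$th price area, respectively. $M$ is the number of observations in the time series; $N$ is the number of price areas.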
Specifically, smaller values of MAE, RMSE, and MAPE represent better predictions.
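The three metrics can be computed jointly over all price areas, as in the following sketch. It assumes the standard definitions averaged over every observation in every area; the function name `evaluate` and the toy numbers are illustrative only.

```python
import math


def evaluate(actual, predicted):
    """Return MAE, RMSE, and MAPE (in percent) over all areas.

    actual, predicted: lists of N price areas, each a list of M prices.
    """
    abs_sum = 0.0
    sq_sum = 0.0
    pct_sum = 0.0
    count = 0
    for area_actual, area_pred in zip(actual, predicted):
        for y, y_hat in zip(area_actual, area_pred):
            err = y - y_hat
            abs_sum += abs(err)
            sq_sum += err * err
            pct_sum += abs(err / y)  # undefined when the real price is 0
            count += 1
    return {
        "MAE": abs_sum / count,
        "RMSE": math.sqrt(sq_sum / count),
        "MAPE": 100.0 * pct_sum / count,
    }


# Toy example (invented numbers): N = 2 areas, M = 2 observations each.
actual = [[50.0, 40.0], [20.0, 10.0]]
predicted = [[45.0, 44.0], [22.0, 9.0]]
scores = evaluate(actual, predicted)
```

Because MAPE divides by the real price, it is undefined for zero prices and grows very large for prices near zero, so it is best read together with MAE and RMSE.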

Experimental Results
We compare the performance of the T-GCN model with the following methods: the ARIMA model, the SVR model, the GCN model, and the TCN model. The performance of the T-GCN model and the other methods in the multi-step-ahead electricity price forecasting task is shown in Table 1. Compared with the other methods, the T-GCN model achieves the best value of every evaluation metric in all forecasting tasks, which demonstrates its effectiveness in multi-regional electricity price forecasting. Figure 9 provides three histograms of the MAE, RMSE, and MAPE values of the different models, which present the performance differences between the methods more intuitively.

Discussion
(1) Higher prediction accuracy. Neural network-based approaches that include temporal feature modeling, such as the T-GCN and TCN models, typically have better prediction accuracy than other methods, such as the ARIMA and SVR models. This is mainly because methods such as ARIMA and SVR have difficulty handling complex non-stationary time series data.
(2) Spatio-temporal prediction capability. We compared the T-GCN model with the GCN model and the TCN model to verify the ability of the T-GCN model to capture temporal and spatial characteristics from the electricity price data of multiple price areas. As shown in Figure 9, the T-GCN model based on spatio-temporal features has higher prediction accuracy than the GCN and TCN models, which are each based on a single type of feature, indicating that the T-GCN model can capture spatio-temporal features from the electricity price data.
(3) Stability across prediction horizons. The T-GCN model obtains the best prediction performance regardless of the prediction horizon, indicating that the proposed model is insensitive to the prediction horizon and has strong stability. Therefore, the T-GCN model can be used for both short-term and long-term forecasting. Figure 9 shows the comparison of the RMSE of the different models, with the T-GCN model achieving the best results for all prediction horizons. Figure 10 shows how the performance of T-GCN changes across forecasting horizons. The error increases only slowly as the horizon lengthens, which indicates a degree of stability.

Further Illustration of the Model
To better understand the contribution of the constructed graph adjacency matrix, Figure 11 shows the geographic location of three price areas, LV, EE, and FI, where LV is the area geographically bordering EE, and FI is the constructed maximum weighted neighbor of EE. We plot the raw price data for these three areas in Figure 12. We observe that LV is closer to EE on the map, but their price data are less correlated. In contrast, the constructed maximum weighted neighbor FI is farther away from EE, but their electricity price data are strongly correlated. Based on the flow data for the period, as shown in Figure 11, FI is the area that delivers the most power to EE, 5331.3 MWh more than LV. This shows that the T-GCN model can mine the potential dependence between variables from multivariate time-series data.
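One simple way to check a claim of this kind on raw data is to compare the Pearson correlation between a target area's price series and each candidate neighbor. The sketch below is only a diagnostic, not the paper's graph construction module, which learns the adjacency matrix during training. The three short series are invented toy numbers and the names `target`, `area_a`, and `area_b` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented toy prices: area_a moves with the target, area_b does not.
target = [40.0, 42.0, 55.0, 47.0, 41.0, 60.0]
candidates = {
    "area_a": [39.0, 43.0, 54.0, 46.0, 42.0, 58.0],
    "area_b": [50.0, 44.0, 47.0, 52.0, 49.0, 45.0],
}

scores = {name: pearson(target, s) for name, s in candidates.items()}
closest = max(scores, key=scores.get)  # most strongly correlated area
```

A geographically adjacent area can score low on such a measure while a more distant one scores high, which is the pattern described above for LV and FI relative to EE.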

Conclusions
In this paper, we propose an effective method to capture the intrinsic dependencies among multiple electricity price series and build a new electricity price prediction model, T-GCN, to solve the electricity price prediction problem through a graph-based deep learning approach. On the one hand, the connection structure between nodes in the graph is captured by GCN to obtain spatial dependencies; on the other hand, TCN is used to capture the dynamic changes of the nodes' own attributes to obtain temporal dependencies. Evaluated on a Nordic electricity market dataset containing 15 price areas, the T-GCN model achieves better performance across different forecasting horizons than the ARIMA, SVR, GCN, and TCN models. In conclusion, the T-GCN model successfully captures the spatio-temporal characteristics of multiple electricity price series and realizes high-precision forecasting. From the mathematical point of view, this is because our method has strong fitting ability for multiple time series with potential dependence, and can better map the relationship between input and output. Therefore, it can be applied to other multivariate time series prediction tasks with hidden dependencies, such as multi-region wind power generation prediction and distributed photovoltaic output prediction.