DC-STGCN: Dual-Channel Based Graph Convolutional Networks for Network Trafﬁc Forecasting

: Network trafﬁc forecasting is essential for efﬁcient network management and planning. Accurate long-term forecasting models are also essential for proactive control of upcoming congestion events. Due to the complex spatial-temporal dependencies between trafﬁc ﬂows, traditional time series forecasting models are often unable to fully extract the spatial-temporal characteristics between the trafﬁc ﬂows. To address this issue, we propose a novel dual-channel based graph convolutional network (DC-STGCN) model. The proposed model consists of two temporal components that characterize the daily and weekly correlation of the network trafﬁc. Each of these two components contains a spatial-temporal characteristics extraction module consisting of a dual-channel graph convolutional network (DCGCN) and a gated recurrent unit (GRU). The DCGCN further consists of an adjacency feature extraction module (AGCN) and a correlation feature extraction module (PGCN) to capture the connectivity between nodes and the proximity correlation, respectively. The GRU further extracts the temporal characteristics of the trafﬁc. The experimental results based on real network data sets show that the prediction accuracy of the DC-STGCN model overperforms the existing baseline and is capable of making long-term predictions.


Introduction
As the main communication means, the internet plays an important role in daily life, production and the military. The internet not only meets people's daily communication needs but also contributes greatly to the development of the country. With the rapid development of data networks and the increasing demand for network traffic, wireless network operators need to guarantee the Quality of Service (QoS) of their networks, thus, leading to a need to plan their network traffic properly [1,2]. Traffic planning provides a scientific approach to allocate traffic, while network traffic forecasting provides a solution to the network traffic planning problem. Timely and accurate traffic forecasts are becoming increasingly important for network management and planning. Such forecasts enable managers to formulate resource allocation strategies in advance, and proactively manage upcoming overload events. Nevertheless, due to the complex temporal and spatial relationships between traffic flows, it is difficult for traditional forecasting models to accurately predict network traffic [3][4][5].
Network traffic prediction is a typical spatial-temporal sequence prediction problem, where the topology of the network nodes affects the traffic. The traffic data sampled between the nodes are not independent of each other. The spatial and temporal dependencies of network traffic are shown in Figures 1 and 2. The line between each node in Figure 1 represents the weight of their interaction, where the darker the color the larger the weight 1 represents the weight of their interaction, where the darker the color the larger the weight value. It can be seen that for node A, its neighboring nodes in various locations differently affect this node while such effects are also time-dependent. The traffic variations at node A are shown in Figure 2a,b, over a week and a day, respectively. As shown, the network traffic is periodically changed over the week [6], and the traffic over the day also demonstrates high daytime and low nighttime variations. It is shown that the current network traffic is affected by the traffic at the previous times, or even from the same moment in the previous weeks and is interdependent in both the temporal and spatial dimensions. It is, therefore, essential to effectively extract the spatial-temporal characteristics of the data and accurately predict the network traffic. (Figures 1 and 2 are used to confirm that network traffic is spatially and temporally correlated.)   Traditional network traffic forecasting models include the Historical Average (HA) model [7], Autoregressive (AR) model, Autoregressive Moving Average (ARMA) model, and several extensions of these models [8,9]. With the development of artificial intelligence, nonlinear models such as Support Vector Regression (SVR), Extreme Learning Machines (ELM), and Recurrent Neural Network (RNN) based on machine learning algorithms, are also considered for such complex systems [10][11][12]. However, these models can  1 represents the weight of their interaction, where the darker the color the larger the weight value. It can be seen that for node A, its neighboring nodes in various locations differently affect this node while such effects are also time-dependent. The traffic variations at node A are shown in Figure 2a,b, over a week and a day, respectively. As shown, the network traffic is periodically changed over the week [6], and the traffic over the day also demonstrates high daytime and low nighttime variations. It is shown that the current network traffic is affected by the traffic at the previous times, or even from the same moment in the previous weeks and is interdependent in both the temporal and spatial dimensions. It is, therefore, essential to effectively extract the spatial-temporal characteristics of the data and accurately predict the network traffic. (Figures 1 and 2 are used to confirm that network traffic is spatially and temporally correlated.) Traditional network traffic forecasting models include the Historical Average (HA) model [7], Autoregressive (AR) model, Autoregressive Moving Average (ARMA) model, and several extensions of these models [8,9]. With the development of artificial intelligence, nonlinear models such as Support Vector Regression (SVR), Extreme Learning Machines (ELM), and Recurrent Neural Network (RNN) based on machine learning algorithms, are also considered for such complex systems [10][11][12]. However, these models can Traditional network traffic forecasting models include the Historical Average (HA) model [7], Autoregressive (AR) model, Autoregressive Moving Average (ARMA) model, and several extensions of these models [8,9]. With the development of artificial intelligence, nonlinear models such as Support Vector Regression (SVR), Extreme Learning Machines (ELM), and Recurrent Neural Network (RNN) based on machine learning algorithms, are also considered for such complex systems [10][11][12]. However, these models can only extract the temporal characteristics of the series and cannot describe the spatial relationships between flows. To better extract the spatial features, Tudose et al. [13] proposed the use of a convolutional neural network (CNN) to predict the time series. Nevertheless, CNN is usually applied to regular Euclidean data such as images, and hence is unable to characterize the spatial dependencies between nodes of complex network topologies. In recent years, Graph Convolutional Networks (GCN) [14] have been used to extract the spatial characteristics of such nonEuclidian data. Zhao et al. [4] proposed prediction of traffic via the T-GCN model. T-GCN consists of the GCN-GRU model that uses the traditional associative adjacency matrix to perform graph convolution operations. Traditional GCN only describes the connectivity between the network nodes and cannot capture the near-correlation between the network nodes. Therefore, T-GCN cannot effectively extract the spatial-temporal characteristics of network traffic.
To address these issues, this paper proposes a new prediction model, DC-STGCN (Dual-Channel Based Graph Convolutional Networks), to predict the network traffic, which can effectively extract the spatial-temporal characteristics between network nodes. The main contributions of this paper are summarized below.

•
A multitemporal input component is designed which consists of two independent time components that model the daily and weekly correlation of the network traffic.

•
Each component contains a spatial-temporal feature extraction module consisting of a DCGCN and a GRU. The DCGCN can effectively extract spatial features between the network nodes, and the GRU captures the temporal characteristics of the time series.

•
The proposed DCGCN consists of an adjacency feature extraction module (AGCN) and a correlation feature extraction module (PGCN). These enable DCGCN to capture connectivity between the nodes as well as near-correlation.

•
The DC-STGCN model is trained several times on the traffic network dataset of the city of Milan and the results show that the DC-STGCN model has the best prediction error, prediction accuracy and correlation coefficients compared to several existing baselines and has the ability of long-term prediction.
The remainder of the paper is organized as follows: Section 2 reviews the related works on network traffic prediction; Section 3 presents the details of the model proposed in this paper; Section 4 describes the implementation of the experiments and a discussion of the experimental results and Section 5 provides a summary and an outlook for future research.

Related Work
Existing time series prediction models are divided into linear prediction models and nonlinear prediction models. Network traffic is typically a time series and, initially, several linear models are adopted to solve its prediction problem. For example, the HA model uses historical averages as forecasts [7]. Furthermore, the ARMA and several combinations, were used by Laner et al. [8] where the ARMA model was used to characterize remotely correlated network traffic. Further, Rishabh et al. [9] used the Discrete Wavelet Transform (DWT) to decompose the traffic data into linear and nonlinear components, before using the Autoregressive Integrated Moving Average (ARIMA) model for predicting nonlinear components. However, with the development of networks, the complexity and burstiness of network traffic have significantly increased; hence, traditional linear models such as Poisson and Gaussian distributions can no longer meet the characteristics of modern network traffic [15].
With the development of artificial intelligence, several machine learning models have been used to predict network traffic, and these nonlinear prediction models are very good at predicting nonstationary sequences. For instance, Qian et al. [10] used the SVRs to predict the denoised traffic data after Phase Space Reconstruction (PSR) processing. Bie et al. [11] predicted the low and high-frequency components of traffic decomposition by ELM, and ELM combined with the Fruit Fly Optimization Algorithm. Sebastian et al. [16] also used GRU for predicting base station traffic. GRU is a variant of Recurrent Neural Network (RNN) that is capable of resolving long-term RNN dependencies. These models are efficient in extracting the temporal characteristics of traffic data; however, they ignore the spatial relationships between the sequences.
To better extract the spatial characteristics of the time series, Li et al. [17] proposed a CNN fused with a Long Short-Term Memory (LSTM) model for prediction. This model effectively captures the spatial characteristics through the convolutional and pooling layers. However, CNN is usually applied to regular Euclidean data such as images and cannot fully describe the spatial dependencies between complex topological nodes of the network. In 2014, Bruna et al. [18] combined graph theory with neural networks to define filters for graphs in the Fourier domain and, subsequently, GCN was widely used in a knowledge graph [19] and for traffic flow prediction [20,21]. Zhao and colleagues [4] proposed predicting network traffic via T-GCN. However, conventional GCN is unable to capture the near correlation between network nodes and, therefore, T-GCN cannot accurately characterize the spatial and temporal characteristics of the traffic.
To address the T-GCN, we define a DCGCN model by studying the traffic correlation between nodes of a disconnected network, setting their adjacency matrix as a connectivity adjacency matrix, and using a correlation coefficient matrix to extract the connectivity, and near correlation between the nodes, respectively. Two temporal components are also defined for modeling the daily and weekly cycle correlations of network traffic respectively, each consisting of a DCGCN-GRU (spatial-temporal feature extraction module).

Traffic Network
The topology of a network of nodes is described by defining an undirected graph of a traffic network G = (V, E), where V(v 1 , v 2 , · · · , v I ) denotes the traffic nodes, I denotes the number of nodes and E is a set of edges denoting connectivity between nodes. In this paper, the measured network throughput (Mb) X ∈ R I×T at each time of the traffic node is used as the characteristic matrix. In this model, T denotes the length of the historical traffic sequence, and x t i denotes the traffic characteristics of the node i ∈ (1, 2, · · · , I) at the time n ∈ (1, 2, · · · , N).

Traffic Prediction
In this paper, we define the historical flows of all flow nodes X = (X 1 , X 2 , · · · , X T ) and predict their flows Y = (Y T+1 , Y T+2 , · · · , Y T+N ) at a future point in time, N, where, y t+n i represents the predicted value of the node i at a future n ∈ (1, 2, · · · , N) moment in time.

Spatial Feature Modeling
A graph is a data format that describes individuals and relationships between individuals through points and edges. GCN is an application of graph structure data for deep learning and, unlike traditional CNN, GCN operates on the convolution of graph signals in the Fourier domain [22]. Processing the graph structure first requires obtaining a Laplace matrix L = D − A of the input signal x, where L is symmetric normalized to obtain: where L sys is the symmetric normalized Laplacian, I I is the unit matrix, A ∈ R I×I denotes the adjacency matrix of the graph and D is the degree matrix of the node composition D ii = ∑ j A ij . To obtain the eigenvalues, L sys is decomposed into: where U = (u 1 , · · · , u i ), and Λ = diag([λ 1 , · · · , λ i ]) is a diagonal matrix of the decomposed eigenvectors and eigenvalues, respectively. Since U is an orthogonal matrix, U −1 = U T and U T is the transpose of the identity matrix. The spectral convolution, which can be defined as the product of the signal x and the filter in the Fourier domain is approximated by: where g θ represents the convolution kernel, * is the convolution symbol and θ is the model parameter. In 2016, Defferrard et al. [23]. proposed ChebNet, defining the Chebyshev polynomials of the diagonal matrix of eigenvectors as filters, deriving that: where Λ = 2Λ/Λ max − I I represents the scaled eigenvector matrix. The Chebyshev polynomial is defined recursively as [24], in combination with Equations (1) and (2) and the result is shown in the following equation: To avoid over-fitting, and the gradient disappearing as a result of large values, by Adding an activation function σ, the output of the first l layers is: where, W (l−1) is a weighting parameter of the layer l − 1. Therefore, given the characteristic matrix X and the adjacency matrix A, GCN can extract spatial features between nodes by convolving the spectrum of the input nodes.
Combining the above equation, such thatÂ = D − 1 2 A D − 1 2 , the mapping of the input through the two-layer GCN model is: where W (l−1) ∈ R T×H and W l ∈ R H×N denote the weights from the input layer to the hidden layer and from the hidden layer to the output layer, respectively. Here, H is the number of hidden layer cells. The selection of H is discussed in the next section. The traditional adjacency matrix is set up as follows: The method of determining the adjacency matrix of the traffic network has some validity in that the correlation between the connected nodes is higher the correlation between the unconnected nodes. However, each target node has multiple connected nodes, and not every connected node has the same impact on the target node. To address this Electronics 2021, 10, 1014 6 of 16 issue, the impact of different nodes is analyzed using the Pearson correlation coefficient P a,b , P a,b which is defined as where cov(a, b) is the covariance between the (a, b) continuous variables, σ a , and σ b is the standard deviation of a, b, respectively. The results of the calculation are presented in Figure 3.
The method of determining the adjacency matrix of the traffic network has some validity in that the correlation between the connected nodes is higher the correlation between the unconnected nodes. However, each target node has multiple connected nodes, and not every connected node has the same impact on the target node. To address this issue, the impact of different nodes is analyzed using the Pearson correlation coefficient where cov( , ) a b is the covariance between the ( , ) a b continuous variables, a σ , and b σ is the standard deviation of a , b , respectively. The results of the calculation are presented in Figure 3.  Figure 3 indicates that there is a spatial correlation between different network nodes. It is further seen that the nodes with connectivity to A (i.e., B, C, D, E) have different spatial correlations to the target node A. There are nodes with correlation coefficients less than 0.9, and nodes that are not connected to point A with correlation coefficients greater than 0.9. The spatial correlations between the nodes with connectivity to A (i.e., B, C, D, E) and the target node A are different. Therefore, merely setting the connectivity neighborhood matrix is not enough to describe the spatial dependency of the traffic network. In this paper, a new DCGCN model is proposed as presented in Figure 4.  Figure 3 indicates that there is a spatial correlation between different network nodes. It is further seen that the nodes with connectivity to A (i.e., B, C, D, E) have different spatial correlations to the target node A. There are nodes with correlation coefficients less than 0.9, and nodes that are not connected to point A with correlation coefficients greater than 0.9. The spatial correlations between the nodes with connectivity to A (i.e., B, C, D, E) and the target node A are different. Therefore, merely setting the connectivity neighborhood matrix is not enough to describe the spatial dependency of the traffic network. In this paper, a new DCGCN model is proposed as presented in Figure 4.  The DCGCN model is built on top of the basic GCN model, which consists of the AGCN and the PGCN, and the results after Concat are: where | represents the stitching of the matrix, the adjacency matrix of the AGCN is set to the connectivity adjacency matrix described above, and the adjacency matrix of the PGCN The DCGCN model is built on top of the basic GCN model, which consists of the AGCN and the PGCN, and the results after Concat are: where | represents the stitching of the matrix, the adjacency matrix of the AGCN is set to the connectivity adjacency matrix described above, and the adjacency matrix of the PGCN is replaced by the Pearson correlation coefficient matrix. This matrix is fused with the features extracted by the adjacency feature extraction module to capture the connectivity between nodes, and the spatial correlation, respectively.

Temporal Feature Modeling
The most commonly used model for extracting the temporal characteristics of natural sequences is the RNN. However, as the time series becomes too long and the input information becomes larger, training RNN to converge to a global optimum becomes a challenging task. The GRU and LSTM are special recurrent neural networks that alter the hidden layers of RNN to better capture the deep connections and effectively improve the problem of gradient disappearance due to long sequences [25]. Note that the internal structure of the GRU is simpler and the training time is shorter than that of the LSTM. Therefore, the GRU model is used in this paper to extract the temporal characteristics of network traffic. The structure of the considered GRU is shown in Figure 5, where h t−1 denotes the hidden state at the time t − 1 and X t denotes the traffic characteristics of at t. The t hidden states of a moment h t are determined using an update gate Γ u that determines whether the hidden state of the previous moment h t−1 is maintained or it is updated to a candidate hidden state of the moment h t , Γ u using σ, a function that makes itself equal to a value approximately equal to 0 or 1. Note that Γ r is a reset gate to control the extent to which the previous status message h t−1 is ignored. The structure of the GRU enables capturing long-range dependencies and is well suited for extracting the temporal characteristics of long correlation sequences, which is typical for predicting network traffic over time.

DC-STGCN Model
To better extract the spatiotemporal characteristics of the traffic data, a DC-STGCN model is proposed in this paper based on the GCN and the GRU. The structure of this model is shown in Figure 6. It mainly consists of a Multitime Component Input (MT), a Spatial-temporal Feature Extraction Unit (ST-Block), and a Feature Fusion.

DC-STGCN Model
To better extract the spatiotemporal characteristics of the traffic data, a DC-STGCN model is proposed in this paper based on the GCN and the GRU. The structure of this model is shown in Figure 6. It mainly consists of a Multitime Component Input (MT), a Spatial-temporal Feature Extraction Unit (ST-Block), and a Feature Fusion.

DC-STGCN Model
To better extract the spatiotemporal characteristics of the traffic data, a DC-STGCN model is proposed in this paper based on the GCN and the GRU. The structure of this model is shown in Figure 6. It mainly consists of a Multitime Component Input (MT), a Spatial-temporal Feature Extraction Unit (ST-Block), and a Feature Fusion.

Multitime Component Input
The network traffic at the current moment is affected by the previous moment, as explained in the Section 1. Daily periodicity means that the network traffic of the past few days affects the traffic of the current day, while weekly periodicity means that the current traffic is similar to the traffic of the same moment in the past few weeks. Therefore, this paper uses these two different scales of MT to capture the effects of the daily and weekly periodicity of traffic. This further enriches the model's extraction of the temporal characteristics of the traffic. As shown in Figure 7, assuming a daily input with a time window of length f , the model has a daily cycle input , a weekly cycle input , and then predicts the flow in time Y .

Multitime Component Input
The network traffic at the current moment is affected by the previous moment, as explained in the Section 1. Daily periodicity means that the network traffic of the past few days affects the traffic of the current day, while weekly periodicity means that the current traffic is similar to the traffic of the same moment in the past few weeks. Therefore, this paper uses these two different scales of MT to capture the effects of the daily and weekly periodicity of traffic. This further enriches the model's extraction of the temporal characteristics of the traffic. As shown in Figure 7, assuming a daily input with a time window of length f , the model has a daily cycle input X D = [x (t−d * f ) , · · · , x (t− f ) ], a weekly cycle input X W = [x (t−7 * w * f ) , · · · , x (t−7 * f ) ], and then predicts the flow in time Y.

Spatial-Temporal Feature Extraction Unit
On the right side of Figure 6 is a spatiotemporal feature extraction unit that consists of DCGCN and GRU. The spatial features of the historical time series are extracted from the DCGCN and the temporal features are input into the GRU. Furthermore r Γ is the hidden state at the time of t , and u Γ r Γ are the update, and reset gates, respectively. The following equations show the calculation process, where ( , , )

Feature Fusion
The degree to which the characteristics of the inputs in different scales affect the pre-

Spatial-Temporal Feature Extraction Unit
On the right side of Figure 6 is a spatiotemporal feature extraction unit that consists of DCGCN and GRU. The spatial features of the historical time series are extracted from the DCGCN and the temporal features are input into the GRU. Furthermore Γ r is the hidden state at the time of t, and Γ u Γ r are the update, and reset gates, respectively. The following equations show the calculation process, where f (X t , A, P) is the output of the spatial feature extracted from the DCGCN input for the moment t, W u,r,c and b u,r,c are the weight, and bias terms, respectively.

Feature Fusion
The degree to which the characteristics of the inputs in different scales affect the prediction results varies. For example, if the peak and trough values of traffic at some network nodes are in a time pattern, and the daily traffic is less mutable, the daily cycle input and the weekly cycle input are important for the prediction results. Therefore, the predictions are learned from the inputs of the temporal components at different time scales, and the final result of the fusion is: where is the Hadamard multiplier, and P 1 , P 2 indicate the extent to which daily, and weekly cycle inputs influence the prediction results, respectively.

Data Description
To verify the validity of the model, an open dataset was chosen as the experimental dataset in this paper. This dataset contained the network traffic network data collected in the city of Milan [26]. The sampling frequency of the dataset was 10 min/time, i.e., one day contains 144 Sampling points. Two arrays of nine regions were selected for model evaluation: The first 80% of each data set was used as the training set, and 10% of the data in the training set was used as the validation set for the initial training. After the best model was saved, the complete training set was used for further training, and the last 20% of the data was used as the test set. Before the prediction, the sample data was normalized using the MinMaxScaler function. This standardized the data in the [0, 1] interval and the reverse normalization was carried out before outputting the results.

Evaluation Indicators
To fully validate the predictive accuracy of the model, as the data are sampled at an interval of 10 min, we used 10 min (1 point), 20 min (2 points), and 30 min (3 points) for single and multistep forecasting respectively. Besides, three evaluation metrics were selected as indicators to judge the effectiveness of the model as the following. 1 Root Mean Square Error (RMSE), which reflects the prediction error of the model. The error value is [0, +∞] in the range. The closer the error to zero, the better the model. The formula is as follows.
2 Accuracy reflects the accuracy of a model's predictions. The range of accuracy is [0, 1]. The closer the value of Accuracy to 1, the better the model. Which is defined as follows: 3 Coefficient of Determination (R 2 score): the value of R 2 indicates the degree of model excellence. The evaluation criterium is the same as for the accuracy: where y t true and y t pred denote the actual and predicted values at the time t, respectively, y t ave denotes the mean value of the data samples, and T is the number of samples.

Parameter Design
In this study, we used the Python 3.7 programming environment. The network framework was built using TensorFlow, and the operating system was Windows 10 64bit. The computer included an Intel(R) Core(TM) i7-9700CPU @ 3.00 GHz processor, with 32 GB RAM. And the equipment was purchased from Dell in Nanjing, China.
In this experiment, Adam was chosen as the optimizer, the learning rate was set to 0.001, the epoch for model training was 2000, and the minibatch size was 16. The sliding window L is equivalent to a slider sliding on the top of a scale with a specified length. Each unit sliding gives feedback on the data within the slider and that is the predicted data. The number of hidden cells H is the number of neurons in the input GRU. There are important hyperparameters that affect the precision of the prediction results. In this paper, we compared the accuracy and R 2 of the prediction results to determine the optimal values of L and H.
In this experiment, the length L of the sliding window was set to [4,8,12,16]. Figures 8 and 9 show the changes in the accuracy and R 2 on the two datasets of working days and holidays. As is seen the changes in both indicators of accuracy and R 2 were largely consistent and correlated. For the working days, Figure 8a shows that the Accuracy and R 2 values were the highest for the sliding window length of 8; conversely, for the holidays, as shown in Figure 8b, the accuracy and R 2 value were the highest for the sliding window length of 12. To select the best sliding window, a new dataset with both working day and holiday test sets was adopted. And the sliding window length L was set to [4,6,8,10,12,14,16] (Table 1). working day and holiday test sets was adopted. And the sliding window length L was set to [4,6,8,10,12,14,16] (Table 1).

Experimental Results
A comparison of DC-STGCN with the following traditional timing prediction models and machine learning models was performed and the results are presented in Table 2.
HA: historical average model, which uses historical averages as predictions. Here we used the average of the last eight time samples to predict the value at the next moment.
ARIMA: autoregressive integrated moving average model, one of the most widely used predictive models for time series.
SVR: supports vector regression model, for which the prediction results were obtained by training based on historical data. SVR has the advantage of fewer training parameters and better results. In this paper, a linear kernel function was used and the penalty factor set to 0.001. GRU: gated recurrent unit, a variant of the RNN, which is an efficient solution to the gradients vanishing issue after a long sequence of inputs. In this experiment, Table 2 shows the prediction results for the next 10, 20, and 30 min for different models on different data sets (working days, holidays). The models were trained five times and then averaged for the final results. As the value of ARIMA was too small, * represents the resultant data that can be ignored. Analysis of Table 2 indicates the following:  As seen in Table 1, the best results were obtained at L = 8. Besides, as the sliding window length increased, the difficulty of calculation increased and the prediction accuracy decreased. Therefore, this paper used L = 8 as the length of the sliding window, i.e., 8 historical network traffic data (X t , X t+1 , · · · · · · , X t+7 ) to predict the future traffic.
For the selection of the H number of hidden layers, we considered (8,16,32,64,128). Figure 9a,b show the values of accuracy and R 2 for different numbers of hidden layers on the two datasets. It is seen that for both datasets, accuracy and R 2 were at their maximum values for H = 64; therefore H was set to 64 in this experiment.

Experimental Results
A comparison of DC-STGCN with the following traditional timing prediction models and machine learning models was performed and the results are presented in Table 2.
HA: historical average model, which uses historical averages as predictions. Here we used the average of the last eight time samples to predict the value at the next moment.
ARIMA: autoregressive integrated moving average model, one of the most widely used predictive models for time series.
SVR: supports vector regression model, for which the prediction results were obtained by training based on historical data. SVR has the advantage of fewer training parameters and better results. In this paper, a linear kernel function was used and the penalty factor set to 0.001.
GRU: gated recurrent unit, a variant of the RNN, which is an efficient solution to the gradients vanishing issue after a long sequence of inputs. In this experiment, Table 2 shows the prediction results for the next 10, 20, and 30 min for different models on different data sets (working days, holidays). The models were trained five times and then averaged for the final results. As the value of ARIMA was too small, * represents the resultant data that can be ignored. Analysis of Table 2 indicates the following: (1) The DC-STGCN model had the best forecast error, forecast precision, and correlation coefficient. For example, for a forecast step of 10 min on the working day dataset, the accuracy and R 2 values for DC-STGCN were, respectively, 3.2% and 3.5% higher than that of the HA model, and the RMSE was reduced by 0.558. Compared to the ARIMA model, the RMSE and accuracy of DC-STGCN were, respectively, 1.719 lower and 21.0% higher. While the accuracy and R 2 of DC-STGCN were improved by 2.9% and 3.2% compared to the SVR, the prediction was poorer as the SVR used a linear kernel function. It can be further seen that the neural network-based models, both DC-STGCN and GRU, outperformed the other models. This is because of the poor fit provided by the HA and ARIMA to such a long series of unsteady data, whereas the neural network models fitted the nonlinear data much better. (2) The DC-STGCN model had long-term forecasting capability. By increasing the prediction time, the prediction performance of the DC-STGCN model was decreased. Nevertheless, the DC-STGCN model still had the best prediction performance comparing with the other models. Figure 10 shows the change in accuracy with increasing forecast time for the DC-STGCN model on both working day and holiday datasets. The accuracy decreased with the forecast time, but the downward trend was rather smooth. Therefore, the DC-STGCN model was less affected by forecast time and had stable long-term forecasting capability. (3) Network traffic is cyclical as well as self-similar [11]. The network traffic at the current moment is affected by the previous moment. Therefore, in this paper, we used two different scales of MT to capture the effects of the daily and weekly periodicity of traffic. In contrast, the model proposed by He et al. [5] didn't take into account the flow characteristic; therefore, the DC-STGCN could extract a richer temporal feature. Meanwhile, the results in Table 2 suggest that the DC-STGCN model predicted better than the model (T-GCN), proposed by Zhao et al. [4]. (4) Comparing the prediction results of the working day and holiday datasets, the DC-STGCN model predicted network traffic better on the working day dataset than the holiday dataset. This is because holiday network traffic peaks are higher than weekday peaks and the traffic, therefore, is harder to predict. The DC-STGCN model predicted traffic more accurately for the working day dataset than the holiday dataset. This is because, unlike the more regular weekday traffic, the network traffic on holidays is more random. flow characteristic; therefore, the DC-STGCN could extract a richer temporal feature. Meanwhile, the results in Table 2 suggest that the DC-STGCN model predicted better than the model (T-GCN), proposed by Zhao et al. [4]. (4) Comparing the prediction results of the working day and holiday datasets, the DC-STGCN model predicted network traffic better on the working day dataset than the holiday dataset. This is because holiday network traffic peaks are higher than weekday peaks and the traffic, therefore, is harder to predict. The DC-STGCN model predicted traffic more accurately for the working day dataset than the holiday dataset. This is because, unlike the more regular weekday traffic, the network traffic on holidays is more random.

Ablation Studies
To verify the advanced nature of the DC-STGCN model, this section validates the effectiveness of the two modules proposed in this paper. The effectiveness of the modules including DC (dual-channel GCN) and MT (Multi-time Component Input) were investigated through ablation experiments. Based on the single-channel GCN-GRU model, the modules (1) DC; (2) MT; and (3) DC+MT (the proposed model in this paper) were added in turn, resulting in three submodels for multistep prediction on the working day dataset. The results are presented in Figure 11.  (3) DC+MT (the proposed model in this paper) were added in turn, resulting in three submodels for multistep prediction on the working day dataset. The results are presented in Figure 11. It can be seen that the prediction precision of the submodel with the MT module was higher than that of the submodel with the DC module. Nevertheless, the prediction performance of the sub-model with the MT module was significantly decreased by increasing the prediction time. This suggests that this setting was not suitable for long-term prediction. The submodel with the DC module was stable in multistep forecasting, but as network traffic is typically a time series, its temporal characteristics affected the prediction precision of the model more than its spatial characteristics. The submodel with the MT module extracted richer temporal characteristics of the network traffic, so the overall prediction of the submodel with the DC module alone was inferior. The model with the DC+MT module (the DC-STGCN model) combined the advantages of both modules, and it can be seen that the DC-STGCN model not only provided high prediction precision but was suitable for multistep prediction.
The ablation experiments confirmed that the DC and MT modules proposed in this paper are effective. The DC module enabled the DC-STGCN model to capture the connectivity between the nodes, and the near-correlation, hence extracting richer spatial features. Furthermore, the DC-STGCN model used the MT module to capture the daily and weekly It can be seen that the prediction precision of the submodel with the MT module was higher than that of the submodel with the DC module. Nevertheless, the prediction performance of the sub-model with the MT module was significantly decreased by increasing the prediction time. This suggests that this setting was not suitable for long-term prediction. The submodel with the DC module was stable in multistep forecasting, but as network traffic is typically a time series, its temporal characteristics affected the prediction precision of the model more than its spatial characteristics. The submodel with the MT module extracted richer temporal characteristics of the network traffic, so the overall prediction of the submodel with the DC module alone was inferior. The model with the DC+MT module (the DC-STGCN model) combined the advantages of both modules, and it can be seen that the DC-STGCN model not only provided high prediction precision but was suitable for multistep prediction.
The ablation experiments confirmed that the DC and MT modules proposed in this paper are effective. The DC module enabled the DC-STGCN model to capture the connectivity between the nodes, and the near-correlation, hence extracting richer spatial features. Furthermore, the DC-STGCN model used the MT module to capture the daily and weekly flow periodicity, further enriching the model's extraction of the temporal features of the flow. The results in Table 2 also indicate that the DC-STGCN model achieved the best prediction error, prediction accuracy and correlation coefficient on both examined datasets. These results suggest that the DC-STGCN model proposed in this paper is effective and achieves accurate and stable long-term prediction capability.

Model Interpretation
To better understand the DC-STGCN model, this section explores the differences in the model's predictions on different datasets. We selected two network nodes on the working day and holiday datasets, respectively. We then visually compare the results of their one-step predictions with true values in Figures 12 and 13.  (1) Network traffic is more regular on weekdays, starting to rise as people go to work (at or around 7 am), sampling point 42, reaching its peak for the day (at 5 pm), sampling point 102, and then falling as the users leave work, and eventually reaching a minimum in the early hours of the morning.
(2) Network traffic on holidays is more complex and variable than that of the weekdays. As it is seen on the left side of Figure 13 that holidays traffic is more random and unpredictable from 08:00 AM (point 48) onwards, depending on people's lifestyles. On the right side of Figure 13, it is also seen that the holidays traffic is more likely to have sudden points of variation. The GCN model used a smooth filter moving in the Fourier domain to interact with the signal. This resulted in smoother predictions of the mutation regions so that the DC-STGCN model did not predict the datasets with multiple mutation points as well.
In summary, the DC-STGCN model had a lower prediction precision for holidays than for working days, but the prediction of trends in actual network traffic was almost identical. This indicates that the DC-STGCN model can accurately predict upcoming congestion events.

Conclusions
In this paper, we propose a DC-STGCN model consisting of two temporal components that characterize the daily and weekly correlation of network traffic. Each component contains a spatial-temporal feature extraction module consisting of a DCGCN-GRU  (1) Network traffic is more regular on weekdays, starting to rise as people go to work (at or around 7 am), sampling point 42, reaching its peak for the day (at 5 pm), sampling point 102, and then falling as the users leave work, and eventually reaching a minimum in the early hours of the morning.
(2) Network traffic on holidays is more complex and variable than that of the weekdays. As it is seen on the left side of Figure 13 that holidays traffic is more random and unpredictable from 08:00 AM (point 48) onwards, depending on people's lifestyles. On the right side of Figure 13, it is also seen that the holidays traffic is more likely to have sudden points of variation. The GCN model used a smooth filter moving in the Fourier domain to interact with the signal. This resulted in smoother predictions of the mutation regions so that the DC-STGCN model did not predict the datasets with multiple mutation points as well.
In summary, the DC-STGCN model had a lower prediction precision for holidays than for working days, but the prediction of trends in actual network traffic was almost identical. This indicates that the DC-STGCN model can accurately predict upcoming congestion events.

Conclusions
In this paper, we propose a DC-STGCN model consisting of two temporal components that characterize the daily and weekly correlation of network traffic. Each compo- (1) Network traffic is more regular on weekdays, starting to rise as people go to work (at or around 7 am), sampling point 42, reaching its peak for the day (at 5 pm), sampling point 102, and then falling as the users leave work, and eventually reaching a minimum in the early hours of the morning.
(2) Network traffic on holidays is more complex and variable than that of the weekdays. As it is seen on the left side of Figure 13 that holidays traffic is more random and unpredictable from 08:00 AM (point 48) onwards, depending on people's lifestyles. On the right side of Figure 13, it is also seen that the holidays traffic is more likely to have sudden points of variation. The GCN model used a smooth filter moving in the Fourier domain to interact with the signal. This resulted in smoother predictions of the mutation regions so that the DC-STGCN model did not predict the datasets with multiple mutation points as well.
In summary, the DC-STGCN model had a lower prediction precision for holidays than for working days, but the prediction of trends in actual network traffic was almost identical. This indicates that the DC-STGCN model can accurately predict upcoming congestion events.

Conclusions
In this paper, we propose a DC-STGCN model consisting of two temporal components that characterize the daily and weekly correlation of network traffic. Each component contains a spatial-temporal feature extraction module consisting of a DCGCN-GRU model. The DCGCN consists of the AGCN and PGCN, and their adjacency matrices are set as connectivity adjacency matrices, and correlation coefficient matrices, respectively, which are used to extract the connectivity and proximity correlation between nodes. In our proposed method, the GRU extracts the temporal characteristics of the traffic. Trained on two real datasets, the results show that the proposed model outperformed existing models in terms of prediction error, prediction precision, and correlation coefficient, and could make long-term predictions. Compared to the traditional ARIMA model, the DC-STGCN model with a forecast length of 10 min on the working day dataset reduced the RMSE and accuracy by 1.719, and improved it by 21%, respectively, with significantly better forecasting performance. In the future, we may consider the construction of multitask prediction models, such as the simultaneous prediction of traffic rates and throughput.

Data Availability Statement:
The article uses the network traffic data collected in the city of Milan as a dataset for network traffic forecasting. The data was provided by Telecom Italia, a large European telephone service provider, which divides the entire city into 100 × 100 areas. The dataset is a count of the traffic of users sending or receiving data via mobile terminals in the Milan area. The data download link is as follows: https://doi:10.1038/sdata.2015.55 (accessed on 30 October 2020).

Conflicts of Interest:
The authors declare no conflict of interest.