Fourier Graph Convolution Network for Time Series Prediction

The spatio-temporal pattern recognition of time series data is critical to developing intelligent transportation systems. Traffic flow data are time series that exhibit patterns of periodicity and volatility. A novel robust Fourier Graph Convolution Network model is proposed to learn these patterns effectively. The model includes a Fourier Embedding module and a stackable Spatial-Temporal ChebyNet layer. The Fourier Embedding module, grounded in Fourier series theory, represents a periodic function as a Fourier series whose optimal coefficients and frequency parameters are learned, capturing periodicity features. The Spatial-Temporal ChebyNet layer, consisting of a Fine-grained Volatility Module and a Temporal Volatility Module, models traffic flow's volatility features to improve the system's robustness. Experiments on prediction accuracy using two open datasets show that the proposed model significantly outperforms state-of-the-art methods.


Introduction
Intelligent Transportation Systems (ITS) aim to establish a complete traffic management system and provide innovative services for traffic management departments through research on basic traffic theory and the integration of advanced science and technology [1,2]. One of the critical issues in ITS is traffic flow prediction. Accurate traffic flow prediction is vital in many scenarios, such as road resource management, traffic network optimization, and traffic congestion alleviation [3,4]. However, actual traffic flow presents a complex mixture of periodicity and uncertainty. For example, traffic flow data are collected from sensors on the road, and the data show periodic changes due to the regular activities of individuals, such as daily traffic peak periods. Meanwhile, many factors contribute to uncertainty in traffic flow data [5], such as weather conditions, unexpected accidents, and road maintenance. In addition, traffic flow data cannot be fully collected due to the burdensome cost, which further increases the complexity of the traffic flow prediction problem. Therefore, a series of crucial challenges remains in extracting patterns from complicated traffic flow and making reliable predictions based on them.
Many emerging methods are dedicated to traffic flow prediction. As classical statistical methods, Autoregressive Integrated Moving Average (ARIMA) models are applied to stationary time series, where traffic flow prediction can be regarded as a seasonal ARIMA process [6]. Further, Autoregressive Conditional Heteroskedasticity (ARCH) [7] has been proposed to analyze heteroskedasticity in time series. Nevertheless, these classical methods have significant drawbacks in dynamically processing the various complex requirements of traffic flow prediction. Advanced artificial intelligence technology [8,9] improves the prediction accuracy of traffic flow. The Graph Convolution Network (GCN) has recently drawn attention due to its powerful ability to capture spatio-temporal information. Its typical variations include the Temporal Graph Convolutional Network (T-GCN) [10], the Attention-based Spatial-Temporal Graph Convolution Network (ASTGCN) [11], Spatial-Temporal Synchronous Graph Convolutional Networks (STSGCN) [12], and the Dynamic Graph Convolution Network (DGCN) [3]. These models regard the actual traffic flow as a single entity for prediction. As preliminary work, an improved Dynamic Chebyshev Graph Convolution Network (iGCGCN) was proposed [13] to enhance the attention mechanism and the data construction.
Based on the decomposition of time series data, traffic flow data can be regarded as consisting of several components. For example, models including the Time-Series Analysis and Supervised-Learning (TSA-SL) method [14] and the hybrid model [15] try to decompose the traffic flow into two main parts, periodicity and volatility, and then learn the two parts separately to improve prediction accuracy. Figure 1a illustrates the original traffic flow of three detectors; each detector has a specific periodic fluctuation associated with the dynamic traffic network. Figure 1b-d show the corresponding decomposition results of the detector 1 data, including trend, periodicity, and volatility. Trends may be stable, upward, or downward, and the red circle in Figure 1b indicates a downtrend. In Figure 1c, the red box shows the change over one period of the traffic flow data. In Figure 1d, the complicated dynamic volatility of detector 1 is influenced by many factors, such as traffic patterns, noise in the traffic data, and incomplete traffic flow. Therefore, traffic flow forecasting models need the ability to automatically extract the various periodicity, trend, and volatility components from the traffic flow data.
Although much effort has been devoted to the issue of traffic flow prediction, there are still some crucial challenges remaining in capturing the various periodicities and dynamic volatility.

• The existing methods learn the periodicity based on frequency-domain methods, such as spectral analysis and the traditional Fourier Transform [14][15][16][17]. These models generally require manual parameter setting and comply with rigorous assumptions, making them incapable of capturing various periodicities.

• There is still a lack of an efficient way to learn dynamic volatility for improving robustness, which is crucial to the dynamic spatial-temporal pattern recognition of the traffic network.

• Some models capture periodicity and volatility, but these methods capture them independently and ignore their inherent relationship.
To address these issues, a robust Fourier Graph Convolution Network (F-GCN) architecture is proposed, which consists of two adaptive modules: a Fourier Embedding (FE) module and a stackable Spatial-Temporal ChebyNet (STCN) layer. The FE module is proposed to capture various periodicities without artificial intervention, and the STCN module with periodic embedding is developed to extract dynamic temporal volatility. The STCN comprises two sub-modules: a Fine-grained Volatility Module and a Temporal Volatility Module. In detail, the Fine-grained Volatility Module first captures fine-grained volatility to decrease the difficulty of learning complex volatility; the Temporal Volatility Module then further captures the dynamic temporal volatility. Unlike methods that learn periodicity and volatility independently, the F-GCN model can consider the correlation between the two parts. The main contributions of this work are summarized as follows.

• A novel Fourier Embedding module is proposed to capture periodicity patterns, which is proven to learn diversified periodicity patterns.

• A stackable Spatial-Temporal ChebyNet layer, including a Fine-grained Volatility Module and a Temporal Volatility Module, is proposed to handle the complex volatility and learn dynamic temporal volatility for improving the system's robustness.

• A dynamic Fourier Graph Convolution Network framework is proposed to integrate the periodicity and volatility analysis, and it can be easily trained end-to-end. Extensive experiments are conducted on real-world traffic flow data, and the results significantly outperform state-of-the-art methods.
The remainder of this paper is organized as follows. Section 2 summarizes related studies in terms of traffic flow data decomposition and graph convolution networks. Section 3 details the methods, including preliminaries and the proposed model. Section 4 presents the results and discussion. Finally, Section 5 gives a summary.

Literature Review
This section includes two subsections: traffic flow data decomposition and graph convolution network.

Traffic Flow Data Decomposition
Data decomposition has inspired many methods for improving the precision of traffic flow prediction. The procedure of these methods generally includes two independent phases: the traffic flow data are first decomposed into several parts, and then various algorithms are employed to learn traffic patterns from the decomposed parts. For example, a hybrid approach [15] for short-term traffic flow forecasting decomposes traffic flow data into periodic trends, deterministic parts, and volatility, and then utilizes three different modules to learn patterns from these three components: a spectrum method, ARIMA, and an autoregressive model.
Remarkably, TSA-SL [14] regards traffic flow patterns as a combination of periodicity and volatility: the traditional Fourier transform learns periodicity with manual parameter setting, and conventional machine learning methods are employed to capture volatility. Further, a combination model [17] has been proposed that utilizes Empirical Mode Decomposition (EMD) to disassemble the traffic flow into multiple components with different frequencies; the ARIMA and an improved Extreme Learning Machine (ELM) are then applied to learn these components. The method in [16] utilizes Ensemble Empirical Mode Decomposition (EEMD) and an artificial neural network layer for multiscale traffic flow forecasting. These methods learn the decomposed parts using empirical methods, and the relationship among these parts is difficult to consider fully.

Graph Convolution Network
Traffic flow prediction is among the essential tasks in ITS, and many methods have been applied to spatial-temporal prediction. Historical Average (HA) and ARIMA [6] are conventional statistical approaches to time series analysis for predicting traffic flow, and they can only learn the linear relationships of the traffic network. With the advancement of machine learning, conventional machine learning algorithms have been developed to represent more complex and nonlinear data relationships. For example, some works introduced Support Vector Regression (SVR) [18] and the Bayesian model [19] to capture high-dimensional nonlinear characteristics. In recent years especially, deep learning has proven to be effective in various areas, and many deep learning models have been developed to predict traffic flow and improve performance. Convolution Neural Networks (CNN) and Gated Recurrent Unit networks (GRU) are commonly used to deal with spatial and temporal characteristics. For example, [20] employed a 1D-CNN to learn the spatial features of traffic flow; [21] used a bidirectional GRU for short-term traffic flow prediction; and ST-3DNet [22] constructed a 3D-CNN to capture traffic characteristics in the temporal and spatial dimensions simultaneously. However, CNN-based methods require traffic flow to be structured data and generally ignore the topology information of the road network.
With the extraordinary performance of deep learning in image and natural language processing, many researchers have been devoted to handling graph structures with these methods. Graph Neural Networks have been proposed to model graph structures and can be summarized as follows. (1) Spectral methods: Bruna et al. [23] initially introduced a graph convolutional network by generalizing the convolution kernel with the Laplacian matrix. Defferrard et al. [24] adopted Chebyshev polynomials to approximate the eigenvalue decomposition with fewer parameters and significantly decreased the computational complexity. Kipf et al. [25] proposed a first-order linear approximation of the graph convolutional model and further improved its computational efficiency. (2) Spatial methods: Steven et al. [26] proposed an algorithm for undirected spatial graphs to update graphs with different distances; unfortunately, the model parameters increase sharply for large-scale graphs. Niepert et al. [27] proposed converting graph-structured data into traditional Euclidean data to overcome this drawback. Hamilton et al. [28] proposed a general inductive framework to generate node representations through sampling and aggregating adjacent node characteristics.
With much effort from industry and academia, graph neural networks have gradually produced many variants to learn potential spatial-temporal patterns. MRes-RGNN [29] used residual recurrent graph neural networks to capture spatial-temporal information. ASTGCN [11] explores temporal and spatial relationships using an attention mechanism and a graph neural network. STSGCN [12] builds a module to capture local spatial and temporal characteristics and then stacks this module along the time dimension to learn long-term characteristics. Further, AGCRN [30] can automatically learn node features without pre-defined graphs using two adaptive modules. These methods have significantly improved the capacity to consider the relationships between nodes with the graph structure. Nevertheless, there is still a lack of an efficient way to automatically learn periodicity and volatility from traffic flow data.

Preliminaries
This section reviews the mathematical concepts used throughout the paper and serves as a reference for the subsequent sections.

The Complex Fourier Series
Signal Processing (SP) and communication engineering often apply Fourier theory. The Complex Fourier Series (CFS) representation of a periodic signal x(t) with period T can be written as Equation (1):

x(t) = Σ_{k=−∞}^{∞} X_c(kF_s) e^{j2πkF_s t}    (1)

where F_s denotes the fundamental frequency and equals 1/T. The coefficients X_c(kF_s) are given by Equation (2):

X_c(kF_s) = (1/T) ∫_T x(t) e^{−j2πkF_s t} dt    (2)

where X_c(kF_s) will be written as X_c(k) and is referred to as the complex Fourier series coefficient of x(t).
For the CFS to converge pointwise, the original Dirichlet conditions require the signal to be of bounded variation over one period. The CFS exists if x(t) satisfies the Dirichlet conditions, which include the condition in Equation (3):

∫_T |x(t)| dt < ∞    (3)

or the weaker condition in Equation (4):

∫_T |x(t)|² dt < ∞    (4)

This condition expresses that the signal has finite energy in one cycle. The sudden truncation of the Fourier series results in oscillations near a discontinuity. As the number of terms increases, the oscillation frequency increases while the amplitude decreases; however, the magnitude of the first ripple on either side of the discontinuity remains almost constant. This phenomenon, which occurs in all signal representations by a truncated number of orthogonal basis functions, was first observed by Albert Michelson and later explained mathematically by Gibbs. An example is shown in Figure 2 in terms of the truncated Fourier series for the sawtooth wave function.
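The Gibbs behavior described above is easy to check numerically. The sketch below (plain NumPy; the sawtooth definition x(t) = t on (−π, π) is an illustrative choice, not necessarily the exact wave in Figure 2) sums the truncated series and measures the first-ripple overshoot as the number of terms grows — the overshoot ratio stays almost constant, exactly as the text states.

```python
import numpy as np

def sawtooth_partial_sum(t, n_terms):
    """Truncated Fourier series of the sawtooth x(t) = t on (-pi, pi)."""
    s = np.zeros_like(t)
    for k in range(1, n_terms + 1):
        s += 2.0 * (-1) ** (k + 1) * np.sin(k * t) / k
    return s

# Sample densely just inside one period so the first ripple is resolved.
t = np.linspace(-np.pi + 1e-3, np.pi - 1e-3, 20001)
for n in (10, 50, 250):
    overshoot = sawtooth_partial_sum(t, n).max() / np.pi
    print(f"M={n:4d}  peak/true-peak = {overshoot:.4f}")
```

The ripple moves closer to the discontinuity as M grows, but its height (roughly 9% of the jump) does not shrink.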

Real Fourier Series
Real Fourier series (RFS) coefficients are the real part of X_c(k), and can be written as in Equations (5)-(7), where X(±k) is sometimes referred to as the k-th harmonic. The inverse real Fourier series is given by Equation (8).
X_1(k) = (2/T) ∫_T x(t) cos(2πkF_s t) dt    (9)

X_0(k) = (2/T) ∫_T x(t) sin(2πkF_s t) dt    (10)

Thus, X(k) equals X_1(k) for k ≥ 0, and X_0(|k|) for k < 0. The Fourier series for the triangular wave has only the coefficients X_1(k). As shown in Figure 1b, the traffic flow is similar to the triangular wave.
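The claim that the triangular wave has only cosine coefficients X_1(k) can be verified numerically. A minimal sketch, assuming a unit-period even triangular wave (an illustrative choice; any even periodic signal behaves the same way):

```python
import numpy as np

T = 1.0
t = np.linspace(-T / 2, T / 2, 200001)
dt = t[1] - t[0]
x = 1.0 - 4.0 * np.abs(t) / T          # even triangular wave with period T

def cos_coeff(k):
    """X_1(k) = (2/T) * integral of x(t) cos(2*pi*k*F_s*t) over one period."""
    return 2.0 / T * np.sum(x * np.cos(2 * np.pi * k * t / T)) * dt

def sin_coeff(k):
    """X_0(k) = (2/T) * integral of x(t) sin(2*pi*k*F_s*t) over one period."""
    return 2.0 / T * np.sum(x * np.sin(2 * np.pi * k * t / T)) * dt

for k in range(1, 4):
    print(k, round(cos_coeff(k), 4), round(sin_coeff(k), 4))
```

All sine coefficients vanish (the integrand is odd), while the cosine coefficients follow the familiar 8/(π²k²) pattern for odd k.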
Traffic Graph G: The traffic network is defined as an undirected graph G = (V, E, A), where V is a set of nodes, |V| = N, and N is the number of nodes; E is a set of edges, where e_{i,j} represents the edge between node i and node j; and A ∈ R^{N×N} denotes the original adjacency matrix of the network, whose calculation is given in Equation (12).
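Equation (12) is not reproduced here. As a point of reference, many PeMS-based works build A with a thresholded Gaussian kernel over road distances; the sketch below shows that common construction under that assumption — it is illustrative and may differ from the paper's exact formula.

```python
import numpy as np

def gaussian_adjacency(dist, sigma, eps):
    """Thresholded Gaussian-kernel adjacency (an assumption, not the paper's
    Equation (12)): entries below eps are cut to keep the graph sparse."""
    a = np.exp(-dist ** 2 / sigma ** 2)
    a[a < eps] = 0.0                   # drop weak links
    np.fill_diagonal(a, 0.0)           # no self-loops in A itself
    return a

# Toy pairwise road distances for three detectors.
dist = np.array([[0.0, 1.0, 3.0],
                 [1.0, 0.0, 2.0],
                 [3.0, 2.0, 0.0]])
A = gaussian_adjacency(dist, sigma=2.0, eps=0.2)
print(A)
```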
Time2Vec [31,32]: This model proposes a way of time embedding, and the learned embedding can be used in different architectures, as in Equation (13), where t2v(τ)[i] is the i-th element of t2v(τ), τ represents the time scalar for different periodicities, and ω_i and ϕ_i are the frequency and phase-shift of the sine function, respectively, both of which are learnable parameters.

Spatial-Temporal Signal: The spatial graph signal is denoted as X_G ∈ R^{F×N}, where F is the number of characteristics of the traffic graph. X_G is extended along the time dimension, and the spatial-temporal graph signal is X_G ∈ R^{F×N×T}, where T is the total number of time slices. The modeling in this work aims to discover the spatial and temporal patterns in massive traffic flow data.
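Time2Vec in Equation (13) is straightforward to sketch: the i-th element is ω_i τ + ϕ_i for i = 0 and sin(ω_i τ + ϕ_i) otherwise. A minimal NumPy version, with random parameters standing in for learned ones:

```python
import numpy as np

rng = np.random.default_rng(0)

def time2vec(tau, omega, phi):
    """Time2Vec: the first element is a linear (trend) feature, the rest are
    periodic sine features of the time scalar tau."""
    z = omega * tau + phi
    return np.concatenate([z[:1], np.sin(z[1:])])

d = 8                                   # embedding size (illustrative)
omega = rng.normal(size=d)              # learnable frequencies
phi = rng.normal(size=d)                # learnable phase shifts
print(time2vec(0.5, omega, phi).shape)  # (8,)
```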

Problem Statement
The traffic flow prediction problem can be described as follows. A mapping function f: [X_{t−T_h+1}, …, X_t] → [X̂_{t+1}, …, X̂_{t+T_f}] is learned, where T_h is the length of the historical data and T_f is the length of the target data.
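The mapping above is typically realized by sliding a window over the series to build (history, target) pairs. A minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def make_windows(series, t_h, t_f):
    """Split a (T, N) multivariate series into pairs: inputs of length t_h
    map to targets of length t_f, sliding one step at a time."""
    xs, ys = [], []
    for i in range(len(series) - t_h - t_f + 1):
        xs.append(series[i:i + t_h])
        ys.append(series[i + t_h:i + t_h + t_f])
    return np.stack(xs), np.stack(ys)

flow = np.arange(100, dtype=float).reshape(100, 1)   # toy 1-node series
X, Y = make_windows(flow, t_h=12, t_f=3)
print(X.shape, Y.shape)   # (86, 12, 1) (86, 3, 1)
```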

Fourier Graph Convolution Network
The robust Fourier Graph Convolution Network (F-GCN) is described in this section. As shown in Figure 3, the architecture of F-GCN includes three primary modules: a data construction module, a Fourier Embedding (FE) module, and a stackable Spatial-Temporal ChebyNet (STCN) layer. First, the data construction module is employed to construct the graph containing three periodic data types and the Laplacian. Then, the FE module learns various periodicity embeddings for the graph. The stackable STCN is further utilized to explore the dynamic temporal volatility, and the loss values between the prediction and the ground truth are calculated for backpropagation learning.

Fourier Embedding
The traffic flow pattern generally presents various periodicities, making it highly complex to predict the traffic flow. To address this issue, traffic flow patterns are decomposed into two key elements: periodicity and volatility. Vector-based time representation methods such as Time2Vec [31,32] are employed in other frameworks. Here, an operator Emb(.), named Fourier Embedding (FE), is developed to capture various periodicities, as shown in Equation (14). According to Fourier series theory [33], any periodic function can be represented by superimposing multiple sine and cosine functions with different frequencies. Unlike traditional decomposition methods, the proposed FE module is based on an embedding method and can effectively represent the various periodicities of traffic flow.
where the Emb(.) operator is used to represent nodes of the graph, W_e ∈ R^{T×d} is a learnable parameter, and d is the length of the embedding vector. X_G comprises three periods of traffic flow, which helps increase adaptability to different periodicities. Meanwhile, it also provides richer volatility information, enabling downstream modules to explore volatility better.
As in Equation (11), the truncated Fourier series polynomials of order M can be presented in Equations (15) and (16).
where a_0 ∈ R, a_n ∈ R^{T×T}, b_n ∈ R^{T×T}, W_m^1 ∈ R^{d×h}, and W_m^2 ∈ R^{d×h} are learnable parameters; cos(.) and sin(.) are trigonometric functions; F_{X_G} ∈ R^{F×N×T×h} is the result of the periodicity embedding; W_f ∈ R^{h×1} is a learnable parameter; and X_G ∈ R^{F×N×T} is the output of the FE module.
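The core idea of Equations (15) and (16) — a truncated Fourier series of order M with learnable frequency and coefficients — can be sketched as follows. This is a simplified scalar version; the paper's tensor shapes and parameter names are not reproduced.

```python
import numpy as np

def fourier_series_features(t, M, freq, a, b, a0):
    """Truncated Fourier series of order M with fundamental frequency `freq`
    and coefficients a, b (all learnable in the FE module; scalars here)."""
    out = np.full_like(t, a0 / 2.0)
    for m in range(1, M + 1):
        out += a[m - 1] * np.cos(2 * np.pi * m * freq * t) \
             + b[m - 1] * np.sin(2 * np.pi * m * freq * t)
    return out

# Sanity check: with the right coefficients, the series reproduces a
# two-component periodic signal exactly.
t = np.linspace(0, 2, 400)
target = 1.5 * np.cos(2 * np.pi * t) + 0.5 * np.sin(4 * np.pi * t)
approx = fourier_series_features(t, M=2, freq=1.0,
                                 a=np.array([1.5, 0.0]),
                                 b=np.array([0.0, 0.5]), a0=0.0)
print(np.abs(approx - target).max())
```

In F-GCN the coefficients and frequency are trained by backpropagation rather than set by hand, which is what lets the module fit diverse periodicities.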
The residual structure [34] is employed to learn the dynamic temporal volatility. The advantage of this structure is that the volatility in the original graph can be introduced into the downstream learning module, as in Equation (17).
where X res G ∈ R F×N×T .

Spatial-Temporal ChebyNet Layer
A stackable Spatial-Temporal ChebyNet (STCN) layer is proposed to capture the dynamic temporal volatility. This layer includes two main components: a Fine-grained Volatility Module and a Temporal Volatility Module.

A. Fine-grained Volatility Module
As shown in Figure 1d, the volatility of the traffic flow is generally irregular and complicated. A Fine-grained Volatility Module is proposed in this work to represent fine-grained volatility features for capturing complex volatility. Specifically, convolution operations with various kernel sizes are employed to capture the fine-grained volatilities, as shown in Equation (18). A gate mechanism is also introduced to automatically control the impact of high volatility on the downstream networks, as shown in Equation (19). Finally, the results of the multiple gates are concatenated, as shown in Equation (20).
where ∗ denotes the convolution operator, W_i represents the multiple convolution kernels, b_i ∈ R^{C_i×N×T} are learnable parameters, c_i is the number of channels, and the output is X_{C_i} ∈ R^{C_i×N×T}. The sigmoid function σ(.) is utilized as a gate, ⊙ represents the Hadamard product, X_g ∈ R^{C×N×T}, and C = Σ_i c_i.
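The steps above — convolutions with several kernel sizes, a sigmoid gate applied via Hadamard product, and channel-wise concatenation — can be sketched as follows. A minimal NumPy forward pass with random stand-in kernels; the paper's exact channel counts are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d_same(x, w):
    """'Same'-padded 1-D convolution along the time axis, per node."""
    pad = len(w) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    return np.stack([np.convolve(xp[n], w, mode="valid")[:x.shape[1]]
                     for n in range(x.shape[0])])

def fine_grained_volatility(x, kernels):
    """Fine-grained Volatility Module sketch: convolve with several kernel
    sizes, gate each output with a sigmoid (Hadamard product), concatenate."""
    outs = []
    for w in kernels:
        c = conv1d_same(x, w)          # one channel per kernel size here
        outs.append(sigmoid(c) * c)    # gate damps high-volatility responses
    return np.stack(outs)              # (C, N, T)

x = rng.normal(size=(4, 24))                       # (N nodes, T steps)
kernels = [rng.normal(size=k) * 0.1 for k in (3, 5, 7)]
print(fine_grained_volatility(x, kernels).shape)   # (3, 4, 24)
```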

B. Temporal Volatility Module
Another difficulty in analyzing traffic flow is capturing the dynamic temporal volatility from massive traffic data. Inspired by the Transformer [35,36], which introduces the potential semantics of context for natural language translation, a Slice Attention mechanism is proposed to capture the dynamic temporal volatility, as shown in Figure 3. The ChebyNet is employed to merge the interrelations between several time slices of a traffic flow graph and learn the dynamic temporal volatility. In this work, self-attention is employed to capture the influence of each time slice itself. S_att^i represents the attention to time slice i, and the average value S_att is applied to dynamically adjust the Laplacian matrix L. The matrix is generated in Equations (21)-(24).
Further, the ChebyNet with the dynamic Laplacian matrix L is employed to learn the dynamic volatility. A K-order ChebyNet operator is calculated in Equation (25).
where * represents the graph convolution operation and θ_i is a set of learnable parameters. Finally, a temporal-convolution module with temporal attention is proposed in this work to learn the dynamic temporal volatility, which is processed in Equations (26)-(28).
X_g = Softmax(H) X_g    (27)

X_temp = b_t + W_t ∗ X_g    (28)

where σ(.) is the sigmoid function, ∗ denotes the convolution operator, W_t is the temporal convolution kernel, b_t are learnable parameters, and X_temp ∈ R^{C×N×T}.
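The K-order ChebyNet operator of Equation (25) can be sketched with the standard Chebyshev recurrence T_k(L̃) = 2 L̃ T_{k−1}(L̃) − T_{k−2}(L̃). A minimal NumPy version with scalar θ_k (the paper uses learnable weight matrices, and its Laplacian is the dynamically adjusted one):

```python
import numpy as np

rng = np.random.default_rng(0)

def cheby_conv(x, L, K, theta):
    """K-order Chebyshev graph convolution: sum_k theta_k * T_k(L_tilde) @ x,
    where L_tilde rescales the Laplacian's eigenvalues into [-1, 1]."""
    n = L.shape[0]
    lam_max = np.linalg.eigvalsh(L).max()
    L_t = 2.0 * L / lam_max - np.eye(n)
    t_prev, t_curr = np.eye(n), L_t
    out = theta[0] * (t_prev @ x)
    if K > 1:
        out += theta[1] * (t_curr @ x)
    for k in range(2, K):
        t_next = 2.0 * L_t @ t_curr - t_prev   # Chebyshev recurrence
        out += theta[k] * (t_next @ x)
        t_prev, t_curr = t_curr, t_next
    return out

A = np.array([[0, 1, 0, 1], [1, 0, 1, 0],
              [0, 1, 0, 1], [1, 0, 1, 0]], float)   # 4-node ring
L = np.diag(A.sum(1)) - A                           # combinatorial Laplacian
x = rng.normal(size=(4, 2))                         # 2-channel node signal
theta = rng.normal(size=3) * 0.5
print(cheby_conv(x, L, K=3, theta=theta).shape)     # (4, 2)
```

Each extra order lets a node aggregate information from one hop further away, which is why K trades accuracy against training time (see Table 2).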

Fusion & Loss Function
To enhance the model's flexibility, X_temp is further aggregated with Equation (29), and the Mean Square Error (MSE) is employed as the loss function in Equation (30):

Ŷ = linear(X_temp)    (29)

L = (1/m) Σ_{i=1}^{m} (Y_i − Ŷ_i)²    (30)

where Ŷ ∈ R^{C×N×T}, Equation (30) is the objective formulation for optimization, and m is the number of samples. In summary, the proposed F-GCN is shown as pseudo-code in Algorithm 1.

Results and the Discussion
This section evaluates the proposed method with baselines and state-of-the-art methods on two real-world traffic datasets.

Data Description
This work employs two real traffic flow datasets, PeMSD4 and PeMSD8, to evaluate the performance of the proposed F-GCN model. The two datasets are subsets of the original data collected by PeMS and are used in works such as ASTGCN [11] and DGCN [3]. The original data were collected by the California Highway Performance Measurement System (PeMS) [37], containing 39,000 road sensors with a data collection interval of 30 s. The data are re-aggregated into 5-min intervals for traffic prediction. The PeMSD4 dataset contains the traffic flow data collected by 307 detection stations deployed on 29 roads in the San Francisco Bay Area between January and February 2018. Similarly, the PeMSD8 dataset was collected by 170 detection stations on eight highways in the San Bernardino area from July to August 2016.

Evaluation Metrics
The performance of the proposed method is evaluated with three indicators: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE). The three metrics are listed in Equations (31)-(33).
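Equations (31)-(33) are the standard definitions; a direct NumPy version (the small eps guard against division by zero is an added convenience, not from the paper):

```python
import numpy as np

def mae(y, y_hat):
    """Mean Absolute Error."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root Mean Square Error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat, eps=1e-8):
    """Mean Absolute Percentage Error; eps avoids division by zero flows."""
    return 100.0 * np.mean(np.abs((y - y_hat) / (y + eps)))

y = np.array([100.0, 200.0, 400.0])
y_hat = np.array([110.0, 190.0, 380.0])
print(mae(y, y_hat), rmse(y, y_hat), mape(y, y_hat))
```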

Experimental Settings
The primary processing components of the test platform are an Intel(R) Core(TM) i7-10700 CPU @ 2.90 GHz processor and an NVIDIA GeForce RTX 3080. The deep learning framework adopted in this study is PyTorch 1.9.0, and grid search is used to select hyperparameters for the proposed model. The time slices are generated with a week period, day period, and recent period; as hyperparameters of the method, these periods are set to 2 weeks, 1 day, and 2 h, and the lengths of the three kinds of slices shown in Figure 4 are 36, 24, and 24, respectively. In the training phase, the Adam optimizer with a decay rate of 0.95 is utilized to optimize the Mean Square Error (MSE) loss function. All experiments were conducted for 40 epochs with a batch size of 16, and the learning rate was 0.0005. The order of the Chebyshev polynomial was set to 3.

Baselines and State-of-the-Art Methods
The following models are introduced for performance comparison, including baselines and state-of-the-art methods.
HA: The average value of the historical traffic flow is used as a baseline for estimating traffic flow in the future time range.
ARIMA: Autoregressive Integrated Moving Average Model is utilized as a baseline of the typical statistical method in this work.This model is generally used to capture linear characteristics.
GRU: The Gated Recurrent Unit network is generally employed to learn time characteristics for traffic flow prediction for its long-term memory.
STGCN: Spatio-Temporal Graph Convolutional Network employs a first-order approximate Chebyshev graph convolution network and a 2D convolution operator to capture spatial and temporal information.
ASTGCN: Attention-Based Spatial-Temporal Graph Convolutional Network integrates a spatial-temporal attention module and the graph convolution neural network module to capture the traffic flow patterns.
STSGCN: The Spatial-Temporal Synchronous Graph Convolutional Network builds a local spatial-temporal mapping module to capture localized information and then captures more global temporal information along the time dimension.
AGCRN: The Adaptive Graph Convolutional Recurrent Network involves two adaptive modules to learn the pattern of nodes and the inter-dependencies between different traffic sequences.

Experiment Results
Two real-world datasets were used to evaluate the performance of F-GCN. Table 1 shows that the proposed F-GCN outperforms both baselines and state-of-the-art methods. The typical time-series analytical methods, including HA and ARIMA, offer poor prediction performance because they can only learn linear characteristics. GRU performs better than traditional time-series analysis methods because, as a basic deep learning method, it can capture complicated nonlinear features; still, it only considers the temporal characteristics of the road network. Compared with state-of-the-art methods, including STGCN, ASTGCN, STSGCN, and AGCRN, the proposed F-GCN significantly improves performance because it learns various periodicities and dynamic temporal volatility from the traffic flow. The experimental results show that the MAPEs of F-GCN were 13.390 and 13.166 on the PeMSD4 dataset for 30 min and 45 min traffic flow prediction, respectively. Although these two indicators were only slightly better than AGCRN's, all the other indicators were significantly improved. Therefore, the results indicate that F-GCN outperforms state-of-the-art methods on the two datasets for 15, 30, 45, and 60 min traffic flow prediction.

Performance of FE and STCN Modules
We conducted a comparative analysis to evaluate the effectiveness of the FE and STCN modules. As shown in Figure 5a-c, the black curve is the proposed F-GCN tested on the PeMSD8 dataset without the FE module (M = 0). The experiment then increases the order of the FE module and evaluates the performance for short-term and long-term predictions. The results indicate that the FE module significantly improved the prediction accuracy. Specifically, when M = 3, the performance in terms of RMSE and MAPE was the best.
As shown in Figure 5d-f, the experiments on PeMSD4 show the best performance in terms of RMSE and MAPE when M = 1. Specifically, in terms of RMSE, the setting with M = 3 tends to perform best when the forecast time T_f is less than 20 min, and the setting with M = 1 performs best when the prediction time is greater than 20 min. Generally, M can be set according to the requirements of different application scenarios; in this work, the parameter of the FE module was set to M = 1 for this dataset.
To further illustrate periodicity learning, the learning capacity of the FE module in F-GCN was evaluated on the two datasets. As shown in Figure 1, the original traffic flow fluctuates with erratic volatility, which makes it hard to extract the periodicity. The FE module is proposed to learn this periodicity from massive traffic flow data. After the original traffic flow data were input into the FE module, the results were visualized as heatmaps, as shown in Figure 6, in which time ranges of 84 prediction data points are displayed for PeMSD8 and PeMSD4.
As shown in Figure 6a,b from PeMSD8 and Figure 6c,d from PeMSD4, the output of the FE module in F-GCN presents regular periodicity. According to the colors of the heatmap, the traffic flow of several roads in red box 1 in Figure 6a follows the same pattern because the colors are similar, but there is a significant difference between red box 1 and red box 2, indicating that the FE module can capture the differential characteristics of different roads. In black box 3, the periodicity of the roads differs between periods 36-48 and 48-60, indicating that the FE module can capture the various periodicities of the roads. This periodic fluctuation demonstrates the effectiveness of the FE module. These periodicity embeddings are then transmitted to the downstream network for volatility learning.
Besides periodicity, F-GCN aims to capture volatility for traffic flow prediction from massive data.The predicted results were checked against actual traffic flow to analyze the prediction performance for volatility.
The comparison between actual traffic flow and its prediction shows that the F-GCN model could effectively capture and predict the volatility of traffic flow data.The experiment selected two typical scenes of traffic flow and visualized the comparison as shown in Figure 7.The red circles in Figure 7a,b indicate that the volatility of traffic flow is sharp, and the green circles in Figure 7c,d show a relatively flat fluctuation.The prediction from F-GCN shows good fitting in these two typical situations, which indicates that the method efficiently learns and predicts dynamic temporal volatility from the massive traffic flow.
Finally, the efficiency and accuracy of F-GCN were further evaluated with different orders of the Chebyshev polynomial. It is noteworthy that the order of the Chebyshev polynomial differs from the order of the Fourier series polynomial. As listed in Table 2, training time grows as the polynomial order increases; however, the overall prediction accuracy of the model was best when the order of the Chebyshev polynomial equaled 3.
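Since Table 2 varies the Chebyshev order, a short numpy sketch of the standard ChebyNet filter that the STCN layer builds on may help. The graph, signal, and coefficients below are illustrative, and this is a sketch of the generic technique, not the paper's full implementation:

```python
import numpy as np

def scaled_laplacian(A):
    """L~ = (2 / lambda_max) * L - I, with L = D - A, per the paper's notation."""
    D = np.diag(A.sum(axis=1))
    L = D - A
    lam_max = np.linalg.eigvalsh(L).max()
    return (2.0 / lam_max) * L - np.eye(A.shape[0])

def cheb_filter(A, x, theta):
    """y = sum_k theta_k * T_k(L~) x, using the Chebyshev recurrence
    T_0 = I, T_1 = L~, T_k = 2 L~ T_{k-1} - T_{k-2}."""
    Lt = scaled_laplacian(A)
    Tx = [x, Lt @ x]
    for _ in range(2, len(theta)):
        Tx.append(2 * Lt @ Tx[-1] - Tx[-2])
    return sum(th * t for th, t in zip(theta, Tx))

# Tiny 4-node path graph with one scalar signal per node (illustrative).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 0.0, 0.0, 0.0])
y = cheb_filter(A, x, theta=[0.5, 0.3, 0.2, 0.1])  # order 3: four coefficients
print(y.shape)
```

An order-K filter uses K + 1 coefficients and aggregates information from up to K-hop neighborhoods, which is why training cost in Table 2 grows with the order while accuracy saturates around K = 3.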

Conclusions
In this work, a Fourier Graph Convolution Network (F-GCN) model is proposed to improve traffic flow prediction, which consists of a Fourier Embedding (FE) module and a stackable Spatial-Temporal ChebyNet (STCN) layer. The FE module was developed to learn periodicity embeddings, and the stackable STCN module was integrated to learn the dynamic temporal volatility from massive traffic flow data. Extensive experiments for 15, 30, 45, and 60 min traffic flow prediction were conducted on two real-world datasets, and the results indicate that the proposed F-GCN outperformed state-of-the-art methods significantly. Furthermore, the FE and STCN modules could be integrated with other deep-learning models to improve time-series analysis and prediction accuracy.

Notation
V: The set of nodes in a graph.
E: The set of edges in a graph.
A: The adjacency matrix of the graph.
D: The degree matrix of A, D = ∑_{j=0}^{N} A_{i,j}.
L: The Laplacian matrix, L = D − A.
e_{i,j}: The edge between node i and node j.
X_G ∈ R^{F×N×T}: The spatial-temporal graph by Data Construction (the spatial-temporal graph in the case of d = 1).
X_G ∈ R^{F×N×T}: The output of the FE module.
The output of the Fine-grained Volatility Module.
X_temp ∈ R^{C×N×T}: The output of the Temporal Volatility Module.
F: The number of original characteristics.
N: The number of nodes of the graph.
T: The number of time slices of the graph.
d: The length of the vector embedding.
T_h, T_f: The lengths of the historical data and the prediction data.
M: The order of the Fourier polynomial in the FE module.

Figure 1 .
Figure 1. Traffic flow data of sample detectors [11] over one week. (a) Original data of three detectors. (b-d) Three components (trend, periodicity, and volatility) decomposed from the data of one sample. Trend and periodicity are marked with red dotted lines, respectively.

Figure 2 .
Figure 2. The Gibbs phenomenon in the truncated Fourier series representation of the sawtooth wave function. N represents the number of harmonics.
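The overshoot shown in Figure 2 can be reproduced numerically. This sketch sums the truncated Fourier series of the sawtooth f(t) = t on (−π, π) and measures the peak overshoot, which remains close to 9% of the jump height as N grows (the Gibbs phenomenon); the grid resolution and the choice of N values are illustrative.

```python
import numpy as np

def sawtooth_partial_sum(t, N):
    """Truncated Fourier series of the sawtooth f(t) = t on (-pi, pi):
    S_N(t) = 2 * sum_{n=1}^{N} (-1)**(n + 1) * sin(n * t) / n."""
    s = np.zeros_like(t)
    for n in range(1, N + 1):
        s += 2 * (-1) ** (n + 1) * np.sin(n * t) / n
    return s

t = np.linspace(-np.pi, np.pi, 20001)
for N in (10, 100, 1000):
    overshoot = sawtooth_partial_sum(t, N).max() - np.pi
    print(N, round(100 * overshoot / (2 * np.pi), 2))  # percent of the jump height
```

Increasing N narrows the overshoot region near the discontinuity but does not shrink its height, which is why a naive truncated Fourier representation struggles with sharp transitions in traffic flow.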

Figure 3 .
Figure 3. Framework of the proposed F-GCN. F-GCN contains the data construction module, the Fourier Embedding module, a STCN (stackable Spatial-Temporal ChebyNet) layer including an FVM (Fine-grained Volatility Module) and a TVM (Temporal Volatility Module), and a loss function calculation module.

3.2.1. Data Construction
A flowchart of the data construction module is shown in Figure 4. As the system's input, the traffic flow data are utilized to generate the Laplacian matrix and output a feature vector. Specifically, the actual traffic flow data produce a graph G, in which nodes represent sensors on the road network and edges are connections among sensors. According to G, the Laplacian matrix L = D − A ∈ R^{N×N} is calculated, where D = ∑_{j=0}^{N} A_{i,j} ∈ R^{N×N} represents the degree matrix, and A is the adjacency matrix. In this work, three periods of the traffic network are considered: week-period X_G^{T_w} = [X_G^{T_wn}, ..., X_G^{T_w2}, X_G^{T_w1}] ∈ R^{F×N×|T_w|}, day-period X_G^{T_d} = [X_G^{T_dn}, ..., X_G^{T_d2}, X_G^{T_d1}] ∈ R^{F×N×|T_d|}, and recent-period X_G^{T_r} = [X_G^{T_rn}, ..., X_G^{T_r2}, X_G^{T_r1}] ∈ R^{F×N×|T_r|}. Finally, the three periods are concatenated into X_G = [X_G^{T_w}, X_G^{T_d}, X_G^{T_r}] ∈ R^{F×N×T}, where [ ] denotes the concatenation operator, and |T_w| + |T_d| + |T_r| = T.
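The week/day/recent construction can be sketched as a simple slicing routine. The window lengths, the five-minute granularity, and the (T, N, F) axis order used here are illustrative assumptions for the sketch, not the paper's exact configuration:

```python
import numpy as np

def build_periods(series, target, steps_per_day=288, steps_per_week=2016,
                  n_recent=12, n_day=12, n_week=12):
    """Gather recent-, day-, and week-period windows that end one step,
    one day, and one week before the target index, then concatenate them
    along the time axis (window lengths are illustrative)."""
    recent = series[target - n_recent:target]
    day = series[target - steps_per_day - n_day:target - steps_per_day]
    week = series[target - steps_per_week - n_week:target - steps_per_week]
    return np.concatenate([week, day, recent], axis=0)

# Toy data with T time slices, N nodes, and F features per node.
T, N, F = 3000, 5, 1
series = np.arange(T * N * F, dtype=float).reshape(T, N, F)
x = build_periods(series, target=2500)
print(x.shape)  # (36, 5, 1): |T_w| + |T_d| + |T_r| time slices
```

Concatenating the three windows gives the model aligned views of the same road network at weekly, daily, and immediate time scales, instead of feeding it the entire history.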

Figure 4 .
Figure 4. Flowchart of the data construction module. The road network is represented as a graph, and the Laplacian matrix is calculated for the graph. Instead of considering all the graphs in the time dimension, predicting a target graph is based on three types of periodic data: week-period, day-period, and recent-period.

Q_i, K_i, and V_i ∈ R^{C×N×N} represent the Query, Key, and Value, respectively; d_k represents the dimension of the Key, and S_i^{att} ∈ R^{C×N×N}. L~ = (2/λ_max)L − I_N ∈ R^{N×N}, where λ_max represents the maximum eigenvalue of L; I_N ∈ R^{N×N} is an identity matrix, and L~ ∈ R^{C×N×N}.
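The Query/Key/Value notation above can be grounded with a small sketch of standard scaled dot-product attention. The tensor sizes are illustrative, and this is the generic mechanism rather than the paper's exact module:

```python
import numpy as np

def scaled_dot_attention(Q, K, V):
    """S_att = softmax(Q K^T / sqrt(d_k)) V, computed per channel
    with a numerically stable softmax over the last axis."""
    d_k = K.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
C, N = 2, 4                        # channels and nodes (illustrative sizes)
Q, K, V = (rng.normal(size=(C, N, N)) for _ in range(3))
out, w = scaled_dot_attention(Q, K, V)
print(out.shape)  # (2, 4, 4); each row of w sums to 1
```

Dividing the scores by sqrt(d_k) keeps the softmax away from its saturated regime, so the attention weights stay informative as the Key dimension grows.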

Figure 5 .
Figure 5. Performance of the Fourier Embedding module with different orders (M) on three indicators (MAE, RMSE, and MAPE). T_f represents the prediction horizon. (a-c) Testing on the PeMSD8 dataset; (d-f) testing on the PeMSD4 dataset.

Figure 6 .
Figure 6. Heatmap of the output of the FE module on two datasets: (a,b) from the PeMSD8 dataset, and (c,d) from the PeMSD4 dataset. ① and ② represent different patterns for different roads, and ③ shows different patterns for all roads at different periods.


Figure 7 .
Figure 7. Comparison between actual traffic flow volatility (Ground Truth) and predicted results (F-GCN) of one-day data captured by detectors on four roads with node IDs (a) 4, (b) 4, (c) 5, and (d) 9. Red and green circles represent steep and flat traffic flow fluctuations, respectively.

and b h ∈ R C×N×N are learnable parameters; H ∈ R C×N×N , and X g ∈ R C×N×T .

Table 1 .
Performance of the proposed model and baselines on three indicators (the best performance is highlighted in bold).

Table 2 .
Time and efficiency for different orders of Chebyshev polynomial.
