Expressway Speed Prediction Based on Electronic Toll Collection Data

Abstract: Expressway section speed directly reflects the operating condition of a section, and accurate short-term section speed prediction has a wide range of applications in path planning and traffic guidance. However, the data used by existing expressway speed prediction methods have defects such as sparse density and incomplete coverage of vehicles. Thus, this paper proposes a combined expressway traffic speed prediction framework based on wavelet transform and a spatial-temporal graph convolutional network (WSTGCN), built on Electronic Toll Collection (ETC) gantry transaction data. First, the framework pre-processes the ETC gantry transaction data to construct the section speeds. Then, wavelet decomposition and single-branch reconstruction are performed on the section speed sequences; for each reconstructed single-branch sequence, the spatial features are captured by a graph convolutional network (GCN) and the temporal features are extracted by a connected gated recurrent unit (GRU). The experiments use the ETC gantry transaction data of the expressway from Quanzhou to Xiamen. The results indicate that the WSTGCN model makes notable improvements over the baseline models for different prediction horizons.


Introduction
With its high capacity and low time cost, the expressway has become the preferred way to travel between cities [1]. Due to social and economic development, traditional traffic management techniques struggle to cope with the increasing traffic pressure, and there is an urgent need to develop Intelligent Transportation Systems (ITS) for expressways. With the accurate prediction of traffic information from ITS, travelers can develop reasonable travel routes before departure and improve the efficiency of travel, and road management departments can effectively conduct traffic guidance and alleviate traffic congestion and other problems based on reliable road traffic information [2]. In recent years, China's expressway ETC system has realized the networking of 29 provinces nationwide, built a total of 24,588 sets of ETC gantry systems, renovated 48,211 ETC lanes, and averaged nearly one billion ETC gantry transaction data per day [3], which has further improved the efficiency of expressways. The transaction data collected by the ETC gantry system can record the travel information of almost every vehicle on the expressway, and compared with detector data [4,5] and floating car data [6,7], the ETC gantry transaction data are more comprehensive and reliable, covering the expressway road network. Therefore, to further improve the service quality of the expressway ETC system, it is of great theoretical significance and practical value to research the traffic speed prediction based on ETC gantry transaction data [8].
• A data pre-processing method on ETC gantry transaction data is designed. The fusion of expressway network topology data, ETC and manual toll collection (MTC) transaction data constitutes spatio-temporal origin-destination (OD) data. Anomaly cleaning, missing repair and vehicle travel time statistics are performed on OD data, and a vehicle travel time outlier detection algorithm is proposed to eliminate outlier samples. In this way, the speed of the expressway section is constructed.
• The proposed WSTGCN model consists of wavelet transform, GCN and GRU. It reduces the disturbance in the section speed and also captures the spatio-temporal correlation of the section speed.
• The proposed WSTGCN model is evaluated on the ETC gantry transaction data of the Quanzhou-Xiamen Expressway. The results show that the model predicts section speed better than the baseline methods. Furthermore, its accuracy remains higher than that of the baseline prediction models over different prediction horizons.
The rest of the paper is organized as follows. The "Preliminary" section introduces the concepts related to expressways and describes the problem in detail. The "Methodology" section describes the model construction for expressway speed prediction. In the "Experimental Results and Analysis" section, the WSTGCN model is evaluated using ETC gantry transaction data from Fujian Province. Finally, the "Conclusions" section concludes the paper.

Related Concepts
Definition 1. Each ETC gantry of the expressway is called a Node, and two adjacent Nodes on the road compose an expressway section, which is denoted as $QD = \{Q, Distance\}$ with $Q = \{Node_1, Node_2\}$, where $Node_1$ is the start of the section, $Node_2$ is the end of the section, and $Distance$ is the actual distance of the section.

Definition 2. Expressway road network: all $QD$ within the research area form the expressway road network, denoted as $LW = \{QD_1, \cdots, QD_n\}$.

Definition 3. Vehicle trajectory: the sequence of Nodes, arranged in chronological order, that a vehicle passes on the expressway is denoted as $Traj = \{Node_1, \cdots, Node_n\}$, where $Node_1$ is called the trajectory start point and $Node_n$ is called the trajectory end point.

Definition 4. The speed of a vehicle averaged over a section is called the average vehicle speed, and the calculation method is shown in Equation (1):

$$v = \frac{Distance}{t_2 - t_1} \tag{1}$$

where $t_1$ represents the time when the vehicle passes through the starting point of the section, and $t_2$ represents the time when the vehicle passes through the end point of the section.

Definition 5. The average speed of the vehicles passing through the same section in a certain period of time is called the section speed, and the calculation method is shown in Equation (2):

$$\bar{v} = \frac{1}{n}\sum_{i=1}^{n} v_i \tag{2}$$

where $v_i$ represents the average vehicle speed of the $i$th vehicle passing through the section within the period, and $n$ is the number of vehicles passing through the section within the period.
Definition 6. The time a vehicle takes to pass through a certain section is called the vehicle travel time, and the calculation method is shown in Equation (3):

$$\Delta t = t_2 - t_1 \tag{3}$$

where $t_2$ represents the time of passing the gantry at the end of the section, and $t_1$ represents the time of passing the gantry at the start of the section.
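The quantities in Definitions 4 to 6 can be illustrated with a minimal Python sketch. The function names and the choice of units (kilometres, seconds, km/h) are assumptions made for illustration and do not come from the paper.

```python
from datetime import datetime

def travel_time_s(t1: datetime, t2: datetime) -> float:
    """Definition 6: travel time = t2 - t1, returned in seconds."""
    return (t2 - t1).total_seconds()

def vehicle_speed_kmh(distance_km: float, t1: datetime, t2: datetime) -> float:
    """Average speed of one vehicle over a section: distance / travel time."""
    hours = travel_time_s(t1, t2) / 3600.0
    if hours <= 0:
        raise ValueError("t2 must be later than t1")
    return distance_km / hours

def section_speed_kmh(vehicle_speeds: list[float]) -> float:
    """Definition 5: arithmetic mean of the vehicle speeds in one time window."""
    if not vehicle_speeds:
        raise ValueError("no vehicles observed in this window")
    return sum(vehicle_speeds) / len(vehicle_speeds)

# A vehicle covering a 5 km section in 3 minutes.
t1 = datetime(2020, 6, 1, 8, 0, 0)
t2 = datetime(2020, 6, 1, 8, 3, 0)
v = vehicle_speed_kmh(5.0, t1, t2)
window_speed = section_speed_kmh([90.0, 100.0, 110.0])
```

Here `v` is approximately 100 km/h, and `window_speed` is the mean of the three vehicle speeds in the window.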

Problem Description
The expressway road network can be abstracted as a graph. Generally, the unweighted graph $G = (\phi, E)$ can be used to represent the topology of the expressway road network, where $\phi = \{\phi_1, \phi_2, \cdots, \phi_N\}$ is the set of all Nodes on the expressway road network and $N$ is the number of Nodes. $E$ represents the set of edges connecting the Nodes. All the connection information between Nodes is stored in the adjacency matrix $A \in \mathbb{R}^{(N-1)\times(N-1)}$, which contains only the elements 0 and 1, where 0 indicates that there is no connection between Nodes and 1 indicates that there is a connection. The section speed can be regarded as the attribute feature of the Nodes of the expressway road network and is represented by the feature matrix $X \in \mathbb{R}^{(N-1)\times P}$, where $P$ is the number of attribute features of a Node, i.e., the length of the historical time series. $X_t \in \mathbb{R}^{(N-1)\times i}$ represents the section speeds of all sections at time step $i$.
Therefore, the problem of expressway speed prediction is to learn a mapping function $F$ that, given the topology graph $G$ of the expressway road network and the feature matrix $X$ of past section speeds, predicts the section speed for the next $T$ time steps:

$$[X_{t+1}, \cdots, X_{t+T}] = F\big(G; (X_{t-n}, \cdots, X_{t-1}, X_t)\big)$$

where $t$ is the time interval, $n$ is the length of the historical time series, and $T$ is the length of the time series to be forecasted.

Overview of the Overall Framework
This paper predicts expressway section speed with the WSTGCN model. The framework is divided into four modules: the expressway road network spatio-temporal OD data construction module, the OD data pre-processing module, the spatio-temporal feature extraction module, and the output module. Figure 1 shows the overall framework of expressway speed prediction. The OD data construction module takes expressway road network topology data, ETC transaction data and MTC transaction data as input. Guided by the topology data, the ETC and MTC transaction records are grouped by OBU Plate and Flag ID and then sorted by Trade time to form the spatio-temporal OD data set. The data pre-processing module includes a data interpolation module, a vehicle travel time construction module, a travel time abnormality detection module, and a section speed generation module. After the missing data of some trajectories are interpolated, the vehicle travel time on each section is constructed, anomalies are detected using the vehicle travel time outlier detection algorithm, and finally the section speed data set is constructed. The spatio-temporal feature extraction module includes wavelet transform, GCN, and GRU. Multi-scale wavelet decomposition is applied to the section speed time series, and each component is reconstructed separately to obtain single-branch section speed series; GCN is then used to capture the spatial features of the section speed, and GRU is used to capture its temporal features. Finally, the output module sums the predicted values of the reconstructed single-branch series to obtain the overall speed prediction, which accounts for the spatio-temporal characteristics.

Data Pre-Processing
Vehicles enter the expressway through the ETC channel and MTC channel of the expressway toll station, and the expressway ETC gantry system can record the driving information of vehicles entering from the ETC channel and MTC channel at the same time. Therefore, ETC gantry transaction data are more complete. ETC gantry transaction data includes ETC transaction data and MTC transaction data. According to the ETC gantry transaction data statistics of this experiment, the percentages of ETC transaction data and MTC transaction data are shown in Figure 2.

Raw Data Cleaning
In the process of ETC gantry transaction data collection, the following three main abnormal problems exist in the collected ETC gantry transaction data, due to the influence of factors beyond control such as equipment abnormalities, wireless crosstalk, and bad weather, as shown in Figure 3. (1) Data redundancy. Duplication between multiple sets of data. (2) Missing data. The problem of data not being collected effectively occurs. For example, fields such as date, time, and vehicle type are missing at the entrance and exit station. (3) Data errors. Data records that do not match the normal traffic rules, such as the date of the entrance station being later than the date of the exit station, and the wrong entrance and exit numbers, which cannot correspond to the actual toll stations. These abnormal data greatly reduce the value of ETC big data-mining applications. To reduce the impact of erroneous data on the accuracy of the established prediction model and increase the reliability of prediction, such data will be removed.
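The three cleaning rules above can be sketched in a few lines of Python. The record layout (a dict per transaction) and the field names `obu_plate`, `gantry_id`, `trade_time`, `entry_time` and `exit_time` are hypothetical stand-ins for the real attributes; the paper does not specify its implementation.

```python
from datetime import datetime

# Hypothetical field names; the real data set has 103 attributes.
REQUIRED = ("obu_plate", "gantry_id", "trade_time")

def clean_records(records, known_gantries):
    """Drop redundant, incomplete and inconsistent gantry transactions."""
    seen = set()
    cleaned = []
    for rec in records:
        # (2) Missing data: every required field must be present and non-empty.
        if any(not rec.get(field) for field in REQUIRED):
            continue
        # (3) Data errors: the gantry must exist in the road network topology.
        if rec["gantry_id"] not in known_gantries:
            continue
        # (3) Data errors: entry time must not be later than exit time.
        entry, exit_ = rec.get("entry_time"), rec.get("exit_time")
        if entry is not None and exit_ is not None and entry > exit_:
            continue
        # (1) Data redundancy: keep only the first copy of identical records.
        key = (rec["obu_plate"], rec["gantry_id"], rec["trade_time"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned

t = datetime(2020, 6, 1, 8, 0)
r1 = {"obu_plate": "A1", "gantry_id": "G1", "trade_time": t}
r2 = dict(r1)                                                    # duplicate
r3 = {"obu_plate": "", "gantry_id": "G1", "trade_time": t}       # missing plate
r4 = {"obu_plate": "B2", "gantry_id": "G9", "trade_time": t}     # unknown gantry
r5 = {"obu_plate": "C3", "gantry_id": "G2", "trade_time": t,
      "entry_time": datetime(2020, 6, 2), "exit_time": datetime(2020, 6, 1)}
r6 = {"obu_plate": "D4", "gantry_id": "G2", "trade_time": t}
cleaned = clean_records([r1, r2, r3, r4, r5, r6], known_gantries={"G1", "G2"})
```

Only `r1` and `r6` survive in this example; the other four records each trigger one of the rules.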

Vehicle Travel Time Construction
After the abnormal records are eliminated from the ETC gantry transaction data, the travel trajectory of each vehicle is constructed in chronological order. Using the ETC gantry topology data of the expressway road network, each vehicle's trajectory is traversed pair by pair: for every two consecutive ETC gantries in the trajectory, the algorithm checks whether the pair exists as adjacent gantries in the topology data. If it exists, the travel time of the vehicle through the section is calculated directly. If it does not exist, the road sections between these two gantries are searched and the driving trajectory of the vehicle is interpolated: the average speed of the vehicle over the whole searched stretch is calculated and taken as its average speed on every section between the two gantries, and the travel time of each section is then derived from the distance between its two adjacent gantries. The specific construction method is shown in Algorithm 1.
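The construction procedure described above can be sketched as follows. This is an illustrative sketch rather than the paper's Algorithm 1: it assumes a single chain of gantries in one driving direction, times in seconds, and hypothetical argument names. When gantries are missing between two observations, the elapsed time is shared among the intermediate sections in proportion to their length, which is equivalent to assuming one constant average speed over the gap.

```python
def section_travel_times(trajectory, route, distance_km):
    """Derive per-section travel times from one vehicle trajectory.

    trajectory:  [(gantry_id, time_s), ...] sorted by time.
    route:       gantry IDs in driving order (assumed to be a simple chain).
    distance_km: {(g_a, g_b): km} for every pair of adjacent gantries.
    Returns [((g_a, g_b), travel_time_s), ...].
    """
    position = {g: i for i, g in enumerate(route)}
    result = []
    for (g1, t1), (g2, t2) in zip(trajectory, trajectory[1:]):
        i, j = position[g1], position[g2]
        if j <= i or t2 <= t1:
            continue  # not a forward movement on this route; skip the pair
        sections = [(route[k], route[k + 1]) for k in range(i, j)]
        if len(sections) == 1:
            # Adjacent gantries: the travel time is observed directly.
            result.append((sections[0], t2 - t1))
            continue
        # Gap in the trajectory: interpolate with a constant average speed,
        # i.e. each section gets a time share proportional to its distance.
        total_km = sum(distance_km[s] for s in sections)
        for s in sections:
            result.append((s, (t2 - t1) * distance_km[s] / total_km))
    return result

route = ["A", "B", "C", "D"]
distance_km = {("A", "B"): 2.0, ("B", "C"): 3.0, ("C", "D"): 5.0}
# The vehicle was not recorded at gantry C.
trajectory = [("A", 0), ("B", 60), ("D", 300)]
times = section_travel_times(trajectory, route, distance_km)
```

In this example the section A to B keeps its observed 60 s, while the 240 s between B and D are split into 90 s and 150 s for the 3 km and 5 km sections.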

Vehicle Travel Time Outlier Detection Algorithm
After the vehicle travel time data are constructed, the remaining records are valid in the sense that they reflect trips that actually occurred. However, some of them reflect abnormal driving behavior, for example a travel time that is much longer or much shorter than normal for vehicles of the same type. Therefore, a vehicle travel time outlier detection algorithm is constructed to further reject such data. This algorithm combines the outlier information detection algorithm in [22] with the outlier elimination algorithm in [23]. In the expressway ETC gantry transaction data, there are only very few cases where the vehicle travel time is shorter than the normal value; most of the outliers are long vehicle travel times, resulting in an asymmetric error interval. If only the outlier information detection algorithm is used, the 75% quantile value is relatively high while the 25% quantile value is close to the sample mean, so data whose vehicle travel time is much lower than the sample mean cannot be eliminated. Combining the two methods solves this problem. The basic idea of the vehicle travel time outlier detection algorithm is to use both the upper and lower limits of the box plot and the centroid threshold of the statistical distribution of the data to determine the threshold interval for filtering abnormal travel times, and to eliminate the data outside this interval, so that abnormal data can be filtered quickly from the massive ETC gantry transaction data, as shown in Figure 4.
Here, $t_{25\%}$ is the 25th percentile of the vehicle travel time (the value that exceeds 25% of the observed travel times), $t_{75\%}$ is the 75th percentile, the mean value of the vehicle travel time is also used, and $\sigma$ is the standard deviation of the vehicle travel time. The final valid interval of the vehicle travel time is $\Delta t \in [t_{down}, t_{up}]$. If the travel time of a vehicle passing through a section in a given period lies within this interval, the average speed of that vehicle on the section is computed directly, and the section speed is generated with a statistical window of 15 min.

Figure 4. Panels: upper and lower limits are inside the normal distribution; (c) lower limit is outside the normal distribution and upper limit is inside the normal distribution; (d) lower limit is inside the normal distribution and upper limit is outside the normal distribution.
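A minimal sketch of this kind of combined filter is given below. The exact threshold formulas are not recoverable from the text, so the sketch makes loud assumptions: the box-plot limits are taken as the usual $t_{25\%} - 1.5\,IQR$ and $t_{75\%} + 1.5\,IQR$, the distribution limits as mean $\pm\, k\sigma$ with $k = 2$, and on each side the tighter of the two limits is used. The names `k_iqr` and `k_sigma` are hypothetical parameters.

```python
import statistics

def travel_time_bounds(times, k_iqr=1.5, k_sigma=2.0):
    """Return (t_down, t_up) from box-plot fences and a mean +/- k*sigma band.

    Assumption (not from the paper): the tighter limit on each side is used.
    """
    q1, _, q3 = statistics.quantiles(times, n=4)   # 25%, 50%, 75% cut points
    iqr = q3 - q1
    mean = statistics.mean(times)
    sigma = statistics.pstdev(times)
    t_down = max(q1 - k_iqr * iqr, mean - k_sigma * sigma, 0.0)
    t_up = min(q3 + k_iqr * iqr, mean + k_sigma * sigma)
    return t_down, t_up

def filter_travel_times(times, **kwargs):
    """Keep only the travel times inside the valid interval [t_down, t_up]."""
    t_down, t_up = travel_time_bounds(times, **kwargs)
    return [t for t in times if t_down <= t <= t_up]

# Travel times in seconds for one section and one 15-min window:
# 600 s looks like a service-area stop, 5 s like a recording error.
times = [100, 102, 98, 101, 99, 103, 97, 100, 600, 5]
bounds = travel_time_bounds(times)
kept = filter_travel_times(times)
```

With these sample values the long outlier inflates the mean and standard deviation so much that the $\sigma$ band alone would not reject it, whereas the box-plot fences do; this is the kind of asymmetry that motivates combining the two criteria.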

Spatio-Temporal Feature Extraction
Based on the GCN-GRU model, wavelet transform is used to capture the spatio-temporal trend of expressway traffic speed by decomposing and reconstructing the expressway traffic speed. The structure of the prediction model is shown in Figure 5, which contains three parts: (a) wavelet transform, (b) GCN, and (c) GRU.

Wavelet Transform
The section speeds derived from the expressway ETC gantry transaction data contain a lot of noise due to their periodic volatility, and noisy data severely degrade speed prediction. In real traffic analysis, the real speed signal is usually a low-frequency or relatively stable signal, while the noise is mostly high-frequency [24]. Therefore, wavelet transform theory is used to filter the noise out of the calculated section speed signals and obtain relatively accurate section speed data. To separate the low-frequency part and the high-frequency part of the original signal, Mallat proposed a multiscale decomposition and reconstruction algorithm, the principle of which is shown in Figure 6.
In the formula, $t$ is the index of the time series data, $t = 1, 2, \cdots, n$; $f(t)$ is the original signal; $j$ is the number of decomposition layers; $H$ and $G$ are the wavelet decomposition filters in the time domain; $A_j$ is the wavelet coefficient of the low-frequency part of the signal $f(t)$ in the $j$th layer; and $D_j$ is the wavelet coefficient of the high-frequency part of the signal $f(t)$ in the $j$th layer. The decomposed signal can be reconstructed using Equation (9).
In expressway speed prediction, the original section speed data form a non-stationary time series. Among the many wavelet functions, the sym wavelets are nearly symmetric, approximately linear-phase orthogonal wavelets. They are smooth, simple to compute, and have achieved good results in related research [25,26]. The sym5 wavelet is one of the commonly used wavelets in the sym family, so it is chosen as the basis function in this paper. The number of decomposition layers cannot be too large or too small. If it is too large, the variation pattern and trend of the section speed series are weakened. If it is too small, the components with different frequency characteristics in the original section speed signal cannot be separated effectively. According to existing research on wavelet-based noise reduction of time series data [27], the number of decomposition layers is set to 3. The high-frequency part of the decomposed signal in each layer is processed using a threshold function. Finally, the low-frequency speed signal of the last layer is reconstructed together with the thresholded high-frequency speed signals of each layer to obtain the noise-reduced section speed data. The decomposition results are shown in Figure 7, and the specific method is described in Algorithm 2.

Algorithm 2:
    Select the sym5 wavelet as the basis function;
    j = 3  // the number of decomposition layers is specified as 3
    for i in range(j):
        ca ← ca_i + · · · + ca_j  // store the trend signal
        cd ← cd_i + · · · + cd_j  // store the noise signal
    for i in ca:
        reca ← ca_1 + · · · + ca_i  // store the reconstructed trend signal
    for i in cd:
        recd ← cd_1 + · · · + cd_i  // store the reconstructed noise signal
    v ← reca  // wrap the reconstructed trend signal into v
    return v
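The idea of multi-level decomposition followed by single-branch reconstruction can be illustrated with a small self-contained sketch. To keep it dependency-free, the sketch swaps in the Haar wavelet instead of the sym5 wavelet used in the paper, and it requires the series length to be divisible by $2^j$; both are simplifications made here, not part of the method. Each branch is obtained by keeping one coefficient set, zeroing all others, and inverting the transform.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(x):
    """One decomposition level: low-pass (trend) and high-pass (detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """Invert one level, interleaving the reconstructed samples."""
    x = []
    for a, d in zip(approx, detail):
        x += [(a + d) / SQRT2, (a - d) / SQRT2]
    return x

def wavedec(x, levels):
    """Return [cA_J, cD_J, ..., cD_1] for a J-level Haar decomposition."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]

def waverec(coeffs):
    """Rebuild a signal from [cA_J, cD_J, ..., cD_1]."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = haar_inverse_step(approx, detail)
    return approx

def single_branch(coeffs):
    """Reconstruct every coefficient set on its own, zeroing the others."""
    branches = []
    for keep in range(len(coeffs)):
        masked = [c if i == keep else [0.0] * len(c)
                  for i, c in enumerate(coeffs)]
        branches.append(waverec(masked))
    return branches

speeds = [92.0, 95.0, 91.0, 60.0, 58.0, 88.0, 94.0, 93.0]
branches = single_branch(wavedec(speeds, levels=3))
recombined = [sum(values) for values in zip(*branches)]
```

Because the transform is linear, the four branches (one trend branch and three detail branches) sum back to the original series, which is why the predictions of the individual branches can be added to obtain the final speed prediction.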

Graph Convolutional Networks (GCN)
The ETC gantries of the expressway have different topological relationships in different sections, and gantries with different topological relationships influence each other differently. If the topological relationships between the ETC gantries can be fully extracted and used, the speed prediction will be more accurate. An ordinary CNN can only handle regularly structured Euclidean data and cannot handle irregular non-Euclidean data. The GCN model proposed in [28] handles non-Euclidean data well. The spatial distribution of the ETC gantries of the expressway is a non-Euclidean structure, so the GCN model is used to model it. We treat the data as signals on a graph and process them in the spectral domain to capture meaningful spatial patterns and features. The connection relationships and mutual influence in the graph are represented by the Laplacian matrix of the graph, which is defined as:

$$L = D - A$$

The normalized Laplacian matrix is:

$$L = I_n - D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$$

where $I_n \in \mathbb{R}^{N \times N}$ is the identity matrix and $D$ is the degree matrix with $D_{ii} = \sum_j A_{ij}$. The eigendecomposition of $L$ gives $L = U \Lambda U^T$, where $\Lambda = \mathrm{diag}([\lambda_1, \cdots, \lambda_n])$ is the diagonal matrix composed of the eigenvalues of $L$, and $U = \{u_1, \cdots, u_N\}$ is the orthonormal matrix consisting of the eigenvectors of $L$. For an input signal $x \in \mathbb{R}^N$, the graph Fourier transform is $\hat{x} = U^T x$, and its inverse is $x = U \hat{x}$. The convolution of the kernel $g$ with the input signal $x$ can be converted into a product in the frequency domain:

$$g * x = U\big((U^T g) \odot (U^T x)\big) = U g_\Theta(\Lambda) U^T x$$

where $g_\Theta(\Lambda) = U^T g = \mathrm{diag}(\Theta)$, $\odot$ represents the Hadamard product, and $U^T g$ means mapping $g$ to the frequency domain based on $U$.
Due to the high computational complexity of $g_\Theta$, the hierarchical linear model constraints [29] and Chebyshev polynomials [30] are used to approximate the calculation. This paper adopts the simplified first-order polynomial form of $g * x$:

$$g * x \approx \theta\left(I_n + D^{-\frac{1}{2}} A D^{-\frac{1}{2}}\right) x$$

Replacing $I_n + D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ with $\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$, where $\tilde{A} = I_n + A$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, the output of layer $l$ is:

$$X^{l} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X^{l-1} W^{l-1}\right)$$

$A$ represents the adjacency matrix, which describes the connection relationships between expressway nodes. Each row in $A$ represents a section, and each value in $A$ represents the connection between sections. Adding the identity matrix ($\tilde{A} = A + I$) gives every node a self-connection, so that an ETC gantry retains its own feature information (section speed) when the feature information of its adjacent nodes is aggregated. $\tilde{D}$ is the degree matrix of $\tilde{A}$, and the normalization $\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ prevents the gradient from exploding or vanishing when feature information is propagated layer by layer, which would make further training impossible. $W^{l-1}$ is the weight matrix of the GCN model, and $X^{l-1}$ represents the feature matrix of the section speed, in which each row represents a different section and each column represents the section speed in the same time interval. $\sigma$ represents an activation function.
As shown in Figure 8, assuming that each node in the figure represents the expressway ETC gantry, the essence of the GCN model is actually to capture the linear combination of the adjacent node features of the gantry and its own single node features. Therefore, Node 1 can obtain the spatial characteristics of itself and surrounding nodes through the GCN model.
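The aggregation performed by one GCN layer can be sketched in plain Python as below. The sketch assumes the widely used propagation rule $\sigma(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2} X W)$ with a ReLU activation; the three-node chain, the feature values and the weight matrix are made-up illustration values.

```python
import math

def normalized_adjacency(A):
    """Compute D~^(-1/2) (A + I) D~^(-1/2) for a square 0/1 adjacency matrix."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    degree = [sum(row) for row in A_tilde]
    return [[A_tilde[i][j] / math.sqrt(degree[i] * degree[j])
             for j in range(n)] for i in range(n)]

def matmul(P, Q):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]

def gcn_layer(A_hat, X, W):
    """One GCN layer: ReLU(A_hat @ X @ W)."""
    Z = matmul(matmul(A_hat, X), W)
    return [[max(0.0, z) for z in row] for row in Z]

# Three gantries in a chain: 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
X = [[90.0], [60.0], [30.0]]   # one speed feature per node
W = [[1.0]]                    # identity weight, for illustration only
A_hat = normalized_adjacency(A)
H = gcn_layer(A_hat, X, W)
```

Each output row is a weighted combination of a node's own feature and those of its neighbors; for example, node 0 combines its own speed with that of node 1 and ignores node 2, to which it is not connected.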

Gated Recurrent Unit (GRU)
Currently, RNN models are commonly used to process sequence data. However, the traditional RNN suffers from vanishing and exploding gradients during training [31]. To solve this problem, GRU and LSTM were proposed as variants of the RNN. Compared with LSTM, the GRU model has a simpler structure, fewer parameters, and a shorter training time. Therefore, the GRU model was selected to obtain temporal features from the section speed, and its structure is shown in Figure 9. There are two gate units in the hidden layer of the GRU, the reset gate ($r_t$) and the update gate ($z_t$). $z_t$ indicates how much of the state information of the previous moment is transferred to the current state, and $r_t$ indicates how much of the state information of the previous moment is ignored. The calculation process of the GRU, written in the standard form, is as follows:

$$z_t = \sigma\left(W_z [H_{t-1}, x_t] + b_z\right) \tag{15}$$

$$r_t = \sigma\left(W_r [H_{t-1}, x_t] + b_r\right) \tag{16}$$

$$\tilde{H}_t = \tanh\left(W_h [r_t \odot H_{t-1}, x_t] + b_h\right) \tag{17}$$

$$H_t = z_t \odot H_{t-1} + (1 - z_t) \odot \tilde{H}_t \tag{18}$$

Equations (15) and (16) show how the update gate $z_t$ and the reset gate $r_t$ are computed. $W$ and $b$ represent the weights and biases of the corresponding gates, $\sigma$ represents the activation function, $x_t$ represents the section speed at the current moment, and $H_{t-1}$ represents the hidden state at time $t - 1$. Equation (17) indicates that the output of the reset gate at the current time is multiplied by the hidden state of the previous time, and the candidate hidden state is then calculated through a fully connected layer with an activation function. Equation (18) combines the update gate $z_t$ at the current time, the hidden state $H_{t-1}$ at the previous time, and the candidate hidden state $\tilde{H}_t$ at the current time in a weighted average to calculate the hidden state $H_t$.
In general, this paper feeds the $n$ historical section speed samples through the GRU model step by step: at each time $t$, the hidden state of time $t - 1$ and the current section speed are used as input to obtain the section speed at time $t$. The model thus captures the section speed at the current moment while retaining the trend of the historical traffic information, and obtains the dynamic temporal characteristics of the section speed.
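A single GRU step can be sketched in plain Python as follows. The sketch assumes the standard GRU formulation, in which the update gate weights the previous hidden state against a tanh candidate state; the parameter names and the toy sizes (one hidden unit, one input feature) are illustrative assumptions, not the paper's configuration.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, b, v):
    """Compute W @ v + b for a list-of-lists matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]

def gru_step(x_t, h_prev, params):
    """One GRU update; each weight matrix acts on the concatenation [h, x]."""
    Wz, bz, Wr, br, Wh, bh = params
    hx = h_prev + x_t                                  # [H_{t-1}, x_t]
    z = [sigmoid(v) for v in affine(Wz, bz, hx)]       # update gate
    r = [sigmoid(v) for v in affine(Wr, br, hx)]       # reset gate
    reset_hx = [ri * hi for ri, hi in zip(r, h_prev)] + x_t
    h_cand = [math.tanh(v) for v in affine(Wh, bh, reset_hx)]
    # Weighted average of the previous state and the candidate state.
    return [zi * hp + (1.0 - zi) * hc
            for zi, hp, hc in zip(z, h_prev, h_cand)]

# One hidden unit, one input feature, all-zero parameters:
# both gates equal 0.5 and the candidate state is tanh(0) = 0.
zero = ([[0.0, 0.0]], [0.0], [[0.0, 0.0]], [0.0], [[0.0, 0.0]], [0.0])
h_half = gru_step([1.0], [0.4], zero)

# A large update-gate bias keeps the previous hidden state almost unchanged.
keep = ([[0.0, 0.0]], [50.0], [[0.0, 0.0]], [0.0], [[1.0, 1.0]], [0.0])
h_keep = gru_step([1.0], [0.4], keep)
```

The two calls show the role of the update gate: with a neutral gate the new state is halfway between the old state and the candidate, and with a saturated gate the old state is carried over.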

Data Description and Pre-Processing
The experimental data are of two types. The first is the ETC gantry transaction data of the Fuzhou South to Xiamen North Expressway in Fujian Province for the 30 days from 1 to 30 June 2020, provided by Fujian Expressway Information Technology Co., Ltd. (2F, Building 1, No.27 Jinji Shan Road, Jinan District, Fuzhou City, China). Each transaction record contains 103 fields, such as transaction identifier, trade time, gantry number, OBU plate, OBU status and user type, and there are about 20.53 million records in total. The main attributes of the ETC gantry transaction data used in this paper are shown in Table 1. The second is the topological relationship data of the expressway ETC gantries: according to the longitude and latitude coordinates of the ETC gantries, the section distances are retrieved from Amap, and the resulting data include the names of the ETC gantries of each section and the actual section distance. The ETC transaction data are matched with the topological data to construct the vehicle travel time. Because vehicles may enter a service area or break down, a certain amount of abnormal data is generated. Taking 15 min as the statistical window for each section, the vehicle travel time outlier detection algorithm is used to detect outliers, eliminate abnormal data and retain correct data. The average speed of each vehicle passing through each section is calculated, and the speed data set of the expressway sections is then constructed at 15-min intervals; its main attributes are shown in Table 2. In this paper, min-max normalization is used to map the data to the [0, 1] interval, and the normalization formula is shown in Equation (19):

$$z = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{19}$$

In Equation (19), $z$ represents the normalized data, $x$ represents the original data, $x_{min}$ is the minimum value of $x$, and $x_{max}$ is the maximum value of $x$.
After the above pre-processing, section speeds are obtained for 16 sections in both directions at 15-min intervals from 00:00 to 24:00. This yields a section speed time series with 96 data samples per day and a total of 2880 data samples over the 30 days. The first 80% of the 30 days of section speed data were used as the training set and the remaining 20% as the test set. The section speeds were predicted for the next 15 min, 30 min, and 45 min.
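The data preparation steps above (min-max normalization, a chronological 80/20 split, and sliding windows of historical samples) can be sketched as follows. The function names are hypothetical, the window length of 12 historical samples is taken from the experiment description, and the order of splitting and windowing is an assumption, since the paper does not state it.

```python
def min_max(series):
    """Min-max normalization: map the values to the [0, 1] interval."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("constant series cannot be min-max normalized")
    return [(x - lo) / (hi - lo) for x in series]

def chrono_split(items, train_ratio=0.8):
    """Chronological split: the first part trains, the remainder tests."""
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]

def make_windows(series, n_hist=12, horizon=1):
    """Pair each run of n_hist samples with the next `horizon` samples."""
    inputs, targets = [], []
    for start in range(len(series) - n_hist - horizon + 1):
        inputs.append(series[start:start + n_hist])
        targets.append(series[start + n_hist:start + n_hist + horizon])
    return inputs, targets

# 30 days x 96 fifteen-minute samples; synthetic values for illustration.
series = [60.0 + (i % 96) * 0.5 for i in range(2880)]
scaled = min_max(series)
train, test = chrono_split(scaled)
# horizon=3 corresponds to predicting 45 min ahead at 15-min resolution.
train_inputs, train_targets = make_windows(train, n_hist=12, horizon=3)
```

With 2880 samples, the split gives 2304 training and 576 test samples, and each training window pairs 12 inputs with 3 targets.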

Evaluation Indicators
Five metrics, namely Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Accuracy, Coefficient of Determination ($R^2$) and Explained Variance Score (Var), were used in the experiments to compare and evaluate the prediction results of the models. Here $y_i$ is the actual section speed, $\hat{y}_i$ is the predicted section speed, $\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$ is the mean of the actual section speeds, and $N$ is the sample size.
RMSE and MAE are both measures of prediction error, with smaller values indicating better predictions and larger values indicating worse predictions. $R^2$ and Var measure how well the prediction results represent the actual data; the larger the value, the better the prediction.
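The five metrics can be computed as in the sketch below. RMSE, MAE, $R^2$ and the explained variance score follow their usual definitions. The Accuracy formula is an assumption: it is taken here as one minus the ratio of the Euclidean norm of the error to the norm of the actual values, a definition common in related graph-based traffic forecasting work, and it may differ from the one used in the paper.

```python
import math

def evaluate(y, y_hat):
    """Return RMSE, MAE, Accuracy (assumed form), R^2 and explained variance."""
    n = len(y)
    err = [a - b for a, b in zip(y, y_hat)]
    mean_y = sum(y) / n
    mean_err = sum(err) / n
    sse = sum(e * e for e in err)                  # sum of squared errors
    sst = sum((a - mean_y) ** 2 for a in y)        # total sum of squares
    return {
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(e) for e in err) / n,
        # Assumed definition: 1 - ||y - y_hat|| / ||y||.
        "accuracy": 1.0 - math.sqrt(sse) / math.sqrt(sum(a * a for a in y)),
        "r2": 1.0 - sse / sst,
        "var": 1.0 - sum((e - mean_err) ** 2 for e in err) / sst,
    }

actual = [100.0, 90.0, 80.0, 70.0]
predicted = [98.0, 92.0, 78.0, 72.0]
scores = evaluate(actual, predicted)
```

In this toy example every prediction is off by 2 km/h, so RMSE and MAE are both 2; $R^2$ and Var coincide because the errors have zero mean.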

Parameter Design
In this experiment, the sym5 wavelet is used with 3 decomposition layers, the learning rate is set to 0.001, the batch size is set to 64, and the number of training epochs is set to 3000. Because the number of hidden units may affect the prediction accuracy, we try different numbers of hidden units, chosen from [16, 32, 64, 128], and compare the resulting predictions to select the best value. As shown in Figure 10, the horizontal axis indicates the number of hidden units and the vertical axis indicates the change in the metrics. When the number of hidden units is 64, the RMSE and MAE are the smallest. As the number of hidden units increases, the prediction accuracy first increases and then decreases. The main reason is that when the number of hidden units exceeds a certain level, the complexity and computational difficulty of the model increase greatly, leading to a decrease in prediction accuracy. Therefore, the number of hidden units is set to 64 in all experiments. Furthermore, the Adam optimizer is used during training to update the network parameters of the model so that they approach or reach the optimal values.

Experimental Results and Analysis
The experiment uses 12 historical data samples to predict the section speeds for the next 15 min, 30 min, and 45 min. Figure 11 visualizes the 15-min-ahead predictions of the WSTGCN model for four of the 16 sections. Black indicates the predicted section speed and red indicates the real section speed; the RMSE and MAE values marked in each panel are the overall evaluation indices of the section, and the orange and green bars indicate the RMSE and MAE calculated every three hours. The predicted and actual speeds follow similar trends, and the RMSE and MAE vary little, indicating that the model can accurately predict traffic speed. Figure 11c,d shows that the prediction accuracy is reduced on sections where the traffic speed changes are complicated. As seen at the position of the red rectangle, the model still captures the trend when the speed changes suddenly. To verify the reliability of the WSTGCN model, eight baseline methods are used for comparison: HA, SVR, ARIMA, GCN, GRU, LSTM, Spatial-Temporal Dynamic Network (STDN), which consists of CNN and LSTM, and GCN-GRU. Table 3 shows the evaluation metrics of the different models for predicting the section speed for the next 15 min. WSTGCN outperforms the baseline methods on all five evaluation indicators. The RMSE of WSTGCN is 20.28% lower than that of GCN-GRU, and its MAE is 11.69% lower. The GCN-GRU model does not consider the volatility of the data, which leads to lower prediction accuracy; this also shows that it is feasible to use WSTGCN to improve the accuracy of the prediction.
Compared with STDN, WSTGCN has a 21.24% lower RMSE and a 12.72% lower MAE, which indicates that for a realistic expressway road network structure, GCN captures spatial correlation better than CNN. Compared with GCN, the RMSE of WSTGCN is reduced by 55.68% and the MAE by 59.13%, which shows that good prediction results cannot be obtained by considering only the spatial characteristics of expressway nodes without the temporal characteristics of their attributes. Compared with GRU and LSTM, the RMSE of WSTGCN is reduced by 20.74% and 21.10%, and the MAE by 13.18% and 14.01%, respectively, which shows that ignoring the correlation between expressway nodes when predicting section speed also fails to achieve good results. Therefore, combining GCN with GRU, which takes into account both the spatial characteristics and the time series characteristics of nodes, improves the prediction results. In addition, the RMSE of GRU is 79.91% lower and its MAE is 53.72% lower than those of GCN. This is because GCN only considers the spatial characteristics of expressway nodes and does not consider the temporal characteristics of node feature vectors; it also indicates that the future section speed depends more on the historical section speed series. Compared with GRU, the RMSEs of HA, ARIMA, and SVR are about 46.87%, 54%, and 1.19% higher, respectively. Compared with WSTGCN, the RMSEs of HA, ARIMA, and SVR are higher by roughly 57.64%, 63.33%, and 21.22%, respectively, which is mainly caused by their poor nonlinear fitting ability on complex spatio-temporal data. Next, the prediction performance of WSTGCN and the other models at different prediction horizons is examined: the models are used to predict the section speeds for the next 30 min and 45 min, and their performance is compared. Tables 4 and 5 compare the effects of the different models on section speed prediction at these horizons.
For the 30-min and 45-min speed predictions, the RMSE of WSTGCN is reduced by 18.85% and 8.67% compared with GCN-GRU, by 22.34% and 19.21% compared with GRU, and by 45.66% and 33.19% compared with GCN, respectively. This shows that the WSTGCN model is also able to capture the spatial and temporal characteristics of the section speed well in longer-term prediction. Figure 12 shows the evaluation metrics of the different models at 30 min and 45 min. WSTGCN still has a lower RMSE and MAE than the other models at the same prediction horizon. As the prediction horizon increases, the accuracy of all models gradually decreases, but WSTGCN retains high accuracy at each horizon. Therefore, WSTGCN has good long-term prediction ability.

Conclusions
In this paper, an expressway speed prediction method based on wavelet transform and a spatio-temporal graph convolutional network is proposed. First, the ETC gantry transaction data are matched with the topological data to construct the vehicle travel time data. Then, the vehicle travel time outlier detection algorithm is used to eliminate the travel time anomalies of each section, and the section speed data set is constructed. Finally, the section speed data and topological data are input into the WSTGCN model for training, and the model is compared with various other models. The experimental results show that the prediction accuracy of the WSTGCN model in expressway speed prediction is significantly better than that of the other methods, and that it can accurately predict the section speed of the expressway. This paper also has some shortcomings: other factors that affect traffic speed (e.g., weather conditions) are not considered. In future work, we plan to introduce other data sources to further improve the accuracy of section speed prediction.