Short-Term Traffic State Prediction Based on Mobile Edge Computing in V2X Communication

Abstract: Real-time and reliable short-term traffic state prediction is one of the most critical technologies in intelligent transportation systems (ITS). However, in existing studies the traffic state is generally perceived by a single sensor, which makes it difficult to satisfy the requirement of real-time prediction in complex traffic networks. In this paper, a short-term traffic prediction model based on a complex neural network is proposed for vehicle-to-everything (V2X) communication systems. Firstly, a traffic perception system of multi-source sensors based on V2X communication is designed. A mobile edge computing (MEC)-assisted architecture is then introduced into the V2X network to strengthen the perceptual and computational abilities of the system. Moreover, the graph convolutional network (GCN), the gated recurrent unit (GRU), and the soft-attention mechanism are combined to extract spatiotemporal features of the traffic state and integrate them for future prediction. Finally, an intelligent roadside test platform is demonstrated for perception and computation of the real-time traffic state. The comparison experiments show that the proposed method significantly improves the prediction accuracy compared with existing neural network models that consider only one of the spatiotemporal features. In particular, in the comparison with the GCN and GRU models at prediction horizons of 5, 10, 15, and 30 min, the root mean squared error (RMSE) of the traffic state prediction is reduced by up to 39.53%.


Introduction
Due to considerable urbanization in recent years, the increasing number of vehicles in cities has led to various traffic problems such as congestion, accidents, and environmental degradation. According to the state of the art, the main solutions to these problems lie in improving the traffic capacity of the road [1][2][3]. However, most studies concentrate on in-car advisory systems for lane, speed, and headway [4], strategy design for traffic networks [5], and different management models [6]. Traditional single-component sensors can hardly meet the prediction requirements of complex traffic networks. Therefore, more efficient and effective approaches still need to be explored to improve the traffic capacity. Along with the development of communication technologies, mobile communication networks are expected to connect people, vehicles, and roads so that real-time information can be transferred accurately. Vehicle-to-everything (V2X) communication, which fuses information from various vehicles and roads, has been treated as the core technology for the next generation of intelligent transportation systems (ITS) to perceive the real-time traffic state on the roads [7]; however, it faces huge real-time data processing problems in transmission quality. In order to solve the problems, in this paper, an ITS is proposed by

• How to perceive and predict the traffic state in real time. Traditional traffic sensors can hardly meet the requirements of short-term traffic state prediction, especially the accuracy demanded by intelligent connected vehicles (ICVs).

• How to perceive and analyze the dynamic data of vehicles. Vehicles running on the road may face all kinds of sudden traffic incidents. However, traditional strategies for collecting vehicular dynamic data (such as floating vehicles) can hardly support accurate prediction in real time.

• How to effectively analyze and filter spatiotemporal features of the traffic state. These features are highly nonlinearly correlated, which remains a focus of urban traffic research. For example, a variation of the traffic flow at one intersection affects the traffic state at the adjacent intersection and also affects the future traffic state over time.
To solve the above problems, this paper proposes a traffic perceptual and computational system, which incorporates the roadside intelligent sensors, ICVs, and the traffic signal controller at the intersection. It can accurately perceive not only the state of each single vehicle, but also the traffic state of the whole intersection. The main contributions of the paper can be summarized as follows:

• Firstly, this paper proposes a traffic perceptual and computational system based on the MEC architecture, in which each edge server is responsible for managing the data upload of vehicles within its service scope. Moreover, the MEC server predicts the traffic state based on the perceptual information.

• Secondly, this paper applies on board unit (OBU) data of ICVs to predict the traffic state by tracking the vehicular driving state in real time, which can effectively improve the accuracy of prediction.

• Thirdly, building on the characteristics of the MEC, this paper designs a traffic prediction model to analyze and evaluate the traffic state at the intersection in the V2X environment. GCN and GRU models are combined to analyze the spatiotemporal features of traffic data. Then, the soft-attention mechanism is utilized to integrate the various extracted features.

The remainder of this paper is organized as follows. The traffic state perception system of an intersection scenario, which is based on the MEC architecture in V2X communication, is described in Section 2, where a prediction model of the short-term traffic state is then constructed and clarified. Section 3 presents experimental results and demonstrates the effectiveness of the proposed method. Finally, the conclusion is provided in Section 4.

Materials and Methods
The framework of the proposed method is described in Figure 1. It consists of a traffic perception system based on V2X, an edge computing module, and a traffic state prediction output module. Traffic states from various perception sources, such as roadside sensors, OBUs, and the traffic signal controller, are collected by the traffic perception system based on V2X. These states are then delivered to an MEC server for further processing. Through graph construction of the intersection network, spatiotemporal feature analysis, and the soft-attention mechanism, the short-term traffic state at the intersection can be accurately depicted and predicted from the fusion results of the V2X multi-source sensors. The involved modules are described in Sections 2.1-2.3.

Traffic State Perception
Traffic state is an unevenly distributed and complex random variable related to time variation. At the micro-level, the traffic state perception can be regarded as perceiving the behavior state of each vehicle, and at the macro-level, as the flow, speed, density, etc. in traffic scenarios. In this section, a traffic state perception system based on V2X communication will be introduced, and the perceptual information of traffic state will be analyzed.

Traffic Perception System Based on V2X Communication
At an urban intersection, the traffic state usually refers to the driving state of all vehicles, which is dynamic, periodic, and random. With the development of V2X communication technologies, the traffic state is generally perceived by fusing multi-source traffic sensors, and the information can then be fed back to the users of other subsystems such as the traffic management system (TMS) [24]. Finally, the perceived traffic information can be used to solve traffic and ICV problems for ITS.
To improve the traffic efficiency, a traffic perception system based on V2X communication technology is designed in this paper by taking advantage of the low delay, high reliability, and high security of V2X. As shown in Figure 2, the intelligent roadside infrastructures mainly include LiDAR sensors, high definition (HD) cameras, intelligent roadside units (RSUs), switches, traffic signal light controllers, intelligent OBUs with V2X communication functions, a shared base station, and a series of MEC servers. In the perception database at the intersection of this system, the perception data are distributed, dynamic, heterogeneous, and spatiotemporal. The average traffic flow and the mean speed capacity are chosen as the predicted and evaluative indicators of the traffic state, and they are analyzed comprehensively in the following parts.

Traffic Perception Data
The purpose of our system is to construct a stereoscopic and accurate perception database of the intersection. By comparing our system with traditional traffic perception methods, we examine whether the constructed system can realize more complex information interaction between vehicles and roads, and whether it can perceive both the driving status of a single vehicle and the roadside traffic environment information.
In ITS and cooperative vehicle infrastructure systems (CVIS), the traffic management department monitors the real-time traffic state with a variety of advanced sensors. Fusing the data of two or more traffic sensors can provide more efficient, reliable, and accurate results. The information fusion of the system is carried out as feature-level data fusion, with data from roadside sensors, intelligent OBUs, and traffic signal controllers. The types and sources of the traffic perception data in this system are shown in Table 1.

Data from Roadside Sensors
In the system, roadside sensors include LiDAR sensors and cameras, by which the real-time states of each target within the range of the sensors at the intersection can be easily obtained. To exclude the influence of pedestrians and other traffic participants, vehicles at the intersection are selected as the targets for predicting the traffic state in this system.
Based on the data fusion of the point cloud and the image, the accurate information of each vehicle, including the license plate number, the latitude, the longitude, the speed, the horizontal distance, the heading angle, and so on, can be perceived. Moreover, LiDAR sensors and cameras can also perceive the traffic state of the intersection, including the average vehicle speed, the average traffic flow, the average queue length and the parking line location.

Data from OBU
OBU data consist of the real-time information of ICVs while driving at the intersection, so millions of observations and terabytes of data are generated every day [25]. In this system, the information recorded by the OBU contains an anonymous identifier (ID), the timestamp, the GPS position (latitude, longitude), the speed, the acceleration, the license plate number, the wheel speed, the steering angle, and so on. Additionally, OBU data include vehicular braking states.

Data from Traffic Signal Controller
Traffic signal control is a fundamental element in traffic guidance at urban signalized intersections [26,27]. The core of the integration between traffic signal control and traffic guidance is in temporal and spatial synchronization [28]. At the macro level, traffic signal control can actively guide the drivers to choose the path by combining the traffic guidance information to balance the traffic pressure. At the middle and micro level, the spatial variables, such as the number of lanes, lane functions, and traffic flow directions, are combined with the temporal variables to achieve more efficient traffic optimization, including green signal ratio, phase sequence and phase difference of signal light. In our system, the signal cycle, the signal phase, the traffic light color, the remaining time of green, etc., are involved in the traffic signal controller.

MEC Architecture
MEC architecture can extend cloud computing services to the edge of networks, according to the white paper proposed by the European Telecommunications Standards Institute (ETSI). In our system, edge computing provides a service environment with high bandwidth and low latency for offloading tasks from mobile vehicles to MEC servers. The MEC architecture of this system includes three modules, namely the cloud module, the roadside module, and the on-board module, as described in Figure 3. In this paper, the roadside computing resources provided by the MEC architecture can meet the computing demands.

As shown in Figure 3, the roadside module is the perceptual part of the system, where LiDAR sensors and HD cameras are wired to the MEC server through the switch, and the ICVs are connected with traffic signal controllers through V2X communications provided by the RSU. The cloud module stores the historical traffic data from roadside sensors and provides cloud computing services. The on-board module refers to ICVs with OBUs, which can upload their own messages to the MEC server by a task offloading strategy. Specifically, in the MEC architecture, ICVs can receive application-oriented information and upload messages to the MEC server via the PC5 interface. The service schedule is described in Figure 4.

In Figure 4, the MEC server is responsible for processing the uploaded vehicle data locally, which includes data acquisition and traffic state prediction. The MEC server can be integrated into the strategy development of vehicular data acquisition. The process of data acquisition can be described as:

• Firstly, the MEC server decides whether a data upload request can be received by listening to the messages broadcast from vehicles. These messages include the required bandwidth of the vehicle and the section ID associated with the vehicle.

• The MEC server checks whether the ID of the corresponding section is selected. If the ID does not match, the MEC server rejects the request; if the ID matches, the process proceeds to the next step.

• The MEC server then checks whether the required bandwidth can be met. If not, the MEC server rejects the request; if yes, the MEC server allocates bandwidth to the vehicle.

• Finally, the MEC server is ready to receive the uploaded vehicle data. If the data are uploaded successfully, the MEC server updates the allocated bandwidth of the corresponding road section. Otherwise, the allocated bandwidth is not updated.
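The data acquisition steps above can be sketched as a small admission-control routine. This is only an illustrative sketch: the class and field names (`UploadRequest`, `MecServer`, `capacity`, `allocated`) and the bandwidth unit are assumptions introduced here, and the per-section capacity bookkeeping is one possible reading of the bandwidth check, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class UploadRequest:
    """Message broadcast by a vehicle (field names are illustrative)."""
    vehicle_id: str
    section_id: str
    required_bandwidth: float  # unit is an assumption, e.g. Mbit/s


@dataclass
class MecServer:
    """Toy admission controller for one MEC server (hypothetical model)."""
    selected_sections: set
    capacity: dict                                  # section_id -> total bandwidth
    allocated: dict = field(default_factory=dict)   # section_id -> bandwidth in use

    def handle(self, request: UploadRequest, upload_ok: bool) -> str:
        # Reject requests from road sections that are not selected.
        if request.section_id not in self.selected_sections:
            return "rejected: section not selected"
        # Reject when the remaining bandwidth cannot cover the request.
        used = self.allocated.get(request.section_id, 0.0)
        free = self.capacity.get(request.section_id, 0.0) - used
        if request.required_bandwidth > free:
            return "rejected: insufficient bandwidth"
        # Update the allocated bandwidth only after a successful upload.
        if not upload_ok:
            return "accepted: upload failed, allocation unchanged"
        self.allocated[request.section_id] = used + request.required_bandwidth
        return "accepted: upload stored"
```

In this sketch the `upload_ok` flag stands in for the outcome of the actual data transfer, so the last step of the list (update the allocation only on success) can be exercised without a network.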
As for the traffic state prediction, the MEC server can fuse the actual traffic state based on the uploaded messages of vehicles and the multi-source information of roadside sensors. The process can be depicted as:

• Firstly, multi-source information is fused by the MEC server.

• Then, the MEC server processes the original vehicular data, which includes eliminating invalid data and sensing the traffic state information based on the vehicular data.

• Finally, by means of a short-term traffic state prediction model, the MEC server predicts the traffic state of the intersection.

Considering the temporal and spatial features of traffic states, a short-term traffic state prediction model is designed. As shown in Figure 5, the framework is divided into five modules: input module, temporal feature module, spatial feature module, attention mechanism module, and prediction module.

Graph Construction of Intersection Network
In the network of urban intersection, vertexes can represent a series of traffic features, which usually include vehicular speed, acceleration, position, and other vehicular state information.
As shown in Figure 6, the network of the intersection at different times can be described as a graph $G = (V, E, W)$. In the graph $G$, each vertex is treated as a vehicle, and $V = \{v_1, v_2, \ldots, v_N\}$ is the set of vehicle vertexes, where $N$ represents the number of vertexes; $E$ is the set of edges, where $e_{ij} \in E$. $W \in \mathbb{R}^{N \times N}$ is the adjacency matrix, which represents the connection between vehicles. The adjacency matrix is the basis for exploratory analysis of the spatiotemporal correlation of traffic flow in the intersection network. It is a binary matrix with entries in $\{0, 1\}$, where 0 means that there is no link between two vertexes and 1 denotes that there is a link. In Figure 6, each vertex in the network has actual attributes including speed, acceleration, and other vehicular information, and the time series of the traffic state indicators are expressed as Equations (1) and (2),
where $V$ is the mean speed of all vehicles at the intersection at time $k$, $s$ is the length of the historical time series, and $T$ is the length of the time series that needs to be predicted.
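The graph construction described above can be illustrated with a short sketch that builds a binary adjacency matrix over vehicle vertexes. The linking rule used here (two vehicles are connected when their planar distance is within a given radius) is an assumption made for illustration only; the text does not state how links between vehicles are determined, and the function name `build_adjacency` is hypothetical.

```python
import math


def build_adjacency(positions, radius):
    """Return a binary N x N adjacency matrix W for vehicle vertexes.

    Assumption (not from the paper): two vehicles are linked when their
    planar distance is at most `radius` metres.
    """
    n = len(positions)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= radius:
                w[i][j] = w[j][i] = 1  # undirected link, so W is symmetric
    return w
```

For example, with positions `[(0, 0), (3, 4), (30, 40)]` and a radius of 10, only the first two vehicles are linked, because their distance is 5 while the third vehicle is 45 or more away from both.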

Spatial Feature Extraction
An intersection is defined as the structure of the road network graph, and the GCN model is sensitive to the spatial features of traffic. The GCN model can deduce the information of a vertex by using the information of the surrounding vertexes together with its own original information. Therefore, in the graph of the intersection, both the vertex information and the structure of the graph should be considered together. The GCN model can automatically learn not only the characteristics of vertexes, but also the correlation information between any two vertexes.
The GCN model performs convolution on the graph by means of the Fourier transform. The advantage of the GCN model is that it extracts spatial features from the neighborhood information of vertexes. According to the corresponding changes of graph convolution in [30], the most important graph convolution can be obtained, as shown in Equation (4), where $g$ is the convolution kernel function, $h$ is the graph signal on each vertex, and $U$ is not only the basis of the Fourier transform, but also the matrix of eigenvectors of the Laplace matrix. In particular, the more complicated the traffic situation becomes, the higher the computational complexity of decomposing the Laplace matrix.
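For reference, one common textbook form of spectral graph convolution that matches the symbols defined above is given below. It is offered as a hedged illustration and may differ in notation from the paper's Equation (4); the degree matrix $D$, the eigenvalue matrix $\Lambda$, and the Hadamard product $\odot$ are symbols assumed here.

```latex
L = I_N - D^{-1/2} W D^{-1/2} = U \Lambda U^{\top}, \qquad
g * h = U \left( \left( U^{\top} g \right) \odot \left( U^{\top} h \right) \right)
```

In this form, $U^{\top} h$ is the graph Fourier transform of the signal, the element-wise product applies the filter in the spectral domain, and the multiplication by $U$ transforms the result back to the vertex domain. The eigendecomposition of $L$ is the costly step mentioned in the text.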
According to [31], the function $f$ can be approximated by Chebyshev polynomials of order $k$. The recursive function of the Chebyshev polynomial is expressed as $T_k(x) = 2xT_{k-1}(x) - T_{k-2}(x)$, with $T_0(x) = 1$ and $T_1(x) = x$, where $k$ is the order of the Chebyshev expansion. On the spectral graph, the approximate solution is carried out by Chebyshev polynomials, and the effect is equivalent to calculating the characteristic of each vertex by extracting the information of the 0~(k − 1)-th order neighbors of each vertex in the graph. Therefore, the output of the (l + 1)-th layer can be obtained, where $\sigma$ represents the sigmoid function for a nonlinear model.
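The Chebyshev approximation described above can be sketched in plain Python as follows. The sketch assumes the commonly used rescaled Laplacian with the largest eigenvalue taken as 2, so that the rescaled matrix equals $-D^{-1/2} W D^{-1/2}$; this simplification, the function names, and the coefficient list `theta` are assumptions for illustration, not the layer definition used in the paper.

```python
import math


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def scaled_laplacian(w):
    """Return -D^{-1/2} W D^{-1/2}, i.e. 2L/lambda_max - I with lambda_max taken as 2."""
    deg = [sum(row) for row in w]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    n = len(w)
    return [[-inv_sqrt[i] * w[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def cheb_conv(w, x, theta):
    """Compute sum_k theta[k] * T_k(L~) x with T_k = 2 L~ T_{k-1} - T_{k-2}."""
    lap = scaled_laplacian(w)
    t_prev = x[:]                                   # T_0(L~) x = x
    out = [theta[0] * v for v in t_prev]
    if len(theta) == 1:
        return out
    t_curr = matvec(lap, x)                         # T_1(L~) x = L~ x
    out = [o + theta[1] * v for o, v in zip(out, t_curr)]
    for k in range(2, len(theta)):
        lt = matvec(lap, t_curr)
        t_next = [2 * a - b for a, b in zip(lt, t_prev)]
        out = [o + theta[k] * v for o, v in zip(out, t_next)]
        t_prev, t_curr = t_curr, t_next
    return out
```

With `len(theta)` coefficients, each output value depends only on vertexes up to `len(theta) - 1` hops away, which corresponds to the neighborhood interpretation given in the text, and no eigendecomposition of the Laplace matrix is needed.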

Temporal Feature Extraction
The variation in traffic over time is usually nonlinear and unstable. The LSTM model is the most widely used network for time series-related problems, and the GRU is proposed based on LSTM. Compared with LSTM, the GRU model has a simpler structure, fewer parameters, and a faster training speed. With its gated mechanism, the GRU model can effectively deal with short-term information for different tasks.
As shown in Figure 7, the structure of the GRU contains an update gate $z_t$, a reset gate $r_t$, and a memory unit $h_t$. The reset gate is used to control the degree to which the status information of the previous moment is ignored. The update gate is used to control how much state information of the previous moment is brought into the current state, so that the useful information is retained. The states of the update gate, reset gate, and memory unit are computed from the current input and the hidden state of the previous moment.

In summary, the GCN model is used to extract the topological structure of the intersection for obtaining spatial features. The GRU model is used to extract the dynamic variation of the traffic state in the network for obtaining temporal features.
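The GRU gating described in this section can be sketched as a single time step in plain Python. The formulation below is a widely used textbook variant and is an assumption here: in particular, the blending convention (an update gate close to 1 favors the new candidate state), the concatenated input `[h_{t-1}, x_t]`, and the `params` layout are illustrative choices, not necessarily the exact equations of the paper.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights, bias, vec):
    """Return weights @ vec + bias for plain Python lists."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def gru_step(x_t, h_prev, params):
    """One GRU step; params maps 'z', 'r', 'h' to (weights, bias) pairs."""
    concat = h_prev + x_t                                      # [h_{t-1}, x_t]
    z = [sigmoid(v) for v in affine(*params["z"], concat)]     # update gate
    r = [sigmoid(v) for v in affine(*params["r"], concat)]     # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x_t       # [r_t * h_{t-1}, x_t]
    cand = [math.tanh(v) for v in affine(*params["h"], gated)]  # candidate state
    # Blend the previous state and the candidate (assumed convention).
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]
```

With all weights and biases set to zero, both gates equal 0.5 and the candidate is 0, so the new state is half of the previous state; this makes the role of the update gate easy to check by hand.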

Attention Mechanism Module
To make use of the spatiotemporal characteristics of the historic traffic state at urban intersection, the short-term prediction of traffic state is carried out through the redistribution of weight by attention mechanism. First, the attention coefficient of each time series needs to be calculated for the predicted target. Then these coefficients are used to evaluate the prediction results. Finally, when predicting the traffic state, the state values of vertexes with stronger correlation are calculated by the soft-attention mechanism.
The structure of the attention mechanism is shown in Figure 8, where the output of the last layer of the GRU model is the input of the soft-attention mechanism. For different time series, the weight of each time series feature is calculated by normalization with the SoftMax function, as expressed in Equations (13) and (14). Finally, the traffic state prediction result of the whole network is calculated, as shown in Equation (15).
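The weighting step of the soft-attention mechanism can be sketched as follows. The scoring function used here (a dot product between each GRU hidden state and a vector `score_weights`) is a hypothetical stand-in for the learned scoring layer, which the text does not spell out; only the SoftMax normalization and the weighted sum follow directly from the description above.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(hidden_states, score_weights):
    """Weight GRU hidden states by SoftMax-normalized scores.

    `score_weights` is a hypothetical learned vector; each score is the dot
    product of one hidden state with it.
    """
    scores = [sum(w * h for w, h in zip(score_weights, hs)) for hs in hidden_states]
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(a * hs[d] for a, hs in zip(alphas, hidden_states)) for d in range(dim)]
    return alphas, context
```

The returned `context` vector is the attention-weighted combination of the hidden states, which would then be passed to the prediction layer.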

Experimental Results and Discussion
In this section, the constructed platform for traffic state perception is demonstrated, and the intersection scenario selected for the experiment is described on this basis. The datasets collected by ourselves are introduced. Then the parameter settings and experimental conditions of the neural network model are clarified. Finally, the experimental results of the proposed prediction model are analyzed and discussed, and our method is compared with other methods.

Field Test and Data Analysis
In the field test, a typical intersection, located in Fushi Road, Shijingshan District, Beijing City, is selected as the experimental intersection. There are four lanes in one direction from east to west, and the rightmost lane is the dedicated right-turn lane, which is not controlled by traffic lights, as shown in Figure 9a. In addition, we design a mobile intelligent roadside perception and computing platform for multiple test scenarios. The intelligent roadside platform consists of the LiDAR sensor, HD camera, switch, intelligent RSU, GPS, MEC server and monitor screen, as shown in Figure 9b.
On the experimental platform, the resolution of the camera is 1080 P (1920 × 1080) with a sampling rate of 25 Hz. Tests show that the camera can detect and recognize targets at a distance of about 200 m. The 32-line LiDAR sensors can detect the surrounding environment within about 300 m. The camera is responsible for detecting traffic signs and vehicles, while the LiDAR sensor is used for exploring blind spots and long-distance targets in the complex traffic environment. In the experiment, we extracted 30 min of test data collected by the roadside perception platform, which contain a series of vehicles, pedestrians, and buildings. After data preprocessing, about 22,000 pieces of vehicular data are obtained, which are shown in Tables 2 and 3.

Evaluation Index
To describe the performance of the GCN-GRU model, the following three indexes are used to evaluate the predicted results.
(1) Root mean squared error (RMSE): $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}$
(2) Mean absolute error (MAE): $\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|$
(3) Accuracy (Accuracy)
where $y_t$ and $\hat{y}_t$ represent the real traffic state value and the predicted traffic state value, respectively, and $n$ is the number of predicted values. RMSE and MAE reflect the difference between the real value and the predicted value, and both fall into the interval [0, +∞). These evaluation values are negatively correlated with the prediction effect, that is, the smaller the value is, the better the prediction effect is. Accuracy is used to measure the precision of the predicted results, that is, the larger the value is, the better the prediction effect is.
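The evaluation indexes above can be computed with a few lines of Python. RMSE and MAE follow their standard definitions. The `accuracy` function is only an assumed form (one minus the ratio of the error norm to the norm of the real values, as used in some related traffic prediction work); the text does not give the formula, so it should be read as a placeholder rather than the paper's definition.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between real and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between real and predicted values."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Assumed form: 1 - ||y - y_hat||_2 / ||y||_2 (not confirmed by the text)."""
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)))
    ref = math.sqrt(sum(a ** 2 for a in y_true))
    return 1.0 - err / ref
```

A perfect prediction gives an RMSE and MAE of 0 and, under the assumed form, an Accuracy of 1.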

Parameter Settings
In the experiment, we normalize the traffic data to the interval [0, 1] in order to speed up model training. In addition, we treat 70% of the dataset as the training set and the remaining 30% as the test set. In this experiment, the learning rate of the model is 0.001, the batch size is 32, and the number of epochs is 800. In addition, the model uses L2 regularization to prevent over-fitting. The Adam optimizer is employed to train the model.
In the multi-source information fusion of the dataset based on V2X technology, we set the size of the hidden layer to 20, 40, 60, 80, 100, 120, and 140, respectively. Table 4 shows the comparison of prediction performance under different numbers of hidden neurons, and Figure 10 shows the corresponding comparison of prediction accuracy in the dataset. The results show that, as the number of neural units in the hidden layer increases from 20 to 140, the prediction accuracy first increases and then decreases, which indicates that over-fitting occurs when the hidden layer becomes too large. When the number of neurons in the hidden layer is 100, the Accuracy is 0.9544. Therefore, the number of hidden layer neurons in the model is set to 80.
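The preprocessing described above can be sketched as follows. The function names are hypothetical, and the chronological (unshuffled) split is an assumption, since the text only states the 70%/30% ratio; the hyperparameter values in `TRAINING_CONFIG` are the ones reported in this section.

```python
def min_max_normalize(values):
    """Scale a list of numbers to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]      # constant series: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]


def chronological_split(series, train_ratio=0.7):
    """Split a series into training and test parts without shuffling (assumed)."""
    cut = round(len(series) * train_ratio)
    return series[:cut], series[cut:]


# Hyperparameters as reported in the text; the key names are illustrative.
TRAINING_CONFIG = {
    "learning_rate": 0.001,
    "batch_size": 32,
    "epochs": 800,
    "optimizer": "adam",
}
```

Keeping the split chronological avoids leaking future observations into the training set, which matters for time series prediction.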

Performance of Prediction Model
Based on the perception dataset of the traffic state, the experiment predicts the average traffic flow and the mean speed capacity of the urban intersection in the short term. The specific analyses are presented as follows. Figure 11a shows the performance of the training model when the dataset of the average traffic flow is divided 7:3 for training and testing. Figure 11b shows the fitting results of the actual average traffic flow data and the predicted results, as well as the errors between the predicted values and the real observed values. It is worth noting that the average traffic flow of the intersection has spatiotemporal periodicity, and the model accurately predicts the trend at the next time step by learning the spatiotemporal correlation of the data. The values of RMSE and MAE are 0.9823 and 0.8049, respectively, which shows that the prediction results can meet the requirements [32,33]. Figure 12a shows the performance of the training model when the dataset of the mean speed capacity is divided 7:3 for training and testing. With the threshold set to 0.6, the times at which the mean speed capacity exceeds the threshold can be observed clearly. The prediction results can be treated as a reference to evaluate the short-term traffic state of the intersection in real time. As shown in Figure 12b, the fit between the real observed values and the predicted values of the mean speed capacity meets the prediction requirements [34]. The values of RMSE and MAE are 0.1970 and 0.1585, respectively, which shows that the prediction model is effective.
Through the prediction of the average traffic flow and mean speed capacity, the traffic operation state of intersections can be perceived in real time. At the same time, it can provide data support for traffic managers to conduct real-time traffic guidance at intersections and relieve the queuing pressure at intersections.
In order to validate the effectiveness of the proposed model, we compare it with four other prediction models, namely the LSTM, RNN, CNN, and back propagation (BP) neural network models. In the process of prediction, we choose the same parameters described in Section 3.3. Tables 5 and 6 show the comparison results of the average traffic flow and the mean speed capacity for 5, 10, 15, and 30 min, respectively. In addition, the experiments also compare the training time and prediction time between the GCN-GRU model and the other four models, as shown in Table 7. It can be seen from Table 7 that the training time and prediction time of the GCN-GRU model are longer than those of the RNN model. However, the prediction accuracy of the GCN-GRU model is much higher than that of the other models. We can infer that the proposed method is practical and significant for the objective of improving the prediction accuracy. To sum up, the GCN-GRU model outperforms the other neural network prediction models in prediction accuracy.

Comparison Experiment Results
To verify whether the proposed model can effectively extract the spatiotemporal features from the dataset of the traffic state, we compare the GCN-GRU model with the GRU and GCN models in the experiment. As shown in Figures 13 and 14, the average traffic flow and the mean speed capacity at the time points of 5, 10, 15, and 30 min are compared, respectively, and then the error performances are analyzed in detail. For the 5-min prediction of the average traffic flow, compared with the GCN model, which only considers spatial features, the proposed GCN-GRU model reduces the RMSE and MAE by 33.35% and 34.4%, respectively. In the same way, for the 10, 15, and 30-min predictions of the average traffic flow, the RMSE of our GCN-GRU model is reduced by 30.89%, 37.51%, and 39.53%, respectively, and the MAE is reduced by 33.66%, 36.9%, and 34.95%, respectively. Thus, it can be clearly observed that the RMSE and MAE of the proposed model are significantly reduced. We infer that our proposed GCN-GRU model can extract spatial features efficiently.
Similarly, by comparing the prediction results of our GCN-GRU model with the GRU model at the time points 5, 10, 15 and 30 min, the RMSE of the GCN-GRU model is reduced by 28.68%, 24.36%, 32.54% and 35.43%, and the MAE is reduced by 33.47%, 32.39%, 30.2%, and 29.5%, respectively. The prediction results indicate that the GCN-GRU model can also efficiently extract temporal features.
In Figure 14, the prediction performance of the GCN-GRU model for the mean speed capacity is consistent with that for the average traffic flow. The RMSE and MAE are reduced to a certain extent and then remain relatively stable in (0.2, 0.3) and (0.1, 0.3), respectively. Therefore, the significant reductions of the proposed GCN-GRU model in RMSE and MAE reveal that fusing the spatiotemporal features is effective for accurate short-term traffic flow prediction.

Conclusions
To improve the traffic efficiency, we proposed a short-term traffic state prediction model based on the data acquisition strategy of the MEC-assisted V2X network. The model combines the advantages of GCN, GRU, and the soft-attention mechanism to analyze the spatiotemporal characteristics of traffic data. In addition, we design an intelligent roadside platform to verify the proposed model. The main conclusions are summarized as follows:
(1) This paper fuses multi-source information from intelligent OBUs, roadside sensors, and the traffic signal controller to accurately perceive the traffic state based on V2X communication.
(2) The prediction model considers the spatiotemporal dependence of all vehicles at the vertexes of the intersection network. The proposed model can effectively extract vertex features from the intersection, which greatly improves the prediction accuracy of the model. Based on the data acquisition strategy of the MEC-assisted V2X network, the comparative experiment reveals the effectiveness of our proposed model.
(3) This paper mainly analyzes the traffic operation state of a single intersection, which limits the usage of the proposed model in extended scenarios and may pose a challenge to the adaptability of the model. In further research, we will study the traffic state prediction problems of regional intersections to explore the efficiency and effectiveness of data support and implementation schemes in CVIS, as well as the adaptability of the model in these scenarios.
Funding: This research was funded by the Beijing Natural Science Foundation (Grant number 4212034).

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: All data and models used during the study appear in this article.