Traffic Accident Prediction Based on Multivariable Grey Model

Owing to the frequency of traffic accidents and casualties nowadays, the ability to predict the number of traffic accidents in a given period is significant for transportation departments to make decisions scientifically. However, because many variables in the road traffic system affect traffic accidents, traffic accident prediction faces two critical challenges. The first is how to evaluate the weight of each variable's impact on accidents. The second is how to model the prediction process for multiple interrelated variables. To address these two problems, we propose effective solutions for traffic accident prediction. First, for the first issue, we exploit grey correlation analysis to measure the correlation of each factor with accident occurrence. Then, for the second issue, we select the main factors by correlation analysis to establish a multivariable grey model, MGM(1,N), for prediction process modeling. Further, we explore the collinearity between variables to better optimize the predictive model. The experimental results show that our approach achieves better performance than four general-purpose comparative algorithms in the traffic accident prediction task.


Introduction
With the rapid development of the automobile industry, the number of cars has increased dramatically. Meanwhile, with growing traffic volume, the ratio of vehicles to road capacity has become seriously unbalanced, which leads to frequent traffic accidents and casualties [1]. Therefore, traffic management departments need to take measures to guarantee people's travel safety and the security of goods transportation [2][3][4]. However, the road traffic system is normally regarded as a complex system with multiple factors, which greatly increases the difficulty for managers to deal with traffic problems. This is especially so because the occurrence of traffic accidents is characterized by uncertainty [5], which often makes accident prevention hard to manage. Although individual traffic accidents occur largely by chance, the large number of traffic accidents that occur in a region over a long period of time shows a certain regularity and relative stability [6]. Consequently, the number of traffic accidents in the next year can be predicted with appropriate predictive models in order to make reasonable decisions for traffic safety. However, owing to the many variables affecting traffic accidents in the road traffic system, two challenging issues must be addressed when predicting the number of traffic accidents.
How can the weight of each factor's impact on accidents be evaluated? Various factors clearly have different impacts on the occurrence of traffic accidents, so they should play different roles in accident prediction modeling. Meanwhile, in the traffic system, each factor is normally represented as a variable sequence that describes the chronological development trend of that factor. We therefore need to measure the impact of each factor-variable sequence on the accident-variable sequence in order to choose appropriate factors and devise a reasonable prediction model.
How can the prediction process be modeled for multiple interrelated variables? The occurrence of traffic accidents is considered to be correlated with several variables, so a multivariable model is required for accident prediction. Meanwhile, in the traffic system, a variable is not only affected by other variables but also affects them. Intuitively, identifying highly correlated variables in a traffic system helps reduce the interaction between variables, which addresses the issue of collinearity and further improves prediction accuracy. Consequently, we need an effective approach to accident prediction that accounts for the influence of the interaction among variables on the prediction results.
To address these problems, in this paper we propose effective solutions for traffic accident prediction. First, we exploit grey correlation analysis to measure the correlation of each factor with accident occurrence. Next, we select the main factors by correlation analysis to establish a multivariable grey model for prediction modeling. Further, we explore the collinearity between variables to better optimize the predictive model.
In summary, we make the following contributions in this paper. We propose an effective solution for predicting the number of traffic accidents by exploring multidimensional time series prediction strategies. First, we use grey correlation analysis to measure the impact of the time series of each factor-variable on the time series of the accident-variable. Then, we select the main factors by correlation analysis to establish a multivariable grey model, MGM(1,N), which accommodates the nonlinearity and randomness of practical traffic systems.
We present a general method to reduce the influence of the interaction among factor-variables on the prediction results in the traffic system. Specifically, we build various MGM(1,N) models from different factor-variables and select the optimal model among the candidates by measuring the collinearity between variables. The optimal MGM(1,N) model consists of the most influential factor-variables, which best reflect the changes in the traffic system.
We conduct experiments on a real traffic statistics dataset covering 13 years from the China Statistical Yearbook. Compared with four general-purpose algorithms for accident prediction, our approach achieves better performance than its competitors. In addition, the advantage of our proposal confirms that the multivariable grey model achieves higher prediction accuracy than univariate predictive models in the accident prediction task.

Related Work
Traffic accident prediction is very important for traffic management and decision-making and has attracted increasing attention in recent years. Existing work can be roughly divided into two categories: traffic accident prediction based on machine learning and traffic accident prediction based on grey systems. As pioneering work on traffic accident prediction, [7] investigated the support vector machine model for predicting the number of road traffic accidents and demonstrated its advantage in prediction accuracy over various other models. To achieve an acceptable level of accuracy in traffic accident risk prediction, Lin's work [8] proposed using the FP-tree method to select the most important explanatory variables. In addition, the logistic regression model [9] was used to improve prediction accuracy, while the random forest model was exploited to verify the rationality of the prediction results [10]. Moreover, to analyze the impact of rainfall on urban traffic accidents, Jaroszweski et al. [11] used a weather radar model for rainfall quantification together with a matched-pairs analysis of traffic accidents, and demonstrated divergence through the two approaches. Meanwhile, Theofilatos [12] combined finite mixture logit models with a Bayesian model to investigate accident likelihood and severity on urban arterials. Additionally, Ren et al. [13] analyzed the spatial and temporal patterns of traffic accident frequency and proposed a deep learning approach to predict citywide traffic accident risk. To handle the spatial heterogeneity of the data, the Hetero-ConvLSTM deep learning framework was proposed in [14] and produced reasonably accurate predictions.
As an extension of the basic grey model GM(1,1), the multivariable MGM(1,N) model was proposed by Zhai [15] to integrate multiple dimensions, and it can uniformly describe the interrelationships within a multivariable system [16]. It has been successfully applied in many engineering domains, such as ship motion [17], social economy [18], and wave prediction [19]. Regarding studies of traffic accident prediction based on grey systems, the traditional GM(1,1) model was considered to reflect the law of dynamic evolution of traffic accidents and was applied to predictive analysis for different regions of China [20,21]. In addition, improved and combined grey models based on GM(1,1) were proposed in [22] to predict traffic accidents and obtained more effective results than traditional algorithms. Furthermore, René S's work [23] applied grey system theory to predict the development of road traffic accidents in Germany.
To our knowledge, very few works on traffic accident prediction have so far considered the multidimensional time series of influential factors in the traffic system. Most previous works only discussed the characteristics of a single variable, namely the number of traffic accidents, and did not take the multi-factor effects of the traffic system into consideration. In this paper, we propose a new approach for traffic accident prediction. The unique features of our proposal lie in two designs. First, we exploit grey correlation analysis to measure the correlation of each factor with accident occurrence. Second, we explore the collinearity between variables and establish a multivariable grey model, MGM(1,N), for accident prediction.

Methodology
In order to predict the number of traffic accidents, we present an effective predictive algorithm based on multivariable grey model MGM(1,N). Figure 1 shows the framework of our predictive system for traffic accidents.
Information 2019, 10, x FOR PEER REVIEW 3 of 12

Correlation Analysis for Accident Impact Factors
Traffic accidents are often affected by multiple factors with complex relationships among them, and different factors influence the occurrence of accidents to different degrees. Thus, we aim to measure the impact of each factor variable on the accident variable in order to select appropriate factors and devise a reasonable model for accident prediction. In the traffic system, each impact factor is normally represented as a variable sequence that describes the chronological development trend of the corresponding factor. More specifically, variable sequences are the time series of the variables we want to simulate and predict in the traffic accident prediction task. Therefore, in this paper, grey correlation theory is employed to estimate how closely different sequences are connected according to the similarity of the geometric shapes of their curves, and the correlation is measured by the degree of closeness between the accident variable and each of its impact factors.
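Suppose that the i-th variable sequence in the traffic system, observed at n time points, is denoted by Formula (1):
$$x_i = \left(x_i(1), x_i(2), \ldots, x_i(n)\right) \tag{1}$$
Further, suppose there are m variable sequences in the traffic system, comprising the factor-variable sequences, which describe the chronological development trend of the corresponding factors, and the accident-variable sequence, which represents the time series of the accident variable. The set of sequences in the traffic system is defined by Formula (2):
$$X = \{x_0, x_1, \ldots, x_{m-1}\} \tag{2}$$
where $x_0$ denotes the accident-variable sequence and $x_1, \ldots, x_{m-1}$ denote the factor-variable sequences.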
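The grey correlation coefficient between the two sequences at point k is then given by Formula (3):
$$\xi_i(k) = \frac{\min_i \min_k \left|x_0(k) - x_i(k)\right| + \rho \max_i \max_k \left|x_0(k) - x_i(k)\right|}{\left|x_0(k) - x_i(k)\right| + \rho \max_i \max_k \left|x_0(k) - x_i(k)\right|} \tag{3}$$
where $\xi_i(k)$ is the grey correlation coefficient at point k, which plays a role similar to other correlation coefficients such as the Pearson correlation coefficient; $\rho \in [0, 1]$ is the resolution coefficient; and $\min_i \min_k |x_0(k) - x_i(k)|$ and $\max_i \max_k |x_0(k) - x_i(k)|$ are the two-level minimum difference and the two-level maximum difference, respectively. Based on Formula (3), the grey correlation degree between the reference (accident-variable) sequence $x_0$ and the factor-variable sequence $x_i$ is given by Formula (4):
$$r_i = \frac{1}{n} \sum_{k=1}^{n} \xi_i(k) \tag{4}$$
where $r_i$ is the correlation degree between the factor-variable sequence and the accident-variable sequence.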
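As an illustration, Formulas (3) and (4) can be computed in a few lines of NumPy. This is a minimal sketch, not the authors' code: it assumes the sequences have already been made dimensionless (the experiments standardize each sequence with a z-score), treats $x_0$ as the accident-variable sequence, and defaults to ρ = 0.5, a common choice in the grey-systems literature that the paper does not state. The function name `grey_relational_degrees` is hypothetical.

```python
import numpy as np


def grey_relational_degrees(x0, factors, rho=0.5):
    """Grey correlation degree r_i of each factor sequence with the reference x0.

    x0:      reference (accident-variable) sequence of length n.
    factors: array of shape (m, n), one factor-variable sequence per row.
    rho:     resolution coefficient; 0.5 is an assumed default.
    """
    x0 = np.asarray(x0, dtype=float)
    F = np.atleast_2d(np.asarray(factors, dtype=float))
    delta = np.abs(F - x0)          # |x0(k) - xi(k)| for every factor i and point k
    d_min = delta.min()             # two-level minimum difference (over i and k)
    d_max = delta.max()             # two-level maximum difference (over i and k)
    if d_max == 0:                  # every factor coincides with x0
        return np.ones(F.shape[0])
    xi = (d_min + rho * d_max) / (delta + rho * d_max)   # coefficients, Formula (3)
    return xi.mean(axis=1)          # degrees r_i, Formula (4)
```

Ranking the returned degrees in descending order gives the sequence W used as input to Algorithm 1.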

Multivariable Grey Model MGM(1,N) for Accident Prediction
In this paper, we first treat accident prediction as a multiple time series prediction problem in order to describe the multivariable characteristics of the traffic system. Second, although the occurrence of individual traffic accidents is partly a matter of chance, the number of accidents is closely related to its past and present situation, because the large number of traffic accidents that occur in a region over a long period shows a certain regularity and relative stability. In other words, the number of traffic accidents can be predicted through a comprehensive analysis of its present situation and history. Based on these two hypotheses, we use the multivariable grey model MGM(1,N), which suits the multivariable stochastic process of the traffic system, to perform accident prediction.
In the transportation system, a variable is not only affected by other variables but also affects them, which reflects the general law that the components of a system are interrelated and mutually influential. The MGM(1,N) model describes the variables uniformly from the perspective of the whole system and constructs a traffic accident impact system that can reflect this regular pattern. The modeling process is as follows. Suppose there are n time series in the traffic system, denoted $x_i^{(0)}(k)$, $i = 1, 2, \ldots, n$. Their accumulated (first-order accumulated generating) sequences are given by Formula (5):
$$x_i^{(1)}(k) = \sum_{j=1}^{k} x_i^{(0)}(j), \quad i = 1, 2, \ldots, n \tag{5}$$
The multivariable grey model MGM(1,N) simulates prediction by establishing n first-order differential equations as shown in Formula (6).
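For reference, the standard MGM(1,N) system from the grey-systems literature is sketched below in our own notation. We assume, without confirmation from the extracted text, that it corresponds to Formulas (6) and (9): each accumulated sequence $x_i^{(1)}$ obeys a linear first-order equation driven by all n accumulated sequences, which is how the model captures the mutual influence among variables.

```latex
\begin{cases}
\dfrac{\mathrm{d}x_1^{(1)}}{\mathrm{d}t} = a_{11}x_1^{(1)} + a_{12}x_2^{(1)} + \cdots + a_{1n}x_n^{(1)} + b_1 \\
\qquad\vdots \\
\dfrac{\mathrm{d}x_n^{(1)}}{\mathrm{d}t} = a_{n1}x_1^{(1)} + a_{n2}x_2^{(1)} + \cdots + a_{nn}x_n^{(1)} + b_n
\end{cases}
\quad\Longleftrightarrow\quad
\frac{\mathrm{d}X^{(1)}}{\mathrm{d}t} = A X^{(1)} + B,
\qquad
X^{(1)} = \begin{pmatrix} x_1^{(1)} \\ \vdots \\ x_n^{(1)} \end{pmatrix},\;
A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix},\;
B = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}
```

When n = 1 the system reduces to the single equation of GM(1,1).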
To solve the MGM(1,N) model, the corresponding variables and parameters are calculated by Formulas (7) and (8), respectively.
Based on the model variables and parameters given by Formulas (7) and (8), the n first-order differential equations in Formula (6) can be written in the form of Formula (9).
We then convert Formula (9) into the continuous time response function of Formula (10), which is further discretized as Formula (11).
Additionally, in this paper, the least-squares method defined by Formula (12) is used to identify the parameters of the n first-order differential equations represented by Formula (9).
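To spell out the computation, the standard MGM(1,N) identification and response steps from the grey-systems literature are sketched below. The notation is ours and is an assumption: $T$ is the number of observed time points, $z_j^{(1)}$ are background values, $\hat{A} = (\hat{a}_{ij})$ and $\hat{B} = (\hat{b}_i)$ collect the estimated coefficients and constants of the n differential equations, and $X^{(1)}(k)$ and $X^{(0)}(k)$ stack the n accumulated and original values at time k. The paper's Formulas (10) to (15) may differ in detail. The least-squares step needs at least $T - 1 \ge n + 1$ observations, and the response requires $\hat{A}$ to be invertible.

```latex
\begin{aligned}
z_j^{(1)}(k) &= \tfrac{1}{2}\bigl(x_j^{(1)}(k) + x_j^{(1)}(k-1)\bigr), \qquad k = 2, \ldots, T,\\
x_i^{(0)}(k) &= \sum_{j=1}^{n} a_{ij}\, z_j^{(1)}(k) + b_i, \qquad i = 1, \ldots, n,\\
\bigl(\hat{a}_{i1}, \ldots, \hat{a}_{in}, \hat{b}_i\bigr)^{\mathsf{T}} &= \bigl(L^{\mathsf{T}} L\bigr)^{-1} L^{\mathsf{T}} Y_i,\\
L &= \begin{pmatrix} z_1^{(1)}(2) & \cdots & z_n^{(1)}(2) & 1\\ \vdots & & \vdots & \vdots\\ z_1^{(1)}(T) & \cdots & z_n^{(1)}(T) & 1 \end{pmatrix},
\quad Y_i = \begin{pmatrix} x_i^{(0)}(2)\\ \vdots\\ x_i^{(0)}(T) \end{pmatrix},\\
\hat{X}^{(1)}(k) &= e^{\hat{A}(k-1)}\bigl(X^{(1)}(1) + \hat{A}^{-1}\hat{B}\bigr) - \hat{A}^{-1}\hat{B},\\
\hat{X}^{(0)}(1) &= X^{(0)}(1), \qquad \hat{X}^{(0)}(k) = \hat{X}^{(1)}(k) - \hat{X}^{(1)}(k-1), \quad k \ge 2.
\end{aligned}
```

Because $X^{(1)}(1) = X^{(0)}(1)$, the response reproduces the first observation exactly, and evaluating it at $k > T$ gives the forecasts.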
Finally, based on the results for the differential equations given by Formula (13) and the identified parameter matrices $\hat{A}$ and $\hat{B}$ defined by Formula (14), we solve the MGM(1,N) model for accident prediction, as shown in Formula (15).

Variables Selection for Model Optimization
Although correlation analysis of accident impact factors can measure the impact of each factor variable on the accident variable, it does not consider the correlation among the factor variables themselves in the traffic system. However, in traditional regression prediction tasks, prediction accuracy is greatly affected by correlation among the explanatory variables. Inspired by regression tasks, we therefore consider variables selection for model optimization. Algorithm 1 shows the details of the variables selection algorithm, which consists of three steps. First, we rank the grey correlation degree of each impact factor, as discussed in Section 3.1 (line 2). Second, we select the factors whose grey correlation degree is larger than a threshold from all the impact factors and establish different MGM(1,N) models with these factors (lines 3 to 7). Finally, we use the MAPE metric to measure the effectiveness of each model on the validation set in order to quickly find the optimal solution (lines 8 to 13).

Algorithm 1. Variables Selection for Model Optimization.
Input: W, the sequence of grey correlation degrees of the impact factors.
Output: S, the set of selected impact factors that optimizes the performance of the prediction model.
Preliminary: W is represented by [w_1, ..., w_n], where w_i ∈ (0, 1) is the grey correlation degree of the i-th impact factor.
1. begin
2. Rank the sequence W in descending order;
3. Set a threshold θ_0;
4. for each w_i in W do
5.
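Because the listing of Algorithm 1 is cut off after step 4, the following is only a hedged Python sketch of the three steps described in the text, not the authors' implementation. How the candidate factor sets are formed is our assumption (nested prefixes of the ranked factors, which yields MGM(1,2), MGM(1,3), and so on), and the model fitting and validation MAPE are supplied through a hypothetical callback `evaluate`.

```python
from typing import Callable, Dict, List, Tuple


def select_variables(
    weights: Dict[str, float],
    theta0: float,
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Variables selection sketch following the three steps described for Algorithm 1.

    weights:  grey correlation degree of each impact factor (the sequence W).
    theta0:   correlation threshold; factors with w_i > theta0 are kept.
    evaluate: builds an MGM(1,N) model from the accident variable plus the given
              factors and returns its MAPE on the validation set.
    """
    # Step 1: rank W in descending order.
    ranked = sorted(weights, key=weights.get, reverse=True)
    # Step 2: keep the factors whose grey correlation degree exceeds the threshold.
    kept = [f for f in ranked if weights[f] > theta0]
    # Step 3: evaluate candidate models (assumed: top-1, top-2, ... factors)
    # and keep the factor set with the lowest validation MAPE.
    best_set: List[str] = []
    best_score = float("inf")
    for k in range(1, len(kept) + 1):
        candidate = kept[:k]
        score = evaluate(candidate)
        if score < best_score:
            best_set, best_score = candidate, score
    return best_set, best_score
```

Passing the model builder as a callback keeps the selection logic independent of how MGM(1,N) is fitted and validated.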

Model Evaluation
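To compare the proposed method with the comparative methods, we evaluate the models on the prediction task. Specifically, we measure prediction accuracy with three metrics, namely MAPE, MAE, and RMSE, defined as follows:
$$\mathrm{MAPE} = \frac{1}{n}\sum_{k=1}^{n}\left|\frac{x_i^{(0)}(k) - \hat{x}_i^{(0)}(k)}{x_i^{(0)}(k)}\right| \times 100\%$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{k=1}^{n}\left|x_i^{(0)}(k) - \hat{x}_i^{(0)}(k)\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(x_i^{(0)}(k) - \hat{x}_i^{(0)}(k)\right)^2}$$
where $x_i^{(0)}(k)$ and $\hat{x}_i^{(0)}(k)$ respectively represent the ground truth of the variable sequence and the value simulated by the predictive model, and the sums run over the n evaluated time points.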
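The three metrics are straightforward to compute; a minimal NumPy sketch with hypothetical function names follows. MAPE is returned in percent and, like its formula, assumes that no ground-truth value is zero.

```python
import numpy as np


def mape(y, y_hat):
    """Mean absolute percentage error, in percent."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100)


def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))


def rmse(y, y_hat):
    """Root mean squared error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))
```

MAPE is scale-free, which is why it is convenient for comparing candidate models, whereas MAE and RMSE are in accident counts and RMSE penalizes large individual errors more heavily.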

Experiment
In this section, we discuss the performance evaluation of our algorithms. We aim to measure the effectiveness of our proposal for traffic accident prediction.

Dataset
In this paper, the dataset comes from the China Statistical Yearbook and consists of traffic-related statistical data for China as a whole and for each province or city individually from 2004 to 2016. Four impact factors affecting the number of traffic accidents are considered in our work: the number of private cars, the number of taxis, the number of road operating cars, and the urban population. The details of the dataset are shown in Table 1.

Results of Grey Correlation Analysis
First, to eliminate differences in dimension and unit among the factors, we standardize each sequence as $x_i'(k) = \left(x_i(k) - \bar{x}_i\right)/\sigma_i$, where $\bar{x}_i$ and $\sigma_i$ are the mean value and the standard deviation of the variable sequence $x_i$, respectively. Formulas (3) and (4) defined in Section 3.1 are then used to calculate the correlation degree of each impact factor and to identify the main factor variables that strongly influence the reference sequence.
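The standardization step amounts to a z-score per sequence. In the minimal sketch below, the function name is hypothetical, and the use of the population standard deviation (NumPy's default `ddof=0`) is our assumption, since the paper does not say which variant it uses.

```python
import numpy as np


def standardize(x):
    """z-score: subtract the sequence mean and divide by its standard deviation."""
    x = np.asarray(x, dtype=float)
    # np.std uses the population standard deviation (ddof=0) by default;
    # whether the paper uses ddof=0 or ddof=1 is not stated.
    return (x - x.mean()) / x.std()
```

It would be applied separately to the accident sequence and to each factor sequence before computing the grey correlation degrees.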

Results of Model Selection
The results in Table 2 show that the number of private cars has a smaller impact on the number of traffic accidents than the other factors. Thus, by setting a correlation threshold, we exclude this factor from model selection, while the other three factors, namely taxis, road operating cars, and population, are taken as independent influencing variables of traffic accidents. We then build different MGM(1,N) models according to the results of the correlation analysis and use the MAPE metric to measure the effectiveness of each model. The evaluation results in Table 3 indicate that considering the factors with the most influence on the number of accidents optimizes model performance. We attribute this to the collinearity between variables, which greatly affects the predictive ability of the MGM(1,N) model. For example, population growth drives an increase in the number of taxis, and such redundancy among variables makes it difficult for the predictive model to work well. Additionally, the most influential factor-variable is considered to best reflect the changes in the traffic system. Therefore, this paper adopts the MGM(1,2) model with the most influential factor-variable, namely the number of road operating cars.

Results of Traffic Accident Prediction
To evaluate overall performance, we compare our algorithm with four existing methods: the GM(1,1) model, the MGM(1,5) model, a linear regression model, and BP neural networks. The statistics for China from 2004 to 2012 are used as training data, and the data from 2013 to 2016 are used for model evaluation. We measure model effectiveness with the MAE, MAPE, and RMSE metrics by calculating the difference between the fitted values and the ground truth.
The results in Table 4 indicate that our model outperforms the other algorithms in predicting traffic accidents in China. Specifically, our method yields the smallest deviations under the evaluation metrics in Table 4. Meanwhile, the evaluation results show that the traditional grey model GM(1,1) and MGM(1,5) are not well suited to our task because of the complexity and nonlinearity of the transportation system. In addition, our approach performs better than the linear regression model and BP neural networks with respect to MAE and MAPE. We attribute this to the small sample size: our model is better suited to small-sample data, on which the machine learning approaches do not work well.
Additionally, to evaluate the generalization of our proposal and the comparative methods, we conduct comparative experiments on two further datasets drawn from the China Statistical Yearbook described in Section 4.1, namely the traffic-related statistics of Zhejiang Province and Chongqing City. The results are shown in Table 5, which uses the same metrics as the national evaluation in Table 4 (MAE, MAPE, and RMSE). As Table 5 shows, our model is superior to the other four approaches on all metrics.

Results of Trend Prediction
To evaluate the overall performance of our proposal and the comparative methods in time series prediction, we present the trend prediction results for the national and regional datasets. As shown in Figures 2-4, the trend curve produced by our model fits the ground truth curve best over the whole time period on each dataset, which suggests that our proposal captures the dynamic variability of the traffic system well. Note that although the regression model and the neural network-based approach also perform well in time series prediction, the results of these machine learning methods are sensitive to the size of the training set.

Conclusions
In this paper, we propose an effective approach to deal with traffic accident prediction. In particular, we exploit the grey correlation analysis to measure the correlation of factors to accident occurrence. Following this, we explore the collinearity between variables and establish a multivariable grey model MGM(1,N) for accident prediction. We conduct comparative experiments with different existing methods on a real data set containing the Chinese traffic-related statistics from 2004 to 2016 to measure the performance of our proposal, and the results demonstrate the superiority of our proposal with respect to various metrics.

Conflicts of Interest:
The authors declare no conflict of interest.