Short-Term Traffic Flow Prediction Based on a K-Nearest Neighbor and Bidirectional Long Short-Term Memory Model

In previous research on traffic flow prediction, most models mainly studied the time series of traffic flow and did not fully consider its spatial correlation. To solve this problem, this paper proposes a method that captures the spatio-temporal characteristics of short-term traffic flow by combining the k-nearest neighbor (KNN) algorithm with a bidirectional long short-term memory (BILSTM) network. Using real-time traffic flow data observed on motorways in the United Kingdom, the KNN algorithm spatially screens the station data to identify the detection points most highly correlated with the target, and the selected data are then input into the BILSTM model for prediction. The experimental results show that, compared with the SVR, LSTM, GRU, KNN-LSTM, and CNN-LSTM models, the proposed model has better prediction accuracy, with performance improvements of 77%, 19%, 18%, 22%, and 13%, respectively.


Introduction
With the development of the social economy, science and technology, and the acceleration of urbanization, the number of automobiles has increased rapidly. The resulting problems of traffic congestion and right-of-way distribution are also becoming increasingly obvious and seriously affect traffic safety and efficiency. The construction of an intelligent transportation system (ITS) can effectively alleviate road congestion, shorten travel time, reduce pollution, and improve traffic safety. Accurate prediction of short-term traffic flow is the core issue of ITS, which can provide monitoring and technical support of traffic flow in a certain period in the future. Timely and accurate prediction of traffic flow is the basis and prerequisite for traffic management and travel route planning. This can help managers to take appropriate preventive measures to make travelers choose more suitable routes, thus reducing congestion on roads and improving the efficiency of distribution between roads. Among the various applications of ITS, traffic flow prediction has attracted much attention in recent decades. However, this remains a difficult topic for transportation researchers.
Traffic flow forecasting is divided into short-term forecasting and medium- and long-term forecasting, depending on the time interval. Medium- and long-term forecasts are generally made in units of days, weeks, months, or years. Because the interval is large, the data are relatively stable, so these horizons are often used for forecasting. Short-term traffic flow forecasts are generally made at intervals of 5-15 min. Because the interval is short, the data are relatively unstable, highly complex, and subject to large random variation, which makes forecasting more difficult. Given the increasingly complex traffic situation, developing more accurate short-term traffic flow forecasting, so that real-time traffic information can be determined accurately, remains an urgent problem.
Domestic and foreign scholars have carried out extensive research on short-term traffic flow prediction. According to the research content, they can be divided into three ignoring the spatial relationship between the detection points. Ma et al. [19] used periodic component processing to group the data into 288 sample ranges per day. The data in each period were integrated into a matrix, which was input into a CNN model to extract spatial features and finally passed through a fully connected layer into an LSTM model for fusion. Qu et al. [20] proposed mining the potential spatial relations of the context with a supervised learning algorithm and passing the data to a deep neural network for training. Building on [18], Ma et al. [21] used a genetic algorithm to sort the input context factors and convert their importance into weights. A group of historical data was then selected, according to the similarity of weights, as the input to the prediction algorithm.
Inspired by the above models and building on their analysis, this study starts from the spatio-temporal characteristics of traffic flow and proposes a KNN-BILSTM combination model that uses KNN for spatial feature selection and adjusts the encoding vector with an attention mechanism. The method has two parts. First, the KNN algorithm screens the selected data for spatial correlation between sites. Second, by setting different thresholds, the data selected under different K values are used as input to the BILSTM model for prediction, and the result with the smallest error is taken as the final result. Compared with other existing models, the model proposed in this study has better prediction accuracy and is a reasonable model for predicting traffic flow.

KNN Algorithm
As a mature method, the original KNN model is used to solve classification and regression problems and has been applied in many studies. According to [22], the intuitive advantages of the KNN model are that it makes no assumptions about the data distribution, is highly flexible, and is easy to operate. However, the original KNN model lags in time series and cannot fully consider the correlation of nearby road segments, which degrades prediction accuracy. More recent KNN variants consider the spatio-temporal correlation between roads and augment the original model with a Gaussian weighting method, as in Cai et al. [23]. The core idea of KNN is to calculate the distance between feature vectors, find the points closest to the target point, and obtain the result by a weighted average. The distance metric measures how close a state vector in the historical database is to the current state vector. Commonly used metrics include the Chebyshev distance, the Manhattan distance, and the Euclidean distance. Luo X. et al. [17] pointed out that the Euclidean distance can be used to calculate not only arbitrary spatial distances but also distances between short time series; this study therefore uses the Euclidean distance to select correlated traffic flows:

$$d_i = \sqrt{\sum_{k} \left(x_o(k) - x_i(k)\right)^2}$$

where $x_o(k)$ is the traffic flow detected at the target detection section at time $k$, $x_i(k)$ is the traffic flow of the $i$-th detection station in the road network at time $k$, and $d_i$ is the resulting distance between the two series.
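As an illustration of this selection step, the following minimal sketch computes the Euclidean distance between a target flow series and each candidate station series and keeps the K closest stations. The function names, station labels, and toy values are hypothetical and are not taken from the study's dataset.

```python
import math

def euclidean_distance(x_o, x_i):
    """Euclidean distance between two equally long flow series."""
    if len(x_o) != len(x_i):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_o, x_i)))

def k_nearest_stations(x_o, stations, k):
    """Return the k (station_id, distance) pairs closest to the target series."""
    ranked = sorted(
        ((sid, euclidean_distance(x_o, series)) for sid, series in stations.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:k]

# Toy example: three candidate stations, three time steps each (made-up values).
target = [10.0, 12.0, 15.0]
stations = {
    "S1": [11.0, 12.0, 15.0],
    "S2": [10.0, 16.0, 18.0],
    "S3": [10.0, 12.0, 17.0],
}
nearest = k_nearest_stations(target, stations, k=2)
print(nearest)
```

A smaller distance is read here as a stronger correlation with the target detection point, which is how the ranking is used in the rest of the method.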

BILSTM Algorithm
Traffic flow data are time-series data in which information can be passed between adjacent nodes. In recurrent neural networks (RNNs), the output of a neuron at the previous moment can be used as its input at the next moment, which gives the RNN a memory function in short-term time series prediction. However, RNNs cannot preserve long-term historical data and perform poorly on long-term memory because of vanishing and exploding gradients. To overcome this shortcoming, a modified RNN, the LSTM, was proposed; it allows memory cells to determine when to forget certain information and thereby to determine the optimal delay for time-series problems. A typical LSTM consists of an input layer, a recurrent hidden layer with memory blocks as the basic unit, and an output layer. The memory block contains self-connected memory cells that store the temporal state and three adaptive multiplicative gates (input, output, and forget gates) that control the flow of information in the block. The three gates provide analogues of write, read, and reset operations on the block, and the multiplicative gates learn when to open and close. As pointed out in [24], LSTM memory cells can therefore store and access information over long periods of time, mitigating the vanishing gradient problem. At each time step the LSTM computes, in order, the forget gate, the input gate, the unit (cell) state, and the output gate.

The one-way LSTM model derives information for a given time only from historical data. In traffic flow prediction, however, an accurate prediction benefits from taking into account not only the historical information before the current moment but also the information from later moments. Therefore, the BILSTM model was proposed on the basis of the LSTM model to improve prediction accuracy.
The BILSTM model uses two LSTM layers and transmits information simultaneously in the forward and backward directions. The forward pass is calculated from time 1 to time t, and the output at each time step is retained. The backward pass is calculated from time t to time 1, and the output at each time step is likewise retained. Finally, the output state variables of the two passes are concatenated as the final result. Table 1 lists the structural parameters of the model network.
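For reference, the forget gate, input gate, unit state, and output gate of an LSTM are commonly written as below. This is the widely used textbook formulation and may differ in notation from the one intended here; the weight matrices $W$, biases $b$, sigmoid $\sigma$, and element-wise product $\odot$ are notational assumptions.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) && \text{(candidate state)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(unit state)} \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh\left(C_t\right) && \text{(hidden output)}
\end{aligned}
```

Here $C_{t-1}$ is the state of the previous memory cell and $h_{t-1}$ is the previous hidden output.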

BILSTM Model Prediction Process
When the BILSTM model makes a prediction, as shown in Figure 1, it takes the input samples $x_{t-1}$, $x_t$, $x_{t+1}$ and processes them with two separate LSTM units:
(a) The samples $x_{t-1}$, $x_t$, $x_{t+1}$ are input into the forward LSTM unit in sequence, yielding the forward state outputs $\overrightarrow{h}_{t-1}$, $\overrightarrow{h}_t$, $\overrightarrow{h}_{t+1}$.
(b) For the backward LSTM unit, the samples are input in the reverse order $x_{t+1}$, $x_t$, $x_{t-1}$, yielding the backward state outputs $\overleftarrow{h}_{t+1}$, $\overleftarrow{h}_t$, $\overleftarrow{h}_{t-1}$.
(c) The two sets of output state variables, which have the same dimension, are spliced (concatenated) to obtain the final output $h_t = [\overrightarrow{h}_t, \overleftarrow{h}_t]$.
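The forward pass, backward pass, and splicing described above can be sketched with a deliberately tiny LSTM cell. This is an illustrative sketch only: it uses a scalar input and a hidden size of 1, the parameters are arbitrary fixed numbers rather than trained weights, and all function names are hypothetical.

```python
import math

GATES = ("f", "i", "g", "o")  # forget gate, input gate, candidate state, output gate

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_params(w, u, b):
    """Hypothetical helper: reuse the same (w, u, b) for every gate, for brevity."""
    return {gate: (w, u, b) for gate in GATES}

def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell (input size 1, hidden size 1)."""
    pre = {gate: w * x + u * h_prev + b for gate, (w, u, b) in params.items()}
    f = sigmoid(pre["f"])      # forget gate
    i = sigmoid(pre["i"])      # input gate
    g = math.tanh(pre["g"])    # candidate cell state
    o = sigmoid(pre["o"])      # output gate
    c = f * c_prev + i * g     # new cell state
    h = o * math.tanh(c)       # new hidden state
    return h, c

def run_lstm(xs, params):
    """Run the cell over a sequence and keep the hidden state at every step."""
    h, c, outputs = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs

def bilstm(xs, p_fwd, p_bwd):
    """Pair forward and backward hidden states for each time step."""
    forward = run_lstm(xs, p_fwd)
    backward = run_lstm(xs[::-1], p_bwd)[::-1]  # re-align to the original time order
    return list(zip(forward, backward))

xs = [0.2, 0.5, 0.9]                 # stands in for x_{t-1}, x_t, x_{t+1}
p_fwd = make_params(0.5, 0.1, 0.0)   # arbitrary, untrained parameters
p_bwd = make_params(-0.3, 0.2, 0.1)
states = bilstm(xs, p_fwd, p_bwd)
for step, (h_fwd, h_bwd) in enumerate(states):
    print(step, round(h_fwd, 4), round(h_bwd, 4))
```

Reversing the backward outputs before pairing is the detail that keeps each concatenated state tied to a single time step: the forward component has only seen earlier samples, and the backward component has only seen later ones.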

KNN-BILSTM Algorithm
Assume that there are N detection points in the selected highway network. We take one of them as the target detection point and arrange all traffic flow data into a spatio-temporal correlation matrix; the matrix of traffic flow data on the d-th day is denoted $T_d$. After the road network data were determined, the KNN algorithm was used for spatial correlation screening. By setting different spatial thresholds, the filtered data samples were input into the BILSTM model for training, and the best result was selected as the final test result. The specific steps of the algorithm are as follows:
(1) The traffic flow samples selected from the road network were averaged over the total number of days to obtain the average flow at each detection point.
(2) The Euclidean distance between the average sample flow of the target detection point and that of each other detection point was determined.
(3) A multidimensional spatial data search with KDTree in the NS method was used to build a spatial matrix.
(4) The KNN model was built using the above method, and the correlation calculation was performed.
(5) The MIV evaluation index was used to evaluate the correlations and sort them by size.
(6) For K = 1, 2, 3, ..., n, the K most correlated detection points were selected in descending order of correlation, and the sample data matrix corresponding to each K value was input into the BILSTM model for training.
(7) The first 80% of the selected data were used as training data, and the rest were used as test data.
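The screening steps can be sketched as follows. This is a simplified sketch under stated assumptions: a brute-force distance ranking stands in for the KD-tree search and the MIV ranking, the function names are hypothetical, and the toy flows are made up.

```python
import math

def mean_profile(days):
    """Step (1): element-wise mean of several daily flow series for one detection point."""
    n_days = len(days)
    return [sum(day[k] for day in days) / n_days for k in range(len(days[0]))]

def rank_by_distance(target_id, daily_flows):
    """Step (2): rank the other detection points by distance between mean profiles."""
    profiles = {sid: mean_profile(days) for sid, days in daily_flows.items()}
    target = profiles[target_id]
    distances = {
        sid: math.sqrt(sum((a - b) ** 2 for a, b in zip(target, prof)))
        for sid, prof in profiles.items()
        if sid != target_id
    }
    return sorted(distances, key=distances.get)  # closest first

def candidate_inputs(target_id, daily_flows, max_k):
    """Step (6): for K = 1..max_k, list the K detection points closest to the target."""
    ranking = rank_by_distance(target_id, daily_flows)
    return {k: ranking[:k] for k in range(1, max_k + 1)}

def chronological_split(samples, train_fraction=0.8):
    """Step (7): the first 80% of samples for training, the rest for testing."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

# Toy data: two days, two time steps per day, four detection points (made-up values).
daily_flows = {
    "T": [[10, 20], [12, 22]],   # target detection point
    "A": [[11, 21], [13, 23]],
    "B": [[30, 40], [32, 42]],
    "C": [[10, 25], [12, 27]],
}
candidates = candidate_inputs("T", daily_flows, max_k=3)
train, test = chronological_split(list(range(10)))
print(candidates)
print(len(train), len(test))
```

Each entry of `candidates` would then be used to assemble one input matrix for the BILSTM model, and the K value with the smallest prediction error would be kept.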

Experimental Data
To verify the effectiveness of the KNN-BILSTM model proposed in this paper, we tested the algorithm experimentally using traffic flow data from British motorways. The dataset includes real-time recordings of time, flow, and speed from all detection points on all motorways in the UK. Fifteen detection points in the interchange area of the M25 and the M23 near WARWICKWOLD in London, England, were selected as the research object. The position of each detection point is shown in Figure 3. All selected detection points record traffic flow in the same direction, and because the traffic flow is affected by vehicles entering and leaving the overpass, detection point 3310 B was used as the target detection point. Because of the periodicity of traffic flow data, the traffic flow data from 1 January 2021 to 30 April 2021 were selected as the experimental data; the data were recorded at 15 min intervals, and the number of samples in the current traffic flow sequence was 4 × 24 × 31. Eighty percent of the data were used as training data, and the rest were used as test data.

To evaluate the performance of the model in predicting traffic flow, the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) were used as evaluation indicators, defined as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

where $n$ is the number of samples, $y_i$ is the real value of the data, and $\hat{y}_i$ is the predicted value of the data sample.
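The three indicators can be computed as in the short sketch below, which uses the standard definitions of MAE, RMSE, and MAPE. The function names and sample values are illustrative and are not results from the study.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; undefined if any true value is 0."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)

# Made-up flows (vehicles per interval) for illustration only.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 380.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```

Because MAPE divides by the real value, intervals with zero observed flow need to be excluded or handled separately before it is computed.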

Experimental Platform
The experiments were run on a computer with an 11th Gen Intel(R) Core(TM) i7-11800H CPU @ 2.30 GHz, an NVIDIA GeForce RTX 3060 Laptop GPU (compute capability 8.6), and 16 GB of RAM. The software environment was MATLAB R2021b.

Experimental Environment Parameter Settings
Across the experiments, the BILSTM model was configured with three layers: an input layer, a hidden layer, and an output layer, with a fully connected layer used to adjust the dimension of the output layer. The parameters were set as follows: the Adam optimizer was used; the number of training iterations was 400; the batch size was 24; the initial learning rate was 0.005; the learning-rate drop period was 100, and at each drop the learning rate was set to the current learning rate × a weight factor of 0.8; and the dropout value was 0.2.
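The stepwise learning-rate adjustment can be illustrated with the small sketch below. It assumes that the drop period is counted in the same units as the 400 training iterations and that the first drop happens once 100 of them have been completed; the function name is hypothetical.

```python
def learning_rate(step, initial=0.005, drop_period=100, drop_factor=0.8):
    """Piecewise-constant schedule: multiply by drop_factor after every drop_period steps."""
    return initial * drop_factor ** (step // drop_period)

# Learning rate at the start of each 100-step block of a 400-step run.
schedule = [learning_rate(step) for step in (0, 100, 200, 300)]
print(schedule)
```

Under these assumptions the rate falls from 0.005 to 0.005 × 0.8³ over the run, so the final quarter of training uses roughly half the initial step size.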

Experimental Results
Following the experimental process in Figure 2, the predicted and actual traffic flow values are shown in Figure 4, where the predicted flow is drawn as a solid red line and the actual flow as a dotted black line. Figure 4a shows that the predicted traffic flow is in good agreement with the actual traffic flow. Sub-graphs (b) and (c) show that the training and prediction errors are relatively small: the training error stays mostly between −5 and 5 vehicles and the test error mostly between −4 and 4 vehicles, which indicates that the KNN-BILSTM model proposed in this study is suitable for traffic flow forecasting.
To show the details of the prediction fit in Figure 4 more clearly, we restricted the prediction to 96 samples. The results are shown in Figure 5, where the y-axis represents the number of vehicles at each timestamp and the x-axis represents the output samples of the test set (the number of vehicles detected at each timestamp counts as one sample).
Appl. Sci. 2023, 13, x FOR PEER REVIEW 10 of 16
To verify the influence of different K values in the KNN algorithm on traffic flow prediction, different values of K, that is, different numbers of spatially correlated detection points, were selected for prediction. The results show that the K value has a significant impact on prediction performance. As can be seen in Figure 6, the effect is best and the loss value is lowest when K = 8, that is, when the number of relevant detection points is 8.
To observe the data correlation between all detection points more intuitively, we selected a total of 192 samples from the first two days at all detection points. Reading Figure 7 from bottom to top, the amplitude of the lower eight line segments, including the target point, is small, indicating that their data are strongly correlated. The amplitude of the upper seven line segments increases, which further supports the conclusion obtained for K = 8. In this case, the corresponding detection point numbers are 4465A, 4455A, 4459A, 4451L, 4470A, 4461L, 4453L, and 4475A, and they are all located in the upstream and downstream sections of the target detection point. The spatial correlation between the selected detection points and the target detection point at different K values is shown in Figures 6 and 8 and Table 2.
To further evaluate the effectiveness of the KNN-BILSTM model, this paper uses MAE, RMSE, and MAPE as the evaluation indicators and the SVR, LSTM, GRU, KNN-LSTM, CNN-LSTM, and AT-convLSTM models as comparison models. Thirty separate experiments were conducted on the 5 groups of models, and the average value of each evaluation indicator over the 30 runs was used for verification.
The current literature on traffic flow prediction can be roughly divided into two categories. The first is prediction based on Euclidean space data, and the second is prediction based on non-Euclidean space data, for example with graph convolutional neural networks and their variants. In Euclidean-space prediction, there are many studies of classic models such as KNN, CNN, and LSTM. With the popularization of attention mechanisms, more researchers have added attention to CNN, KNN, LSTM, and other models and achieved better results. Therefore, we first selected AT-convLSTM as a representative model for comparison. For the AT-convLSTM model, two groups of experiments were carried out. In the first, a new dataset was introduced; it has the same structure as the dataset in [25], and both belong to the public data of PEMS Expressway District 10. The new dataset was fed into the KNN-BILSTM model for training without considering the periodic influence of multiple components; the resulting MAE and RMSE values were 4.0442 and 4.6293, respectively, which are better than the 7.14 and 9.69 reported in the literature (Zheng et al., 2020). The error diagram is shown in Figure 9. In the second group, the dataset of this paper was fed into the AT-convLSTM model. The training results are shown in Figure 10; the AT-convLSTM model performs only moderately on this dataset.
Figure 9. New data prediction error.

The results of the other comparison models are shown in Figure 11 and Table 3. The model used in this study has the best predictive effect, and the SVR model has the worst fit. In terms of MAE, the proposed model improves on the CNN-LSTM, KNN-LSTM, LSTM, GRU, and SVR models by 13%, 22%, 19%, 18%, and 77%, respectively. In terms of running time, the SVR model takes the least time, but its accuracy is the worst, which indicates the shortcomings of SVR in predicting and processing high-dimensional data. Comparing CNN-LSTM, KNN-LSTM, and the proposed model, KNN-LSTM has a shorter running time and converges quickly, but because its LSTM can only propagate in one direction during training, its result is 22% worse than that of the BILSTM model with two-way propagation. The prediction effects of the LSTM and GRU models are close, and the GRU model converges faster than the LSTM model; however, the proposed model still outperforms GRU by 18%, so the faster training cannot make up for the performance gap. Overall, the proposed model performs better than the listed comparison models.

Conclusions
In this study, the KNN-BILSTM combined model is used to predict the traffic flow of expressway sections. Considering the spatio-temporal characteristics of traffic flow data, the KNN model is used to examine the spatial correlation of the detection points on the road section. By selecting different K values, the detection points are ranked by their correlation with the target detection point, and the two-way propagation of the BILSTM model is used to fully consider the forward and backward patterns of traffic flow. When analyzing the time series of road section detection points, this works better than the LSTM model, which only considers forward propagation. The prediction results obtained after training are compared with those of the SVR, LSTM, GRU, KNN-LSTM, and CNN-LSTM models. The experimental results show that the model proposed in this study has good prediction accuracy. By predicting expressway traffic flow, managers can obtain effective information in a short time and control and guide traffic promptly, thereby effectively alleviating congestion. As a time series prediction model, the model is also applicable to other traffic time series forecasts, such as urban taxi traffic and subway passenger flow.
Although the prediction effect of the model proposed in this paper is good, it has some limitations. First, the prediction target of this paper is a single feature only, and the impact of weather and traffic accidents is not analyzed. Second, the model still predicts using Euclidean spatial data, whereas in actual traffic scenes, prediction on non-Euclidean spatial data may be closer to reality. For the first problem, future work will add other factors that influence traffic flow to improve the model; weather, traffic speed, road blockage, and other factors will make the prediction problem more complex but the predictions more accurate. For the second problem, we will consider using a new model, such as a graph convolutional neural network, to predict traffic flow from non-Euclidean spatial data. We also plan to propose a model that can accurately deal with data lost due to influencing factors.