Short-Term Demand Forecasting of Urban Online Car-Hailing Based on the K-Nearest Neighbor Model

Accurately forecasting the demand for urban online car-hailing is of great significance for improving operational efficiency and reducing traffic congestion and energy consumption. This paper takes 265 days of order data from the Hefei urban online car-hailing platform, collected between 2019 and 2021, as an example, and divides each day into 48 time units (30 min per unit) to form a data set. Taking the minimum mean absolute percentage error as the optimization objective, the historical data sets are classified, and the dimension of the state vector T and the parameter K of the K-nearest neighbor model are optimized, which addresses the prediction error caused by fixed values of T or K in the traditional model. The results show that the forecasting accuracy of the proposed K-nearest neighbor model reaches 93.62%, which is much higher than that of the exponential smoothing model (81.65%) and the KNN1 model (84.02%) and similar to that of the LSTM model (91.04%), meaning that the model can adapt to the urban online car-hailing system and has potential application value.


Introduction
As an important part of the urban transportation system, online car-hailing has become the transportation choice of more and more urban residents. In 2020, there were 214 online car-hailing platforms across China, with an average of 21 million daily orders. Accurately forecasting the travel demand for online car-hailing is of great significance for reducing vehicle idling, improving operational efficiency, and reducing traffic congestion and energy consumption [1][2][3][4]. Reasonable forecasting results can provide data support for vehicle scheduling and allocation, which helps to solve problems caused by asymmetric supply and demand and to maximize the benefits for passengers, drivers, and ride-hailing platforms [5].
In the early stages of the development of online car-hailing, many scholars used questionnaires or interviews to judge the future development and changing trends of the scale of travel. However, the survey process suffers from low efficiency, poor timeliness and, in particular, the lack of an accurate description of travel demand [6]. With the accumulation of historical data on online car-hailing, scholars have carried out quantitative forecasting research on its operation status. The main representative models are the time series model [7], the historical average model, the Kalman filter model, and the linear regression analysis method [8]. In recent years, intelligent forecasting models have become widely used in urban traffic prediction; they mainly comprise neural networks, non-parametric regression prediction methods, support vector machines, and other methods [7]. The long short-term memory network (LSTM) is a recurrent neural network that is suitable for processing and predicting important events with relatively long intervals and delays in time series, and is therefore suitable for traffic flow modeling and prediction. The unique unit structure of LSTM provides great advantages in dealing with short-term traffic flow prediction problems. Ma  appeared in a relatively narrow time period and that T was below 10 time periods in most cases, but did not propose a specific calculation method for the T value. By traversing the possible T values, Wang X et al. (2019) proposed a method for selecting the T value based on the smallest prediction error, which has great reference significance [20].
In the K-nearest neighbor prediction algorithm, the value of K affects the accuracy of short-term traffic flow prediction, so choosing an appropriate K value plays a crucial role. There are many approaches to the selection of the K value. Zhou X et al. (2006) argue that the randomness of traffic flow is too high, and propose setting K to different values according to different modes, thereby optimizing the prediction accuracy [21]. Yu B et al. (2012) argue that once the K value meets the expected value, a smaller K value can be taken to improve the calculation speed of the model [22]. Zhu B (2019) used a K-means clustering algorithm to divide the passenger flow into data sets under different conditions, and then performed K-fold cross-validation on each data set to determine the number of neighbors K under each condition [23]. Lei S (2017) took 20 sets of data from the historical database according to three traffic states for experimental verification; K values from 5 to 12 were used to carry out short-term traffic flow predictions, and the results were compared. Compared with a constant K value, the prediction method using a variable K value has better prediction accuracy [24]. The above scholars have proposed different approaches to K-value calculation from different perspectives, but the core idea remains basically the same: set a range of K values, determine the evaluation method of prediction accuracy, traverse all K values, and select the optimal K value with the goal of achieving the highest prediction accuracy or a prediction accuracy above a certain threshold.
Existing research provides good methods for the short-term forecasting of urban online car-hailing demand, but there are also some shortcomings. Some methods, such as the time series method, perform well in long-term forecasting but are not fully adapted to the large data volume, strong volatility, and strong timeliness of online car-hailing operations, and show a large deficiency in terms of prediction delay [18]. The K-nearest neighbor model can better adapt to these features. The algorithm is simple and the theory is mature; it can be used for classification and regression, and it is well suited to the automatic classification of class domains with large sample sizes. However, there is still room for optimization in the classification of the historical data set, the state vector dimension T, and the K value in existing K-nearest neighbor methods. Therefore, based on the order data of an urban online car-hailing platform and the characteristics of online car-hailing operation, this paper explores the optimization of the K and T values to improve the K-nearest neighbor model and forecast the demand for online car-hailing. First, the data set is divided into n categories within each day; secondly, the state vector dimension T takes values from 1 to n − 1, and the prediction error is calculated under each value. At the same time, for each T, the possible value range of K is traversed, and the T and K with the highest prediction accuracy are found. Finally, the errors of different forecasting methods are compared in order to verify the soundness and feasibility of the research method.

Basic Idea of the K-Nearest Neighbor Algorithm
The K-nearest Neighbor (KNN) algorithm is an efficient non-parametric classification algorithm proposed by Cover and Hart (1967) [25]. It makes predictions by searching for the K records in a historical database that are most similar to the feature vector of the predicted value. It has strong stability and has been widely used in classification, regression, and pattern recognition in recent years. Based on the improved K-nearest neighbor model algorithm, this paper establishes a short-term forecasting model for online car-hailing demand, and constructs the basic flow of the algorithm for the short-term prediction of urban online car-hailing demand as follows: (1) The original database is cleaned of historical orders, and one day is divided into 48 units (30 min per unit), which builds an order data set; (2) The search mechanism of the model is determined, which is composed of the state vector, the distance measurement method, the value of the state vector T and the number of neighbors K; (3) The K nearest neighbor prediction algorithm is determined and the prediction result is calculated; (4) The mean absolute percentage error (MAPE) is used as an indicator in order to evaluate the prediction results, and a comparative analysis is conducted.
The algorithm flow chart is shown in Figure 1.

Classification of Historical Data
This paper divides the 24 h of a day into 48 time units of 30 min each, counts the number of online car-hailing orders in each time unit from the platform, and analyzes the change trends. In general, prediction accuracy increases as the classification of the data set becomes finer, so the data are divided into 48 categories in order to improve the prediction accuracy, that is, 1 (00:00), 2 (00:30), 3 (01:00), 4 (01:30), ..., 48 (23:30), as shown in Figure 2.
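The mapping between a timestamp and its category can be sketched as follows. This is a minimal illustration rather than the platform's own code; the function names `time_unit` and `unit_label` are hypothetical.

```python
from datetime import datetime


def time_unit(ts: datetime) -> int:
    """Return the 1-based half-hour category (1..48) that contains ts."""
    return ts.hour * 2 + ts.minute // 30 + 1


def unit_label(unit: int) -> str:
    """Return the start time of a category as HH:MM, e.g. category 1 is 00:00."""
    if not 1 <= unit <= 48:
        raise ValueError("unit must be in 1..48")
    minutes = (unit - 1) * 30
    return f"{minutes // 60:02d}:{minutes % 60:02d}"
```

Under this numbering, an order placed at 14:47 falls into category 30, whose start time is 14:30.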

Constructing the State Vector
The state vector is the standard for comparing the current data with the historical data. Generally, the factors that are most relevant to the prediction object are selected for prediction [22]. The real-time data on the forecast day fully reflect the change trend of passenger flow. Therefore, the nearest neighbors can be found in the historical database through the variation of passenger flow presented by the real-time data, and the passenger flow at the next time unit can be calculated from the change pattern of the real-time and historical data; the state vector is constructed on this basis.
In the formula, n represents the nth day, and n = 0 represents the forecast day; $x_{n(T-1)}$ is the number of urban online car-hailing orders in period T − 1 on the nth day before the forecast date. Because T must be smaller than the dimension of the data set (48), the value range of T is [1,47]. In order to obtain the optimal T value, this paper traverses all T values with the highest prediction accuracy as the goal.
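One plausible reading of this construction, offered only as a sketch, is that the state vector holds the T order counts observed immediately before the time unit to be predicted. The function name `state_vector` and the use of a flat chronological series (so that early-morning units can reach back into the previous day) are assumptions, not details confirmed by the text.

```python
def state_vector(series, idx, T):
    """Return the T counts immediately before position idx, oldest first.

    series is a flat, chronologically ordered list of half-hour order counts;
    idx is the position of the value to be predicted (assumed layout).
    """
    if not 1 <= T <= 47:
        raise ValueError("T must lie in [1, 47]")
    if idx < T:
        raise ValueError("not enough history before idx")
    return series[idx - T:idx]
```

With T = 3, for example, the vector for position 50 consists of the counts at positions 47, 48 and 49.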

Distance Measurement Method
The distance measurement method is used to measure how close each historical sample in the historical database is to the current data. Many previous studies have chosen the Euclidean distance as the distance measurement method [16][17][18]. The Euclidean distance aligns two time series by time point and takes the sum of the Euclidean distances between the same time points as the distance between the two series, which makes it suitable for comparing online car-hailing demand at different time points.
In the formula, $d_n$ is the distance between the data from each period of the forecast day and the data from each period of the historical day, $x_{ni}$ is the number of online car-hailing orders in the city in the ith period of the nth day before the forecast date, and $x_{0j}$ is the online car-hailing order volume in the city in the jth period of the forecast day.
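As an illustration, the point-by-point Euclidean distance between two state vectors of equal length can be computed as below. The sketch assumes the standard form, the square root of the summed squared differences at matching time points; the function name `euclidean_distance` is hypothetical.

```python
import math


def euclidean_distance(current, historical):
    """Euclidean distance between two state vectors aligned by time point."""
    if len(current) != len(historical):
        raise ValueError("state vectors must have the same dimension T")
    return math.sqrt(sum((c - h) ** 2 for c, h in zip(current, historical)))
```

A historical day whose recent order counts match the forecast day exactly has distance 0 and is the closest possible neighbor.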

Evaluation Method
Commonly used model evaluation indicators are the Mean Absolute Error (MAE), Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE), and Mean Squared Percentage Error (MSPE). MAPE is standardized on the basis of the other evaluation indicators, so it reflects the prediction accuracy and the differences between models more intuitively and has good adaptability. Therefore, this paper uses the MAPE as the performance indicator of the model. The smaller the MAPE, the better the model. Its calculation formula is as follows:
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{x_i - \hat{x}_i}{x_i}\right| \times 100\% \quad (3)$$
In the formula, n is the number of samples, $x_i$ is the actual value of the sample, and $\hat{x}_i$ is the predicted value of the sample.
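A short sketch of the MAPE calculation is given below, assuming all actual values are non-zero; the function name `mape` is illustrative.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is non-zero; a zero actual value would
    raise ZeroDivisionError.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

For instance, predicting 90 and 220 for actual values of 100 and 200 gives an error of 10% on each sample, and therefore a MAPE of 10%.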

Prediction Algorithms
The prediction algorithm describes how the K nearest neighbors found by the search are used to predict the demand at the next moment. In the formula, $\hat{x}_0$ is the predicted value for the current data, $x_i$ is the order quantity corresponding to the ith neighbor found in the historical database, and $d_i$ is the distance between the current data and the ith neighbor.
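Since the prediction depends only on the neighbor values and their distances, one common choice is inverse-distance weighting, in which closer neighbors contribute more. The sketch below assumes that weighting; it is not confirmed to be the exact formula used here, and the function name `knn_predict` and the handling of zero distances are illustrative choices.

```python
def knn_predict(neighbors):
    """Inverse-distance-weighted forecast from the K nearest neighbors.

    neighbors is a list of (distance, next_value) pairs. Neighbors at
    distance 0 match the current state exactly, so their values are averaged
    directly to avoid dividing by zero.
    """
    if not neighbors:
        raise ValueError("at least one neighbor is required")
    exact = [value for dist, value in neighbors if dist == 0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / dist for dist, _ in neighbors]
    weighted = sum(w * value for w, (_, value) in zip(weights, neighbors))
    return weighted / sum(weights)
```

With two neighbors at distances 1 and 3 and next-period values 100 and 200, the weights are 1 and 1/3, so the forecast is pulled toward the closer neighbor.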

Calibration of Adaptive K Value and T Value
The way in which reasonable K and T values are chosen is key to the prediction model [22]. In this paper, the prediction scenarios are classified, the prediction error under different K and T values is calculated for each category, and the prediction accuracy is compared to select the best values.
Step 1: The prediction scenarios are classified into N categories, and the current category is set to n = n i .
Step 2: The dimension T of the state vector in the scenario is set: T = T i , T i ∈ [1, 47].
Step 3: The number of neighbors is set: K = K j , K j ∈ [1, K max ], where K max is determined according to the quantity of historical data.
Step 4: Any one day D e from the historical data set is selected as the test data set, and the other n − 1 days are selected as the training data set. Each day contains 48 pieces of data, representing the urban online car-hailing orders in the 48 half-hour periods of the day.
Step 5: The mean absolute percentage error (MAPE) is calculated for the test data set D e under T i and K j .
Step 6: Steps 4 and 5 are repeated so that the MAPE is calculated for all test data sets D e under T i and K j .
Step 7: Steps 2 to 6 are repeated for all T i and K j ; the T i and K j that yield the minimum MAPE are the optimal values for the data set in prediction scenario n i .
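The steps above can be sketched as a leave-one-day-out grid search over T and K for a single category. This is a simplified illustration under stated assumptions, not the authors' implementation: days are treated as consecutive so that a state vector may reach back into the previous day, neighbors are combined by inverse-distance weighting, and the names `window`, `predict` and `calibrate` are hypothetical.

```python
import math

UNITS = 48  # half-hour time units per day


def window(flat, day, unit, T):
    """T counts immediately before (day, unit) in the flat series, or None."""
    pos = day * UNITS + unit
    return flat[pos - T:pos] if pos - T >= 0 else None


def predict(flat, n_days, test_day, unit, T, K):
    """KNN forecast for (test_day, unit) using all other days as history."""
    current = window(flat, test_day, unit, T)
    if current is None:
        return None
    candidates = []
    for day in range(n_days):
        hist = window(flat, day, unit, T)
        if day == test_day or hist is None:
            continue
        candidates.append((math.dist(current, hist), flat[day * UNITS + unit]))
    if not candidates:
        return None
    nearest = sorted(candidates)[:K]
    # Small constant keeps the weight finite when a neighbor matches exactly.
    weights = [1.0 / (dist + 1e-9) for dist, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)


def calibrate(days, unit, t_values, k_max):
    """Return (best_T, best_K, best_MAPE) for one time-unit category."""
    flat = [count for day in days for count in day]
    best = None
    for T in t_values:                       # Step 2
        for K in range(1, k_max + 1):        # Step 3
            errors = []
            for test_day in range(len(days)):  # Steps 4 to 6
                pred = predict(flat, len(days), test_day, unit, T, K)
                actual = days[test_day][unit]
                if pred is not None and actual:
                    errors.append(abs(actual - pred) / actual)
            if errors:
                score = 100.0 * sum(errors) / len(errors)
                if best is None or score < best[2]:  # Step 7
                    best = (T, K, score)
    return best
```

Calling `calibrate(days, unit, range(1, 48), 30)` for each of the 48 units would reproduce the traversal described above; the cost grows with the number of days, T values and K values, so a smaller grid is useful for quick checks.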

Research Data
The data originate from the online car-hailing supervision platform. The research area is Hefei City, the capital of Anhui Province, a mega city with a permanent population of 8.087 million. In October 2018, 19 online car-hailing platforms in Hefei City obtained online car-hailing business licenses, and about 300,000 vehicles were registered. There are about 15,000 active vehicles, and about 200,000 online car-hailing orders are completed every day in Hefei [26]. The data cover 265 days between 1 August 2019 and 4 February 2021. One piece of order data is generated every 30 min, so 48 pieces of order data are generated in one day. The original data include three fields: the first column is "date", the second column is "time", and the third column is "order volume". The original data are shown in Table 1 below.
The raw data do not meet the conditions for direct analysis and must be converted into standardized data. First, null values, abnormal values, and similar records are deleted from the original order data; secondly, considering that the data during the COVID-19 epidemic in early 2020 were quite different, the order outliers during the epidemic are removed; finally, the strings with time information are converted into timestamps in order to unify the data format, and the data scattered across different columns and rows are integrated into one set of data for each day. Each set contains one record for every 30 min within the 24 h of a day, forming 48 pieces of order volume data, with a total of 265 days of data, some of which are shown in Table 2 below.
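The standardization described above can be illustrated with a small sketch that turns (date, time, order volume) rows into one 48-value list per day. The string formats `%Y-%m-%d` and `%H:%M`, the rule that only missing or negative volumes count as abnormal, and the function name `build_daily_series` are assumptions made for the example; the actual layout of Table 1 may differ.

```python
from collections import defaultdict
from datetime import datetime


def build_daily_series(rows):
    """Group (date, time, volume) rows into {date: [48 counts]}.

    Rows with a missing or negative volume are dropped, and days that do
    not end up with all 48 half-hour records are discarded.
    """
    by_day = defaultdict(dict)
    for date_str, time_str, volume in rows:
        if volume is None or volume < 0:
            continue
        ts = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M")
        unit = ts.hour * 2 + ts.minute // 30  # 0-based half-hour index
        by_day[ts.date()][unit] = volume
    return {
        day: [units[u] for u in range(48)]
        for day, units in sorted(by_day.items())
        if len(units) == 48
    }
```

Discarding incomplete days keeps every remaining day comparable across all 48 categories, at the cost of losing days with a single missing record; interpolation would be an alternative design.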

Constructing the State Vector
The reasonable construction of the state vector is an important factor affecting prediction accuracy [27]. In contrast to studies that set the dimension T of the state vector to a fixed value, this paper adopts a variable T, which takes values from 1 to n − 1. The prediction of the 23:30 data entry is taken as an example, and the state vector T at 23:30 can be ; that is, the state vector of 23

Selection of K Value and T Value
If the K value is too large or too small, prediction accuracy suffers. If K max (the maximum value of the parameter K) is too large, the traversal takes too long and the algorithm becomes inefficient. The value of K max is generally 20%-40% of the overall data sample [17][18][19], and this paper sets K max = 30.
When T = 1, the state vector has only one value, and the number of search data sets is 1. Based on the standardized 265-day data, the adaptive K-value algorithm is applied as follows. The first day is used as the test set, and the remaining 264 days are used as the training set. The MAPE for K = 1 to K = 30 is calculated for each 30 min time unit of the test day. Then, the next day is used as the test set, and the remaining 264 days are used as the training set; the corresponding MAPE for K = 1 to K = 30 is calculated in the same way. These steps are repeated until all 265 days have been traversed. From Formula (3), the prediction accuracy of all data in the 265 days under the condition T = 1 is obtained, and the average prediction accuracy over the 265 days is calculated separately for each of the 48 time units. When T = 2, the state vector has two values, and the number of search data sets is 2. The above steps are repeated to obtain the prediction accuracy of all data in the 265 days for T = 2, and the average prediction accuracy over the 265 days is again calculated separately for each of the 48 time units.
In the same way, the corresponding average prediction accuracy is obtained for T = 3 to T = 47, and some data are shown in Table 3. The results over the T range [1,47] are shown in Figure 3 below.
It can be seen from Figure 3 that the average error varies considerably with the T value, and the overall MAPE fluctuates between 7.7% and 17.3%. Taking the prediction data at 00:00 as an example, the optimal T value and the optimal K value were obtained with the highest prediction accuracy (minimum MAPE) as the goal. The prediction results are shown in Figure 4. The color depth in the figure represents the prediction accuracy: the deeper the color, the more accurate the prediction. The optimal T value and K value corresponding to category 1 are T = 3 and K = 15, and the prediction accuracy is up to 96%.
The other categories are predicted in the same way, and the optimal T value, K value and MAPE value for all 48 categories are obtained by analogy. The minimum MAPE of every category (1-48) is less than 12%. Among them, category 3 has the highest prediction accuracy and the smallest prediction error (4.04%); its optimal T value is 3 and its optimal K value is 3. Category 48 has the lowest prediction accuracy and the largest prediction error (11.18%); its optimal T value is 1 and its optimal K value is 6. Overall, the prediction accuracy is better when the T value is 18 and the K value is in [4,8], and the whole-day prediction accuracy is 93%, as shown in Figure 5.

Discussion
According to the characteristics of online car-hailing order data, a method of optimizing the K value and T value to improve the K-nearest neighbor model is proposed. To better illustrate the prediction effect, the exponential smoothing model, KNN1 model, LSTM model and KNN2 model are used to predict the online car-hailing order volume on 31 January 2021, and the results are compared, as shown in Figure 6. In addition, this paper selects the 14:30 period on 31 January 2021 as the prediction analysis sample.
Exponential smoothing is an important time series forecasting method. This paper uses SPSS for the exponential smoothing forecast, and the process is as follows: first, the date format is defined as a day; second, a time series prediction model is created, an exponential smoothing model is selected, and the settings are completed step by step in order to obtain the predicted value. The predicted value and the MAPE formula are then used to calculate the MAPE of the exponential smoothing model, which is given in Table 4.
KNN1 is the model with a fixed T value but an optimized K value. One day is divided into 48 time periods, so the KNN1 model sets the state vector dimension to T = 47. The optimal K value is selected according to the minimum MAPE, and the prediction results are shown in Table 4.
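For readers without SPSS, the idea behind the exponential smoothing baseline can be sketched in a few lines. This shows simple (single) exponential smoothing only; the variant and settings chosen in SPSS are not stated, and the smoothing factor `alpha = 0.3` and the function name are hypothetical.

```python
def exponential_smoothing_forecast(series, alpha=0.3):
    """One-step-ahead forecast by simple exponential smoothing.

    The level starts at the first observation and is updated as
    level = alpha * x + (1 - alpha) * level for each later observation.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level
```

Because each forecast is a weighted average of past values, the method reacts to a sudden change only gradually, which matches the weakness at turning points noted in the comparison.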
For the LSTM model, all data are scaled to [0,1] with MinMaxScaler to speed up convergence. The time series is transformed into a supervised learning set, that is, each value is taken as the input and the following value as the corresponding output. An LSTM model was then constructed and trained; the number of samples was 1, the number of trainings was 3, and the number of neurons in the LSTM layer was 5. After the predicted value is obtained, inverse scaling and inverse differencing are performed to restore it to the original value range. These operations are performed on each row of the test set and the final predicted values are saved; the prediction results are shown in Table 4.
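The two data preparation steps for the LSTM, scaling to [0,1] and reshaping the series into input-output pairs, can be illustrated without any deep learning library. The sketch below is a plain-Python stand-in for what MinMaxScaler does on a single series; the function names are hypothetical and the LSTM itself is not shown.

```python
def min_max_scale(values):
    """Scale values linearly to [0, 1]; return (scaled, minimum, maximum)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values], lo, hi
    return [(v - lo) / span for v in values], lo, hi


def inverse_scale(scaled, lo, hi):
    """Map scaled values back to the original range."""
    return [s * (hi - lo) + lo for s in scaled]


def to_supervised(values):
    """Pair each value (input) with the value that follows it (output)."""
    return list(zip(values[:-1], values[1:]))
```

For the series 10, 20, 30, scaling gives 0.0, 0.5, 1.0 and the supervised pairs are (10, 20) and (20, 30).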
The KNN2 model determines the appropriate T value and K value using the method described above. For the 14:30 time unit, the corresponding optimal T value is 18 and the optimal K value is 9. The predicted value and MAPE value are given in Table 4 below. The comparison of the prediction results of the four methods is shown in Figure 7.
From Figure 6 and Table 4, it can be concluded that the exponential smoothing model improves on the simple historical average model, but it lacks the ability to identify turning points in the data, and its prediction accuracy is still low compared with the KNN models. The KNN1 model optimizes the K value, but does not consider the influence of the fixed T value on the prediction results; compared with the exponential smoothing model, its prediction accuracy is improved, but it is still low compared with the KNN2 model, indicating that also optimizing the T value makes the model's predictions more accurate. The LSTM model has an advantage in time series problems because of its internal forget gate structure, which makes it relatively suitable for online car-hailing demand forecasting; its prediction results are relatively smooth and its mean absolute percentage error can be reduced to 8.96%, which is close to that of the KNN2 model proposed in this paper. We believe that if the relevant parameters of the LSTM are further optimized, its prediction performance can be further improved. The KNN2 model adapts better to the fluctuating data of online car-hailing. Overall, the mean absolute percentage errors of the exponential smoothing model, KNN1 model, LSTM model and KNN2 model are 18.35%, 15.98%, 8.96% and 6.38%, respectively.
Compared with the other three prediction methods, the prediction accuracy of the KNN2 model reaches 93.62%, which is much higher than that of the exponential smoothing model (81.65%) and the KNN1 model (84.02%), and similar to that of the LSTM model (91.04%). Yu B et al. stated that the K-nearest neighbor prediction model has a high prediction accuracy in short-term predictions [22], and the research in this paper also supports this point of view. It can be seen that the K-nearest neighbor prediction model has high prediction accuracy and applicability in the short-term prediction of online car-hailing orders.
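The accuracy figures quoted here are consistent with defining prediction accuracy as 100% minus the MAPE. The short check below makes that arithmetic explicit; the relationship is inferred from the reported numbers, and the variable names are illustrative.

```python
# MAPE values (in percent) reported for the four models.
mape_percent = {
    "exponential smoothing": 18.35,
    "KNN1": 15.98,
    "LSTM": 8.96,
    "KNN2": 6.38,
}

# Accuracy taken as 100% minus MAPE, rounded to two decimals.
accuracy_percent = {
    model: round(100.0 - error, 2) for model, error in mape_percent.items()
}

for model, accuracy in accuracy_percent.items():
    print(f"{model}: {accuracy:.2f}%")
```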
In summary, comparing the four prediction methods analyzed in this paper shows that the K-nearest neighbor algorithm is simple in theory, easy to implement, and has the highest prediction accuracy and a stronger adaptive ability. It can change with the predicted environmental conditions: through the classification of historical data sets and the adjustment of the search algorithm and related parameters, appropriate T and K values are adopted. It can more accurately predict the demand for online car-hailing in cities, has good real-time applicability, is more suitable for the short-term prediction of complex, abrupt changes, and can predict the trend of data changes in real time.

Conclusions
This paper takes the short-term forecasting of urban online car-hailing demand in Hefei as the research object and standardizes its historical order data from 2019 to 2021. Each day of the data set is divided into 48 categories. With the goal of short-term prediction of the order demand of the urban online car-hailing platform, an adaptive K-nearest neighbor prediction model is constructed. Aiming to minimize the mean absolute percentage error, the dimension of the state vector T and the number of neighbors K are optimized, which effectively reduces prediction inaccuracy.
An example data analysis shows that the K-nearest neighbor prediction method has high accuracy in the short-term prediction of online car-hailing demand. The accuracy reaches 93.62%, and the stability is very good. The method can better adapt to urban online car-hailing order data with large fluctuations over time and unevenness.

Practical Implications and Directions for Further Research
The prediction model based on the K-nearest neighbor algorithm has been verified using data from Hefei City, but whether it can be adapted to other types of cities requires further research. This paper has studied Hefei City as a whole, whereas online car-hailing operations and scheduling may require predictions over a smaller area, for example, a short-term forecast of the demand for online car-hailing at a certain transportation hub. It would be necessary to further mine the data in a smaller geographic space in order to improve the practicality and applicability of the K-nearest neighbor model. In addition, this paper does not consider the effect of spatial information on prediction accuracy. If spatio-temporal information becomes available, a graph convolution network will be considered for predicting the online car-hailing order volume. We also hope that Transformer models will be applied to the prediction of online car-hailing order volume in the future.
Author Contributions: The authors confirm contribution to the paper as follows: study conception and design: Y.X.; data collection: Y.X.; analysis and interpretation of results: W.K.; draft manuscript preparation: W.K. and Z.L. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the Anhui Provincial Natural Science Foundation (No. 2208085ME147).
Institutional Review Board Statement: Not applicable.