Travel Characteristics Analysis and Passenger Flow Prediction of Intercity Shuttles in the Pearl River Delta on Holidays

As China's urbanization continues to accelerate, intercity travel demand has increased dramatically. Holiday travel has distinct demand characteristics, causing serious capacity shortages during peak periods. However, little current research addresses passenger flow prediction together with the travel characteristics of intercity shuttles. Accurately predicting passenger flow during holidays helps to improve operational efficiency and residents' satisfaction, and provides a basis for reasonable resource allocation by the management department. This paper analyzes the spatiotemporal characteristics of intercity shuttle passenger flow in the Pearl River Delta. Based on these characteristics, separate passenger flow prediction models for non-holidays and holidays are established using a back propagation neural network optimized by an improved genetic algorithm (IGA-BPNN), and the prediction models are validated on panel data. The weekly flow shows obvious holiday characteristics, and the hourly flow on holidays is much larger than that on weekends and weekdays. There is a significant difference in hourly flow between different holidays. The IGA-BPNN model achieves lower prediction error than the benchmark BPNN approach (a two-thirds reduction in MAPE and a reduction of over 85% in MSPE).


Introduction
China's urbanization has been accelerating, and the boundaries between cities have become increasingly blurred. Urban clusters such as the Pearl River Delta have gradually formed. The Pearl River Delta is the core area of Guangdong Province in the Guangdong-Hong Kong-Macao Bay Area. Its road network is relatively developed, and its expressway density ranks first in Asia. The traffic flow on multiple expressways in the Pearl River Delta is increasing, some transportation hubs such as Guangzhou South station are already saturated [1], and problems such as traffic congestion and declining service capacity continue to emerge. The problems of regional transportation organization urgently need to be solved. Close intercity connections generate a large amount of traffic demand, especially during legal holidays. The intercity shuttle is one of the main intercity transportation modes. Because intercity travel demand is large and unevenly distributed in time and space, tickets for some popular intercity shuttles are hard to get during holidays. The phenomenon of the excess capacity of intercity shuttles during normal periods and severely insufficient capacity
model to improve the prediction accuracy and interpretability of traditional models [15]; Yu et al. proposed an EMD-BPNN prediction method combining empirical mode decomposition (EMD) and a BP neural network (BPNN) to predict the short-term passenger flow of the subway system [16]. Yang and Wu considered the influence of weather, date, and other factors, analyzed the prediction accuracy for different training times and learning speeds to determine the optimal number of neurons in the hidden layer, and proposed a BP neural network prediction method [17].
Li analyzed the characteristics of the National Day holiday passenger flow of Guangzhou Metro Line 5 and proposed a passenger flow prediction method combining a time series model and a regression model [18]. Zhu established an ARIMA model for predicting the daily passenger flow of urban rail transit based on the average daily passenger flow index, and predicted the passenger flow before, during, and after holidays; the results showed that the relative error of the predicted passenger flow was about 2% [19]. Ma et al. conducted a sensitivity analysis of urban rail transit passenger flow, focusing on the impact of holiday passenger flow fluctuation on prediction accuracy, and proposed smooth decomposition of the fluctuating holiday passenger flow as a solution [20].
Research by Mario demonstrated the impact of public holidays [37]; daily traffic counts were shown to have obvious weekly variation cycles using a time series approach [38] or ARIMAX and SARIMAX models [39].
According to the study of Chen, the application of back-propagation neural networks improved the forecasting accuracy of air passenger demand, with a mean absolute percentage error (MAPE) of 0.34% [40]. Blinova developed a neural network method using 28 time-lagged feed-forward artificial neural networks to forecast air passenger traffic flows in Russia [41]. Other scholars also use neural network methods to predict short-term air transportation demand [42], to predict bus arrival times [43], and sometimes for other prediction problems [44].
The peak load of a bus route is essential to setting service frequency [45]. Transit network planning requires prediction of state variables such as on-board loads [46]. Generally, the methods to predict bus on-board loads can be classified into two types: model-based approaches [47] and simulation-based approaches [48]. Other researchers also predict real-time congestion information for the subway [49].

Summary of the Research
First, existing research on passenger flow prediction mostly deals with railway, conventional bus, and urban rail transit passenger flows, and there is less research on passenger flow prediction for intercity shuttles. Historical data for intercity shuttles are more difficult to obtain than those for railways, conventional buses, and urban rail transit because the operating entities involved are more complicated. Therefore, a relevant theoretical system has not yet been formed.
Second, the prediction models in current research have low prediction accuracy and poor applicability. Some scholars use a single model, such as a BP neural network or k-nearest neighbor (KNN), to predict passenger flow but do not consider the large fluctuation of passenger flow or the defects of a single prediction model. Others perform linear or nonlinear fitting based only on historical passenger flow data, completely ignoring the relevant spatiotemporal features of passenger flow. Some scholars consider the spatial and temporal characteristics of passenger flow to a limited extent but still do not capture the key features.
Moreover, most studies do not consider holiday passenger flow characteristics and have not established a passenger flow prediction model specifically for holidays. As a result, the predicted results differ significantly from the actual passenger flow. Some scholars directly apply non-holiday prediction methods to holiday passenger flow, so the results cannot provide a reliable reference for decision-making. This paper analyzes the passenger flow characteristics of holidays and predicts the non-holiday and holiday passenger flow of the intercity shuttle using a BP neural network coupled with an improved genetic algorithm.
The BP neural network needs randomly assigned initial connection weights and thresholds for each layer when training starts, and it is sensitive to the selection of these initial values. When they are not selected properly, convergence slows and training may become trapped in local extrema, which affects the computational efficiency and prediction accuracy of the BP neural network; in severe cases, its prediction accuracy may be lower than that of traditional linear prediction models. Some scholars have proposed optimizing the initial weights and thresholds of BP neural networks with genetic algorithms that use the traditional roulette and tournament methods in the selection operation. This approach does not improve the fitness function and retains the disadvantages of the traditional genetic algorithm, namely slow convergence and a tendency to fall into local optima. Therefore, its prediction effect is still not ideal, and the prediction accuracy needs to be improved.
This paper predicts passenger flow according to the passenger flow characteristics of the intercity shuttle, and the resulting non-holiday and holiday passenger flows can serve as a reference for intercity travelers. The work has theoretical and practical significance in providing a reliable passenger flow basis for the operation and scheduling of intercity passenger transport lines in a megalopolis.

Data Introduction and Preprocessing
The data in this paper are the outbound passenger flow data of each passenger line in Guangdong Province, provided by the Guangdong Provincial Department of Transportation, and the passenger shift data obtained from the official online ticketing website of Guangdong Province. The outbound passenger flow data cover each passenger line in Guangdong Province in 2017-2018, totaling more than 8400 passenger lines and more than 14 million records, including seven fields such as the day of the week, time of day (TOD), outbound time, line name, starting point, ending point, the total number of seats, and total number of outbound stations. The passenger shift data from the official online ticketing website include 12 fields: shift number, departure station, destination, departure time, passenger type, seat type, vehicle level, arrival station, route station, mileage, fare, and ticket sales. Both kinds of data are stored in a SQL (Structured Query Language) Server 2012 database for cleaning, querying, and analysis. Null and invalid values are replaced by the average value of the outbound passenger flow data. The outbound passenger flow data format and an example are shown in Table 1.
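The mean-replacement step can be sketched as follows. This is a minimal Python illustration, not the SQL Server procedure used in the study; the function name `fill_with_mean` and the rule that a negative count is treated as invalid are assumptions made for the example.

```python
from statistics import mean

def fill_with_mean(flows):
    """Replace missing (None) or invalid (negative) counts with the mean of the valid ones."""
    # Assumption: a record is valid when it is present and non-negative.
    valid = [x for x in flows if x is not None and x >= 0]
    if not valid:
        raise ValueError("no valid observations to average")
    avg = mean(valid)
    return [x if (x is not None and x >= 0) else avg for x in flows]

# Illustrative values only, not data from the study.
cleaned = fill_with_mean([30, None, 42, -1, 36])
```

Only the valid records enter the average, so a null or invalid entry cannot distort the value that replaces it.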

Space Characteristics
Spatial characteristics include three parts: spatial shift characteristics, spatial flow characteristics, and spatial passenger flow characteristics on holidays.

Spatial Shift Characteristics
The average daily departures of intercity shuttles in the Pearl River Delta in 2017-2018 are counted. The data in Table 2 show that in the Pearl River Delta region, Guangzhou, Shenzhen, Foshan, Dongguan, and other economically strong cities have more intercity shuttle departures, while the frequency in other cities is relatively low, indicating a serious imbalance. The origin and destination of an intercity shuttle cannot be in the same city, so "-" is shown between identical cities.
The hourly departures of a typical line (Guangzhou-Shenzhen and Shenzhen-Guangzhou) are taken as an example to study the difference between opposite directions. Departures are counted over the 15 hourly periods in the operating hours from 6:00 to 21:00. The Guangzhou-Shenzhen intercity shuttle has a total of 451 departures, while Shenzhen-Guangzhou has 437, slightly fewer than the Guangzhou-Shenzhen line.

Spatial Flow Characteristics
According to the departure passenger flow data and the starting and terminal stations of each line, the annual origin and destination (OD) passenger flow data of the intercity shuttle in the Pearl River Delta in 2018 are shown in Table 3.
Table 3. Annual origin and destination (OD) passenger flow of intercity shuttle lines among nine cities in the Pearl River Delta (10,000 persons). The origin and destination of an intercity shuttle cannot be in the same city, so "-" is shown between identical cities.
It can be seen from Table 3 that passenger flow has significant spatial heterogeneity. Passenger flow is concentrated on 10 important lines, such as Guangzhou-Shenzhen, Guangzhou-Foshan, Guangzhou-Dongguan, Shenzhen-Dongguan, and Shenzhen-Huizhou. There is less traffic in other directions, and the annual passenger flows of different intercity shuttle lines differ considerably.
Table 4. The origin and destination of an intercity shuttle cannot be in the same city, so "-" is shown between identical cities.
The passenger flows on other holidays were similar to that of the Spring Festival. According to the passenger flow in Table 4, there are significant differences in the distribution of intercity shuttle passenger flow between different holidays. Passenger flows on New Year's Day, Ching Ming Festival, Labor Day, Dragon Boat Festival, and Mid-Autumn Festival do not differ much, and intercity travel demand on these holidays is generally large. Passenger flow during the National Day holiday is huge, as demand for intercity travel increases dramatically.

Time Characteristics
Due to the different travel needs of travelers in different time periods, the same passenger transport line will present different passenger flow characteristics in different time scales. This section will study the passenger flow data of Shenzhen-Guangzhou intercity shuttle in 2017-2018. The passenger flow characteristics of the intercity shuttle on different time scales (quarter, month, week, day, and hour) are explored.

Daily Variation Characteristics
The daily traffic of 2018 is classified according to the "day of the week", with 52 Tuesdays to Sundays and 53 Mondays. The maximum, minimum, average, and variance of the passenger traffic for each day from Monday to Sunday are counted separately.
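The grouping by "day of the week" can be illustrated with a short Python sketch. The function names are hypothetical, and the use of the population variance is an assumption, since the text does not state which variance estimator was used.

```python
from datetime import date, timedelta
from statistics import mean, pvariance

def weekday_counts(year):
    """Count how often each weekday (index 0 = Monday ... 6 = Sunday) occurs in a year."""
    counts = [0] * 7
    day = date(year, 1, 1)
    while day.year == year:
        counts[day.weekday()] += 1
        day += timedelta(days=1)
    return counts

def weekday_summary(daily_flow):
    """daily_flow maps date -> passenger count.

    Returns, per weekday index, the tuple (max, min, mean, variance).
    Assumption: population variance is used.
    """
    groups = {k: [] for k in range(7)}
    for day, flow in daily_flow.items():
        groups[day.weekday()].append(flow)
    return {k: (max(v), min(v), mean(v), pvariance(v))
            for k, v in groups.items() if v}

counts_2018 = weekday_counts(2018)  # 1 January 2018 was a Monday
```

Because 2018 began on a Monday and has 365 days, `counts_2018` contains 53 Mondays and 52 of every other weekday, which matches the counts stated above.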
The results are shown in Table 5. At the same time, in order to explore how passenger flow changes within the week, the passenger flow in July 2018 is divided into four weeks, and the variation of passenger flow on the Shenzhen-Guangzhou intercity shuttle line is explored. Figure 1 presents a typical "three-stage" pattern: Stage 1, a downward trend from Monday to Tuesday; Stage 2, a relatively stable state from Tuesday to Thursday; Stage 3, a sharp upward trend from Thursday to Sunday. Unlike passenger flow within the city (large on weekdays and small on weekends), intercity shuttle passenger flow is greater on weekends than on weekdays.
As shown in Table 5, there are large differences in passenger flow between different days of the week. Monday's average passenger traffic ranks fourth, but its passenger traffic fluctuates the most. The passenger flow characteristics on Tuesday, Wednesday, and Thursday are similar: the average value is small and the daily flow is less volatile. Friday is a special case: its average daily passenger flow ranks third, but its passenger flow is the least volatile. The average passenger flow on Saturday is less than on Sunday, and the daily passenger flow on Sunday fluctuates more than on Saturday. In summary, the "day of the week" feature of daily passenger flow is pronounced, and it is necessary to consider the "day of the week" factor when predicting daily passenger traffic.
During the holiday period, the daily passenger flow pattern varies from holiday to holiday according to the length and nature of the holiday. Four important holidays (during which cars with fewer than seven seats travel toll-free on highways) are selected: Spring Festival, Ching Ming Festival, Labor Day, and National Day. The traffic volumes of the seven days before and the seven days after each holiday are also studied, giving 15 days of flow in total: abscissa values 1-7 denote the seven days before the holiday, 8 denotes the holiday, and 9-15 denote the seven days after the holiday, as shown in Figure 2.
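The 15-day window indexing can be sketched as follows. This is an illustrative Python fragment under stated assumptions: `holiday_window` is a hypothetical helper, and `daily_flow` is assumed to map dates to daily passenger counts.

```python
from datetime import date, timedelta

def holiday_window(daily_flow, holiday, span=7):
    """Return [(position, date, flow)] for the 2 * span + 1 days centered on the holiday.

    Positions 1..7 are the days before, 8 is the holiday, 9..15 are the days after.
    Days missing from daily_flow get a flow of None.
    """
    window = []
    for offset in range(-span, span + 1):
        day = holiday + timedelta(days=offset)
        position = offset + span + 1
        window.append((position, day, daily_flow.get(day)))
    return window

# Example: positions around National Day 2018 (no flow data supplied here).
national_day = date(2018, 10, 1)
positions = [(p, d) for p, d, _ in holiday_window({}, national_day)]
```

For National Day 2018 the window opens on 24 September, the date of the Mid-Autumn Festival that year.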

There are significant differences in the daily passenger flow variation patterns of the four holidays (see Figure 2). Passenger flow showed a downward trend in the seven days before the Spring Festival, and New Year's Eve and Spring Festival day were the lowest points. After that, passenger flow rebounded slightly but remained at a low level. The reduced intercity travel may be because a large number of migrants in this region had returned to their hometowns before the Spring Festival. The passenger flows of Ching Ming Festival and Labor Day vary similarly: flow rises over the seven days before the holiday, peaks on the holiday itself, and then gradually declines to a stable level over the following seven days. The Mid-Autumn Festival fell on 24 September 2018, and the passenger flow before the National Day holiday first declined slightly and then rose straight to its highest point on 1 October, a pattern affected by the return journeys after the Mid-Autumn Festival. The traffic volume in the following seven days is still greater than that of the other holidays because the National Day holiday is longer; there is a downward trend during the seven-day holiday.

Hourly Passenger Flow Characteristics
The natural habits of travelers determine the fluctuation of the passenger traffic of the intercity shuttle at different times of the day. In order to explore the hourly changes in passenger flow between different weekdays and normal weekends, the hourly passenger flow of the Shenzhen-Guangzhou intercity shuttle from 10 September 2018 to 16 September 2018 (Monday to Sunday) was selected for research. Meanwhile, the four holidays in 2018 are selected to study the hourly passenger flow of the Shenzhen-Guangzhou intercity shuttle. According to the operation time of the passenger line, the time scale is divided into 16 time periods.
It can be seen from Figure 3 that the hourly variation of passenger flow from Monday to Sunday roughly follows an "M"-type pattern with two peak periods: the morning peak appears at 9:00 to 10:00 and the afternoon peak at 15:00 to 16:00, and passenger flow is relatively flat from 10:00 to 15:00. The hourly flow changes on Tuesday and Thursday are roughly the same, at a relatively low level. The hourly passenger flow curves on Monday and Friday are approximately mirror images: passenger traffic is high in the morning on Monday but high in the afternoon on Friday. This may be because more travelers make intercity trips on Monday morning after the weekend break, and more travelers set out on Friday afternoon ahead of the weekend. The hourly passenger flow changes on Saturday and Sunday are similar, with passenger flow at a high level throughout the day. This is because more travelers make intercity trips to visit relatives and friends on weekends.
It can be seen from Figure 4 that the hourly passenger flow patterns differ significantly between holidays. The hourly flow trends on Ching Ming Festival and National Day are about the same, showing an "M"-type curve in which the morning peak is greater than the afternoon peak and passenger flow at the start and end of the day is small. The hourly curve of Labor Day is roughly a mirror image of those of Ching Ming Festival and National Day (its passenger flow is small in the morning and large in the afternoon). This may be because May 1 (Labor Day) is the last day of the holiday, so most travelers choose to return in the afternoon, while April 5 (Ching Ming Festival) and October 1 (National Day) are both the first day of the holiday, when travelers are more willing to travel in the morning. The hourly curve of the Spring Festival is clearly lower than those of the other three holidays because it is the first day of the Spring Festival, when most people choose to reunite with their families at home instead of making intercity trips. Moreover, most residents of Guangzhou and Shenzhen are migrants, so they are usually not in the Pearl River Delta during the Spring Festival holiday.

Data Preparation
The passenger traffic of the Shenzhen-Guangzhou intercity shuttle in 2017 and 2018 is measured on a daily time scale. To distinguish the "day of the week" attribute of different days, the numbers 1, 2, 3, 4, 5, 6, 7 denote Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday, respectively. For the holiday attribute, the number 0 denotes a non-holiday, and the numbers 1, 2, 3, 4, 5, 6, 7 denote New Year's Day, Spring Festival, Ching Ming Festival, Labor Day, Dragon Boat Festival, Mid-Autumn Festival, and the National Day holiday, respectively.
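The two attribute encodings can be written down directly. The sketch below is illustrative Python: `encode_day` and the `holiday_calendar` mapping from date to holiday name are hypothetical, and the calendar holds a single example entry rather than the full holiday schedule.

```python
from datetime import date

# Holiday codes as listed in the text; 0 means non-holiday.
HOLIDAY_CODES = {
    "New Year's Day": 1,
    "Spring Festival": 2,
    "Ching Ming Festival": 3,
    "Labor Day": 4,
    "Dragon Boat Festival": 5,
    "Mid-Autumn Festival": 6,
    "National Day": 7,
}

def encode_day(day, holiday_calendar):
    """Return (day_of_week_code, holiday_code) for a date.

    holiday_calendar is a hypothetical mapping from date to holiday name.
    """
    weekday_code = day.isoweekday()  # 1 = Monday ... 7 = Sunday
    holiday_code = HOLIDAY_CODES.get(holiday_calendar.get(day), 0)
    return weekday_code, holiday_code

calendar_2018 = {date(2018, 10, 1): "National Day"}  # illustrative single entry
codes = encode_day(date(2018, 10, 1), calendar_2018)
```

Dates absent from the calendar fall back to the non-holiday code 0.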
The passenger flow data of 57 days from 2017 to 2018 were deleted to form the basic data. The single-sample Kolmogorov-Smirnov test was performed on the basic data using SPSS software. The two-sided asymptotic significance was less than 0.05, so the data do not follow a normal distribution. Therefore, instead of the 3σ criterion, SPSS is used to find the quartiles of the passenger flow data, from which the lower and upper limits of the normal range of daily passenger flow are obtained (1471, 7540), and the 11 abnormal daily values outside this interval are deleted. The processed sample contains 655 records. The sample data are normalized with the mapminmax function in MATLAB so that the values lie in the range [−1, 1].
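The quartile-based screening and the normalization can be sketched in Python as follows. Two assumptions are made here: the limits are taken as Tukey fences at 1.5 times the interquartile range (the text does not state the multiplier or the SPSS quartile method), and the scaling reproduces the default [−1, 1] mapping of mapminmax through the formula y = 2(x − min)/(max − min) − 1.

```python
from statistics import quantiles

def quartile_limits(values, k=1.5):
    """Lower and upper limits from the quartiles (assumed Tukey fences, k = 1.5)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(values, k=1.5):
    """Keep only the values that lie inside the quartile limits."""
    low, high = quartile_limits(values, k)
    return [v for v in values if low <= v <= high]

def scale_minus1_to_1(values):
    """Linearly map values so that the minimum becomes -1 and the maximum becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]

# Illustrative values only, not data from the study.
kept = drop_outliers([10, 12, 11, 13, 12, 11, 100])
scaled = scale_minus1_to_1([0, 5, 10])
```

Quartile limits do not depend on a normality assumption, which is why they are a reasonable substitute for the 3σ criterion once the Kolmogorov-Smirnov test rejects normality.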

Passenger Flow Prediction Model
Passenger flow prediction models are mainly divided into three categories: statistical models, nonlinear models, and mixed prediction models. The BP neural network is a neural network model that uses the error back propagation algorithm. It is also the most widely used neural network, with strong nonlinear mapping and complex logic computing ability [50][51][52]. Its structure includes the input layer, the hidden layer, and the output layer. The training process mainly includes forward signal transmission and error back propagation. The signal is transmitted forward during learning; when the expected output cannot be obtained, the error propagates backward. By modifying the weights of the neurons, the error is minimized until the desired output is obtained [53][54][55][56].
According to the analysis of passenger flow time characteristics above, the daily passenger flow of the intercity shuttle fluctuates violently, the holiday passenger flow is much larger than the non-holiday passenger flow, and the flow has typical characteristics such as nonlinearity and non-stationarity. The BP neural network is a typical nonlinear passenger flow prediction model and is therefore used as the basis for passenger flow prediction of intercity passenger transport.
Given the characteristics of the sample data in this study, a three-layer BP neural network model with one hidden layer was selected. The traffic volume of the prediction day is the output; the inputs are the traffic volumes of the seven days before the prediction day and the year, month, day, and "day of the week" attribute of the prediction date. Therefore, the number of input nodes is n = 11 and the number of output nodes is l = 1. The value range of the number of hidden layer nodes is calculated by the empirical formula [5,14], and m = 10 hidden layer nodes is selected because it minimizes the average relative error in the prediction experiments. A tan-sigmoid function, which has a faster convergence speed and a wider output range, is selected as the transfer function between the input layer and the hidden layer, and a purelin function (whose input and output can take any value) is selected as the transfer function between the hidden layer and the output layer. The traingdx training function (gradient descent with momentum and an adaptive learning rate) is selected to meet the requirements of faster training and larger data volumes. A relatively small learning rate of 0.1 is chosen, and the commonly used value of 0.9 is taken as the momentum factor. The ratio of the training set, the validation set, and the test set is set to 80%, 10%, and 10%, respectively, as in most studies. The training goal is set to 0.1, and the maximum number of training epochs is set to 10,000 to prevent the training time from being too long. Other unspecified parameters follow the default values in MATLAB.
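The input layout and the forward pass of such a network can be sketched in Python. This is an illustration under assumptions, not the MATLAB program used in the study: `build_sample` and `forward` are hypothetical names, the order of the 11 inputs is assumed, and normalization of the inputs is omitted for brevity. The hidden layer uses tanh, which is mathematically identical to the tan-sigmoid function, and the output layer is linear.

```python
import math
from datetime import date, timedelta

def build_sample(daily_flow, target):
    """Return (inputs, output) for one prediction day, or None if history is incomplete.

    Inputs: flows of the 7 preceding days (oldest first), then year, month,
    day, and day-of-week code (1 = Monday ... 7 = Sunday), 11 values in total.
    """
    lags = [daily_flow.get(target - timedelta(days=k)) for k in range(7, 0, -1)]
    if any(v is None for v in lags) or target not in daily_flow:
        return None
    calendar = [target.year, target.month, target.day, target.isoweekday()]
    return lags + calendar, daily_flow[target]

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer, linear output node."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Illustrative flows for 1-8 January 2018 (not data from the study).
flows = {date(2018, 1, 1) + timedelta(days=i): 100 + i for i in range(8)}
inputs, output = build_sample(flows, date(2018, 1, 8))
```

In practice the inputs would first be scaled to [−1, 1], since raw values such as the year would otherwise saturate the tanh units.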

Improved Genetic Algorithm-Back Propagation Neural Network (IGA-BPNN) Prediction Model
This paper abandons the roulette-wheel selection operation of the traditional genetic algorithm and adopts a selection operation based on the fitness function, proposing an intercity shuttle passenger flow prediction model based on a BP neural network optimized by an improved genetic algorithm (IGA-BPNN). The model mainly includes three parts. The first part determines the network structure of the BP neural network, that is, the number of network layers and the number of nodes in each layer. The second part uses the improved genetic algorithm to optimize the initial weights and thresholds of the BP neural network, the core of which is the improvement of the selection operation. The third part uses the network with the optimized initial weights and thresholds to predict the passenger flow of the intercity shuttle. Figure 5 shows the IGA optimization process, which includes five basic components:
Step1: Chromosome coding. Real-number coding is used. The input layer, hidden layer, and output layer have 11, 13, and 1 nodes, respectively, so there are 156 weights and 14 thresholds, and the chromosome encoding length is 170.
Step2: Selection of fitness function. The absolute error between the predicted and original values is used as the individual fitness value. If the fitness value of individual i is $F_i$ and the corresponding absolute error is $E(X_i)$, the fitness function is given by Equation (1). The size of the fitness value directly reflects the individual's performance. The optimization goal of the genetic algorithm is to minimize the individual fitness value until it tends to zero.
Step3: Select operation. The probability that an individual proposed in this paper p i is selected as Equation (2).
where, i means the individual i (i = 1, 2, · · · , N); F i means the fitness value of i, which used absolute error in this article; l means the adjustment factor.
Step4: Cross operation and mutation operation. Since the code in this paper is a real number code, the real number cross method is selected in the cross operation. According to previous studies, the initial crossover probability can be set to 0.6, and the mutation probability can be set to 0.5.
Step5: Initial population size and maximum evolutionary iteration. The initial population size can be set to 20. In this paper, the maximum evolution iteration is set to a slightly larger value. The fitness curve is observed after the training is completed; the most suitable evolutionary algebra is then selected.
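The coding, fitness, crossover, and mutation steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the MATLAB program used in this paper: the gene order inside the chromosome, the tanh hidden units, the linear output, the treatment of thresholds as added biases, and the gene range [-1, 1] are all assumptions, and the function names are hypothetical. The improved selection operation is left out because Equation (2) is not reproduced here.

```python
import math
import random

N_IN, N_HID, N_OUT = 11, 13, 1            # layer sizes from Step1
N_WEIGHTS = N_IN * N_HID + N_HID * N_OUT  # 143 + 13 = 156 weights
N_THRESH = N_HID + N_OUT                  # 13 + 1 = 14 thresholds
CHROM_LEN = N_WEIGHTS + N_THRESH          # 170 genes in total


def decode(chrom):
    """Split a flat chromosome into (w1, w2, b1, b2); the gene order is assumed."""
    k = N_IN * N_HID
    w1 = [chrom[r * N_HID:(r + 1) * N_HID] for r in range(N_IN)]  # 11 x 13
    w2 = chrom[k:k + N_HID]                   # 13 hidden-to-output weights
    b1 = chrom[N_WEIGHTS:N_WEIGHTS + N_HID]   # 13 hidden-layer thresholds
    b2 = chrom[N_WEIGHTS + N_HID]             # 1 output-layer threshold
    return w1, w2, b1, b2


def forward(chrom, x):
    """One forward pass; tanh hidden units and a linear output are assumptions."""
    w1, w2, b1, b2 = decode(chrom)
    hidden = [math.tanh(sum(x[i] * w1[i][h] for i in range(N_IN)) + b1[h])
              for h in range(N_HID)]
    return sum(hidden[h] * w2[h] for h in range(N_HID)) + b2


def fitness(chrom, samples):
    """Step2: summed absolute error over (inputs, target) pairs; lower is better."""
    return sum(abs(forward(chrom, x) - y) for x, y in samples)


def crossover(a, b, rng):
    """Step4: real-number crossover that blends one randomly chosen gene."""
    pos, r = rng.randrange(CHROM_LEN), rng.random()
    child_a, child_b = list(a), list(b)
    child_a[pos] = r * a[pos] + (1 - r) * b[pos]
    child_b[pos] = r * b[pos] + (1 - r) * a[pos]
    return child_a, child_b


def mutate(chrom, rng, low=-1.0, high=1.0):
    """Step4: redraw one randomly chosen gene uniformly from [low, high]."""
    out = list(chrom)
    out[rng.randrange(CHROM_LEN)] = rng.uniform(low, high)
    return out
```

In a full loop, crossover and mutation would be applied with probabilities 0.6 and 0.5 to a population of 20 chromosomes, as stated in Step4 and Step5.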

IGA-BPNN Predictive Model Results Analysis
With the parameters set above, the IGA-BPNN model is implemented in MATLAB for prediction. In the improved genetic algorithm, the maximum number of generations is set to 400, and the optimal fitness reached during chromosome evolution is 155. After 300 generations the fitness value changes only slightly, and after 350 generations it is almost unchanged. The optimal initial weights and thresholds obtained are then passed to the BP neural network, which, after learning and training, predicts the daily passenger flow of the non-holiday intercity shuttle.
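Reading the suitable number of generations off the fitness curve can be approximated programmatically. The sketch below is an assumption rather than the procedure used in this paper: it takes the best fitness per generation (assumed non-increasing) and returns the first generation after which the remaining improvement is within a tolerance of the total improvement. The function name and the default tolerance are hypothetical.

```python
def plateau_generation(best_fitness, tol=0.01):
    """Return the first generation whose remaining improvement is negligible.

    best_fitness[g] is the best (lowest) fitness found up to generation g.
    """
    final = best_fitness[-1]
    total = best_fitness[0] - final
    if total <= 0:
        return 0  # the curve never improved
    for g, value in enumerate(best_fitness):
        # Remaining improvement relative to the total improvement.
        if value - final <= tol * total:
            return g
    return len(best_fitness) - 1
```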
To show the difference between the predicted and actual values more intuitively, the two are compared in Figure 6. The prediction results are relatively stable. Moreover, the errors on the training and validation sets are close to the error on the test set, so the BP neural network shows no overfitting in the prediction. Therefore, the IGA-BPNN model for non-holiday intercity shuttle passenger flow is reasonably reliable.

Comparative Analysis of Multiple Model Prediction Results
The quality of the prediction results is judged by four evaluation indicators. The prediction results of the IGA-BPNN model are compared with those of the GA-BPNN model (BP neural network optimized by a genetic algorithm with roulette selection), the BPNN model, and the ARIMA model, as shown in Table 6. The MSPE of the ARIMA model is not calculated because its other three errors are clearly larger than those of IGA-BPNN, so this entry is marked "-".
The prediction error of the IGA-BPNN model is the smallest (mean absolute percentage error, MAPE = 6.43% < 10%), and the values of the other three indicators, mean absolute error (MAE), root mean square error (RMSE), and mean square percentage error (MSPE), are also relatively small (259.18, 323.90, and 0.08, respectively). The IGA-BPNN prediction model is therefore applicable and reliable for passenger flow prediction of the non-holiday intercity shuttle.
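For reference, the four indicators can be computed as in the following sketch. The MAE, RMSE, and MAPE follow their standard definitions. The MSPE is taken here as the mean of squared relative errors, which is an assumption: the exact formula used in this paper is not reproduced in this section, and some authors apply a square root. The function name is hypothetical.

```python
import math


def evaluate(actual, predicted):
    """Return MAE, RMSE, MAPE (in percent) and MSPE for paired observations."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    rel = [e / a for e, a in zip(errors, actual)]  # requires non-zero actuals
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAPE": 100.0 * sum(abs(r) for r in rel) / n,
        # Assumed definition: mean of squared relative errors.
        "MSPE": sum(r * r for r in rel) / n,
    }
```

Because MAE and RMSE are in passengers while MAPE and MSPE are relative, the latter two are the ones that stay comparable across routes with very different flow levels.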

Non-Holiday Space Passenger Flow Prediction
The same non-holiday is selected to predict and analyze the intercity shuttle passenger flow between the nine cities, so that the prediction model is verified from the perspective of panel data. This paper selects 22 August 2018 as the prediction date and obtains the predicted passenger flow of intercity shuttles among the nine cities in the Pearl River Delta on that date. The average absolute error between the predicted and actual values is calculated for each city pair, and the results are shown in Table 7. The origin and destination of an intercity shuttle cannot be in the same city, so the entries for identical cities are marked "-".
It can be seen from Table 7 that the average absolute error of the predictions is between 1.70% and 14.64%. The average absolute error of passenger flow prediction among the nine cities on 22 August is 6.71%, which differs by about 0.28 percentage points from the error of the non-holiday passenger flow prediction for Shenzhen-Guangzhou above and is an acceptable level of error. This indicates the spatial reproducibility of the non-holiday passenger flow prediction model and its applicability to non-holiday passenger flow prediction among the nine cities in the Pearl River Delta.

The Introduction of Holiday
China's legal holidays are of two types: three-day and seven-day holidays. New Year's Day, Ching Ming Festival, Labor Day, Dragon Boat Festival, and Mid-Autumn Festival are three-day holidays. The Spring Festival and National Day are seven-day holidays.
The demand for intercity travel is huge during the holiday period. Some travelers depart early or return late to avoid holiday travel difficulties such as ticket shortages and traffic jams. Therefore, this article refers to the peak periods before and after a holiday, together with the holiday itself, as the holiday impact time.
The holiday impact time can be determined from the historical passenger flow data of the intercity shuttle. For the holiday and several days before and after it, the ratio of each day's passenger flow to the annual average passenger flow of the same day of the week is calculated. When the ratio exceeds a set threshold, the day is considered part of the holiday impact time. According to previous research results, the threshold is set to 1.2, and every day of the holiday itself belongs to the holiday impact time.
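The criterion above can be sketched as follows. This is a minimal illustration under stated assumptions: each record carries a day offset relative to the first holiday day (offset 0), the day of the week, and the observed flow, and `weekday_avg` holds the annual average flow for each day of the week. The function name and the data layout are hypothetical.

```python
def impact_days(records, weekday_avg, holiday_len, threshold=1.2):
    """Return the sorted day offsets that belong to the holiday impact time.

    records: iterable of (offset, weekday, flow), with offset 0 the first
    holiday day; weekday_avg maps weekday -> annual average flow.
    """
    days = []
    for offset, weekday, flow in records:
        ratio = flow / weekday_avg[weekday]
        in_holiday = 0 <= offset < holiday_len  # holiday days always count
        if in_holiday or ratio > threshold:
            days.append(offset)
    return sorted(days)
```

For a three-day holiday, a day inside the holiday is kept even if its ratio falls below 1.2, while the days before and after are kept only when their ratio exceeds the threshold.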

The Impact Time of Each Holiday
First, the daily passenger flow of each holiday and of the days before and after it on the single Shenzhen-Guangzhou intercity shuttle line is selected for study, and the ratios are calculated as shown in Figure 7. Among the three-day holidays, New Year's Day is taken as an example: the three days of the holiday and the 10 days before and after it, 23 days in total, are selected for study.
As shown in Figure 7a, the ratio for the New Year's Day holidays in 2017-2018 is greater than 1.2. According to the criterion above, the two days before the holiday, the three days of the holiday, and the two days after the holiday, seven days in total, form the New Year's Day holiday impact time. A similar analysis shows that the holiday impact time of the Ching Ming Festival, Labor Day, Dragon Boat Festival, and Mid-Autumn Festival holidays is the same as that of New Year's Day: two days before the holiday, the three days of the holiday, and two days after the holiday, for a total of seven days.
During the three-day holiday, the peak of passenger outflows was concentrated, and the difference in passenger traffic on each day was small.
For National Day, the 10 days before the holiday in 2017-2018, the seven days of the holiday, and the seven days after the holiday are selected for study. The results in Figure 7b show that the three days before the holiday, the seven days of the holiday, and the three days after the holiday, 13 days in total, form the National Day holiday impact time. The passenger flow on the two days before and after the holiday was relatively large, and the flow on the other days was relatively even.
For the Spring Festival, a total of 37 days in 2017-2018 were selected for the study: the seven days of the holiday and the 15 days before and after it. The results are shown in Figure 7c. The calculation shows that the 10 days before the holiday, the seven days of the holiday, and the 12 days after the holiday, 29 days in total, form the Spring Festival holiday impact time. The daily passenger flow before and after the holiday is large, whereas the passenger flow during the holiday is lower than the annual average daily passenger flow and then grows quickly.

Prediction Model
China's legal holidays total 29 days a year (about 7.95%), a small proportion, so the amount of holiday passenger flow data is limited. Because of this limitation, the accuracy of many scholars' holiday passenger flow predictions is still not ideal. Although some scholars have obtained a large amount of historical holiday passenger flow data and used machine learning algorithms to predict future holiday flow, holiday passenger flow differs considerably between years, especially years that are far apart, so the prediction accuracy cannot be guaranteed.
Therefore, the holiday passenger flow prediction model combines the holiday background passenger flow with the holiday passenger flow fluctuation coefficient. It is divided into three parts: the first part uses the non-holiday passenger flow prediction model to predict the background passenger flow on each day of the holiday impact time; the second part uses the historical passenger flow data to determine the holiday passenger flow fluctuation coefficient for each day of the holiday impact time; the third part multiplies the holiday background passenger flow by the holiday passenger flow fluctuation coefficient to obtain the predicted holiday passenger flow.
The predictive model can be expressed as:
$$Y_{ij} = \varphi_{ij} \, y_{ij}$$
where i denotes one of the seven holidays mentioned above (i = 1, 2, · · · , 7); j denotes the date within the duration of the holiday (j = −10, −9, · · · , 0, 1, · · · , 18); $Y_{ij}$ is the predicted daily passenger flow within the duration of the holiday; $\varphi_{ij}$ is the fluctuation coefficient of the holiday passenger flow on each day of the holiday; and $y_{ij}$ is the background passenger flow on each day within the holiday.

The Fluctuation Coefficient
The fluctuation coefficient of passenger flow for each day of different holidays is directly related to the fluctuation coefficient of historical passenger flow, and it is also related to the coefficient of variation of the average daily passenger flow in the forecast year.
Therefore, the fluctuation coefficient of the holiday passenger flow is composed of two parts: the first part is the ratio of the passenger flow on each day of the holiday duration in the previous year to the average passenger flow of the "day of the week" to which it belongs, and the second part is the variation coefficient of the average holiday passenger flow between the two years. The calculating formula is as follows:
$$\varphi_{ij} = k_i \, \phi_{ij}$$
where $\phi_{ij}$ is the ratio of the passenger flow on each day of the holiday duration in the previous year to the average passenger flow of the "day of the week" to which it belongs, and $k_i$ is the variation coefficient of the average holiday passenger flow between the two years.
Table 8 and Figure 8 show the results. The background passenger flow of each holiday is predicted with the IGA-BPNN non-holiday passenger flow prediction model above, and the passenger flow of each day in 2018 is then predicted. In general, the average MAE, MAPE, RMSE, and MSPE of the holiday passenger flow prediction are at an ideal level. The average errors of the predicted values differ between holidays, but the differences are not very significant. Therefore, the prediction accuracy of the holiday prediction model is relatively high, and it performs well even when little passenger flow data is available.
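The holiday prediction can be illustrated with a short sketch. It assumes that the two parts of the fluctuation coefficient combine as a product and that the variation coefficient between the two years is supplied as an input, since its calculation is not detailed here. The function and parameter names are hypothetical.

```python
def fluctuation_coefficient(prev_flow, prev_weekday_avg, k):
    """Fluctuation coefficient for one holiday day.

    prev_flow: flow on the matching day in the previous year.
    prev_weekday_avg: previous-year average flow for that day of the week.
    k: variation coefficient of average holiday flow between the two years.
    The product form is an assumption of this sketch.
    """
    return k * prev_flow / prev_weekday_avg


def predict_holiday_flow(background_flow, prev_flow, prev_weekday_avg, k):
    """Scale the non-holiday (background) prediction by the coefficient."""
    phi = fluctuation_coefficient(prev_flow, prev_weekday_avg, k)
    return phi * background_flow
```

For example, if the previous-year flow on a holiday day was 1.5 times its day-of-week average and the variation coefficient is 1.1, a background prediction of 1200 passengers would be scaled by 1.65.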

Predictive Model Verification
Similarly, the model is validated using holiday passenger flow panel data for different routes.
The predicted passenger flow for 1 May 2018 is selected for verification, and the average values of the prediction errors are calculated (Table 9). The mean absolute error (MAE) and root mean square error (RMSE) differ from those of the single line above, which may be due to the smaller passenger flow on the other intercity shuttle routes compared with the Shenzhen-Guangzhou route. The mean absolute percentage error (MAPE) and the mean square percentage error (MSPE), in contrast, differ little from those of the Shenzhen-Guangzhou line. This indicates that the prediction result is stable and that the prediction accuracy stays at a relatively ideal level, supporting the universality and replicability of the model. The prediction model proposed in this paper can therefore overcome the limited availability of historical holiday data and can be applied to holiday passenger flow prediction for different intercity shuttle routes in a megalopolis.

Application of Holiday Passenger Flow Prediction Analysis
Different city pairs show different intercity shuttle travel patterns because of differences in permanent population and level of economic development. Based on the predicted passenger flow, operating companies can plan transportation and allocate resources reasonably to balance corporate profits and holiday travel needs, and the departure schedule of the intercity shuttle can be optimized to maximize operational efficiency and passenger satisfaction. Operating enterprises can anticipate future changes in passenger flow and reserve vehicles and personnel in advance according to the expected growth. With accurate predictions of holiday passenger flow peaks, passenger stations can dispatch vehicles across different intercity shuttle lines in an emergency and respond to a surge in passenger flow on a particular line, improving the efficiency of passenger evacuation in the station. Travelers can also adjust their travel plans to their own circumstances and obtain better travel service by purchasing tickets in advance, adjusting their travel time, or changing shuttle departures to avoid the peak period.
Specifically, passenger travel on holidays differs considerably from that on non-holidays. Operating enterprises can organize and manage shuttle routes by guaranteeing capacity on popular routes and by reserving capacity differently over time. Spatially, capacity is organized by line; temporally, capacity resources are allocated by holiday and by time period.
(1) Popular route capacity optimization: Passenger flow differs greatly between intercity shuttle corridors. From a regional perspective, large passenger flows are mainly concentrated on the intercity shuttle lines among Guangzhou, Shenzhen, Foshan, and Dongguan. These four cities generate a large travel demand; the passenger flow between Guangzhou and Shenzhen is the largest, and the flow from Shenzhen to Guangzhou is slightly larger than that from Guangzhou to Shenzhen. The number of departures needs to be increased to ensure that demand on the corridors between popular cities does not exceed capacity.
(2) Time-differentiated capacity reserve: The spatial characteristics of passenger travel differ considerably between holidays, so differentiated arrangements should be made for different holidays. During the Spring Festival, large numbers of migrant workers return to their hometowns. The popular routes before the holiday mainly radiate from the megacities to the surrounding areas, and the popular return routes after the holiday mainly converge from small and medium-sized cities on the central cities. During the Ching Ming Festival, most trips are made by urban residents traveling to cemeteries to sweep graves. Temporary routes from urban areas to the surrounding cemeteries can be opened to meet this demand. During Labor Day and National Day, trips are mainly for leisure and visiting relatives. Special tourist shuttles connecting major transportation hubs with popular tourist attractions can be added to meet tourists' transportation needs. Passenger routes should be adjusted dynamically as demand changes.
(3) Differentiated capacity allocation at different times during holidays: Even within the same holiday, traffic characteristics differ considerably over time. Passenger flow peaks often occur before holidays, when capacity tends to be insufficient, so pre-holiday capacity allocation deserves attention. Capacity needs to be reserved before the holiday travel peak to meet the overall demand during the peak. During the Spring Festival holiday in particular, key cities such as Guangzhou and Shenzhen experience an outbound peak early in the holiday, an "empty city" in the middle of the holiday, and a return peak late in the holiday. Therefore, holiday transportation resources should be planned in advance, and the spare capacity in the middle of the holiday should be reallocated to the early and late parts of the holiday.

Conclusions
This paper analyzed the spatiotemporal characteristics of intercity shuttle passenger flow in the Pearl River Delta and established separate passenger flow prediction models for non-holidays and holidays, which were then validated with panel data. First, the spatial and temporal characteristics of intercity shuttle passenger flow were analyzed in detail. Then a passenger flow prediction model for the intercity shuttle based on a BP neural network was proposed. Next, building on the non-holiday model, a holiday passenger flow prediction model combining the holiday background passenger flow and the holiday flow fluctuation coefficient was proposed. Finally, the two prediction models were verified with historical passenger flow data. The specific results and conclusions of this paper are as follows:
First, the spatial and temporal characteristics of intercity shuttle passenger flow were analyzed. Passenger flows on lines between different cities are directionally imbalanced. The main passenger flows are concentrated between cities with large permanent populations and a high level of economic development, and the flows in the two opposite directions of a line also differ. Passenger flows on the same "day of the week" in different weeks are closely related, with a relatively high correlation; passenger flow varies greatly between holidays, which reflects the different travel demand on different holidays. Passenger flow on weekends is generally higher than on weekdays; the hourly passenger flow on holidays is much larger than that on weekends and weekdays, showing obvious holiday characteristics; and there are also obvious differences in passenger flow between different holidays.
Second, a passenger flow prediction model for the intercity shuttle on non-holidays was established. Based on the analysis of the spatiotemporal characteristics of intercity shuttle passenger flow, and to address the shortcomings of the traditional BP neural network prediction model, a genetic algorithm is used to optimize the initial weights and thresholds of the BP neural network, and the selection operation of the genetic algorithm is improved based on fitness, yielding the IGA-BPNN passenger flow prediction model for the intercity shuttle. The model is verified with historical passenger flow data. The results show that the prediction error of this model (MAPE of 6.43%) is lower than that of the pure BP neural network prediction model (BPNN, 9.39%) and the traditional genetic algorithm optimized BP neural network prediction model (GA-BPNN, 18.62%).
Third, a passenger flow prediction model for the intercity shuttle on holidays was established. To address the lack of historical holiday passenger flow data for the intercity shuttle, a holiday passenger flow prediction model combining the holiday background passenger flow and the holiday passenger flow fluctuation coefficient is proposed on the basis of the non-holiday model. This model is verified using panel data for intercity shuttles among cities in the Pearl River Delta. The results show that the holiday passenger flow prediction model achieves high prediction accuracy and strong applicability even when little holiday passenger flow data is available. Thus, this model can overcome the limited availability of historical holiday data and can be applied to passenger flow prediction for different intercity shuttle routes in a megalopolis.
As to whether the method can be replicated in other megalopolises in different countries with different cultural backgrounds, the study has certain limitations. First of all, the study area, the Pearl River Delta, has a large number of factories and manufacturing industries. During short holidays, intercity travel is frequent and is driven mainly by tourism. During long holidays, especially the Spring Festival, most people choose to leave and return to distant hometowns because of traditional customs.
Next, holiday data are scarce: the dataset contains only a little more than 50 days of holiday data over two years. The forecast results would be more accurate with more holiday data.
Furthermore, the algorithm's running time is relatively long, and the adaptive adjustment of BP network parameters also needs further study.
Due to the limitations of the data sources and of the theory, this paper analyzes passenger flow prediction for intercity shuttles considering only some of the influencing factors (the historical passenger flow data and the time attributes of the prediction date). The method can also be applied to predictions for other large cities in other megalopolises. However, many other influencing factors (such as extreme weather and large events) have not been considered, so the method may not be applicable under these conditions. These factors can be incorporated in future work to enhance the applicability and accuracy of the prediction model. In addition, the low iterative efficiency of the genetic algorithm needs to be improved for predictions with strong real-time requirements. As the data are enriched and refined in the future, the prediction accuracy of the model is expected to increase.