1. Introduction
As cities expand, the integrated transportation system becomes increasingly complex in order to meet the travel needs of different residents. With economic development and population growth, subways have opened in many Chinese cities to relieve increasing traffic pressure. Smart cards are widely used in the integrated transportation system and record electronic information about passengers’ travel. Smart card data record the stations and times at which passengers enter and leave the system; each such record is called an OD (origin–destination) record. The smart card big data accumulated by the AFC (Automatic Fare Collection) system provide the basis for analyzing urban residents’ travel behavior.
OD records reflect the destination needs of residents. Kim H proposed an OD estimation model based on multi-source data, which can identify the travel demand of the destination [1]. Kim K established a spatiotemporal autoregressive model of air passenger flow from origin to destination in order to better plan and dispatch operations, and verified the effectiveness of the model in predicting air passenger flow [2]. Zhang C used track data to follow the OD patterns of vehicle movement and proposed a measurement method based on on-ramp flow, which improves measurement accuracy compared with the traditional fixed-sensor method [3].
Liu L proposed a new spatiotemporal network model to analyze the evolution of taxi demand. The results show that this method can better predict demand at taxi destinations [4]. Bachir D used mobile network data, combined with transportation network geospatial data, travel surveys, census data, and travel card data, to infer users’ dynamic origin–destination flows [5]. Mungthanya W proposed a new method to construct a dynamic spatiotemporal OD matrix for taxis from taxi trajectory data, and analyzed the demand patterns of taxi travel [6].
When the OD data of passengers are extracted and assigned to traffic lines, the passenger flow pressure on those lines can be analyzed. Javani B proposed an OD-based algorithm for static traffic assignment with fixed OD demand, which was tested in Chicago and Philadelphia [7]. Mehrabipour M proposed a multi-OD algorithm based on the cell transmission model, which can optimize the dynamic traffic assignment of the system [8]. Hoang N H proposed a new linear programming framework, which uses the relationship between the UE (user equilibrium) and system optimal solutions to solve the dynamic traffic assignment problem [9].
OD records can be combined with passengers’ travel preferences to determine their path choices. Wang Y proposed a two-stage algorithm, which uses link-level traffic flow data to accurately estimate the OD matrix and improves the traveler’s path selection behavior model [10]. Based on long-term GPS (global positioning system) data, Li D introduced OD attributes and used a utility function to explore the influence of destination attributes on path selection [11]. Ou J proposed a new framework for estimating dynamic OD flow using a machine learning algorithm, and carried out an evaluation experiment on the real network of Kunshan City [12]. Duan Z proposed a hybrid neural network prediction model, which can effectively predict urban taxi OD traffic [13].
To evaluate OD, various indexes are needed to assess passengers’ behavior quantitatively. Yang S used taxi GPS data to estimate OD travel time, which helps to analyze the route preferences of passengers [14]. Margaretic P applied the spatial economic flow model and characteristic function spatial filtering method to air transportation, and discussed the impact of the origin and destination on air passenger flow [15]. Hanseler F S proposed a framework for estimating pedestrian demand in railway stations, which considers passenger data and various direct and indirect demand indicators [16].
Nigro M collected all the monitored travel data for specific areas of Rome, Italy, and analyzed the travel time and route selection probability of passengers’ OD pairs, which can improve the temporal and spatial reliability of the demand matrix [17]. Ma W used daily traffic data collected over several years to propose a new theoretical framework that accounts for the day-to-day variation caused by travelers’ independent route choices and estimates OD demand, path selection probability, and travel cost [18]. Dai X proposed a data-driven short-term subway passenger flow prediction framework, which can be successfully used to describe different subway travel modes [19].
Sensor technology can also be used to locate vehicles and predict traffic flow. Wang J proposed a Bayesian combination method to predict traffic flow [20]. Zhu S proposed a sensor location model to determine the optimal deployment strategy for dynamic origin–destination demand estimation [21]. Zhu N proposed a two-stage stochastic model to estimate the travel time of a highway corridor [22]. Hobson B W used sensors to estimate the occupancy of commercial and institutional buildings [23].
Passengers’ identity, policy, environment, and other factors affect their travel choices. Moslem S proposed a decision support program that can analyze and build consensus among different stakeholders in traffic development issues [24]. Ghorbanzadeh O used an interval level method to analyze the inconsistency and uncertainty of public transport users’ responses [25]. Duleba S used the analytic hierarchy method to analyze the significance of stakeholders in Mersin, Turkey [26]. Duleba S used the analytic hierarchy method combined with real data to test the optimal public transport service [27]. Moslem S used the fuzzy analytic hierarchy method to analyze the sustainable development of urban traffic in Mersin, Turkey [28]. Jiang X used the improved fuzzy analytic hierarchy process to analyze the most basic factors affecting last-kilometer distribution [29].
Guo Y used collected data on residents’ travel preferences to study the impact of domestic mobility, the household registration system, and family planning policy on passengers’ travel patterns [30]. Ruan Y analyzed how the daily life of the floating elderly differs from their life before migration, the difficulties they encounter after migration, and their social adaptation pressure [31]. Guo Y analyzed the impact of the information provided to potential relocaters on their decision-making process and travel behavior after relocation [32]. Li Y studied the effects of congestion pricing and incentive strategies on the early travel mode choice of car travelers in Beijing [33].
Based on household relocation data from Beijing, Wang D linked the residential built environment with travel behavior, and considered travel attitudes before and after the housing change [34]. Cheng G investigated the travel preferences of residents in two underdeveloped small cities, analyzed the travel data of different regions, and discussed what they have in common [35]. Using data on passengers’ travel behavior, Guo Y explored the personal and social impact of the motorcycle ban on the morning commute of motorcyclists’ families [36].
Establishing a complex network model of the subway is an effective way to analyze the performance of the urban subway network. Feng J proposed a weighted complex network model based on travel data and the operation schedule of the Beijing metro system to describe the travel mode of passengers on weekdays and weekends [37]. Saberi M discussed the statistical characteristics of the urban travel composite weighted network, indicating that the potential dynamic processes in urban travel demand networks are similar [38]. Zhang J analyzed the network characteristics of the subway networks in three Chinese cities by using the complex network method, and analyzed their robustness by means of simulated attacks [39]. Wu X established subway network models for six cities around the world, and analyzed and compared their robustness [40].
Yu W analyzed the evolution of the Nanjing metro network by using the complex network method, combined with the urban spatial pattern [41]. Wei Y proposed the concept and performance indexes of the supernetwork model based on the complex network model, and applied them to analyze the performance of the Nanjing metro network [42]. Yu W established the supernetwork model of the Nanjing metro network, and analyzed the changes in supernetwork performance by using the simulated attack method [43]. Kanwar K modeled and compared the existing Delhi metro network and its expansion based on the complex network. The results show that the degree distribution and degree-related parameters of the two networks are almost the same, and the connectivity is slightly improved [44].
The rapid development of the Nanjing metro network and the universal application of smart cards provide the basis for using big data to analyze passenger behavior. Yang M investigated commuters who use public bicycles to reach the subway, and analyzed their personal characteristics and their experiences before and after going to work [45]. Based on the single-line passenger flow data of Nanjing metro, Li J studied the influence of weather conditions on the passenger flow of Nanjing metro [46]. Zhao D analyzed transfers between the metro and public transport by using Nanjing bus smart card data [47]. Wei Y analyzed the temporal and spatial patterns of passenger flow based on Nanjing metro smart card data [48]. Wei Y used smart card data to propose a data filtering and exception recognition process, and classified and explained the exceptions [49]. Yu J used field data from Nanjing metro stations to establish an improved social force model and simulate the efficiency of passengers under different organizational modes [50].
The existing research involves the OD records of various modes of transportation, and analyzes the occurrence preference, route selection, and destination demand of passengers. Models of the urban subway have been established by using the complex network method, and the performance of the subway network has been analyzed. Unlike other modes of transportation, metro OD data have clear time records of entry and exit. Because the metro is an important part of the comprehensive transportation system in large cities, the travel time of its passengers shows a certain stability and regularity. How to set appropriate indexes to analyze metro OD data is an urgent problem in traffic information management.
Previous research includes the use of complex network methods to model the subway network and analyze its performance. OD travel time, as an important indicator of passenger travel quality, is affected by various external factors. Traffic big data has been collected and analyzed in each subsystem. This suggests the approach taken in this research, which considers the complex network performance indexes of the metro network together with the OD travel time index, and analyzes their correlation. The smart card data of subway passengers provide an accurate quantitative basis for this correlation analysis.
The development of traffic big data technology makes it possible to accurately evaluate and identify passengers’ travel behavior. In the urban public transportation system, the smart card is widely used, especially in the subway and bus systems. In addition, GPS is generally used to locate taxis, shared cars, and shared bicycles, which also makes it possible to track the movement of passengers. However, these data may be stored in nonstandard formats, which makes the analysis difficult. Because the subway is an important part of the urban transportation system, passengers’ card swiping data are of typical significance. This study uses the card swiping records of subway passengers to evaluate the factors influencing passenger travel time, which can lay a foundation for refining those factors in future research.
In this research, five working days of passengers’ entry and exit records are selected from the card swiping data of the Nanjing metro smart card, an index evaluation system for the OD travel time of passengers is established, and appropriate charts are selected to express the big data visually. This paper mainly adopts the complex network method to build the subway network model, and analyzes OD travel time together with other indicators. The OD travel time indexes can be divided into three categories: time indexes, complex network indexes, and composite indexes. The Pearson correlation between these indexes is analyzed quantitatively. This will help to understand the factors affecting the travel time of passengers, improve the flow efficiency of passengers, and optimize the layout of the subway network and urban space.
5. Correlation Analysis
Before the correlation analysis of the evaluation indexes of OD travel time, OD records with zero passenger flow must be removed; otherwise, the accuracy of the correlation analysis will be affected.
The purpose of correlation analysis is to identify which factors are correlated with OD travel time and how strongly. By analyzing these influencing factors, the metro management department can promote the flow of passengers and improve the operation efficiency of the subway system.
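The filtering step can be sketched as follows. This is a minimal illustration, assuming a hypothetical record layout of (origin, destination, passenger flow, mean travel time); the station names and values are invented and are not taken from the Nanjing data.

```python
# Hypothetical OD records: (origin, destination, passenger_flow, mean_travel_time_min).
# The layout and values are assumptions for illustration only.
od_records = [
    ("A", "B", 120, 18.5),
    ("A", "C", 0, 0.0),   # no trips observed, so no travel time can be measured
    ("B", "C", 45, 27.0),
    ("C", "A", 0, 0.0),
]


def drop_zero_flow(records):
    """Keep only OD pairs with at least one recorded trip."""
    return [rec for rec in records if rec[2] > 0]


filtered = drop_zero_flow(od_records)
print(filtered)
```

OD pairs without trips carry no measured travel time, so leaving them in would add placeholder values to every series used in the correlation.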
5.1. Pearson Correlation Model
The Pearson correlation model is generally used to measure the linear correlation between variables. When both variables are normally distributed continuous variables and the relationship between them is linear, the Pearson correlation coefficient is used to show the degree of correlation between the two variables.
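The calculation formula is shown in Formula 5:
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{5}$$
where $r$ is the correlation coefficient, $i$ is the code of indicator value, $x_i$ and $y_i$ are the corresponding indicator values, $\bar{x}$ and $\bar{y}$ are their respective means, and $n$ is the number of indicator values. In the actual calculation, two series are used to express different index values, and the correlation coefficient reflects the correlation between the two series.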
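As a rough sketch, the Pearson coefficient between two index series can be computed with the standard library alone. The function name `pearson_r` and the sample series are hypothetical and serve only to illustrate the calculation; they are not the paper's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (denominator terms).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-OD-pair values in minutes (illustrative only).
ride_time = [10.0, 20.0, 30.0, 40.0]
travel_time = [16.0, 27.0, 35.0, 48.0]
r = pearson_r(ride_time, travel_time)
print(round(r, 3))
```

The guard against constant series matters in practice: an index that takes the same value for every OD pair has zero variance, so the coefficient is undefined rather than zero.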
The greater the absolute value of the correlation coefficient, that is, the closer the coefficient is to 1 or −1, the stronger the correlation. The closer the coefficient is to 0, the weaker the correlation. Generally, the following ranges of the absolute value of the correlation coefficient determine the correlation strength of variables: 0.8–1.0 indicates extremely strong correlation, 0.6–0.8 indicates strong correlation, 0.4–0.6 indicates moderate correlation, 0.2–0.4 indicates weak correlation, and 0.0–0.2 indicates extremely weak correlation or no correlation.
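The strength ranges can be written as a small lookup. This is only a sketch: the text gives ranges with shared endpoints, so assigning each boundary value (0.2, 0.4, 0.6, 0.8) to the stronger category is an assumption made here, and the function name is hypothetical.

```python
def correlation_strength(r):
    """Map a Pearson coefficient to the qualitative strength label.

    Assumption: boundary values (0.2, 0.4, 0.6, 0.8) fall in the stronger band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    magnitude = abs(r)  # strength depends on magnitude, not on sign
    if magnitude >= 0.8:
        return "extremely strong"
    if magnitude >= 0.6:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    if magnitude >= 0.2:
        return "weak"
    return "extremely weak or none"


for value in (0.93, -0.65, 0.31, -0.05):
    direction = "positive" if value > 0 else "negative"
    print(value, direction, correlation_strength(value))
```

Keeping the sign separate from the strength label mirrors how the results are reported, for example as a weak negative correlation.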
5.2. Correlation Analysis of Travel Time between Stations
The OD travel time of passengers includes inbound time, outbound time, transfer time, and ride time. The first three can be grouped together as waiting time. The ride time is linearly related to the number of rides. OD travel time is the sum of waiting time and ride time, so it has a linear relationship with these two variables. This is noted here as a supplementary explanation.
Table 6 shows the correlation between travel time or travel time variance and ride time, number of rides, and passenger flow between stations. Since the passenger’s ride time is based on the assumption of taking the minimum time, it is also necessary to analyze the correlation between ride time and travel time.
The parameters in the table refer to the relevant data between stations; travel time and the number of rides refer to the minimum values. It can be seen from the table that travel time is highly correlated with ride time and strongly correlated with the number of rides, which means that the longer the ride time and the more rides, the longer the whole travel time. Passenger flow is negatively correlated with travel time: the larger the passenger flow, the shorter the travel time. The reason may be that when the number of passengers increases, more trains are put into service, passengers move faster, and the average waiting time becomes shorter.
Travel time variance is negatively correlated with travel time and passenger flow. The correlation between travel time variance and the number of rides is extremely weak, sometimes negative and sometimes positive. This shows that the travel time variance has a certain randomness and has no obvious correlation with the other parameters.
5.3. Correlation Analysis of Waiting Time between Stations
Table 7 shows the correlation between the waiting time or waiting time variance and the number of rides and passenger flow. The waiting time is the difference between travel time and ride time. It can be seen from the table that the waiting time is strongly correlated with the number of rides. This shows that the more rides a trip involves, the longer the waiting time, because passenger transfers take more time. The correlation between waiting time and passenger flow is weak and negative. This shows that the larger the passenger flow, the shorter the waiting time. The reason may be that passengers move more quickly when faced with congestion. The waiting time variance is weakly correlated with the number of rides and the passenger flow, which indicates that the waiting time has considerable randomness.
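The definition of waiting time as the difference between travel time and ride time can be sketched as below. The OD pairs, the values, and the choice to reject negative differences are assumptions for illustration; they are not the paper's data or filtering rules.

```python
# Hypothetical per-OD-pair averages in minutes: (travel_time, minimum_ride_time).
od_times = {
    ("A", "B"): (18.5, 12.0),
    ("A", "C"): (33.0, 21.5),
    ("B", "C"): (9.0, 4.0),
}


def waiting_time(travel_time, ride_time):
    """Waiting time = travel time - ride time (inbound, outbound, and transfer time)."""
    wait = travel_time - ride_time
    if wait < 0:
        # A trip shorter than the minimum ride time suggests an abnormal record.
        raise ValueError("travel time is shorter than the minimum ride time")
    return wait


waits = {od: waiting_time(t, r) for od, (t, r) in od_times.items()}
print(waits)
```

The resulting series of waiting times per OD pair is what would then be correlated with the number of rides and the passenger flow.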
5.4. Correlation Analysis of Flow Efficiency between Stations
Table 8 shows the correlation between the flow efficiency between stations and ride time, number of rides, and passenger flow. The flow efficiency is positively correlated with ride time, which indicates that the longer the ride time is, the higher the flow efficiency is. This is because ride time then makes up a larger share of the entire journey. The flow efficiency is weakly correlated with the number of rides and the passenger flow, which shows a certain randomness.
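Taking flow efficiency as the ratio of ride time to travel time, a short sketch shows why it tends to rise with ride time. The fixed 8-minute waiting time and the ride times are assumptions chosen only to make the effect visible.

```python
def flow_efficiency(ride_time, travel_time):
    """Ratio of ride time to total OD travel time, between 0 and 1."""
    if travel_time <= 0 or not 0 <= ride_time <= travel_time:
        raise ValueError("need travel_time > 0 and 0 <= ride_time <= travel_time")
    return ride_time / travel_time


# Hypothetical trips that all share the same waiting time (an assumption).
waiting = 8.0
efficiencies = [flow_efficiency(ride, ride + waiting) for ride in (5.0, 15.0, 40.0)]
print([round(e, 3) for e in efficiencies])
```

With waiting time held constant, the ratio ride / (ride + waiting) grows as the ride gets longer, which is consistent with the positive correlation reported between flow efficiency and ride time.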
6. Conclusions
From 2005 to 2017, Nanjing metro opened 7 metro lines in total. Lines 1, 2, 3, and 4 form the backbone network of Nanjing metro, with 128 stations in total. Nanjing metro’s AFC system accumulates the big data of passengers’ entering and leaving the station. These data can be used to analyze the temporal and spatial distribution of OD. However, to evaluate OD, we need to further establish systematic indexes and analyze the correlation of indexes.
Before analyzing the correlation, we select five working days of data to filter out the unreasonable data. OD time index can be divided into three categories: time, complex network, and composite index. The time index includes use time probability, passenger flow between stations, average use time between stations, and use time variance between stations. Space P and ride time models are constructed by the complex network method. The complex network index is based on three complex network models, including the minimum number of rides between stations related to Space P, and the shortest ride time between stations related to the ride time network model. Composite index includes flow efficiency between stations and network flow efficiency.
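The minimum number of rides in a Space P network can be illustrated with a breadth-first search. In a Space P representation, two stations are linked whenever they lie on a common line, so the shortest path length between two stations equals the minimum number of rides. The toy network below is hypothetical and does not reproduce the Nanjing metro layout.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network (line name -> ordered stations), not the real layout.
lines = {
    "Line 1": ["A", "B", "C", "D"],
    "Line 2": ["E", "C", "F"],
    "Line 3": ["G", "F", "H"],
}


def build_space_p(lines):
    """Link every pair of stations that share a line (Space P representation)."""
    graph = {}
    for stations in lines.values():
        for station in stations:
            graph.setdefault(station, set())
        for u, v in combinations(stations, 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def min_rides(graph, origin, destination):
    """Breadth-first search; path length in Space P is the number of rides."""
    if origin == destination:
        return 0
    seen = {origin}
    queue = deque([(origin, 0)])
    while queue:
        station, rides = queue.popleft()
        for nxt in graph[station]:
            if nxt == destination:
                return rides + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, rides + 1))
    return None  # destination not reachable


space_p = build_space_p(lines)
print(min_rides(space_p, "A", "D"))  # same line: one ride
print(min_rides(space_p, "A", "F"))  # one transfer at C: two rides
print(min_rides(space_p, "A", "H"))  # transfers at C and F: three rides
```

The shortest ride time index would use the same idea on a graph whose edges are weighted by ride time, with a weighted shortest-path search in place of the breadth-first search.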
This research presents the five-day use time probability distribution and, taking February 13 as a representative day, the distribution charts of the time indexes. Interaction between the main line stations is frequent and the traffic is heavy. The records in the morning and evening peaks account for about 45% of the daily total. The average time between stations is mostly within 100 min. The main line has a large passenger flow between stations, which results in a relatively large variance in passengers’ travel time. The flow efficiency reflects the ratio of ride time to travel time. The higher the flow efficiency is, the better the network mobility is, and the less time passengers spend waiting.
The Pearson correlation model is used to measure the linear correlation between the variables of the matrix, which is divided into positive correlation and negative correlation. The greater the absolute value of the correlation coefficient, the stronger the correlation.
Travel time is strongly correlated with ride time and with the number of rides. This means that the longer the ride time and the more rides a passenger takes, the longer the travel time. The travel time variance has a certain randomness, and has no obvious correlation with the other indexes.
There is a strong correlation between waiting time and the number of rides. The correlation between waiting time and passenger flow is weak and negative. The waiting time variance is weakly correlated with the number of rides and the passenger flow, which indicates that the waiting time has considerable randomness.
The flow efficiency is positively correlated with ride time, which indicates that the longer the ride time is, the higher the flow efficiency is. The flow efficiency is weakly related to the number of rides and the passenger flow, which has certain randomness.
Previous research considered the use of the complex network to model the subway network, analyzed various factors affecting OD travel time, and also used traffic big data as the basis for analysis. In this research, these methods are combined to analyze the influencing factors of OD travel time and the relationships between them. This research still has some limitations. Some abnormal travel records need to be further identified and filtered. Passengers’ route selection is based on the shortest path assumption, which needs to be combined with mobile signaling and other means for more accurate identification. Different stations, different weather, and different time periods have a great influence on the travel time of passengers, which needs to be further analyzed in combination with previous research.
Based on the big data of the Nanjing metro smart card, this study uses the complex network method to construct and analyze the OD travel time indexes. These indexes consider the connection between the origin and destination of passengers, and can be used for quantitative evaluation of the connection between subway stations and the network. This method can be extended to the bus system and the public bicycle system. Future research can extend the observation period, select subway data from different cities, carefully distinguish the influencing factors of passenger travel time, and analyze their correlation, so as to control variables and improve the operation efficiency and management level of the transportation system.