Exploring the Characteristics of an Intra-Urban Bus Service Network: A Case Study of Shenzhen, China

The urban bus service system is one of the most important components of a public transport system. Thus, exploring the spatial configuration of the urban bus service system promotes an understanding of the quality of bus services. Such an understanding is of great importance to urban transport planning and policy making. In this study, we investigated the spatial characteristics of an urban bus service system by using the complex network approach. First, a three-step workflow was developed to collect a bus operating dataset from a public website. Then, we utilized the P-space method to represent the bus service network by connecting all bus stop pairs along each bus line. With the constructed bus network, a set of network analysis indicators were calculated to quantify the role of nodes in the network. A case study of Shenzhen, China was implemented to understand the statistical properties and spatial characteristics of the urban bus network configuration. The empirical findings can provide insights into the statistical laws and distinct convenient areas in a bus service network, and consequently aid in optimizing the allocation of bus stops and routes.


Introduction
A city can be considered as a place where people are densely distributed and where industrial and commercial activities are well developed. It is therefore a place of aggregated people, resources, energy, and their interactions, and has been treated as a complex system since the 1950s. Recently, with the rapid growth of population and expansion of urban areas, the demand for urban mobility has been increasing. Consequently, big cities around the world have encountered a common challenge: traffic congestion. The development and improvement of urban public transport is an effective approach to alleviate urban traffic pressure. In a public transport system, the bus service system is a primary travel mode that attracts a large number of people because of its convenience and low cost. Therefore, when optimizing bus stops and lines, urban policy makers must understand the spatial configuration of urban bus systems to improve the service efficiency of these transport systems [1][2][3].
to enhance our understanding of the statistical and spatial characteristics of the allocation of the bus service system on the basis of the spatial units of traffic analysis zones (TAZs).
The contributions of this study are two-fold. First, it provides a deep insight into the statistical and spatial characteristics of an intra-urban bus service system in a metropolis of China through the classical complex network approach. Second, evaluation of the network properties and the importance of nodes in the bus service network can help urban governments, transportation and planning departments, and planners learn more about intra-urban bus networks. For example, by mapping the centrality of the bus network onto TAZs, planners could identify the TAZs with high or low accessibility, connectivity, and influence on others in the bus service network.

Study Area
The case study area of this work was Shenzhen, which is located in the southeast of China, near Hong Kong. It was designated as the first special economic zone after the launch of the reform and opening-up policy of China. As a special economic zone, it benefits from a number of preferential policies for economic development (e.g., creating a favorable investment environment, reducing customs duty, and allowing the introduction of advanced technologies). Over the past four decades, the city has changed tremendously, developing from three fishing villages into a famous international metropolis. Currently, it encompasses 10 administrative districts, which are further divided into downtown areas, suburbs, and rural areas according to their economic development (Figure 1). Shenzhen covers an area of approximately 2000 km² and has a population of more than 15 million, making it the most densely populated among Chinese cities [15,38]. The unique socioeconomic and demographic status of Shenzhen makes it an interesting area for the study of urban bus networks.
To meet the massive daily travel demand of citizens, the government has constructed a multimodal public transport service system comprised of buses, subways, taxis, and bicycles. This study focused on the bus service system. The service area generated by bus stops has a radius of 500 m, accounting for more than 80% of the entire area of Shenzhen. This coverage is outstanding when compared with other large cities in China. Due to the convenience and low fare of buses, approximately 55.6% of passengers in Shenzhen have opted to travel by bus. Therefore, Shenzhen was selected as the case study to explore the characteristics of a bus service system. The results of this study can be referenced by other cities to improve the efficiency of their bus services.

Dataset Collection
The bus dataset used in this study was acquired from a public transport inquiry website (www.8684.cn/) that provides bus query services for most Chinese cities. The website is updated in real time, that is, whenever the operation of buses (e.g., bus stops and bus lines) changes in a city. The website is linked to the homepage of the Shenzhen Bus Company (the company in charge of Shenzhen's bus system) to provide a bus inquiry service for citizens. Thus, the dataset collected from this website is reliable for exploring the characteristics of the urban bus system. A three-step workflow was proposed to collect the bus data from this website. First, we developed a web crawler program to crawl all bus lines from the website, including every bus line number and the names of the bus stops that each line passes. Consequently, we generated a set of bus stops. Second, we projected these stops onto a spatial map. To achieve this goal, we utilized the Amap platform, a well-known map service provider in China, to obtain the geographic coordinates of each stop [39]. The platform provides a geocoding API (application programming interface) that converts an address into the corresponding latitude and longitude in its own coordinate system. Finally, we transformed the latitude and longitude into the universal WGS84 (World Geodetic System 1984) coordinate system to match the other geospatial data of the study area.
Through this workflow, a total of 1016 bus lines were obtained in Shenzhen. Each bus line operates in a two-way fashion, which implies that the buses take the same route in the outward and return directions. Thus, each stop name corresponds to two physical stops, one on each side of the road. This study did not consider the directions of the bus lines, and pairs of bus stops with the same name on either side of the road were merged into one stop. Accordingly, a total of 5334 bus stops were generated. Figure 1 shows the spatial distribution of the extracted bus stops, which were mainly distributed along the road network.

Complex Network Analysis
In this section, the construction of the complex network based on the bus lines and bus stops is described first. Then, several network indicators are introduced to quantify the structure and centrality of the urban bus service network.

Network Construction
Generally, two popular methods are used to represent transport network systems, namely, the L-space and P-space methods. The L-space method directly connects the adjacent bus stops along the same bus line, as shown in Figure 2b, and can therefore only represent the original configuration of the bus network. The P-space method connects all pairs of bus stops served by the same bus line (Figure 2c). Unlike the L-space method, this method measures whether two stops are accessible by the same bus line; thus, the stops of each bus line form a completely connected subgraph [9,40]. For example, bus line 3 in Figure 2 passes three stops, namely I, G, and J. In L-space, there is no direct connection between stop I and stop J (Figure 2b), but there is a direct connection between the two stops in P-space because they can access each other through bus line 3 (Figure 2c). Therefore, the P-space method is more suitable for analyzing the spatial transfer and correlation of bus stops. This study utilized the P-space method to construct the bus service network.
In the P-space bus service network $G = (V, E, W)$, a node $v_i$ represents bus stop $i$, and a weight is assigned to each link. The weight $w_{ij}$ of edge $e_{ij}$ between $v_i$ and $v_j$ is first initialized as $w_{ij} = 0$. Then, for every bus line, the weight matrix $W$ is updated according to the following rule: if two nodes $v_i$ and $v_j$ can be reached from each other without any transfer, that is, if both stops lie on the bus line, the two nodes are considered connected and the corresponding weight $w_{ij}$ is increased by 1 (Figure 2c). For instance, bus line 1 passes through the four stops A, B, C, and D. This line can be converted into six links: A-B, A-C, A-D, B-C, B-D, and C-D. Finally, the weight matrix of the bus service network derived from the P-space method is $W = \{w_{ij}\}$.
The spatial analysis units in this study were the TAZs (traffic analysis zones). TAZs are the basic spatial analysis units used in transportation planning to forecast trip generation and travel demand. It is assumed that people living in the same TAZ show similar demographic characteristics [41,42]. The design of TAZs is usually implemented by urban planners or transport geographers. Therefore, it is reasonable for planners and researchers to adopt the TAZ as a basic unit for researching urban transportation problems. In this study, we obtained the TAZ file from the local transport department. We aggregated the constructed stop-based network by excluding the edges whose two stops were located in the same TAZ and generated the new TAZ-based network $G_{TAZ} = (V, E, W)$. In this network, node $V_i$ represents TAZ $i$, edge $E_{ij}$ represents the connection between TAZ $i$ and TAZ $j$, and weight $w_{ij}$ represents the number of bus lines that pass through the two TAZs. Ultimately, a TAZ-based network with 965 nodes and more than 71,000 links was constructed for use in the subsequent analysis.
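The construction rule above can be illustrated with a minimal sketch. It assumes each bus line is available as a list of stop names and that a lookup from stop to TAZ exists; the function names `build_p_space` and `aggregate_to_taz` and the `stop_to_taz` mapping are hypothetical and are not taken from the study's own code.

```python
from collections import defaultdict
from itertools import combinations


def build_p_space(lines):
    """Build P-space edge weights from bus lines.

    lines: dict mapping a line id to the list of stop names it serves.
    Returns a dict {(u, v): weight} with u < v, where the weight counts
    the bus lines that serve both stops.
    """
    weights = defaultdict(int)
    for stops in lines.values():
        # Each line adds 1 to every pair of distinct stops it serves.
        for u, v in combinations(sorted(set(stops)), 2):
            weights[(u, v)] += 1
    return dict(weights)


def aggregate_to_taz(lines, stop_to_taz):
    """Aggregate to TAZ level (stop_to_taz is an assumed lookup table).

    Pairs of stops inside the same TAZ are dropped, and the weight of a
    TAZ pair is the number of bus lines passing through both TAZs.
    """
    weights = defaultdict(int)
    for stops in lines.values():
        zones = sorted({stop_to_taz[s] for s in stops})
        for a, b in combinations(zones, 2):
            weights[(a, b)] += 1
    return dict(weights)


# Bus lines 1 and 3 as described in the text (Figure 2).
lines = {"1": ["A", "B", "C", "D"], "3": ["I", "G", "J"]}
p_space = build_p_space(lines)
print(len(p_space))            # line 1 gives 6 links, line 3 gives 3
print(("I", "J") in p_space)   # I and J are directly linked in P-space
```

Counting each line once per TAZ pair, rather than summing stop-level weights, follows the stated definition of the TAZ-level weight as a number of bus lines.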

Topological Analysis of the Bus Service Network Structure
In the field of complex network theory, a network can be classified into different types (e.g., random, small-world, and scale-free networks) according to statistical characteristics such as the degree distribution, clustering coefficient, and average path length. Therefore, this section probes the overall topological features of the bus service network by calculating these three basic indicators. The weight of edges is not considered in the topological analysis.
(1) Degree and degree distribution
Degree is a basic measurement index in network analysis and represents the total number of edges connected to a node [8]. For an undirected network, the degree of node $i$ can be defined as follows:
$$k_i = \sum_{j=1}^{N} a_{ij}$$
where $a_{ij} = 1$ if nodes $i$ and $j$ are directly connected and $a_{ij} = 0$ otherwise. In this study, $N$ represents the total number of TAZs, and the degree $k_i$ of a TAZ represents the number of TAZs connected directly to the TAZ by at least one bus line. The degree distribution can be calculated as follows:
$$P(k) = \frac{n_k}{N}$$
where $n_k$ represents the number of TAZs whose degree is equal to $k$.
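As a small illustration of these two definitions, the sketch below computes the degree of every node, the distribution $P(k)$, and the cumulative distribution $P(K \ge k)$ from an undirected edge list. The function names and the toy edge list are illustrative assumptions; nodes without any edge do not appear in an edge list and are therefore not counted here.

```python
from collections import Counter


def degrees(edges):
    """Return {node: degree} for an undirected edge list of (u, v) pairs."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return {node: len(nb) for node, nb in neighbors.items()}


def degree_distribution(deg):
    """P(k) = n_k / N, the share of nodes whose degree equals k."""
    n = len(deg)
    counts = Counter(deg.values())
    return {k: counts[k] / n for k in sorted(counts)}


def cumulative_distribution(pk):
    """P(K >= k), the form usually plotted to inspect the decay."""
    tail = 0.0
    out = {}
    for k in sorted(pk, reverse=True):
        tail += pk[k]
        out[k] = tail
    return dict(sorted(out.items()))


edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
pk = degree_distribution(degrees(edges))
print(pk)
print(cumulative_distribution(pk))
```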
(2) Average path length
The average path length is the average number of edges along the shortest paths between all possible node pairs in the network [43]. This parameter can be calculated as
$$L = \frac{1}{N(N-1)} \sum_{i \neq j} d_{ij}$$
where $d_{ij}$ is the number of edges of the shortest path between nodes $i$ and $j$. A small average path length indicates good network accessibility.
(3) Clustering coefficient
The clustering coefficient is used to measure the extent of the local aggregation in the network [43]. The clustering coefficient of a node can be defined as the number of actual edges $E_i$ between the nodes within its neighborhood divided by the maximal possible number of edges between them:
$$C_i = \frac{2E_i}{k_i(k_i - 1)}$$
The clustering coefficient of the entire network is the average over all nodes in the network. The larger the clustering coefficient of the network, the greater the local aggregation. This parameter can be expressed as follows:
$$C = \frac{1}{N} \sum_{i=1}^{N} C_i$$
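The two indicators can be sketched in a few lines for an unweighted, connected network. This is an illustrative implementation under stated assumptions: the adjacency is a dict of neighbor sets, the helper names are hypothetical, and nodes with fewer than two neighbors are given a clustering coefficient of 0, a common convention that the text does not specify.

```python
from collections import deque


def to_adjacency(edges):
    """Turn an undirected edge list into {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_hops(adj, source):
    """Number of edges on the shortest path from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(adj):
    """Mean of d_ij over all ordered pairs of distinct, mutually reachable nodes."""
    total = 0
    pairs = 0
    for source in adj:
        dist = bfs_hops(adj, source)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def clustering(adj, i):
    """C_i = 2 E_i / (k_i (k_i - 1)); 0 when k_i < 2 (assumed convention)."""
    nb = adj[i]
    k = len(nb)
    if k < 2:
        return 0.0
    # Every edge between two neighbors is seen from both ends, hence // 2.
    links = sum(1 for u in nb for v in adj[u] if v in nb) // 2
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    return sum(clustering(adj, i) for i in adj) / len(adj)


# Toy network: a triangle a-b-c with one extra node d attached to a.
adj = to_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])
print(average_path_length(adj), average_clustering(adj))
```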

Nodes' Centrality Measurement of the Weighted Bus Network Structure
In this section, weight is considered in measuring the importance of nodes (TAZs) in the configuration of the bus service network. Generally, centrality is used to quantify the importance of the nodes in a network. Therefore, the closeness centrality, betweenness centrality, and PageRank score of each node were used to represent the accessibility, connectivity, and influence on others of the TAZs in the network, respectively.
(1) Closeness centrality
The closeness centrality of a node is used to quantify how close the node is to others by using the shortest path [44], and is defined as the reciprocal of the average shortest path length from the node to the others:
$$CC_i = \frac{N - 1}{\sum_{j \neq i} d_{ij}}$$
where $d_{ij}$ is the shortest path length between TAZ $i$ and TAZ $j$. The larger the closeness centrality, the more conveniently TAZ $i$ can be accessed from other TAZs by taking a bus. Thus, closeness centrality can represent the accessibility of TAZs.
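A minimal sketch of closeness centrality on a weighted network follows. The text does not state how edge weights are turned into path lengths, so the conversion used here (length = 1 / weight, so that TAZ pairs sharing more bus lines count as closer) is purely an assumption for illustration, as are the function names.

```python
import heapq


def lengths_from_weights(weights):
    """Assumed conversion: edge length = 1 / weight (weights must be positive)."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = 1.0 / w
        adj.setdefault(v, {})[u] = 1.0 / w
    return adj


def dijkstra(adj, source):
    """Shortest path length from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, length in adj[u].items():
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def closeness(adj, i):
    """Reciprocal of the average shortest path length from i to the others."""
    dist = dijkstra(adj, i)
    total = sum(d for j, d in dist.items() if j != i)
    return (len(dist) - 1) / total if total > 0 else 0.0


# Hypothetical TAZ pairs with the number of shared bus lines as weight.
adj = lengths_from_weights({("a", "b"): 2, ("b", "c"): 1})
print({node: closeness(adj, node) for node in adj})
```

With unit lengths on every edge the same function reduces to the hop-count version of the formula.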
(2) Betweenness centrality
Betweenness centrality measures the connectivity of a node, which reflects the capacity of the node to act as an intermediary in the network [45]. The index is defined as follows:
$$BC_i = \sum_{j \neq i \neq k} \frac{n_{jk}(i)}{n_{jk}}$$
where $n_{jk}$ represents the number of all shortest paths between TAZ $j$ and TAZ $k$, and $n_{jk}(i)$ is the number of those shortest paths that pass through TAZ $i$. The larger the betweenness centrality, the more critical the TAZ is in connecting TAZ pairs along shortest paths. Thus, betweenness centrality reflects the importance of a TAZ as a critical bridge in the bus service network.
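The definition can be evaluated efficiently with Brandes' accumulation scheme, sketched below for an unweighted, undirected network. Two choices here are assumptions rather than details from the text: paths are measured in hops instead of weighted lengths, and each unordered pair (j, k) is counted once.

```python
from collections import deque


def betweenness(adj):
    """Betweenness of every node; adj maps each node to a set of neighbors."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes inward.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was visited from both endpoints.
    return {v: value / 2 for v, value in bc.items()}


# Toy example: on the path a-b-c only b lies between another pair.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(betweenness(path))
```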
(3) PageRank score
The PageRank score is used to measure the importance of a node by comprehensively considering the importance of the nodes it connects to [46]. It considers a strongly connected node to be more important than nodes with few connections. Therefore, the PageRank score can differentiate the importance of nodes with the same degree or strength. For an undirected and weighted network, the PageRank score can be calculated as follows:
$$PR_i = \frac{1 - \lambda}{N} + \lambda \sum_{j \in E_i} \frac{PR_j}{k_j}$$
where $\lambda$ is a free parameter, for which the value of 0.85 was used in this study [45]; $E_i$ is the set of edges that connect with node $i$; $k_i$ represents the degree of node $i$; and $PR_i$ represents the PageRank score of node $i$, whose calculation is an iterative process. The larger the score, the more important the TAZ in the bus service network.
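The iterative calculation of the PageRank score can be sketched as a simple power iteration. In this sketch a node passes its score to its neighbors in proportion to edge weight divided by node strength (the sum of its edge weights); that weighting is an assumption made for illustration, and with unit weights it reduces to sharing the score equally among the neighbors. The tolerance and iteration cap are likewise arbitrary choices.

```python
def pagerank(weights, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration on an undirected weighted network.

    weights: dict {(u, v): w} with positive w, one entry per undirected edge.
    damping: the free parameter (0.85 in the study).
    """
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    n = len(adj)
    strength = {u: sum(nb.values()) for u, nb in adj.items()}
    pr = {u: 1.0 / n for u in adj}
    for _ in range(max_iter):
        new = {
            u: (1 - damping) / n
            + damping * sum(pr[v] * w / strength[v] for v, w in adj[u].items())
            for u in adj
        }
        change = sum(abs(new[u] - pr[u]) for u in adj)
        pr = new
        if change < tol:
            break
    return pr


# Toy star network: the hub should receive the largest score.
scores = pagerank({("hub", "x"): 1, ("hub", "y"): 1, ("hub", "z"): 1})
print(max(scores, key=scores.get))
```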
Figure 3 shows the spatial distribution of degree. On the basis of the method of network construction, the degree of a TAZ indicates the number of directly connected TAZs, that is, TAZs to or from which at least one bus line runs. Thus, the larger the degree of the TAZ, the greater the number of other TAZs that can be reached by taking a bus without transfer. As shown in Figure 3, the TAZs located in the southern part of Shenzhen (downtown areas) have larger degrees than those in the northern part. Figure 4a displays the statistical degree distribution; the minimum and maximum values of degree were 3 and 466, respectively, and the average value of degree was 148. The cumulative degree distribution (Figure 4b) declines approximately linearly, and the decay can be well fitted by an exponential function, which is consistent with previous studies [28,32,47]. This result demonstrates that the degree does not follow a power law distribution in Shenzhen, indicating that the bus service network does not show the scale-free property. Moreover, this observation is consistent with the results in [48], which imply that urban public transport systems do not follow the scale-free law because the stops are connected nearly randomly in the P-space representation [9,31]. Furthermore, the results illustrate that the configuration of the bus service system in Shenzhen is relatively fair in terms of direct bus access, as only a few TAZs have enormous reachability.

Small-World Property
In complex network theory, the average path length and clustering coefficient are the two main measurements of the small-world property of a network. A small-world network generally has a similar average path length and a larger clustering coefficient compared to a random network of the same size [43]. The average path length of the bus service network was 2.01, which indicates that people only need one transfer on average to reach any TAZ of the city by taking buses. The clustering coefficient of the entire network was 0.47. The average path length and clustering coefficient of a random network of the same size were 1.85 and 0.15, respectively. Thus, the bus service network of Shenzhen presents the small-world property, which is beneficial for improving both robustness and stability in network planning and protection, and can be used to understand and interpret the bus service network. Figure 5 shows the spatial distribution of the clustering coefficient of TAZs. TAZs with a high clustering coefficient are mainly located at the edge of the city. Such places, including mountains, forests, and farmland, are sparsely populated areas. In comparison with Figure 3, the TAZs with a large degree had a low clustering coefficient, and degree and clustering coefficient showed a nearly opposite tendency (Figure 6). In other words, the larger the degree of the TAZ, the smaller the clustering coefficient, which is consistent with studies of the Chinese aviation system and railway network [4,49].

Characterizing Edge Weight of Bus Service Network
In this section, the weight of edges is considered in analyzing the importance of each TAZ in a bus service network. As described in Section 3.1, the weight $w_{ij}$ represents the number of bus lines. Figure 7 shows the constructed weighted network, in which the color represents the weight of edges. Figure 8 shows the statistical distribution of weight. The weight exhibits a long-tailed distribution (Figure 8a); that is, only a few TAZ pairs are connected by an extremely large number of bus lines, whereas the majority of TAZ pairs are connected by few bus lines. We utilized a typical power law function $p \propto w^{-\beta}$ to capture the long-tailed distribution, where $p$ represents the cumulative probability of weight $w$, and $\beta$ represents the friction coefficient of decay. As shown in Figure 8b, the distribution can be approximately fitted using a straight line on a log-log scale, which implies that the weight follows a power law distribution with a friction coefficient $\beta$ of 1.764. Detecting the distribution decay in bus service network connections is a prerequisite for understanding constrained complex networks from a geospatial perspective.
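A straight-line fit on a log-log scale can be sketched with ordinary least squares on the logarithms of $w$ and $p$. This is only one simple way to estimate $\beta$ and is not necessarily the fitting procedure used in the study; the function names and the synthetic data in the usage lines are illustrative.

```python
import math


def ccdf(values):
    """Empirical cumulative probability P(W >= w) for each distinct w."""
    n = len(values)
    out = {}
    for index, w in enumerate(sorted(values)):
        if w not in out:
            out[w] = (n - index) / n
    return out


def fit_power_law(ws, ps):
    """Least-squares line through (log w, log p); returns (beta, prefactor)."""
    xs = [math.log(w) for w in ws]
    ys = [math.log(p) for p in ps]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # p = prefactor * w ** (-beta), so beta is the negated slope.
    return -slope, math.exp(mean_y - slope * mean_x)


# Synthetic points lying exactly on p = w ** (-1.764), for illustration only.
ws = [1, 2, 4, 8, 16]
ps = [w ** -1.764 for w in ws]
beta, prefactor = fit_power_law(ws, ps)
print(round(beta, 3))
```

For observed edge weights, the inputs would be the keys and values of `ccdf(weights)`.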
In addition, distance decay is apparent in the bus service network. As shown in Figure 7, most edges with larger weights connect TAZs that are a shorter distance apart, which follows a typical geographical phenomenon, namely, distance decay. Moreover, community detection was implemented for the bus service network in order to check whether adjacent TAZs can be classified into the same community. In complex networks, community detection can partition the whole network into several densely connected subnetworks based on the weight of edges, and a community is constituted by some tightly connected nodes. Therefore, community detection has been introduced into geographical research to divide spatial interactions into cohesive communities. In this study, TAZs within the same community represent those that are closely connected by bus lines. The community detection algorithm utilized in this study was the fast modularity maximization algorithm, which shows better performance on weighted and undirected networks; a detailed description of this algorithm can be found in [50].
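To make the objective of modularity maximization concrete, the sketch below evaluates the weighted modularity $Q$ of a given partition. It is not the fast modularity maximization algorithm of [50] itself, only the quantity that such algorithms seek to maximize; the function name and the toy network are illustrative assumptions.

```python
def modularity(weights, community):
    """Weighted modularity Q of a partition of an undirected network.

    weights: dict {(u, v): w}, one entry per undirected edge.
    community: dict {node: community label}.
    Q = sum over communities of  in_c / m - (tot_c / (2 m)) ** 2,
    where m is the total edge weight, in_c the weight of edges inside
    community c, and tot_c the summed strength of its nodes.
    """
    m = sum(weights.values())
    inside = {}
    total = {}
    for (u, v), w in weights.items():
        cu, cv = community[u], community[v]
        total[cu] = total.get(cu, 0.0) + w
        total[cv] = total.get(cv, 0.0) + w
        if cu == cv:
            inside[cu] = inside.get(cu, 0.0) + w
    return sum(
        inside.get(c, 0.0) / m - (total[c] / (2 * m)) ** 2 for c in total
    )


# Two triangles joined by a single edge: the natural split scores higher
# than putting every node into one community.
edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}
print(modularity(edges, split) > modularity(edges, merged))
```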
The result of community detection is shown in Figure 9, in which all TAZs were grouped into six communities. It can be seen that spatially adjacent TAZs have been classified as the same community, which indicates that spatial distance influences the connection relationships of an intra-urban bus network. Through a comparison with Figure 1, the detected communities were similar to Shenzhen's administrative divisions. For example, communities C1, C2, and C4 cover most of the TAZs in the Guangming, Baoan, and Nanshan districts, respectively. The community C3 mainly includes the TAZs of the Longhua district. However, the three districts (Futian, Luohu, and Yantian) in the south of Shenzhen were identified as community C5, which indicates that the bus lines could connect bus stops between the three districts very well. Similarly, the bus lines also tightly connect the bus stops in Dapeng, Pingshan, and the north of Longgang (Community C6).

Characterizing Centrality of Traffic Analysis Zones in Bus Service Network
We calculated the closeness centrality, betweenness centrality, and PageRank score on the weighted bus service network to measure the centrality of the TAZs in the bus service system. To visualize the spatial distribution of each centrality indicator, the widely used natural breaks classification was applied to group the TAZs into six classes. The natural breaks method determines the best arrangement of values into classes through an iterative process that seeks to minimize the variance within classes and maximize the variance between classes [51]; it has therefore been used extensively to classify and visualize geographic data. After classification, the classes are ordered by ascending value, so the six classes form hierarchical levels, denoted L1, L2, L3, L4, L5, and L6, where L1 and L6 represent the lowest and highest classes, respectively. Figures 10 and 11 show the statistical and spatial distributions, respectively, of the six levels for the closeness centrality, betweenness centrality, and PageRank score of the TAZs. Note that this study did not apply the same class boundaries to the three indicators; the natural breaks were computed for each centrality indicator separately. The classification therefore serves to visualize the spatial hierarchy of TAZs within each centrality indicator.
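The idea behind natural breaks can be illustrated with a small dynamic-programming sketch. It is not the implementation used in the study: it is an exact search for the class boundaries that minimize the within-class sum of squared deviations (the Fisher-Jenks formulation), written for short lists, and the function names and sample scores are hypothetical. Three classes are used for brevity where the study used six.

```python
from typing import List, Sequence


def natural_breaks(values: Sequence[float], k: int) -> List[float]:
    """Return the upper bound of each of k classes that minimizes
    the total within-class sum of squared deviations."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums give the squared deviation of any slice in O(1).
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, v in enumerate(data):
        prefix[i + 1] = prefix[i] + v
        prefix_sq[i + 1] = prefix_sq[i] + v * v

    def ssd(i: int, j: int) -> float:
        """Sum of squared deviations from the mean for data[i:j]."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[c][j]: minimal total SSD when the first j values form c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Walk back through the stored split points to recover the bounds.
    bounds = []
    j = n
    for c in range(k, 0, -1):
        bounds.append(data[j - 1])
        j = split[c][j]
    return bounds[::-1]


def classify(value: float, bounds: Sequence[float]) -> str:
    """Map a value to a level label L1..Lk given ascending upper bounds."""
    for level, upper in enumerate(bounds, start=1):
        if value <= upper:
            return f"L{level}"
    return f"L{len(bounds)}"


# Hypothetical centrality scores for nine TAZs.
scores = [0.11, 0.12, 0.14, 0.40, 0.42, 0.45, 0.80, 0.82, 0.85]
bounds = natural_breaks(scores, 3)
levels = [classify(s, bounds) for s in scores]
print(bounds, levels)
```

Running the classification once per indicator, as the study does, means the level boundaries differ between closeness, betweenness, and PageRank.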
As previously mentioned, closeness measures how easily other TAZs can be reached by bus. The distribution of closeness across the six levels was close to normal, and the TAZs at level L4 or above accounted for more than 55% of the total (Figure 10). TAZs with high bus accessibility were mainly concentrated in the southern areas of Shenzhen (Figure 11). For betweenness centrality, the distribution across the six levels showed a declining trend, and the TAZs at level L1 accounted for approximately 50% of all TAZs. The betweenness index quantifies the connectivity and transitivity capabilities of a TAZ in the bus service network, and the TAZs with high bus connectivity (level L4 or above) accounted for less than 17% of all TAZs (Figure 10). As shown in Figure 11, these high-connectivity areas were scattered over different districts of the city, where they connect various urban areas. The statistical distribution of the PageRank score was similar to that of the closeness centrality; the main difference is that the TAZs below level L4 were dominant and accounted for approximately 60% of all TAZs (Figure 10). The PageRank score evaluates the importance and influence of TAZs in the bus service network. The spatial distribution of PageRank was similar to that of the closeness centrality: the TAZs with a high PageRank score were mainly concentrated in the southern Nanshan, Futian, and Luohu administrative districts (Figure 11). These important TAZs play key roles in the geospatial bus network and represent regions that need to be protected, which could help decision makers identify potentially important areas.
Figure 11. Spatial distribution of closeness centrality, betweenness centrality, and PageRank score of the TAZs.

Correlation of Centrality between Bus Network and Road Network
As previously described, the degree reflects the number of TAZs that can be reached by bus without any transfers, whereas closeness, betweenness, and PageRank measure a TAZ's accessibility, connectivity, and influence on others in the network, respectively. In this section, we further examined the relationship between the degree and the three centrality indexes, as shown in Figure 12. The Pearson's correlation coefficients between the degree and closeness, betweenness, and PageRank were 0.953, 0.761, and 0.995, respectively, which demonstrates a strong correlation between the degree and these three indexes. Therefore, in the bus service network, a TAZ with a high degree usually also has high accessibility, connectivity, and influence.
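For reference, the coefficient reported here can be written out as follows. The notation is assumed for illustration rather than taken from the paper: $x_i$ is the degree of TAZ $i$, $y_i$ is one of the three centrality indexes of the same TAZ, and $n$ is the number of TAZs.

```latex
\[
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
\]
```

A value of $r$ close to 1, such as the 0.995 obtained for PageRank, means that the centrality index rises almost linearly with the degree.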
In a city, buses travel along the urban road network, so it is of interest whether the structure of the road network is correlated with that of the bus service network. To address this question, we examined the correlation of the three centrality indicators between the bus service network and the road network. Based on the theory presented in Section 3.3, we calculated the closeness, betweenness, and PageRank of the urban road nodes and averaged the values within each TAZ. Similarly, Pearson's correlation coefficient was used to measure the correlation between the bus network and the road network, and the centrality indicators were min-max normalized before the coefficient was calculated. As shown in Figure 13, the correlation coefficients of closeness, betweenness, and PageRank were 0.452, 0.104, and 0.003, respectively. The closeness coefficient indicates a moderate correlation between the bus network and the road network, which suggests that the convenience of the road network may be related to that of the bus network in some TAZs. For betweenness, the coefficient indicates that the connectivity of the bus network has a low correlation with the intermediate transitivity of the urban road network. The PageRank of the bus network, however, was nearly uncorrelated with that of the road network. In the road network, nodes connect only with their adjacent nodes through road segments, so the influence of any single road node is usually very small, which results in a relatively concentrated PageRank score for road nodes (about 0.45 in Figure 13c). In contrast, in the bus network, a bus stop can connect with many other stops, provided that bus lines pass through these stops. Therefore, the PageRank of the bus network is more dispersed.
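The aggregation and normalization steps described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, TAZ identifiers, and centrality values are all hypothetical. The sketch averages road-node values within each TAZ, applies min-max normalization, and computes Pearson's coefficient against the bus network values.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Iterable, List, Sequence, Tuple


def mean_by_taz(node_values: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average node-level centrality values within each TAZ."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for taz_id, value in node_values:
        sums[taz_id] += value
        counts[taz_id] += 1
    return {taz_id: sums[taz_id] / counts[taz_id] for taz_id in sums}


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize constant values")
    return [(v - lo) / (hi - lo) for v in values]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical road-node closeness values, each tagged with its TAZ.
road_nodes = [("T1", 0.2), ("T1", 0.4), ("T2", 0.5), ("T2", 0.7), ("T3", 0.9)]
# Hypothetical closeness of the same TAZs in the bus service network.
bus_closeness = {"T1": 0.30, "T2": 0.55, "T3": 0.60}

road_closeness = mean_by_taz(road_nodes)
taz_ids = sorted(bus_closeness)
bus = min_max([bus_closeness[t] for t in taz_ids])
road = min_max([road_closeness[t] for t in taz_ids])
r = pearson(bus, road)
print(f"Pearson r = {r:.3f}")
```

Because min-max normalization is a positive linear rescaling, it leaves Pearson's coefficient unchanged; its practical effect is to put the two networks on a common 0 to 1 scale for plotting.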

Conclusions and Future Work
This study investigated the spatial structure of an urban bus network using complex network theory. We took Shenzhen as a case study and developed a three-step workflow to collect the bus service dataset from a public transport inquiry website. On the basis of the operating configuration of the bus lines, the P-space graph principle was used to build the bus service network, which represents the relationships among bus stops. Network analysis indicators were then calculated to analyze the statistical properties of the network and to quantify the accessibility, connectivity, and influence of its nodes. We found that the bus network of Shenzhen has the small-world property, with an average path length of 2.01 and a clustering coefficient of 0.47. Because a path of length one in P-space corresponds to a direct ride, this average path length means that people need about one transfer on average to reach any TAZ of the city by bus. The clustering coefficient represents the extent of local aggregation in the network, so it indicates how well a TAZ is connected with its adjacent TAZs. These two indicators could guide the design of an urban bus network: a well-designed network should aim for a small average path length and a high clustering coefficient. The weight of an edge represents the spatial interaction between two TAZs, and the heavy-tailed distribution of weights indicates that only a few edges have an extremely large weight. Based on the edge weights, we identified six spatially interacting communities in the bus service network and discussed the similarities and differences between the detected communities and the administrative districts, which provides an understanding of the spatial interaction structure formed by urban bus lines. In addition, we classified the TAZs into six classes based on their closeness, betweenness, and PageRank values, and found that the TAZs in the southern area have high accessibility, connectivity, and influence. Finally, we investigated the correlation of centrality between the bus service network and the urban road network, and found that closeness showed the highest correlation and that PageRank was nearly uncorrelated. These empirical results provide insight into the statistical laws and highly accessible areas of a bus network, which promotes the understanding of the configuration of urban bus service systems.
However, limitations remain that can be addressed in future work. The dataset acquired from the website lacks detailed operating information for the bus lines, such as timetables and the number of buses on each line, which is of great importance for assessing the efficiency of a bus service. Moreover, passenger trajectories can reveal the usage patterns of citizens who take buses, which would be a valuable reference for optimizing the configuration of a bus network. Therefore, we will work on both issues by combining other datasets (e.g., smart card data) to help urban managers establish a bus-friendly city and promote residents' green travel behavior.