Urban Spatial Interaction Analysis Using Inter-City Transport Big Data: A Case Study of the Yangtze River Delta Urban Agglomeration of China

Abstract: A better understanding of urban spatial interaction is important for optimizing the spatial structure and layout of an urban agglomeration (UA). We develop a crawler program to compile online big data for urban spatial interaction analysis. In contrast to previous studies, which rely on statistical and modeled data, we use directional (vectorial), real-world inter-city bus passenger-flow big data with high spatiotemporal resolution. The Yangtze River Delta (YRD) is selected as a case study region to test the big-data approach and to gain insights into city interactions in China's largest UA. The results demonstrate the advantages of the big-data method over the traditional gravity model, confirm some phenomena discussed or mentioned in the literature and regional plans regarding urban interaction in the YRD, yield policy implications for enhancing the sustainability of the UA, and suggest some potentials for improving the limitations of the big-data


Introduction
Cities exchange various flows, such as material, energy, information, transportation, and migration, with each other [1]. This phenomenon is called urban spatial interaction, through which cities are connected with each other and integrate into a closely related city network system (namely an urban agglomeration, UA) [2]. Since a UA is recognized as having a stronger ability to aggregate population, capital, and resources than a single city [3,4], it is acknowledged as the new power engine of regional development [5,6]. Urban spatial interaction is an important driving force that determines the dynamic growth and economic performance of a UA [7,8]. Understanding the spatial interaction among cities helps to strengthen the functional linkages within a UA, optimize the spatial layout of industries, promote the competitiveness of the UA, and provide policy implications for regional sustainable development.
Since the early 20th century, socioeconomic indicators, such as gross domestic product (GDP), population, and the unemployment rate, have been widely used in assessing the city's performance [9,10]. A great number of studies have been conducted on the theories and modeling of the urban spatial interaction based on the socioeconomic data and indicators. For example, Zipf first introduced Newton's law of gravity into an urban study and built a theoretical basis for

The Yangtze River Delta (YRD) urban agglomeration in China is chosen as a case study region to test the spatially explicit approach and big dataset and to gain insights into the urban spatial interaction in China's UA. The remainder of this paper is organized as follows. Section 2 introduces the study area. Section 3 describes the materials and methods for collecting the passenger-flow data and the social network analytics. Section 4 interprets the results and discusses the research limitations and potential improvements. The last section summarizes the major findings.

Study Area
The Yangtze River Delta (YRD) urban agglomeration, shown in Figure 1, is located in eastern China and comprises three jurisdictional units: Shanghai municipality, Jiangsu province, and Zhejiang province. The YRD UA contains 25 cities in total: Shanghai, Nanjing (the capital of Jiangsu province), Hangzhou (the capital of Zhejiang province), and 22 prefecture-level cities (Wuxi, Xuzhou, Changzhou, Suzhou, Nantong, Lianyungang, Huaian, Yancheng, Yangzhou, Zhenjiang, Taizhou (Jiangsu), Suqian, Ningbo, Wenzhou, Shaoxing, Huzhou, Jiaxing, Jinhua, Quzhou, Taizhou (Zhejiang), Zhoushan, and Lishui). The YRD UA is the 6th largest UA in the world and the largest in China, and it plays a significant role in the national and regional development of China. In 2016, the urban population and GDP of the YRD UA were 159 million and 15 trillion Chinese Yuan, respectively, accounting for 11.5% and 20.5% of China's totals. However, it occupies only 2.3% (219 thousand km²) of China's territory. The YRD UA has an ambitious urbanization plan to raise its urban population share to 72% by 2020 and to build world-class industrial, innovation, and urban clusters with a global impact (NDRC, 2010). Insights into the urban spatial interaction in China's largest UA are therefore all the more necessary, both to generate policy implications for realizing these goals and to provide experience for the sustainable development of other UAs.

Materials
In this paper, inter-city passenger flow is adopted to measure the urban spatial interaction in the UA. Instead of using traditional statistical data, we compiled bus passenger-flow data from an online bus ticket booking website, "Bus Steward" [41], using the R programming language. As "Bus Steward" is one of the strategic partners of the China Road Transportation Association, which cooperates closely with many city-level transportation companies in the YRD UA, its passenger-flow data are believed to be the most detailed and reliable available for the YRD. Using RStudio, in particular the R packages "httr" and "xml2", a crawler program was developed to automatically obtain long-term ticket information from the "Bus Steward" website. In this study, over 270,000 pieces of ticket information at an hourly interval, covering the whole month of June 2018, were obtained. Each record includes the origin and destination station names, coach type, departure time, and the number of remaining tickets for each seat class. Table 1 shows some examples of the collected data. Data for Lianyungang and Taizhou could not be obtained because of a problem with the website's database structure. The pseudocode of the crawler program can be found in Appendix A. The R program code, input data, and readme file have also been uploaded to an online repository on GitHub, so that readers can freely access and run the crawler program.

Methods for Urban Spatial Interaction Analysis
The urban spatial interaction between cities can be evaluated from two aspects: the characteristics of a single city and the characteristics of the city group. Several widely used indices, namely interaction intensity, degree and betweenness centrality, and the convergence of iterated correlations (CONCOR), are adopted in this paper to quantify these characteristics. The flowchart is shown in Figure 2, and each index is described in detail below.

Interaction Intensity Index
The interaction intensity index measures how strongly two cities are connected and is defined by Equation (1):

$$I(i,j) = P_{ij} + P_{ji} \qquad (1)$$

where $I(i,j)$ is the interaction intensity between cities $i$ and $j$, $P_{ij}$ is the number of passengers departing from city $i$ for city $j$, and $P_{ji}$ is the number of passengers departing from city $j$ for city $i$.
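As a minimal sketch of Equation (1), the following Python snippet sums the two directional flows for every unordered city pair. The function name `interaction_intensity`, the city labels, and the passenger counts are hypothetical illustrations, not values or code from the study.

```python
from itertools import combinations


def interaction_intensity(flows, cities):
    """Return I(i, j) = P_ij + P_ji for every unordered pair of cities.

    flows maps (origin, destination) to a passenger count;
    a missing pair is treated as zero passengers.
    """
    return {
        (i, j): flows.get((i, j), 0) + flows.get((j, i), 0)
        for i, j in combinations(cities, 2)
    }


# Hypothetical toy counts (illustration only, not data from the paper)
cities = ["A", "B", "C"]
flows = {
    ("A", "B"): 120, ("B", "A"): 80,
    ("A", "C"): 30, ("C", "A"): 50,
    ("B", "C"): 10,
}
intensity = interaction_intensity(flows, cities)
print(intensity)
```

Because the index adds both directions, it is symmetric in $i$ and $j$; the directional information is retained only in the degree centrality indices.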

Degree Centrality and Betweenness Centrality Index
Degree centrality describes the number and intensity of links incident upon a node. To make better use of the directional passenger-flow data, three specific degree centrality indices are defined: in-degree centrality, out-degree centrality, and net-degree centrality. They are represented by the number of passengers who move into a city, the number who leave for other cities, and the net change between the two, respectively. Cities with a higher in-degree centrality are more attractive than cities with a lower in-degree centrality. Correspondingly, a larger out-degree centrality means that a city is less attractive. The in-degree centrality is defined in Equation (2):

$$C_{Di}(i) = \sum_{j=1}^{n} P_{ji} \qquad (2)$$

where $C_{Di}(i)$ is the in-degree centrality of city $i$, $P_{ji}$ is the number of passengers who depart from city $j$ to city $i$, and $n$ is the total number of cities in the YRD UA. The out-degree centrality is defined in Equation (3):

$$C_{Do}(i) = \sum_{j=1}^{n} P_{ij} \qquad (3)$$

where $C_{Do}(i)$ is the out-degree centrality of city $i$, $P_{ij}$ is the number of passengers who depart from city $i$ to city $j$, and $n$ is the total number of cities in the YRD UA. Net-degree centrality $C_{Dn}(i)$ is defined in Equation (4) to indicate the net change of degree centrality, with a positive value corresponding to a net passenger inflow:

$$C_{Dn}(i) = C_{Di}(i) - C_{Do}(i) \qquad (4)$$
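The three degree indices can be illustrated with a short Python sketch that sums incoming and outgoing passengers per city and takes their difference. It assumes that net-degree is in-degree minus out-degree, so that a positive value means a net inflow (consistent with how the results are interpreted). The function name and the toy counts are hypothetical, and the values are left unnormalized.

```python
def degree_centralities(flows, cities):
    """Compute in-, out-, and net-degree centrality from directional flows.

    flows maps (origin, destination) to a passenger count.
    Assumption: net = in - out, so a positive net value is a net inflow.
    """
    result = {}
    for i in cities:
        in_deg = sum(flows.get((j, i), 0) for j in cities if j != i)
        out_deg = sum(flows.get((i, j), 0) for j in cities if j != i)
        result[i] = {"in": in_deg, "out": out_deg, "net": in_deg - out_deg}
    return result


# Hypothetical toy counts (illustration only, not data from the paper)
cities = ["A", "B", "C"]
flows = {
    ("A", "B"): 120, ("B", "A"): 80,
    ("A", "C"): 30, ("C", "A"): 50,
    ("B", "C"): 10,
}
centrality = degree_centralities(flows, cities)
for city, values in centrality.items():
    print(city, values)
```

In a closed system every departure is also an arrival, so the net-degree values sum to zero across all cities, which is a convenient sanity check on the input data.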
In addition to degree centrality, betweenness centrality, as defined in Equation (5), is used to represent the degree to which a node stands between other nodes. It highlights the value of a node as an intermediary. For example, a city with a higher betweenness centrality in a UA has more influence on the whole area because more information, population, and resources pass through that node.

$$C_{B}(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}} \qquad (5)$$

where $\sigma_{st}$ is the total number of shortest paths from city $s$ to city $t$, $\sigma_{st}(v)$ is the number of those paths that pass through city $v$, and $V$ is the set of all cities. All of the above centrality indices are calculated with the software UCINET 6, developed by Roberta Chase and Steve Borgatti [42], and are normalized between 0 and 1 to make the values comparable.
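To make the shortest-path definition of betweenness centrality concrete, here is a minimal Python sketch using Brandes' algorithm on an unweighted directed graph. It is not the UCINET implementation. The binary (link or no link) treatment of flows and the normalization by $(n-1)(n-2)$ are assumptions made for illustration, since the text does not state how UCINET was configured.

```python
from collections import deque


def betweenness_centrality(successors):
    """Brandes' algorithm for an unweighted directed graph.

    successors maps each node to the nodes it links to.
    Assumption: scores are scaled by (n - 1)(n - 2) to lie in [0, 1].
    """
    nodes = list(successors)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        order = []                                 # nodes in order of discovery
        preds = {v: [] for v in nodes}             # shortest-path predecessors
        sigma = {v: 0 for v in nodes}              # number of shortest paths from s
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                               # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in successors[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in nodes}
        while order:                               # accumulate dependencies backwards
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(nodes)
    scale = (n - 1) * (n - 2)
    return {v: (score[v] / scale if scale else 0.0) for v in nodes}


# Hypothetical chain A <-> B <-> C: every A-C trip has to pass through B
links = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
betweenness = betweenness_centrality(links)
print(betweenness)
```

In this toy chain the middle node lies on every shortest path between the two end nodes, so it receives the maximum normalized score while the end nodes receive zero.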

CONCOR Index
A cohesive subgroup is a subgroup consisting of several members that have relatively strong, direct, close, and positive relationships [43]. Cohesive subgroup analysis of a city network can reveal the internal structure of a UA by identifying the number of cohesive subgroups and their members, which describes the development state of the UA. In this study, we examine the subgroup aggregation phenomenon in the YRD UA according to the interaction intensity between cities, in order to reflect the relationships between different cities and reveal the spatial structure of the UA. The CONCOR (convergence of iterated correlations) index is used to identify the cohesive subgroups. CONCOR starts from a first correlation matrix $C_1$, as shown in Equation (6), where $a_{ij}$ is the normalized number of passengers who depart from city $i$ to city $j$.
The Pearson correlation coefficient of every pair of rows and columns is calculated to produce a new correlation matrix, $C_2$. The procedure is then repeated until it yields a matrix whose entries take one of two values: 1 or −1. The final matrix can be permuted to produce blocks of 1s and −1s, with each block representing a group of structurally equivalent actors. Finally, a tree diagram is used to reveal the structure of each subgroup and to show all the subgroup members. This procedure is also performed in UCINET 6.
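The iteration described above can be sketched in plain Python as follows. This is an illustrative single CONCOR split, not the UCINET routine. The choice of each city's profile (its row of outgoing flows followed by its column of incoming flows), the stopping tolerance, and the toy matrix are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlate_rows(matrix):
    """Correlation matrix between all pairs of rows."""
    n = len(matrix)
    return [[pearson(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]


def concor_split(flow_matrix, max_iter=100, tol=1e-9):
    """Split nodes into two blocks by iterating correlations until they settle at +1/-1."""
    n = len(flow_matrix)
    # Assumed profile of city i: outgoing flows (row i) followed by incoming flows (column i)
    profiles = [
        list(flow_matrix[i]) + [flow_matrix[k][i] for k in range(n)]
        for i in range(n)
    ]
    corr = correlate_rows(profiles)
    for _ in range(max_iter):
        new = correlate_rows(corr)
        change = max(abs(new[i][j] - corr[i][j]) for i in range(n) for j in range(n))
        corr = new
        if change < tol:
            break
    # Nodes positively correlated with node 0 form one block, the rest form the other
    first = [i for i in range(n) if corr[0][i] > 0]
    second = [i for i in range(n) if corr[0][i] <= 0]
    return first, second


# Hypothetical normalized flows: strong ties within {0, 1, 2} and within {3, 4, 5}
toy = [
    [0, 9, 9, 1, 1, 1],
    [9, 0, 9, 1, 1, 1],
    [9, 9, 0, 1, 1, 1],
    [1, 1, 1, 0, 9, 9],
    [1, 1, 1, 9, 0, 9],
    [1, 1, 1, 9, 9, 0],
]
blocks = concor_split(toy)
print(blocks)
```

Applying the same split recursively to each resulting block yields the tree diagram of subgroups; four subgroups correspond to two levels of splitting.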

Comparison with Gravity Model
To identify the differences between our big-data-based method and the traditional statistical-data-based social network analysis (SNA), we applied the gravity model for comparison. As shown in Equation (7), population is used to represent the scale of each city, where $I_{ij}$ is the interaction intensity between city $i$ and city $j$, $Pop_i$ is the population of city $i$, $Pop_j$ is the population of city $j$, and $D_{ij}$ is the Euclidean distance between city $i$ and city $j$.
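For illustration, a gravity-type intensity built from these quantities can be sketched in Python as below. The functional form $Pop_i \, Pop_j / D_{ij}^{2}$ (in particular the distance exponent of 2 and the absence of a scaling constant) is an assumption for the example and may differ from Equation (7). The min-max scaling to the range 0 to 1, the planar coordinates, and the population figures are likewise hypothetical.

```python
import math
from itertools import combinations


def gravity_intensity(population, coords, exponent=2.0):
    """Gravity-type interaction Pop_i * Pop_j / D_ij ** exponent for each city pair.

    Assumption: the exponent defaults to 2; the paper's exact form may differ.
    """
    result = {}
    for i, j in combinations(sorted(population), 2):
        distance = math.dist(coords[i], coords[j])  # Euclidean distance
        result[(i, j)] = population[i] * population[j] / distance ** exponent
    return result


def min_max_normalize(values):
    """Rescale a dict of values linearly to the range [0, 1]."""
    low, high = min(values.values()), max(values.values())
    if high == low:
        return {key: 0.0 for key in values}
    return {key: (v - low) / (high - low) for key, v in values.items()}


# Hypothetical populations and planar coordinates (illustration only)
population = {"A": 10, "B": 20, "C": 5}
coords = {"A": (0, 0), "B": (3, 4), "C": (6, 8)}
gravity = gravity_intensity(population, coords)
gravity_scaled = min_max_normalize(gravity)
print(gravity)
print(gravity_scaled)
```

Because the inputs are symmetric scalars, the result carries no direction: the same value describes flows from $i$ to $j$ and from $j$ to $i$.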

Interaction Intensity between Cities
Figure 3a presents the interaction intensity of cities in the YRD UA. In general, interaction intensity is unevenly distributed across the region. Higher interaction is observed mainly in the middle and south of Jiangsu province and the north of Zhejiang province. Cities in southern Zhejiang province show weak interaction with other cities, which may be due to the inconvenient transport caused by the extensive mountainous terrain there.
Shanghai is the core city of the YRD UA, as it has the most intensive interaction with other cities and a larger radiation radius (Figure 3b). In total, 12 out of 24 cities have a strong interaction with Shanghai, with interaction intensity values all above 0.19. The interaction intensity of Shanghai is 5.775, much higher than that of the second-ranked city (Nanjing, 5.123) and the third-ranked city (Taizhou, 3.901). As shown in Figure 3c, Nanjing, the capital city of Jiangsu province, has a significant impact on the cities in northern Jiangsu province, such as Suqian, Huaian, and Yancheng, whereas cities in eastern Jiangsu province have a stronger interaction with Shanghai. Cities located between Nanjing and Shanghai (such as Changzhou, Zhenjiang, Wuxi, Suzhou, Nantong, and Taizhou) also interact strongly with each other and tend to form an organic whole. In contrast to Jiangsu and Shanghai, the interaction intensity of cities in Zhejiang province is relatively weak. Hangzhou is still the central city, serving as the intermediary that connects cities within Zhejiang province (such as Shaoxing, Ningbo, and Quzhou) with cities outside Zhejiang (Shanghai and Nantong).

Degree and Betweenness Centrality of Cities in YRD UA
Figure 4 shows the in-degree, out-degree, net-degree, and betweenness centrality of cities in the YRD UA. Each index is classified into four grades according to the natural breaks method [44]. In general, except for Shaoxing and Lishui, all cities in Zhejiang province show a positive net-degree centrality, which means a net passenger inflow. The situation is the opposite in Jiangsu province, where most cities, except for Nanjing and Yangzhou, show a negative net-degree centrality.
Specifically, Nanjing, Yangzhou, and Ningbo were the top three cities with respect to net-degree centrality, suggesting that these cities attract a net inflow of passengers from other cities in the YRD UA. Although both the in-degree and out-degree centrality of Shanghai, the largest city in the YRD UA, were large, its net-degree centrality was not as high as that of Nanjing, Yangzhou, and Ningbo, which means Shanghai may be less attractive to bus passengers than those cities. According to a survey of the trip purposes of passengers along the Beijing-Shanghai corridor (which covers most cities in our study), work, tourism, and visits to friends and relatives are the top three purposes, accounting for 99.3% of total trips [45]. Taking Nanjing as an example, in recent years, especially after 2010, it has witnessed significant growth in its economy and employment opportunities. The mean growth rate of Nanjing's GDP has been sustained at 10.7% per annum, and its fixed asset investment has increased by 2% per annum, placing it among the fastest-growing cities in the YRD UA [46]. In addition, Nanjing provided over 0.2 million new job opportunities in 2017, and it attracts massive population inflows, with the annual growth of net population immigration reaching 50,300 in 2016 [46]. This outstanding performance in economic development and job creation helps explain Nanjing's high net-degree centrality. Similar driving factors can be identified for Yangzhou and Ningbo. According to their plans, these two cities are expected to be rising cities in future development. The Yangzhou government has paid great attention to attracting innovative businesses and outstanding talent, with the relevant investment reaching 50 billion RMB (China's monetary unit) in 2018 [47]. Similarly, the Ningbo government has put great emphasis on national job fairs, which provided over 1500 job opportunities in 54 enterprises in 2018 [48].
Meanwhile, Nanjing, Yangzhou, and Ningbo are also top-ranked tourism cities in China with abundant natural and cultural tourism resources. As a result, as observed in Figure 4, the in-degree and net-degree centralities of Yangzhou and Ningbo were higher than those of most other cities. In contrast, Changzhou and Taizhou show high out-degree centrality, with values of 0.142 and 0.128, respectively. Their net-degree centralities were −0.056 and −0.021, respectively, which suggests that these two cities are losing passengers.
Regarding the betweenness centrality of cities in the YRD UA (Figure 4d), Hangzhou and Yancheng have higher betweenness centrality than the other cities and tend to be the intermediary cities within the YRD UA. The functional orientation set in their master plans [49,50] could explain this phenomenon. Both cities plan to strengthen their cooperation with other cities in the YRD UA to push forward regional integration, which makes them important nodes in the YRD UA's network.

Cohesive Subgroups of YRD UA
As shown in Figure 5, cities in the YRD UA are divided into four cohesive subgroups. In general, most subgroups consist of cities that are geographically close to each other. Nanjing, Zhenjiang, Yangzhou, Huaian, Suqian, and Xuzhou, which are located in the western part of Jiangsu province, interact more closely with one another in terms of bus passenger flows. Shanghai and the cities in eastern Jiangsu (Yancheng, Taizhou, Nantong, Changzhou, Wuxi, and Suzhou) are more tightly linked with each other. Similarly, cities in Zhejiang province are divided into a north group (Hangzhou, Shaoxing, Ningbo, Huzhou, Zhoushan, and Jiaxing) and a south group (Quzhou, Jinhua, Lishui, and Wenzhou). Although the subgrouping results are affected by the number of sampled cities, the result is consistent with the long-term layout of the YRD UA, in which Nanjing, Shanghai, and Hangzhou are planned to be the important node cities of the whole city network. Owing to the increasing need for sustainable and balanced development between economy and environment, Nanjing and Hangzhou are planned to be important centers of regional development that absorb the manufacturing industries shifted out of the super megacity, Shanghai [51]. Our cohesive subgrouping confirms this targeted trend and suggests that a multi-core spatial layout has formed in the YRD UA.

Comparison with the Traditional Gravity Model
In general, the big-data-based method shows two major merits when compared with the traditional gravity model analysis.
First, the big-data method describes interactions between cities in a way that is more consistent with reality. The big-data method uses real inter-city transport flow data to measure the interaction intensity, whereas the gravity model uses city scale and distance as input parameters to model the interaction, so underestimation and overestimation often occur. Figure 6 shows the normalized interaction intensity (between 0 and 1) obtained with the big-data-based method (Figure 6a) and with the statistical-data-based gravity model (Figure 6b). The interaction intensity between cities that are far from each other (such as Shanghai-Taizhou and Nanjing-Huaian) was usually underestimated by the traditional gravity model because of the distance decay effect, which is not consistent with reality. According to the real inter-city passenger-flow data, these city pairs actually have significantly high interaction intensities. This is clearly identified by the big-data-based approach (Figure 6a) but underestimated by the traditional gravity model (Figure 6b). Conversely, interaction intensities were usually overestimated when two cities were geographically close. For example, the interaction intensity between Hangzhou and Ningbo was modeled to be higher in Figure 6b than in the real situation shown in Figure 6a.
Second, the big-data method can measure the degree and betweenness centrality of cities in the UA, while the traditional gravity model cannot. Owing to the directional attribute of the inter-city passenger-flow big data, the characteristics of a city as a node in the city network can be measured quantitatively. However, as the traditional gravity model uses scalar data such as GDP and population, it cannot indicate the direction of flows between cities and thus cannot indicate the centrality of cities. In summary, the big-data method is more reliable and informative because of the advantages of big data over the statistical data used in the traditional gravity model.

Policy Implications
Based on YRD UA's case study, we posit several policy recommendations for improving the single city's attractiveness and the whole UA's competitiveness so that the regional sustainability can be enhanced.
First, the development of transport infrastructure should be given continuous attention in the UA's planning and development. As shown in the literature, transportation infrastructure plays an important role in strengthening the connection and integration between cities [52,53]. Although tremendous sums have been invested in the YRD's transport infrastructure over the past decades, and the per capita provision of highways and railways in the YRD is higher than in many other regions, the infrastructure remains overloaded because of the dense population and high travel demand, which could hinder the YRD UA from becoming a world-class UA [51]. In the future, investment should be directed to the construction or expansion of transport infrastructure according to the interaction intensity analysis and the projected transport demand between cities.
Second, the major function and industry of each city should be carefully oriented and developed. Regional integration does not mean homogeneity. A UA should ideally work like a human body: each city should function well and support the others to ensure the effective operation of the whole UA region. In the YRD, although many policies and plans have been developed to address this issue, the functional orientation of a city should be decided in cooperation with the other cities rather than independently. Past experience in the YRD warns that, because of ill-planned functional orientation, many cities tended to compete rather than cooperate with each other. For example, port cities in the YRD (such as Taicang, Lianyungang, and Nantong) developed the same marine logistics industry and even provided similar services because of their similar natural and geographical endowments, which greatly reduced the competitiveness of every single city and the sustainability of the region. After realizing the negative impacts of this ill planning, Taicang and Shanghai changed their industrial orientation and began to cooperate with each other. Currently, Taicang mainly focuses on the near-sea logistics business, while Shanghai is in charge of the outer-sea business, and the two cities have achieved a win-win outcome [54]. As confirmed by our degree centrality analysis, Yangzhou and Ningbo have become the top cities for attracting net passenger flows owing to their special attention to their local endowments and advantages. For instance, Ningbo focuses on the development of river-ocean combined transportation by using its geographical advantages and is trying to set up a comprehensive pivot port in cooperation with Zhoushan and Taizhou. Yangzhou is also upgrading its industrial structure from conventional manufacturing industries to an innovation-driven manufacturing base for new materials and energy [55,56].
In addition, cities with low degree centrality (such as Yancheng and Lishui in Figure 4c), which are mostly located at the edge of the YRD UA, require special attention not only to the development of transport infrastructure but, more importantly, to the development of industries that fit local characteristics in order to attract more people.
Third, administrative barriers to decision making across cities should be broken down in order to further improve the competitiveness and sustainability of the UA region. The YRD UA consists of one municipality and 24 other cities in two provinces, with governments at multiple administrative levels conducting the planning and decision making. This situation may cause policy implementation to deviate from its original intention because of competition and buck-passing between local governments. To address these challenges, a highly effective negotiation mechanism with strong executive power should be established, covering the design of the overarching plan as well as the monitoring and evaluation of its implementation. Specifically, through the negotiation mechanism, city governments and a higher-level government (for example, the central government of China) should work together on the plan to make their decisions effective not only for an individual city but also for the UA as a whole. Moreover, the social welfare system should be reformed under the mechanism to support the free flow of the population. For example, reform of the household registration system is increasingly necessary to make population migration freer. Reform of the social welfare system (including the provision of equal access to education, healthcare, elderly care services, and so on) is also greatly needed to encourage the free flow of people between cities.

Limitations and Future Works
Big data provide a supplement to the traditional statistical-data-based urban spatial interaction analysis. However, some limitations still remain and require further research.
First, the data used to measure the spatial interaction between cities in this study only cover bus passenger flows, while passenger flows by other transport modes such as railway, private car, and ferry are not considered because of data limitations. This could bias the results. More effort is needed to fill the data gap. Data from social media, cellphones, transport cards, and so on are possible sources of passenger-flow information with high temporal and spatial resolution. In addition, the study period could be extended, as the current data cover only one month. Long-term monitoring and evaluation on a yearly basis could yield a more comprehensive understanding of urban spatial interaction and generate more robust policy implications.
Second, though the comparison between the big-data method and the gravity model has confirmed the advantages of the former, the gravity model still holds an advantage over the big-data approach in some respects. For example, the gravity model can be used to anticipate future changes in passenger flow and urban spatial interaction because it is driven by scalar factors (GDP and population) that can be projected in line with future socioeconomic plans. The big-data method cannot readily accomplish the same task because it depends on observed passenger data, which by definition are not available for future periods. Combining the two methods would be expected to overcome the weaknesses of each. Specifically, the big-data method could be used to calibrate the gravity model with real, historical passenger-flow data. The calibrated gravity model could then be more accurate and reliable in projecting future trends.
Third, we only reveal the characteristics of urban spatial interactions among cities in this paper. An in-depth investigation of the mechanism of urban spatial interactions and their correlation with the industrial location and socio-economic development of UA is needed in future studies so that more practical policy suggestions can be generated to promote the competitiveness and sustainability of UA.
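The calibration idea raised in the second limitation can be illustrated with a minimal sketch. It assumes a generic gravity form F_ij = k (m_i m_j)^alpha / d_ij^beta, which is not necessarily the specification used in this paper, and fits k, alpha, and beta by ordinary least squares on the log-transformed equation. The function names, the "mass" variable (for example, some combination of population and GDP), and the records below are all hypothetical; the records are synthetic values generated from known parameters purely to check that the fit recovers them, and are not observations from YRD.

```python
import math

def solve_linear(a, b):
    """Solve a small dense system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def calibrate_gravity(records):
    """Fit ln F = ln k + alpha*ln(m_i*m_j) - beta*ln d by ordinary least squares.

    records: iterable of (mass_i, mass_j, distance, observed_flow), all positive.
    Returns (k, alpha, beta) for the assumed form F = k*(m_i*m_j)**alpha / d**beta.
    """
    rows, y = [], []
    for mass_i, mass_j, dist, flow in records:
        rows.append([1.0, math.log(mass_i * mass_j), -math.log(dist)])
        y.append(math.log(flow))
    p = 3
    # Normal equations: (X^T X) theta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    ln_k, alpha, beta = solve_linear(xtx, xty)
    return math.exp(ln_k), alpha, beta

def predict_flow(k, alpha, beta, mass_i, mass_j, dist):
    """Project a flow from the calibrated model, e.g. with projected future masses."""
    return k * (mass_i * mass_j) ** alpha / dist ** beta

# Synthetic, noise-free toy data (NOT from the paper): flows generated from
# known parameters so that the calibration can be checked against them.
TRUE_K, TRUE_ALPHA, TRUE_BETA = 2.0, 0.8, 1.5
city_pairs = [(10, 20, 50), (30, 15, 120), (8, 40, 200), (25, 25, 80), (12, 9, 300)]
synthetic_records = [
    (mi, mj, d, predict_flow(TRUE_K, TRUE_ALPHA, TRUE_BETA, mi, mj, d))
    for mi, mj, d in city_pairs
]

k_hat, alpha_hat, beta_hat = calibrate_gravity(synthetic_records)
```

With real data, the observed flows would come from the crawled passenger records, and the fitted parameters could then be passed to `predict_flow` together with projected GDP and population figures. Log-linear least squares is only one of several possible calibration strategies; it requires strictly positive flows, so city pairs with zero observed passengers would need separate treatment.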

Conclusions
The study of the connections between cities and their spatial interaction will help to strengthen the functional linkage within UA and promote the sustainability of UA. In this study, we developed a crawler program in the R language to collect hourly big data on inter-city bus passenger flow from a ticket booking website. We then analyzed the spatial interaction between cities in YRD UA in terms of interaction intensity, degree and betweenness centrality, and cohesive subgroups. To clarify the advantages of the big-data based method, we compared our results with those of the traditional statistical-data based gravity model. Finally, based on our results and the related literature, we proposed policy implications for improving the attractiveness of individual cities and strengthening the competitiveness of the whole UA. The limitations of our method and possible future improvements were also discussed. The major findings are summarized as follows.
First, the big-data based method relies on vectorial, realistic, and high spatiotemporal resolution information on urban interaction and proved to be more reliable and informative than the traditional statistical-data based gravity model. Its description of the interactions between cities is more consistent with reality, and it can quantitatively measure the characteristics of a city as a node in the city network, which the traditional gravity model cannot.
Second, some phenomena regarding the urban spatial interaction in YRD UA that were discussed or mentioned in the literature and regional plans, such as the spatially uneven interaction intensity between cities, the formation of a multi-core spatial layout, and the driving forces of city centrality, can be quantitatively measured or confirmed in our study using big data. Specifically, a higher spatial interaction intensity is observed in the middle region of YRD UA, while the other regions show a lower intensity. It is also revealed that Shanghai is the core city of YRD UA, while Nanjing and Hangzhou have become the regional centers of their own provinces. Nanjing, Yangzhou, and Ningbo are the top three cities attracting the greatest inflow of passengers in YRD UA. However, Hangzhou and Yancheng tend to be the intermediary cities within YRD UA since they have higher betweenness centrality than the other cities. In addition, cities in YRD UA can be divided into four cohesive subgroups, most of which consist of cities that are geographically close to each other.
Third, the policy implications derived from the YRD UA case may also be applicable to other UAs in China. The suggestions for improving the attractiveness of individual cities and the sustainability of the whole UA include giving continued importance to the development of transport infrastructure, orienting each city toward suitable and diversified functions, and breaking down the administrative barriers to decision making across cities.
Fourth, the limitations of this study could potentially be addressed by completing and expanding the big data coverage, combining the big-data method with the gravity model, and further investigating the mechanism of urban spatial interaction and its correlation with the socio-economic development of UA. Railway, private car, and ferry passenger-flow data should be covered in future studies in order to avoid bias. Data from social media, cellphones, transport cards, and so on could also be considered as possible sources of passenger-flow information with high temporal and spatial resolution. Moreover, a UA dashboard that combines big data with socioeconomic indicators, and the big-data based approach with the traditional gravity model, is strongly recommended as a future direction of research. Through such a dashboard, various stakeholders such as governments, citizens, enterprises, and researchers could be invited to take an active part in the assessment, planning, and management of a single city or the UA as a whole.