Social Sensing of the Imbalance of Urban and Regional Development in China Through the Population Migration Network around Spring Festival

Abstract: Regional development differences are a universal problem in the economic development process of countries around the world. In recent decades, China has experienced rapid urban development since the implementation of the reform and opening-up policy. However, development differs across regions, triggering the migration of laborers from underdeveloped areas to developed areas. The interaction between regional development differences and Spring Festival has formed the world’s largest cyclical migration phenomenon, Spring Festival travel. Studying the migration pattern from public spatiotemporal behavior can contribute to understanding the differences in regional development. This paper proposes a geospatial network analytical framework to quantitatively characterize the imbalance of urban/regional development based on Spring Festival travel from the perspectives of complex network science and geospatial science. Firstly, the urban development difference is explored based on the intercity population flow difference ratio, PageRank algorithm, and attractiveness index. Secondly, the community detection method and rich-club coefficient are applied to further observe the spatial interactions between cities. Finally, the regional importance index and attractiveness index are used to reveal the regional development imbalance. The methods and findings can be used for urban planning, poverty alleviation, and population studies.


Introduction
Regional development differences have always been one of the core issues in urban study and a universal problem in the economic development process of countries around the world [1][2][3]. The unbalanced development theory reveals the unbalanced development of economic sectors or

Chinese Spring Festival Travel
China is a multiethnic country with a vast territory, large population, and diverse climate and natural conditions, in which 18.59% of the world's population lives on 6.3% of the world landmass (Worldometer's data). In recent decades, China has experienced rapid urban development as it implements a series of reform and opening-up policies [10]. However, due to the influence of the historical-geographical environment, natural resource conditions, transportation, government policies, etc., urban development differs significantly across the regions of China. The imbalance in urban development leads to various social and economic problems that cannot be ignored, such as the influx of migrant workers in developed cities, urban contraction and population loss in underdeveloped regions, and challenges in providing education for left-behind children [11]. In China, large-scale intercity population flow usually occurs around holidays and is dominated by tourism. However, the most significant migration phenomenon in China, Spring Festival travel (SFT), is different. The Spring Festival is a traditional Chinese festival and is regarded as an occasion for family reunion. Therefore, many migrants leave the city where they work and go back to their hometowns for the family reunion before the Spring Festival, and after that, they return to the cities where they live. The root cause of this phenomenon of SFT is the uneven development of the regional economy and deep-rooted cultural tradition. The migration pattern uncovered by the population flow of the SFT rush season can be used as evidence of uneven regional development in China.
Research on Chinese population migration is ongoing. Many scholars have discussed the spatial autocorrelation effect and network autocorrelation effect in interprovincial migration flows based on the fifth and sixth census data obtained in 2000 and 2010, respectively [12]. However, census data are counted every ten years, and other multisource data are needed as a time-sensitive supplement to support ongoing research on population migration. In recent years, crowd-sourced geographical data have been generated as a result of speedy scientific-technical progress, especially in mobile positioning technology, internet technology, and communication technology, triggering a data-driven channel for sensing human spatial behavior [13]. Crowd-sourced geographical data (e.g., mobile check-in data, cellular signaling data, and taxi trajectory data) are rich in information, low cost, and abundant [14]. Such data have been widely used in sensing the geographical environment [15], recognizing urban structure and functional areas [16], planning urban development [17], assisting sustainable economic development [18], perceiving geographical events [19,20], and crowdmapping [21]. Therefore, social sensing based on crowd-sourced geographical data provides a practical approach to explore the spatial behavior of the public and reveal geographical features of the socioeconomy [22].
In addition to the traditional decennial census data, limited population flow data have been made publicly available by several location-based services (LBS) platforms, allowing a data-driven social sensing approach to observe population migration around Spring Festival. Three mainstream LBS platforms (i.e., Baidu, Tencent, and Qihoo) were compared in reference [23]. Qihoo does not provide city-level population flow data, while Baidu and Tencent provide only the top 10 inflows and top 10 outflows for each city. For a country such as China with hundreds of cities, such data do not allow a full exploration of the pattern of population migration across cities. By contrast, geotagged social media data provide another way to perceive human behavior. As information sharing (e.g., personal life updates) and social connection are two primary motivations for using social media services [24], the spatial-temporal information embedded in social media data provides valuable indicators for sensing human movements. Sina Weibo is the most popular social media platform in China, with 57% of China's total Weibo users and 87% of China's Weibo activities. In 2018, Sina Weibo's monthly active users reached 462 million (2018 Sina Weibo User Development Report). Weibo data have been used in many areas, including public health [25], environmental issues [26], natural disasters [27], and urban land use [28]. Based on the timestamp and location record of each Weibo user's posts, the user's movement trajectory can be constructed. The resulting intercity migration record can support this study.
Existing research on SFT relies mainly on LBS data provided by Baidu and Tencent in 2015 and 2016 [23,29,30]. Therefore, we use Sina Weibo data from 2015, obtained through the API provided by the Sina Weibo platform, which facilitates comparison with other related research, published urban economic statistics, and China's 13th Five-Year Development Plan. The Chinese Lunar New Year in 2015 was on February 19. Traditionally, the preparations and celebrations of Spring Festival start on Xiao Nian Day (approximately one week before Spring Festival) and end at the Spring Lantern Festival (two weeks after the Chinese Lunar New Year day). As observed from the Tencent and Baidu LBS data, population flows usually reach a trough on the third day after the Spring Festival, and the intercity population flow before and after this day is reversed in direction and highly symmetrical in magnitude [23,30]. Therefore, the typical SFT period in 2015 includes two periods: February 7-February 21 is the leaving period (two weeks), when the majority of migrants leave the city where they work and go back to their hometown for family reunion, and February 22-March 7 is the return period (two weeks), when people return to the cities where they live after celebrating Spring Festival. In addition, March 8-March 21 (two weeks) is used as the ordinary period for comparison. For these three periods, we obtain a total of 18,152,016 Sina Weibo posts for 360 cities, including four municipalities, 293 prefecture-level cities, some county-level cities, Hong Kong, Macao, and Taiwan, which comprise the majority of China, with a fixed download frequency and uniform geographical distribution [31]. Figure 1a shows the geographical distribution of Sina Weibo posts.
Although we can obtain only a small part of Sina Weibo's records for academic research, due to data acquisition restrictions, the number of Sina Weibo posts in each city has a high positive correlation with China's urban population (China Statistic Yearbooks 2015), as shown in Figure 1b, proving that the obtained data represent a reasonable spatial sampling.
The posts of each Sina Weibo user are extracted from the raw data based on the unique user identity labels.
Then, the location information of each Weibo post is spatially joined with the administrative division data of China (from the resource and environment data cloud platform of the Institute of Geographical Sciences and Natural Resource Research of the Chinese Academy of Sciences) through the overlay analysis provided by ArcMap 10.5.1 to determine the city where the user posted. Sorting each user's posts by timestamp yields that user's sequence of location records. For every pair of sequential posts, we check whether both were made in the same city; if the city changes, an intercity movement is recorded. Finally, all of the intercity movement records of the same period are summarized to construct a weighted directed population flow network, with the cities as network nodes and the direction and volume of population flow as the directions and weights of the network edges. By comparison, the LBS migration data have drawbacks: Baidu migration data count population migration at hourly granularity, which can easily cut off unfinished journeys and inflate the number of short trips, while Tencent migration data count population migration on a daily basis, ignoring night traffic. As personal life updates are one of the major motivations for users to post information, the spatiotemporal behaviors triggered by Spring Festival become an incentive for Weibo users to record their lives. Therefore, the method of measuring intercity migration based on the location changes of Sina Weibo users' posts is flexible and feasible. Based on the movement trajectory of the Sina Weibo users, we constructed the weighted directed networks of intercity population migration flows considering the posts made during the entire SFT period, the leaving period, the returning period, and the ordinary period and obtained 996,901, 306,795, 429,907, and 175,911 intercity population flow records, respectively.

Sustainability 2020, 12, 3457
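The construction of the flow network from sequential posts can be illustrated with a minimal Python sketch. It assumes the spatial join has already been done, so each post is reduced to a hypothetical tuple (user_id, timestamp, city); the function name `build_flow_network` and the sample records are illustrative and are not taken from the study's data or code.

```python
from collections import defaultdict

def build_flow_network(posts):
    """Aggregate intercity movements into weighted directed edges.

    posts: iterable of (user_id, timestamp, city) tuples (assumed format);
    timestamps must sort chronologically (ISO-like strings do).
    Returns {(origin_city, destination_city): number_of_movements}.
    """
    by_user = defaultdict(list)
    for user_id, timestamp, city in posts:
        by_user[user_id].append((timestamp, city))

    flows = defaultdict(int)
    for records in by_user.values():
        records.sort(key=lambda record: record[0])  # chronological order
        # Compare each post with the next one by the same user.
        for (_, origin), (_, destination) in zip(records, records[1:]):
            if origin != destination:  # a city change is one movement
                flows[(origin, destination)] += 1
    return dict(flows)

# Toy records, invented for illustration only.
posts = [
    ("u1", "2015-02-08 09:00", "Beijing"),
    ("u1", "2015-02-10 18:30", "Zhengzhou"),
    ("u1", "2015-02-25 08:15", "Beijing"),
    ("u2", "2015-02-09 12:00", "Shanghai"),
    ("u2", "2015-02-09 20:00", "Shanghai"),
    ("u2", "2015-02-12 07:45", "Hefei"),
]
flows = build_flow_network(posts)
```

Restricting `posts` to the leaving, returning, or ordinary period before calling the function would give one network per period, in the spirit of the procedure described above.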

Overall Methodological Framework
This article proposes a geospatial network analytical framework to analyze the urban/regional development imbalance based on the population migration network (PMN) from the perspectives of geospatial science and complex network science. Three major steps are taken to complete the analysis, as shown in Figure 2. Firstly, the urban development difference is explored based on the intercity population flow difference ratio, PageRank algorithm, and attractiveness index. Secondly, the community detection method and rich-club coefficient are applied to further observe the spatial interactions between cities. Finally, the regional importance index and attractiveness index are used to reveal the regional development imbalance. The methods are described in detail in the following sections.



Attractiveness Index Based on Intercity Migration
Regarding the SFT period, during the leaving period, most people leave their current residence and go to their hometown for the family reunion, and in the returning period, these migrants return to where they work. This migration phenomenon, triggered by Chinese traditional culture, leads to an imbalance in the population flow between cities in the two periods. Cities with relatively high outflows during the leaving period and relatively high inflows during the returning period are the more developed cities that offer higher salaries and better living quality and absorb more laborers than the less-developed cities. In contrast, cities with relatively high inflows during the leaving period and relatively high outflows during the returning period are the relatively underdeveloped cities that export laborers. The index of the relative flow difference ratio (RFDR) between the population outflow volume and population inflow volume, which is the ratio of the difference between the inflow volume and outflow volume to the total flow volume, is used to describe the unbalanced population flow of a city:

$$\mathrm{RFDR} = \frac{V_{\mathrm{in}} - V_{\mathrm{out}}}{V_{\mathrm{in}} + V_{\mathrm{out}}}$$

where $V_{\mathrm{in}}$ and $V_{\mathrm{out}}$ are the population inflow volume and outflow volume of the city during the period. A positive RFDR indicates that the inflow volume exceeds the outflow volume and vice versa. When the population migration in a city is balanced during a period, the RFDR is approximately zero.
The index of attractiveness is designed based on the expectation that people usually migrate to a more attractive city from a less attractive city. J. Xu et al. (2017) defined attractiveness as the difference in the RFDR between the returning period and the leaving period, ignoring the situation in which the RFDR is positive or negative in both periods [29]. An RFDR with the same sign in both periods indicates that the population continues to flow in or out, which itself shows whether a city is more attractive or less attractive. Therefore, here we consider two cases. If the RFDR is positive or negative in both periods, we use the sum of the two RFDR values to express the attractiveness of the city; otherwise, we use the difference of the RFDR between the two periods:

$$\mathrm{Attractiveness} = \begin{cases} \mathrm{RFDR}_{\mathrm{return}} + \mathrm{RFDR}_{\mathrm{leave}}, & \mathrm{RFDR}_{\mathrm{return}} \cdot \mathrm{RFDR}_{\mathrm{leave}} > 0 \\ \mathrm{RFDR}_{\mathrm{return}} - \mathrm{RFDR}_{\mathrm{leave}}, & \text{otherwise} \end{cases}$$

where $\mathrm{RFDR}_{\mathrm{leave}}$ and $\mathrm{RFDR}_{\mathrm{return}}$ are the RFDR values of the city in the leaving period and the returning period, respectively. If there are more people entering than leaving a city in both periods, then the attractiveness of the city is positive, and vice versa. Moreover, if there are more people leaving in the leaving period and more people entering in the returning period, then the attractiveness is positive, and vice versa.
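As a worked illustration of the two indices, the following Python sketch computes the RFDR and the two-case attractiveness for toy flow volumes. The function names and numbers are illustrative assumptions; treating a city with zero total flow as balanced is also an assumption, since the text does not discuss that case.

```python
def rfdr(inflow, outflow):
    """Relative flow difference ratio: (inflow - outflow) / (inflow + outflow)."""
    total = inflow + outflow
    if total == 0:
        return 0.0  # assumption: no recorded flow is treated as balanced
    return (inflow - outflow) / total

def attractiveness(rfdr_leave, rfdr_return):
    """Sum when both RFDR values share a sign, otherwise return minus leave."""
    if rfdr_leave * rfdr_return > 0:
        return rfdr_return + rfdr_leave
    return rfdr_return - rfdr_leave

# Toy labor-importing city: net outflow before the festival, net inflow after.
leave = rfdr(inflow=200, outflow=800)      # (200 - 800) / 1000
returning = rfdr(inflow=900, outflow=300)  # (900 - 300) / 1200
score = attractiveness(leave, returning)   # opposite signs: difference case

# Toy city with net inflow in both periods: sum case.
steady_gain = attractiveness(0.2, 0.3)
```

In the first case the city loses people before the festival and regains them afterwards, so the difference is positive; in the second case the persistent inflow is captured by the sum, which the difference-only definition would understate.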

Importance Evaluation Based on the PageRank Algorithm
The PageRank algorithm is a commonly used webpage ranking algorithm. It has been widely used in social network analysis [32], transportation planning [33], and network security [34]. The PageRank algorithm takes the number and quality of hyperlinks between web pages as the main factors to analyze the importance of a web page [35]. The basic assumption is that more important pages are referenced more often by other web pages. The link from page A to page B is regarded as "page A votes for page B", and the importance of the page is determined based on the source and the importance of the voters. The SFT network is analogous to the internet in that more important cities in the migration network receive or transfer more routes and populations from other cities. Hence, the PageRank algorithm is suitable for understanding the importance of cities in the SFT network. The equation of the PageRank algorithm is:

$$PR(x) = \frac{1-\sigma}{n} + \sigma \sum_{Y_i \in P_x} \frac{PR(Y_i)}{C_{out}(Y_i)}$$

where $PR(x)$ is the PageRank value of page $x$, $P_x$ depicts the set of pages that link to page $x$, $PR(Y_i)$ is the PageRank value of page $Y_i$ linked to page $x$, $C_{out}(Y_i)$ describes the number of links out from page $Y_i$, $n$ depicts the number of pages, and $\sigma$ is the damping factor used to deal with pages that have no external links. These pages are considered to be linked to all pages in the network, and the PageRank values of such pages are divided equally among all pages. The values of PageRank can be approximated with high accuracy through several iterations. The more pages link to page $x$ and the higher their PageRank values, the larger the PageRank value of page $x$ will be. In the SFT network, $PR(x)$ is the PageRank value of city $x$; $P_x$ describes the cities that export travelers to city $x$; $C_{out}(Y_i)$ expresses the number of links from the city $Y_i$, which is weighted by the volume of travelers; and $n$ depicts the number of cities.
Considering that China's transportation network is very developed and the traffic between cities is convenient, we reduce the weight of the random-jump term by setting the damping factor to 0.95. The PageRank algorithm is implemented with the Python NetworkX package.
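To make the weighted update concrete, the sketch below runs a plain power iteration of PageRank on a toy flow network with the damping factor 0.95. It is not the NetworkX implementation used in the study; the function name, tolerance, and toy flows are assumptions for illustration, and each link's share is taken as its traveler volume divided by the origin city's total outflow.

```python
def weighted_pagerank(edges, damping=0.95, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on weighted directed edges (u, v, weight > 0)."""
    nodes = sorted({u for u, _, _ in edges} | {v for _, v, _ in edges})
    n = len(nodes)
    out_strength = {u: 0.0 for u in nodes}
    for u, _, w in edges:
        out_strength[u] += w

    pr = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing links is spread over all nodes.
        dangling = sum(pr[u] for u in nodes if out_strength[u] == 0)
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u, v, w in edges:
            new[v] += damping * pr[u] * w / out_strength[u]
        delta = sum(abs(new[u] - pr[u]) for u in nodes)
        pr = new
        if delta < tol:
            break
    return pr

# Toy flows (origin, destination, travelers), invented for illustration.
toy_flows = [
    ("Zhengzhou", "Beijing", 10), ("Shijiazhuang", "Beijing", 10),
    ("Taiyuan", "Beijing", 10), ("Beijing", "Zhengzhou", 1),
    ("Beijing", "Shijiazhuang", 1), ("Beijing", "Taiyuan", 1),
]
pr = weighted_pagerank(toy_flows)
ranking = sorted(pr, key=pr.get, reverse=True)
```

In this toy star-shaped network the city that receives travelers from all others ends up with the largest PageRank value, which matches the intuition that important cities receive more routes and populations.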

Community Detection
As a typical feature of a complex network, community structure describes the characteristic that the connections within communities are relatively dense, while connections among the communities are relatively sparse. Recognizing the network communities is conducive to disclosing the latent relations between the network nodes. Community detection methods, such as the label propagation algorithm [36,37] and modularity-based algorithm [38,39], are designed based on the characteristics of different network types (e.g., topology network, binary network, and directed network) and widely used in various fields (e.g., social network analysis, biological network analysis, and internet network analysis). Intercity PMN is a typical weighted directed network, and we can apply appropriate community detection methods to explore the interactions between cities. The Infomap algorithm [40,41] has been shown to perform well in weighted directed networks, especially small networks (less than 1000 nodes), through a variety of comparative tests [42,43] and is suitable for observing the community structure of the intercity SFT network (360 nodes).
Based on information theory [44], the Infomap algorithm is dedicated to identifying network communities by selecting the fewest bits to express the route generated by random walks in the network. Each node is encoded with a Huffman code [45], and a node with a higher access frequency is assigned a shorter code. To divide nodes into different clusters, a two-level description strategy is adopted. Each cluster is given a unique name, and the nodes within each cluster are named with different Huffman codes, which can be reused in different clusters. In contrast to a single-level description that does not consider community structure, the two-level description strategy allows nodes in different clusters to share the same code, so that the nodes themselves can be depicted with fewer bits and a shorter average description length can be achieved. Regarding a community partition $M$ of $n$ nodes into $m$ clusters, the average description length of a single step $L(M)$ is:

$$L(M) = q\,H(Q) + \sum_{i=1}^{m} p_i\,H(P_i)$$

where $H(Q)$ describes the information entropy of the cluster names; $q$ describes the probability that the random walk switches clusters [44]; $H(P_i)$ depicts the information entropy of the movements within cluster $i$, including the exit code (a virtual node within each cluster expressing that the random walk is leaving the current cluster) for cluster $i$; and $p_i$ depicts the probability of movements within cluster $i$ (including the probability of exiting cluster $i$). The access probability of each node is calculated by the "random surfer" method, which is similar to the PageRank algorithm, and the possible partitions are explored based on a simulated annealing approach [46] and deterministic greedy search algorithm [47]. In this paper, the Infomap algorithm is used to explore the community structure of the SFT network and the ordinary travel network separately, using the R-igraph package [48].
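The description length can be evaluated by hand for a small case. The Python sketch below does so under a simplifying assumption that differs from the study's setting: the network is undirected, so a node's visit probability is proportional to its strength and no "random surfer" teleportation is needed. It only scores a given partition and does not search for the best one, as Infomap does; all names and the toy graph are illustrative.

```python
from math import log2

def entropy(values):
    """Shannon entropy (bits) of non-negative values after normalization."""
    total = sum(values)
    if total == 0:
        return 0.0
    return -sum((v / total) * log2(v / total) for v in values if v > 0)

def description_length(edges, partition):
    """Two-level description length L(M) for an undirected weighted graph.

    edges: list of (u, v, weight); partition: {node: cluster_label}.
    """
    strength = {}
    for u, v, w in edges:
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w
    total = sum(strength.values())  # twice the total edge weight
    visit = {u: s / total for u, s in strength.items()}

    # Exit probability of each cluster: flow along edges that cross clusters.
    exit_prob = {c: 0.0 for c in set(partition.values())}
    for u, v, w in edges:
        if partition[u] != partition[v]:
            exit_prob[partition[u]] += w / total
            exit_prob[partition[v]] += w / total

    q = sum(exit_prob.values())
    length = q * entropy(list(exit_prob.values()))  # q * H(Q)
    for c, q_c in exit_prob.items():
        members = [visit[u] for u in visit if partition[u] == c]
        p_c = q_c + sum(members)
        length += p_c * entropy([q_c] + members)    # p_i * H(P_i)
    return length

# Two triangles joined by a single edge (toy graph).
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
one_cluster = description_length(edges, {u: "all" for u in range(6)})
two_clusters = description_length(edges, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"})
```

For this toy graph the two-triangle partition yields a shorter description than the single-cluster partition, which is the criterion by which a community partition is preferred.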

Rich-Club Coefficient
The rich-club coefficient is applied to observe the spatial interactions between the prominent cities. As early as the end of the 19th century, inspired by the social and economic disparity among people in different societies and countries, the 80/20 rule was proposed to describe the phenomenon that a select minority of elements are responsible for the vast majority of the observed outcomes in many real-world settings. The rich-club phenomenon was later defined to describe the tendency of prominent elements to establish stronger interactions among themselves than would be expected by random chance. This phenomenon has proven to be applicable in many areas, including transportation networks [49], scientific collaboration networks, and interbank networks [50]. The rich-club coefficient was first defined based on the topology network [51]. Subsequently, Opsahl, Colizza, Panzarasa, and Ramasco (2008) improved the rich-club coefficient $\phi_{w}(r)$ to make it suitable for a weighted directed network [52]. All nodes in the network are ranked in terms of the richness parameter $r$. For each value of $r$, the nodes whose richness is larger than $r$ form the club. $E_{>r}$ depicts the number of edges connecting the members whose richness is larger than $r$, $W_{>r}$ sums up the weights of these edges, and $W_{l,\mathrm{rank}}$ is the $l$th ranked weight on the edges of the network. Then, we have $\phi_{w}(r)$:

$$\phi_{w}(r) = \frac{W_{>r}}{\sum_{l=1}^{E_{>r}} W_{l,\mathrm{rank}}}$$

In addition, the rich-club coefficient $\phi_{w,\mathrm{null}}(r)$, which is obtained from the corresponding null model that is random but still comparable to the real network, is introduced as a benchmark for comparison [49]. Therefore, the rich-club effect is determined as follows:

$$\rho_{w}(r) = \frac{\phi_{w}(r)}{\phi_{w,\mathrm{null}}(r)}$$

Only when the rich-club coefficient of the actual network $\phi_{w}(r)$ is greater than that of its corresponding randomized network $\phi_{w,\mathrm{null}}(r)$, that is, only when the ratio $\rho_{w}(r)$ is greater than 1, can a rich-club phenomenon be said to exist in the network.
For the weighted directed network, the richness parameter can be the out-degree (the number of links going out from a node) or out-strength (the sum of the weights attached to these links). A high out-degree or out-strength indicates the importance of a node because of its many external links and its strong external participation in the network. In the SFT network, the rich-club ratio with these two richness parameters is employed to observe the spatial interaction pattern between the prominent cities using the R-tnet package [53].
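The weighted rich-club coefficient can be sketched in a few lines of Python. This is an illustration of the formula with out-strength as the richness parameter, not the R-tnet routine used in the study; the function name and toy edges are assumptions, and the null-model coefficient needed for the ratio is deliberately left out because it requires a randomization scheme that is not specified here.

```python
def weighted_rich_club(edges, r):
    """Weighted rich-club coefficient with out-strength as richness.

    edges: list of directed (u, v, weight); r: richness threshold.
    Returns None when no edge connects members of the club.
    """
    out_strength = {}
    for u, v, w in edges:
        out_strength[u] = out_strength.get(u, 0.0) + w
        out_strength.setdefault(v, 0.0)

    club = {u for u, s in out_strength.items() if s > r}
    # Weights of the E_{>r} edges whose both endpoints are in the club.
    club_weights = [w for u, v, w in edges if u in club and v in club]
    if not club_weights:
        return None

    # The E_{>r} strongest edge weights found anywhere in the network.
    strongest = sorted((w for _, _, w in edges), reverse=True)[:len(club_weights)]
    return sum(club_weights) / sum(strongest)

# Toy directed flows, invented for illustration.
toy_edges = [
    ("A", "B", 10), ("B", "A", 8), ("A", "C", 9),
    ("C", "A", 1), ("B", "D", 1), ("D", "C", 1),
]
phi = weighted_rich_club(toy_edges, r=5)  # club = {A, B}
```

Here the club members A and B exchange weights 10 and 8, while the two strongest edges in the whole network carry 10 and 9, so the coefficient is 18/19. Dividing by the same quantity computed on a randomized network would give the ratio used to decide whether a rich-club effect is present.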

Community Evaluation
To further explore the regional differences in China's development, we describe the regional differences based on the attractiveness and importance of the cities that make up the communities. The mean of the PageRank values of the cities included in the community is defined to describe the importance of the community:

$$PR(C_n) = \frac{1}{Num_{C_n}} \sum_{city_i \in C_n} PR(city_i)$$

where $PR(city_i)$ depicts the PageRank value of city $i$, $C_n$ expresses the city subgroup included in community $n$, $Num_{C_n}$ is the number of cities included in community $n$, and $PR(C_n)$ is the PageRank value of community $n$. If the cities in the community have relatively high PageRank values, then the community's PageRank value will be relatively high, and vice versa. Concerning the attractiveness of the community, urban attractiveness is calculated based on the relative flow difference ratio of the inflow and outflow of the urban population, so the average city attractiveness is not suitable to quantify the attractiveness of the entire community. The attractiveness of a more prominent city should have a more significant impact on the overall attractiveness of the community. Therefore, the ratio of the city PageRank value to the maximum PageRank value is used as the weight of urban attractiveness on the attractiveness of the community. The average weighted urban attractiveness within the community is used to define the attractiveness of the community:

$$Attractiveness(C_n) = \frac{1}{Num_{C_n}} \sum_{city_i \in C_n} \frac{PR(city_i)}{Max\,PR(city)} \times Attractiveness_{city_i}$$

where $Max\,PR(city)$ is the maximum PageRank value of the cities and $Attractiveness_{city_i}$ is the attractiveness of city $i$. The more influential and attractive the cities in the city community are, the more attractive the city community is.

Figure 3 displays the RFDR of Chinese cities in the three periods, and an opposite trend is found during the leaving period and returning period. In the leaving period, as shown in Figure 3a, high  Figure 3c. A city that has more relative population inflow in the leaving period has more relative population outflow in the returning period, and vice versa.
During the ordinary period, the RFDR distribution roughly shows a Gaussian distribution N(−0.015, 0.005), with 98% of the values distributed between −0.2 and 0.2 (Figure 3d). It can be seen from the variation in RFDR in three periods that the tradition of family reunion at Spring Festival caused a strong opposite population flow trend before and after the Spring Festival, and the trend of population flow becomes stable during the ordinary period. Therefore, according to the RFDR distribution during the ordinary period, we use the interval (−0.2, 0.2) as the stability interval and judge a city's attractiveness based on the change in the RFDR of the city in the leaving and returning periods (Figure 3e). Thirty-four cities are classified as attractive cities because of the apparent negative value and positive value of the RFDR in the leaving and returning period, respectively, which indicates that approximately 10% of Chinese cities have significant appeal to migrant workers. In total, 145 cities are regarded as stable cities because the RFDR values in both periods remain within the stable interval. Finally, 181 cities are rated as unattractive cities because of the prominent positive value and negative value of the RFDR in the leaving and returning periods, respectively.
Similarly, Long and Wu (2016) identified 180 shrinking cities in China, with a decline in population density, based on China's censuses in 2000 and 2010 [54]. As shown in Figure 3f, three distinct attractive urban clusters are located in the Yangtze River Delta, Pearl River Delta, and Beijing-Tianjin-Hebei region. In addition, most provinces, except Xinjiang, Tibet, Taiwan, Hainan, and Heilongjiang, contain one or two attractive cities, surrounded by several stable cities and unattractive cities. Among these five regions, Xinjiang, Hainan, and Heilongjiang contain several unattractive cities, reflecting the phenomenon of population loss in some subregions, while the population changes in Tibet and Taiwan are stable.

The Difference in Urban Attractiveness
The spatial distribution of urban attractiveness appears to be similar to that of the RFDR in the returning period (Figure 4a


The Difference in Urban Importance Based on PageRank
The importance of cities is similar to their classification and attractiveness (Figure 5a). The attractive cities classified based on the change in RFDR have relatively higher PageRank values, with Beijing yielding the highest value of 0.045, followed by Shanghai, Chengdu, Guangzhou, and Shenzhen. Moreover, lower values appear mostly in the western and northern regions of China. Figure 5b shows a heavy-tailed distribution of the PageRank values of cities, reflecting significant differences between Chinese cities. Among the few core cities, PageRank values climb steeply with rank, and the differences between cities are large. For most of the remaining cities, however, PageRank values are low and differ little from one another. This is somewhat similar to the 80/20 rule: a few core Chinese cities have greater influence, while most other cities lack core competitiveness.

City Communities Based on the Intercity Migration Network
Figure 6 describes the community structures of cities during the SFT period and the ordinary period and their mapping relationship. For the ordinary period, 25 communities of cities are detected, as shown in Figure 6b. Most of the community divisions are consistent with the Chinese provincial administrative divisions, while some communities exhibit two phenomena: interprovincial aggregation and fragmentation. Regarding interprovincial aggregation, community C1 (Yangtze River Delta) consists of Shanghai, Jiangsu Province, and Zhejiang Province; community C2 (Pearl River Delta) consists of Hong Kong, Macao, and Guangdong Province; community C3 (Beijing-Tianjin-Hebei region) consists of Beijing, Tianjin, and Hebei Province; and community C4 consists of Chongqing, Sichuan Province, and a city from an adjacent province to the south. These four regions have many cities with a high PageRank and attractiveness, forming tight spatial organizations based on developed infrastructure, such as transportation and communication.
Regarding fragmentation, by contrast, the northeastern and western parts of Inner Mongolia join Heilongjiang Province and Ningxia Province to form communities C18 and C25, respectively, and the eastern and western parts of Qinghai Province are clustered with Gansu Province and Tibet to form communities C19 and C22. This segmentation phenomenon is affected by many factors, such as the economy, culture, and geography. For example, both western Qinghai and Tibet are mainly Tibetan living areas [55], while eastern Inner Mongolia and Heilongjiang Province have similar climates and living customs [56]. The community structure of cities during the ordinary period reflects that the majority of trips are intraprovincial trips and interprovincial trips between neighboring provinces.
The community structure of cities during the SFT period is similar to that during the ordinary period, except that some communities merge and several cities move between neighboring communities, leaving only 22 communities (Figure 6a). Regarding the mergers, the northeastern part of Heilongjiang Province, Liaoning Province, Jilin Province, and Inner Mongolia's eastern cities constitute Community 3 in the northeastern part of China, while Anhui Province, Zhejiang Province, Jiangsu Province, and Shanghai in eastern China constitute Community C1. The cities of Alxa League, Zhaotong, Qingyang, and western Qinghai, which are grouped with neighboring provinces during the ordinary period, re-establish communities C20, C8, and C14 with the cities belonging to the same province during the SFT period.
Unlike these cities, Chifeng detaches from its province during the SFT period and joins the neighboring community of Beijing, Tianjin, and Hebei Province. This phenomenon is mainly affected by the family reunion behavior triggered by the traditional culture of the Spring Festival. The increase in the size of some communities also reflects the increase in long-distance travel during the SFT period.

Rich-Club Effect
Regarding the rich-club coefficient, Figure 7 shows that the vast majority of ρw(k) and ρw(s) values are greater than 1 and present an upward trend, reflecting a remarkable rich-club phenomenon in the SFT network. This means that the prominent cities in China tend to engage in stronger interactions among themselves. In addition, a significant "demarcation point" feature can be found in the curves of both ρw(k) and ρw(s). For ρw(k), k = 330 is a demarcation point: below 330, the curve grows steadily, and above 330, the curve decreases. For ρw(s), s = 24,687 is a demarcation point: the curve rises almost continuously up to this point and dips afterward. The prominent rich-club cities are extracted based on these demarcation points. Using k > 330 as the selection criterion, nine cities are obtained, namely, Beijing, Shenzhen, Chengdu, Shanghai, Chongqing, Sanya, Guangzhou, Hangzhou, and Wuhan. Each of these cities has a migration connection with more than 90% of China's cities. With s > 24,687 as the selection threshold, five cities are selected: Beijing, Shanghai, Chengdu, Guangzhou, and Shenzhen. The analysis based on ρw(k) emphasizes the topological connection between cities, while the analysis based on ρw(s) focuses more on the strength of association between cities. Based on different selection criteria for the rich parameter, we can obtain rich-club cities at different rich levels. However, the core rich-club cities should have the most connections and the greatest connection strength. Therefore, we select the intersection of the two city lists as the core rich-club members, namely, Beijing, Shanghai, Chengdu, Guangzhou, and Shenzhen. The population migration flow involving these five rich-club members accounts for 23.2% of the total migration flow during the SFT period.
Located in different city communities, these core rich-club cities form the backbone network of the SFT network and can promote cross-regional population flow through close interconnections.
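The selection of core members from the two threshold-based lists, and the share of flow that involves them, can be illustrated as follows. The two city sets are taken from the text; the helper `flow_share`, the placeholder cities `X` and `Y`, and the toy flow values are hypothetical and serve only to show the calculation, not to reproduce the reported share.

```python
# Cities selected by the connection threshold (k > 330), as listed in the text.
RICH_BY_DEGREE = {"Beijing", "Shenzhen", "Chengdu", "Shanghai", "Chongqing",
                  "Sanya", "Guangzhou", "Hangzhou", "Wuhan"}
# Cities selected by the strength threshold (s > 24,687), as listed in the text.
RICH_BY_STRENGTH = {"Beijing", "Shanghai", "Chengdu", "Guangzhou", "Shenzhen"}

# Core members must satisfy both criteria, i.e. lie in the intersection.
core = sorted(RICH_BY_DEGREE & RICH_BY_STRENGTH)
# Cities that are rich in connections only.
degree_only = sorted(RICH_BY_DEGREE - RICH_BY_STRENGTH)


def flow_share(flows, members):
    """Fraction of total flow with a member city as origin or destination."""
    total = sum(flows.values())
    involved = sum(w for (src, dst), w in flows.items()
                   if src in members or dst in members)
    return involved / total if total else 0.0


# Hypothetical toy flows keyed by (origin, destination); not the paper's data.
toy_flows = {("Beijing", "X"): 30, ("X", "Y"): 50, ("Y", "Shanghai"): 20}
share = flow_share(toy_flows, set(core))
```

Here `core` contains Beijing, Chengdu, Guangzhou, Shanghai, and Shenzhen, and `degree_only` contains Chongqing, Hangzhou, Sanya, and Wuhan.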

Imbalance of Regional Development
To further understand the regional development differences in China, we use the urban community structure of the intercity migration network in the ordinary period as an embodiment of the urban agglomeration to analyze the importance and attractiveness of the regions (Figure 8). Figure 8a shows the PageRank and attractiveness of communities. C1 (Yangtze River Delta), C2


Indices of Urban Development Based on PMN
There is a clear correlation between the PMN-based urban development indices, as shown in Figure 9a. The relationship between PageRank and RFDR presents a trend in which the PageRank value decreases while the city's RFDR value goes from low to high during the leaving period and from high to low during the returning period. Additionally, the attractive cities have relatively high PageRank values; correspondingly, the unattractive cities have relatively low PageRank values. It can be inferred that cities with higher PageRank values are more competitive, thus showing significant and opposite RFDR values during the two periods of Spring Festival. In addition, the core rich-club cities show the top five PageRank values and have significantly negative RFDR values in the leaving period and significantly positive RFDR values in the returning period, reflecting the prominent influence of these cities in the PMN.
The correlation coefficient between the PageRank value and GDP is very high (Figure 9b), indicating that the PageRank value based on the migration network can reflect the development level of a city. Although attractiveness and weighted attractiveness are both positively correlated with GDP, the correlation coefficient between weighted attractiveness and urban GDP is higher than that between attractiveness and GDP. This is reasonable because the attractiveness value reflects the relative attractiveness of the city itself based on the relative population flow ratio and is not very suitable for the horizontal comparison of development differences between cities. Weighted attractiveness considers the difference in the level of development between cities and can better reflect the difference in attraction between the cities, thus having a higher correlation with urban GDP.
This also shows that using cities' weighted attractiveness to analyze the attraction of communities is more suitable than using attractiveness.
However, although the PageRank value has a higher correlation with urban GDP than attractiveness and weighted attractiveness, having a high PageRank value does not mean that a city also has lower and higher RFDR values in the leaving period and returning period, respectively (Figure 9c,d). This is because the attractiveness of a city depends on not only its level of development but also various natural and social factors, especially the development levels of surrounding cities. For example, consider two cities with similar development situations, one adjacent to more developed cities and the other to underdeveloped cities. Then, influenced by the surrounding cities, the former city has a relatively low attractiveness, and the latter city has a relatively high attractiveness. This rule also applies to the city community. The PageRank value of C14 is slightly higher than that of C4, but C4 is almost twice as attractive as C14, largely because C14's attractiveness is affected by the adjacent C2 and C1 with higher PageRank values, and the PageRank values of the communities around C4 are relatively lower.
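Because this section compares PageRank values with other indices, a minimal sketch of PageRank on a directed, weighted migration network may help make the quantity concrete. It is an illustration under assumptions rather than the procedure used in this study: the damping factor of 0.85 is the conventional default, the function name and the toy flows are hypothetical, and rank held by nodes without outflow is redistributed uniformly.

```python
def weighted_pagerank(flows, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank; flows maps (origin, destination) -> trip count."""
    nodes = sorted({n for edge in flows for n in edge})
    n = len(nodes)
    # Total outgoing flow per node, used to normalize edge weights.
    out_strength = {v: 0.0 for v in nodes}
    for (src, _dst), w in flows.items():
        if w > 0:
            out_strength[src] += w
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outflow is spread evenly (assumption).
        dangling = sum(rank[v] for v in nodes if out_strength[v] == 0.0)
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each origin passes rank to destinations in proportion to flow volume.
        for (src, dst), w in flows.items():
            if w > 0:
                new[dst] += damping * rank[src] * w / out_strength[src]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical toy network: most trips end at "hub".
toy = {("A", "hub"): 10, ("B", "hub"): 5, ("hub", "A"): 3,
       ("hub", "B"): 1, ("A", "B"): 1}
ranks = weighted_pagerank(toy)
top_city = max(ranks, key=ranks.get)
```

In this toy network the node that receives most of the flow ends up with the largest value, which mirrors the intuition that a city's PageRank reflects how much migration flow, directly and indirectly, converges on it.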

Rich-Club Cities with Interregional Connections
The rich-club coefficient based on two rich parameters (connections and weights) shows that there is a rich-club phenomenon in the SFT network, which means that these influential cities are not isolated from each other but interact closely. Although this finding is similar to that of a study based on the 2016 Tencent LBS data [30], there are two subtle differences that require further discussion. In the PMN based on the Tencent LBS data, which includes 369 cities, the maximum out-degree of the city nodes is just 272, and the breakpoint of k = 200 is selected to determine 11 rich-club cities based on their connections. However, in the PMN based on the Sina Weibo data, we find better connectivity between cities, with the maximum out-degree of the urban nodes being 353 and nine cities connecting with more than 330 cities. Additionally, the six core rich-club cities obtained based on the Tencent LBS data are all located in the coastal areas (four of which are consistent with our results) and omit the central hub of Chengdu. Moreover, the connections involving these six core rich-club cities account for 49.57% of the population flow, more than twice the 23.2% of movements involving the five core rich-club cities in this study. This inconsistency is mostly due to data limitations. As shown in [23], the Tencent LBS data provide only the top 10 inflows and top 10 outflows per city per day. Therefore, the resulting PMN cannot adequately describe the intercity Spring Festival migration. In addition, the flow data provided by the Tencent LBS data lack passenger information and cannot integrate the segments of a traveler's trip into the entire travel route; thus, they are unable to describe long-distance travel.
The community structure of the cities demonstrates that the migration between cities is influenced by geographical proximity, and the cities within a community are spatially adjacent. Moreover, the five core rich-club cities are scattered across the northern, central, eastern, and southern parts of China and serve as transportation hubs, connecting various scattered areas through close interconnections. Additionally, these core rich-club cities are attractive cities and have the highest PageRank values, meaning that they have greater attractiveness and higher competitiveness. It is inferred that a small number of rich-club cities dominated by Beijing, Shanghai, Chengdu, Guangzhou, and Shenzhen serve as critical regional nodes for local economic development, integrating spatially dispersed areas and promoting effective interaction across the country. China's 13th Five-Year Plan for the development of a modern comprehensive transport system shows that China is systematically building multilevel transportation hubs to promote interconnectivity across the country, and the five core rich-club cities identified in this paper (Beijing, Shanghai, Guangzhou, Shenzhen, and Chengdu) are listed among the first comprehensive international transportation hubs (seven in total). It can be inferred that the development of China's transportation network system will further enhance the rich-club characteristics in the population flow. These few rich-club cities form a dense and interconnected backbone network, which attracts, absorbs, and disseminates large-scale population flows.

City Communities with Urban Agglomeration Planning
Figure 10 shows the distribution of urban agglomerations outlined in China's 13th Five-Year Plan (2016-2020). As shown in Figure 8, these planned urban agglomerations include almost all attractive cities and high PageRank cities and cover all city communities except C22. However, among these urban agglomerations, only the Yangtze River Delta (C1), the Pearl River Delta (C2), and Beijing-Tianjin-Hebei (C3) are targeted for development into world-class urban agglomerations, while the others are still in the early stages of cultivation or development, which is consistent with our results. C1, C2, and C3, which contain these three urban agglomerations, have the highest PageRank values and significant attractiveness. In 2015, the economic aggregate of these three urban agglomerations accounted for more than 40% of the national economy (China Urban Agglomeration Integration Report 2019).
However, influenced by the eastern coastal areas, the attractiveness of the adjacent northeastern and central communities is significantly negative, indicating a large number of unattractive cities. The cultivation of urban agglomerations in the northeast and central regions will help to promote the coordinated and sustainable development of the local economy. The four city communities in the western region yield the lowest PageRank values, and the corresponding urban agglomeration planning could revitalize local development. It is observed that China's regional development level shows a gradient from high to low from east to west, which is consistent with the results of studies based on DMSP/OLS nighttime light data 1992-2013 [57], the Development and Life Index 2000-2012 [58], and the China City Statistical Yearbook 2010 [59]. It is worth noting that C22 yields the lowest PageRank value, with no attractive city or high PageRank city, which indicates that the overall development level of this region lags behind. However, the attractiveness value of C22 is near zero, indicating that there is no significant population loss in this region. This is mainly caused by cultural diversity; despite the low economic level in the Tibet area, the unique cultural traditions and lifestyle habits in the area restrict local population outflows [60]. In addition, in the community structures of the two periods, Taiwan is grouped into communities with the Pearl River Delta and the Yangtze River Delta, indicating population flow between Taiwan and these two regions. However, during these two periods, Taiwan's RFDR values were stable and did not show significant population inflows or outflows. This is mainly attributed to the current cross-strait policy [61].
On the whole, the national urban agglomeration plan in China is reasonable. However, for areas such as Tibet that are economically underdeveloped and lack external connections, targeted poverty alleviation by the state is all the more necessary.

Conclusions
In this study, the social sensing approach based on Sina Weibo data is used to observe the intercity migration flow in China around Spring Festival and reveal the imbalanced development and spatial interactions among cities. Regarding the development of a city, the RFDR around Spring Festival indicates the differentiated development of Chinese cities. A small number of cities show significant attractiveness, while nearly half of the cities are unattractive. Then, the analysis based on the PageRank algorithm and the attractiveness index further reveals the differential development of cities. It is worth noting that the attractiveness of a city depends on not only its development level but also the development level of the neighboring cities. This also applies to regional differences. Regarding the spatial interactions between cities, the community structure based on the PMN presents apparent geographical proximity. The interprovincial integration and fragmentation that occur in the ordinary period are influenced by various factors, such as the economy, culture, and geography. By contrast, the phenomenon of community integration and conversion in several cities during the SFT period is mostly driven by the family reunion triggered by Chinese traditional culture. In addition, the analysis based on the rich-club coefficient confirms the existence of the rich-club phenomenon in the intercity PMN. A small number of rich-club cities dominated by Beijing, Shanghai, Chengdu, Guangzhou, and Shenzhen play essential roles in regional development and have made outstanding contributions to promoting cross-regional long-distance migration. The evaluation of the city community reveals China's regional development differences. 
The eastern coastal areas represented by the Pearl River Delta, Yangtze River Delta, and Beijing-Tianjin-Hebei region are economically developed and show apparent appeal, while the central and northeastern regions adjacent to the developed coastal cities reflect significant labor loss. The population flows of the southwestern region are relatively balanced, but the overall economic development level of the western region is low. The results of the regional evaluation support the rationality of the 13th Five-Year Plan regarding the construction of urban agglomerations. However, although Tibet maintains a relatively stable population flow due to its unique culture, its low economic level requires further government support to revitalize local urban development.
This study focuses on the population migration network around the Spring Festival. Like the Chinese Spring Festival, Thanksgiving Day is a traditional holiday for family reunions that is popular in several countries in North America. The geospatial network analysis method proposed in this paper can also be used to examine local development differences in North America. Furthermore, this methodological framework can be applied to other related fields (e.g., social networks and traffic networks) to identify the differences and connections between nodes and subgroups in a network. For example, in a transit-oriented development model, regional connectivity can be explored based on the public transportation network to assist in new transportation planning. This paper shows that population migration can be used for urban/regional development studies. However, urban studies based on crowdsourced geographic information have emerged only recently, and much longer time series and higher spatiotemporal resolution are needed to fully demonstrate the urban development process and the spatial interaction between cities. As a next step, it would be beneficial to go beyond Sina Weibo data and perceive urban/regional development across different types of crowdsourced geospatial data. In addition, for migration research, the attribute information of social media users (e.g., age, gender, and education) and social media content (e.g., text, location, and time) may be used in combination to further explore the reasons (e.g., work or study) for people's migration.