Spatiotemporal Patterns of Population Mobility and Its Determinants in Chinese Cities Based on Travel Big Data

Large-scale population mobility has an important impact on the spatial layout of China’s urban systems. Compared with traditional census data, mobile-phone-based travel big data can describe the mobility patterns of a population in a timely, dynamic, complete, and accurate manner. Using a travel big dataset drawn from Tencent’s location big data, combined with social network analysis (SNA) and a semiparametric geographically weighted regression (SGWR) model, this paper first analyzed the spatiotemporal patterns and characteristics of mobile-data-based population mobility (MBPM), and then revealed the socioeconomic factors related to population mobility during the Spring Festival of 2019. The Spring Festival is the most important festival in China, comparable in importance to Thanksgiving Day in the United States. During this period, the volume of population mobility exceeded 200 million, making it the largest episode of short-term population mobility in the world. The results showed that population mobility presents a spatial structure dominated by two east–west main axes formed by Chengdu, Nanjing, Wuhan, and Shanghai, and three north–south main axes formed by Guangzhou, Shenzhen, Shanghai, Wuhan, and Chengdu. The major cities in the four urban agglomerations in China occupy an absolute core position in the population mobility network hierarchy, and the population mobility network presents typical “small world” features and forms 11 closely related groups. The SGWR model results showed that mobile-data-based population mobility variation is significantly related to the value-added of secondary and tertiary industries, foreign capital, average wage, urbanization rate, and value-added of primary industries. When spatial heterogeneity and nonstationarity were considered, the socioeconomic factors that affect population mobility differed between regions and cities.
The patterns of population mobility and determinants explored in this paper can provide a new reference for the balanced development of regional economy.


Introduction
Since the reform and opening up in 1978, the rapid economic development and the process of social modernization in China have made population mobility between cities more common. Population mobility is considered as the re-allocation of production factors in space; the mobility of population in a specific space promotes the reaggregation and diffusion of social and economic factors, thus reshaping patterns of population distribution [1]. From 2000 to 2010, China's floating population

Research Data and Area
Location-based services (LBS) obtain the geographical location of a mobile user through the wireless communication network or the external positioning method of network operators. When users allow various mobile applications to call LBS services, their movement trajectories will be accurately recorded in real-time through the positioning information. The movement of a single user in geographical space seems to be random, but may take on a specific pattern when a large population group is accessed. According to the statistical report on the development of the Internet in China, by the end of 2018, the number of instant messaging users had reached 792 million, and the number of mobile Internet users had exceeded 817 million, accounting for 98.6% of Internet users using mobile phones [35]. In this context, every smartphone user can be seen as a mobile sensor, reflecting social characteristics and allowing for the collection of a massive amount of individual movement information, in real time and in an efficient manner.
The dataset we used in our research is the "Migration Map" section of Tencent Location Big Data [36], with a time interval of one day and an accuracy that can be traced back to the individual level. The website counts the number of location changes of each smart terminal within a given time interval, and then filters and summarizes the data. To protect user privacy, the website only provides the total daily population inflows and outflows, with the city as the basic unit (that is, for a single city on a given day, the intensity of inflows by source city and of outflows by destination city). The website provides a free Application Programming Interface (API) for researchers and programmers, allowing the above data to be obtained and used in scientific research. We used the API with the Python programming language to obtain population mobility data during the Spring Festival of 2019 and stored it in an SQL database.
The population mobility data we obtained exclude users' private information and were aggregated by Tencent to the city level rather than the individual level. They contained the following content, consistent with what is displayed on the website: the source city name and its coordinates, the target city name and its coordinates, time, mobility intensity, and mobility type. After manual filtering and sorting, the dataset contained a total of 40,591 records, each of which covered eight fields (source city name and its coordinates, target city name and its coordinates, time, and mobility intensity). Based on this, we constructed a 40,591 × 8 data table, as shown in Table A1 (Appendix A). Other variables, such as total population, average wage, gross regional product (GRP), urbanization rate, unemployment rate, and other socioeconomic indicators, were obtained from the 2018 Urban Statistical Yearbook on the website of the National Bureau of Statistics. In this paper, the Geographic Information System (GIS), GeoDa, Gephi, and GWR4 were used to process, analyze, and calculate the data.
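The storage step described above can be illustrated with a minimal sketch. It assumes an SQLite database and a hypothetical table whose eight columns mirror the eight fields listed in the text; the table name, column names, and helper functions are illustrative and are not taken from the Tencent API or from the authors' code.

```python
import sqlite3

# Hypothetical schema mirroring the eight fields listed in the text; the
# table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS mobility (
    source_city TEXT NOT NULL,
    source_lon  REAL NOT NULL,
    source_lat  REAL NOT NULL,
    target_city TEXT NOT NULL,
    target_lon  REAL NOT NULL,
    target_lat  REAL NOT NULL,
    day         TEXT NOT NULL,
    intensity   REAL NOT NULL
)
"""


def store_records(conn, records):
    """Insert an iterable of 8-tuples and return the total row count."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO mobility VALUES (?, ?, ?, ?, ?, ?, ?, ?)", records
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM mobility").fetchone()[0]


def daily_flow(conn, source, target):
    """Return {day: summed intensity} for flows from source to target."""
    rows = conn.execute(
        "SELECT day, SUM(intensity) FROM mobility "
        "WHERE source_city = ? AND target_city = ? "
        "GROUP BY day ORDER BY day",
        (source, target),
    )
    return dict(rows.fetchall())
```

Keeping one row per (source, target, day) record makes it straightforward to aggregate the records later into the city-by-city flow matrix used in the network analysis.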
It should also be noted that the error sources and representativeness of the dataset cannot be ignored. Despite the huge amount of travel data obtained through location services, some groups do not use any smartphone apps, so their travel trajectories cannot be collected. The dataset is therefore likely to be most representative of specific regions and groups, such as developed regions and young and middle-aged groups, which coincide with the subjects of our research. Nonetheless, this dataset provides a larger, more dynamic, and more efficient record of population mobility with fine temporal and spatial resolution, and it has proved reliable in related research [37][38][39][40].
In this study, we considered 290 prefecture-level and above administrative units in China as the research focus, including four municipalities, two special administrative regions, 15 sub-provincial cities, and 269 prefecture-level cities. Due to the lack of data, some prefecture-level cities in Hainan province, Taiwan, and some ethnic minority autonomous prefectures in western China were not included in the study. Figure 1 shows the research areas in this paper.

Methodology
In this study, we first used the social network analysis method to analyze the characteristics of the population mobility network. Then, spatial autocorrelation analysis was used to validate the spatial dependence of population mobility, and the ordinary least squares (OLS) method and a correlation test were employed to screen the candidate factors. After that, three types of regression analysis, namely OLS, geographically weighted regression (GWR), and semiparametric geographically weighted regression (SGWR), were conducted to reveal the factors correlated with population mobility. Figure 2 gives a flowchart of this research.

Social Network Analysis (SNA)
Using the social network analysis method and taking the intensity of population flow between cities as the weight, we established a 290 × 290 directed weighted matrix P = (P_ij) to characterize population mobility within the 14 days, as shown in Equation (1), where P_ij represents the intensity of population flow from city i to city j and n = 290. The population flow network is a small-world, scale-free network lying between a fully regular network and a completely random network. Network characteristics are usually measured by the PageRank algorithm and "community" detection indicators.

$$
P =
\begin{pmatrix}
0 & P_{12} & \cdots & P_{1(n-1)} & P_{1n} \\
P_{21} & 0 & \cdots & P_{2(n-1)} & P_{2n} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
P_{(n-1)1} & P_{(n-1)2} & \cdots & 0 & P_{(n-1)n} \\
P_{n1} & P_{n2} & \cdots & P_{n(n-1)} & 0
\end{pmatrix}
\tag{1}
$$

PageRank is an algorithm used by the Google search engine to rank the importance of web pages. It has since been applied to network analysis in many fields, such as bibliometrics, social network analysis, and road networks [41,42]. Compared with other centrality indices for evaluating nodes in a network, such as degree, betweenness, and closeness, the PageRank algorithm considers not only the number of connections but also their quality: a node with few connections is still important if those connections are to important nodes. We assumed that the mobility network formed during the Spring Festival is similar to the Internet, in that cities with higher importance attract more population and routes. Based on the intensity of population flow, we used the PageRank algorithm to rank the importance of urban nodes and obtain the hierarchical structure of population mobility. The formula of the PageRank algorithm is as follows:

$$
PR_i = \frac{1 - d}{N} + d \sum_{j \in B_i} \frac{PR_j}{L_j}
\tag{2}
$$

where PR_i is the PageRank value of city i; d is a constant, usually set to 0.85; N is the number of all cities; B_i is the set of cities with population flow into city i; and L_j is the number of links from city j, which is weighted by the intensity of the population flow.

Many methods have been used for community detection, especially fast algorithms for large-scale networks, such as the Girvan-Newman algorithm, the CNM algorithm, and the SCAN algorithm [43][44][45]. In this paper, community detection was performed with the multilevel algorithm, a bottom-up algorithm proposed by Blondel et al. that optimizes modularity [46].
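To make the ranking step concrete, the following is a minimal sketch of a flow-weighted PageRank computed by power iteration. It is an illustration under stated assumptions rather than the authors' implementation: the function name `weighted_pagerank`, the dictionary input format, and the even redistribution of rank from cities without outflow are choices made for this example only.

```python
def weighted_pagerank(flows, d=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on directed flows {(source, target): weight}.

    Each city passes its rank to its destinations in proportion to flow
    intensity. The input format and dangling-node handling are assumptions.
    """
    nodes = sorted({city for edge in flows for city in edge})
    n = len(nodes)
    out_weight = {v: 0.0 for v in nodes}
    for (src, _dst), w in flows.items():
        out_weight[src] += w

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Cities with no outflow spread their rank evenly over all cities.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        new = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        for (src, dst), w in flows.items():
            if w > 0.0:
                new[dst] += d * rank[src] * w / out_weight[src]
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank
```

The returned values sum to one, so they can be compared directly across cities and then grouped into hierarchy levels with a classification method of choice.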

Semiparametric Geographically Weighted Regression (SGWR) Model
This paper applied the geographically weighted regression (GWR) model [47] to reveal the relationship between population mobility changes and socioeconomic factors at the global and local levels. Compared with the traditional GWR model, semiparametric geographically weighted regression (SGWR) [48,49] pays more attention to the diversity and nonstationarity of geospatial data. The model has been shown to be applicable in geography, environmental science, and economics [50,51]. It performs better than the traditional GWR model because some coefficients are allowed to vary over space while others are held constant across the study area. Therefore, the SGWR model was constructed based on the statistical yearbook data, and the formula is expressed by Equation (3):

$$
y_i = \sum_{k} \beta_k X_{ik} + \sum_{j} \beta_j(u_i, v_i) X_{ij} + \varepsilon_i
\tag{3}
$$

where y_i is the dependent variable at location i; k and j index the global variables and local variables, respectively; β_k is the global regression coefficient for the explanatory variable X_k; (u_i, v_i) represents the coordinates of location i; β_j(u_i, v_i) represents the local regression coefficient for the explanatory variable X_j at location i; and ε_i represents the error term.

For the GWR model, the spatial weight matrix is critical, because the choice of spatial weight function has a great influence on the parameter estimates [47]. In this paper, we used an adaptive bi-square kernel rather than a fixed kernel to calculate the weight matrix, based on two considerations. First, the regression points (the center of each city) appeared to be randomly distributed in the study area, and the adaptive kernel ensured that each local regression had enough data [52]. Second, the adaptive bi-square kernel narrows the bandwidth where data are dense, widens it where data are sparse, and has a clear-cut range beyond which the kernel weight is 0. It is widely used in studies taking the city as the unit [53,54].

Meanwhile, the accuracy of the GWR model is greatly affected by the bandwidth of the weight function. The corrected Akaike information criterion (AICc) and cross-validation (CV) are two methods commonly used to determine the bandwidth. Compared with the latter, the former can quickly and efficiently account for differences in the degrees of freedom of various models [48]. Therefore, the AICc was selected to determine the appropriate bandwidth when constructing the GWR model.
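The behavior of the adaptive bi-square kernel can be shown with a short sketch. It assumes planar (projected) coordinates and takes the bandwidth at each regression point to be the distance to its k-th nearest neighbor, which is a common definition of an adaptive kernel; the function name and arguments are hypothetical, and GWR4's internal implementation may differ in detail.

```python
import math


def adaptive_bisquare_weights(points, i, k):
    """Kernel weights of all points for the local regression at points[i].

    The bandwidth is the distance from points[i] to its k-th nearest
    neighbour, so it shrinks where cities are dense and grows where they
    are sparse. Weights drop to exactly 0 at and beyond the bandwidth.
    """
    xi, yi = points[i]
    dists = [math.hypot(x - xi, y - yi) for x, y in points]
    # Index 0 of the sorted list is the regression point itself (distance 0).
    bandwidth = sorted(dists)[k]
    if bandwidth == 0.0:
        raise ValueError("k-th nearest neighbour coincides with the point")
    return [
        (1.0 - (dist / bandwidth) ** 2) ** 2 if dist < bandwidth else 0.0
        for dist in dists
    ]
```

In a bandwidth search, k would be varied and the value giving the smallest AICc retained.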

Mobile-Data-Based Population Mobility Variation (MBPMV)
Owing to data acquisition constraints, we obtained only 14 days of Tencent location data from 290 cities, but considering the sample size and data accuracy, this dataset could be used as the basic data for in-depth research. The first day of the dataset was 29 January and the last day was 11 February. We divided the 14 days into two phases, before the Spring Festival and after the Spring Festival, and took the net population mobility of all cities in the two periods as representative of the population distribution during the Spring Festival. Through changes in the time series and differences in population mobility between the two periods, we found differences in the distribution of human activity. Figures 3 and 4 show the inflow and outflow statistics of all cities before and after the Spring Festival, respectively. We selected the cities with the most obvious population inflow and outflow in the two periods, namely Beijing, Chongqing, Shenzhen, and Hengyang, and plotted their main population flow directions and intensities, as shown in Figure A1 (Appendix A). Combining Figures 3 and 4, we found that the cities with large population mobility were all located in the four major urban agglomerations in China, and that the central region experienced a significant population inflow before the Spring Festival and a large population outflow after it. This pattern of population flow revealed the differences in China's regional development, so it was particularly important to explore the causes of this phenomenon, which may be geopolitical, economic, social, or geographical. Meanwhile, the global spatial autocorrelation analysis of the population mobility changes in all cities after the Spring Festival resulted in a Moran index of 0.702, a Z score of 50.128, and a p value of 0.01, indicating that the probability of the above spatial distribution pattern arising at random was less than 5%.
Based on this, an SGWR model could be constructed to explore the socioeconomic factors related to the formation of the above population mobility pattern.
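For reference, the global Moran index reported above follows the standard definition sketched below. The sketch assumes a dense spatial weight matrix given as a list of lists and ignores its diagonal; the function name is hypothetical, and the Z score and p value (which GeoDa derives from a permutation or normality assumption) are not computed here.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij.

    I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where S0 is the sum of all off-diagonal weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    s0 = sum(weights[i][j] for i, j in pairs)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i, j in pairs)
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)
```

Values near 1 indicate that similar values cluster in space, values near -1 indicate a checkerboard-like alternation, and values near 0 indicate no spatial structure.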
Sustainability 2020, 12, x FOR PEER REVIEW

Independent Variables Selection and Model Construction
In order to analyze the determinants of population mobility patterns, we used three steps to determine the independent variables in the GWR model:
(1) Select socioeconomic factors related to urban development. The wage level of employees is a very important factor, because the difference in expected benefits is the main force driving population mobility [55]. Secondly, GRP and average gross regional product are direct reflections of the economic development of a city, which may affect population mobility [56]. At the same time, because they offer different types of work, the primary, secondary, and tertiary industries can also affect population mobility. Finally, considering that labor-intensive industries can absorb large amounts of labor, we considered foreign capital as a candidate variable [57]. In addition, we added several candidate variables related to urban development and social economy, such as urban total population, urbanization rate, and urban worker unemployment rate [58].
(2) Exclude multicollinearity between variables. We performed an OLS regression to detect multicollinearity between the variables. After all variables were normalized to fit the normal distribution, the variance inflation factor (VIF) of each independent variable was calculated, and any independent variable with VIF > 7.5 was to be eliminated from the final model. In this process, the VIF values of the nine independent variables we selected were all less than 7.5, indicating that there was no multicollinearity among the variables (Table 1).
(3) Perform a correlation analysis of variables, excluding variables that are not related to population mobility at a 95% confidence level. In this process, the unemployment rate (UER) was eliminated. The results showed that the remaining variables were neither redundant nor uncorrelated with population mobility.
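The multicollinearity screen in step (2) can be sketched as follows. The sketch relies on the standard identity that the VIF of variable j equals the j-th diagonal entry of the inverse correlation matrix of the predictors, which is equivalent to 1 / (1 - R_j^2). All function names are hypothetical, and the pure-Python matrix inversion is only meant for a handful of variables.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    devs = []
    for col in columns:
        mean = sum(col) / n
        devs.append([v - mean for v in col])
    norms = [sum(d * d for d in dev) ** 0.5 for dev in devs]
    m = len(columns)
    return [
        [sum(a * b for a, b in zip(devs[p], devs[q])) / (norms[p] * norms[q])
         for q in range(m)]
        for p in range(m)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(r == c) for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("variables are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Variance inflation factor of each predictor column."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]
```

Under the rule stated in the text, a predictor would be dropped when its value returned by `vif` exceeds 7.5.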
Therefore, after the above three processes, total population (TP), average wage (AW), gross regional product (GRP), average gross regional product (Avg_GRP), urbanization rate (UR), foreign capital (FC), value-added of primary industry (VAPI), and value-added of secondary and tertiary industry (VASTI) were used for GWR model analysis. Table 2 shows the details of the above variables.

Table note: ** indicates that the variable is significant at the 0.05 level.

Table 2 (partial). Variable | Abbreviation | Definition (unit)
Gross regional product | GRP | Annual gross regional product (100,000,000 yuan)
Average gross regional product | Avg_GRP | Annual gross regional product per capita (10,000 yuan/person)

After the OLS regression and correlation test, a total of eight variables were selected for the GWR model. In this paper, a variable was deemed significant (p < 0.05) when its pseudo t value (Est/SE) was greater than 1.96 or less than −1.96 [48,59]. Considering that local models can improve accuracy, the SGWR model was further used to explore the spatial stationarity and non-stationarity of the parameters affecting population mobility. An iterative process was used to determine whether each parameter was a global or a local variable. The candidate models were compared based on AICc, and the model with the smallest AICc value was selected as the best [60,61].

Figures 3 and 4 show the inflow and outflow statistics of all cities before and after the Spring Festival, respectively. We see that: (1) There were significant differences in population mobility between cities in the two time periods. The population flow was highly consistent with the city level; that is, the higher a city's development level, the greater its population flow during the Spring Festival, as in Beijing, Shanghai, Guangzhou, Shenzhen, and Chengdu. (2) The population inflows and outflows of cities also differed between the two time periods. Before the Spring Festival, there was an obvious population inflow in the central and western regions, and the core cities in the four major urban agglomerations had a relatively obvious population outflow. After the Spring Festival, this phenomenon reversed. This might be because the regional core cities attract a large number of migrant laborers. Before the Spring Festival, the migrant laborers return home to be with their families, which is commonly known as the "returning flow." After the Spring Festival, they return to their original workplaces to continue their jobs, which leads to the so-called "migrant flow," a surge of migrant laborers back to the workplace.

Spatiotemporal Patterns of Population Mobility
Based on the statistics of population inflows and outflows in the two periods (Table 3 and Figure 5), we divided all cities into four categories: continuous population inflows (II), continuous population outflows (OO), population inflows then outflows (IO), and population outflows then inflows (OI). Table 3 lists the four different types of cities. We can see that first-tier cities and provincial capital cities located in southeastern China belong to the OI type, while second- and third-tier cities located in central and western China and small cities around the regional core cities belong to the IO type. Most of the cities with continuous population outflows were located in northwestern China, which has low population attractiveness for economic, environmental, and geographical reasons. The problem of irreversible population loss should attract the attention of the relevant city managers. The same situation also occurred in the Pearl River Delta urban agglomeration: Guangzhou and Shenzhen absorbed a large number of human and material resources from the surrounding areas, resulting in a continuous outflow of population from the surrounding small cities, which to some extent undermined the sustainable development of the region. With the improvement of people's living standards, traveling for the New Year has become common. Therefore, tourism-oriented cities such as Sanya, Zunyi, Lijiang, and Beihai continued to attract visitors to some extent during the Spring Festival.
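The four-way classification above reduces to the sign of each city's net flow in the two phases. The sketch below illustrates this under two assumptions that are not stated in the source: net flow is taken as total inflow minus total outflow within a phase, and a net flow of exactly zero is counted as an outflow. The function names are hypothetical.

```python
def net_inflow(inflows, outflows):
    """Net flow of one city in one phase: total inflow minus total outflow."""
    return sum(inflows) - sum(outflows)


def classify_city(net_before, net_after):
    """Return 'II', 'OO', 'IO', or 'OI' from the net flow in each phase.

    'I' marks a positive net inflow; zero or negative is marked 'O'
    (an assumption made for this sketch).
    """
    first = "I" if net_before > 0 else "O"
    second = "I" if net_after > 0 else "O"
    return first + second
```

For example, a city that gains population before the festival and loses it afterwards is labeled IO, the pattern the text associates with cities in central and western China.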
Figure 6 is the grading map of population flow during the Spring Festival, from which we can clearly see the spatial pattern. First, unlike the diamond-shaped structure formed by population mobility during the National Day Golden Week [62], the population flow during the Spring Festival presents a spatial pattern of two east-west main axes and three north-south main axes. The two east-west main axes are Shanghai-Nanjing-Chengdu and Shanghai-Wuhan-Chongqing, and the three north-south main axes are Shenzhen-Chengdu, Shenzhen-Wuhan, and Guangzhou-Shanghai, all located in the four major urban agglomerations in China. At the same time, we noted that, although Beijing is not prominent in this structure, its flows reach most areas of China and it is also a distributing center for population. The Shandong peninsula is not obvious in this structure, which is badly out of line with its position in the national development strategy. Second, compared with the population flow during the National Day Golden Week, the population flow boundary of the major cities during the Spring Festival is relatively obvious. Large cities have a typical spatial orientation, while medium-sized cities show strong spatial proximity.
population inflow during the Spring Festival. Figure 6 is the grading map of the of population flow during the Spring Festival, from which we can clearly see the spatial pattern. First, unlike the diamond-shaped structure formed by the population mobility during the National Day Golden Week [62], the population flow during the Spring Festival presents a spatial pattern of two east-west main axes and three north-south main axes. The two east-west main axes are Shanghai-Nanjing-Chengdu and Shanghai-Wuhan-Chongqing, and the three north-south main axes are Shenzhen-Chengdu, Shenzhen-Wuhan and Guangzhou-Shanghai, all located in the four major urban agglomerations in China. At the same time, we noted that, although Beijing is not prominent in this structure, its coverage covers most areas of China and it is also a distributing center for population. The Shandong peninsula is not obvious in this structure, which is badly out of line with its position in the national development strategy. Second, compared with the population flow during the National Day Golden Week, the population flow boundary of the major cities during the Spring Festival is relatively obvious. Large cities have a typical spatial orientation, while medium-sized cities show strong spatial proximity. For a further understanding of the spatiotemporal pattern characteristics of population flow, we first established a directed weighted matrix of population inflow and outflow between cities, then explored it by using the PageRank algorithm and community detection test in SNA. The PageRank algorithm was used to rank the importance of cities in the population mobility network, and the hierarchical structure of population flow was obtained. Figure 7 shows a hierarchical map of all cities in the population mobility network, which was classified according to the PageRank value by the natural break classification (NBC); the results are summarized in Table 4. 
We found that six cities form the nationwide network center, namely Beijing, Shanghai, Chongqing, Guangzhou, Shenzhen, and Chengdu, all located in the four major urban agglomerations of China, which is similar to the results in Figure 5. The nationwide network subcenter consists of 16 cities, which are either sub-provincial cities, provincial capitals, or developed cities in southeast coastal areas. To a certain extent, these cities have a clear connection to population mobility during the Spring Festival.
Compared with the cities in the southeast coastal areas, the cities in the central and western regions are mostly regional or local network centers, which are not prominent in the whole population mobility network, indicating that these areas show very weak attraction or radiation in both population inflow and outflow. This reveals obvious differences between regions.
Using network analysis on the directed weighted population mobility matrix, we found that the clustering coefficient of the population mobility network during the Spring Festival was 0.375 and the average path length was 2.792, both higher than those of a random network with 290 nodes (clustering coefficient 0.112; average path length 2.075). This indicates that the population mobility network during the Spring Festival conformed to scale-free network characteristics and presented a typical "small world" network structure, which differs from Li's results at the provincial scale [37]. With the help of community detection, we further revealed the relationships between cities hidden in the population mobility network. Nodes belonging to the same community tend to be more closely linked, indicating that cities within the same community have more frequent population mobility with each other than with other cities. Figure 8 gives the distribution map of the network community structure and Table 5 summarizes the community structure of all cities.
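The small-world comparison can be made concrete with a short sketch. The two helper functions compute the average clustering coefficient and the average shortest-path length of an unweighted, undirected toy graph (the actual network is directed and weighted, so this is a simplification). The last function is the small-world index sigma = (C / C_rand) / (L / L_rand), a commonly used summary that the paper itself does not report; it is applied here to the four values quoted in the text. The toy graph is hypothetical.

```python
from collections import deque


def average_clustering(adj):
    """Mean local clustering coefficient of an undirected graph."""
    total = 0.0
    for node in adj:
        nbrs = list(adj[node])
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbrs[j] in adj[nbrs[i]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def small_world_sigma(c, l, c_rand, l_rand):
    """Ratio of normalised clustering to normalised path length."""
    return (c / c_rand) / (l / l_rand)


# Hypothetical toy graph: a triangle 0-1-2 with a tail 2-3.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
toy_c = average_clustering(toy)
toy_l = average_path_length(toy)

# Values reported in the text for the Spring Festival network and the
# 290-node random network.
sigma = small_world_sigma(0.375, 2.792, 0.112, 2.075)
```

With the reported values, sigma is roughly 2.5. Values above 1 are usually read as consistent with small-world structure, because clustering exceeds the random baseline by a larger factor than path length does.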
Based on the analysis of the population mobility network during the Spring Festival, 11 different community structures were identified. According to the spatial composition of the communities, we divided the 11 communities into three categories: the first is the cross-regional community, such as the community composed of Shanghai, Jiangsu, Zhejiang, Chongqing, and Jilin; the second is the neighborhood community, such as the community composed of Shanxi, Shaanxi, Ningxia, and Gansu; the third is the single-province community, such as the community composed of all cities in Shandong province. We found that the second and third categories accounted for a large proportion of the 11 communities, indicating that large-scale population mobility is still affected by the geographical and geospatial environment. However, communities of the first category span large distances and are distributed across several separate areas, which suggests that, with improvements in transportation infrastructure and in the economic level of destination cities, large-scale, cross-regional, and high-density population mobility will become a future development trend, and space-time distance in the traditional sense will be severely compressed. This reflects the special structure of the population mobility network during the Spring Festival, but longer time series data are still needed for a more general analysis.
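The paper does not name the community detection algorithm it used. Many such algorithms search for a partition that maximizes modularity Q, so the following minimal sketch shows how Q can be evaluated for a given partition. It assumes the directed flow matrix has been symmetrized (for example by adding flows in both directions); the six-node network and both partitions are hypothetical.

```python
def modularity(weights, membership):
    """Newman modularity Q for a symmetric weight matrix and a partition.

    weights[i][j] is the (symmetrized) flow between cities i and j;
    membership[i] is the community label assigned to city i.
    """
    n = len(weights)
    strength = [sum(row) for row in weights]
    two_m = sum(strength)  # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if membership[i] == membership[j]:
                # Observed weight minus the weight expected at random.
                q += weights[i][j] - strength[i] * strength[j] / two_m
    return q / two_m


# Hypothetical network: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
weights = [[0] * 6 for _ in range(6)]
for a, b in edges:
    weights[a][b] = 1
    weights[b][a] = 1

q_split = modularity(weights, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(weights, [0, 0, 0, 0, 0, 0])  # everything in one community
```

Splitting the toy network along the single bridging edge gives a clearly higher Q than lumping all nodes together, which is the sense in which cities inside one community are "more closely linked" with each other than with the rest of the network.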

Semiparametric Geographically Weighted Regression (SGWR) Model Results
Table 6 summarizes the basic parameters of the OLS, GWR, and SGWR model outputs. The constructed SGWR model made significant improvements over the ordinary regression model and the GWR model. Compared with the traditional regression model, the SGWR model had a smaller AICc (472.83) and a larger adjusted R² (0.751), which indicated better overall performance. Also, the F value (2.97) was much higher than the critical value (1.26), which meant that the null hypothesis that the SGWR model does not improve upon the traditional regression model could be rejected at the 95% confidence level. Tables 7 and 8 present the statistics of the SGWR model and global regression model outputs.
The results showed that AW, UR, FC, VAPI, and VASTI were significant at the 95% confidence level, while TP, GRP, and Avg_GRP were not significant at the 0.05 significance level. Among them, VASTI had the strongest positive correlation with MBPMV; VAPI had the strongest negative correlation with MBPMV; and FC, UR, and AW also had a strong positive correlation with MBPMV. Meanwhile, UR was finally selected as a global parameter after an iterative process in GWR4.
Figure 9 shows that there is a large spatial difference in the value of local R², which indicates that the explanatory power of the explanatory variables for the dependent variable changes with urban spatial location, further reflecting the spatial nonstationarity between variables. In addition, the standardized residuals of the model were analyzed and showed a random distribution pattern in space, indicating that the constructed SGWR model performed well.
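The model comparison in Table 6 relies on AICc and adjusted R². As a reminder of what these criteria measure, here is a minimal sketch using the standard least-squares forms. GWR software typically replaces the parameter count k with an effective number of parameters derived from the hat matrix; that detail is omitted, and the residual sums of squares and parameter counts below are hypothetical rather than values from the paper.

```python
import math


def aicc(rss, n, k):
    """Corrected AIC for a least-squares fit with k estimated parameters."""
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical comparison on n = 290 cities: a global model with few
# parameters versus a local model that fits better but uses more.
n = 290
global_aicc = aicc(rss=120.0, n=n, k=9)
local_aicc = aicc(rss=70.0, n=n, k=40)
prefer_local = local_aicc < global_aicc
```

A lower AICc indicates a better trade-off between fit and complexity, so in this made-up case the local model would be preferred despite its larger parameter count.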
According to the statistical results of the model, the added value of the secondary and tertiary industries, the wage level of employees, the urbanization rate, and foreign capital are positively correlated with population mobility. The added value of the primary industry is negatively correlated with population mobility. There is no significant correlation between population mobility and the total population, unemployment rate, or GRP. Apart from the urbanization rate, which was treated as a global parameter, the variables have different effects in different regions. These results are basically consistent with reality, as explained in the following.
The strongest positive correlation between VASTI and MBPMV indicates that the added value of secondary and tertiary industries has a significant effect on population mobility. This is because our research focused on the Spring Festival, during which labor flow dominates population mobility, representing the transfer of labor. Therefore, with the rapid development of the secondary and tertiary industries, a city will provide a large number of jobs and be able to absorb the labor force from its surroundings and even farther afield. The population of underdeveloped areas will shift to developed areas, and the population of poor areas will shift to less developed areas. This progressive relationship affects population mobility in all areas.
The average wage of employees also has a positive correlation with MBPMV. This is because, as neoclassical theorists explain, the income level of the intended destination is the main driving force of the migration process. Therefore, when other costs are constant and incomes increase, more laborers will choose higher-paying areas for employment, which is similar to the impact of the added value of secondary and tertiary industries on MBPMV.
Total foreign capital also has a positive impact on population mobility. In most cases, overseas investment aims at the development of secondary and tertiary industries in the city, combined with the construction of labor-intensive enterprises, directly creating a large number of positions in the city, so this economic factor also increases population mobility.
Urbanization is also positively correlated with MBPMV. With the increase in the urbanization level, on the one hand, industry can be effectively developed and more employment opportunities will be created through the intensive use of infrastructure. On the other hand, this accelerates residents' socialization and promotes developments in the service industry, which will also create a large number of employment opportunities.
There is a significant negative correlation between the added value of the primary industry and MBPMV, which indicates that, with the increase in primary industry, the population outflow will be intensified. This is determined by the nature of the work in primary industry. In China, agriculture, forestry, animal husbandry, and fisheries are classified as primary industries. In the context of mechanization, they cannot provide a large number of jobs or even absorb local surplus labor, resulting in population movement elsewhere.
In the above paragraphs, we explained the explanatory variables related to MBPMV. Considering that the model takes into account spatial nonstationarity, we focus on explaining the variation of the variables in space as follows.
From Figure 10, we see that the development of the secondary and tertiary industries in the eastern coastal areas has a positive correlation with population mobility, which indicates that, if investment in the secondary and tertiary industries is increased in Jiangsu, Zhejiang, and Shanghai, they could attract more of the floating population. Meanwhile, the Beijing-Tianjin-Hebei region and the Pearl River Delta region show a weak negative correlation, which indicates that these two regions will not attract people through the development of secondary and tertiary industries. Some cities in the central and western regions have relatively obvious negative coefficients, especially those in Hunan, Hubei, Yunnan, Guangxi, and Henan, all of which are major labor-outputting provinces, where the development of secondary and tertiary industries will not attract population inflow. Sichuan and Chongqing are also major labor-outputting regions, but their population mobility has a positive correlation with the city's secondary and tertiary industries, which means that population outflow could be slowed by increasing the proportion of secondary and tertiary industries.
Figure 11 shows that there are both positive and negative correlations between foreign capital and population mobility. The positive correlation is most pronounced in the Beijing-Tianjin-Hebei region, as well as in Zhejiang and Fujian, which means that an increase in foreign capital can not only reduce the local population outflow, but also attract a large population inflow. Considering that these regions are among the most developed regions of the country, as well as the places where talent concentrates, the high-tech industries directly funded by foreign capital can further absorb talent from the surrounding areas, resulting in a large population inflow.
Large negative regression coefficients exist in Henan, Anhui, Sichuan, and Chongqing, indicating that increasing foreign capital might not be a good measure to attract population inflow for provinces with a large labor force output. There are weak positive and negative regression coefficients in the northeast and southwest, which may mean that they are already saturated with foreign investment and an increase will not attract further migration.
From Figure 12, we find that the value-added of the primary industry is negatively correlated with population mobility in all cities studied, which is consistent with a recent study that examined the effects of rising agricultural productivity on migration [63]. The nature of the primary industry means that it can absorb local surplus labor to a certain extent, but it cannot attract external population. The correlation in the central and eastern coastal areas is much stronger than that in other regions, which indicates that these regions, especially Anhui, Jiangsu, and Zhejiang, should focus on reducing the development of the primary industry in the hopes of attracting external population.
From Figure 13, we see that the average wage of employees is positively correlated with population mobility in all the cities studied. The coefficients of all cities in southern China are higher than those of the north, which means that the average wage of employees in the southern region, especially in Hunan, Guangdong, and Fujian, is more closely related to population mobility. These regions can attract population inflows by increasing the income of employees. The Beijing-Tianjin-Hebei region and central Shaanxi, Sichuan, Hubei, and other cities have a weak positive correlation coefficient. The former may mean that, even with the further increase in wages, it has not been able to attract a population inflow, while the latter group is mostly labor-outputting cities, perhaps due to the increase in local wages still not reaching the level of developed regions.
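The spatially varying coefficients discussed for Figures 10-13 come from fitting a separate, distance-weighted regression around each city. The following is a minimal sketch of that idea for one predictor, assuming a Gaussian kernel with a fixed bandwidth. It covers only the local part of the model and is not the SGWR calibration performed in GWR4, where some coefficients (UR in this paper) are held global. The coordinates, predictor values, responses, and bandwidth are all hypothetical.

```python
import math


def gwr_local_fit(coords, x, y, target, bandwidth):
    """Weighted least-squares fit of y = a + b * x around one location.

    Observations are weighted by a Gaussian kernel of their distance to
    `target`, so nearby cities dominate the local estimate.
    """
    weights = [
        math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords
    ]
    w_sum = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical cities on a west-east line. The response rises with the
# predictor in the west (slope +2) and falls with it in the east (slope -1).
coords = [(float(i), 0.0) for i in range(10)]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, -1.0, -2.0, -3.0, -4.0, -5.0]

_, west_slope = gwr_local_fit(coords, x, y, target=(0.0, 0.0), bandwidth=1.5)
_, east_slope = gwr_local_fit(coords, x, y, target=(9.0, 0.0), bandwidth=1.5)
```

A single global regression on these data would report one averaged slope, whereas the local fits recover a positive coefficient at the western end and a negative one at the eastern end. Maps of local coefficients are meant to reveal this kind of regional sign change.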

Discussion and Conclusions
Traditional census data cannot reveal the spatial patterns of population mobility and the relevant socioeconomic factors within a specific period, or track people's trajectories, because of their slow updating frequency and other shortcomings. In addition, China's published population distribution statistics often suffer from coarse granularity, which means that daily population movement cannot be described with high spatial and temporal resolution. Spatiotemporal location big data capture the route and direction of population mobility over relatively continuous time intervals, which provides new data support for the study of population distribution and population mobility. Unlike macro models built on the long-term evolution of statistical data, research based on travel big data can not only reflect the new characteristics of population inflow and outflow between cities and describe the agglomeration and diffusion of population flow, but also analyze the increasingly complex relationships between cities from the perspective of flows through space.
Based on the Tencent application dataset, this study first used the social network analysis method to explore the spatial and temporal distribution patterns and characteristics of population mobility during the Spring Festival, then constructed an SGWR model to reveal the socioeconomic factors related to population mobility. Different from the diamond-shaped structure proposed by Pan and Lai [62], the population mobility network during the Spring Festival presents a typical structure of two east-west main axes and three north-south main axes. The vertices of the structure are all located in the four major urban agglomerations of China, which reflects the great attraction of these areas. The social network analysis method not only identified different community structures, but also classified all cities hierarchically, reflecting the status of different cities in the population mobility network. The results of the SGWR model show that population mobility is significantly correlated with regional average wage level, urbanization rate, foreign capital, value-added of primary industry, and value-added of secondary and tertiary industry, which is consistent with the findings of Zhong [64] and Li [37].
Using refined datasets with high temporal and spatial resolution, this study explored the structural characteristics of population mobility networks and the heterogeneity of different regions in attracting populations, thus going beyond previous studies. We found that the population tends to shift from low-wage, low-input, and low-development-level areas to high-wage, high-input, and high-development-level areas. On the one hand, this supports the earlier conclusion based on census data that economic differences between regions are the main forces driving population mobility [10,11]. On the other hand, it validates the neoclassical migration theory that migrants make their decision to move as a response to interregional or rural-urban wage differentials and rational cost-benefit calculations [6,7]. The imbalance in development between regions is due to the dual urban-rural development of the planned economy era and the priority development of the eastern region after the reform and opening-up [16,37]. The imbalance in regional development is directly manifested in the imbalance of economic development, which intensifies the scale of population mobility and ultimately leads to a growing gap between regions. Meanwhile, such large-scale movement is also a huge challenge to the transportation system. A series of social problems such as left-behind children, poor living conditions, and urban diseases are often hidden behind large-scale and long-term population mobility. Although we are unable to solve these problems, we can propose some positive development and management strategies through the analysis of population mobility. First, using location big data to track population activity trajectories and investigate population distribution is critical to the good operation of urban systems.
Second, different economic and social development strategies should be implemented according to the development status and conditions to avoid excessive population loss and the "shrinking city" phenomenon. For major labor-outputting regions such as Sichuan and Shandong, the former can push industrial transformation to speed up the development of secondary and tertiary industries; the latter can increase the intensity of attracting foreign investment, and the central region should consider increasing the income level of local employees.
The limitations of this paper cannot be ignored. First, there will inevitably be problems such as data deviation, data discontinuity, and data loss. As mentioned earlier, Tencent has hundreds of millions of users, but there are still some people who do not use products developed by Tencent, so their travel behaviors will not be recorded. Due to privacy issues, the original dataset does not provide the social attributes of the population (gender, age, occupation, and purpose), so we cannot accurately assess the purpose of the population movement (the majority of it is migrant worker flow, but there is still some student and tourism flow). The support of multi-source data and the cross-application of multi-disciplinary fields are the keys to studying cities and population in the era of big data. Second, population mobility is restricted and influenced by many complex factors. In this paper, socioeconomic factors were explored with the help of the SGWR model. Recent studies investigated amenity-led, public policy-led, and tourism industry-led population mobility and indicated that these factors play an increasingly important role in attracting population mobility [57,65-67]. In the next stage, the relationships between the quality of life factor, amenity factor, and public factor and differences in regional population mobility need to be explored. Third, we analyzed only 14 days of population mobility in this paper because of the confidentiality of the data. It is well known that there is also a peak period of population mobility before and after the Lantern Festival (the 15th day of the 1st lunar month), which was not reflected in this paper. Therefore, in future studies, population mobility data with longer time series should be obtained and analyzed, and the results may be more representative and instructive.
This paper combined the social network analysis method and the SGWR model to explore the spatiotemporal patterns and characteristics of population mobility, thereby revealing the socioeconomic factors related to population mobility. The research results can provide data support for urban policy makers and researchers, while also promoting progress in studies on population mobility.
Figure A1. Main population direction and intensity in four sample cities before and after the Spring Festival. (a-d) represent Beijing, Chongqing, Shenzhen, and Hengyang, respectively. Beijing is the outflow direction before the Spring Festival, Chongqing is the inflow direction before the Spring Festival, Shenzhen is the inflow direction after the Spring Festival, and Hengyang is the outflow direction after the Spring Festival. The closer the line color is to red, the higher the intensity of population flow, and vice versa.