3.1. General Analysis and Comparison
The general differences and characteristics of the three types of Spring Festival big travel data were investigated from three aspects: the general spatial characteristics of the travel networks, quadratic assignment procedure (QAP) correlation, and time series characteristics. The data used in the analysis were normalized with the extremum method, and the natural break method was used to classify them.
The three travel networks and the census migration network are shown in Figure 2, from which their general spatial characteristics can be derived. The Baidu and Tencent networks shared a similar characteristic: strong travel flows mostly occurred between short-distance province pairs (Figure 2f,g). The Qihoo network differed in that strong travel flows mostly occurred between relatively long-distance province pairs, such as Guangdong-Sichuan, Guangdong-Hubei, Guangdong-Henan, Sichuan-Zhejiang, and Beijing-Heilongjiang (Figure 2h). The census network combined both characteristics, with strong migration flows between both short- and long-distance province pairs (Figure 2e).
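The two preprocessing steps can be illustrated with a minimal pure-Python sketch. It assumes that the "extremum method" means min-max rescaling to [0, 1] and that "natural break" refers to Fisher-Jenks breaks, which choose class boundaries on the sorted values so that the total within-class sum of squared deviations is minimal. The function names are hypothetical, and the exact dynamic program shown here is only practical for modest numbers of values (such as province-pair flows).

```python
def min_max_normalize(values):
    """Rescale values to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def jenks_breaks(values, n_classes):
    """Fisher-Jenks natural breaks via dynamic programming.

    Returns [min, upper bound of class 1, ..., upper bound of last class].
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")
    # prefix sums give O(1) within-class sums of squared deviations
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        # sum of squared deviations of data[i:j]
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: minimal total SSD when data[:j] is split into k classes
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i
    # walk back through the split table to recover each class's upper bound
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return [data[0]] + bounds[::-1]


print(jenks_breaks([1, 2, 3, 10, 11, 12, 30, 31, 32], 3))  # [1, 3, 12, 32]
```

In practice the flows would be normalized first and the breaks then computed on the normalized values, so that the classes of different datasets are comparable.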
QAP is used to measure network correlation [28], and we used it to compare the linear and log-transformed correlations among the four networks (Table 2). The census, Baidu, and Tencent networks had high linear and log-transformed correlations with one another; in particular, the linear correlation between Baidu and Tencent reached 0.95. However, the Qihoo network correlated only weakly with the Baidu and Tencent networks and moderately with the census network. After logarithmic transformation, the correlations of the Qihoo network with the others improved greatly, and its correlation with the census network reached 0.82.
As shown in the time series line charts of Spring Festival travel flows (Figure 3), the three raw travel datasets differed greatly in magnitude, possibly because of differences in platform user scale and data acquisition approach. However, they shared a similar overall travel trend, which became evident after normalization. Furthermore, the travel strength of Tencent was generally stable, whereas that of Baidu and Qihoo showed a generally declining trend. For Qihoo, a possible reason is that train tickets were in less short supply after the Spring Festival than before it; accordingly, Qihoo provided data only until the Lantern Festival (5 March 2015). The three travel datasets showed two travel flow peaks: one approximately five days before the Spring Festival and the other approximately six days after it, which was the day the holiday finished. Two travel flow valleys appeared on two important festivals for family reunion, namely, the Spring Festival and the Lantern Festival. The Tencent and Qihoo data also showed two small peaks, one just before and one just after 25 February 2015 (the end of the holiday), whereas the Baidu data did not.
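The peak and valley timing described above can be located mechanically. The minimal sketch below flags strict local maxima and minima in a daily series and reports them as day offsets from the Spring Festival (19 February 2015). The function name and the strict-neighbour rule are illustrative assumptions, and the toy series is invented, not the study's data.

```python
from datetime import date, timedelta

SPRING_FESTIVAL = date(2015, 2, 19)  # Chinese New Year's Day, 2015


def local_extrema(series, start):
    """Return (peaks, valleys) as day offsets from the Spring Festival.

    series: daily values, e.g. normalized travel strength
    start:  calendar date of series[0]
    A day is a peak (valley) if it is strictly above (below) both
    neighbours; endpoints and plateaus are ignored.
    """
    peaks, valleys = [], []
    for i in range(1, len(series) - 1):
        prev, cur, nxt = series[i - 1], series[i], series[i + 1]
        offset = (start + timedelta(days=i) - SPRING_FESTIVAL).days
        if cur > prev and cur > nxt:
            peaks.append(offset)
        elif cur < prev and cur < nxt:
            valleys.append(offset)
    return peaks, valleys


# toy series (not the study's data): 15 days starting 12 February 2015
toy = [0.5, 0.7, 1.0, 0.8, 0.6, 0.4, 0.3, 0.1,
       0.2, 0.3, 0.4, 0.6, 0.8, 0.9, 0.7]
print(local_extrema(toy, date(2015, 2, 12)))  # ([-5, 6], [0])
```

Real daily series are noisier, so some smoothing or a minimum prominence threshold would probably be needed before small peaks, such as those around 25 February, can be separated from noise.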
The following basic characteristics of the three travel datasets can be derived from the preceding analysis. First, the Baidu and Tencent travel data had similar characteristics because both were obtained from LBS. Baidu and Tencent could acquire travel data covering multiple transport modes, such as vehicles, trains, and planes, in real time, and these modes covered both short- and long-distance travel. Second, the Qihoo travel data, which were obtained from Qihoo's third-party automatic train ticket-booking platform, might have two distinguishing characteristics. On the one hand, the platform could only acquire railway passenger flow data, which are dominated by long-distance travel, as mentioned earlier. On the other hand, as a third party, the platform could not collect all train ticket booking data, so the data were incomplete. Third, the census migration data used as a reference represented long-term macro migration. Its characteristic, namely, strong migration flows between both short- and long-distance province pairs, was reasonable and consistent with the Lévy flight model.
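For reference, a Lévy flight is a random walk whose displacement lengths $\Delta r$ follow a heavy-tailed power-law distribution, commonly written in the form below. Under such a distribution most displacements are short, but very long jumps remain far more likely than under an exponential or Gaussian law, so strong flows at both short and long distances are expected. The exponent $\beta$ was not estimated in this study; the range shown is the standard one for Lévy flights.

```latex
P(\Delta r) \propto \Delta r^{-(1+\beta)}, \qquad 0 < \beta < 2
```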
3.2. Net Travel Flow Analysis
Net travel flows can be used to determine whether a province acted as an immigration or an emigration province in the whole travel network. The time series of net travel flows showed that the three travel datasets consistently formed two significant clusters (Figure 4a–c). One was an immigration cluster, whose net travel flows were positive before the Spring Festival and negative after it; the other was an emigration cluster, which behaved in the opposite way. For Baidu and Tencent, the overall variation of the immigration provinces was significantly larger than that of the emigration provinces, reflecting the strong population attractiveness of the few immigration provinces; this difference did not exist for Qihoo. Furthermore, the variation of Qihoo was more irregular than that of the others. Net travel flows showed two variation peaks: the first approximately five days before the Spring Festival and the second approximately six days after it, consistent with the overall travel trend analyzed earlier. However, the variation valley of Baidu appeared in the middle of the Spring Festival holiday, whereas those of Tencent and Qihoo appeared on the Spring Festival. For Baidu, the second variation peak of Guangdong Province was apparently delayed, appearing approximately 12 days after the Spring Festival.
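A minimal sketch of the net travel flow computation from daily directed OD matrices is given below. The sign convention (net flow = outflow minus inflow) is an assumption, chosen so that immigration provinces come out positive before the festival, as described above. The two-way split by the sign of the mean pre-festival net flow is a simplified, hypothetical stand-in for the clustering seen in the time series.

```python
def net_flows(od):
    """Net travel flow of each province from one day's directed OD matrix.

    od[i][j] is the travel volume from province i to province j.
    Assumed sign convention: net = outflow - inflow.
    """
    n = len(od)
    return [sum(od[i][j] for j in range(n) if j != i)
            - sum(od[j][i] for j in range(n) if j != i)
            for i in range(n)]


def classify(daily_od, festival_index):
    """Label provinces by the sign of their mean net flow before the festival.

    daily_od:       list of daily OD matrices in date order
    festival_index: index of the Spring Festival day in daily_od
    """
    before = [net_flows(m) for m in daily_od[:festival_index]]
    labels = []
    for i in range(len(daily_od[0])):
        mean_before = sum(day[i] for day in before) / len(before)
        labels.append("immigration" if mean_before > 0 else "emigration")
    return labels


toy_od = [[0, 10, 5], [2, 0, 1], [1, 3, 0]]  # invented 3-province example
print(net_flows(toy_od))  # [12, -10, -2]
```

Because every trip leaves one province and enters another, the net flows of all provinces on a given day sum to zero, which is a convenient sanity check on real data.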
The four-quadrant diagram of net travel flows can readily identify the function and attractiveness of each province. In general, the data points of Baidu and Tencent fitted a straight line well. The slope angle of the Tencent fitting line was close to −45°, whereas the Baidu fitting line was noticeably flatter (Figure 4d–f). This indicated that the travel flow strength of Tencent was nearly the same before and after the Spring Festival, whereas that of Baidu was stronger before the Spring Festival than after it. The former was consistent with theoretical expectation, but the latter was not. The Qihoo points were scattered, with no obvious regularity. Furthermore, the immigration provinces could be divided into two categories. The first comprised the developed eastern coastal provinces characterized by manufacturing and service industries, including Guangdong, Beijing, Shanghai, Zhejiang, and Jiangsu. The second comprised the less developed central and western provinces characterized by labor-intensive industries, such as agriculture and mining, including Xinjiang, Qinghai, and Inner Mongolia. Most emigration provinces were populous, developing provinces in central and western China, such as Anhui, Henan, Sichuan, Jiangxi, and Hubei. The three travel datasets were highly consistent with the census migration data, although the Qihoo data differed more from the census data than the other two datasets did. The Spearman correlation coefficients among the census, Baidu, and Tencent data were all above 0.95, whereas those of Qihoo with the others were only approximately 0.85 (Table 3). This suggested that the Qihoo travel data might have some drawbacks.
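The two quantities used above, the slope angle of a fitted line and Spearman's rank correlation, can be sketched in pure Python as follows. The assumption that x is a province's net flow before the festival and y its net flow after it is hypothetical; with that layout, equal strength before and after gives y ≈ −x and a slope angle near −45°. An ordinary least-squares fit with an intercept is also assumed, since the text does not state the fitting method.

```python
import math


def fit_angle_degrees(x, y):
    """Least-squares slope of y on x, expressed as an angle in degrees."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return math.degrees(math.atan(slope))


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx)
                    * sum((b - my) ** 2 for b in ry))
    return num / den
```

Because Spearman's rho uses only ranks, it is insensitive to the large differences in raw magnitude between platforms. This makes it a reasonable choice for comparing the net flows of the census and the three travel datasets.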
As analyzed earlier, the Tencent travel data performed well in terms of overall trend, agreement with theoretical expectation, and consistency with the census data. The Baidu travel data had one deficiency: travel flows before the Spring Festival were stronger than those after it, which did not meet the theoretical expectation. The Qihoo travel data differed considerably from the others, possibly because only railway passenger flow data were acquired and because the data were incomplete.
3.3. Travel Network Analysis
Small-world and scale-free analyses are standard first steps in complex network analysis. The average clustering coefficients of the four travel and migration networks were all larger than those of the corresponding random networks, and their average shortest path lengths were all longer than those of the random networks (Figure 5a,b). Moreover, the small-world-ness indexes of the census, Baidu, Tencent, and Qihoo networks were 0.69, 0.65, 0.74, and 0.54, respectively, all smaller than 1. Therefore, none of the four networks had the small-world characteristic. The node strength distributions of the four networks did not follow a power law, indicating that the networks were not scale-free. However, their edge strength distributions followed a power law significantly (Figure 5c,d). Furthermore, both the node and edge strength distributions of the Qihoo network deviated from those of the others. These analyses showed that the provincial-scale travel network had two characteristics. First, the large average clustering coefficient indicated a significant community characteristic, and the long average shortest path indicated that travel was not nationwide in scope. Second, the absence of a scale-free node strength distribution indicated that travel might follow a multicenter pattern, whereas the power-law edge strength distribution indicated that a few strong travel flows existed between certain provinces; these might be key paths deserving more attention.
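The small-world-ness index is commonly defined as σ = (C / C_rand) / (L / L_rand), where C is the average clustering coefficient, L the average shortest path length, and the subscript denotes random graphs of the same size. This definition explains why a network can have C > C_rand and still be non-small-world: if its path lengths exceed the random baseline by a larger factor, σ falls below 1. The sketch below computes σ for an unweighted undirected graph against Erdős-Rényi G(n, m) baselines. The unweighted treatment and the choice of baseline are assumptions, since the networks in this study are weighted and the text does not state how the random networks were generated.

```python
import random
from collections import deque


def avg_clustering(adj):
    """Mean local clustering coefficient; adj maps node -> set of neighbours."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def avg_shortest_path(adj):
    """Mean hop distance over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def random_graph(nodes, m, rng):
    """Erdos-Renyi G(n, m): m distinct edges drawn uniformly at random."""
    adj = {v: set() for v in nodes}
    possible = [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
    for a, b in rng.sample(possible, m):
        adj[a].add(b)
        adj[b].add(a)
    return adj


def small_world_sigma(adj, n_random=20, seed=0):
    """sigma = (C / C_rand) / (L / L_rand); sigma > 1 suggests a small world."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    cs, ls = [], []
    for _ in range(n_random):
        g = random_graph(nodes, m, rng)
        cs.append(avg_clustering(g))
        ls.append(avg_shortest_path(g))
    c_r, l_r = sum(cs) / n_random, sum(ls) / n_random
    if c_r == 0:
        raise ValueError("random baseline has zero clustering; sigma undefined")
    return (avg_clustering(adj) / c_r) / (avg_shortest_path(adj) / l_r)
```

For weighted provincial travel networks, one would normally substitute weighted clustering and weighted path lengths. The overall structure of the comparison stays the same.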
Node strength differed somewhat among the four networks, but an overall pattern of regional differentiation could be found: node strength was high in eastern and central China and low in western China (Figure 6a,d,g,j). In particular, all four networks showed that Guangdong Province was the only province with ultrahigh node strength. The other provinces with high node strength were mainly the developed eastern coastal provinces, such as Beijing, Shanghai, Zhejiang, and Jiangsu, and the populous central and western provinces, such as Henan, Sichuan, and Anhui. Therefore, the level of economic and social development and population size might be important factors affecting travel strength.
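Node strength is the weighted analogue of degree: the total flow attached to a node. The minimal sketch below computes it from a directed OD matrix. Summing outgoing and incoming flows is an assumption, because the text does not state whether direction was kept. The province names and flow values in the usage example are invented for illustration.

```python
def node_strength(od):
    """Total flow incident to each node: outflow plus inflow.

    od[i][j] is the flow from node i to node j; self-flows are ignored.
    """
    n = len(od)
    return [sum(od[i][j] + od[j][i] for j in range(n) if j != i)
            for i in range(n)]


# hypothetical 3-province example (values are invented)
names = ["Guangdong", "Hunan", "Guangxi"]
toy_od = [[0, 10, 5], [2, 0, 1], [1, 3, 0]]
strengths = node_strength(toy_od)
print(dict(zip(names, strengths)))  # {'Guangdong': 18, 'Hunan': 16, 'Guangxi': 10}
```

Mapping these values by province, and classifying them with natural breaks, yields the kind of regional pattern described above.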
The edge betweenness of the four networks had a spatial distribution similar to that of their edge strength, reflecting the short-distance travel trend of Baidu and Tencent, the long-distance travel trend of Qihoo, and the combined migration trend of the census (Figure 6b,e,h,k). Furthermore, the high-betweenness travel path chains, including Guangzhou-Hubei-Henan-Beijing, Guangdong-Henan, and Sichuan-Shaanxi-Gansu, geographically corresponded to important transport backbones, such as the Beijing-Guangzhou and Longhai railway lines. These travel path chains deserve more attention and should be given high priority.
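Edge betweenness counts how many shortest paths between node pairs pass through each edge. The sketch below applies Brandes' algorithm with Dijkstra's shortest paths to an undirected weighted graph. The text does not say how travel volume was turned into path length, so it assumes length = 1 / strength, which makes strong flows behave as short, preferred links. The input format (a dict keyed by `frozenset` node pairs, without self-loops) is also a hypothetical choice.

```python
import heapq
from collections import defaultdict


def edge_betweenness(strength):
    """Brandes edge betweenness of an undirected weighted graph.

    strength: dict frozenset({u, v}) -> edge strength (no self-loops).
    Assumption: edge length = 1 / strength.
    Returns dict edge -> number of shortest node-pair paths through it.
    """
    adj = defaultdict(dict)
    for edge, w in strength.items():
        u, v = tuple(edge)
        adj[u][v] = adj[v][u] = 1.0 / w
    bc = {edge: 0.0 for edge in strength}
    for s in adj:
        # Dijkstra from s, counting shortest paths (sigma) and predecessors
        dist, sigma, preds = {s: 0.0}, defaultdict(float), defaultdict(list)
        sigma[s] = 1.0
        order, heap, done = [], [(0.0, s)], set()
        while heap:
            d, u = heapq.heappop(heap)
            if u in done:
                continue
            done.add(u)
            order.append(u)
            for v, length in adj[u].items():
                nd = d + length
                if v not in dist or nd < dist[v] - 1e-12:
                    dist[v], sigma[v], preds[v] = nd, sigma[u], [u]
                    heapq.heappush(heap, (nd, v))
                elif abs(nd - dist[v]) <= 1e-12:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # accumulate path dependencies from the farthest nodes back to s
        delta = defaultdict(float)
        for w in reversed(order):
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                bc[frozenset((u, w))] += share
                delta[u] += share
    # each unordered node pair was counted once from each endpoint
    return {edge: value / 2 for edge, value in bc.items()}
```

In a toy triangle where the A-C link is weak (length 10) and A-B and B-C are strong (length 1 each), the A-C edge receives zero betweenness because traffic between A and C is routed through B. Chains of strong links therefore tend to accumulate high betweenness, in line with the backbone corridors noted above.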
The average clustering coefficient can indicate whether a network has a clustering characteristic, whereas community identification can further delineate the specific clusters. The community identification results of the four networks showed strong spatial contiguity, even though the identification process did not consider spatial factors (Figure 6c,f,i,l). This finding indicated that the travel and migration networks might be affected by geographical effects. Furthermore, the geographic boundaries of the communities were highly similar across networks; in particular, the boundaries of Baidu and Tencent were nearly the same. In general, the identified communities could be divided into north and south. The north could be further divided into two subcommunities, namely, northeast and northwest, and the south into three subcommunities, namely, the Yangtze River Delta, south, and southwest. These results were highly consistent with the commonly used geographic zoning of China, further validating the usability of big travel data. However, in the Qihoo result the northwest and eastern communities were identified as a single community, which further indicated a possible deficiency of the Qihoo data.
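The text does not name the community identification algorithm. As an illustration only, the sketch below computes Newman modularity Q for a given partition of an undirected weighted network. Modularity is a widely used objective that algorithms such as Louvain maximize, but this sketch only scores a partition and does not search for one. The input format is a hypothetical choice.

```python
from collections import defaultdict


def modularity(weights, community):
    """Newman modularity Q of a partition of an undirected weighted graph.

    weights:   dict frozenset({u, v}) -> edge weight (no self-loops)
    community: dict node -> community label
    Q = sum over communities c of  W_in(c) / W - (S(c) / (2 W))**2,
    where W is the total edge weight, W_in(c) the weight of edges inside c,
    and S(c) the summed node strength of the members of c.
    """
    total = sum(weights.values())
    inside = defaultdict(float)
    strength = defaultdict(float)
    for edge, w in weights.items():
        u, v = tuple(edge)
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / total - (strength[c] / (2 * total)) ** 2
               for c in strength)


# two triangles joined by one edge; splitting them gives Q = 5/14
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
w = {frozenset(e): 1.0 for e in edges}
print(modularity(w, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}))
```

A partition that follows dense, strongly connected blocks of provinces scores higher than one that cuts through them. This is why modularity-based methods can recover spatially contiguous regions even without explicit spatial input.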
According to the preceding network analysis, the census, Baidu, and Tencent networks were highly consistent with one another, whereas the Qihoo network deviated from the others. The travel network showed a multicenter characteristic, with the developed eastern coastal provinces and the populous central and western provinces acting as network centers, and certain travel path chains played a leading role. Furthermore, the travel network showed a significant geographic clustering characteristic, which confirmed the geographic effect on the network.