Spatial Characteristics of the Tourism Flows in China: A Study Based on the Baidu Index

The characteristics of information flow, as represented by the Baidu index, reflect the pattern of tourism flows between different cities. Based on the Baidu index, this paper applies the seasonal concentration index and social network analysis (SNA) methods to study the spatial structure characteristics of tourism flows in China. The results reveal that: (1) both the search volume of the Baidu index in different cities and the online attention to different scenic areas exhibit obvious spatial heterogeneity and seasonal differences; (2) regions with strong tourism flow connections mainly occur between metropolises or among the cities within urban agglomerations, which are largely distributed on the southeast side of the Heihe–Tengchong Line; (3) the overall development level of the tourism flow network in China is low, with an imbalance between tourism supply and demand, indicating that tourism resources are concentrated in a few cities and that most of the information interaction among cities occurs in core areas, while only weak interaction is observed in peripheral areas; (4) cities such as Beijing and Wuhan have obvious advantages in regard to their tourism resources, whereas cities such as Beijing, Shanghai, Shenzhen and Guangzhou exhibit a high demand for tourism; moreover, tourism information flow networks are concentrated in several cities with an important role in the Chinese urban system, such as Beijing, Wuhan, and Chengdu, because they have abundant tourism resources, well-developed transportation systems and advanced economic and societal development levels; and (5) cities such as Beijing, Lhasa, Wuhan, and Zhengzhou possess numerous advantages due to structural holes and thus occupy an advantageous position in the tourism flow network.


Introduction
With the increasing popularity of Internet technology in China, the Internet has become an important tool for people to obtain information and support their decision-making. As of December 2018, the number of Internet users in China was 829 million, and the number of online travel reservations reached 410 million [1]. The Internet has become a popular way to obtain tourism information for an increasing number of people because of its low cost, convenient access and diversified methods of searching for information [2]. The development of information technology, especially the Internet, has created an important new means of data acquisition. Every time an individual interacts with the Internet, be it through a website, search engine, social media platform or mobile phone, interaction trace data are captured, stored, and analysed [3]. Travellers rely on the Internet to search for trip information, including dining options, scenic areas, accommodations, and shopping and entertainment services, and these searches provide useful information for research. Social network analysis plays an important role in tourism research, and with the continuous development of tourism, research using social network methods is gradually increasing [4–7]. With the development of information and communication technology and advancement of social network platforms and organizational structures, the space [...] while Zhu (2019) proposed a framework to reveal tourist groups and investigated tourist behaviour based on mobile phone call detail records [35]. Moreover, Mou (2020) used Flickr data to study the inbound tourism flows in Shanghai [7]. The search behaviour of netizens leaves a large number of search traces, which can be quantified with the search index (also referred to as online attention) according to the location and search content. Owing to the timeliness and convenience of data acquisition, Internet search data have become increasingly popular in research.
Google Trends and the Baidu index, the most commonly adopted tools, facilitate continuous monitoring of a given topic and have been applied to many subjects, such as hotel room demand [36,37], tourist volume and flow [3,38–40], contagious disease outbreaks [41], consumer consumption [42], unemployment rate [43], and land cover change [44]. In 2018, Baidu (www.baidu.com, accessed on 19 April 2021), the largest Chinese search engine, held a 70.3% share of the search engine market in China. With its very large number of users, the Baidu index has become an important research and analysis tool in the big data era, and it also provides data support for related research in China. More importantly, many existing studies have verified that the Baidu index is intrinsically related to tourism demand [3,39]. Therefore, it is important to conduct research based on the Baidu index.
Following Stewart (1996) [26], Mckercher (2008) [27], and Guo (2014) [28], this paper focuses on the spatial structure and pattern of tourism flows. Because a complex network approach is well suited to studying this structure and pattern, the social network analysis (SNA) method is adopted. Weighing the advantages and disadvantages of traditional data and online big data, this study applies SNA to search query volume data provided by Baidu to investigate the characteristics of the network spatial structure of the urban tourism information flows in China. The main questions addressed are as follows: what are the characteristics of the tourism flow network in China from the perspective of information flow? What is the position of different cities in the network, and which cities occupy an important position in the network?

Data Source and Description
Baidu is the largest Chinese search engine, and the Baidu index was introduced in 2006 based on big data. The Baidu index is a free big data analysis service based on Baidu web search and news services, indicating the query volume of users who search given associated keywords within a specific region (with the city as the minimum area). The Baidu index reflects the popularity of a particular query and user interests at a given moment in time [3]. Currently, the Baidu index provides different functions, such as trend research, demand mapping, crowd portraits and information attention, and users may select a specific time or region to query the attention degree of given keywords according to their needs. When searching the Baidu index page, keywords are first entered and the search period, range and region are selected, and a trend line of the entered keywords is then obtained for the corresponding period and region. A trend map also shows the daily search amount of a certain keyword within the region selected with the mouse [39].
Scenic areas in China are classified into five levels. Among them, the AAAAA classification is the highest level and represents the top-quality tourism resources in China, which makes these areas an important object of tourism research. According to statistics obtained from the Ministry of Culture and Tourism of the People's Republic of China, there were 259 AAAAA scenic areas at the end of April 2019. In this study, search keywords are entered based on the commonly used names of scenic areas. Since 20 scenic areas, including the Tangwang River in Yichun and Kashgar Old Town, are not included in the Baidu index or were included only at a later date, 239 scenic areas are studied. The Baidu index from 1 January to 31 December 2018 is collected from the Baidu index platform with the scenic area names as search keywords and 365 cities (including county-level cities) as search regions. After processing, the index data are assigned to the cities where the scenic areas are located and converted into a city-level search volume data set, which provides the basic research data.
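The conversion from scenic-area search volumes to city-level flows can be illustrated with a minimal sketch. It is not the authors' processing pipeline: the record layout, the `host_city` lookup, the function name `build_flow_matrix`, the decision to drop searches made from the host city itself, and all numbers are assumptions introduced only for illustration.

```python
from collections import defaultdict

# Hypothetical toy records: (search city, scenic area, annual Baidu index).
# The values are invented and do not come from the paper.
records = [
    ("Shanghai", "Palace Museum", 120.0),
    ("Shanghai", "West Lake", 300.0),
    ("Beijing", "West Lake", 150.0),
    ("Beijing", "Palace Museum", 500.0),
    ("Wuhan", "Palace Museum", 80.0),
]

# Hypothetical lookup from each scenic area to the city where it is located.
host_city = {"Palace Museum": "Beijing", "West Lake": "Hangzhou"}


def build_flow_matrix(records, host_city):
    """Aggregate area-level search volumes into directed city-to-city flows."""
    flows = defaultdict(float)
    for search_city, area, volume in records:
        destination = host_city[area]
        # Assumption: searches issued from the host city itself are not
        # treated as an inter-city flow and are therefore skipped.
        if destination != search_city:
            flows[(search_city, destination)] += volume
    return dict(flows)


flows = build_flow_matrix(records, host_city)
for (origin, destination), volume in sorted(flows.items()):
    print(f"{origin} -> {destination}: {volume}")
```

Each key of `flows` is an (origin, destination) pair, so the result can be read as a sparse origin-destination matrix in which the origin is the searching city and the destination is the city hosting the scenic area.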

Annual Seasonal Concentration Index
The annual seasonal concentration index is an important indicator of the seasonal distribution of tourism. It is mainly used for quantitative studies of the temporal concentration of tourism flows. It is calculated as follows:

$$R = \sqrt{\frac{\sum_{i=1}^{12} (X_i - 8.33)^2}{12}} \qquad (1)$$

where $R$ is the annual seasonal concentration index, $X_i$ is the proportion (in percent) of the tourism flows in month $i$ relative to those in the whole year, and 8.33 is the proportion (in percent) that each month would account for if the annual Baidu index were distributed evenly over the 12 months. The larger the value of $R$, the greater the differences between seasons: the distribution is more concentrated in a certain season, and the gap between the low and peak seasons is wider. If the value approaches zero, the seasonal distribution is relatively uniform, with few differences among seasons [45].
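As a hedged illustration of the annual seasonal concentration index, the sketch below assumes the commonly used form R = sqrt(sum((X_i - 8.33)^2) / 12), with X_i the monthly share in percent. The function name `seasonal_concentration` and the monthly values are hypothetical.

```python
import math


def seasonal_concentration(monthly_values):
    """Annual seasonal concentration index R for 12 monthly totals.

    Assumed form: R = sqrt(sum((x_i - 8.33)^2) / 12), where x_i is the share
    of month i in the annual total, expressed in percent.
    """
    if len(monthly_values) != 12:
        raise ValueError("exactly 12 monthly values are required")
    total = sum(monthly_values)
    if total <= 0:
        raise ValueError("the annual total must be positive")
    shares = [100.0 * value / total for value in monthly_values]
    return math.sqrt(sum((x - 8.33) ** 2 for x in shares) / 12)


# Invented monthly search volumes: one flat year and one with a single peak.
uniform = seasonal_concentration([100] * 12)
peaked = seasonal_concentration([100] * 11 + [400])
print(round(uniform, 3), round(peaked, 3))
```

The flat year gives a value close to zero, while the year with a single peak month gives a much larger value, which matches the interpretation of R given in the text.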

Social Network Analysis
SNA plays a key role in economics, management, and sociology [46]. A social network is defined as a finite set or sets of actors and the relation or relations defined on them [47]. This method has become a powerful means of analysing the allocation of resources and information among members [48,49]. The network density, centrality, centralization, core-periphery structure model and cohesive subgroups were selected to study the whole network, and centrality and structural holes were selected to examine the node structure. This study uses UCINET 6 for the social network analysis. UCINET 6 for Windows is a software package for the analysis of social network data based on the nodes and ties in a network. In this study, the nodes are cities, and the ties are the tourism information flows represented by the Baidu index. Network density, centrality, centralization, core-periphery structure, cohesive subgroups and structural holes are all calculated with UCINET 6.
(1) Network density The network density reveals the tightness of the network structure and is calculated as the ratio of the number of actual connections to the total possible number of connections in the tourism network. A density approaching 1 indicates strong coordination between groups. In a directed graph, the equation is [46]:

$$D = \frac{m}{n(n-1)} \qquad (2)$$

where $D$ is the network density, $m$ denotes the number of arrows (directed ties) in the network, $n$ denotes the number of nodes in the network, and $n(n-1)$ is the theoretical maximum number of connections.
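The density of a directed network, as described above, is the number of arrows divided by n(n-1). A minimal sketch follows; the function name `network_density` and the toy adjacency matrix are hypothetical and unrelated to the study data.

```python
def network_density(adjacency):
    """Density of a directed binary network given as an n x n 0/1 matrix."""
    n = len(adjacency)
    if n < 2:
        raise ValueError("at least two nodes are required")
    # Count arrows, ignoring the diagonal (self-ties are not meaningful here).
    arrows = sum(
        1 for i in range(n) for j in range(n) if i != j and adjacency[i][j]
    )
    return arrows / (n * (n - 1))


# Toy network with 4 nodes and 4 arrows.
toy = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
density = network_density(toy)
print(round(density, 3))
```

With four nodes there are 12 possible arrows, so four observed arrows give a density of one third.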
(2) Centrality The centrality indicates the position or power of an individual or organization within a social network, and centrality analysis mainly involves the degree centrality, closeness centrality, and betweenness centrality [46].
a. Degree centrality The degree centrality of a node refers to the number of nodes directly connected to this node. The equation is as follows:

$$C_D(i) = \sum_{j=1}^{n} r_{ij} \qquad (3)$$

where $C_D(i)$ is the degree centrality of tourism node $i$, $n$ is the number of tourism nodes, and $r_{ij}$ denotes the connection between nodes $i$ and $j$. In a directed network, the out-degree centrality and in-degree centrality apply. A node with a higher in-degree has a higher influence (in terms of the choices received), and a node with a higher out-degree is more central (in terms of the choices made).
b. Closeness centrality The closeness centrality is regarded as a measure of the time required for information to spread from node $i$ to all other nodes sequentially, thus reflecting the accessibility of a given node to the other nodes. The closeness centrality of a node can be defined as:

$$C_C(i) = \frac{1}{\sum_{j=1}^{n} d_{ij}} \qquad (4)$$

where $C_C(i)$ denotes the closeness centrality of node $i$, which is the reciprocal of the sum of the shortest distances from node $i$ to all nodes in the network, and $d_{ij}$ is the shortest distance between nodes $i$ and $j$. The more central a node is, the smaller its total distance to all other nodes.
c. Betweenness centrality The betweenness centrality is an index measuring the influence of nodes acting as mediators in the entire network. It quantifies the number of times a node acts as a bridge along the shortest path between two other nodes. If a node lies on the shortest paths of many other pairs of tourism nodes, it exhibits a high betweenness centrality and exerts a greater influence than the other tourism nodes. The betweenness centrality can be expressed as:

$$C_B(i) = \sum_{s \neq i \neq t} \frac{\delta_{st}(i)}{\delta_{st}} \qquad (5)$$

where $C_B(i)$ denotes the betweenness centrality of node $i$, $\delta_{st}$ is the total number of shortest paths from node $s$ to node $t$, and $\delta_{st}(i)$ is the number of those paths passing through node $i$.
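The three centrality measures can be sketched in plain Python for a small directed binary network. This is an illustrative sketch, not a reproduction of UCINET's routines: it uses unnormalized values, assigns a closeness of 0.0 when some node cannot be reached (an assumption), and computes betweenness with Brandes' algorithm. All function names and the toy matrix are hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    """Return (out-degree, in-degree) lists for a directed 0/1 matrix."""
    n = len(adj)
    out_deg = [sum(1 for j in range(n) if j != i and adj[i][j]) for i in range(n)]
    in_deg = [sum(1 for j in range(n) if j != i and adj[j][i]) for i in range(n)]
    return out_deg, in_deg


def closeness_centrality(adj):
    """Reciprocal of the summed shortest distances from each node."""
    n = len(adj)
    result = []
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives unweighted shortest paths
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        # Assumption: closeness is set to 0.0 if any node is unreachable.
        result.append(1.0 / total if len(dist) == n and total else 0.0)
    return result


def betweenness_centrality(adj):
    """Unnormalized betweenness over ordered pairs (Brandes' algorithm)."""
    n = len(adj)
    cb = [0.0] * n
    for s in range(n):
        stack, preds = [], [[] for _ in range(n)]
        sigma = [0] * n  # number of shortest paths from s
        sigma[s] = 1
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in range(n):
                if w == v or not adj[v][w]:
                    continue
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = [0.0] * n
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb


# Toy star: node 0 exchanges information with nodes 1, 2 and 3 in both directions.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
out_deg, in_deg = degree_centrality(star)
closeness = closeness_centrality(star)
betweenness = betweenness_centrality(star)
print(out_deg, in_deg, closeness, betweenness)
```

In the toy star, the hub has the highest value on all three measures, because every shortest path between two peripheral nodes passes through it.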
(3) Centralization Freeman's centralization for a given centrality index measures how central the most central vertex is relative to how central all the other vertices are with respect to that index [50,51]. This parameter indicates the degree to which the network is centralized around a few nodes. It is determined via quantitative analysis of power within the group and thus reflects the overall balance and consistency of the network. It includes the degree centralization, betweenness centralization and closeness centralization.
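A minimal sketch of Freeman's degree centralization is given below. It assumes the textbook normalization for a directed network without self-ties, in which the summed differences from the maximum are divided by (n-1)^2; the exact convention used by UCINET is not stated in the text and should be checked against its documentation. The function name is hypothetical.

```python
def degree_centralization(degrees):
    """Freeman degree centralization from a list of in- or out-degrees.

    Assumption: the network is directed and has no self-ties, so the largest
    possible sum of differences is (n - 1)^2 (one node tied to all others,
    every other node with degree zero).
    """
    n = len(degrees)
    if n < 2:
        raise ValueError("at least two nodes are required")
    c_max = max(degrees)
    return sum(c_max - c for c in degrees) / ((n - 1) ** 2)


# One node sends ties to all three others: fully centralized out-degree.
star_out = degree_centralization([3, 0, 0, 0])
# Every node sends exactly one tie: no centralization at all.
even_out = degree_centralization([1, 1, 1, 1])
print(star_out, even_out)
```

Multiplying the result by 100 gives a percentage comparable in form to the centralization values usually reported by network software.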
(4) Core-periphery structure model In the core-periphery structure model, nodes are divided into core and peripheral areas according to the closeness among nodes in the network, with the nodes in core areas occupying more important positions [52]. Analysis of the core-periphery structure model aims to study which nodes occur at the core and which nodes occur at the periphery. The concepts of the core and peripheral areas are relative since they are determined based on the relative density of the relationship among nodes in the network.
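The idea of relative density in a core-periphery structure can be illustrated as follows. The sketch does not search for the best partition, which is what the software's fitting routine does; it only computes the block densities for a partition that is already given. The function names, the toy matrix and the chosen core set are hypothetical.

```python
def block_density(adj, rows, cols):
    """Share of possible directed ties from `rows` to `cols` that are present."""
    pairs = [(i, j) for i in rows for j in cols if i != j]
    if not pairs:
        return 0.0
    return sum(1 for i, j in pairs if adj[i][j]) / len(pairs)


def core_periphery_densities(adj, core):
    """Densities of the four blocks induced by a given core set."""
    core = sorted(set(core))
    periphery = [i for i in range(len(adj)) if i not in core]
    return {
        "core-core": block_density(adj, core, core),
        "core-periphery": block_density(adj, core, periphery),
        "periphery-core": block_density(adj, periphery, core),
        "periphery-periphery": block_density(adj, periphery, periphery),
    }


# Toy network: nodes 0 and 1 form the core; nodes 2, 3 and 4 only point to it.
toy = [
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
densities = core_periphery_densities(toy, core=[0, 1])
print(densities)
```

A dense core-core block combined with a sparse periphery-periphery block is the pattern that a core-periphery model looks for.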
(5) Cohesive subgroups Cohesive subgroups have always represented an important construct for sociologists who study individuals and organizations [53]. Cohesive subgroups constitute a typical analysis method of social network substructures, and this approach simplifies the complex network structure and enables researchers to identify substructures and their relationships within the network. If certain nodes in the network are close, they are combined into a subgroup, namely, a cohesive subgroup. The nodes within this group exhibit a relatively strong, direct and close relationship.
(6) Structural holes Broken connections commonly occur between nodes in a network, which are referred to as structural holes [54]. For example, node A is connected to nodes B and C in a tourism network structure, but B is not connected to node C. Thus, there exists a structural hole between nodes B and C, and node A occurs in this structural hole. Generally, indices such as effective size, efficiency and constraint are adopted to measure structural holes.
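The A, B, C example above can be worked through numerically. The sketch below assumes an undirected binary network and uses Burt's constraint together with the simplified effective-size formula for binary ties (number of alters minus the average number of ties each alter has to other alters); the network in this study is directed, so the software's treatment may differ. The function name and the two toy networks are hypothetical.

```python
def structural_hole_measures(neighbors, ego):
    """Effective size, efficiency and constraint of `ego`.

    `neighbors` maps each node to the set of nodes it is tied to
    (undirected, binary ties; an assumption made for this sketch).
    """
    alters = neighbors[ego]
    n = len(alters)
    if n == 0:
        return {"effective_size": 0.0, "efficiency": 0.0, "constraint": 0.0}

    # Ties among the alters themselves, each undirected tie counted once.
    ties = sum(1 for a in alters for b in alters if a < b and b in neighbors[a])
    effective_size = n - 2.0 * ties / n
    efficiency = effective_size / n

    def p(i, j):
        # Proportion of i's ties invested in j (equal weights for binary ties).
        return 1.0 / len(neighbors[i]) if j in neighbors[i] else 0.0

    constraint = 0.0
    for j in alters:
        indirect = sum(p(ego, q) * p(q, j) for q in alters if q != j)
        constraint += (p(ego, j) + indirect) ** 2
    return {
        "effective_size": effective_size,
        "efficiency": efficiency,
        "constraint": constraint,
    }


# A is tied to B and C, and B and C are not tied: a structural hole exists.
open_triad = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
# The same triad with the B-C tie added: the hole is closed.
closed_triad = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}

with_hole = structural_hole_measures(open_triad, "A")
without_hole = structural_hole_measures(closed_triad, "A")
print(with_hole, without_hole)
```

Closing the hole lowers the effective size and efficiency of A and raises its constraint, which is the direction of change used in the text to interpret these indices.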

Spatial Distribution Characteristics of the Tourism Flows
Based on the information flow perspective, this study analyses the spatial distribution characteristics of the tourism flows in China from three aspects: first, the spatial distribution of the Baidu index search volume in each city is considered, which reflects the level of the tourism demand in different cities to a certain extent. Second, the Baidu index search volume of each scenic spot is determined, i.e., the spatial distribution of the online attention regarding certain scenic spots, which reasonably reflects the overall tourism supply level. Third, the strength of the tourism flow connection between cities is evaluated. The maps generated in this study are divided into five levels for analysis purposes based on the Jenks method.
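The Jenks classification mentioned above can be illustrated with a small sketch. It assumes the usual formulation, in which class boundaries are chosen to minimize the total within-class sum of squared deviations, and solves it exactly by dynamic programming; mapping software may use a different implementation. The function name `jenks_upper_bounds` and the toy values are hypothetical.

```python
def jenks_upper_bounds(values, k):
    """Upper bound of each of k classes minimizing within-class variance."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums allow the squared deviation of any slice in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # the last class is data[i:j]
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i

    # Walk back through the stored split points to recover the class bounds.
    bounds, j = [], n
    for c in range(k, 0, -1):
        bounds.append(data[j - 1])
        j = start[c][j]
    return bounds[::-1]


# Invented values forming three obvious groups.
breaks = jenks_upper_bounds([1, 2, 3, 10, 11, 12, 50, 51, 52], k=3)
print(breaks)
```

The cubic-time loop is adequate for a few hundred cities or scenic areas; larger data sets would call for an optimized library routine.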

Spatial Distribution of the Baidu Index Search Volume at the City Level
The daily search volume pertaining to 364 cities was summed to obtain the total Baidu index of each city in 2018 (Figure 1). With the Heihe-Tengchong Line as the boundary, a clear distribution pattern is observed, with a large search volume east of the line and a small search volume west of the line. In the vast area west of the line, only certain provincial capitals attain larger search volumes, such as Urumqi, the capital of Xinjiang, Xining, the capital of Qinghai, Lanzhou, the capital of Gansu, Yinchuan, the capital of Ningxia, and Hohhot, the capital of Inner Mongolia. To the east of the line, cities with a high level of economic development exhibit a larger search volume, mainly Beijing, Shanghai, Hangzhou, Chengdu, Shenzhen, Guangzhou, Suzhou, Zhengzhou, Nanjing, Xi'an, Wuhan, Chongqing, Tianjin and Changsha, as do certain urban agglomerations, such as the Beijing-Tianjin-Hebei, Yangtze River Delta, Pearl River Delta, Chengdu-Chongqing and Shandong Peninsula urban agglomerations. However, since the residents of Taiwan, Hong Kong and Macau do not often use the Baidu search engine, the total search volume is relatively small in these areas.
With the use of Equation (1), the annual seasonal concentration index of each city is calculated (Figure 2). There are large differences among the various cities. The lowest index value is 0.51 in Ningbo, and the highest index value is 6.41 in Ledong. The overall value of the seasonal concentration index is relatively high, with the values of 109 cities ranging from 1.64 to 2.10, those of 69 cities ranging from 2.11 to 2.86 and those of 22 cities ranging from 2.87 to 6.41. Notable differences are observed between the two sides of the Heihe-Tengchong Line. The value of the seasonal concentration index of most cities in the west is high, and the value in the east varies between 1.23 and 2.86.

Spatial Distribution of the Online Attention to Scenic Areas
The total volume of the online attention to scenic areas in 2018 was calculated (Figure 3). The Tu Nationality Park in Huzhu Tu Autonomous County, Qinghai, has the minimum total volume of 27,020, while the Palace Museum has the maximum total volume of 23,681,400. The latter value is 876 times the former value, indicating obvious differences between these scenic areas. Areas with total volumes exceeding 11,889,800 are largely well-known scenic areas, such as the Palace Museum in Beijing, the Huangshan Scenic Area in Huangshan, the Gulangyu Island Scenic Area in Xiamen, the Shaolin Temple in Dengfeng, Mount Tai in Tai'an, and West Lake in Hangzhou. These scenic areas are very popular in China and worldwide. There are only 44 areas with a total volume of online attention larger than 7,499,800, mainly located to the east of the Heihe-Tengchong Line, while 160 areas exhibit a volume smaller than 4,241,600, indicating that online attention is generally at a low level.
The annual seasonal concentration index of each scenic area is calculated with Equation (1) (Figure 4). The minimum value is 0.6 in the Emei Mountain Scenic Area in Leshan, and the maximum value is 6.02 in the Baili Rhododendron Scenic Area in Bijie. Notably, the values of these areas are relatively high overall, and the index of the areas to the north of the Yellow River is larger than that of the areas to the south of the Yangtze River.
This section examines the spatial distribution characteristics and association strength of the tourism information flows and provides the basic analysis of the development of China's tourism information flows. With regard to the annual seasonal concentration index in particular, policy makers and tourism stakeholders need to formulate policies and measures that promote the development of tourism in both the off-season and the peak season according to their regional location.

Research on the Network Spatial Structure of the Tourism Flows
The network spatial structure of the tourism flows in China is analysed by using the SNA method from the information flow perspective. Before index calculation, the data are processed with the binarization method, and all index calculations are then completed in UCINET 6.
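The binarization step can be sketched as follows. The text does not state which cutoff was used, so the choice below (the mean of the off-diagonal flow values) is purely an assumption for illustration, as are the function names and the toy flow matrix.

```python
def mean_off_diagonal(weights):
    """Mean of all off-diagonal entries of a square matrix."""
    n = len(weights)
    values = [weights[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(values) / len(values)


def binarize(weights, cutoff):
    """Set a tie to 1 when the flow exceeds `cutoff`; the diagonal is set to 0."""
    n = len(weights)
    return [
        [1 if i != j and weights[i][j] > cutoff else 0 for j in range(n)]
        for i in range(n)
    ]


# Invented weighted flows between three cities (rows: origin, columns: destination).
toy_flows = [
    [0, 50, 5],
    [20, 0, 0],
    [90, 10, 0],
]
# Assumption: the cutoff is the mean off-diagonal flow; other cutoffs are possible.
cutoff = mean_off_diagonal(toy_flows)
binary = binarize(toy_flows, cutoff)
print(round(cutoff, 2), binary)
```

Because the density and centrality values all depend on which ties survive this step, the cutoff is a parameter worth reporting alongside the results.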

Network Density
A network of 364 nodes can contain at most 132,132 directed connections (364 × 363), while the actual number only reaches 21,188; the network density is therefore 0.160, which is relatively low. This indicates that the overall level of network development is relatively low and that the tourism information association among cities is weak. The reason for this result is that there exist large differences in the geographical distribution of the AAAAA scenic areas in China, which is also related to the seasonality of tourism. In addition, the difference in the distribution of the Baidu index between the two sides of the Heihe-Tengchong Line also contributes to the low network density in China.

Centralization
According to the calculations, the out-degree centralization is 24.528%, and the in-degree centralization is 76.185%. The out-degree centralization is much lower than the in-degree centralization, indicating that the capacity of the different cities to export tourism information to other cities is comparatively balanced. However, a large difference occurs in the capacity to receive information, i.e., tourism destinations are concentrated in a few cities. The tourism flows are unevenly developed in space. Most tourism flows are concentrated in core cities, which is greatly affected by the attractiveness of destinations. The betweenness centralization is just 2.88%, revealing that there exists a large gap in the influence of city nodes, and tourism resources are focused on a few cities.

Core-Periphery Structure Model
The core-periphery structure model is used for analysis purposes, and the final fitness value is 0.766, demonstrating that the model fits well. The calculation result (Figure 6) indicates that the network of the tourism information flows in China exhibits a core-periphery structure. There are 103 nodes in core areas with a relationship density of 0.717 and 261 nodes in peripheral areas with a relationship density of only 0.015, which illustrates that tourism information interaction among cities mostly occurs in core areas and that tourism interaction in peripheral areas is limited. Most of the core areas are located in major urban agglomerations, such as the Beijing-Tianjin-Hebei, Pearl River Delta, Yangtze River Delta, and Shandong Peninsula urban agglomerations. This pattern is consistent with economic development and population mobility in China.

Cohesive Subgroups
Cohesive subgroups reveal the tourism information flow association between cities and reflect the relationship between groups and the environment, among groups and among the individual members of groups. With the factions command in UCINET 6, cohesive subgroups are divided. When the number of factions is set to 6, the final proportion correct is 0.7932, yielding a better effect. Therefore, the network is divided into six sections (Figure 7), and the density of the cohesive subgroups is listed in Table 1. The network density of the fifth subgroup is much higher than that of the other subgroups, and interconnections among the subgroups mainly occur between the fifth subgroup and the other subgroups, indicating the important position of the fifth subgroup. The fifth subgroup primarily includes cities and urban agglomerations with a high development and abundant tourism resources, such as most cities in Shandong, Zhejiang, and Hebei provinces and Beijing, Shanghai, Tianjin, Hohhot, Changchun, Chengdu, Yinchuan, Huangshan, Taiyuan, Foshan and Guangzhou. The distribution of the other subgroups exhibits obvious features: the first subgroup mainly includes provinces such as Inner Mongolia, Ningxia, and Gansu, the second subgroup largely includes provinces such as Heilongjiang, Jilin, and Liaoning, the third subgroup mostly includes provinces such as Sichuan, Yunnan, Guizhou, Guangxi and Guangdong, the fourth subgroup primarily includes Qinghai, Xinjiang, Henan and Anhui provinces, and the sixth subgroup mainly includes Xinjiang, Hunan and Jiangxi provinces.

Centrality of the Nodes
In this study, the degree centrality and betweenness centrality are considered to analyse the node structure of the tourism information flows in China in order to evaluate the role of cities within the network.
(1) Degree Centrality To accurately measure the role of each city in the Chinese tourism information flow network, the in-degree centrality (Figure 8a) and out-degree centrality (Figure 8b) of cities are calculated from the Baidu index data. A large difference is observed between the distribution characteristics of the out-degree centrality and the in-degree centrality. The out-degree centrality exhibits obvious spatial concentration features, while those of the in-degree centrality are not obvious. Because of the long history, complex topography and unique characteristics of the cultural and natural resources in the different regions, a high in-degree centrality is found in many cities, such as Beijing, Wuhan, Leshan, Aba, Tai'an, Zhengzhou, Jiaxing, Weinan, Hainan Tibetan Autonomous Prefecture, Huangshan, Lijiang, Hangzhou, Jinzhong, Lhasa, Suzhou, Xiamen, Chengdu, Xi'an, Xinzhou, Ji'an, Anshun, Yichang, Yanbian, Shiyan, Jiujiang, Nanping, Tongren, Shennongjia and Luoyang. These cities contain abundant tourism resources and tourist attractions, thus attracting more tourists. Cities with an in-degree centrality of 0 are widespread and are observed in every province, which further indicates that high-quality tourism resources in China are mainly concentrated in a few cities, and most cities in China contain relatively poor tourist attractions. As mentioned before, the out-degree centrality exhibits obvious spatial concentration features, with the out-degree centrality on the east side of the Heihe-Tengchong line higher than that on the west side.
Cities such as Beijing, Shanghai, Shenzhen, Guangzhou, Hangzhou, Suzhou, Chengdu, Nanjing, Wuhan, Xi'an, Changsha, Tianjin, Hefei, Shenyang, Jinan, Chongqing, Zhengzhou, Qingdao, Dongguan, and Nanning attain higher out-degree centrality values, which indicates that these cities comprise the core of the network and exert the greatest regional influence, reflecting their high tourism demand and advanced development level.
(2) Betweenness Centrality The calculation result of the betweenness centrality (Figure 9) shows that only 13 cities have values higher than 749, namely, Beijing, Wuhan, Chengdu, Hangzhou, Zhengzhou, Suzhou, Xi'an, Sanya, Guangzhou, Xiamen, Lhasa, Jiaxing and Shanghai, which contain abundant tourism resources and occupy an important position in the urban system of China. Among these cities, Beijing, Shanghai, and Guangzhou are national central cities in China, and Wuhan, Chengdu, Zhengzhou, and Xi'an have become national central cities whose development is clearly supported by the Chinese government, while Hangzhou and Lhasa are provincial capital cities. Moreover, regional development is prioritized in Suzhou, Sanya, Xiamen, and Jiaxing. These cities exert a strong influence on the Chinese tourism information flow network. Betweenness centrality values ranging from 289.25 to 749 are observed in 17 cities, including Weinan, Nanchang, Xining, Tai'an, Luoyang, Jinzhong, Wuxi, Chongqing, Leshan, Kunming, Yichang, Nanjing, Jiujiang, Qingdao, Jinan, Baoding, and Ji'an, which play an important role in the regional development of the economy and transportation. The betweenness centrality values of 34 cities vary between 82.05 and 289.24, and the values of the remaining 300 cities are all lower than 82.04.
In general, there are only a few cities with high betweenness centrality values due to their rich tourism resources, well-developed transportation system, advanced eco-social development level and important role in the Chinese urban system, thus exerting a superior tourism resource influence.

Structural Holes
This study considers effective size, efficiency and constraint to measure the structural holes of the network (Figure 10). Cities such as Beijing, Lhasa, Wuhan, Zhengzhou, Leshan, Hangzhou, Lijiang, Suzhou, Tai'an, Huangshan, Xiamen, Chengdu, Jinzhong, and Xi'an attain high effective size and efficiency values but low constraint values, which indicates that these cities are less affected by other cities. Hence, these cities occupy a favourable position in the entire network, with advantages in terms of their location and influence. Cities such as Dongfang, Shannan, Chengmai, Changjiang, Hotan, Wenchang, Wanning, Qamdo, Jinchang, Lincang and Gannan exhibit low effective size and efficiency values and high constraint values, demonstrating that these cities are more affected and thus strongly depend on other cities. Therefore, these cities must enhance their connection and cooperation levels with other cities. Cities with numerous structural holes are mainly located at core positions with abundant tourism resources and a notable superiority, whereas cities with a few structural holes largely occur in regions with relatively low economic and transportation development levels, such as Hainan, Tibet, Xinjiang, Gansu and Inner Mongolia.
Social network analysis plays a key role in this research, and the results can provide some guidance for China's tourism development. The tourism connection of urban agglomerations is significantly higher than that of other regions. Therefore, when formulating regional development plans and urban agglomeration development plans, policy makers should fully consider the construction of tourism infrastructure and the driving effect of tourism development on regional development, and should treat tourism as an important means of promoting the development of urban agglomerations. At the same time, tourism stakeholders should focus on building the tourism image of the core city and on forming characteristic tourism resources through infrastructure construction and brand publicity. On this basis, the comprehensive development of regional tourism from point to area can be realized.

Conclusions
Based on the relevant theories and methods of traditional research [5,28], this paper makes full use of the advantages of big data in tourism research [6,7,34,35], and studies the information flow network of China's tourism through the Baidu index. Compared with existing research, this paper provides a perspective for the study of large-scale tourism flows using relatively easy data sources. At the same time, based on the perspective of geography, it can also provide some reference for the research methods of tourism geography research. Via the construction of the tourism information flow network among cities, the spatial structure characteristics of tourism information flows are analysed, and the conclusions are as follows: (1) There are apparent geographical and seasonal differences in the number of Baidu searches among cities. The number of searches to the east of the Heihe-Tengchong line is much larger than that to the west of the line. In the vast area on the west side of the Heihe-Tengchong line, only some provincial capital cities attain a larger search volume, while on the east side, most cities attain larger search volumes, which indicates a trend of high-value agglomeration. The annual seasonal concentration index of cities is relatively high in general, but notable spatial heterogeneity is observed. On the whole, the cities on the west side of the Heihe-Tengchong line attain a higher index than that of the cities on the east side. This shows that China's tourism pattern is consistent with the spatial pattern of economic and social development. Policy makers and tourism stakeholders need to formulate policies and measures to promote the development of tourism in the off-season and peak season according to their regional location. The online attention to the various scenic areas greatly differs. 
Scenic areas with distinctive features and attractive tourism resources, such as the Palace Museum, the Huangshan Scenic Area in Huangshan and the Gulangyu Island Scenic Area in Xiamen, receive more online attention than do the other scenic areas. In summary, the cities on the east side of the Heihe-Tengchong line attract more online attention. In addition, the seasonal concentration index of scenic areas is relatively high overall, with the index of the scenic areas to the north of the Yellow River higher than that of the scenic areas to the south of the Yangtze River, revealing large differences among the various regions. This shows that scenic areas, especially those located to the north of the Yellow River, should explore their own characteristics in depth, develop tourism products adapted to different seasons, and improve their level of consolidation and development. (2) The difference in the association strength of the tourism information flows among cities is very obvious. Areas with a high association strength are mainly distributed between the major cities of urban agglomerations (e.g., Beijing-Qingdao, Beijing-Guangzhou, Beijing-Shanghai and Chengdu-Xi'an) and between the cities within urban agglomerations (e.g., Shanghai-Hangzhou, Nanjing-Suzhou, Guangzhou-Foshan, Chengdu-Leshan and Luoyang-Zhengzhou). Almost all these cities are located on the southeast side of the Heihe-Tengchong line. (3) Through analysis of the network density, network centrality, cohesive subgroups and core-periphery structure, the whole network structure of the tourism information flows in China is studied in detail.
The results demonstrate that the tourism information flow network of China exhibits the following characteristics: network development is relatively low, since the network density only reaches 0.16, indicating that the overall connection of the network is poor, which is directly related to the distribution of scenic areas and the very large difference in economic development between the two sides of the Heihe-Tengchong line; there is an obvious imbalance between tourism supply and demand, namely, the out-degree centrality is much lower than the in-degree centrality, which indicates that tourism demand is relatively balanced, whereas most tourism destinations are concentrated in a few cities, and the low betweenness centrality further demonstrates that most tourism resources are located in a few cities; the network exhibits a distinct core-periphery structure, in which most of the tourism information flows occur in the core areas, while the flows among the peripheral areas are weak; and the network is divided into six cohesive subgroups by cohesive subgroup analysis, of which the sixth subgroup, which includes the major urban agglomerations and central cities in China, occupies the most important position in the tourism information network. Policy makers and tourism stakeholders should fully understand the characteristics of the tourism information flow network, consider regional tourism supply and demand, formulate reasonable development plans, and promote the high-level development of regional tourism, the economy and society. 
(4) Further analysis of centrality and structural holes indicates that the city nodes have the following characteristics: the in-degree centrality does not exhibit obvious spatial agglomeration, indicating that the tourism resources in China are scattered, although cities such as Beijing, Wuhan, Leshan, Aba, Tai'an and Zhengzhou attain obvious advantages in regard to their tourism resources; the out-degree centrality exhibits obvious spatial agglomeration, demonstrating that the tourism demand in China is unbalanced, for example, cities such as Beijing, Shanghai, Shenzhen, Guangzhou, Hangzhou, Suzhou and Chengdu occur at the core of the network with a highly notable influence, and they exhibit a high tourism demand because of their economic advantages; tourism information flow network resources are mainly located in a few cities, including Beijing, Wuhan, Chengdu, Hangzhou, Zhengzhou, Suzhou, Xi'an, Sanya, Guangzhou, Xiamen, Lhasa, Jiaxing, Shanghai, Weinan, Nanchang, Xining, Tai'an, Luoyang, Jinzhong, Wuxi, Chongqing, Leshan, Kunming, Yichang, Nanjing, Jiujiang, Qingdao, Jinan, Baoding and Ji'an, because these cities possess abundant tourism resources, an advanced transportation system, a high development level and an important role in the Chinese urban system; cities such as Beijing, Lhasa, Wuhan, Zhengzhou, Leshan, Hangzhou, Lijiang, Suzhou, Tai'an, Huangshan, Xiamen, Chengdu, Jinzhong and Xi'an occupy a beneficial position in the tourism network, and they exert more influence. However, cities such as Dongfang, Shannan, Chengmai, Changjiang, Hotan, Wenchang, Wanning, Qamdo, Jinchang, Lincang and Gannan are constrained by other cities because they depend on them. Therefore, these cities need to strengthen their connections and cooperation with other cities.
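As an illustration of the whole-network indicators referred to in conclusions (3) and (4), the following minimal Python sketch computes network density, normalized in- and out-degree centrality, and a Freeman-style degree centralization for a small directed network. The city pairs are hypothetical toy data, not the Baidu index flows analysed in this paper, and the function names are assumptions introduced only for this example; the figures reported above (e.g., the density of 0.16) were not produced with this code.

```python
from collections import Counter

# Hypothetical toy flows (origin city -> destination city); NOT data from the study.
EDGES = [
    ("Shanghai", "Beijing"),
    ("Guangzhou", "Beijing"),
    ("Shenzhen", "Beijing"),
    ("Beijing", "Wuhan"),
    ("Shanghai", "Wuhan"),
    ("Guangzhou", "Wuhan"),
]


def directed_density(edges):
    """Share of possible directed ties that are present: m / (n * (n - 1))."""
    unique = set(edges)
    nodes = {city for edge in unique for city in edge}
    n = len(nodes)
    return len(unique) / (n * (n - 1))


def degree_centrality(edges):
    """Return (in, out) degree centrality per city, each normalized by n - 1."""
    unique = set(edges)
    nodes = {city for edge in unique for city in edge}
    n = len(nodes)
    in_deg = Counter(dst for _, dst in unique)   # searches received (supply side)
    out_deg = Counter(src for src, _ in unique)  # searches sent (demand side)
    in_c = {city: in_deg[city] / (n - 1) for city in nodes}
    out_c = {city: out_deg[city] / (n - 1) for city in nodes}
    return in_c, out_c


def centralization(centrality):
    """Freeman-style centralization for normalized directed degree scores.

    Sums each node's gap to the most central node and divides by n - 1,
    the largest possible sum (reached by a star-shaped network).
    """
    n = len(centrality)
    top = max(centrality.values())
    return sum(top - value for value in centrality.values()) / (n - 1)


density = directed_density(EDGES)
in_c, out_c = degree_centrality(EDGES)
in_cz = centralization(in_c)
out_cz = centralization(out_c)

print("density:", density)
print("in-degree centralization:", in_cz)
print("out-degree centralization:", out_cz)
```

In this toy network the in-degree centralization exceeds the out-degree centralization, which is the kind of pattern described above: destinations (in-degree) are concentrated in a few cities, while origins (out-degree) are spread more evenly.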
Our study has a number of limitations. First, since scenic areas with few tourist attractions are included in the Baidu index to a lesser extent, the study constructs the urban tourism information flow network mostly on the basis of AAAAA scenic areas. As more information is incorporated into the Baidu index, future research should investigate more types of scenic area comprehensively. Second, the spatial pattern of the tourism information flow network is examined using 2018 data only, so the results may have certain deficiencies. Future in-depth research should address the spatiotemporal development of the network and its underlying mechanism.
Author Contributions: Conceptualization, methodology, software, formal analysis, investigation, resources, data curation, writing-original draft, visualization, Yongwei Liu; writing-review and editing, validation, Wang Liao. All authors have read and agreed to the published version of the manuscript.