A Methodological Workflow for Deriving the Association of Tourist Destinations Based on Online Travel Reviews: A Case Study of Yunnan Province, China

Insights into the association rules of destinations can help to understand the possibility of tourists visiting a destination after having traveled from another. These insights are crucial for tourism industries to exploit strategies and travel products and offer improved services. Recently, tourism-related, user-generated content (UGC) big data have provided a great opportunity to investigate the travel behavior of tourists on an unparalleled scale. However, existing analyses of the association of destinations or attractions mainly depend on geo-tagged UGC, and only a few have utilized unstructured textual UGC (e.g., online travel reviews) to understand tourist movement patterns. In this study, we derive the association of destinations from online textual travel reviews. A workflow, which includes collecting data from travel service websites, extracting destination sequences from travel reviews, and identifying the frequent association of destinations, is developed to achieve the goal. A case study of Yunnan Province, China is implemented to verify the proposed workflow. The results show that the popular destinations and association of destinations could be identified in Yunnan, demonstrating that unstructured textual online travel reviews can be used to investigate the frequent movement patterns of tourists. Tourism managers can use the findings to optimize travel products and promote destination management.


Introduction
Spatial movement is an essential behavior of tourism activities. Tourist movement involves time, space, place, and scale, which are the basic elements of tourism geography. Tourist travel behavior can potentially imply the popularity of tourist attractions and the correlation among destinations. Moreover, investigating tourist travel behavior can help uncover the intrinsic characteristics of how tourists design their itineraries, thereby helping tourism agencies and industries in planning destination facilities, assessing tourism products, and exploiting tourism resources. Therefore, tourist movement patterns have been an important research topic in tourism geography.
Traditional approaches to investigating tourist movement patterns and destination characteristics usually rely on questionnaires, but collecting such data is costly and time consuming [1]. Moreover, this method is limited in sample size and space-time resolution, which makes it difficult to analyze tourist travel behavior from a comprehensive and broad perspective. Fortunately, with the rapid development of information technology and the internet, numerous social media websites and applications (apps) allow tourists to share their own experiences and feelings (e.g., reviews or comments on a tourist attraction). Such user-generated datasets could provide useful strategies for improving tourism management and digital marketing. At present, most studies have utilized unstructured textual UGC data (e.g., online travel reviews) only to understand tourists' mental responses to travel activities. To the best of our knowledge, their application in investigating spatial multi-destination association has not been explored in tourism research because location information for tourist destinations is difficult to obtain from such data. Based on this research gap, the main aim of this study is to extract the association characteristics of destinations from unstructured textual UGC data and to answer the following research question: how can unstructured textual UGC data be used to quantify the spatial association rules among tourist destinations, as geo-tagged UGC data have been?
Therefore, this study investigates whether unstructured online textual UGC data can be used to understand the association among destinations (frequent rules among destinations), extending the usage of unstructured UGC data from perceiving mental travel experiences to understanding the spatial movement patterns of tourists. Moreover, this study aims to explore a new way to mine the frequent association of destinations from unstructured online travel reviews. First, we develop a crawling program to collect popular destinations and online travel reviews from a public commercial travel service website. Then, a text-matching algorithm is used to identify the destination sequence for each travel review in accordance with the co-occurrence of the destinations. Finally, we describe the main principle of the association rule learning method used to derive the frequent travel patterns from the extracted destination sequence sets. The province of Yunnan in China is used as a case study to demonstrate the feasibility of the proposed method and to gather insights into the travel behavior of tourists and the association of destinations.
The originality and contribution of this study are twofold. First, as a methodological contribution, we develop a workflow that runs from collecting travel reviews to identifying association rules among destinations from the text. Second, an empirical case study is conducted to help understand the spatial association among the main tourist destinations of Yunnan Province in China. The remainder of the paper is organized as follows: Section 2 introduces the study area of Yunnan Province. Section 3 describes the methodological workflow, including dataset collection, extraction of destination sequences, and mining of association rules. The research results are shown in Section 4. Section 5 discusses the main findings. Finally, the conclusion is presented in Section 6.

Study Area
Yunnan Province (capital: Kunming) is located at the southwest border of China (Figure 1). It covers more than 390,000 square kilometers and includes 129 administrative counties. Recently, Yunnan has become one of the top tourist destinations in China because of the following merits: (1) The terrain of Yunnan is a mountainous plateau with an average elevation of 2000 m. Yunnan has many mountains, forests, lakes, and rivers, and thus offers rich natural resources and beautiful scenery; (2) The climate is comfortable, and the annual temperature difference is small; hence, tourists can visit at any time of the year; (3) The province has many historical and cultural resources because it has the most ethnic groups among all provinces of China. Yunnan is home to different ethnic cultures with colorful customs, which attract many tourists. These abundant tourism resources attract tourists worldwide. According to statistics, more than 6.6 million overseas tourists visited the province in 2017, generating more than 3.5 billion dollars in revenue. In addition, more than 560 million domestic tourists traveled to Yunnan in 2017, resulting in a revenue of more than 668.2 billion yuan. In recent years, tourism has become a new driving force for the economic development of Yunnan. Therefore, understanding the association among tourist destinations within the province is important for administrators and tourism agencies to develop strategies that can provide improved services for tourists.

Methodology
In this section, we describe the methodological workflow of the study. First, we develop a crawler program to collect an online review dataset from an open-access tourism website. Second, we extract each tourist's travel destination sequence from the reviews. Finally, we present the main principle of mining association rules among popular tourist destinations. We implement the workflow in the Python programming language.

Data Collection
The online review data used in this study is collected from Ctrip (https://www.ctrip.com/ accessed on 22 April 2021). This website is one of the largest internet platforms for Chinese tourists that provide full-scale services, including the list of attractions in a destination, ticket and hotel bookings, and popular travel route recommendations. Ctrip also allows tourists to leave their comments on the attractions or destinations and upload their travel photos, stories, or reviews, thereby providing a reference for other tourists who intend to travel to the same places. Although in text or photo format, the review usually records the travel experiences of tourists in detail, making it possible to find the destinations that were visited by the reviewer. Moreover, the sample size of online reviews is larger than the traditional questionnaire data. Therefore, reviews can be used to understand the association among destinations.
In this study, we develop a web crawling program to download the online reviews of tourists from Ctrip. We input the keyword "Yunnan" in the homepage of Ctrip to search the Yunnan-related homepage, which provides Yunnan's travel-related services (e.g., transport, attraction, accommodation, shopping, etc.). This study mainly focuses on reviews and popular destinations (Figure 2a). We first search the popular destination list in Yunnan and store the name of each destination into a destination set D. For each travel review, we capture information, such as the title, time (month of the tour), number of days in the tour, and content of the travel note ( Figure 2b). After collecting this information, we generate a review information set R.
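As a minimal sketch of how the collected information might be organized (not the crawler itself), the following Python snippet models the destination set D and the review information set R. The field names `title`, `month`, `days`, and `content`, the `Review` class, and the sample records are hypothetical and chosen only for illustration; English destination names stand in for the names that appear on the website.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Review:
    """One element of the review information set R."""
    title: str    # title of the travel note
    month: int    # month of the tour (1-12), 0 if unknown
    days: int     # number of days in the tour, 0 if unknown
    content: str  # free text of the travel note


def build_review_set(raw_records: List[Dict[str, str]]) -> List[Review]:
    """Convert raw crawled records (hypothetical field names) into R."""
    reviews = []
    for rec in raw_records:
        reviews.append(
            Review(
                title=rec.get("title", "").strip(),
                month=int(rec.get("month", 0) or 0),
                days=int(rec.get("days", 0) or 0),
                content=rec.get("content", "").strip(),
            )
        )
    return reviews


# Destination set D: names taken from the popular destination list (toy subset).
D: Set[str] = {"Lijiang", "Dali", "Kunming"}

# Two made-up raw records standing in for crawled travel notes.
raw = [
    {"title": "Seven days in Yunnan", "month": "7", "days": "7",
     "content": "We flew to Kunming, then took the train to Dali and Lijiang."},
    {"title": "Photo diary", "month": "10", "days": "3", "content": ""},
]
R = build_review_set(raw)
```

Keeping each review as an immutable record makes the later steps (text matching and rule mining) independent of how the pages were downloaded.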

Extracting the Destination Sequences from Online Travel Reviews
This section introduces the process of extracting the destination sequence for each tourist from their online review. The destination set is $D = \{d_1, d_2, \ldots, d_n\}$, where $d_i$ represents the name of the $i$-th destination and $n$ is the number of popular destinations in Yunnan. The review set is $R = \{r_1, r_2, \ldots, r_l\}$ with $r_j = \{t_j, m_j, d_j, c_j\}$, where $l$ is the total number of reviews, and $t_j$, $m_j$, $d_j$, and $c_j$ represent the title, month, days, and content of review $j$, respectively (within $r_j$, $d_j$ denotes the number of days, not a destination). The specific process of extracting destination sequences is illustrated in Figure 3. For example, assume that set D has five popular destinations and that $c_i$ and $c_j$ are the contents of reviews $i$ and $j$ (Figure 3a,b). For each destination, we first apply the text-matching algorithm to count the appearances of that destination in each content, where the figures in the brackets represent the total number of appearances of the corresponding destination in $c_i$ and $c_j$ (Figure 3c). However, tourists may visit destination a and mention other destinations in their reviews. For example, someone may stop to eat special food in destination b on their way to destination a and write about this experience in their review. In this case, b can be considered an affiliated destination. In general, the number of mentions of affiliated destinations in a review is often very small, especially if the tourist does not intend to visit these destinations. Therefore, a threshold parameter is used to filter out destinations with a small number of mentions and mitigate this issue. In Figure 3, destinations with fewer than 2 appearances are excluded to generate the final destination sequences $s_i$ and $s_j$ (Figure 3d). In this manner, we can extract the destination sequence for each travel review in set R.
Sustainability 2021, 13, 4720
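The counting-and-threshold step described above can be sketched as follows. This is an illustrative sketch under simple assumptions, not the authors' implementation: matching is done with plain substring counting (`str.count`), the function name `extract_sequence` and the sample text are hypothetical, and English names are used in place of the names found in the reviews. Substring matching can over-count when one name is contained in another word, so a real pipeline would likely need a more careful matcher.

```python
from typing import Iterable, List


def extract_sequence(content: str,
                     destinations: Iterable[str],
                     min_count: int = 2) -> List[str]:
    """Return the destinations mentioned at least `min_count` times.

    Destinations mentioned fewer than `min_count` times are treated as
    affiliated destinations and dropped. The result is sorted only to make
    the output deterministic; the sequence itself is unordered.
    """
    counts = {d: content.count(d) for d in destinations}
    return sorted(d for d, c in counts.items() if c >= min_count)


# Toy destination set and a made-up review text.
toy_destinations = ["Lijiang", "Dali", "Kunming", "Luguhu"]
toy_content = (
    "Lijiang old town was lovely. We stayed in Lijiang for three nights, "
    "stopped for lunch in Dali, then spent two days at Luguhu. "
    "Luguhu at sunrise is stunning."
)
toy_sequence = extract_sequence(toy_content, toy_destinations, min_count=2)
```

In this toy text, Dali is mentioned once and is therefore filtered out as an affiliated destination, while Lijiang and Luguhu, each mentioned twice, are kept.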

Mining Association Rules of Tourist Destinations
Association rule learning, which is widely used to identify combinations of commodities that are frequently purchased together in transaction databases, can discover interesting relationships among variables in large databases. For example, rule {A, B} ⇒ {C} indicates that if a customer buys products A and B, they are more likely to buy product C. Association rules have already been applied in tourism research to uncover the travel patterns of tourists. Li et al. (2010) incorporated both positive and negative association rules to understand the outbound travel characteristics of Hong Kong residents [44]. Lee et al. (2013) applied clustering and association rules to mine areas of attraction and their associative patterns [45]. Versichele et al. (2014) presented an empirical study on the mining of association rules in tourist attraction visits by using Bluetooth tracking data [46]. Qi and Wong (2014) adopted Apriori-based association rule mining to segment Macau's tourists and to predict tourists' preferences for different local heritage attractions [47]. In addition, the association rule technique can also be utilized to develop a tourism recommendation system [48]. Therefore, it is feasible to apply association rule mining to uncover frequent tourist patterns.
An association rule can be represented as $X \Rightarrow Y$, where $X, Y \subset I$ and $X \cap Y = \emptyset$; $X$ and $Y$ are the left-hand side (LHS) and right-hand side (RHS), respectively, and $I$ represents the item set. The association rule indicates that if item $X$ appears in a transaction, then item $Y$ may appear in the same transaction with a certain probability. In this study, destination set $D$ is the item set, and the transaction database is the extracted destination sequence set $S = \{s_1, s_2, \ldots, s_m\}$, where $s_i \subseteq D$ represents the destination sequence that tourist $i$ has visited during the tour (the destinations in $s_i$ are unordered).
Three indicators are used to compare the effectiveness of association rules, namely, support, confidence, and lift. For a destination association rule $d_i \Rightarrow d_j$, the three indicators can be calculated as follows (written here in their standard form, where $\mathrm{support}(d)$ denotes the fraction of sequences in $S$ that contain $d$):
$$
\begin{aligned}
\mathrm{support}(d_i \Rightarrow d_j) &= \frac{|\{s \in S : \{d_i, d_j\} \subseteq s\}|}{|S|},\\
\mathrm{confidence}(d_i \Rightarrow d_j) &= \frac{\mathrm{support}(d_i \Rightarrow d_j)}{\mathrm{support}(d_i)},\\
\mathrm{lift}(d_i \Rightarrow d_j) &= \frac{\mathrm{support}(d_i \Rightarrow d_j)}{\mathrm{support}(d_i)\,\mathrm{support}(d_j)}.
\end{aligned}
\tag{1}
$$
Equation (1) shows that the support of a rule reflects how frequently the union of LHS and RHS appears in all extracted destination sequences; the confidence of a rule indicates how frequently the union of LHS and RHS appears in the destination sequences that contain LHS; and the lift of a rule is the ratio of the observed support to the support expected if LHS and RHS were independent. Furthermore, support, confidence, and lift quantify the significance, accuracy, and representativeness of the rule, respectively. Figure 4 presents an example of the calculation of the support, confidence, and lift of rule $a \Rightarrow b$. A rule is considered a strong association rule if its support, confidence, and lift are greater than the user-defined minimum thresholds $min_{support}$, $min_{confidence}$, and $min_{lift}$.
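To make the three indicators concrete, the following sketch computes them for a single rule over a small, made-up sequence set. The function names `support` and `rule_metrics` and the toy data are hypothetical; the example is not the one shown in Figure 4, and the formulas are the standard definitions of support, confidence, and lift.

```python
from typing import FrozenSet, Iterable, List, Tuple


def support(itemset: Iterable[str], sequences: List[FrozenSet[str]]) -> float:
    """Fraction of sequences that contain every destination in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for s in sequences if items <= s) / len(sequences)


def rule_metrics(lhs: Iterable[str],
                 rhs: Iterable[str],
                 sequences: List[FrozenSet[str]]) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) of the rule lhs => rhs."""
    lhs_set, rhs_set = frozenset(lhs), frozenset(rhs)
    supp = support(lhs_set | rhs_set, sequences)
    supp_lhs = support(lhs_set, sequences)
    supp_rhs = support(rhs_set, sequences)
    if supp_lhs == 0 or supp_rhs == 0:
        # The rule cannot be evaluated if either side never occurs.
        return supp, 0.0, 0.0
    confidence = supp / supp_lhs
    lift = confidence / supp_rhs
    return supp, confidence, lift


# Five made-up destination sequences over destinations a, b, and c.
toy_S = [
    frozenset({"a", "b"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
    frozenset({"a", "b"}),
]
supp_ab, conf_ab, lift_ab = rule_metrics({"a"}, {"b"}, toy_S)
```

For this toy set, a and b occur together in 3 of 5 sequences (support 0.6), a occurs in 4 of 5, so the confidence of a ⇒ b is 0.75, and because b also occurs in 4 of 5 sequences the lift is about 0.94. A lift below 1 means that a and b co-occur slightly less often than independence would predict, so this rule would not count as a strong rule under a lift threshold of 1.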

General Statistical Analysis
A total of 66 popular destinations and 12,752 travel reviews were collected from Ctrip. We excluded reviews that have no text description (i.e., photos only) because they are inappropriate for extracting destination sequences. A total of 7875 reviews were assessed and included in set R. Combining with set D, R will be used to extract the destination sequences using the process in Section 3.2 to generate the destination sequence set S.
Based on the generated S, a statistical analysis of D is conducted, where the popularity of a destination is defined as $p = n_d / n$; $n_d$ represents the number of sequences in S that contain destination $d$, and $n$ is the total number of sequences ($n = 7875$). We then sort the destinations according to their popularity. Figure 5 shows the spatial distribution and statistical order of destinations according to their popularity. Table 1 lists the top 15 popular destinations. The top five popular destinations are Lijiang, Dali, Kunming, Xianggelila, and Luguhu, which are the main destinations that attract tourists in Yunnan. This result is consistent with Mafengwo, another travel social service platform in China, demonstrating that online reviews can be utilized to understand the characteristics of tourist destinations. We also investigate the temporal characteristics of the destination sequences. Figure 6 displays the percentage of sequences by month, indicating in which months tourists are more likely to visit Yunnan. The sequences are dispersed across all months from January to December. One possible reason is that the climate is comfortable for tourists to travel at any time of the year. July, which falls in the summer vacation in China, has the largest proportion of sequences. Students and parents with children choose to travel to Yunnan during this period. October has the second-largest percentage, which may be due to the Chinese National Day holiday (1-7 October), when a majority of workers take a vacation and plan a trip.
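The popularity measure $p = n_d / n$ can be computed directly from the sequence set. The sketch below is illustrative only: the function name `popularity` and the four toy sequences are hypothetical, and English names are used for readability.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def popularity(sequences: List[Iterable[str]]) -> List[Tuple[str, float]]:
    """Rank destinations by p = n_d / n.

    n_d is the number of sequences containing destination d, and n is the
    total number of sequences. Ties are broken alphabetically.
    """
    n = len(sequences)
    # set(s) ensures a destination is counted at most once per sequence.
    counts = Counter(d for s in sequences for d in set(s))
    return sorted(((d, c / n) for d, c in counts.items()),
                  key=lambda pair: (-pair[1], pair[0]))


# Four made-up destination sequences.
toy_S = [
    {"Lijiang", "Dali"},
    {"Lijiang"},
    {"Kunming", "Lijiang"},
    {"Dali"},
]
ranked = popularity(toy_S)
```

In this toy set, Lijiang appears in 3 of 4 sequences (p = 0.75), Dali in 2 (p = 0.5), and Kunming in 1 (p = 0.25), so the ranking is Lijiang, Dali, Kunming.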
The results show that approximately 90% of tourists spend at most ten days in Yunnan. Tourists who travel for only one day account for more than 20%. This percentage may be problematic because we find that some tourists who stay for only one day have reviewed several destinations. Based on our knowledge of Yunnan, visiting several destinations in the province in one day is impractical; therefore, this finding might contain some errors. The number of tourists who spend 5, 6, or 7 days in the province is higher than that for other trip lengths. We also calculate the distribution of tourists according to the number of days spent and the number of destinations (Figure 8). We exclude the one-day data to reduce errors in the following analysis.
The findings reveal that most people are likely to spend less than 10 days visiting five or fewer destinations in Yunnan. Moreover, more than 36% of tourists are willing to spend several days staying in one destination only. This preference may be attributed to the following reasons: (1) a popular destination usually covers multiple attractive scenic spots, and tourists have to schedule several days to travel to these attractions; and (2) some destinations, such as Lijiang and Dali, are famous for their slow, leisurely pace of life, thereby attracting people who live in large cities to relax and escape from the hustle and bustle of urban life. Accordingly, among the tourists who stay in one destination when visiting Yunnan, more than 55% choose Lijiang or Dali as their destination (Figure 9).
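A cross-tabulation of trip length against the number of destinations, with one-day trips excluded, could be sketched as follows. The function names, the `min_days` parameter, and the toy records are hypothetical, and the choice to also drop reviews with an empty destination sequence is an assumption made for this illustration rather than something stated in the text.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Record = Tuple[int, Set[str]]  # (days in the tour, destination sequence)


def keep_valid(records: Iterable[Record], min_days: int = 2) -> List[Record]:
    """Drop one-day trips and reviews with no extracted destination."""
    return [(d, s) for d, s in records if d >= min_days and len(s) > 0]


def days_by_destinations(records: Iterable[Record], min_days: int = 2) -> Counter:
    """Count tours for each (days, number of destinations) combination."""
    return Counter((d, len(s)) for d, s in keep_valid(records, min_days))


def single_destination_share(records: Iterable[Record], min_days: int = 2) -> float:
    """Share of the kept tours that visit exactly one destination."""
    kept = keep_valid(records, min_days)
    if not kept:
        return 0.0
    return sum(1 for _, s in kept if len(s) == 1) / len(kept)


# Made-up records: the first one is a one-day trip and is excluded.
toy_records = [
    (1, {"Lijiang", "Dali", "Kunming"}),
    (5, {"Lijiang"}),
    (6, {"Lijiang", "Dali"}),
    (7, {"Dali"}),
    (3, {"Lijiang", "Dali", "Kunming"}),
]
toy_table = days_by_destinations(toy_records)
toy_share = single_destination_share(toy_records)
```

With these toy records, four tours remain after filtering and two of them visit a single destination, giving a share of 0.5.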

Association Rules among Tourist Destinations
In this section, the classical Apriori algorithm is utilized to mine the strong association rules among the popular tourist destinations from set S. The 4110 sequences containing two or more destinations are used to perform the association analysis because sequences involving only one destination cannot be used to analyze the relationship among destinations. The minimum confidence is set to $min_{confidence} = 0.6$, and the lift is required to be greater than 1.0 ($min_{lift} = 1.0$). Minimum support is difficult to determine because it depends on the characteristics of the database, and users usually set it through trial and error. Referring to the study of Vu et al. (2017), we set the minimum support $min_{support}$ = 0. The top 31 association rules are selected to discuss the travel behavior of tourists in Yunnan. We classify these rules into groups according to the number of destinations in LHS. Table 2 shows the association rules with one destination in LHS. Several relatively strong association rules are identified between Lijiang and the other five destinations ($r_{1-5}$). Regarding the support of rules, many sequences contain both Dali and Lijiang (support = 50.5%), indicating that more than half of the tourists plan to visit the two destinations during a tour. Regarding confidence, if travelers intend to visit one of the five destinations (Dali, Kunming, Luguhu, Xianggelila, and Deqin), the probability that Lijiang will be included in their tour is high (more than 80% confidence), especially for Luguhu and Xianggelila, with confidences of 98.2% and 93.3%, respectively. One possible explanation for this high confidence is the convenience of the roads connecting Lijiang with Luguhu or Xianggelila and the short distance between them compared with destinations such as Dali or Kunming (Figure 5a). In addition, a strong association exists between Dali and Kunming, indicating that some travelers intend to visit the two destinations in their travel.
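The mining step can be illustrated with a compact, pure-Python version of the level-wise Apriori search. This is a sketch under stated assumptions rather than the implementation used in the study: rules are restricted to a single destination on the RHS (as in the rules reported in Tables 2 to 5), the function names and the toy sequences are hypothetical, and the toy minimum support of 0.3 is chosen only so that the small example yields rules.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Tuple[str, ...], str, float, float, float]


def frequent_itemsets(sequences: List[Set[str]],
                      min_support: float) -> Dict[Itemset, float]:
    """Level-wise Apriori search for itemsets with support >= min_support."""
    n = len(sequences)
    items = {item for s in sequences for item in s}
    current = [frozenset([i]) for i in items]
    frequent: Dict[Itemset, float] = {}
    k = 1
    while current:
        # Keep the size-k candidates whose support reaches the threshold.
        level: Dict[Itemset, float] = {}
        for cand in current:
            supp = sum(1 for s in sequences if cand <= s) / n
            if supp >= min_support:
                level[cand] = supp
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate that has an infrequent k-subset (Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in joined
                   if all(frozenset(sub) in level
                          for sub in combinations(c, k - 1))]
    return frequent


def strong_rules(sequences: List[Set[str]],
                 min_support: float,
                 min_confidence: float = 0.6,
                 min_lift: float = 1.0) -> List[Rule]:
    """Return rules LHS => {y} as (lhs, y, support, confidence, lift)."""
    freq = frequent_itemsets(sequences, min_support)
    rules: List[Rule] = []
    for itemset, supp in freq.items():
        if len(itemset) < 2:
            continue
        for y in itemset:
            lhs = itemset - {y}
            confidence = supp / freq[lhs]  # every subset is also frequent
            lift = confidence / freq[frozenset([y])]
            if confidence >= min_confidence and lift > min_lift:
                rules.append((tuple(sorted(lhs)), y, supp, confidence, lift))
    # Group by LHS size, then order by decreasing support.
    return sorted(rules, key=lambda r: (len(r[0]), -r[2], r[0], r[1]))


# Six made-up sequences, each with two or more destinations.
toy_sequences = [
    {"Lijiang", "Dali"},
    {"Lijiang", "Dali", "Kunming"},
    {"Lijiang", "Luguhu"},
    {"Dali", "Kunming"},
    {"Lijiang", "Dali"},
    {"Lijiang", "Luguhu"},
]
toy_rules = strong_rules(toy_sequences, min_support=0.3)
```

On this toy data, two rules pass the thresholds: Kunming ⇒ Dali (confidence 1.0, lift 1.5) and Luguhu ⇒ Lijiang (confidence 1.0, lift 1.2). The pair Lijiang and Dali is the most frequent (support 0.5), but both of its rules have a lift of 0.9 and are therefore rejected, which illustrates why a lift threshold is applied in addition to support and confidence.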
Table 3 displays the 13 association rules with two destinations in LHS. Dali, Lijiang, and Kunming are frequently included together in a tour (support = 30.2%); that is, if travelers choose to visit any two of the three destinations, they are likely to visit the remaining one (r1-r3). Rules r4-r7 show that travelers will visit Lijiang if they visit Dali and Luguhu/Xianggelila or Luguhu and Kunming/Xianggelila (confidence = 100%). If travelers plan to visit Kunming and Luguhu/Xianggelila, they are more than 70% likely to travel to Dali during their journeys (r8-r9). Meanwhile, travelers who visit Dali and Luguhu/Xianggelila are likely to visit Kunming (r10-r11). Rules r12-r13, which cover Lijiang, Xianggelila, and Deqin, show that Lijiang or Xianggelila is likely to be considered as a stop when travelers plan to visit the other two destinations.
For the association rules with three destinations in LHS (Table 4), eight strong rules are found from the sequences. Travelers have a high probability of visiting Kunming if they plan to visit Lijiang, Dali, and Luguhu/Xianggelila (r1-r2). Dali is likely to be a stop if travelers visit one of two destination combinations (Lijiang, Luguhu, and Kunming; Lijiang, Kunming, and Xianggelila) (r3-r4). In addition, rules r5-r8 demonstrate a very high chance (approximately 100% confidence) that Lijiang will be visited when travelers plan to visit one of four destination combinations (Dali, Luguhu, and Kunming; Dali, Kunming, and Xianggelila; Luguhu, Kunming, and Xianggelila; Dali, Luguhu, and Xianggelila). Moreover, all rules contain Lijiang, indicating that Lijiang is an indispensable destination when travelers intend to visit four or more destinations.
Three association rules have four destinations in LHS and one in the right-hand side (RHS) (Table 5). Only 6.4% of the sequences contain all of the top five popular destinations. Lijiang is the destination that travelers are sure to visit. Kunming (confidence = 75.2%) or Dali (confidence = 69.8%) is likely to be visited if travelers plan to visit the other four destinations (r1-r2).
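The grouping of rules by the number of destinations in LHS can be sketched as follows. This is a simplified, level-wise Apriori-style search written for illustration, not the implementation used in the study. The eight toy sequences and the value min_support = 0.2 are assumptions, while the confidence and lift thresholds follow the text.

```python
from collections import defaultdict
from itertools import combinations


def frequent_itemsets(sequences, min_support):
    """Level-wise search: a k-itemset is counted only if all its (k-1)-subsets are frequent."""
    n = len(sequences)
    current = [frozenset([d]) for d in sorted({d for s in sequences for d in s})]
    frequent, k = {}, 1
    while current:
        level = {}
        for cand in current:
            supp = sum(cand <= s for s in sequences) / n
            if supp >= min_support:
                level[cand] = supp
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets, then prune candidates with an infrequent subset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in joined
                   if all(frozenset(sub) in level for sub in combinations(c, k - 1))]
    return frequent


def single_consequent_rules(frequent, min_confidence, min_lift):
    """Rules with one destination in RHS, grouped by the number of destinations in LHS."""
    rules = defaultdict(list)
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for rhs in itemset:
            lhs = itemset - {rhs}  # frequent by downward closure
            conf = supp / frequent[lhs]
            lift = conf / frequent[frozenset([rhs])]
            if conf >= min_confidence and lift > min_lift:
                rules[len(lhs)].append((tuple(sorted(lhs)), rhs, supp, conf, lift))
    return dict(rules)


# Hypothetical toy sequences for illustration only (NOT the study's data).
TOY_SEQUENCES = [
    frozenset({"Dali", "Lijiang", "Kunming"}),
    frozenset({"Dali", "Lijiang", "Kunming"}),
    frozenset({"Dali", "Lijiang"}),
    frozenset({"Dali", "Lijiang"}),
    frozenset({"Lijiang", "Luguhu"}),
    frozenset({"Lijiang", "Luguhu"}),
    frozenset({"Kunming", "Xianggelila"}),
    frozenset({"Kunming", "Xianggelila"}),
]

# min_support = 0.2 is an assumed value; 0.6 and 1.0 are the thresholds stated in the text.
rules_by_lhs_size = single_consequent_rules(
    frequent_itemsets(TOY_SEQUENCES, min_support=0.2),
    min_confidence=0.6,
    min_lift=1.0,
)
for size in sorted(rules_by_lhs_size):
    for lhs, rhs, supp, conf, lift in rules_by_lhs_size[size]:
        print(size, lhs, "->", rhs, round(supp, 3), round(conf, 3), round(lift, 3))
```

Keying the result by LHS size mirrors the way Tables 2 to 5 are organized, so each group can be inspected separately.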

Discussion
With the rapid development of information technology, user-generated data have grown explosively in various fields. Based on these data, data-driven innovation has become possible and has led to the emergence and development of new products and business models in the digital market [49]. In tourism, various user-generated data have been used to understand tourist travel patterns and the service quality of tourism, which helps exploit new tourism products and improve management efficiency [25,34]. This study follows this line of research and attempts to discover frequent spatial association rules among popular destinations from user-generated textual online reviews.
Based on the analysis of the results, the following characteristics can be drawn. First, based on the support of the association rules, it can be concluded that Lijiang, Dali, Luguhu, Xianggelila, and Kunming are the top five popular destinations in Yunnan. These destinations, especially Lijiang (which is an indispensable destination in Yunnan), are considered by most travelers who have not been to these places. From the perspective of spatial distribution, four of these five popular destinations are located in the northwest of Yunnan, forming the overall pattern of "dense in the west and sparse in the east, dense in the north and sparse in the south", which is consistent with previous research results [50]. In addition, the strong correlation among the five major tourist destinations further indicates a spatial monopoly of important destinations [51]. Second, the distance between destinations and traffic accessibility are key factors that affect travel plans, as illustrated by the rules in Table 2 (i.e., the shorter the distance between the destinations, the higher the confidence of the rules). For example, Lijiang, Luguhu, and Xianggelila are relatively close to one another, and the rules containing these destinations usually have high confidence. Moreover, the traffic conditions from Lijiang to the other two destinations are convenient. Therefore, travelers usually include the three destinations in their journeys. In the context of all-for-one tourism in China, the first task in strengthening the role of other destinations in Yunnan is to break the monopoly of the popular destinations. This goal can be achieved by developing advanced transport facilities and networks that establish branch connections between the popular destinations and their nearby destinations.
Specifically, one main contribution of this study to the existing literature is a methodological workflow for extracting spatial association characteristics among popular destinations from unstructured textual travel reviews. Although this paper takes Yunnan Province, China as a case study to demonstrate the feasibility of the workflow, the proposed method could be extended to other areas. Moreover, it is not limited to a particular spatial scale, which means that the method can be used to analyze the association characteristics of attractions within a city or to quantify the connection of cities within a country. Currently, travel social service websites have become an important part of tourists' travel, from planning their journeys before the tour to updating their comments or experiences after the tour. Therefore, it is very convenient to access this unstructured textual UGC to investigate the travel behavior of tourists. Most previous literature uses this type of data to extract meaningful knowledge from the viewpoint of tourism marketing, e.g., analyzing destination image or evaluating tourists' satisfaction with hotels or tourism products. This study attempts to discover useful geographical knowledge from the unstructured textual reviews. Although it is straightforward to extract the spatial movement of tourists using geo-related tracking datasets such as mobile phone data and social media data (Flickr and Twitter), these datasets are inaccessible to most tourism researchers in China. Furthermore, some social media datasets do not include all of the places that tourists visit during their tours. For example, if someone visits a destination but does not post a message on the social media application, the destination may be missed in the analysis. By contrast, online travel reviews are generally available from public travel service websites and record the destinations and attractions as well as the tourists' experience in detail.
Therefore, this study makes a new attempt to employ unstructured UGC data for understanding the geographical movement patterns of tourists. It demonstrates that although travel reviews carry no explicit location information, they can still be used as a resource to investigate tourist movement patterns, thereby providing a new method to study the spatial movement of tourists and their destination association. The proposed method thus helps enrich data analysis techniques in the field of tourism and infers spatial associative characteristics among popular destinations from textual descriptions generated by visitors.

Conclusions
The association rules of tourist destinations quantify the possibility of tourists visiting a destination when they have traveled to one or more different destinations. Therefore, a deep knowledge of the travel behavior of tourists and the association of destinations can provide insights into how tourists schedule their journeys during the tour, thereby helping managers or industries to take effective measures for improving their services and meeting the demands of tourists. Currently, UGC big data from tourism-related social websites and apps offer great opportunities to examine the movement patterns of tourists from an unprecedented perspective. Based on previous studies that utilized geo-tagged UGC data to understand frequent movement patterns of destinations, we derive the association of destinations using unstructured online textual travel reviews. Yunnan province is used for the case study. We collect travel reviews from the website of a public travel service and propose an extraction process for destination sequences from these reviews. In addition, we identify association rules using the Apriori algorithm. Results show that some popular destinations and frequent association rules among destinations in Yunnan can be uncovered using unstructured textual travel reviews.
Multi-destination tours have become a popular travel mode. Thus, examining the association among destinations helps to grasp the overall characteristics of destinations in a given area. An understanding of the relationship among tourist destinations has potential implications for governmental agencies and the tourism industry. Based on the association rules, tourism administrators could devise corresponding traffic strategies, such as adding extra or special trains between destinations with a high association. The tourism industry could develop new tourist products or routes by integrating customers' time and interests. In addition, online travel service agents could enrich their recommendation systems, for example by recommending the next destinations while visitors are sightseeing at other destinations, according to these association rules. Therefore, extracting the association rules among popular tourist destinations is useful for improving tourist experience and developing a sustainable smart tourism industry.
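The recommendation use case mentioned above could look roughly like the following Python sketch. It is one possible design, not a system described in this study: the Rule type, the recommend function, and the ranking by confidence are hypothetical. The four example rules use the confidence values reported in the results (Tables 2 and 5).

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Tuple


class Rule(NamedTuple):
    lhs: FrozenSet[str]   # destinations already in the itinerary
    rhs: str              # destination suggested by the rule
    confidence: float


def recommend(visited: Iterable[str], rules: Iterable[Rule],
              top_n: int = 3) -> List[Tuple[str, float]]:
    """Suggest unvisited destinations whose rule LHS is covered by the visited set.

    Candidates are ranked by the highest confidence among the matching rules.
    """
    visited_set = set(visited)
    best = {}
    for rule in rules:
        if rule.lhs <= visited_set and rule.rhs not in visited_set:
            best[rule.rhs] = max(best.get(rule.rhs, 0.0), rule.confidence)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Confidence values as reported in the text; the data structure itself is an assumption.
RULES = [
    Rule(frozenset({"Luguhu"}), "Lijiang", 0.982),
    Rule(frozenset({"Xianggelila"}), "Lijiang", 0.933),
    Rule(frozenset({"Lijiang", "Dali", "Luguhu", "Xianggelila"}), "Kunming", 0.752),
    Rule(frozenset({"Lijiang", "Kunming", "Luguhu", "Xianggelila"}), "Dali", 0.698),
]

print(recommend({"Luguhu"}, RULES))
print(recommend({"Lijiang", "Dali", "Luguhu", "Xianggelila"}, RULES))
```

Ranking by confidence alone is the simplest choice. A production recommender would probably also weigh lift, travel distance, and the visitor's remaining time.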
However, one main limitation of this UGC data is the lack of attribute information, such as income, age, and preferences. These attributes can be utilized to reveal the influencing factors of travel behavior. Therefore, further studies can combine the UGC data with traditional survey and geo-tagged UGC data to understand tourist movement patterns and their potential influencing factors.