Intelligence in Tourist Destinations Management: Improved Attention-based Gated Recurrent Unit Model for Accurate Tourist Flow Forecasting

Accurate tourist flow forecasting is an important issue in tourist destinations management. Given the influence of various factors to varying degrees, tourist flow exhibits strong nonlinear characteristics and is difficult to forecast accurately. In this study, a deep learning method, namely, the Gated Recurrent Unit (GRU), is used for the first time for tourist flow forecasting. GRU captures long-term dependencies efficiently. However, GRU's ability to pay attention to the characteristics of sub-windows within different related factors is insufficient. Therefore, this study proposes an improved attention mechanism with a horizontal weighting method based on the importance of related factors. This improved attention mechanism is introduced into the encoding-decoding framework and combined with GRU. A competitive random search is also used to generate the optimal parameter combination at the attention layer. In addition, we validate the application of the web search index and climate comfort in prediction. This study utilizes the tourist flow of the famous Huangshan Scenic Area in China as the research subject. Experimental results show that, compared with other basic models, the proposed Improved Attention-based Gated Recurrent Unit (IA-GRU) model that includes the web search index and climate comfort has better prediction ability and can provide a more reliable basis for tourist destinations management.


Introduction
Since the 2000s, the tourism industry in China has grown significantly, driven by the rapid development of the Chinese economy. According to statistics, the number of inbound and domestic tourists in China is increasing annually, and the tourism industry is developing rapidly [1]. Especially during peak months, the surge in the number of tourists has brought a series of problems to tourist destinations management, including the unreasonable allocation of resources in tourist attractions and tourist congestion. Therefore, accurate tourist flow forecasting is essential for tourist destinations management.
However, daily tourist flow presents complicated nonlinear characteristics because of the effects of various factors to varying degrees. This complicated nonlinearity makes it difficult for existing methods to address the issue precisely. Moreover, existing methods fail to pay different attention to the sub-window features within different related factors. To solve this problem, this study replaces the vertical weighting method based on time importance with an improved attention mechanism that uses a horizontal weighting method based on factor importance. This improved attention mechanism is trained with a competitive random search (CRS) and then combined with GRU to increase the degree of attention that GRU pays to the various related factors.

Web Search Index and Climate Comfort in Tourist Flow Forecasting
In the study of tourist flow forecasting, the prediction accuracy of a model is affected by the model itself and various factors. Moreover, existing studies have not considered other factors, such as the web search index and climate comfort. Given the rapid development of the internet, people can easily search for information through search engines. At present, scholars believe that web search is an important and advanced way to obtain timely data and useful information [31]. When people use the internet to search for information, their search history reflects the content they are interested in and the behavior that follows. In recent years, web search data have provided a new source of data and an analytical basis for scientific research, as verified by researchers. For example, Choi and Varian have found in their research on Hong Kong's tourist flow forecasting that an autoregressive model including Google search trends can improve prediction accuracy by at least 5% [32]. Prosper and Ryan have found that Google search trends can significantly improve forecasting results in tourist flow forecasting studies on five tourist destinations in the Caribbean [33]. Similarly, Önder and Gunter have found in their study of tourist flow forecasting in major European cities that the Google search index can effectively improve the forecasting effect [34]. In view of the various meteorological factors affecting tourist flow, climate comfort has gradually become a hot spot and a focus of tourism-related research. Li et al. have found in tourism research on Hong Kong and 13 other major cities in China that climate comfort has a significant positive impact on tourist flow in the mainland [35]. Chen et al. have included climate comfort in a model of Huangshan's tourist flow forecasting and obtained better prediction results, indicating that climate comfort is significantly correlated with daily tourist flow [36].
Therefore, in this study, the web search index and climate comfort are used as important factors in tourist flow forecasting.
This study aims to propose a tourist flow forecasting method based on web search index, climate comfort, and Improved Attention-based GRU (IA-GRU) model. The CRS and attention mechanism are used to optimize the attention layer. Then, the selected web search index and climate comfort are added to improve the forecasting effect of IA-GRU model. The results of this study demonstrate the effectiveness of this model. The remaining parts of this paper are organized as follows. Section 2 introduces the basic principles of LSTM, GRU, attention mechanism, and IA-GRU model. Section 3 presents the data-processing methods using Huangshan Scenic Area as the research subject. Section 4 discusses the prediction performance of the proposed model and the comparison with other basic models. Section 5 provides the conclusion.

Methods
In this section, we discuss the structures of LSTM and GRU and explain why GRU is chosen as the prediction model. Then, we outline the attention mechanism and detail the IA-GRU model. In addition, we train the IA-GRU model with a collaborative mechanism that combines the attention mechanism with CRS.

LSTM (GRU's Precursor)
LSTM was first proposed in 1997 [15]. LSTM is an RNN with a special structure, which has the advantage of handling the long-term dependencies of time series. LSTM solves the vanishing gradient problem that arises when an RNN propagates through many layers.
As shown in Figure 1, one tanh layer and three σ layers exist inside LSTM. The three σ layers correspond to three gates, namely, the forget gate, input gate, and output gate. The role of the horizontal line is to pass c_{t−1} to c_t. The three gates and the horizontal line work together to complete the filtering and transmission of input information.
We denote the input time series as x_t, the hidden state cells as s_t, and the output sequence as ŷ. LSTM neural networks perform the computation as follows:

f_t = σ(w_f · [s_{t−1}, x_t] + b_f)
i_t = σ(w_i · [s_{t−1}, x_t] + b_i)
o_t = σ(w_o · [s_{t−1}, x_t] + b_o)
c̃_t = tanh(w_c · [s_{t−1}, x_t] + b_c)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t
s_t = o_t ⊙ tanh(c_t)

where σ and tanh are activation functions applied to the internal structure of LSTM, defined as σ(x) = 1/(1 + e^{−x}) and tanh(x) = (e^x − e^{−x})/(e^x + e^{−x}), wherein σ stands for the standard sigmoid function. The output value of the σ layer is between 0 and 1 and determines whether the input information can pass through the gate: a value of zero signifies "let nothing through," whereas a value of one means "let everything through." f, i, and o respectively denote the inner-cell gates, namely, the forget gate, input gate, and output gate, and c denotes the cell activation vector, which has the same size as the hidden vector s.
The w terms denote weight matrices, whereas the b terms denote bias terms. The input gate can determine how incoming vectors x t alter the state of the memory cell. The output gate can allow the memory cell to have an effect on the outputs. Then, the forget gate allows the cell to remember or forget its previous state.
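As a concrete illustration, one LSTM step can be sketched in a few lines of NumPy. The parameter names (w_f, b_f, and so on) and the toy dimensions are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, s_prev, c_prev, p):
    """One LSTM step with forget (f), input (i), and output (o) gates.

    p maps illustrative weight names to matrices of shape
    (hidden, hidden + input) and bias vectors of shape (hidden,).
    """
    z = np.concatenate([s_prev, x_t])           # combined input [s_{t-1}, x_t]
    f = sigmoid(p["w_f"] @ z + p["b_f"])        # forget gate: keep/discard c_{t-1}
    i = sigmoid(p["w_i"] @ z + p["b_i"])        # input gate: admit new information
    o = sigmoid(p["w_o"] @ z + p["b_o"])        # output gate: expose the cell state
    c_tilde = np.tanh(p["w_c"] @ z + p["b_c"])  # candidate cell activation
    c_t = f * c_prev + i * c_tilde              # the horizontal line: c_{t-1} -> c_t
    s_t = o * np.tanh(c_t)                      # new hidden state
    return s_t, c_t

# Toy dimensions: 3 inputs, 2 hidden units, 4 time steps.
rng = np.random.default_rng(0)
p = {k: rng.standard_normal((2, 5)) for k in ("w_f", "w_i", "w_o", "w_c")}
p.update({k: np.zeros(2) for k in ("b_f", "b_i", "b_o", "b_c")})
s, c = np.zeros(2), np.zeros(2)
for x_t in rng.standard_normal((4, 3)):
    s, c = lstm_step(x_t, s, c, p)
```

Because the output gate lies in (0, 1) and tanh lies in (−1, 1), each component of the hidden state stays strictly inside (−1, 1).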


GRU
Given the complex structure of LSTM, the training of LSTM RNNs often takes considerable time. To improve the training speed while still capturing long-term dependencies efficiently, GRU was proposed as an improved model of LSTM in 2014 owing to its simple structure and easy training [16]. The structure of the GRU is shown in Figure 2. Unlike LSTM, GRU has only two gates, namely, the reset gate r and the update gate z. We denote the input time series as x_t, the hidden state cells as s_t, and the output sequence as ŷ. GRU neural networks perform the computation as follows:

r_t = σ(w_r · [s_{t−1}, x_t] + b_r)
z_t = σ(w_z · [s_{t−1}, x_t] + b_z)
s̃_t = tanh(w_s · [r_t ⊙ s_{t−1}, x_t] + b_s)
s_t = (1 − z_t) ⊙ s_{t−1} + z_t ⊙ s̃_t

The reset gate r determines the proportion of the previous output state s_{t−1} that enters the candidate hidden state s̃_t. The candidate hidden state s̃_t is obtained by performing a nonlinear transformation with the tanh activation function on the previous output state s_{t−1} and the current input x_t. The update gate z controls the proportions of the candidate state s̃_t and the previous state s_{t−1} in the final output state s_t.
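The GRU step described above can likewise be sketched in NumPy; the weight names and toy dimensions are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, s_prev, p):
    """One GRU step with reset gate r and update gate z (illustrative weights)."""
    z_in = np.concatenate([s_prev, x_t])
    r = sigmoid(p["w_r"] @ z_in + p["b_r"])   # reset gate
    z = sigmoid(p["w_z"] @ z_in + p["b_z"])   # update gate
    # Candidate state: the reset gate scales the contribution of s_{t-1}.
    s_tilde = np.tanh(p["w_s"] @ np.concatenate([r * s_prev, x_t]) + p["b_s"])
    # The update gate balances s_{t-1} against the candidate state.
    return (1.0 - z) * s_prev + z * s_tilde

# Toy dimensions: 3 inputs, 2 hidden units, 4 time steps.
rng = np.random.default_rng(1)
p = {k: rng.standard_normal((2, 5)) for k in ("w_r", "w_z", "w_s")}
p.update({k: np.zeros(2) for k in ("b_r", "b_z", "b_s")})
s = np.zeros(2)
for x_t in rng.standard_normal((4, 3)):
    s = gru_step(x_t, s, p)
```

Note that GRU needs three weight matrices per step where LSTM needs four, which reflects the smaller structure and faster training discussed above.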


Attention Mechanism
When observing a scene in the real world, humans typically focus on a certain fixation point at first glance, and their attention is always concentrated on a certain part of the focus. The human visual attention mechanism is thus a model of resource allocation, providing different degrees of attention to different areas. The attention mechanism was proposed by imitating this principle of human visual attention distribution; it is then combined with the encode-decode framework [22] to complete the process of attention change. The focus of this study is to assign different attention weights to different related factors through the attention mechanism and to continuously optimize the weights to improve the prediction effect of the model. The attention mechanism attached to the encode-decode framework is exhibited in Figure 3 (see Section 2.4 for details).

Figure 3 illustrates how the attention mechanism is combined with the encode-decode framework. We attach attention weights W_1, W_2, ..., W_n to input variables x_1, x_2, ..., x_n, transform the attention weights into intermediate semantics C_1, C_2, ..., C_n through encoding, and then transform the intermediate semantics into new attention weights W_1, W_2, ..., W_n through decoding.
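The horizontal weighting central to this study, one attention weight per related factor rather than one per time step, can be sketched as follows. The normalization step and the toy data are assumptions for illustration:

```python
import numpy as np

def weigh_factors(x, w):
    """Horizontal weighting: one attention weight per related factor (column),
    applied at every time step of the input window -- in contrast to the usual
    vertical weighting, which assigns one weight per time step (row)."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()          # normalize so the weights form a distribution (an assumption)
    return x * w             # broadcasting scales column j by W_j

# Toy window: 5 time steps x 3 related factors (illustrative data).
x = np.ones((5, 3))
x_weighted = weigh_factors(x, [1.0, 2.0, 1.0])
```

Every row (time step) receives the same per-factor scaling, so the model's attention is distributed across factors, not across time.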

IA-GRU Model
When the GRU model constructs tourist flow forecasting, the entire training process is continuously learning and memorizing the effects of various related factors on the target value. The proposed IA-GRU model in this study combines the attention mechanism with GRU. On the basis of the ability of GRU to handle time series prediction problems, the IA-GRU model assigns different attention weights to different related factors and continuously optimizes them to improve the learning and generalization abilities to achieve the purpose of improving the prediction effect of the model. Figure 4 consists of two parts. The left part shows the process of GRU modeling and training, whereas the right part shows the process of attention mechanism with CRS to optimize attention weights. In the left part, we obtain the input data. Then, we add an attention layer to GRU on the basis of how human vision processes input information through the attention mechanism. The introduced attention mechanism can quantitatively attach weights to related factors with different importance to avoid distraction. Finally, the weighted input data is sent to the GRU layers to obtain the predicted value. The prediction error is also calculated. Moreover, the predicted errors are regarded as feedback and sent to direct the process of optimizing attention weights. In the right part, the randomly generated attention weights set is binary encoded. Then, the champion attention weights subset is selected according to the error feedback, and the new attention weights are reconstructed. Finally, the updated attention weights set is decoded, whereas the optimal attention weights are sent to the attention layer.
The concrete steps of the modeling and training of GRU, and of the combined attention mechanism and CRS used to optimize the attention weights, are presented as follows:
(1) Modeling and training of GRU
Step 1: n columns of input data are obtained, corresponding to n related factors: x_t = (x_1, x_2, ..., x_n).
Step 2: attention weights are defined: W = (W_1, W_2, ..., W_n).
Step 3: the input data are weighted at the attention layer: x̃_t = (W_1 x_1, W_2 x_2, ..., W_n x_n).
Step 4: x̃_t is sent to the GRU neural networks to acquire the final predicted value.
(2) Combined attention mechanism and CRS optimizing attention weights
Following the genetic algorithm, the purpose of CRS is to generate the optimal parameter combination at the attention layer. The process of CRS is elaborated in Figure 4. CRS comprises four parts, which are introduced as follows. In Figure 4, the attention weights set W is provided in "A" and translated into W^B through binary coding in "B." The subset W_i denotes attention weights and is transferred into the GRU neural networks in the left part, wherein it produces a corresponding loss value according to the predicted error of the networks. Then, the champion attention weights subsets W_i^B and W_j^B are selected according to the loss of W^B in "C," and their subset combinations are traversed repeatedly. Finally, a new attention weight W_k^B is rebuilt in "D." The three random operators in the dotted box illustrate how W_k^B is rebuilt. The concrete steps of CRS are stated as follows:
Step 1: an attention weights set of size M = 36 is randomly generated: W = (W_1, W_2, ..., W_i, ..., W_M).
Step 2: subset W_i is sent to the attention layer, whereas W is binary encoded: W^B = (W_1^B, W_2^B, ..., W_i^B, ..., W_M^B).
Step 3: the prediction error is calculated based on the true value y and the predicted value ŷ from the GRU model: Loss(ŷ(W), y).
Step 4: according to the error feedback, the champion attention weights subsets W_i^B and W_j^B are selected. Each subset comprises binary strings and is evenly divided into n segments, where n is the number of related factors mentioned in the first part of this section. Correspondingly, W_i^B and W_j^B are represented by F_i = (F_i^1, F_i^2, ..., F_i^n) and F_j = (F_j^1, F_j^2, ..., F_j^n), respectively.
Step 5: segments of W_i^B and W_j^B are randomly selected; for instance, segment n−1 of the two, that is, F_i^{n−1} and F_j^{n−1}. However, the number of selected segments is not fixed.
Step 6: a genetic crossover is imitated, and the selected segments are exchanged to construct a new attention weight W_k^B.
Step 7: a genetic mutation is imitated, and the genotype of F_k^{n−1} is reversed; for instance, 0 is reversed to 1. Then, F_k^{n−1} = (0, 0, 0, 1, 1, 0) replaces the corresponding segment of W_k^B.
Step 8: W^B is decoded to acquire the updated attention weights set W.
Step 9: Steps 2-8 are repeated until the preset number of epochs is reached.
The entire process from the selection of input data to the training and optimization of the IA-GRU model is shown in Figure 5 (see Section 3 for the concrete steps of selecting input data).
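A minimal CRS sketch under these steps might look as follows. The stand-in loss function replaces the GRU prediction error, the six-bit segment length follows the example in Step 7, and the decoding rule is an assumption, so this is illustrative rather than the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(42)
N_FACTORS, BITS, M, EPOCHS = 4, 6, 36, 15   # M = 36 and 15 epochs follow the paper

def decode(bits):
    """Map each 6-bit segment of a binary-coded weight to a value in [0, 1]."""
    segs = bits.reshape(N_FACTORS, BITS)
    return segs.dot(2 ** np.arange(BITS)[::-1]) / (2 ** BITS - 1)

# Stand-in loss: distance of the decoded weights to an arbitrary target.
# In the real model, this feedback is the GRU prediction error.
target = np.array([0.8, 0.3, 0.6, 0.1])
def loss(bits):
    return float(np.sum((decode(bits) - target) ** 2))

pop = rng.integers(0, 2, size=(M, N_FACTORS * BITS))  # steps 1-2: random set, binary coded
init_err = min(loss(w) for w in pop)
for _ in range(EPOCHS):                               # step 9: repeat for preset epochs
    errs = np.array([loss(w) for w in pop])           # step 3: error feedback
    i, j = np.argsort(errs)[:2]                       # step 4: champion subsets
    child = pop[i].copy()
    seg = rng.integers(N_FACTORS) * BITS              # step 5: pick a random segment
    child[seg:seg + BITS] = pop[j][seg:seg + BITS]    # step 6: exchange (crossover)
    child[rng.integers(N_FACTORS * BITS)] ^= 1        # step 7: mutate one genotype
    pop[int(np.argmax(errs))] = child                 # rebuild the weights set
final_err = min(loss(w) for w in pop)
best = decode(pop[int(np.argmin([loss(w) for w in pop]))])  # step 8: decode best weights
```

Because the rebuilt child replaces the worst member while the champions survive, the best loss in the population never increases across epochs.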

Data Preparation
This study takes the Huangshan Scenic Area as an example of a famous Chinese scenic spot. Huangshan was listed as one of UNESCO's world natural and cultural heritage sites in 1990. Moreover, Huangshan was selected as one of the first global geoparks in 2004. A total of 3.38 million local and foreign tourists visited Huangshan in 2018.

In this study, we use the daily historical data of the Huangshan Scenic Area from 2014 to 2018 as experimental data. The data include (1) basic data: the past total number of tourists, the target total number of tourists, the number of online bookings, weather, official holidays, and weekends; (2) the Baidu index of keywords: Huangshan, Huangshan weather, Huangshan tourism guide, Huangshan tourism, Huangshan scenic area, Huangshan tickets, Huangshan first-line sky, Anhui Huangshan, Huangshan weather forecast, and Huangshan guide; and (3) climate comfort: composed of the average temperature, average relative humidity, and average wind speed. Dataset (1) was obtained from a research project in cooperation with the Huangshan Scenic Area; dataset (2) was obtained from the Baidu Index Big Data Sharing Platform (http://index.baidu.com/v2/index.html?from=pinzhuan#/); and dataset (3) was obtained from the China Meteorological Data Network (http://data.cma.cn/).

Basic Data
To learn as many data features and tourist flow rules as possible, the data from 2014 to 2017 were selected as the training set and the data from 2018 were selected as the test set.
(1) Total number of tourists in the past
The total number of tourists in the past is selected as one of the related factors of the prediction model, given that the annual and even daily tourist flow shows a certain regularity. The impact of past tourist flow on the current tourist flow may have a lag period. Thus, a correlation analysis was conducted between the past total number of tourists with different lag periods and the target total number of tourists. In this study, the maximum lag period was two years. Based on previous experience, the total number of tourists in the past with a correlation index greater than 0.440 was selected as an input variable. Table 1 shows the correlation analysis results at a confidence level of 0.01. Accordingly, the total numbers of tourists with lag periods of one day, two days, and 365 days were selected as input variables, that is, the total number of tourists yesterday x_1, the total number of tourists the day before yesterday x_2, and the total number of tourists on the same day of the previous year x_3.
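The lag-selection procedure above can be sketched on synthetic data; the 0.440 threshold follows the paper, while the series itself is made up for illustration:

```python
import numpy as np

def lagged_corr(series, lag):
    """Pearson correlation between a daily series and its copy lagged by `lag` days."""
    return float(np.corrcoef(series[lag:], series[:-lag])[0, 1])

# Synthetic two-year daily series with a weekly cycle (illustrative, not Huangshan data).
days = np.arange(730)
tourists = 1000 + 300 * np.sin(2 * np.pi * days / 7) \
    + np.random.default_rng(3).normal(0, 30, days.size)

# Keep only the lags whose correlation exceeds the 0.440 threshold used in the paper.
candidates = {lag: lagged_corr(tourists, lag) for lag in (1, 2, 7, 365)}
selected = [lag for lag, r in candidates.items() if r > 0.440]
```

On this synthetic weekly cycle, the seven-day lag correlates most strongly; on real data, the surviving lags would instead be those reported in Table 1.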
(2) Number of online bookings
Given the internet's rapid growth, its impact on people's lifestyles is increasingly evident, and online operations are becoming ever more convenient. To a great extent, the number of online bookings can reflect the trend of the total number of tourists. Therefore, the number of online bookings was selected as the input variable x_4.
(3) Weather
The weather is a decisive factor for outbound tourism, as it is either favorable or unfavorable for tourists. Therefore, the weather was selected as the input variable x_5 in the form of a dummy variable x_5 ∈ {0, 1}, where 0 represents non-severe weather, such as sunny, cloudy, and drizzle, and 1 represents severe weather, such as moderate rain, heavy rain, moderate snow, heavy snow, and blizzard.
(4) Official holiday
Traveling on holidays is a common phenomenon; whenever a holiday arrives, the number of tourists in famous scenic spots soars. Therefore, the official holiday was selected as input variable x_6 in the form of a dummy variable x_6 ∈ {0, 1}, where 0 indicates an ordinary day and 1 indicates an official holiday.
(5) Weekend
A week comprises seven days: Monday to Friday are weekdays, and Saturday and Sunday are rest days. Going out and traveling on rest days are common. Therefore, the weekend was selected as the input variable x_7 in the form of a dummy variable x_7 ∈ {0, 1}, where 0 represents a working day and 1 represents a rest day.
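The three dummy variables can be encoded with a small helper; the category sets below are hypothetical simplifications of the taxonomy described above:

```python
# Hypothetical category sets -- simplifications of the taxonomy described above.
SEVERE_WEATHER = {"moderate rain", "heavy rain", "moderate snow", "heavy snow", "blizzard"}
WEEKEND_DAYS = {"Saturday", "Sunday"}

def dummies(weather, is_official_holiday, weekday):
    """Return the dummy inputs x5 (weather), x6 (holiday), and x7 (weekend)."""
    x5 = 1 if weather in SEVERE_WEATHER else 0          # severe-weather indicator
    x6 = 1 if is_official_holiday else 0                # official-holiday indicator
    x7 = 1 if weekday in WEEKEND_DAYS else 0            # weekend indicator
    return x5, x6, x7
```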

Baidu Index of Keywords
Baidu is the largest search engine in China and has the most users. When investigating consumer behavior in China, Baidu search has higher predictive power than Google search [37]. This study selects the keywords that tourists commonly use in the Baidu search engine as the key analysis objects. We search for the keyword "Huangshan" in Baidu's keywords-mining tool (http://stool.chinaz.com/baidu/words.aspx) and find the top-10 keywords related to Huangshan in the top-100 rankings, namely, Huangshan, Huangshan weather, Huangshan tourism guide, Huangshan tourism, Huangshan scenic area, Huangshan tickets, Huangshan first-line sky, Anhui Huangshan, Huangshan weather forecast, and Huangshan guide. Considering that Baidu search has a lagging effect on tourist flow, a correlation analysis between the above Baidu indices of keywords with different lag periods and the target total number of tourists is performed. Similarly, the maximum lag period is two years. Accordingly, the Baidu indices of keywords with a correlation index greater than 0.440 are selected as input variables. The analysis results are shown in Tables 2 and 3. As shown in Tables 2 and 3, the Baidu index of each keyword with a lag period of two days has the highest correlation with the actual total number of tourists. Thus, we chose the Baidu indices of Huangshan, Huangshan tourism guide, Anhui Huangshan, and Huangshan guide with a lag period of two days as input variables x_8, x_9, x_10, and x_11, respectively.

Climate Comfort
Climate comfort is an important environmental factor that affects tourists' travel. Therefore, we select climate comfort as the input variable x_12. Climate comfort refers to the climatic condition under which people can maintain normal physiological processes and feel comfortable without any aid against heat or cold [38]. The degree of human comfort is closely related to meteorological conditions and reflects the comprehensive sensation of temperature, humidity, and wind. Climate comfort Q is calculated as a function of the average temperature t (°C), the average relative humidity h (%), and the average wind speed v (m/s), following the formula given in [39].

Building IA-GRU Model
In the IA-GRU model, all parameters except those of the attention layer are learned by the standard backpropagation-through-time algorithm with the mean squared error as the objective function. According to previous experience and the results of repeated experiments, the IA-GRU model has six layers, namely, one attention layer, four GRU layers, and one fully connected layer. The numbers of neurons in the four GRU layers are 128, 64, 64, and 32, respectively. The activation function of the fully connected layer is the Scaled Exponential Linear Units function. The number of training epochs in the GRU layers is 500, and the mini-batch size of the training dataset is 64. According to the results of trial and error, the number of epochs in CRS is 15. With these settings, the IA-GRU model is established.

Results and Discussion
The proposed IA-GRU model is compared with some basic models, such as the Back Propagation Neural Network (BPNN), LSTM, GRU, Attention-LSTM (A-LSTM), and Attention-GRU (A-GRU). The dataset of the IA-GRU model and the basic models includes the basic data x_1-x_7, the Baidu index of keywords (4 keywords) x_8-x_11, and climate comfort x_12. To evaluate the predictive performance of each model, we choose the mean absolute percentage error (MAPE) and the correlation coefficient (R) as the evaluation indicators. MAPE represents the prediction error, and R represents the degree of correlation between the predicted value and the true value. The smaller the MAPE, the smaller the deviation between the predicted value and the true value; the closer R is to 1, the higher the degree of correlation between the predicted value and the true value. The equations are presented as follows:

MAPE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i| / y_i × 100%
R = Σ_{i=1}^{n} (y_i − ȳ)(ŷ_i − ȳ̂) / √[ Σ_{i=1}^{n} (y_i − ȳ)² · Σ_{i=1}^{n} (ŷ_i − ȳ̂)² ]

In the equations, y_i represents the true value, ŷ_i represents the predicted value, and ȳ and ȳ̂ denote the means of the true and predicted values, respectively.
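The two indicators can be computed directly; a minimal sketch with made-up values:

```python
import numpy as np

def mape(y, y_hat):
    """Mean absolute percentage error, in percent (smaller is better)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat) / y) * 100.0)

def corr_r(y, y_hat):
    """Pearson correlation coefficient between true and predicted values."""
    return float(np.corrcoef(y, y_hat)[0, 1])

# Made-up values for illustration only.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
```

Here mape(y_true, y_pred) is 5.0%, the mean of the per-day relative errors 10%, 5%, and 0%.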
In this section, we apply the data x_1-x_12 to the IA-GRU model and the basic models to verify the validity of the IA-GRU model in tourist flow forecasting. The overall experimental results are shown in Table 4, and the daily true and predicted values are shown in Figure 6. The MAPE of the IA-GRU model was lower than that of the other models, and its R was higher, which signifies that the prediction effect of the IA-GRU model was better than that of the abovementioned basic models. With regard to MAPE, IA-GRU was 7.77% lower than BPNN, which had the highest MAPE. With regard to R, IA-GRU was 0.0299 higher than BPNN, which had the lowest R. Furthermore, by comparing IA-GRU with A-GRU, we found that the former had a lower MAPE and a higher R, which indicates that the improved attention mechanism proposed in this study played a significant role. Comparing GRU with LSTM, and A-GRU with A-LSTM, shows that the prediction effect of GRU was better than that of LSTM, although the R of the A-GRU model was lower than that of the A-LSTM model. BPNN was not suitable for forecasting in this setting given that its MAPE was too high at 28.58%. In summary, the IA-GRU model had the best prediction effect. To further verify the prediction effect of the IA-GRU model, the comparative experiments in this study were divided into four categories, that is, the prediction results of different models on datasets 1, 2, 3, and 4. Dataset 1 contains the basic data; dataset 2 contains the basic data and the Baidu index of keywords (4 keywords); dataset 3 contains the basic data and climate comfort; and dataset 4 contains the basic data, the Baidu index of keywords, and climate comfort. Dataset 4 is the dataset mentioned in the previous paragraph.
Tables 5-9 exhibit the experimental results of the different models on datasets 1-4. The keywords in Tables 6 and 7 are added in accordance with the correlation indices in Section 3.3, from high to low. The results show that more keywords make the prediction more accurate. As shown in Tables 5-9, the prediction effect of the IA-GRU model was better than that of the other basic models on each dataset, wherein the predicted values had a higher correlation with the real values. Among the four datasets, the IA-GRU model had the lowest MAPE and the highest R on dataset 4, which signifies that the Baidu index of keywords and climate comfort further improve prediction accuracy.
To further analyze the prediction accuracy of the IA-GRU model, we performed a monthly analysis of the experimental results of the different models on dataset 4, as shown in Tables 10 and 11. As shown in Table 10, the annual average error of the IA-GRU model was lower than that of the basic models, whereas the errors of all models from May to October were lower than their annual average errors. In May, June, and July, the error of the IA-GRU model was lower than that of the basic models. All models exhibited high errors in January, February, March, April, and December. One of the reasons may be that these months fall in the off-peak period; moreover, the actual values were small, which is likely to cause high relative deviations. Overall, the IA-GRU model was relatively stable. In February, April, May, June, July, and November, the IA-GRU model had the lowest error. Although the prediction of the IA-GRU model was not the best in the other months of the year, it was never the worst. For example, its error in January was 8.48% higher than the minimum error but 10.44% lower than the maximum error. The errors in March, September, and October were close to the minimum errors, with gaps of less than 2% in these three months. The performance of the IA-GRU model in December was similar to that in January.
The IA-GRU model showed its largest gap in August, where its error was 2.22% higher than the lowest error; even so, this gap is relatively small. The reason may be a discrepancy in the correlation index of the Baidu index between the training and test sets. Through correlation analysis, as shown in Table 12, we found that the correlation index of the Baidu index is low in the training set but high in the test set, which may bias the feature learning of the Baidu index. As shown in Table 11, the R of the IA-GRU model was the highest in January, February, April, May, and November, and its R from January to December was greater than 0.95, close to its annual average R. Thus, the IA-GRU model was relatively stable compared with the other models. In summary, the proposed IA-GRU model based on the Baidu index and climate comfort can effectively improve the accuracy of tourist flow forecasting. The proposed model is generally better than the other basic models, which proves its effectiveness in tourist flow forecasting.
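The training/test discrepancy described above can be detected by computing the feature-target Pearson correlation separately on each portion of a chronological split. The following sketch uses hypothetical toy values (not the paper's data) in which the search-index feature correlates weakly with flow early in the series but strongly later:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_drift(feature, target, split):
    """Feature-target correlation on the training portion vs. the test portion."""
    r_train = pearson_r(feature[:split], target[:split])
    r_test = pearson_r(feature[split:], target[split:])
    return r_train, r_test

# Hypothetical monthly tourist flow and a search-index feature; first 6 points train, rest test
flow = [10, 30, 20, 40, 15, 25, 50, 60, 55, 65]
baidu = [22, 18, 25, 20, 21, 26, 51, 62, 54, 66]
r_train, r_test = correlation_drift(baidu, flow, split=6)
```

A large gap between `r_train` and `r_test`, as in Table 12, signals that the model learned the feature's importance from a regime that differs from the one it is tested on.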

Conclusions
This study proposes the IA-GRU model trained with CRS for accurate tourist flow forecasting. Tourism is an important part of local, national, and global economies; thus, good predictive models are increasingly valuable in tourist destinations management. First, this study is the first to apply GRU to tourist flow forecasting, adding an attention layer to the GRU neural network. Then, an improved attention mechanism that weights different related factors is proposed. Finally, the improved attention mechanism is combined with GRU, and CRS is used to generate the optimal parameter combination at the attention layer. As a result, the IA-GRU model captures long-term dependencies and increases the attention that GRU pays to the characteristics of sub-windows within different related factors. This study also explores the application of the Baidu index and climate comfort in prediction models. In selecting the Baidu index keywords, Baidu's keyword-mining tools and correlation analysis are used to screen out relevant keywords with a high correlation index. In synthesizing climate comfort, the combined sensation of temperature, humidity, and wind speed from the meteorological field is considered, and the corresponding climate comfort is calculated. This study takes the famous Huangshan Scenic Area as an example to verify the effectiveness of the IA-GRU model with the Baidu index and climate comfort in tourist flow forecasting. The experimental results prove that the IA-GRU model with the Baidu index and climate comfort achieves higher prediction accuracy in tourist flow forecasting than the basic models. Thus, the proposed model can help the administration department manage the scenic area efficiently. Although this study has certain limitations, these point to directions worthy of further study.
For example, future studies can explore a more detailed method of dividing weather dummy variables, a more accurate keyword selection method, and a more accurate climate comfort calculation. In general, the proposed IA-GRU model is highly suitable for tourist flow forecasting. Overall, it provides a significant reference for tourist destinations management and a new perspective for related research.
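For reference, the standard GRU update that underlies the proposed model can be sketched in NumPy as follows. This is an illustrative implementation of the generic GRU gate equations only, not the paper's IA-GRU code; the input size of three (e.g., flow, search index, climate comfort), the hidden size of four, and the small random weights are all assumptions for the example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, p):
    """One standard GRU update with update gate z, reset gate r, and candidate state."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])          # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])          # reset gate
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])  # candidate state
    return (1.0 - z) * h + z * h_tilde                        # new hidden state

# Assumed toy dimensions: 3 input features, hidden size 4; weights drawn at random
rng = np.random.default_rng(0)
p = {k: rng.standard_normal((4, 3)) * 0.1 for k in ("Wz", "Wr", "Wh")}
p.update({k: rng.standard_normal((4, 4)) * 0.1 for k in ("Uz", "Ur", "Uh")})
p.update({k: np.zeros(4) for k in ("bz", "br", "bh")})

h = np.zeros(4)  # initial hidden state
for x in np.array([[0.2, 0.5, 0.1], [0.3, 0.4, 0.2]]):  # two time steps of inputs
    h = gru_step(x, h, p)
```

The update gate z controls how much of the previous hidden state is carried forward, which is how the GRU captures the long-term dependencies emphasized above; the paper's improvement additionally weights the input factors through its attention layer before this recurrence.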