Article

Exploring Appropriate Search Engine Data for Interval Tourism Demand Forecasting Responding a Public Crisis in Macao: A Combined Bayesian Model

School of Economics and Management, China University of Mining and Technology, Xuzhou 221116, China
*
Author to whom correspondence should be addressed.
Sustainability 2024, 16(16), 6892; https://doi.org/10.3390/su16166892
Submission received: 15 July 2024 / Revised: 1 August 2024 / Accepted: 5 August 2024 / Published: 11 August 2024
(This article belongs to the Special Issue Tourism Industry Recovery after COVID-19)

Abstract

Public crises can bring unprecedented damage to the tourism industry and pose challenges to tourism demand forecasting, which is essential for crisis management and sustainable development. Existing studies have mainly focused on point forecasts, which may be insufficient in the uncertain environment of a public crisis. This study proposes a combined Bayesian interval forecasting model for tourism demand based on a forgetting curve. Moreover, because tourists may adjust their travel plans as a crisis evolves, the choice of search engine data for forecasting tourism demand is investigated and incorporated into the proposed model to yield reliable results. An empirical study shows that the Baidu Index had better predictive capability for tourism demand before the public crisis, whereas the Google Index effectively captured short-term fluctuations of tourism demand during the crisis period. The results also indicate that integrating both Baidu and Google Index data yields the best prediction performance after the crisis outbreak. The main contribution of this study is that it generates flexible forecasts in interval form, which can effectively handle uncertainties in practice and help practitioners formulate control measures. Another novelty is showing how to select appropriate search engine data to improve tourism demand forecasts across different stages of a public crisis, thus benefiting daily operations and crisis management in the tourism sector.

1. Introduction

In the era of globalization, public crises invariably pose significant impediments to the socio-economic progress and sustainable development of affected locales. Various regions have grappled with the profound repercussions of such crises in recent years [1]. Examples include extreme climatic events, such as the forest fires in Australia during 2019–2020, which caused economic damage estimated in the tens of billions of dollars, and infectious disease outbreaks, such as the COVID-19 epidemic, which spread globally and exacted a heavy toll in human lives and in economic losses in the tourism sector [2].
In the wake of public crises, it is essential to develop approaches that reduce the impacts of these incidents and anticipate upcoming public and societal demands, both of which are key to restoring regional and national stability. To navigate post-crisis recovery, many scholars and policymakers have turned to diverse forecasting models for strategic guidance. The sectors most commonly studied include the hospitality industry, travel and tourism, and the airline sector. For tourism forecasting, mainstream methods include the ARIMA model, which models seasonal and trend variations in tourism demand well [3]. Machine learning techniques, which can achieve commendable accuracy, have also become popular in forecasting tourism demand [4,5]. Most tourism demand prediction strategies are built on these methods.
Amid public crises, professionals must both mitigate the crisis and sustain the industry’s dynamism [6]. Decision-makers should strategically incorporate the ongoing situation and public opinion into their planning. In light of these challenges, interval forecasting may offer a more suitable solution for unexpected and urgent situations. Existing research has mainly focused on point forecasts, as a single value is easier to understand [7]. Where intervals are produced, the bootstrapping technique is commonly used to create them for tourism demand [8]. However, the bootstrap sampling process becomes more complex with time series data, because resampling risks disrupting the inherent correlations. Thus, this study turns to Bayesian methods, which do not require resampling and yield good results with small samples [9]. Existing research shows that combined forecasting models predict better than single forecasting models [10]. Some researchers have pointed out that Gaussian process regression (GPR) uses a Bayesian inference framework, which is suitable for processing small-sample data [11,12]. The multilayer perceptron (MLP) is believed to possess non-linear mapping ability with flexible forecasting performance [13]. Long Short-Term Memory with DenseFlipout (LSTM+Dense) is capable of handling sequential data, enabling it to maintain stability when facing uncertainties [14]. Therefore, this study combines three different models, namely GPR, Bayesian Multilayer Perceptron (Bayesian-MLP), and LSTM+Dense. To determine the weights of these models reasonably, a new entropy weight method based on the forgetting curve is proposed.
Combined methods driven by historical tourism data are common in tourism forecasting research, but they might not respond well to public crises. For example, COVID-19 had widespread and fluctuating impacts on the international tourism industry during different stages of the epidemic [15]. With the rapid development of the Internet, an increasing amount of tourism-relevant information can be obtained through search engines. Existing studies have shown that search engine data can improve the performance of forecast models and better reflect fluctuations in demand [16], which helps capture how different stages of a public crisis affect tourism demand. In addition, restrictions or other negative policies can also be reflected in search trends. The most commonly used search engine data are Google Trends and Baidu Index [4,17]. However, it is hard to know in advance which source will better fit the tourism demand forecast in the different stages of a public crisis.
Based on the above discussions, this study intends to verify and incorporate suitable search engine data to predict tourism demand under the impact of public crises. To make the results more valuable in crisis applications, the tourism forecasting results are presented in the form of intervals. Because historical data are scarce, a Bayesian method is adopted [18]. To further strengthen the forecasting ability of the model, a novel interval forecasting model is proposed, which combines three Bayesian models using an entropy weight method improved by the forgetting curve. Google Trends and Baidu Index are introduced to verify which one improves the performance of the developed forecasting model at different stages of a crisis. The proposed model is applied to forecast tourism demand in Macao. The comparative analysis with other models shows that the combined model generates reliable forecast intervals, and demonstrates which search engine data better fit certain periods of the crisis.

2. Literature Review

2.1. Tourism Demand and Public Crisis

The tourism industry is prone to external public crises. An economic crisis can reduce the spending power of tourists. For example, the occupancy of Hong Kong’s high-spending hotels dropped significantly during the global economic crisis [19]. Moreover, terrorist attacks greatly reduce consumers’ enthusiasm for destinations [20]. Pandemics, such as COVID-19, are also unpredictable factors with a great impact on the tourism industry. Since the start of the 21st century, the tourism industry has been hit by several epidemics: the SARS outbreak in 2002 had a significant impact on tourism in the Pacific Rim, and the COVID-19 outbreak in 2019 affected the tourism industry all over the world. Public crises affect policies and economies, and in turn depress tourism demand [21]. Existing research has studied the impact of public crises on tourism demand, providing valuable strategic support for future crisis management. For practitioners, understanding how tourism forecasts change during public crises may be more valuable [22].

2.2. Tourism Demand Forecasting Techniques

Destination marketing organizations (DMOs) are normally required to use marketing tools to promote tourism products and services. For example, Thomas Cook and British Airways used virtual reality to let potential tourists preview travel experiences at destinations, which promotes tourism demand and market return. Given the devastating impacts of fluctuating crisis situations, it is crucial for DMOs to estimate tourism demand at target destinations accurately [23]. Current tourism forecasting techniques fall into two groups: point forecasting and interval forecasting. Point forecasting methods for tourism demand can be roughly divided into three categories, namely time series models, econometric models, and artificial intelligence (AI) models [24,25]. Empirical evidence shows that basic time series models and econometric models are not as effective as AI models [26]. The development of AI technology has driven its application in the tourism industry. One advantage of AI models is that they better capture nonlinear relationships, which are characteristic of tourism data. AI models for tourism forecasting include artificial neural networks (ANNs), support vector regression (SVR), and MLP [27]. MLP, one of the most popular forecasting methods, has good nonlinear mapping ability and forecasting performance [13]. Recently, Li and Xu [28] used LSTM to predict tourism demand; however, their comparison results indicate that LSTM might not be suitable for long-term forecasts during emergencies. LSTM+Dense performs well on sequential data, especially when coupled with the DenseFlipout method, which allows it to maintain stability in the face of uncertainties [14].
Unlike interval forecasting methods, point forecasts cannot represent uncertainty. Uncertainties may be caused by input variables or by crises in the tourism industry [29]. Interval forecasting methods address this shortcoming: they estimate uncertainties and give practitioners greater confidence in specifying strategies. Relatively few studies have focused on interval forecasting compared with point forecasting [7]. The bootstrapping technique is commonly used in tourism demand interval forecasting [8]. However, it is argued that the bootstrap sampling process is complex to deal with. In particular, when the aim is to capture the impact of a public crisis, resampling methods such as the bootstrap are likely to destroy the information hidden in the data about the shift caused by the crisis. Thus, some researchers introduced GPR to tourism demand forecasting and found that it worked better than other kernel-based models [12,30]. GPR uses a Bayesian inference framework, which has been shown to be suitable for small samples [9,11]. As a combined forecasting model is more reliable than a single one [31], it is essential to explore interval forecasting of tourism demand based on a combined Bayesian model in order to respond to public crises.

2.3. Tourism Demand Forecasting with Search Engine Data

In recent years, many studies have used Internet data to improve the performance of tourism forecasting models, and most of them incorporated search engine data [4,32]. Search engine data have been shown to better reflect potential tourism needs and to greatly improve forecasting performance [33]. Google Trends is one type of search engine data widely used in existing studies to reflect the travel intentions of tourists from overseas markets [3]. The study in [34] illustrated that adopting Google Trends offers more accurate predictions of tourism demand in Italy. Yang et al. [35] utilized a lasso method with Google search queries as predictors, highlighting the potential of Google Trends data in tourism forecasting during unprecedented events. Baidu is the search engine with the largest market share in China, and Baidu Index is widely used to reflect tourism intention in China [28]. Xue et al. [16] used Baidu Index to explore how search engine data enhance the forecasting of tourist volume at attractions in Beijing. Li and Xu [28] compared models such as ARIMA and SARIMAX for tourism demand forecasting in Hong Kong using Baidu’s search engine data, and showed that search engine data play a positive role in forecasting tourism demand. Hu et al. [36] used Baidu Index to demonstrate the effectiveness of incorporating search engine data into tourism demand forecasting for Sanya.
In the pandemic context, Hu et al. [3] demonstrated how to forecast the recovery of European tourism markets after the pandemic shock using Google Trends data. You et al. [37] found that incorporating Google Trends into a nonparametric mixed-frequency VAR model outperforms traditional models in tourism demand forecasting in the context of COVID-19. Some scholars have attempted to integrate Google Trends with Baidu Index to forecast tourism demand. For example, Li et al. [17] adopted both Google Trends and Baidu Index data to enhance the performance of a deep learning approach for tourism volume forecasting. Although some research combines the two sources of search engine data, few studies have focused on identifying which source improves tourism demand prediction performance, or on how to select appropriate search engine data to fit different crisis situations. It is therefore necessary to study how to incorporate search engine data more effectively when forecasting tourism demand in response to the impact of a public crisis.

2.4. Research Gaps and Advantages

Based on the above literature review, this study identifies several research gaps. First, numerous existing studies use point methods to forecast tourism demand, which may not fully capture the inherent uncertainties in tourist arrivals. It is therefore crucial to explore interval forecasts that provide a range of potential outcomes, enabling decision-makers to formulate more adaptable strategies rather than relying solely on point forecasts.
Secondly, most existing research relies on a single data source for forecasting, yet different search engines cater to specific situations. Hence, integrating appropriate data from different search engines could potentially yield more accurate forecasts, as they would match different stages of a crisis.
Lastly, during a crisis, the situation often evolves rapidly, making recent data disproportionately influential in forecasting models. A forecasting approach that accounts for time-sensitive effects is needed to enhance the performance of predictions.
In the light of these research gaps, the proposed method has the following advantages:
Search Engine Data Integration: We investigate how different search engines perform during different stages of a crisis and whether combining their data can enhance prediction performance. Our study clarifies the contributions of search engine data and the potential synergies when these data are merged with tourism data.
Interval Forecasting in Tourism: To address the limitations of point forecasting, this study proposes a combined tourism interval forecasting method based on Bayesian models and an entropy weight method. The method also takes time effects into consideration to reflect the disproportionate influence of recent data. Moreover, it provides decision-makers with interval results, allowing more flexible and suitable management in the face of uncertainties during crises.

3. Methodology

This section introduces the forecasting framework of this study with three phases, namely data collection and preprocessing, constructing the combined model, and performance evaluation.

3.1. Problem Description and the Proposed Framework

This paper takes the COVID-19 pandemic, the most widespread public crisis event in recent years, as an example to determine which search engine data predict better under the proposed method. By incorporating different search engine data, it is then shown that tourism demand forecasting performance differs across stages of the epidemic. By processing the keywords that people search for on the Internet, we can gauge public attitudes towards tourism. Because the epidemic spread rapidly, had a wide range of influence, and was severe, predicting a single precise value may not be suitable when facing such a public crisis. It is of greater practical significance to give experts an interval forecast for the previous and next periods. Therefore, selecting a data source from the appropriate search engine to improve the measurement of tourist demand is key to helping companies and governments restore social vitality.
This paper proposes a combined model for interval forecasting, which contains GPR, Bayesian-MLP, and LSTM+Dense and is weighted by an entropy method improved with the forgetting curve. As the Internet broadens the channels through which people obtain travel information, search engine data can better reflect users’ travel intentions and better capture the impact of emergencies than historical data alone. Therefore, this paper incorporates search engine data into the combined model, with the aim of forecasting tourist arrivals more accurately. The proposed framework of this study is presented in Figure 1.

3.2. Data Collection and Processing

Generally, monthly tourist arrival data for a region can be obtained from the statistics of the tourism department. In addition, in order to capture tourists’ travel intentions and the impact of COVID-19, search engine data are obtained by querying keyword popularity. Baidu is a search engine with a market share of more than 80% in mainland China; thus, it might better reflect the travel intentions of mainland Chinese tourists. Google holds more than 90% of the global market. In this study, it mainly reflects the travel intentions of tourists in overseas markets. Both Baidu and Google provide tools for querying keyword popularity and recommending related keywords. Therefore, starting from a few initial keywords, monthly search data for multiple keywords can be obtained. The collected search engine indicators generally need to be filtered further, because some indicators may not be closely related to the historical data. The Pearson correlation coefficient is used as the criterion for judging the correlation.
Because the historical data and the search engine data are different in order of magnitude and dimension, the input data are normalized:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}. \tag{1}$$
The output of the combined model, that is, tourist arrivals, is of a large order of magnitude. Since such values are not suitable for the proposed model, a logarithmic transformation is applied to the output data:
$$y' = \ln(y). \tag{2}$$
Without loss of generality, the historical data and search engine data of the t-th month are used in this paper to forecast the tourist arrivals of the (t + 1)-th month.
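The preprocessing described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names, the correlation cut-off of 0.6, the keyword labels, and the toy numbers are all assumptions made for the example.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def min_max(values):
    """Rescale a series to [0, 1] with min-max normalization."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def build_samples(arrivals, keyword_series, threshold=0.6):
    """Filter keywords, normalize inputs, log-transform outputs, pair month t with t + 1.

    `threshold` is a hypothetical cut-off; the text does not state one.
    """
    kept = {k: s for k, s in keyword_series.items()
            if abs(pearson(s, arrivals)) >= threshold}
    features = [min_max(arrivals)] + [min_max(s) for s in kept.values()]
    targets = [math.log(y) for y in arrivals]
    # Inputs observed in month t are paired with the log arrivals of month t + 1.
    X = [[f[t] for f in features] for t in range(len(arrivals) - 1)]
    y = targets[1:]
    return sorted(kept), X, y

# Toy data, made up for illustration only.
arrivals = [2000.0, 2400.0, 2200.0, 3000.0, 2800.0]
keywords = {
    "macao hotel": [10.0, 14.0, 12.0, 20.0, 18.0],  # moves with arrivals
    "unrelated": [5.0, 1.0, 4.0, 2.0, 5.0],         # weakly related
}
kept, X, y = build_samples(arrivals, keywords)
print(kept, len(X), len(y))
```

In this toy run only the keyword that moves with arrivals survives the filter, and each input row for month t is matched with the log-transformed arrivals of month t + 1.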

3.3. Constructing Combined Model

In the existing research literature, Bayesian methods have been extensively used in a variety of predictive models. This study specifically selects GPR, Bayesian-MLP, and LSTM+Dense as the basic models. The reasons are as follows: Firstly, GPR, due to its inherent non-parametric nature, demonstrates significant strengths when dealing with complex data structures. Bayesian-MLP combines the powerful representation ability of neural networks with the uncertainty measurement of Bayesian inference, thereby offering more robust predictions [38]. Meanwhile, LSTM+Dense exhibits outstanding performance in handling sequential data, especially when coupled with the DenseFlipout method, allowing it to maintain stability in the face of uncertainty [39]. We believe that by integrating these three methods, it is possible to harness the unique advantages of each model, and, through a weighting mechanism, produce more reliable and accurate prediction intervals. In the following sections, we will delve into a detailed discussion of these three models and their application contexts in this study.

3.3.1. Single Model

GPR

GPR differs from linear regression, which learns an exact value, in that it learns the probability distribution of the predicted value y. The idea of GPR is to estimate an unknown function from noisy observations at a given number of points. Given a training set $D = \{(x_i, y_i) \mid i = 1, 2, \ldots, N\}$, the input vector is represented by x and the output is represented by y. Suppose y is generated in the following way [30]:
$$y = f(x) + \varepsilon, \tag{3}$$
where $\varepsilon$ is Gaussian noise with a mean of 0 and a variance of $\theta^2$.
The input of the test set is represented by x*, and the output is represented by y*. Under the Gaussian process assumption, y and y* are jointly Gaussian, and the predictive distribution of y* is obtained by marginalizing over $\theta$:
$$P(y^* \mid x^*, x, y) = \int P(y^* \mid x^*, \theta)\, P(\theta \mid x, y)\, d\theta. \tag{4}$$
Therefore, when the forecasting set is inputted and the training set is given, the posterior probability distribution of the predicted value can be obtained.
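To make the posterior predictive distribution concrete, the sketch below computes a GPR mean and a 95% interval for a scalar input using the standard closed-form Gaussian process equations. It is only an illustration under stated assumptions: the squared-exponential kernel, the fixed noise variance (playing the role of $\theta^2$), the toy data, and all function names are choices made here, not details taken from the paper, and no hyperparameter optimization is performed.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel for scalar inputs (an assumed kernel choice)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def gpr_interval(x_train, y_train, x_star, noise_var=0.01, z=1.96):
    """Return (mean, lower, upper) of the GP predictive distribution at x_star."""
    n = len(x_train)
    # Kernel matrix of the training inputs plus noise variance on the diagonal.
    K = [[rbf(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(x, x_star) for x in x_train]
    alpha = solve(K, y_train)   # (K + noise_var * I)^{-1} y
    v = solve(K, k_star)        # (K + noise_var * I)^{-1} k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_star, x_star) + noise_var - sum(k * w for k, w in zip(k_star, v))
    std = math.sqrt(max(var, 0.0))
    return mean, mean - z * std, mean + z * std

x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [0.0, 0.8, 0.9, 0.1]
near = gpr_interval(x_train, y_train, 1.5)   # inside the training range
far = gpr_interval(x_train, y_train, 10.0)   # far from any observation
print(near, far)
```

The interval at a point far from the observations is much wider than at a point between them, which is the behaviour that makes GPR attractive for small samples.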

Bayesian-MLP

Given the training set $D = \{(x_i, y_i)\}$, the neural network can be regarded as a probabilistic model $P(y \mid x, w)$, where $w$ is the weight parameter. For regression, $y$ is a continuous variable and $P(y \mid x, w)$ is a Gaussian distribution. Correspondingly, the learning of the neural network can be regarded as maximum likelihood estimation [40]:
$$w^{MLE} = \arg\max_{w} \log P(D \mid w) = \arg\max_{w} \sum_{i} \log P(y_i \mid x_i, w). \tag{5}$$
Bayesian estimation instead places a prior distribution on $w$. The goal is to find the posterior distribution of $w$, which yields the following predictive model:
$$P(y^* \mid x^*, D) = \int P(y^* \mid x^*, w)\, P(w \mid D)\, dw, \tag{6}$$
where:
$$P(w \mid D) = \frac{P(w)\, P(D \mid w)}{P(D)}, \tag{7}$$
$$P(y^* \mid x^*) = E_{P(w \mid D)}\left[P(y^* \mid x^*, w)\right], \tag{8}$$
where $P(w \mid D)$ is the posterior distribution, $P(w)$ is the prior, $P(D \mid w)$ is the likelihood function, and $P(D)$ is the marginal likelihood. The above equations are equivalent to averaging the forecasts of an ensemble of neural networks, each weighted by the posterior probability of its parameters $w$.
This expectation cannot be computed exactly, because it is equivalent to using an ensemble of an infinite number of neural networks. In addition, the posterior distribution is analytically intractable. One solution is to use variational inference to approximate the Bayesian posterior distribution. Variational inference turns the problem into an optimization: within a family of variational distributions, it finds the member closest to the true posterior distribution [41]. More specifically, the Kullback–Leibler divergence (KL divergence) is minimized:
$$\theta^* = \arg\min_{\theta} KL\big(q(w \mid \theta) \,\|\, P(w \mid D)\big) = \arg\min_{\theta} \int q(w \mid \theta) \log \frac{q(w \mid \theta)}{P(w)\, P(D \mid w)}\, dw = \arg\min_{\theta} \Big[ KL\big(q(w \mid \theta) \,\|\, P(w)\big) - E_{q(w \mid \theta)}\big[\log P(D \mid w)\big] \Big]. \tag{9}$$
The corresponding objective function is:
$$F(D, \theta) = KL\big(q(w \mid \theta) \,\|\, P(w)\big) - E_{q(w \mid \theta)}\big[\log P(D \mid w)\big]. \tag{10}$$
The first term on the right is the KL divergence between the variational posterior and the prior, also called the complexity cost. The second term, called the likelihood cost, depends on the training data. The objective function is also called the variational free energy; its negative is known as the evidence lower bound (ELBO):
$$ELBO(D, \theta) = E_{q(w \mid \theta)}\big[\log P(D \mid w)\big] - KL\big(q(w \mid \theta) \,\|\, P(w)\big). \tag{11}$$
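The goal is now to maximize the ELBO. By Bayes’ theorem:
$$KL\big(q(w \mid \theta) \,\|\, P(w \mid D)\big) = E_{q(w \mid \theta)}\left[\log \frac{q(w \mid \theta)\, P(D)}{P(D \mid w)\, P(w)}\right] = KL\big(q(w \mid \theta) \,\|\, P(w)\big) - E_{q(w \mid \theta)}\big[\log P(D \mid w)\big] + \log P(D). \tag{12}$$
Since $\log P(D)$ does not depend on $\theta$, minimizing this KL divergence is equivalent to maximizing the ELBO, which can be rewritten as:
$$ELBO(D, \theta) = E_{q(w \mid \theta)}\big[\log P(w)\big] + E_{q(w \mid \theta)}\big[\log P(D \mid w)\big] - E_{q(w \mid \theta)}\big[\log q(w \mid \theta)\big]. \tag{13}$$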
Each term on the right-hand side is now an expectation under the variational distribution. Consequently, the objective function can be approximated by sampling from the variational distribution:
$$F(D, \theta) \approx \frac{1}{N} \sum_{i=1}^{N} \big[\log q(w_i \mid \theta) - \log P(w_i)\big] - \frac{1}{N} \sum_{i=1}^{N} \log P(D \mid w_i), \tag{14}$$
where $w_i$ is the i-th of $N$ samples drawn from $q(w \mid \theta)$.
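The sampled objective can be illustrated on a deliberately tiny model. The sketch below estimates the variational free energy for a one-weight regression y = w·x + noise, with a Gaussian variational distribution q(w | θ) = N(μ, σ²) and a standard normal prior. The model, the noise level, the sample count, and the toy data are all assumptions for illustration; the paper's Bayesian-MLP has many weights and is trained by gradient descent on this kind of objective.

```python
import math
import random

def log_normal_pdf(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def free_energy(data, mu, sigma, n_samples=2000, noise_std=0.5, seed=0):
    """Monte Carlo estimate of F(D, theta) for the model y = w * x + noise.

    theta = (mu, sigma) parameterizes q(w | theta) = N(mu, sigma^2);
    the prior P(w) is N(0, 1). Both choices are assumptions of this sketch.
    """
    rng = random.Random(seed)
    complexity, likelihood = 0.0, 0.0
    for _ in range(n_samples):
        w = rng.gauss(mu, sigma)  # w_i drawn from q(w | theta)
        # Complexity cost term: log q(w_i | theta) - log P(w_i)
        complexity += log_normal_pdf(w, mu, sigma) - log_normal_pdf(w, 0.0, 1.0)
        # Likelihood term: log P(D | w_i)
        likelihood += sum(log_normal_pdf(y, w * x, noise_std) for x, y in data)
    return complexity / n_samples - likelihood / n_samples

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]   # roughly y = 2x
good = free_energy(data, mu=2.0, sigma=0.1)   # q centred on a plausible weight
bad = free_energy(data, mu=0.0, sigma=0.1)    # q centred far from the data
print(good, bad)
```

A variational distribution centred near the weight that explains the data gives a much lower free energy than one centred at zero, even though the latter sits closer to the prior; the likelihood cost dominates here.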

LSTM+Dense

LSTM is a variant of Recurrent Neural Networks (RNNs) that can learn long-term dependencies [42]. LSTM performs better than an ordinary RNN on a large variety of problems.
All RNNs have a chain form of repeated neural network modules. The LSTM network also has a chain structure, but the interaction between modules is achieved by updating the cell state through a gate mechanism. The gate mechanism of LSTM includes an input gate (g), an output gate (o), and a forget gate (f), which are implemented through sigmoid functions and element-wise multiplication; the gates only regulate the information flow and do not provide additional information. The information flow is described by the following formulas:
$$\begin{aligned} f_i &= \sigma(W_f x_i + W_f h_{i-1} + b_f) \\ c_i &= f_i \odot c_{i-1} + g_i \odot \tanh(W_c x_i + W_c h_{i-1} + b_c) \\ g_i &= \sigma(W_g x_i + W_g h_{i-1} + b_g) \\ o_i &= \sigma(W_o x_i + W_o h_{i-1} + b_o) \\ h_i &= o_i \odot \tanh(c_i). \end{aligned} \tag{15}$$
The LSTM network alone cannot measure uncertainty. DenseFlipout, a variational densely connected layer trained by minimizing the Kullback–Leibler divergence, has been shown to be effective when used jointly with an LSTM network [14]. Therefore, in this study, DenseFlipout is used to introduce uncertainty into the LSTM+Dense model.
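The gate updates can be traced with a scalar LSTM cell. The sketch below is illustrative only: it uses one-dimensional states, and it gives each gate a separate input weight and recurrent weight, which is the standard LSTM formulation and an assumption here, since the formulas above write both with the same symbol. The parameter values are arbitrary, and the variational DenseFlipout output layer is not included.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step; p maps a gate name to (input weight, recurrent weight, bias)."""
    def pre(name):
        w_in, w_rec, b = p[name]
        return w_in * x + w_rec * h_prev + b
    f = sigmoid(pre("f"))                      # forget gate
    g = sigmoid(pre("g"))                      # input gate
    o = sigmoid(pre("o"))                      # output gate
    c = f * c_prev + g * math.tanh(pre("c"))   # updated cell state
    h = o * math.tanh(c)                       # new hidden state
    return h, c

# Arbitrary illustrative parameters, identical for every gate.
params = {name: (0.5, 0.5, 0.0) for name in ("f", "g", "o", "c")}
h, c = 0.0, 0.0
for x in [0.2, 0.4, 0.6]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The cell state accumulates information across the three inputs, while the hidden state stays bounded because it passes through tanh and the output gate.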

3.3.2. Model Training

The input of a single interval forecast model is the preprocessed historical data and search engine data, and the output is the (log-transformed) number of arrivals in the next month. After the optimal hyperparameter combination is determined, the single interval forecast model can be trained with a training set. The obtained model can then be used for forecasting. Specifically, the single interval forecasting model was first trained with data that were not affected by a public crisis and then used to perform an ex post forecast. Afterwards, the single interval forecasting model was retrained with data that were affected by a public crisis.

3.3.3. Obtain Forecast Interval Result

Because the weights of a Bayesian model are random variables following a probability distribution, each forward pass outputs a single value that varies from run to run. In order to generate prediction intervals, the trained model repeats the forecast 100 times. By calculating the mean and standard deviation of these forecasts, prediction intervals at a given confidence level can be generated for each model.
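A minimal sketch of this repeated-sampling step is shown below. The stochastic forecaster is a stand-in (a hypothetical function that returns a noisy value), not one of the trained models, and the multiplier 1.96 assumes that the repeated forecasts are approximately normally distributed, which gives a 95% interval.

```python
import random
import statistics

def prediction_interval(stochastic_forecast, n_draws=100, z=1.96):
    """Repeat a stochastic forecast and return (lower, mean, upper)."""
    draws = [stochastic_forecast() for _ in range(n_draws)]
    mean = statistics.fmean(draws)
    std = statistics.stdev(draws)
    return mean - z * std, mean, mean + z * std

# Stand-in for one forward pass of a trained Bayesian model (hypothetical).
rng = random.Random(42)

def toy_model():
    return rng.gauss(14.5, 0.2)   # a log-scale arrival forecast with sampling noise

lower, mean, upper = prediction_interval(toy_model)
print(lower, mean, upper)
```

Because the targets were log-transformed during preprocessing, bounds obtained this way would be on the log scale and could be mapped back with the exponential function.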

3.3.4. Combined Prediction Interval Generation

In order to obtain the combined prediction interval, the weights of each model need to be decided. In this study, an entropy weight method based on the forgetting curve is adopted to determine the weight of each model.
The relative error is obtained after normalizing the error. The proportion of the relative error of the d-th method in the j-th forecast point is:
$$p_{dj} = \frac{|e_{dj}|}{\sum_{d=1}^{3} |e_{dj}|} \quad (d = 1, 2, 3). \tag{16}$$
This study focuses on the forecast of time series data. Generally speaking, values of $p_{dj}$ closer to the time period of the test set better reflect the prediction ability of the forecasting model. Based on the forgetting curve [43], $p_{dj}$ of each observation point is multiplied by a time weight coefficient, and the time-corrected $p_{dj}^{t}$ is obtained as:
$$p_{dj}^{t} = tfac_j \cdot p_{dj} \quad \text{and} \quad tfac_j = e^{-\left(\frac{t_j}{T}\right)}, \tag{17}$$
where $T$ is the time span of the training set and $t_j$ is the interval from the time point of $p_{dj}$ to the first time point of the test set. The entropy value corresponding to each model is:
$$H_d = -k \sum_{j=1}^{n} p_{dj}^{t} \ln p_{dj}^{t}. \tag{18}$$
Then, the weight of each model can be obtained by the following formula:
$$W_d = \frac{1 - H_d}{\sum_{d=1}^{3} (1 - H_d)}. \tag{19}$$
The upper and lower bounds can be obtained by:
$$y_u = \sum_{d=1}^{3} W_d\, y_u^{d} \quad \text{and} \quad y_l = \sum_{d=1}^{3} W_d\, y_l^{d}. \tag{20}$$
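The weighting scheme can be sketched as follows. This is a hedged illustration, not the authors' implementation: the time factor is taken as exp(−t_j/T) in the spirit of the forgetting curve, the constant k is assumed to be 1/ln n as in the standard entropy weight method (the text does not define it), and the errors and bounds are made-up numbers.

```python
import math

def combine_intervals(errors, t_to_test, T, lowers, uppers):
    """Entropy weights with a forgetting-curve time factor.

    errors[d][j]  : error of model d at training point j
    t_to_test[j]  : time from point j to the first test point
    T             : time span of the training set
    lowers/uppers : per-model interval bounds for one forecast point
    """
    n_models, n_points = len(errors), len(errors[0])
    k = 1.0 / math.log(n_points)   # assumed normalizing constant
    entropies = []
    for d in range(n_models):
        h = 0.0
        for j in range(n_points):
            total = sum(abs(errors[m][j]) for m in range(n_models))
            p = abs(errors[d][j]) / total            # share of model d in the error at point j
            p_t = math.exp(-t_to_test[j] / T) * p    # down-weight points far from the test set
            if p_t > 0.0:
                h -= k * p_t * math.log(p_t)         # entropy contribution
        entropies.append(h)
    norm = sum(1.0 - h for h in entropies)
    weights = [(1.0 - h) / norm for h in entropies]
    lower = sum(w * lo for w, lo in zip(weights, lowers))
    upper = sum(w * up for w, up in zip(weights, uppers))
    return weights, lower, upper

# Three models, three training points; model 0 has the smallest errors.
errors = [[0.2, 0.2, 0.2], [0.3, 0.3, 0.3], [0.5, 0.5, 0.5]]
weights, lower, upper = combine_intervals(
    errors, t_to_test=[3, 2, 1], T=3,
    lowers=[14.0, 14.2, 13.0], uppers=[15.0, 15.4, 16.0])
print(weights, lower, upper)
```

In this toy case the model with the smallest errors receives the largest weight. The sketch does not guard against 1 − H_d becoming non-positive, which can happen for other inputs because the time-corrected proportions are not renormalized.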
The algorithm used in this study is as follows (Algorithm 1):
Algorithm 1: Tourism arrivals forecast interval generation
Input: Historical data, search engine data, untrained Bayesian-MLP model, GPR model, and LSTM+Dense model.
Output: Upper and lower bounds.
Step 1: Use Pearson’s correlation analysis to filter out keywords with low correlation.
Step 2: Divide the data without impact into trainset1 and testset1 at a ratio of 9:1. For the data affected by the emergency, select the data of the last eight months as testset2; testset1 and the remaining data are selected as trainset2.
Step 3: Normalize input variables and logarithmize output variables.
Step 4: Use trainset1 to train three models, then make forecasts on testset1, and perform the same operation on trainset2 and testset2.
Step 5: Calculate the mean and standard deviation of point forecasting results to obtain the upper and lower bounds of each model for a certain confidence level.
Step 6: Use Formulas (16)–(20) to determine the model weights based on the loss on the trainsets, and generate a combined upper and lower bound.
Step 7: Output upper and lower bounds.

3.4. Performance Evaluation

The purpose of this study is to generate reliable prediction intervals. Thus, it is important that the real data should fall into the prediction intervals as much as possible. The first indicator is Prediction Interval Coverage Probability (PICP), defined as follows [44]:
$$m_i = \begin{cases} 1, & y_{li} \le y_i \le y_{ui} \\ 0, & \text{else} \end{cases} \tag{21}$$
$$PICP = \frac{\sum_{i=1}^{N} m_i}{N}. \tag{22}$$
The larger the value of PICP, the higher the probability that the real data fall within the forecast interval. The disadvantage of PICP is that, as long as the forecast interval is wide enough, the PICP value can reach 1, but such an interval may have no reference value. Therefore, a second indicator is used, the Mean Prediction Interval Width (MPIW) [45]:
$$MPIW = \frac{1}{N} \sum_{i=1}^{N} (y_{ui} - y_{li}). \tag{23}$$
The smaller the MPIW, the narrower the generated forecast interval. The best forecast combines a large PICP with a small MPIW. However, as mentioned above, a larger PICP tends to come with a wider forecast interval, so PICP and MPIW conflict. A comprehensive index, Loss, is therefore used to evaluate the interval forecast model, in which a control parameter λ balances the importance of width and coverage. Without loss of generality, λ is set to 0.5 in the following.
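$$MPIW_{capt} = \frac{1}{N} \sum_{i=1}^{N} (y_{ui} - y_{li}) \cdot m_i. \tag{24}$$
$$Loss = MPIW_{capt} + \lambda \frac{N}{\alpha (1 - \alpha)} \max\big(0, (1 - \alpha) - PICP\big)^2, \tag{25}$$
where $1 - \alpha$ is the target confidence level.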
For Bayesian methods, the confidence level is usually set to 95% or 99% [46]. The higher the confidence level is, the wider the interval will be. Because the width of the interval also matters in this paper, the confidence level is set to 95%, i.e., α = 0.05.
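
The indicators in this subsection can be computed directly from the observations and the interval bounds. The following is a minimal sketch that follows the formulas as printed above; the function names are hypothetical, and `alpha=0.05` and `lam=0.5` mirror the settings stated in the text.

```python
def picp(y, lower, upper):
    """Share of observations that fall inside their interval (mean of m_i)."""
    hits = [1 if l <= v <= u else 0 for v, l, u in zip(y, lower, upper)]
    return sum(hits) / len(y)


def mpiw(lower, upper):
    """Mean width of all intervals."""
    return sum(u - l for l, u in zip(lower, upper)) / len(lower)


def mpiw_capt(y, lower, upper):
    """Widths of the intervals that capture the observation, divided by N."""
    return sum(u - l for v, l, u in zip(y, lower, upper) if l <= v <= u) / len(y)


def interval_loss(y, lower, upper, alpha=0.05, lam=0.5):
    """Captured width plus a penalty that grows when PICP falls short of 1 - alpha."""
    n = len(y)
    shortfall = max(0.0, (1.0 - alpha) - picp(y, lower, upper))
    return mpiw_capt(y, lower, upper) + lam * n / (alpha * (1.0 - alpha)) * shortfall ** 2
```

As a worked check of the penalty term: with N = 8 (the eight-month test set of Step 2), α = 0.05, λ = 0.5 and PICP = 0.75, the penalty is 0.5 × 8 / (0.05 × 0.95) × 0.2² ≈ 3.3684. Added to an MPIW.cap of 1.0843, this gives 4.4527, which agrees with the Loss-2 value listed for LSTM+Dense in Table 2.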

4. Results

According to the World Travel and Tourism Council, the tourism industry directly or indirectly contributed 10.3% of global GDP and provided one-tenth of the world’s jobs in 2019. The global outbreak of COVID-19 caused unprecedented damage to the global economy. This study designs a comparative experiment to identify the roles that different search engines play in forecasting. To better predict tourism demand under the impact of the public crisis, the combined model is used to make forecasts based on the different sources of collected data. The forecasting results are also compared with those of single Bayesian models and point forecasting models. Finally, the impact of a public crisis on tourist arrivals is discussed separately.

4.1. Experimental Results

4.1.1. Data Collection and Preprocessing

Macao is taken as the research location in this study; its map is shown in Figure 2. Monthly tourist arrivals in Macao from January 2011 to August 2021 are obtained from Macao Tourism Data Plus (https://dataplus.macaotourism.gov.mo/, accessed on 1 December 2021), as shown in Figure 3. From 2011 to 2019, monthly arrivals in Macao generally showed cyclical growth. Affected by COVID-19, the number of monthly visitors dropped sharply in the first half of 2020, followed by a small rise in the second half of 2020. These changes imply that visitor arrivals in Macao follow different trends in different periods. Therefore, it is essential to develop a tourism demand forecasting method that models these trends and changes well, which helps DMOs and governments plan ahead and design tourism promotion campaigns.
According to the keyword recommendation function of the Baidu search engine, seven keywords included in Baidu Index were identified, as presented in Figure 4a. Because Baidu Index only provides daily data, monthly values were obtained by summing the daily values. One feature of Google Trends is that data from different regions are available. Therefore, the trend data of Macao’s two main source regions of tourists (Taiwan and Hong Kong) and global trend data were used in this study. Similarly, based on the keyword recommendation function, 12 keywords were identified, as shown in Figure 4b.
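
The daily-to-monthly aggregation described above amounts to summing the daily index values within each calendar month. A minimal sketch, assuming the daily data are available as `(date, value)` pairs (the function name and data layout are hypothetical):

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily):
    """Sum daily index values into (year, month) buckets, in chronological order."""
    totals = defaultdict(float)
    for day, value in daily:
        totals[(day.year, day.month)] += value
    return dict(sorted(totals.items()))


# Made-up values, for illustration only.
example = [(date(2019, 1, 30), 100), (date(2019, 1, 31), 120), (date(2019, 2, 1), 90)]
print(monthly_totals(example))
```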
To retain the most relevant indicators, Pearson correlation analysis is performed between the collected keyword data and the historical arrival data. Keywords whose search series are only weakly correlated with the historical data are filtered out. Finally, two Baidu Index keywords and four Google Trends keywords (the bigger keywords in Figure 4a,b) were retained.
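
The correlation-based filtering can be sketched as follows. The paper does not report the cut-off it used, so the `threshold` value below is purely hypothetical, as are the function names and the keyword labels in the test data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_keywords(arrivals, keyword_series, threshold=0.6):
    """Keep keywords whose |r| with arrivals reaches the (hypothetical) threshold."""
    scores = {k: pearson(arrivals, s) for k, s in keyword_series.items()}
    return {k: r for k, r in scores.items() if abs(r) >= threshold}
```

Using the absolute value keeps strongly negatively correlated keywords as well; whether that is desirable depends on the forecasting model, and it is a design choice of this sketch rather than something stated in the paper.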
After scaling the data, we split them into two distinct datasets. The first dataset comprises data from January 2011 to January 2020, which does not exhibit a clear COVID-19 impact. These data are divided into a training set (trainset1) and a test set (testset1) at a 9:1 ratio, representing data characteristics in the absence of a pandemic shock. Conversely, testset1 combined with data from February 2020 to April 2021 serves as the second training set (trainset2), simulating the scenario where there is a sudden impact from the pandemic. The remaining data are designated as testset2.
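
The split described above can be sketched as a purely chronological partition. In this sketch, `crisis_start` (the index of the first crisis-affected observation) and `test2_size` are hypothetical parameters; the size of testset2 is left configurable, with Step 2 of the procedure mentioning eight months.

```python
def split_datasets(series, crisis_start, test2_size=8, train_ratio=0.9):
    """Split a chronologically ordered series into the four sets used in the experiments.

    series       : list of observations, oldest first
    crisis_start : index of the first crisis-affected observation (assumption)
    """
    pre = series[:crisis_start]
    cut = int(len(pre) * train_ratio)          # 9:1 split of the unaffected data
    trainset1, testset1 = pre[:cut], pre[cut:]
    post = series[crisis_start:]
    trainset2 = testset1 + post[:-test2_size]  # testset1 plus the early crisis period
    testset2 = post[-test2_size:]              # the most recent observations
    return trainset1, testset1, trainset2, testset2
```

No shuffling is applied, so every test set lies strictly after the data used to train the corresponding model.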

4.1.2. Single Model Tuning

The Bayesian-MLP constructed in this experiment has two hidden layers, because preliminary experiments showed that more hidden layers led to obvious over-fitting. The ‘units’ of the output layer is set to 1 to match the target output. Similarly, the LSTM+Dense model consists of an LSTM layer, a DenseFlipout layer, and an output layer. The remaining hyperparameters of these two models (learning rate, activation function, batch size, and number of units in each layer) are determined through pre-experiments, as is the learning rate of the GPR model.
Pre-experiments are performed on the training set data. With the help of sktime’s Expanding Window Splitter, five training sequences of increasing length are generated, with the size of the validation set kept consistent with that of the test set. Because a Bayesian model may learn different weights even with the same hyperparameters, each set of parameters is trained five times, and hyperparameters are selected based on the average performance. The final selected hyperparameter combinations are {units = 20, batch = 8, learning rate = 0.005, activation function = ReLU} for Bayesian-MLP, {units = 64, 20, batch = 8, learning rate = 0.005, activation function = ReLU} for LSTM+Dense, and {learning rate = 0.01} for GPR.
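
The validation scheme described above can be illustrated with a plain-Python stand-in. This sketch does not call sktime; it only mimics the idea of an expanding training window followed by a fixed-size validation window, and averages a score over splits and repeated trainings. The names `expanding_window_splits`, `average_score` and the `evaluate` callback are hypothetical.

```python
def expanding_window_splits(n_obs, val_size, n_splits=5):
    """Yield (train_idx, val_idx) pairs with a growing training window.

    The validation window always holds val_size points and directly follows
    the training data, so no future information leaks into training.
    """
    first_train_end = n_obs - n_splits * val_size
    if first_train_end <= 0:
        raise ValueError("series too short for the requested splits")
    for k in range(n_splits):
        train_end = first_train_end + k * val_size
        yield list(range(train_end)), list(range(train_end, train_end + val_size))


def average_score(evaluate, params, splits, n_repeats=5):
    """Average a validation score over all splits and n_repeats trainings per split.

    evaluate(params, train_idx, val_idx, seed) is a user-supplied callback that
    trains one model and returns its validation score.
    """
    scores = [evaluate(params, tr, va, seed) for tr, va in splits for seed in range(n_repeats)]
    return sum(scores) / len(scores)
```

Repeating each configuration with different seeds reflects the point made in the text that a Bayesian model may learn different weights from the same hyperparameters.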
The forecasting results will be shown in Section 4.2.

4.2. Comparative Analysis

4.2.1. Comparisons with Different Input Variables

To validate the role of search engine data, an experiment is performed on the proposed combined model. The input variables are tested in four cases, namely, “historical data + Baidu Index”, “historical data + Google Trends”, “historical data + both search engines”, and simply “historical data”. The results are shown in Figure 5, and the evaluation criteria of the comparative experiment are shown in Table 1.
In the first forecasting experiment, all four models produce intervals that fully contain the observed values. Judging by the width of the generated intervals, the Baidu Index model performs best, followed by the combined search engine model. In the second forecasting experiment, only the combined search engine model has a PICP value of 1, followed by the Google Trends model, while the Baidu Index model and the historical data model tend to produce intervals that lie below the observed values. Before the COVID-19 outbreak, the arrival of tourists in Macao was cyclical, with mainland China being the main source of tourists. Therefore, the Baidu Index can better reflect the changes in the number of tourists. During the period of the COVID-19 shock, the epidemic situation in the mainland was under control, and the overseas epidemic situation fluctuated greatly. Google Trends can better reflect this volatility.
Compared with the results obtained using data from a single search engine, the combined search engine data yield more accurate results in the second forecasting experiment, where they better handle large fluctuations in tourist arrivals. Therefore, the use of combined source search engine data can effectively improve the performance of the tourist forecast after the crisis outbreak.

4.2.2. Comparisons with Other Interval Generation Models

The proposed approach is compared with the three single interval prediction models and ARIMAX. The ARIMAX model performs well in tourism demand forecasting and is often used as a benchmark model. Comparison with three single interval forecasting models can demonstrate whether the combined model improves the predictive power of the forecasting model.
After obtaining the parameter combinations, the three single models are trained on trainset1 and then used to forecast testset1. The models are then retrained on trainset2 and used to forecast testset2. Figures 6–10 present the forecasting intervals generated by the five models. Table 2 shows the comparative results for the PICP, MPIW, MPIW.cap, and Loss evaluation criteria.
According to Table 2 and Figure 11, from the perspective of the comprehensive index Loss, the combined model ranks second in the first forecasting experiment and first in the second forecasting experiment. It is worth noting that only the combined model and ARIMAX output prediction intervals that fully contain the observations in both forecasts. However, in the second forecasting experiment, the prediction intervals of the combined model are narrower than those of ARIMAX. In the first forecasting experiment, all models except Bayesian-MLP output prediction intervals that fully encompass the observations. Among them, LSTM+Dense performs the best, generating the narrowest prediction intervals while still containing the observations. But in the second forecasting experiment, the PICP value of LSTM+Dense is only 0.75, ranking second from last. The GPR model outputs forecast intervals of similar width in the two forecasts, but has the lowest PICP value in the second forecasting experiment. To conclude, the proposed combined model can provide narrow prediction intervals that contain as many observations as possible.

4.2.3. Comparisons with Point Forecasting Models

The median value of the prediction interval can be used as a point forecasting value, so the point forecasting result of the proposed model is compared with other point forecasting models and the three single forecasting models. ARIMAX and SVR are selected as the benchmark models for point forecasting. The SVR model is built with the hyperparameters {kernel = rbf, C = 100, gamma = 0.01} for the purpose of comparison.
The forecasting results and mean square error (MSE) of the above six models are shown in Figure 12 and Table 3, respectively. In the first forecasting experiment, the point forecasting performance of the proposed combined model is slightly worse than that of several models. In the second forecasting experiment, except for the GPR model, the MSE differences between the point forecasting models are small. Therefore, the Diebold–Mariano (DM) test is used to compare the prediction performance of the proposed model with that of the other point forecasting models, and the results are shown in Table 3. The results show no significant difference between the proposed model and the other point forecasting models. This indicates that the proposed model retains competitive point prediction ability when facing the great volatility caused by the crisis.
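
The point-forecast comparison can be sketched as follows. This is a simplified illustration, not the authors' code: the midpoint of the bounds stands in for the median of the interval (which assumes a symmetric interval), and `dm_statistic` is the basic one-step-ahead Diebold–Mariano statistic under squared-error loss, without any small-sample or autocorrelation correction. All function names are hypothetical.

```python
import math


def midpoints(lower, upper):
    """Point forecasts taken as the centre of each prediction interval."""
    return [(l + u) / 2.0 for l, u in zip(lower, upper)]


def mse(y, pred):
    """Mean square error of a point forecast."""
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)


def dm_statistic(y, pred_a, pred_b):
    """Diebold-Mariano statistic for one-step-ahead forecasts, squared-error loss.

    Negative values mean pred_a has the smaller average loss. Under the null of
    equal accuracy the statistic is approximately standard normal.
    """
    d = [(a - v) ** 2 - (b - v) ** 2 for v, a, b in zip(y, pred_a, pred_b)]
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((x - d_bar) ** 2 for x in d) / n
    if var_d == 0:
        raise ValueError("loss differentials are constant; statistic undefined")
    return d_bar / math.sqrt(var_d / n)
```

With test sets of only a few months, the normal approximation behind the DM test is rough, so values from such a sketch are best read as indicative.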

5. Conclusions

To address the challenges that public crises bring to tourism demand forecasting, a novel combined interval forecasting model integrating Bayesian models and an entropy weight method is developed. Furthermore, the selection of search engine data is investigated and incorporated into the proposed model to obtain reliable tourism demand forecasts. The empirical study shows that Baidu Index predicts tourism demand better before the public crisis, while Google Index models short-term fluctuations of tourism demand well when the crisis occurs. Another finding is that the combination of Baidu and Google Index yields the best predictive capability after the crisis outbreak. One contribution of this study is deriving flexible interval forecasting results with a combined model, which can help practitioners better cope with the uncertainties and fluctuations of public crises. Another novelty is showing how to choose appropriate search engine data to improve the performance of tourism demand forecasting during different stages of a public crisis, which is beneficial to crisis management in this sector.

5.1. Theoretical Implications

This study mainly makes the following theoretical contributions:
First, existing studies mainly focused on using combined data from one or two search engines together to forecast tourism demand in a stable period [47,48]. This study uses two mainstream search engines, Baidu and Google, to examine how different search engine data affect forecasting performance during different stages of a public crisis. Our findings demonstrate that search engines’ capabilities vary in capturing the trend of tourism demand: Baidu Index accurately reflects travel intentions under normal conditions, while Google Trends more effectively captures fluctuations under the impacts of public crises. This insight underscores the importance of integrating different sources of search engine data, to better reveal the actual tourism situations of crisis scenarios.
Second, various forecasting models for the tourism industry have been explored, mainly divided into point and interval prediction methods [29]. Most tourism demand forecasting is based on point prediction methods, which provide specific values at particular time points but offer limited insights with crisp values, overlooking uncertainties in practice [49]. Interval prediction methods, though offering a range of possible outcomes, may fail to capture rapid, short-term changes [2]. This study develops a combined Bayesian model for interval tourism forecasting, designed to generate more reliable prediction intervals. The entropy weight method with the forgetting curve is incorporated into the proposed model, which capitalizes on each model’s strengths and leads to more reliable forecasting results. By assimilating data from two mainstream search engines, this model also exhibits superior robustness compared to traditional forecasting models, especially in fitting and capturing the fluctuations under crisis situations.

5.2. Practical Implications

The findings of this study have significant implications for multiple agents in the tourism industry, such as consumers, DMOs, and governments, particularly in managing public crises like epidemics.
For consumers: The results derived from the proposed model offer tourists more reliable information on potential travel conditions and safety measures. For instance, during high-risk periods as indicated by upper-bound forecasts, tourists can make informed decisions to postpone or alter their travel plans. This was exemplified during the COVID-19 peak in Macao, where accurate predictions helped tourists adjust their plans in response to fluctuating travel conditions.
For DMOs and governments: This model serves as a vital tool for DMOs and governments in planning their operations and tourism promotion campaigns. During times of crisis, upper-bound data can guide the deployment of preventive measures and the planning of operational adjustments in tourist attractions. Lower-bound data, conversely, can inform strategies during recovery phases, such as government subsidies or collaborative efforts with non-profits to sustain the tourism industry. For DMOs, understanding the interval mean values helps in developing contingency plans and marketing strategies tailored to evolving pandemic situations.
Additionally, the study’s insights on the usage of search engine data highlight the need for tourist cities like Macao to closely monitor pandemic trends in key tourist-origin regions. This enables the formulation of targeted prevention measures and marketing strategies. The search engine data not only better capture the trend of tourism demand in the context of public crises, but also suggest that shifts in search trends could indicate changes in the tourism industry of destinations such as Macao, offering a novel perspective for tracking industry dynamics. In particular, the Baidu Index can help capture and reflect these dynamics.

5.3. Limitations and Future Research

Owing to the limitations of data collection, this study uses only historical arrivals and search engine data. In the future, multiple sources of data, such as meteorological and geographic data, need to be incorporated into the proposed model to improve the performance of tourism demand forecasting. Moreover, the proposed model should be compared with other types of existing hybrid models to further verify its performance. To provide more instructive strategies for real tourist destinations, applying the proposed method to the design of tourism promotion campaigns that improve tourism demand may be an interesting research topic from both academic and practical perspectives.

Author Contributions

Conceptualization, funding acquisition, methodology, writing—original draft, writing—review and editing, R.-X.N.; investigation, data curation, writing, C.W.; software, validation, visualization, H.-M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the General Project of the Philosophy and Social Sciences Fund for Colleges and Universities in Jiangsu Province (No. 2023SJYB1050).

Data Availability Statement

Data can be found at the link https://dataplus.macaotourism.gov.mo/.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Abram, N.J.; Henley, B.J.; Sen Gupta, A.; Lippmann, T.J.; Clarke, H.; Dowdy, A.J.; Sharples, J.J.; Nolan, R.H.; Zhang, T.; Wooster, M.J. Connections of climate change and variability to large and extreme forest fires in southeast Australia. Commun. Earth Environ. 2021, 2, 1–17.
  2. Gunter, U.; Smeral, E.; Zekan, B. Forecasting tourism in the EU after the COVID-19 crisis. J. Hosp. Tour. Res. 2024, 48, 909–919.
  3. Hu, M.; Li, H.; Song, H.; Li, X.; Law, R. Tourism demand forecasting using tourist-generated online review data. Tour. Manag. 2022, 90, 104490.
  4. Chen, J.; Ying, Z.; Zhang, C.; Balezentis, T. Forecasting tourism demand with search engine data: A hybrid CNN-BiLSTM model based on Boruta feature selection. Inf. Process. Manag. 2024, 61, 103699.
  5. Zhang, D.; Niu, B. Leveraging online reviews for hotel demand forecasting: A deep learning approach. Inf. Process. Manag. 2024, 61, 103527.
  6. Nie, R.; Chin, K.; Tian, Z.; Wang, J.; Zhang, H. Exploring dynamic effects on classifying service quality attributes under the impacts of COVID-19 with evidence from online reviews. Int. J. Contemp. Hosp. Manag. 2023, 35, 159–185.
  7. Wang, J.; Zhang, L.; Liu, Z.; Huang, X. Tourism demand interval forecasting amid COVID-19: A hybrid model with a modified multi-objective optimization algorithm. J. Hosp. Tour. Res. 2023, 10963480221142873.
  8. Song, H.; Wen, L.; Liu, C. Density tourism demand forecasting revisited. Ann. Tour. Res. 2019, 75, 379–392.
  9. Gómez-Déniz, E.; Martel-Escobar, M.; Vázquez-Polo, F. A Bayesian model for online customer reviews data in tourism research: A robust analysis. Cogent Bus. Manag. 2024, 11, 2363592.
  10. Wang, S.; Kang, Y.; Petropoulos, F. Combining probabilistic forecasts of intermittent demand. Eur. J. Oper. Res. 2024, 135, 1038–1048.
  11. Assaf, A.G.; Tsionas, M. Non-parametric regression for hypothesis testing in hospitality and tourism research. Int. J. Hosp. Manag. 2019, 76, 43–47.
  12. Wu, Q.; Law, R.; Xu, X. A sparse Gaussian process regression model for tourism demand forecasting in Hong Kong. Expert Syst. Appl. 2012, 39, 4769–4774.
  13. Panahi, F.; Ehteram, M.; Ahmed, A.N.; Huang, Y.F.; Mosavi, A.; El-Shafie, A. Streamflow prediction with large climate indices using several hybrid multilayer perceptrons and copula Bayesian model averaging. Ecol. Indic. 2021, 133, 108285.
  14. Mathonsi, T.; van Zyl, T.L. A statistics and deep learning hybrid method for multivariate time series forecasting and mortality modeling. Forecasting 2021, 4, 1–25.
  15. Kourentzes, N.; Saayman, A.; Jean-Pierre, P.; Provenzano, D.; Sahli, M.; Seetaram, N.; Volo, S. Visitor arrivals forecasts amid COVID-19: A perspective from the Africa team. Ann. Tour. Res. 2021, 88, 103197.
  16. Xue, G.; Liu, S.; Ren, L.; Gong, D. Forecasting hourly attraction tourist volume with search engine and social media data for decision support. Inf. Process. Manag. 2023, 60, 103399.
  17. Li, M.; Zhang, C.; Sun, S.; Wang, S. A novel deep learning approach for tourism volume forecasting with tourist search data. Int. J. Tour. Res. 2023, 25, 183–197.
  18. Lima, L.M.; Damien, P.; Bunn, D.W. Bayesian predictive distributions for imbalance prices with time-varying factor impacts. IEEE Trans. Power Syst. 2022, 38, 349–357.
  19. Sio-Chong, U.; So, Y. The impacts of financial and non-financial crises on tourism: Evidence from Macao and Hong Kong. Tour. Manag. Perspect. 2020, 33, 100628.
  20. Akamavi, R.K.; Ibrahim, F.; Swaray, R. Tourism and troubles: Effects of security threats on the global travel and tourism industry performance. J. Travel Res. 2023, 62, 1755–1800.
  21. Qiu, R.T.; Wu, D.C.; Dropsy, V.; Petit, S.; Pratt, S.; Ohe, Y. Visitor arrivals forecasts amid COVID-19: A perspective from the Asia and Pacific team. Ann. Tour. Res. 2021, 88, 103155.
  22. Zhang, H.; Song, H.; Wen, L.; Liu, C. Forecasting tourism recovery amid COVID-19. Ann. Tour. Res. 2021, 87, 103149.
  23. Florido-Benítez, L. Tourism promotion budgets and tourism demand: The Andalusian case. Consum. Behav. Tour. Hosp. 2024, 19, 310–322.
  24. Yu, N.; Chen, J. Design of machine learning algorithm for tourism demand prediction. Comput. Math. Methods Med. 2022, 2022, 6352381.
  25. Jiao, E.X.; Chen, J.L. Tourism forecasting: A review of methodological developments over the last decade. Tour. Econ. 2019, 25, 469–492.
  26. Tellez Gaytan, J.C.; Ateeq, K.; Rafiuddin, A.; Alzoubi, H.M.; Ghazal, T.M.; Ahanger, T.A.; Chaudhary, S.; Viju, G.K. AI-based prediction of capital structure: Performance comparison of ANN SVM and LR models. Comput. Intell. Neurosci. 2022, 2022, 8334927.
  27. Li, X.; Zhang, X.; Zhang, C.; Wang, S. Forecasting tourism demand with a novel robust decomposition and ensemble framework. Expert Syst. Appl. 2024, 236, 121388.
  28. Li, M.; Xu, T. Short and long term tourism demand forecasting based on Baidu search engine data. J. Humanit. Arts Soc. Sci. 2023, 7, 529–539.
  29. Santos, N.; Moreira, C.O. Uncertainty and expectations in Portugal’s tourism activities. Impacts of COVID-19. Res. Glob. 2021, 3, 100071.
  30. Tsang, W.K.; Benoit, D.F. Gaussian processes for daily demand prediction in tourism planning. J. Forecast. 2020, 39, 551–568.
  31. Li, G.; Wu, D.C.; Zhou, M.; Liu, A. The combination of interval forecasts in tourism. Ann. Tour. Res. 2019, 75, 363–378.
  32. Zhang, D.; Wu, P.; Wu, C.; Ngai, E.W.T. Forecasting duty-free shopping demand with multisource data: A deep learning approach. Ann. Oper. Res. 2024, 339, 861–887.
  33. Llewellyn, M.; Ross, G.; Ryan-Saha, J. COVID-era forecasting: Google trends and window and model averaging. Ann. Tour. Res. 2023, 103, 103660.
  34. De Luca, G.; Rosciano, M. Google Trends data and transfer function models to predict tourism demand in Italy. J. Tour. Futures 2024.
  35. Yang, Y.; Fan, Y.; Jiang, L.; Liu, X. Search query and tourism forecasting during the pandemic: When and where can digital footprints be helpful as predictors? Ann. Tour. Res. 2022, 93, 103365.
  36. Hu, T.; Wang, H.; Law, R.; Geng, J. Diverse feature extraction techniques in internet search query to forecast tourism demand: An in-depth comparison. Tour. Manag. Perspect. 2023, 47, 101116.
  37. You, W.; Huang, Y.; Lee, C. Forecasting tourist flows in the COVID-19 era using nonparametric mixed-frequency VARs. J. Forecast. 2024, 43, 473–489.
  38. Zhang, T.; Sun, J.; Trad, D.; Innanen, K. Multilayer perceptron and Bayesian neural network-based elastic implicit full waveform inversion. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4501516.
  39. Yue, X.; Ruiyu, L.; Zhenlin, L.; Li, Z. Attention-based dense LSTM for speech emotion recognition. IEICE Trans. Inf. Syst. 2019, 102, 1426–1429.
  40. Liu, Y.; Liu, B. Estimating unknown parameters in uncertain differential equation by maximum likelihood estimation. Soft Comput. 2022, 26, 2773–2780.
  41. Blei, D.M.; Kucukelbir, A.; McAuliffe, J.D. Variational inference: A review for statisticians. J. Am. Stat. Assoc. 2017, 112, 859–877.
  42. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  43. Averell, L.; Heathcote, A. The form of the forgetting curve and the fate of memories. J. Math. Psychol. 2011, 55, 25–35.
  44. Khosravi, A.; Nahavandi, S.; Creighton, D. A prediction interval-based approach to determine optimal structures of neural network metamodels. Expert Syst. Appl. 2010, 37, 2377–2387.
  45. Qiu, Y.; He, Z.; Zhang, W.; Yin, X.; Ni, C. MSGCN-ISTL: A multi-scaled self-attention-enhanced graph convolutional network with improved STL decomposition for probabilistic load forecasting. Expert Syst. Appl. 2024, 238, 121737.
  46. Moore, C.E.; Pan-Ngum, W.; Wijedoru, L.P.; Sona, S.; Nga, T.V.T.; Duy, P.T.; Vinh, P.V.; Chheng, K.; Kumar, V.; Emary, K.; et al. Evaluation of the diagnostic accuracy of a typhoid IgM flow assay for the diagnosis of typhoid fever in Cambodian children using a Bayesian latent class model assuming an imperfect gold standard. Am. J. Trop. Med. Hyg. 2014, 90, 114.
  47. Tian, F.; Yang, Y.; Mao, Z.; Tang, W. Forecasting daily attraction demand using big data from search engines and social media. Int. J. Contemp. Hosp. Manag. 2021, 33, 1950–1976.
  48. Havranek, T.; Zeynalov, A. Forecasting tourist arrivals: Google Trends meets mixed-frequency data. Tour. Econ. 2021, 27, 129–148.
  49. Adu, W.K.; Appiahene, P.; Afrifa, S. VAR, ARIMAX and ARIMA models for nowcasting unemployment rate in Ghana using Google trends. J. Electr. Syst. Inf. Technol. 2023, 10, 12.
Figure 1. The proposed framework of this study.
Figure 2. A location map of Macao.
Figure 3. Historical data of yearly tourist arrivals in Macao.
Figure 4. Keywords of search engine data.
Figure 5. The forecasting results of Baidu Index, Google Trends, and no search engine data.
Figure 6. Forecast results of the combination model.
Figure 7. Forecast results of ARIMAX.
Figure 8. Forecast results of Bayesian-MLP.
Figure 9. Forecast results of LSTM+Dense.
Figure 10. Forecast results of GPR.
Figure 11. Performance comparison among different interval prediction methods during COVID-19.
Figure 12. Point prediction results of different models.
Table 1. The evaluation results of Baidu Index, Google Trends, and no search engine data.
| Criterion | Baidu Index | Google Trends | Baidu + Google | Historical Data |
| --- | --- | --- | --- | --- |
| PICP-1 | 1 | 1 | 1 | 1 |
| PICP-2 | 0.75 | 0.875 | 1 | 0.625 |
| MPIW-1 | 0.8174 | 0.9533 | 0.8321 | 1.0988 |
| MPIW-2 | 1.2109 | 1.4176 | 1.3721 | 1.3217 |
| MPIW.cap-1 | 0.8174 | 0.9533 | 0.8321 | 1.0988 |
| MPIW.cap-2 | 1.2162 | 1.4231 | 1.3721 | 1.3434 |
| Loss-1 | 0.8174 | 0.9533 | 0.8321 | 1.0988 |
| Loss-2 | 4.6046 | 1.8912 | 1.3721 | 10.2421 |
Table 2. The evaluation of combined model and single interval forecasting model.
| Criterion | Combined Model | ARIMAX | Bayesian-MLP | LSTM+Dense | GPR |
| --- | --- | --- | --- | --- | --- |
| PICP-1 | 1 | 1 | 0.9 | 1 | 1 |
| PICP-2 | 1 | 1 | 1 | 0.75 | 0.5 |
| MPIW-1 | 0.8321 | 0.8995 | 0.7437 | 0.5562 | 1.2273 |
| MPIW-2 | 1.3721 | 3.0285 | 1.8366 | 1.0473 | 1.2855 |
| MPIW.cap-1 | 0.8321 | 0.8995 | 0.7437 | 0.5562 | 1.2273 |
| MPIW.cap-2 | 1.3721 | 3.0285 | 1.8366 | 1.0843 | 1.3237 |
| Loss-1 | 0.8321 | 0.8995 | 1.0068 | 0.5562 | 1.2273 |
| Loss-2 | 1.3721 | 3.0285 | 1.8366 | 4.4527 | 18.3764 |
Table 3. MSE and DM values of different models.
| Criterion | Combined Model | Bayesian-MLP | GPR | LSTM+Dense | ARIMAX | SVR |
| --- | --- | --- | --- | --- | --- | --- |
| MSE-1 | 0.0256 | 0.0628 | 0.0129 | 0.0158 | 0.0120 | 0.0122 |
| MSE-2 | 0.2161 | 0.2376 | 0.5970 | 0.2121 | 0.2181 | 0.2121 |
| DM value | --- | 0.6545 | 0.7667 | 0.9061 | 0.4554 | 0.7203 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Nie, R.-X.; Wu, C.; Liang, H.-M. Exploring Appropriate Search Engine Data for Interval Tourism Demand Forecasting Responding a Public Crisis in Macao: A Combined Bayesian Model. Sustainability 2024, 16, 6892. https://doi.org/10.3390/su16166892
