Article

Does Google Analytics Improve the Prediction of Tourism Demand Recovery?

by Ilsé Botha 1,* and Andrea Saayman 2
1 School of Accounting, Department of Accountancy, College of Business and Economics, University of Johannesburg, Aucklandpark Campus, Johannesburg 2006, South Africa
2 School of Economic Sciences and Tourism Research in Economics, Environs and Society (TREES), North West University, Potchefstroom Campus, Potchefstroom 2531, South Africa
* Author to whom correspondence should be addressed.
Forecasting 2024, 6(4), 908-924; https://doi.org/10.3390/forecast6040045
Submission received: 17 September 2024 / Revised: 11 October 2024 / Accepted: 16 October 2024 / Published: 18 October 2024

Abstract

Research shows that Google Trends indices can improve tourism-demand forecasts. Given the impact of the recent pandemic, this may prove to be an important predictor of tourism recovery in countries that are still struggling to recover, including South Africa. The purpose of this paper is, firstly, to build on previous research that indicates that Google Trends improves tourism-demand forecasting by testing this within the context of tourism recovery. Secondly, this paper extends previous research by including not only Google Trends in time-series forecasting models but also typical tourism-demand covariates in an econometric specification. Finally, we test the performance of Google Trends in forecasting over a longer time period, because the destination country is a long-haul destination where more lead time may be required in decision-making. Additionally, this research contributes to the body of knowledge by including lower-frequency data (quarterly) instead of the higher-frequency data commonly used in current research, while also focusing on an important destination country in Africa. Due to the differing data frequencies, the MIDAS modelling approach is used. The MIDAS models are compared to typical time-series and naïve benchmarks. The findings show that monthly Google Trends improve forecasts on lower-frequency data. Furthermore, pre-COVID, forecasts that include Google Trends are more effective in forecasting one to two quarters ahead. This trend changed after COVID, when Google Trends led to improved recovery forecasts even over a longer term.

1. Introduction

Recent years have seen an increase in the use of Big Data in forecasting studies, and tourism forecasting is no exception. According to [1], Big Data can be defined using the 4V concept—namely, big data is characterised by volume, variety, velocity and value. The rationale for using Big Data in tourism forecasting studies is based on the proposition that it reflects the preferences and decision-making processes of tourists [2]. According to [3], four types of Big Data can be identified in the tourism-demand literature, namely, web search data, online textual data, online photo data and social media data.
The most predominant type of Big Data found in tourism forecasting studies is web search data. Ref. [4] reports that web search data from the Google and Baidu search engines are most frequently employed, with [5] paving the way by using Google Trends data to forecast Hong Kong tourism. Since then, data from these search engines have been used to forecast not only tourism demand for countries and cities but also hotel demand [4]. The recent COVID-19 pandemic that disrupted tourism around the world caused a structural break in tourist arrivals to all countries. Given the impact that the pandemic had on the tourism industry, Big Data may prove to be an important predictor of tourism recovery in countries that are still struggling to recover. One such country is South Africa, where tourism demand is still well below its pre-pandemic level, having recovered just over 80% of that level by the end of 2023.
Arrivals from Africa are South Africa’s main tourism source markets, accounting for 75% of all tourist arrivals, with neighbouring countries accounting for the lion’s share. In terms of the remaining 25% of overseas arrivals (2019 figures), the country’s main tourism source markets are the UK (14.4%), the USA (14.2%), Germany (12.2%), France (6.3%), the Netherlands (5.5%), Australia (4.2%) and India (3.6%). In terms of recovery, arrivals from Africa have recovered quicker than overseas arrivals (84% versus 79%), underscoring the importance of improving recovery forecasting.
The purpose of this paper is, firstly, to build on previous research that indicates that Google Trends has the potential to improve tourism-demand forecasting by testing this within the context of tourism recovery after a shock. Secondly, this paper extends previous research by not only including Google Trends in time-series forecasting models but also including typical tourism-demand covariates in an econometric forecasting specification. Finally, we test the performance of Google Trends in forecasting over a longer time period because the destination country is a long-haul destination where more lead time may be required in decision-making.
The rationale for including Google Trends data in forecasting tourism arrivals can be found in the notion that potential tourists use the internet to plan their travel activities [5]. Therefore, the internet has become an important source of information for potential tourists and, according to [6], it signals tourists’ consumption preferences and informs their decision-making. It is therefore not surprising that [1] reports that forecasting using web search data is more prevalent in tourism than in any other subject area.
When considering only tourism demand (and not papers that focus only on hotel demand, which is a subset of tourism demand), web search data have been used to forecast demand for countries, regions/provinces, cities and attractions. In terms of country research, tourism to Spain was investigated by [7,8]. Ref. [9] forecasted tourism arrivals to South Korea, [10] to China and the USA, [11] to Austria, [12] to Germany, and [13] to China, Turkey and the US. Table 1 provides a brief overview of the country, web search engine, data frequency and estimation methods used in these studies.
In terms of regions/provinces/states, tourism to Hong Kong leads the way and has been studied by [4,5,14,15,16,17]. Ref. [2] forecasted tourism arrivals to Hainan, China, [18] to Macau, China, [19] to Taiwan, China, and [20] to the Caribbean. Table 2 provides a summary of these studies in terms of data frequency, search data and estimation method.
The research focussing on forecasting tourism to cities is substantial, with Beijing, China, the most popular city, studied by [21,22,23,24,25]. While most of these studies used the Baidu Index, refs. [23,25] used both the Baidu and Google Indices in their studies. Ref. [26] forecasted tourism to four Taiwan cities. For European cities, [27] forecasted tourist arrivals to Prague using Google Trends data, [28] tourism to Vienna and [29] forecasted tourism to Vienna and Barcelona. These studies are summarised in Table 3.
Table 1. Forecasting tourism to countries using web search data.

| Countries | Author | Web Search Engine | Data Frequency | Estimation Method |
|---|---|---|---|---|
| Spain | Artola et al. (2015) [7] | Google | Monthly | ARIMAX |
| Spain | Camacho and Pacce (2018) [8] | Google | Monthly | AR-X |
| South Korea | Park et al. (2016) [9] | Google | Monthly | ARIMAX |
| China | Lv et al. (2018) [10] | Google and Baidu | Weekly and Monthly | SEAN * regression |
| China | Wang et al. (2020) [13] | Google | Monthly | ANN-based, SARIMA |
| USA | Lv et al. (2018) [10] | Google and Baidu | Weekly and Monthly | SEAN * regression |
| USA | Wang et al. (2020) [13] | Google | Monthly | ANN-based, SARIMA |
| Turkey | Wang et al. (2020) [13] | Google | Monthly | ANN-based, SARIMA |
| Austria | Önder (2017) [29] | Google | Monthly | ADLM, Naïve, AR, Holt-Winters |
| Spain | Önder (2017) [29] | Google | Monthly | ADLM, Naïve, AR, Holt-Winters |
| Germany | Bokelman and Lessmann (2019) [12] | Google | Monthly | SARIMA, DLM * |

* SEAN = stacked autoencoder with echo-state (ANN-based); DLM = distributed lag model.
Table 2. Forecasting tourism to regions/provinces using web search data.

| Regions/Provinces | Author | Web Search Engine | Data Frequency | Estimation Method |
|---|---|---|---|---|
| Hong Kong | Gawlik et al. (2011) [14] | Google | Monthly | Weighted linear regression |
| Hong Kong | Choi and Varian (2012) [5] | Google | Monthly | AR-X |
| Hong Kong | Wen et al. (2019) [6] | Baidu | Monthly | ARIMA, ARIMAX, NAR, NARX, Hybrid |
| Hong Kong | Bai and Hao (2021) [15] | Baidu and Google | Monthly | Random Walk, ARIMAX, SVR, ANN, Two-step DB-ensemble DBN |
| Hong Kong | Li and Law (2020) [16] | Google | Monthly | AR, Empirical Mode Decomposition ARX |
| Hong Kong | Wen et al. (2021) [17] | Baidu | Monthly | Naïve, ETS, SARIMA, SARIMAX, MIDAS |
| Hong Kong | Xie et al. (2021) [4] | Baidu and Google | Monthly | ARIMA, BPNN, SVR, LSSVR, MA-LSSVR |
| Hainan, China | Yang et al. (2015) [2] | Baidu and Google | Monthly | ARIMAX |
| Macau, China | Hu and Song (2021) [18] | Google | Monthly | ANN |
| Taiwan | Huarng and Yu (2019) [19] | Google | Monthly | Algorithms |
| Caribbean | Bangwayo–Skeete and Skeete (2015) [20] | Google | Monthly | AR-MIDAS |
Tourism to specific attractions is also forecasted using search data with [30] forecasting tourism to the Miao Village in China, [31] forecasting demand for five London museums, [26,32] to Mount Siguniang, China, [33] to the Forbidden City, Beijing, and [21,34] to Jiuzhaigou, China.
Table 3. Forecasting tourism to cities using web search data.

| Cities | Author | Web Search Engine | Data Frequency | Estimation Method |
|---|---|---|---|---|
| Beijing | Li et al. (2017) [21] | Baidu | Monthly | ARMA, Dynamic Factor Model |
| Beijing | Li et al. (2018b) [22] | Baidu | Monthly | BPNN |
| Beijing | Sun et al. (2019) [23] | Baidu and Google | Monthly | Extreme Machine Learning (EML), ARIMA, ARIMAX, ANN, SVR, LSSVR |
| Beijing | Li et al. (2021) [24] | Baidu | Monthly | ARIMA, ARIMAX, ML |
| Beijing | Sun et al. (2022) [25] | Google | Monthly | SN, SARIMA, SES, ARDL, SARIMAX, MLP, B-MLP, KELM, B-KELM, and SAKE |
| Beijing | Wu et al. (2023) [35] | Baidu | Monthly | SARIMA-MIDAS, DFM, ETS, SNaive |
| Taiwan cities | Hu & Wu (2022) [26] | Google | Monthly | Grey models (AI) and combinations |
| Prague | Havranek and Zeynalov (2019) [27] | Google | Weekly and Monthly | MIDAS |
| Vienna | Önder and Günter (2016) [28] | Google | Monthly | ADLM, Naïve, AR, Holt-Winters |
| Vienna | Önder (2017) [29] | Google | Monthly | ADLM, Naïve, AR, Holt-Winters |
| Barcelona | Önder (2017) [29] | Google | Monthly | ADLM, Naïve, AR, Holt-Winters |
From the tables above, some clear patterns can be identified. Firstly, there is a paucity of studies focussing on forecasting tourism to Africa, with Asia and Europe attracting most of the research interest. Secondly, the studies mainly use monthly data with monthly forecasts, with some utilising weekly data as well. Although not shown in the tables, forecasting attraction demand mainly uses daily and weekly data. The reason for this can be found in the assertion that web search data may be helpful in predicting the present, in other words, for nowcasting [5]. Consequently, the current body of research tends to focus more on forecasting the near future than forecasting the longer term or using quarterly data.
Thirdly, the tables provide a brief overview of the various forecasting methods employed. When considering country tourism forecasts, two methods dominate the research, namely, artificial neural network (ANN)-based methods and time-series methods, specifically ARIMA-type models and their extension to include web search data (ARX and ARIMAX). The only exceptions are [12,29], who used autoregressive distributed lag and distributed lag models, where various lags of the web search index are included. Noteworthy is the fact that none of the studies include any typical tourism demand or economic variables in the specification.
The same trend is visible in the provincial/regional and city tourism forecasts. The most predominant forecasting methods remain time-series-based methods (SARIMA, AR, ARX, ARIMA and ARIMAX) and ANN methods, with machine learning methods also becoming popular. In addition, mixed data sampling (MIDAS) was used by [17,20,27,35]. Similar to the country studies, none of the research considers other typical tourism demand or economic variables in the forecasting models, except [25], who include price and income data, and [35], who include economic index data.
Only two papers consider web search data in forecasting tourism over the COVID-19 period. One paper [36] used hotel demand as the dependent variable and found that Google Trends does not improve the accuracy of demand recovery forecasts, but concluded that it might be more useful for longer-haul destinations. The second paper [35] forecasted Chinese tourism recovery to Hong Kong and found that web search data significantly enhance the recovery forecasting accuracy.
The current research addresses the gaps identified above by focussing on tourism to an African country, namely, South Africa, and specifically on forecasting tourism recovery after COVID-19. In addition, we use mixed data sampling with monthly Google Trends data combined with quarterly tourist arrivals, price and income data. We therefore forecast over a longer time horizon and assess the efficacy of search data in improving forecasts over the medium term. Additionally, this research contributes to the body of knowledge by including lower-frequency data (quarterly) instead of the higher-frequency data commonly used in current research, while also focusing on an important destination country in Africa.

2. Materials and Methods

As indicated, the objective of this research is to determine whether Google Trends data improve the prediction of tourism-demand recovery in light of the COVID pandemic that started in 2020.
The data used in this research were tourist arrivals for the main source markets to South Africa and GDP and CPI data on a quarterly basis from 2004 Q1 to 2023 Q4. The data were sourced from Statistics South Africa (arrivals data, https://www.statssa.gov.za/ accessed on 30 March 2024) and the OECD (https://www.oecd.org/en/data.html accessed on 30 March 2024) and IFS databases (https://data.imf.org/?sk=4c514d48-b6ba-49ed-8ab9-52b0c1a0179b accessed 30 March 2024) for the GDP and CPI data, respectively. The focus was on arrivals from the UK, the USA, Germany, France, the Netherlands, Australia and India. The Google Trends data (web search, travel category) for these source markets are available on a monthly basis from January 2004 to May 2024. In addition, another set of Google Trends data was added to determine whether news searches (all categories) on South Africa from source markets improved the forecasts. These data are available from January 2008 to May 2024.
The search query for the Google Trends data for web searches was South Africa under the travel category for each source market. The related queries that fall under the travel category for each source market as well as for the worldwide search are shown in the word clouds below.
From Figure 1, apart from the general travel searches (flights, accommodation, travel, costs, holiday), it is evident that the South African source markets were interested in certain tourist attractions, such as South African safaris, Cape Town, Johannesburg, game reserves and the Kruger National Park. The searches for the Indian source market were mainly in terms of cricket.
The scaled and normalised Google Trends data were used. Figure 2 plots the total overseas tourist arrivals and the Google web search data (monthly) to the destination country, South Africa. These figures show that there may be an association between the monthly search data and the monthly tourist arrivals, with the Google Trends series a leading indicator of tourist arrivals.
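The idea of a leading indicator can be illustrated with a simple lead-lag correlation. The sketch below is not part of the authors' method; the function names and the toy series are hypothetical, and it only shows how one might check informally whether a search index moves ahead of arrivals.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def lead_correlation(search, arrivals, lead):
    """Correlation between search[t] and arrivals[t + lead] (lead in periods)."""
    if lead == 0:
        return pearson(search, arrivals)
    return pearson(search[:-lead], arrivals[lead:])

# Hypothetical toy data: arrivals follow the search index two periods later.
search = [1, 3, 2, 5, 4, 6, 5, 8]
arrivals = [0, 0, 10, 30, 20, 50, 40, 60]

# Pick the lead (0 to 3 periods) with the highest correlation.
best_lead = max(range(4), key=lambda k: lead_correlation(search, arrivals, k))
```

A high correlation at a positive lead is only suggestive; it does not by itself establish predictive value, which is what the forecast comparisons in this paper assess.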
In terms of methods, the naïve model (without drift), an ARIMA (1,1) model optimised to convergence with the Berndt–Hall–Hall–Hausman algorithm, and the ARDL model were used as the benchmark models. These models were expanded into an AR-MIDAS-web search model, an AR-MIDAS-web and news search model, an ARDL-MIDAS-web search model and an ARDL-MIDAS-web and news search model. The MIDAS approach was appropriate because it is able to incorporate mixed frequencies [20,28,37]; in this case, monthly Google Trends data and quarterly arrivals, price and income data.
The Augmented Dickey–Fuller (ADF) test was used to determine the integration order of the time series. The breakpoint unit root test (ADF) was also used to account for the structural break in the data. The results rejected the null hypothesis of a unit root, and therefore, all the series were I(0).
The specification of the AR-MIDAS model for each source market is [20,28,37]:
$$TA_t = \alpha + \beta\, TA_{t-1} + \gamma \sum_{k=1}^{m} \omega_{k,\theta}\, L_{HF}^{k}\, GT_t + \varepsilon_t$$
In the model, $TA_t$ represents tourist arrivals, the low-frequency dependent variable (quarterly basis), and $TA_{t-1}$ represents the lag of tourist arrivals; these variables are transformed into logarithmic form. $GT_t$ represents the high-frequency independent variable on a monthly basis. $L_{HF}^{k}$ denotes the lag operator for monthly lags $k$ of $GT_t$, and $m$ represents the maximum lag order. This was set to $m = 4$, which means that four months of search queries have an impact on current tourist arrivals. $\omega_{k,\theta}$ represents the weighting function, which can take several functional forms (e.g., Almon exponential, Almon polynomial, beta) that determine the weight of the temporal aggregation of the high-frequency observations. $\alpha$, $\beta$ and $\gamma$ are the parameters to be estimated, and $\varepsilon_t$ is the random error term.
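To make the role of the weighting function concrete, the following is a minimal sketch of the MIDAS term under one common choice, the normalised two-parameter exponential Almon form. The text does not state which weighting form or parameter values were selected for each market, so the function names, the parameters `theta1` and `theta2`, and the toy search index are assumptions for illustration only.

```python
import math

def exp_almon_weights(theta1, theta2, m):
    """Normalised exponential Almon weights for monthly lags k = 1..m.

    Assumed form: w_k proportional to exp(theta1 * k + theta2 * k**2).
    """
    raw = [math.exp(theta1 * k + theta2 * k * k) for k in range(1, m + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def midas_term(gt_monthly, end_index, weights):
    """Weighted sum of the monthly observations preceding end_index.

    gt_monthly[end_index - k] stands in for the k-th monthly lag of GT_t.
    """
    return sum(w * gt_monthly[end_index - k]
               for k, w in enumerate(weights, start=1))

def ar_midas_prediction(alpha, beta, gamma, log_ta_lag,
                        gt_monthly, end_index, weights):
    """One-step AR-MIDAS fitted value for given (hypothetical) parameters."""
    return (alpha + beta * log_ta_lag
            + gamma * midas_term(gt_monthly, end_index, weights))

# With theta1 = theta2 = 0 the weights are equal, so the MIDAS term
# reduces to a simple average of the last m = 4 monthly values.
equal_weights = exp_almon_weights(0.0, 0.0, 4)
gt = [50.0, 60.0, 70.0, 80.0, 90.0]      # hypothetical monthly search index
term = midas_term(gt, 4, equal_weights)  # uses gt[3], gt[2], gt[1], gt[0]

# A negative theta2 puts more weight on the most recent months.
declining_weights = exp_almon_weights(0.0, -0.1, 4)
```

In estimation, the weight parameters are fitted jointly with the regression coefficients, which is what allows a few parameters to summarise many high-frequency lags.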
As indicated above, two AR-MIDAS models were estimated for each source market, one including the Google Trends index (web search) under the travel category (MIDAS_W) and another including both the web and news search Google Trends indices (MIDAS_WN). The optimal specification for each country was determined using the Schwarz Criterion.
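Selection by the Schwarz Criterion can be sketched as follows. The code assumes the Gaussian least-squares form of the criterion, n·ln(RSS/n) + k·ln(n); statistical packages may report a rescaled version, but the ranking of models fitted to the same sample is the same. The residuals and parameter counts are hypothetical and are not taken from the paper.

```python
import math

def schwarz_criterion(residuals, n_params):
    """Schwarz (Bayesian) criterion, assumed form: n*ln(RSS/n) + k*ln(n)."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    return n * math.log(rss / n) + n_params * math.log(n)

def select_by_schwarz(candidates):
    """Return the candidate name with the smallest criterion value.

    candidates maps a model name to (residuals, number of parameters).
    """
    return min(candidates,
               key=lambda name: schwarz_criterion(*candidates[name]))

# Hypothetical residuals: the larger model fits slightly better but pays
# a larger penalty for its two extra parameters.
candidates = {
    "MIDAS_W": ([0.10, -0.20, 0.15, -0.05, 0.10, -0.10], 4),
    "MIDAS_WN": ([0.09, -0.19, 0.15, -0.05, 0.10, -0.10], 6),
}
chosen = select_by_schwarz(candidates)
```

The criterion penalises additional parameters more heavily as the sample grows, so a richer specification is preferred only when its fit improves enough to offset the penalty.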
The alternative econometric model for tourism demand is the ARDL model:
$$TA_t = \alpha_0 + \sum_{i=1}^{p} \alpha_1\, TA_{t-i} + \sum_{j=0}^{q} \alpha_2\, Y_{t-j} + \sum_{j=0}^{q} \alpha_3\, P_{t-j} + \epsilon_t$$
where $TA_t$ is arrivals, $Y_t$ is income of the origin country, $P_t$ is relative prices and $\epsilon_t$ is an i.i.d. white-noise error term. Because inflation measures the cost of living in the destination country, the variable $P_t$ is determined as follows:
$$P_t = \frac{CPI_{jt}}{CPI_{it}}$$
where $CPI_j$ is the consumer price index of the origin country, and $CPI_i$ is the consumer price index of the destination country.
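As a small illustration of how the ARDL regressors could be prepared, the sketch below computes the relative price ratio exactly as defined above and builds rows of current and lagged values. The function names and the CPI figures are hypothetical.

```python
def relative_price(cpi_origin, cpi_destination):
    """Relative price P_t = CPI_jt / CPI_it, following the definition above."""
    if len(cpi_origin) != len(cpi_destination):
        raise ValueError("CPI series must have the same length")
    return [o / d for o, d in zip(cpi_origin, cpi_destination)]

def lagged_rows(series, max_lag):
    """For each t >= max_lag, return [x_t, x_{t-1}, ..., x_{t-max_lag}]."""
    return [[series[t - j] for j in range(max_lag + 1)]
            for t in range(max_lag, len(series))]

# Hypothetical quarterly CPI indices (origin and destination country).
cpi_origin = [100.0, 102.0, 104.0]
cpi_destination = [100.0, 105.0, 110.0]
p = relative_price(cpi_origin, cpi_destination)

# Current value and one lag, as used for the distributed-lag terms with q = 1.
rows = lagged_rows([1, 2, 3, 4], 1)
```

Each row lines up the current and lagged values of one regressor for a single quarter, so the first `max_lag` observations are lost from the estimation sample.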
The ARDL model is expanded to include Google Trends data, and therefore, two ARDL-MIDAS models are estimated for each country—one with the monthly web search Google Trends index (ARDL_M_W) and another with both the web and news search Google Trends indices (ARDL_M_WN) [20,28,37]:
$$TA_t = \alpha + \beta\, TA_{t-1} + \sum_{j=0}^{q} \alpha_2\, Y_{t-j} + \sum_{j=0}^{q} \alpha_3\, P_{t-j} + \gamma \sum_{k=1}^{m} \omega_{k,\theta}\, L_{HF}^{k}\, GT_t + \varepsilon_t$$
In the model, $TA_t$ represents tourist arrivals, the low-frequency dependent variable (quarterly basis), $TA_{t-1}$ represents the lag of tourist arrivals, $Y_t$ is income of the origin country and $P_t$ is relative prices. $GT_t$ represents the high-frequency independent variable/s on a monthly basis. $L_{HF}^{k}$ denotes the lag operator for monthly lags $k$ of $GT_t$, and $m$ represents the maximum lag order.
Given the focus on how the inclusion of Google searches improves the predictive power during a recovery period, the forecasts were evaluated pre- and post-COVID. Rolling forecasts were used due to the structural break in the data. A rolling window of 30 observations was used. The pre-COVID forecast evaluation was from 2014Q4 to 2019Q4, and the post-COVID forecast evaluation was from 2022Q1 to 2023Q4, due to most travel restrictions having been lifted in late 2021 or early 2022. The MAPE and RMSE methods were used to evaluate the forecasts on h = 1, 2, 4, 8 and 12, with four quarters being one year.
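The evaluation procedure can be sketched for the naïve benchmark as follows. This is a simplified illustration rather than the authors' code: the function names and the toy series are hypothetical, and the window here is 3 observations instead of 30 to keep the example short.

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a)
                       for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2
                         for a, f in zip(actual, forecast)) / len(actual))

def rolling_naive_forecasts(series, window, h):
    """Naive (no-drift) h-step forecasts from each rolling window.

    The forecast made from a window is its last observed value; it is
    paired with the value observed h periods after the window ends.
    """
    actual, forecast = [], []
    for start in range(len(series) - window - h + 1):
        end = start + window              # one past the last in-window index
        forecast.append(series[end - 1])
        actual.append(series[end - 1 + h])
    return actual, forecast

# Hypothetical quarterly arrivals (in thousands).
series = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]
actual, forecast = rolling_naive_forecasts(series, window=3, h=1)
errors = {"MAPE": mape(actual, forecast), "RMSE": rmse(actual, forecast)}
```

Repeating this for each horizon and each model gives the kind of comparison grid on which the best model per source market and horizon is identified. A model-based forecast would replace the last-value rule with a model re-estimated on each window.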

3. Results

The forecast evaluations between the benchmark models and the models including Google Trends indices (i.e., MIDAS_W, MIDAS_WN, ARDL_M_W, ARDL_M_WN) are shown in Table 4 and Table 5 below.
For the pre-COVID forecast evaluations in Table 4, the results are as follows:
Models with Google Trends as an additional explanatory variable outperformed the benchmarks on three forecast horizons for the US source market (h = 2, 8, 12). For forecasts over a longer forecasting horizon (h = 8, 12), adding the news search query as a further explanatory variable improved the forecasts. This may be an indication that, prior to COVID, news from South Africa influenced longer-term travel decisions (i.e., 8 to 12 quarters ahead) through changes in perceptions of the country.
For the UK, the MIDAS_ARDL models outperformed on h = 1, 2 and 4, and for h = 1, 2, the addition of the news search improved the forecast. Because the UK is traditionally South Africa’s most important international source market, Google searches may be more related to logistical travel concerns (see Figure 1) than perceptions about the country. A similar trend is visible for tourists from the Netherlands, which is also traditionally an important source market for South Africa and was a previous coloniser. The MIDAS_ARDL models outperformed on all five forecast horizons, with the addition of the news search variable improving h = 1, 2.
For the German source market, the MIDAS_ARDL with the additional news search variable outperformed on the shorter forecast horizons, h = 1, 2, which is more in line with other research that shows the significance of Google Trends data in shorter-term forecasts. For the Indian source market, the MIDAS with the web search variable outperformed on h = 8, which is more aligned with the results from the US source market and may be an indication that Google was used in the decision whether or not to travel to South Africa pre-COVID.
Contrary to the other source markets, the benchmark models outperformed the MIDAS models on all forecast horizons for the French and Australian source markets. This shows that prior to COVID, the underlying trends in the data were sufficient in forecasting future tourism from these two source markets.
Table 5 shows the forecast evaluations for the post-COVID period after the structural break due to the COVID pandemic. The results show the following:
For the US source market, the MIDAS web search models outperformed on all forecast horizons, and the addition of the news search improved h = 4. Post-COVID, web search data therefore became a more important source of information on which this market based their travel decisions in both the short- and long-term.
Similar to the results for the US, for the UK source market, the MIDAS models outperformed on all forecast horizons except for h = 12. The MIDAS_ARDL with the web and news search variables outperformed on h = 1, 2, and for the longer horizons (h = 4, 8), the MIDAS web search models outperformed. Compared to pre-COVID, web search data became important also in longer-term travel forecasts, which may be an indication of the increased uncertainty created by the COVID pandemic.
The MIDAS web and news model outperformed on all forecast horizons for the Netherlands source market, which is similar to the pre-COVID results for this market. Post-COVID, however, the inclusion of economic covariates did not improve the forecasting performance for tourist arrivals from the Netherlands.
For Germany, the MIDAS models outperformed on all forecast horizons; MIDAS_ARDL with web and news search outperformed on the shorter horizons (h = 1, 2, 4) and the MIDAS_WN for h = 8 and MIDAS_W for h = 12. This result may again indicate that, similar to the UK, web search data also became more important in the longer term due to the increased uncertainty caused by the pandemic. However, for India, MIDAS with web search outperformed on h = 2, 4, showing an improvement in short-term forecasts rather than the pre-COVID longer-term forecasts (h = 8).
In contrast to the pre-COVID forecast evaluation, the addition of Google Trends data did improve the forecasts for the French and Australian source markets. For the French source market, the MIDAS models outperformed on all forecast horizons except h = 2, and the addition of the news search variable improved the longer forecast horizons h = 8, 12. For Australia, the MIDAS_ARDL outperformed on h = 8, 12; the addition of the news search variable improved h = 8. This again highlights the increased importance of web search data for information about South Africa after the pandemic.
Figure 3 and Figure 4 compare the forecasting performance of the various models over the different forecasting horizons. Figure 3 shows that pre-COVID, with h = 1, 2, the MIDAS_ARDL with the Google Trends web and news search indices was the best model in three of the seven instances, with the ARIMA model and ARDL model in second and third places. Over the longer term, the naïve model was difficult to outperform, and only the MIDAS_ARDL model that included the web search Google index appeared as the best model in two instances over h = 4 and once when h = 8, 12.
Figure 4 shows the same figure but after COVID-19, and it is evident that the models that include Google search data outperformed the other models consistently. For example, over h = 1, 2, 8, 12, five out of the seven winning forecasts included Google Trends data. For h = 4, this improved to six out of the seven forecasts.
When one compares the models that included economic variables (ARDL-based models) with the pure time-series models pre-COVID, the ARDL models as a group fared better in forecasting tourism to South Africa in the shorter term, i.e., one- and two-quarters. Over one to three years, the time series models outperformed. Post-COVID, this trend continued, but instead of the naïve model faring well for the time-series methods, the MIDAS models that included web search data were the clear winners.

4. Discussion and Conclusions

This research set out to answer three questions, and the results are consequently considered within the context of the questions.
Firstly, does Google Trends have the potential to improve tourism-demand forecasting during recovery after a shock? Overall, the results show that the MIDAS models that included Google Trends data outperformed the benchmarks in 40% (14/35) of the pre-COVID forecast models, and post-COVID, this percentage increased to 77% (27/35). This shows that adding Google Trends data as an explanatory variable does improve forecasts during recovery. These results are in line with the findings of previous research that show that search engine data improve tourism-demand forecasts. Specifically, they confirm research by [35] that web search data improve tourism recovery forecasts, contradicting results by [36].
Secondly, does the inclusion of typical tourism-demand covariates in an econometric forecasting specification together with Google Trends data lead to improved forecasts? To test this, MIDAS-ARDL models were estimated, and the results show that in the majority of forecasts over all time horizons pre- and post-COVID, the inclusion of Google Trends data improved the forecasting accuracy of the ARDL model.
Compared to time-series forecasts, the ARDL-based models forecast better over the shorter term: for one- and two-quarter-ahead forecasts, both pre- and post-COVID, they were the best forecasts in more instances than not. It is noteworthy that the ARDL-based models were never the best forecasting model for the Indian and Australian source markets pre-COVID. Post-COVID, this situation changed for these source markets, and these models delivered the most accurate forecasts, especially over the longer term. In contrast, the MIDAS-ARDL models were always the best in forecasting tourist arrivals from the Netherlands in normal (non-recovery) times.
Finally, can Google Trends improve forecasting over a longer time period? Most research on Google Trends and tourism demand is done on monthly or higher-frequency data [24]. This research considers quarterly forecasts and forecasts for up to three years (12 quarters). The rationale for this can be found in the reality that South Africa is not a well-known destination, and the country is a long-haul destination where more lead time may be required in decision-making.
This research shows that monthly Google Trends also improves forecasts on lower-frequency data (quarterly) using a mixed data-sampling framework. The results show that, pre-COVID, forecasts that included Google Trends were more effective in forecasting one to two quarters ahead than further into the future. This trend changed after COVID, when Google Trends led to improved recovery forecasts even over the longer term (4–12 quarters ahead). Additionally, this research incorporates web and/or news search indices as suggested by [29]. The addition of the news search variable improved on the forecasts with the web search data in 43% (6/14) of cases pre-COVID and in 59% (16/27) of cases post-COVID.
In conclusion, this paper makes a significant contribution to current research by showing the significance of Google Trends data in (i) improving tourism-forecasting recovery after a shock to the tourism industry, (ii) forecasting tourism using low-frequency data and (iii) forecasting over a longer time horizon. It is also the first time that Google Trends data have been used in forecasting tourism to an African destination, which opens up future research avenues to explore in terms of tourism forecasting in African destinations.
Our research is, however, not without limitations. This research focuses only on one destination, namely, South Africa, and only models intercontinental arrivals, even though neighbouring countries are South Africa’s main tourism source markets. This is due to the limited availability of Google Trends data and also, in some instances, economic data from these countries. In fact, even the available Google Trends search data are sparse, which restricts what can be done with them and indicates that South Africa is a lesser-known destination that attracts fewer international internet searches. However, even with these limitations, our research shows that the inclusion of Big Data such as Google Trends can enhance tourism forecasting, even for lesser-known destinations. Future research could expand the current research by including other Big Data sources that would allow the use of, among others, sentiment analysis. Further research is also needed to explain why forecasts that include Google Trends work better post-COVID than pre-COVID. Additionally, it would be interesting to verify whether forecasts that include Google indices improve long-term forecast horizons rather than short-term forecast horizons for other long-haul destinations as well.

Author Contributions

Conceptualization, A.S. and I.B.; methodology, I.B.; validation, A.S.; formal analysis, I.B.; investigation, A.S.; writing (original draft preparation), A.S.; writing (review and editing), A.S. and I.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Secondary data were used, and the data are publicly available. The data were sourced from Statistics South Africa (arrivals data, https://www.statssa.gov.za/ accessed on 30 March 2024) and the OECD (https://www.oecd.org/en/data.html accessed on 30 March 2024) and IFS databases (https://data.imf.org/?sk=4c514d48-b6ba-49ed-8ab9-52b0c1a0179b accessed on 30 March 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, J.; Xu, L.; Tang, L.; Wang, S.; Li, L. Big data in tourism research: A literature review. Tour. Manag. 2018, 68, 301–323. [Google Scholar] [CrossRef]
  2. Yang, X.; Pan, B.; Evans, J.A.; Lv, B. Forecasting Chinese tourist volume with search engine data. Tour. Manag. 2015, 46, 386–397. [Google Scholar] [CrossRef]
  3. Wu, D.C.; Zhong, S.; Qiu, R.T.; Wu, J. Are customer reviews just reviews? Hotel forecasting using sentiment analysis. Tour. Econ. 2021, 28, 795–816. [Google Scholar] [CrossRef]
  4. Xie, G.; Li, X.; Qian, Y.; Wang, S. Forecasting tourism demand with KPCA-based web search indexes. Tour. Econ. 2021, 27, 721–743. [Google Scholar] [CrossRef]
  5. Choi, H.; Varian, H. Predicting the present with Google Trends. Econ. Rec. 2012, 88, 2–9. [Google Scholar] [CrossRef]
  6. Wen, L.; Liu, C.; Song, H. Forecasting tourism demand using search query data: A hybrid modelling approach. Tour. Econ. 2019, 25, 309–329. [Google Scholar] [CrossRef]
  7. Artola, C.; Pinto, F.; de Pedraza García, P. Can internet searches forecast tourism inflows? Int. J. Manpow. 2015, 36, 103–116. [Google Scholar] [CrossRef]
  8. Camacho, M.; Pacce, M.J. Forecasting travellers in Spain with google search volume indices. Tour. Econ. 2018, 24, 434–448. [Google Scholar] [CrossRef]
  9. Park, S.; Lee, J.; Song, W. Short-term forecasting of Japanese tourist inflow to South Korea using Google trends data. J. Travel Tour. Mark. 2016, 34, 357–368. [Google Scholar] [CrossRef]
  10. Lv, S.-X.; Peng, L.; Wang, L. Stacked autoencoder with echo-state regression for tourism demand forecasting using search query data. Appl. Soft Comput. 2018, 73, 119–133. [Google Scholar] [CrossRef]
  11. Önder, I.; Gunter, U.; Scharl, A. Forecasting Tourist Arrivals with the Help of Web Sentiment: A Mixed-frequency Modeling Approach for Big Data. Tour. Anal. 2019, 24, 437–452. [Google Scholar] [CrossRef]
  12. Bokelmann, B.; Lessmann, S. Spurious patterns in Google Trends data—An analysis of the effects on tourism demand forecasting in Germany. Tour. Manag. 2019, 75, 1–12. [Google Scholar] [CrossRef]
  13. Wang, L.; Wu, B.; Zhu, Q.; Zeng, Y.R. Forecasting Monthly Tourism Demand Using Enhanced Backpropagation Neural Network. Neural Process. Lett. 2020, 52, 2607–2636. [Google Scholar] [CrossRef]
  14. Gawlik, E.; Kabaria, H.; Kaur, S. Predicting tourism trends with Google Insights. Accessed Dec. 2011, 1, 2012. [Google Scholar]
  15. Bai, H.; Hao, H. A novel two-step procedure for tourism demand forecasting. Curr. Issues Tour. 2021, 24, 1199–1210. [Google Scholar]
  16. Li, X.; Law, R. Forecasting Tourism Demand with Decomposed Search Cycles. J. Travel Res. 2019, 59, 52–68. [Google Scholar] [CrossRef]
  17. Wen, L.; Liu, C.; Song, H.; Liu, H. Forecasting Tourism Demand with an Improved Mixed Data Sampling Model. J. Travel Res. 2021, 60, 336–353. [Google Scholar] [CrossRef]
  18. Hu, M.; Song, H. Data source combination for tourism demand forecasting. Tour. Econ. 2019, 26, 1248–1265. [Google Scholar] [CrossRef]
  19. Huarng, K.-H.; Yu, T.H.-K. Application of Google trends to forecast tourism demand. J. Internet Technol. 2019, 20, 1273–1280. [Google Scholar]
  20. Bangwayo-Skeete, P.F.; Skeete, R.W. Can Google data improve the forecasting performance of tourist arrivals? Mixed-data sampling approach. Tour. Manag. 2015, 46, 454–464. [Google Scholar] [CrossRef]
  21. Li, X.; Pan, B.; Law, R.; Huang, X. Forecasting tourism demand with composite search index. Tour. Manag. 2017, 59, 57–66. [Google Scholar] [CrossRef]
  22. Li, S.; Chen, T.; Wang, L.; Ming, C. Effective tourist volume forecasting supported by PCA and improved BPNN using Baidu index. Tour. Manag. 2018, 68, 116–126. [Google Scholar] [CrossRef]
  23. Sun, S.; Wei, Y.; Tsui, K.L.; Wang, S. Forecasting tourist arrivals with machine learning and internet search index. Tour. Econ. 2019, 70, 1–10. [Google Scholar] [CrossRef]
  24. Li, X.; Law, R.; Xie, G.; Wang, S. Review of tourism forecasting research with internet data. Tour. Manag. 2021, 83, 104245. [Google Scholar] [CrossRef]
  25. Sun, S.; Li, Y.; Guo, J.E.; Wang, S. Tourism demand forecasting: An ensemble deep learning approach. Tour. Econ. 2022, 28, 2021–2049. [Google Scholar] [CrossRef]
  26. Hu, Y.-C.; Wu, G. The impact of Google Trends index and encompassing tests on forecast combinations in tourism. Tour. Rev. 2022, 77, 1276–1298. [Google Scholar] [CrossRef]
  27. Havranek, T.; Zeynalov, A. Forecasting tourist arrivals: Google Trends meets mixed-frequency data. Tour. Econ. 2021, 27, 129–148. [Google Scholar] [CrossRef]
  28. Önder, I.; Gunter, U. Forecasting Tourism Demand with Google Trends For a Major European City Destination. Tour. Anal. 2016, 21, 203–220. [Google Scholar] [CrossRef]
  29. Önder, I. Forecasting tourism demand with Google trends: Accuracy comparison of countries versus cities. Int. J. Tour. Res. 2017, 19, 648–660. [Google Scholar] [CrossRef]
  30. Liu, Y.-Y.; Tseng, F.-M.; Tseng, Y.-H. Big Data analytics for forecasting tourism destination arrivals with the applied Vector Autoregression model. Technol. Forecast. Soc. Chang. 2018, 130, 123–134. [Google Scholar] [CrossRef]
  31. Volchek, K.; Liu, A.; Song, H.; Buhalis, D. Forecasting tourist arrivals at attractions: Search engine empowered methodologies. Tour. Econ. 2018, 25, 425–447. [Google Scholar] [CrossRef]
  32. Li, X.; Li, H.; Pan, B.; Law, R. Machine Learning in Internet Search Query Selection for Tourism Forecasting. J. Travel Res. 2020, 60, 1213–1231. [Google Scholar] [CrossRef]
  33. Huang, X.; Zhang, L.; Ding, Y. The Baidu Index: Uses in predicting tourism flows–A case study of the Forbidden City. Tour. Manag. 2017, 58, 301–306. [Google Scholar] [CrossRef]
  34. Peng, L.; Wang, L.; Ai, X.Y.; Zeng, Y.R. Forecasting Tourist Arrivals via Random Forest and Long Short-term Memory. Cogn. Comput. 2020, 13, 125–138. [Google Scholar] [CrossRef]
  35. Wu, J.; Li, M.; Zhao, E.; Sun, S.; Wang, S. Can multi-source heterogeneous data improve the forecasting performance of tourist arrivals amid COVID-19? Mixed-data sampling approach. Tour. Manag. 2023, 98, 104759. [Google Scholar] [CrossRef]
  36. Yang, Y.; Fan, Y.; Jiang, L.; Liu, X. Search query and tourism forecasting during the pandemic: When and where can digital footprints be helpful as predictors? Ann. Tour. Res. 2022, 93, 103365. [Google Scholar] [CrossRef]
  37. Li, H.; Hu, M.; Li, G. Forecasting tourism demand with multisource big data. Ann. Tour. Res. 2020, 83, 102912. [Google Scholar] [CrossRef]
Figure 1. Related search queries per source market.
Figure 2. Total arrivals vs. the Google Trend index.
Figure 3. Best forecasting models over different forecasting horizons pre-COVID.
Figure 4. Best forecasting models over different forecasting horizons, post-COVID.
Table 4. Forecast evaluation pre-COVID.
| Model (h) | US RMSE | US MAPE | UK RMSE | UK MAPE | NED RMSE | NED MAPE | GER RMSE | GER MAPE | FR RMSE | FR MAPE | AUS RMSE | AUS MAPE | IND RMSE | IND MAPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ARIMA (1) | 9340.14 | 7.59 | 34,115.51 | 34.38 | 12,110.48 | 35.69 | 36,647.78 | 40.43 | 9507.99 | 19.60 | 2741.76 | 7.59 | 5725.67 | 20.87 |
| MIDAS_W (1) | 10,766.81 | 9.14 | 24,671.46 | 26.06 | 12,188.95 | 33.10 | 32,639.61 | 35.37 | 11,797.02 | 22.52 | 3624.73 | 10.92 | 5920.53 | 25.04 |
| MIDAS_WN (1) | 12,570.69 | 10.96 | 23,483.75 | 23.08 | 12,578.82 | 32.57 | 34,803.48 | 38.16 | 15,109.35 | 29.65 | 3842.60 | 11.93 | 5736.33 | 22.98 |
| NAÏVE (1) | 11,312.74 | 11.72 | 45,592.31 | 44.69 | 14,677.95 | 42.07 | 45,848.42 | 59.32 | 12,342.69 | 28.88 | 4230.02 | 13.77 | 7741.25 | 31.09 |
| ARDL (1) | 9676.32 | 9.12 | 19,307.34 | 18.92 | 8109.22 | 19.00 | 18,601.41 | 21.30 | 5891.18 | 10.13 | 3757.25 | 11.78 | 5078.68 | 20.10 |
| ARDL_M_W (1) | 9339.28 | 7.64 | 16,136.01 | 16.16 | 7585.71 | 19.30 | 53,268.64 | 52.03 | 8634.47 | 16.25 | 3737.74 | 11.08 | 7047.20 | 27.69 |
| ARDL_M_WN (1) | 10,015.28 | 7.93 | 15,435.09 | 16.15 | 7709.16 | 16.71 | 17,344.43 | 13.56 | 22,164.22 | 28.09 | 3812.20 | 11.08 | 6036.20 | 21.98 |
| ARIMA (2) | 13,666.86 | 11.42 | 29,986.97 | 33.32 | 11,076.54 | 31.73 | 35,086.80 | 35.55 | 13,481.69 | 23.58 | 3211.15 | 9.53 | 5555.57 | 20.04 |
| MIDAS_W (2) | 12,900.84 | 11.23 | 23,974.93 | 25.88 | 11,484.43 | 30.81 | 30,017.28 | 31.95 | 13,326.65 | 24.73 | 3982.69 | 11.92 | 5645.99 | 23.86 |
| MIDAS_WN (2) | 14,087.51 | 12.68 | 22,081.21 | 23.01 | 12,545.68 | 32.63 | 32,485.73 | 34.13 | 17,467.34 | 35.62 | 3981.28 | 12.23 | 5552.10 | 21.50 |
| NAÏVE (2) | 15,332.95 | 15.37 | 56,297.86 | 64.21 | 17,667.42 | 50.52 | 63,392.45 | 93.90 | 17,701.98 | 44.00 | 5905.77 | 18.25 | 6767.45 | 26.49 |
| ARDL (2) | 17,798.92 | 16.40 | 16,086.84 | 16.80 | 8370.16 | 19.20 | 17,839.68 | 21.59 | 9279.06 | 17.43 | 4972.26 | 15.62 | 7165.35 | 28.58 |
| ARDL_M_W (2) | 16,626.00 | 13.55 | 15,385.83 | 16.06 | 7422.50 | 19.16 | 53,241.51 | 51.73 | 13,045.63 | 25.77 | 4692.26 | 14.43 | 8064.61 | 32.55 |
| ARDL_M_WN (2) | 15,815.72 | 15.12 | 15,063.49 | 15.82 | 7361.22 | 18.42 | 15,956.93 | 14.66 | 71,419.60 | 77.89 | 4653.94 | 13.86 | 8174.69 | 30.60 |
| ARIMA (4) | 15,041.06 | 13.22 | 31,730.99 | 35.79 | 11,271.36 | 31.74 | 36,487.34 | 36.62 | 13,912.48 | 24.49 | 3187.85 | 9.13 | 5364.08 | 19.89 |
| MIDAS_W (4) | 14,072.53 | 12.54 | 22,350.26 | 23.86 | 11,498.17 | 29.50 | 30,526.86 | 32.19 | 13,545.01 | 24.30 | 3994.05 | 12.00 | 4948.78 | 20.97 |
| MIDAS_WN (4) | 14,656.01 | 12.44 | 20,857.73 | 21.32 | 12,374.06 | 30.26 | 34,779.25 | 36.60 | 19,197.83 | 37.88 | 4077.77 | 12.67 | 5106.68 | 18.99 |
| NAÏVE (4) | 8654.72 | 8.26 | 14,617.32 | 15.96 | 6322.82 | 16.35 | 14,178.86 | 15.25 | 7528.58 | 14.18 | 3092.59 | 9.35 | 7111.56 | 28.08 |
| ARDL (4) | 23,459.99 | 21.95 | 16,149.71 | 17.43 | 8590.86 | 19.68 | 14,896.67 | 16.72 | 8759.78 | 17.50 | 4200.52 | 13.15 | 17,574.33 | 64.98 |
| ARDL_M_W (4) | 15,986.33 | 14.87 | 14,554.14 | 15.33 | 6162.75 | 16.35 | 52,555.51 | 52.22 | 11,153.65 | 21.93 | 4231.38 | 13.18 | 22,020.73 | 64.61 |
| ARDL_M_WN (4) | 16,091.54 | 15.94 | 14,535.31 | 16.01 | 7709.26 | 20.90 | 16,781.78 | 16.58 | 78,005.33 | 89.63 | 3880.18 | 11.86 | 16,796.86 | 53.97 |
| ARIMA (8) | 16,944.37 | 15.53 | 31,409.02 | 35.99 | 11,352.43 | 31.76 | 37,427.31 | 36.33 | 14,684.05 | 25.65 | 3343.95 | 9.50 | 5411.06 | 21.03 |
| MIDAS_W (8) | 15,112.25 | 13.20 | 24,396.88 | 26.68 | 11,995.02 | 30.46 | 28,868.17 | 31.35 | 14,274.77 | 24.73 | 3917.88 | 11.52 | 4833.94 | 19.50 |
| MIDAS_WN (8) | 13,391.50 | 10.63 | 21,959.86 | 21.39 | 13,155.43 | 31.97 | 31,306.56 | 31.53 | 22,547.10 | 43.17 | 3954.93 | 12.00 | 5345.51 | 22.11 |
| NAÏVE (8) | 15,111.98 | 15.02 | 18,064.61 | 17.32 | 6974.64 | 16.14 | 17,305.35 | 18.74 | 11,841.61 | 21.16 | 5904.45 | 16.99 | 10,265.24 | 41.23 |
| ARDL (8) | 43,885.71 | 42.17 | 23,986.18 | 22.03 | 8396.99 | 19.11 | 21,572.60 | 24.62 | 11,368.28 | 23.34 | 5191.58 | 15.65 | 215,099.40 | 449.20 |
| ARDL_M_W (8) | 40,544.77 | 38.24 | 21,176.89 | 22.66 | 5582.41 | 14.15 | 52,950.23 | 51.36 | 14,080.18 | 26.53 | 5071.77 | 15.89 | 500,417.80 | 694.51 |
| ARDL_M_WN (8) | 29,829.86 | 29.60 | 80,196.38 | 41.48 | 11,433.78 | 31.00 | 26,149.52 | 28.32 | 49,880.15 | 71.85 | 4690.12 | 15.13 | 140,245.80 | 297.99 |
| ARIMA (12) | 18,675.02 | 17.81 | 31,834.41 | 37.40 | 11,526.17 | 31.99 | 38,082.59 | 36.03 | 15,714.93 | 27.33 | 3645.85 | 10.09 | 4548.83 | 16.55 |
| MIDAS_W (12) | 16,041.41 | 14.24 | 25,010.19 | 27.77 | 12,295.17 | 31.63 | 30,165.89 | 32.40 | 15,616.38 | 27.07 | 3964.81 | 12.42 | 5531.31 | 23.40 |
| MIDAS_WN (12) | 13,208.65 | 11.35 | 23,090.43 | 21.47 | 13,363.16 | 36.34 | 33,954.87 | 32.96 | 349,789.80 | 354.50 | 3909.07 | 11.93 | 5127.93 | 22.78 |
| NAÏVE (12) | 17,332.90 | 17.63 | 20,790.71 | 21.15 | 8653.09 | 19.32 | 17,842.00 | 19.45 | 10,559.46 | 19.32 | 7311.32 | 21.62 | 13,147.40 | 48.62 |
| ARDL (12) | 51,931.92 | 50.10 | 29,942.77 | 28.08 | 11,499.01 | 22.48 | 34,454.61 | 29.65 | 11,704.10 | 22.03 | 5579.41 | 17.81 | 11,866,709.00 | 14,434.61 |
| ARDL_M_W (12) | 73,970.88 | 56.49 | 26,409.89 | 27.95 | 5518.42 | 13.76 | 53,219.89 | 52.96 | 12,182.12 | 23.13 | 5510.79 | 16.93 | 122,000,000.00 | 86,532.82 |
| ARDL_M_WN (12) | 53,446.39 | 51.02 | 1,063,032.00 | 237.46 | 20,931.14 | 63.09 | 41,463.31 | 39.27 | 34,217.98 | 62.11 | 13,027.68 | 27.84 | 2,936,087.00 | 3535.15 |
The MAPE values in bold indicate the best-performing model at each forecast horizon.
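As a reading aid for the forecast evaluation tables, the following minimal Python sketch shows how RMSE and MAPE are conventionally computed and how a best model can be picked by lowest MAPE. It assumes the standard textbook definitions (MAPE expressed in percent); the paper's exact computation is not restated here, and the function names and the numbers in the example are hypothetical, not taken from the study.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error, in the units of the series (e.g. arrivals)."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is non-zero, otherwise the ratio is undefined.
    """
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


def best_model_by_mape(actual, forecasts):
    """Return (name of the model with the lowest MAPE, dict of all MAPE scores)."""
    scores = {name: mape(actual, f) for name, f in forecasts.items()}
    return min(scores, key=scores.get), scores


# Hypothetical illustration only: made-up arrivals and two made-up forecasts.
actual = [100.0, 200.0, 400.0]
forecasts = {
    "model_a": [110.0, 180.0, 400.0],
    "model_b": [100.0, 150.0, 300.0],
}
best, scores = best_model_by_mape(actual, forecasts)
print(best, {name: round(score, 2) for name, score in scores.items()})
```

Because MAPE is scale-free, it allows comparison across source markets of very different size, whereas RMSE is expressed in arrivals and is only comparable within one market.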
Table 5. Forecast evaluation post-COVID.
| Model (h) | US RMSE | US MAPE | UK RMSE | UK MAPE | NED RMSE | NED MAPE | GER RMSE | GER MAPE | FR RMSE | FR MAPE | AUS RMSE | AUS MAPE | IND RMSE | IND MAPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ARIMA (1) | 18,538.82 | 20.62 | 25,257.54 | 30.48 | 9285.18 | 27.93 | 16,942.56 | 31.81 | 4066.95 | 14.39 | 5135.67 | 23.67 | 4390.72 | 17.07 |
| MIDAS_W (1) | 11,535.91 | 11.64 | 28,700.64 | 24.16 | 8642.96 | 28.27 | 30,763.67 | 36.79 | 4831.81 | 14.03 | 4552.72 | 25.02 | 4633.43 | 23.60 |
| MIDAS_WN (1) | 17,347.02 | 23.59 | 27,083.50 | 27.27 | 6168.09 | 21.24 | 28,086.56 | 39.69 | 5778.08 | 20.66 | 4664.22 | 29.51 | 4772.80 | 23.70 |
| NAÏVE (1) | 14,248.33 | 14.35 | 32,447.43 | 36.04 | 9714.54 | 29.97 | 33,201.44 | 57.71 | 4662.24 | 17.60 | 3431.84 | 22.84 | 4976.74 | 20.20 |
| ARDL (1) | 17,286.30 | 22.97 | 37,650.28 | 45.98 | 11,015.34 | 37.70 | 20,297.54 | 32.43 | 7086.91 | 30.94 | 7741.15 | 48.58 | 5755.94 | 30.99 |
| ARDL_M_W (1) | 19,602.71 | 25.18 | 35,133.23 | 43.31 | 11,373.61 | 39.75 | 24,889.00 | 39.66 | 5054.96 | 19.98 | 6544.16 | 34.27 | 10,351.96 | 63.02 |
| ARDL_M_WN (1) | 17,842.67 | 24.93 | 21,843.20 | 22.67 | 13,344.15 | 44.39 | 12,531.25 | 25.91 | 4277.10 | 17.32 | 6823.18 | 35.53 | 8206.87 | 48.55 |
| ARIMA (2) | 21,276.87 | 24.13 | 35,781.85 | 34.95 | 11,132.86 | 35.89 | 30,100.94 | 55.71 | 6874.01 | 23.81 | 5457.94 | 35.09 | 5005.01 | 24.20 |
| MIDAS_W (2) | 14,122.50 | 16.65 | 29,197.14 | 28.83 | 10,963.10 | 37.15 | 36,803.52 | 50.74 | 6118.87 | 24.14 | 6618.71 | 40.48 | 2923.34 | 13.94 |
| MIDAS_WN (2) | 15,048.74 | 23.26 | 33,101.99 | 35.55 | 7568.91 | 26.69 | 35,778.43 | 54.13 | 8633.62 | 31.22 | 6845.54 | 48.24 | 4329.37 | 19.71 |
| NAÏVE (2) | 22,079.80 | 22.05 | 45,426.83 | 54.88 | 11,859.61 | 38.30 | 46,051.58 | 84.48 | 7383.02 | 32.26 | 6046.95 | 40.21 | 5890.47 | 28.37 |
| ARDL (2) | 21,822.64 | 30.50 | 79,331.40 | 85.36 | 12,857.40 | 44.59 | 23,287.12 | 42.19 | 10,252.61 | 46.90 | 14,178.16 | 76.19 | 31,868.72 | 176.04 |
| ARDL_M_W (2) | 27,307.05 | 38.57 | 31,795.89 | 42.97 | 11,521.74 | 42.98 | 24,931.05 | 44.61 | 7875.97 | 37.74 | 16,884.15 | 85.78 | 21,875.07 | 125.68 |
| ARDL_M_WN (2) | 22,050.08 | 34.20 | 20,849.30 | 24.11 | 12,982.47 | 42.18 | 18,945.96 | 34.52 | 8515.58 | 38.99 | 16,094.28 | 80.42 | 12,472.70 | 66.53 |
| ARIMA (4) | 28,522.66 | 38.59 | 32,692.53 | 34.25 | 12,577.83 | 41.85 | 23,028.54 | 32.91 | 8626.11 | 35.66 | 9649.47 | 56.58 | 7186.19 | 39.05 |
| MIDAS_W (4) | 17,721.52 | 25.67 | 30,664.56 | 30.91 | 9429.09 | 28.50 | 38,509.38 | 57.66 | 7736.59 | 32.08 | 10,102.54 | 63.14 | 3770.90 | 23.31 |
| MIDAS_WN (4) | 14,886.65 | 22.85 | 37,064.55 | 43.60 | 7793.72 | 23.54 | 35,582.21 | 56.56 | 12,201.69 | 55.13 | 10,061.26 | 69.49 | 5460.50 | 33.29 |
| NAÏVE (4) | 34,732.12 | 45.48 | 44,552.87 | 50.46 | 13,109.98 | 46.77 | 26,443.24 | 47.36 | 11,517.27 | 48.62 | 11,196.49 | 64.89 | 7846.23 | 45.23 |
| ARDL (4) | 27,717.13 | 30.90 | 125,580.70 | 144.46 | 12,669.79 | 44.80 | 21,675.13 | 42.56 | 10,836.47 | 49.45 | 15,308.15 | 81.38 | 18,505.67 | 92.58 |
| ARDL_M_W (4) | 36,063.88 | 48.96 | 41,712.73 | 55.36 | 14,793.45 | 44.61 | 31,793.02 | 55.43 | 11,457.86 | 55.46 | 59,729.03 | 218.19 | 44,004.82 | 181.03 |
| ARDL_M_WN (4) | 34,395.43 | 47.23 | 37,826.01 | 48.98 | 14,522.65 | 45.69 | 15,076.94 | 25.17 | 12,280.52 | 58.85 | 37,890.29 | 119.88 | 13,672.11 | 69.91 |
| ARIMA (8) | 45,673.55 | 58.75 | 37,879.24 | 44.11 | 16,081.81 | 55.97 | 30,751.06 | 67.20 | 17,186.57 | 79.05 | 18,088.86 | 124.01 | 10,740.38 | 65.39 |
| MIDAS_W (8) | 30,622.84 | 39.05 | 34,423.48 | 36.15 | 13,233.33 | 47.67 | 36,236.18 | 50.96 | 19,876.99 | 92.93 | 18,391.15 | 126.96 | 8409.17 | 53.83 |
| MIDAS_WN (8) | 38,869.97 | 51.58 | 36,189.51 | 42.68 | 9624.35 | 42.59 | 23,400.78 | 40.20 | 14,268.06 | 56.27 | 14,889.71 | 106.04 | 11,123.09 | 74.99 |
| NAÏVE (8) | 62,978.51 | 79.88 | 71,861.94 | 85.61 | 24,560.51 | 85.95 | 51,642.60 | 90.16 | 22,768.90 | 97.60 | 17,855.14 | 106.35 | 14,375.42 | 81.64 |
| ARDL (8) | 56,106.12 | 74.97 | 976,554.90 | 750.16 | 14,403.45 | 49.52 | 30,499.09 | 44.98 | 82,333.98 | 296.21 | 23,248.57 | 177.85 | 10,194.18 | 46.83 |
| ARDL_M_W (8) | 64,799.95 | 88.28 | 79,718.41 | 97.37 | 14,572.83 | 49.03 | 30,205.95 | 50.41 | 22,243.91 | 82.70 | 14,896.69 | 103.56 | 15,079.56 | 66.73 |
| ARDL_M_WN (8) | 61,708.79 | 87.75 | 75,923.78 | 88.08 | 25,822.05 | 69.04 | 30,624.56 | 43.49 | 21,880.37 | 85.01 | 10,673.86 | 86.08 | 12,037.69 | 72.40 |
| ARIMA (12) | 33,607.25 | 36.59 | 31,500.39 | 41.25 | 9756.03 | 39.84 | 30,277.04 | 74.71 | 14,704.83 | 68.84 | 13,764.52 | 98.84 | 8175.77 | 52.15 |
| MIDAS_W (12) | 17,458.13 | 22.60 | 45,837.80 | 52.36 | 9425.45 | 37.73 | 18,291.95 | 23.74 | 17,184.68 | 79.76 | 16,302.95 | 123.72 | 8762.34 | 57.27 |
| MIDAS_WN (12) | 18,240.52 | 26.93 | 44,778.36 | 57.18 | 7768.55 | 33.53 | 12,134.32 | 23.99 | 6258.72 | 31.65 | 16,191.22 | 134.78 | 10,780.02 | 75.08 |
| NAÏVE (12) | 59,634.82 | 72.17 | 57,772.93 | 65.55 | 21,855.22 | 65.11 | 46,476.41 | 77.85 | 28,116.80 | 122.90 | 16,719.66 | 107.60 | 14,301.56 | 74.07 |
| ARDL (12) | 55,295.05 | 67.53 | 139,857.70 | 169.99 | 11,441.40 | 40.85 | 58,948.40 | 101.32 | 60,801.60 | 203.31 | 14,938.24 | 99.69 | 6796.39 | 38.64 |
| ARDL_M_W (12) | 56,460.36 | 70.45 | 917,644.60 | 382.17 | 10,569.65 | 41.57 | 28,924.18 | 37.66 | 19,873.47 | 73.92 | 9811.49 | 76.42 | 8343.48 | 49.32 |
| ARDL_M_WN (12) | 58,416.17 | 78.46 | 238,076.60 | 222.72 | 11,482.70 | 38.81 | 32,620.85 | 68.24 | 19,714.61 | 79.65 | 11,137.16 | 94.44 | 60,629.92 | 341.51 |
The MAPE values in bold indicate the best-performing model at each forecast horizon.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Botha, I.; Saayman, A. Does Google Analytics Improve the Prediction of Tourism Demand Recovery? Forecasting 2024, 6, 908-924. https://doi.org/10.3390/forecast6040045
