Article

Machine Learning-Based Tourism Demand Prediction Using Tourism Instability Indicators

by
Ikhlas Fuad Zamzami
Department of Management Information Systems, College of Business Rabigh, King Abdulaziz University, Jeddah 21991, Saudi Arabia
Sustainability 2026, 18(11), 5503; https://doi.org/10.3390/su18115503
Submission received: 30 April 2026 / Revised: 22 May 2026 / Accepted: 25 May 2026 / Published: 1 June 2026
(This article belongs to the Special Issue Sustainable Development in Urban and Rural Tourism)

Abstract

This study presents a data-driven framework for analyzing the relationship between geopolitical risk and tourism demand using the Saudi Arabia Tourism Dataset (2015–2024). Because the dataset contains no explicit geopolitical indicators, the research constructs a tourism-instability proxy index, referred to as the Geopolitical Risk Index (GRI), to represent uncertainty conditions affecting tourism demand. The index is derived from volatility in tourist arrivals, spending fluctuations, and shock indicators such as disruption periods. A two-stage machine learning pipeline is implemented to (i) predict the engineered GRI and (ii) evaluate its impact on tourism demand, measured by tourist numbers. Five algorithms (Extra Trees, Gradient Boosting, AdaBoost, K-Nearest Neighbors, and Ridge Regression) are applied and evaluated using MAE, RMSE, and R². In Stage 1, predicting the GRI remains challenging, with the best model (Extra Trees) achieving R² = 0.184, MAE = 0.168, and RMSE = 0.209, reflecting the complexity of modeling indirectly constructed risk. In Stage 2, performance improves substantially, with Extra Trees achieving R² = 0.577, MAE = 1120.57, and RMSE = 2749.10, that is, explaining roughly 58% of the variance in tourist numbers. The findings suggest that instability-sensitive tourism indicators may contain predictive information associated with tourism demand variations. Compared to existing studies that rely on external risk indices or normalized datasets, this research contributes a novel approach by engineering risk directly from tourism data and validating its predictive influence. The proposed framework provides practical insights for policymakers and tourism planners in managing uncertainty and supporting sustainable tourism development under dynamic geopolitical conditions.

1. Introduction

Under Vision 2030, Saudi Arabia’s tourism sector is rapidly transforming, moving from a primarily religious tourism model to a more diversified, data-driven industry bolstered by artificial intelligence and machine learning. The sector is now a mix of economics, culture, and technology, with tourism a major contributor to GDP growth, employment, and infrastructure development [1]. Recent studies have shown the growing importance of data-driven forecasting and predictive analytics to support strategic planning. Tourism demand is influenced by a complex set of factors such as economic indicators, major events, and global disruptions [2]. The availability of structured datasets for tourism, such as the number of tourists, overnight stays, and spending, has enabled the application of machine learning models for the prediction of demand and behavioral analysis [3]. Furthermore, the application of AI technologies, particularly sentiment analysis utilizing social media data, increases understanding of tourist satisfaction and preferences, providing immediate feedback to enhance tourism services [4]. Together, these studies provide a snapshot of the evolution of Saudi tourism into a smart, data-driven ecosystem with sophisticated analytics, policy reforms, and significant investments, all working in concert to enable sustainable growth and global competitiveness.
This study is guided by the broader question, “What insights can instability-aware tourism analytics provide for tourism planning and sustainable tourism development under uncertainty?” Its main research question is: “How effectively can machine learning models predict the constructed tourism-instability proxy index using tourism dynamics variables?”
Recent studies show that geopolitical risk has become an important factor affecting tourism demand, influencing tourist arrivals, overnight stays, and expenditure through uncertainty and risk perception. Empirical evidence shows that geopolitical instability, such as conflict, terrorism, and policy uncertainty, negatively affects tourism flows by reducing traveler confidence and changing destination choice [5,6]. Cross-country analysis offers additional evidence that these effects are persistent and often asymmetric, with a sharper decline in tourism during periods of increased risk [7]. Moreover, behavioral responses to uncertainty are important: psychological and environmental factors that influence decision-making under risk reinforce the importance of perceived security [8]. Machine learning has greatly improved the ability to model such complex relationships, providing a way to capture nonlinear interactions between geopolitical risk and tourism indicators [9]. Recent studies integrating social media data and deep learning techniques have demonstrated better forecasting accuracy by incorporating real-time behavioral signals alongside traditional tourism variables [10]. Hybrid machine learning frameworks enhance robustness and predictive performance in dynamic environments with external shocks [11]. Overall, the literature indicates a trend towards data-driven approaches in which geopolitical risk is regarded as a quantifiable and significant determinant of tourism demand.
In recent years, there has been an increasing application of data-driven and deep learning methods for modeling complex temporal and behavioral patterns in tourism demand forecasting. Advanced architectures such as transformers and hybrid neural networks, which demonstrate superior performance in handling nonlinear and dynamic tourism data, are increasingly supplementing or replacing traditional statistical models [12]. The use of real-time indicators of tourist interest and behavior, including external data sources such as web search trends, has improved the accuracy of forecasting [13]. Hybrid temporal neural network models have demonstrated good prediction ability by combining sequential learning with feature extraction when modeling tourism demand fluctuations [14]. In addition, spatial-temporal transformer-based models offer a systematic framework for modeling geographic and temporal dependencies in tourism flows, which significantly enhances prediction accuracy in complicated environments [15]. Overall, these studies reveal a clear trend toward intelligent forecasting systems leveraging big data, deep learning, and multi-source information integration. This evolution enables more accurate and adaptive tourism demand forecasts, which are essential for strategic planning, resource allocation, and sustainable tourism development in rapidly changing global contexts.
Recent literature highlights the increasing importance of incorporating advanced machine learning techniques and geopolitical risk indicators in tourism forecasting. Tourism demand is increasingly impacted by economic uncertainty and external shocks, especially geopolitical risks, which in turn have an impact on travel decisions, investor confidence, and firm performance. Studies show that geopolitical risk has a significant impact on tourism-related industries like travel and leisure companies, by changing market expectations and financial stability [16,17]. At the same time, modern forecasting approaches use mixed-frequency machine learning models to integrate heterogeneous data sources and take into account complex temporal dynamics in tourism demand [18]. The use of alternative data, such as search engine trends, further improves the forecasting accuracy since they reflect real-time tourist intentions and behavioral signals [19]. Together, these methods point to a shift toward hybrid, data-driven frameworks that combine economic indicators, behavioral data, and advanced computational models. This integration allows for a more accurate forecast of tourism demand, considering uncertainty and volatility in global environments.
The literature reviewed above suggests that combining machine learning techniques with geopolitical risk analysis provides a more comprehensive understanding of tourism dynamics, supporting better decision-making for policymakers and industry stakeholders. However, it is important to note that the proposed Geopolitical Risk Index (GRI) does not represent a direct geopolitical measurement comparable to established global geopolitical indices. Instead, it functions as a tourism-system instability proxy engineered from internal tourism dynamics, including volatility, disruption periods, and spending fluctuations. The objective is to model how instability signals embedded within tourism behavior influence future tourism demand in data-constrained environments where direct geopolitical indicators are unavailable.
This paper is organized as follows: Section 1 introduces the research problem, emphasizing the growing impact of geopolitical risk on tourism demand and the need for data-driven modeling. Section 2 reviews related work, identifying gaps in existing approaches. Section 3 presents the conceptual framework and model development. Section 4 presents the research methodology. Section 5 details the experimental analysis and results. Section 6 discusses the findings and implications, while Section 7 concludes the study and outlines future research directions.

2. Related Work

Several previous studies have examined Saudi tourism, including work based on the Saudi Arabia Tourism Dataset. Notable among them is Kiani [1], who investigates tourism as a driver of economic diversification in Saudi Arabia under Vision 2030. Using secondary data analysis, the study examines economic, social, and policy impacts of tourism development, including mega-projects such as NEOM and the Red Sea Project. The findings highlight significant growth in tourism revenue, employment, and GDP contribution. The study concludes that tourism plays a critical role in reducing oil dependency and fostering long-term economic sustainability. Similarly, Alsulami et al. [2] predict tourism growth in Saudi Arabia using machine learning models aligned with Vision 2030. The study employs multiple regression and ensemble techniques, including Random Forest, Gradient Boosting, and voting ensembles, on a city-level dataset. The findings indicate that ensemble models outperform traditional approaches, achieving high predictive accuracy (R² above 0.95). The study highlights the importance of integrating economic, seasonal, and event-driven factors for effective tourism forecasting.
Louati et al. [3] analyze and predict tourist spending patterns in Saudi Arabia using machine learning and time-series models. The study uses data from 2015 to 2021 and applies algorithms such as Decision Trees, Random Forest, KNN, and ARIMA. Results show strong predictive performance and reveal key insights into spending behavior, including average trip and overnight expenditures. The study demonstrates the role of ML in enhancing sustainability and decision-making in tourism. In the same direction, Alzahrani et al. [4] develop a hybrid AI framework to enhance tourism experiences through sentiment analysis. Using data from YouTube comments, the research applies machine learning techniques such as Multinomial Naïve Bayes to classify tourist sentiment. The results demonstrate high classification performance and show that AI can effectively capture tourist satisfaction and preferences. The study emphasizes the importance of AI-driven insights for improving service quality and aligning tourism strategies with visitor expectations.
The Saudi Arabia Tourism Dataset (2015–2024) [5] provides province-level tourism statistics, including the number of tourists, overnight stays, spending, average stay, and spending indicators. Although it is a dataset rather than a journal paper, it is central to the present study because it enables empirical modeling of Saudi tourism patterns across time, province, and tourism type.
A further body of research addresses geopolitical risk and tourism demand prediction. Hailemariam and Ivanovski [6] investigate the impact of geopolitical risk on tourism using empirical data analysis. The authors employ econometric modeling to examine how uncertainty and geopolitical tensions affect tourism flows. Findings reveal that geopolitical risk significantly reduces tourism demand, particularly in the short term, by increasing uncertainty and discouraging international travel. The study emphasizes that tourism is highly sensitive to external shocks, making risk modeling essential for forecasting tourism performance.
Papagianni et al. [7] analyze tourism demand under geopolitical risk using a cross-country dataset and advanced econometric techniques. The authors apply a Bayesian panel VAR model to capture dynamic relationships between risk shocks and tourism flows. Results indicate that geopolitical risk has a persistent negative impact on tourism demand across countries. The study also highlights heterogeneity in responses, suggesting that some destinations are more resilient due to structural and economic factors.
Although not directly focused on tourism demand, Huo and Li [8] explore behavioral decision-making using fuzzy-set qualitative comparative analysis (fsQCA). They examine how multiple interacting factors influence user behavior under uncertainty. The findings highlight the importance of psychological and environmental factors in shaping decision outcomes. This is relevant to tourism research, as it supports the idea that security perception and behavioral responses mediate the impact of external risks such as geopolitical instability.
Dimitriadou et al. [9] introduce a machine learning approach to modeling tourism demand under uncertainty. Multiple ML algorithms are applied to capture nonlinear relationships between uncertainty indicators and tourism flows. The findings demonstrate that machine learning models outperform traditional methods in forecasting tourism demand, particularly under volatile conditions. The study emphasizes the importance of incorporating uncertainty-related variables, including geopolitical risk, into predictive tourism models.
Qin et al. [10] focus on tourism demand forecasting using social media data and deep learning techniques. The authors employ neural network architectures to extract behavioral signals from online platforms, integrating them into predictive models. Results show improved forecasting accuracy compared to traditional approaches. The study highlights the value of incorporating real-time behavioral data to capture shifts in tourist sentiment, which is often influenced by geopolitical and environmental uncertainties.
Fu and Qin [11] propose a hybrid machine learning framework for tourism demand forecasting, combining multiple models to enhance predictive performance. The methodology integrates data preprocessing, signal decomposition, and ensemble learning techniques. Findings demonstrate that hybrid models significantly improve accuracy and robustness, particularly in dynamic environments. The study supports the use of advanced ML pipelines to model complex influences on tourism demand, including external shocks such as geopolitical risk.
Other studies address the complete machine learning workflow for tourism demand prediction. Huang and Zhang [12] propose an iTransformer-based model for daily tourism demand forecasting. The objective is to improve prediction accuracy by capturing complex temporal dependencies in tourism data. The methodology employs a transformer architecture optimized for time-series forecasting, allowing the model to learn long-term patterns. The findings demonstrate that the iTransformer significantly outperforms traditional models, particularly in handling dynamic fluctuations in tourism demand, making it suitable for high-frequency tourism data analysis.
Lee [13] enhances tourism demand forecasting by integrating web search data into a SARIMAX model. The method combines traditional time-series modeling with external behavioral indicators derived from online search trends. The findings show that incorporating web search data improves forecasting accuracy by capturing real-time tourist interest. The study highlights the importance of external data sources in enriching tourism prediction models and supporting data-driven decision-making.
Zhang et al. [14] develop a hybrid temporal neural network model for tourism demand forecasting, focusing on sustainability and predictive performance. The methodology integrates sequential neural networks with temporal feature extraction to capture complex demand patterns. Results indicate that the hybrid model achieves higher accuracy compared to standalone models, particularly in modeling seasonal and nonlinear tourism trends. The study emphasizes the effectiveness of hybrid architectures in improving forecasting reliability.
Chen et al. [15] introduce a spatial-temporal transformer model to forecast tourism demand by capturing both geographic and temporal dependencies. The methodology leverages deep learning to model interactions between locations and time-series data. The findings reveal that the proposed model significantly improves prediction accuracy, especially in complex tourism systems with spatial interdependence. The study highlights the importance of integrating spatial and temporal information in modern tourism forecasting models.
Other research also focuses on ML Models for predicting engineered geopolitical risk. Hu et al. [18] examine tourism forecasting by applying mixed-frequency machine learning techniques. The methodology integrates datasets with different temporal resolutions to enhance prediction accuracy. By combining high-frequency and low-frequency variables, the model captures complex temporal relationships in tourism demand. The findings show that mixed-frequency ML models outperform traditional approaches, particularly in handling dynamic and heterogeneous data, making them suitable for forecasting tourism demand under varying economic and external conditions.
Sun et al. [16] investigate the implications of geopolitical risk on travel and leisure firms using financial data analysis. The methodology applies econometric and financial modeling techniques to assess how geopolitical uncertainty influences firm performance. Findings reveal that increased geopolitical risk negatively affects the stability and returns of travel-related firms. The study highlights the strong sensitivity of tourism-related industries to global political conditions, reinforcing the importance of risk-aware forecasting models.
Raheem and Le Roux [17] examine the relationship between geopolitical risk and tourism stocks using a causality-in-quantile approach. The methodology captures nonlinear and asymmetric effects across different market conditions. Results indicate that geopolitical risk significantly influences tourism stock returns, particularly during periods of high uncertainty. The study demonstrates that the impact of geopolitical risk is not uniform, emphasizing the need for models that account for varying market conditions and risk levels.
Georgakis et al. [19] focus on tourism demand forecasting using search engine data as a proxy for tourist interest. The methodology integrates search query data into forecasting models to capture real-time behavioral signals. The findings show that search engine data significantly improves prediction accuracy compared to traditional models. The study highlights the value of alternative data sources in understanding and forecasting tourism demand patterns.
Existing tourism studies increasingly use machine learning for demand forecasting, but most focus on tourist arrivals, search behavior, spending, or destination-level demand. Limited studies specifically examine Saudi Arabia using province-level tourism indicators from 2015 to 2024. More importantly, geopolitical risk is usually imported from external indices rather than engineered directly from tourism behavior. Few studies model geopolitical risk linked to security perception and tourism demand using tourist numbers, overnight stays, and spending. Therefore, this study fills a gap by constructing a data-driven Geopolitical Risk Index, predicting it using ML, and testing its influence on Saudi tourism demand.

3. Conceptualization and Model Development

Conceptualization and Model Development refer to the identification of theoretical linkage among variables and their representation in an analytical framework. They comprise the identification of essential constructs, the specification of their interactions, and the formulation of mathematical or computational models. This phase links theory with empirical analysis and allows building testable models that can be used for simulation, prediction, or validation with statistics or machine learning methods.

3.1. Conceptualization

This study conceptualized “Tourism in Saudi Arabia” as multi-layered in nature, driven by economic, policy, environmental, and geopolitical factors. This is drawn from the evidence that “Tourism is a strategic pillar of Vision 2030” [20], and “Tourist arrivals strongly influence GDP growth”, as well as “Rapid growth of 27.4 M international arrivals in 2023” [21]. Figure 1 presents the proposed conceptual framework layer by layer. The first layer captures external instability using measurable proxies such as shock dummy variables, arrival volatility, and revenue volatility. These indicators convert abstract geopolitical events (e.g., war, regional tensions) into quantitative data.
The Geopolitical Risk Index (GRI) is modeled from the shock indicator, arrival volatility, and revenue volatility, as given by Equation (1):
$$GRI_t = w_1 SD_t + w_2 AV_t + w_3 RV_t \tag{1}$$
where $SD_t$ is the shock dummy (0/1), $AV_t = \left| \frac{TN_t - TN_{t-1}}{TN_{t-1}} \right|$ is the arrival volatility, and $RV_t = \left| \frac{TS_t - TS_{t-1}}{TS_{t-1}} \right|$ is the revenue volatility, with $TN_t$ and $TS_t$ denoting tourist numbers and tourism spending in period $t$, and $w_1$, $w_2$, $w_3$ the component weights. The proposed instability proxy is theoretically grounded in tourism uncertainty and behavioral response literature, which suggests that tourism systems react sensitively to uncertainty through observable fluctuations in arrivals, spending behavior, occupancy stability, and travel activity. In tourism analytics, periods of instability are frequently reflected indirectly through volatility-based tourism dynamics, particularly when direct geopolitical indicators are unavailable. Therefore, the present study operationalizes instability through tourism-system behavioral signals rather than direct geopolitical event measurements.
Writing out the volatility terms gives Equation (2), a weighted index that combines shock events, tourist arrival volatility, and revenue volatility to quantify geopolitical instability and its potential influence on tourism-system uncertainty over time.
$$GRI_t = w_1 SD_t + w_2 \left| \frac{TN_t - TN_{t-1}}{TN_{t-1}} \right| + w_3 \left| \frac{TS_t - TS_{t-1}}{TS_{t-1}} \right| \tag{2}$$
This formulation defines the Geopolitical Risk Index as a weighted combination of shock events and absolute growth volatility in tourist numbers and spending, capturing instability and fluctuations in tourism demand over time.
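As an illustration of how such an index could be computed from a tourism series, the following minimal Python sketch evaluates the weighted combination of a shock dummy and the absolute growth rates of tourist numbers and spending. The equal weights, the function names, and the sample values are hypothetical; the source does not report the weights or an implementation.

```python
def growth_volatility(series):
    """Absolute period-over-period growth rate.

    The first period has no predecessor, so its value is None.
    Assumes every previous-period value is strictly positive.
    """
    out = [None]
    for prev, curr in zip(series, series[1:]):
        out.append(abs(curr - prev) / prev)
    return out


def instability_index(shock, tourists, spending, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of shock dummy, arrival volatility, and revenue volatility.

    The default equal weights are an assumption made for illustration only.
    """
    w1, w2, w3 = weights
    av = growth_volatility(tourists)   # arrival volatility
    rv = growth_volatility(spending)   # revenue volatility
    index = []
    for sd, a, r in zip(shock, av, rv):
        index.append(None if a is None else w1 * sd + w2 * a + w3 * r)
    return index


# Hypothetical sample data: four periods with a disruption in the third.
tourists = [100.0, 110.0, 55.0, 66.0]
spending = [200.0, 220.0, 110.0, 121.0]
shock = [0, 0, 1, 0]

gri = instability_index(shock, tourists, spending)
```

In this toy series the index peaks in the disruption period, where the shock dummy is 1 and both growth rates are largest in absolute value.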
It is important to emphasize that fluctuations in tourism arrivals and tourism spending may emerge from multiple interacting factors beyond geopolitical conditions, including seasonal variation, pandemics, economic fluctuations, transportation accessibility, tourism policy changes, pricing effects, and behavioral tourism cycles. Therefore, the proposed construct should not be interpreted as a direct measurement of geopolitical risk, but rather as a tourism-system instability proxy reflecting uncertainty-sensitive tourism dynamics. It is also acknowledged that the proposed proxy is partially endogenous because it is constructed from tourism-system variables themselves. Therefore, the framework does not estimate purely exogenous geopolitical risk, but instead models instability-sensitive tourism dynamics under uncertainty conditions.
Geopolitical risk directly influences tourism by creating uncertainty and indirectly shapes tourist behavior through perceived safety. It serves as a foundational exogenous variable that disrupts normal tourism patterns, affecting both supply and demand sides of the tourism ecosystem.
The second layer introduces temporal and environmental variations such as monthly/quarterly effects, peak versus off-peak seasons, and temperature proxies. These factors influence travel timing, destination attractiveness, and tourist comfort. The Seasonal–Weather Index (SWI), given by Equation (3), models tourism variation by combining monthly, quarterly, peak-season, and temperature effects to capture seasonal demand fluctuations and the influence of climate on tourism behavior.
$$SWI_t = \alpha_1 M_t + \alpha_2 Q_t + \alpha_3 PS_t + \alpha_4 TP_t \tag{3}$$
where $M_t$ is the month effect, $Q_t$ is the quarter effect, $PS_t$ is the peak season effect, and $TP_t$ is the temperature variability. Seasonality interacts with geopolitical risk and infrastructure by amplifying or dampening tourism flows. For example, extreme weather conditions or off-peak seasons may intensify the negative effects of perceived risk, while favorable seasons can partially offset them.
The third layer is an Infrastructure layer, which includes physical and logistical components such as the number of hotels, airport capacity, and transportation networks. It determines the destination’s ability to accommodate tourists efficiently. Strong infrastructure enhances accessibility and service delivery, while weak infrastructure limits tourism potential regardless of demand. This layer also feeds into service quality, forming the operational backbone that supports tourist experiences and satisfaction.
The fourth layer deals with security perception and translates geopolitical risk into tourist decision-making. It is operationalized using rolling means of arrivals, standard deviation (stability), growth rates, and occupancy stability. These indicators reflect how tourists perceive safety and stability over time. This layer acts as a mediator, meaning geopolitical risk does not directly reduce tourism demand but influences it through perceived safety, confidence, and destination reliability.
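The stability indicators named for this layer (rolling mean, rolling standard deviation, and growth rate of arrivals) can be sketched in a few lines of standard-library Python. The window length of three periods and the sample arrivals are hypothetical; the source does not state the window used.

```python
from statistics import mean, stdev


def rolling(values, window, func):
    """Apply func to each trailing window; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(func(values[i + 1 - window : i + 1]))
    return out


def growth_rate(values):
    """Signed period-over-period growth; None for the first period."""
    return [None] + [(curr - prev) / prev for prev, curr in zip(values, values[1:])]


# Hypothetical arrivals for five consecutive periods.
arrivals = [100.0, 120.0, 80.0, 100.0, 130.0]

roll_mean = rolling(arrivals, 3, mean)    # level of recent demand
roll_std = rolling(arrivals, 3, stdev)    # sample std. dev. as a stability measure
growth = growth_rate(arrivals)
```

A higher rolling standard deviation relative to the rolling mean would indicate less stable arrivals over the window.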
The fifth layer, “Tourism Experience”, is the central layer and includes tourist satisfaction, service quality, destination attractiveness, and cultural experience. It integrates inputs from infrastructure, seasonality, and security perception. A positive experience can mitigate negative perceptions of risk, while poor service quality can amplify them. This layer explains how operational and environmental factors are translated into perceived value, ultimately shaping tourist decisions and behaviors.
The sixth layer relates to “Tourism Demand”, and it includes measurable indicators such as international arrivals, domestic trips, growth rates, and hotel occupancy. This layer reflects the direct outcomes of the system, influenced by both security perception and tourism experience. It demonstrates that demand is not solely driven by economic factors but also by behavioral and perceptual dynamics shaped by risk and experience.
The seventh layer, “Economic Impact”, captures the broader economic effects of tourism demand, including tourism revenue, GDP contribution, and employment. Increased demand leads to higher economic output and job creation. It highlights the strategic importance of tourism in national development, particularly in the context of Saudi Arabia’s Vision 2030, where tourism is a key diversification driver.
The eighth layer, “Behavioral Outcomes”, focuses on individual tourist actions such as length of stay, spending per capita, and destination choice. These outcomes provide deeper insights into how tourists respond to their experiences and perceptions. They are influenced by satisfaction, perceived safety, and overall value. This layer is essential for understanding demand quality, not just quantity.
The final layer involves “Integrated Service Flow”. The bottom pathway, Infrastructure → Service Quality → Tourist, represents the operational transformation process. It shows how physical investments are converted into service delivery and ultimately into tourist experiences. This flow emphasizes that infrastructure alone is insufficient; its effectiveness depends on how well it enhances service quality and meets tourist expectations.
The theoretical foundation of the proposed tourism-instability proxy is grounded in tourism uncertainty theory and perceived security behavior. Prior tourism studies suggest that instability-sensitive tourism systems often exhibit measurable behavioral fluctuations through changes in arrivals, spending patterns, occupancy variability, and disruption responses. In the absence of direct geopolitical indicators, the present study operationalizes instability indirectly through tourism-system volatility measures, assuming that uncertainty conditions are partially reflected in observable tourism dynamics.
Accordingly, the proposed GRI is not intended to represent a direct geopolitical risk index comparable to established external indices such as Caldara and Iacoviello’s GPR Index. Instead, the variable is explicitly defined as a tourism-instability proxy.

3.2. Model Development

The modeling approach adopted in this research is a data-driven framework for analyzing Saudi Arabia’s tourism dynamics with advanced machine learning techniques. It follows a two-stage method: first, it builds a Geopolitical Risk Index (GRI) from available tourism data; second, it tests the index’s predictive effect on tourism demand. The study applies Extra Trees, AdaBoost, KNN, Gradient Boosting, and Ridge Regression to variables such as tourist numbers, overnight stays, and spending. In the absence of direct geopolitical data, the model builds a proxy GRI based on volatility, demand shocks, and disruption periods, which allows the impact of uncertainty on tourism patterns to be examined.
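The two-stage design can be sketched as follows. This is a minimal illustration under several assumptions that are not stated in the source: scikit-learn and NumPy are used as the toolkit, the data are synthetic stand-ins, the split is chronological (first 80% for training), hyperparameters are library defaults or arbitrary, and Stage 2 uses the engineered index as an additional input feature.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    ExtraTreesRegressor,
    GradientBoostingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-ins (hypothetical): three tourism features, a proxy index,
# and a demand series that depends on one feature and on the index.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
proxy_index = 0.5 * np.abs(X[:, 0]) + 0.1 * rng.normal(size=n)
demand = 1000 + 300 * X[:, 1] - 400 * proxy_index + 50 * rng.normal(size=n)

split = int(0.8 * n)  # chronological split: no shuffling of time-ordered rows


def make_models():
    """Fresh, unfitted instances of the five algorithm families."""
    return {
        "Extra Trees": ExtraTreesRegressor(n_estimators=100, random_state=0),
        "Gradient Boosting": GradientBoostingRegressor(random_state=0),
        "AdaBoost": AdaBoostRegressor(random_state=0),
        "KNN": KNeighborsRegressor(n_neighbors=5),
        "Ridge": Ridge(alpha=1.0),
    }


def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit on the training block and report MAE, RMSE, and R2 on the test block."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "MAE": float(mean_absolute_error(y_test, pred)),
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "R2": float(r2_score(y_test, pred)),
    }


# Stage 1: predict the proxy index from the tourism features.
stage1 = {
    name: evaluate(m, X[:split], proxy_index[:split], X[split:], proxy_index[split:])
    for name, m in make_models().items()
}

# Stage 2: predict demand from the features plus the index.
X2 = np.column_stack([X, proxy_index])
stage2 = {
    name: evaluate(m, X2[:split], demand[:split], X2[split:], demand[split:])
    for name, m in make_models().items()
}
```

The metrics produced here describe the synthetic data only and say nothing about the results reported in the study.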

3.2.1. Extra Trees Regressor

Extra Trees Regressor (Extremely Randomized Trees) is an ensemble learning algorithm that builds multiple decision trees using random subsets of data and randomly selected split points [22,23]. Unlike Random Forest, it introduces additional randomness by drawing split points at random rather than searching for the optimal split. Predictions are obtained by averaging outputs from all trees, reducing variance and improving generalization [24]. This algorithm is suitable for the current research because tourism data exhibits nonlinear relationships and volatility patterns. Extra Trees can capture complex interactions between variables such as tourist numbers, spending, and instability proxies. Its robustness to noise and ability to handle high-dimensional features make it well suited for modeling the engineered Geopolitical Risk Index and tourism demand.
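The difference between searching for the optimal split and drawing a split at random can be shown on a single feature. The sketch below is a simplified illustration in plain Python, not a full tree implementation; the data and function names are hypothetical.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def split_cost(x, y, threshold):
    """Total within-child squared error when splitting at x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return sse(left) + sse(right)


def best_split(x, y):
    """Exhaustive search over midpoints between sorted unique feature values."""
    xs = sorted(set(x))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: split_cost(x, y, t))


def random_split(x, rng):
    """Extra Trees style: draw the threshold uniformly between min and max."""
    return rng.uniform(min(x), max(x))


# Hypothetical feature and target with a clear jump between x = 3 and x = 4.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [10.0, 11.0, 10.0, 50.0, 51.0, 49.0]

rng = random.Random(0)
t_best = best_split(x, y)      # optimal threshold for this feature
t_rand = random_split(x, rng)  # a single random candidate threshold
```

A single random threshold is usually worse than the optimal one, which is why the method relies on averaging many trees: each tree is cheaper to grow and less correlated with the others.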

3.2.2. AdaBoost Regressor

AdaBoost (Adaptive Boosting) is an ensemble method that combines multiple weak learners, typically shallow decision trees, to form a strong predictive model [25]. It works by iteratively adjusting the weights of observations, giving more importance to incorrectly predicted instances [26]. Each subsequent model focuses on reducing previous errors, leading to improved prediction accuracy [27,28]. AdaBoost is selected for its ability to emphasize difficult patterns in the dataset, such as sudden drops in tourism due to shocks or instability. In this research, geopolitical risk is indirectly modeled through volatility and disruptions, which are often subtle and nonlinear. AdaBoost enhances prediction by focusing on these complex cases, making it effective for both risk estimation and tourism demand modeling.

3.2.3. K-Nearest Neighbors Regressor

KNN Regressor is a non-parametric algorithm that predicts values based on the average of the k-nearest data points in the feature space [29]. It does not assume any predefined relationship between variables but relies on distance metrics (e.g., Euclidean distance) to identify similar observations and estimate outputs accordingly [30,31]. KNN is useful in this research because it captures local patterns in tourism data without imposing strict assumptions. Tourism demand often depends on similar historical conditions (e.g., seasons, spending patterns). KNN helps identify such similarities and predict outcomes based on past behavior [32]. It is particularly valuable as a baseline for understanding data structure and validating the effectiveness of more complex models.
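The prediction rule can be sketched in a few lines. The example below is a minimal illustration using Euclidean distance and an unweighted average; the feature choice, the values, and k = 3 are hypothetical and not drawn from the study.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training rows closest to `query`."""
    distances = [
        (math.dist(row, query), target)
        for row, target in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical rows: (lagged arrivals in thousands, average stay in nights).
train_X = [(100.0, 3.0), (110.0, 3.5), (400.0, 6.0), (420.0, 6.5), (105.0, 3.2)]
train_y = [120.0, 130.0, 450.0, 470.0, 125.0]

prediction = knn_predict(train_X, train_y, query=(108.0, 3.3), k=3)
```

In this toy example the arrivals feature dominates the distance because of its larger scale, which illustrates why feature comparability matters for distance-based models.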

3.2.4. Gradient Boosting Regressor

Gradient Boosting Regressor is an ensemble learning technique that builds models sequentially, where each new model corrects the errors of the previous one [33]. It minimizes a loss function by using gradient descent, combining weak learners (usually decision trees) to form a highly accurate predictive model [34]. This algorithm is well-suited for the research because it can model complex nonlinear relationships and interactions between variables such as geopolitical risk proxies and tourism indicators. It is particularly effective in handling structured tabular data and capturing subtle patterns like demand fluctuations and seasonal effects. Its high predictive performance makes it valuable for forecasting tourism demand influenced by instability [35].
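The sequential error-correction idea can be illustrated with a minimal sketch for squared-error loss, where the negative gradient equals the residual. The one-split trees (stumps), the learning rate of 0.3, and the toy data are assumptions made for illustration only; production libraries use deeper trees and additional regularization.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the current residuals."""
    points = sorted(set(xs))
    best = None
    for a, b in zip(points, points[1:]):
        threshold = (a + b) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        cost = sum((r - left_mean) ** 2 for r in left)
        cost += sum((r - right_mean) ** 2 for r in right)
        if best is None or cost < best[0]:
            best = (cost, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gradient_boosting(xs, ys, n_rounds=20, learning_rate=0.3):
    """Start from the mean, then add shrunken stumps fitted to residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual y - f(x).
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * stump_predict(stump, x)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def training_mse(model, xs, ys):
    errors = [(y - boosted_predict(model, x)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Hypothetical nonlinear toy series.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [5.0, 6.0, 9.0, 15.0, 14.0, 8.0, 4.0, 3.0]

mse_1 = training_mse(fit_gradient_boosting(xs, ys, n_rounds=1), xs, ys)
mse_20 = training_mse(fit_gradient_boosting(xs, ys, n_rounds=20), xs, ys)
```

The training error falls as rounds are added; whether that improvement carries over to unseen years has to be checked on the held-out test period.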

3.2.5. Ridge Regression Baseline

Ridge Regression is a linear regression technique that incorporates L2 regularization to penalize large coefficients [36]. This reduces overfitting and improves model stability by shrinking coefficient values while maintaining all predictors in the model [37]. Ridge Regression is included as a baseline model to provide a benchmark for evaluating more complex machine learning algorithms. It is useful for identifying linear relationships between variables such as tourist numbers, spending, and engineered risk indices [38]. Although simpler, it offers interpretability and helps assess whether nonlinear models provide significant improvements over linear assumptions in this research.
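The shrinkage effect of the L2 penalty is easiest to see with a single predictor, where the ridge solution has a simple closed form. The sketch below assumes one feature and an unpenalized intercept; the penalty value and data are hypothetical.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit for one feature with an unpenalized intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / (s_xx + alpha)  # alpha = 0 recovers ordinary least squares
    intercept = y_mean - slope * x_mean
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

ols_slope, ols_intercept = fit_ridge_1d(xs, ys, alpha=0.0)
ridge_slope, ridge_intercept = fit_ridge_1d(xs, ys, alpha=10.0)
```

Here the penalty halves the slope (from 2.0 to 1.0) without removing the predictor, which is the behavior described above: coefficients shrink but all predictors remain in the model.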

4. Research Methodology

The research methodology flow of the current study is presented in Figure 2. The figure presents a structured, two-stage machine learning pipeline for analyzing the Saudi Arabia Tourism Dataset (2015–2024). It begins with dataset input, including key tourism indicators such as tourist numbers, overnight stays, and spending. The preprocessing stage ensures temporal consistency through chronological sorting, missing value handling, and time-based train-test splitting. Feature engineering transforms raw data into meaningful predictors using lag variables, growth rates, volatility measures, and shock indicators. The target construction stage defines a Geopolitical Risk Index (GRI) as a proxy for instability and sets tourism demand as the primary outcome. The modeling framework applies multiple algorithms, including Extra Trees, AdaBoost, KNN, Gradient Boosting, and Ridge Regression, across two stages: predicting GRI and then tourism demand. Finally, evaluation metrics such as MAE, RMSE, and R2 are used to assess performance, generating insights for policy and decision-making.
It should be noted that the present study is designed as a predictive machine learning framework rather than a causal inference study. Consequently, model performance metrics, feature importance rankings, and predictive associations should not be interpreted as evidence of causal effects between instability indicators and tourism demand.

4.1. Dataset Description

The dataset used for this research is the Saudi Arabia Tourism Dataset (2015–2024), obtained from [5], which contains structured tourism indicators at yearly and categorical levels. Key variables include YEARS, Tourists_Number (TN), Overnight_Stay (OS), Tourists_Spending (TS), Avg_Stay, Avg_Spending_Trip, Avg_Spending_Night, Province, and Tourism_Type. These variables collectively represent both demand-side (tourist arrivals, spending) and behavioral aspects (length of stay, expenditure patterns). The dataset does not explicitly include geopolitical or security variables; therefore, the study relies on internal tourism dynamics to infer instability. The dataset is treated as a time-series structure, where temporal dependencies are critical. This enables the modeling of lag effects such as $TN_{t-1}$ and growth rates $(TN_t - TN_{t-1}) / TN_{t-1}$, forming the basis for both risk construction and demand prediction.
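The lag and growth-rate definitions can be sketched as follows. This is a minimal illustration on a single hypothetical series; the helper names are assumptions, and in a panel with Province and Tourism_Type the lag would presumably be taken within each group, which the source does not specify.

```python
def lag(values, periods=1):
    """Shift a series forward in time (periods >= 1); early entries are None."""
    return [None] * periods + list(values[:-periods])


def growth_rate(values):
    """Period-over-period growth (TN_t - TN_{t-1}) / TN_{t-1}."""
    previous = lag(values, 1)
    return [
        None if prev is None or prev == 0 else (cur - prev) / prev
        for cur, prev in zip(values, previous)
    ]


# Hypothetical yearly tourist numbers.
tourists_number = [1000.0, 1100.0, 990.0, 1188.0]
tn_lag1 = lag(tourists_number)
tn_growth = growth_rate(tourists_number)
```

The first observation has no predecessor, so both derived series start with a missing value that must be handled before modeling.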

4.2. Dataset Preprocessing

This research applies preprocessing steps to ensure temporal consistency and model validity. First, the dataset is sorted chronologically by the YEARS column to preserve time-series structure. Missing values are handled through row removal after feature engineering (via dropna()), ensuring that lagged and rolling features are properly aligned. A chronological train-test split is implemented, where training data spans 2015–2021 and testing covers 2022–2024, avoiding data leakage common in random splits. No aggressive normalization is applied because tree-based models (e.g., Extra Trees, Gradient Boosting) are scale-invariant, although distance-based models like KNN implicitly rely on feature comparability. The preprocessing ensures that all derived variables (lags, volatility) are temporally valid and suitable for predictive modeling in both stages.
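A chronological split of this kind can be sketched as below. The rows and the TN_lag1 column are hypothetical stand-ins for the engineered dataset, and `drop_incomplete` is an illustrative plain-Python substitute for the dropna() step mentioned above.

```python
def drop_incomplete(rows):
    """Remove rows with missing engineered values (a stand-in for dropna())."""
    return [row for row in rows if all(v is not None for v in row.values())]


def chronological_split(rows, last_train_year=2021):
    """Sort by year, then cut at a fixed year instead of sampling randomly."""
    ordered = sorted(rows, key=lambda row: row["YEARS"])
    train = [row for row in ordered if row["YEARS"] <= last_train_year]
    test = [row for row in ordered if row["YEARS"] > last_train_year]
    return train, test


# Hypothetical rows; 2015 has no lag because there is no earlier observation.
rows = [
    {
        "YEARS": year,
        "TN": 1000.0 + 10 * i,
        "TN_lag1": None if i == 0 else 1000.0 + 10 * (i - 1),
    }
    for i, year in enumerate(range(2015, 2025))
]

train, test = chronological_split(drop_incomplete(rows))
```

Because every training year precedes every test year, no information from 2022–2024 can leak into model fitting.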

4.3. Feature Engineering

Feature engineering is the core component of the notebook, transforming raw tourism data into meaningful predictors. Several operations are applied, including lagging, differencing, growth rate computation, and volatility estimation. Typically, volatility is captured using rolling statistics. Additional features include shock indicators (e.g., COVID years) and interaction effects. Lag variables such as arrivals_lag1 and spending_lag1 are introduced to capture temporal dependencies. These engineered features serve as proxies for instability and behavioral changes, enabling the transformation of static tourism data into a dynamic system suitable for machine learning.
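The volatility and shock features can be illustrated with a short sketch. The three-period window, the choice of 2020 and 2021 as shock years, and the growth values are assumptions for illustration; the source specifies only rolling statistics and "COVID years".

```python
import statistics


def rolling_std(values, window=3):
    """Sample standard deviation over a trailing window; None until it fills."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i + 1 - window) : i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(statistics.stdev(chunk))
    return out


def shock_indicator(years, shock_years=(2020, 2021)):
    """1 for years flagged as disruption periods, else 0."""
    return [1 if year in shock_years else 0 for year in years]


years = list(range(2015, 2025))
# Hypothetical growth rates; the first year has no predecessor.
growth = [None, 0.05, 0.04, 0.06, 0.05, -0.60, 0.10, 0.80, 0.20, 0.08]

volatility = rolling_std(growth, window=3)
shock = shock_indicator(years)
```

In this toy series the rolling standard deviation rises sharply once the 2020 drop enters the window, which is the kind of signal the volatility features are meant to capture.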
It is acknowledged that tourism volatility may arise from multiple interacting factors beyond geopolitical conditions, including macroeconomic changes, pandemics, climate variability, exchange-rate fluctuations, transportation accessibility, and destination policies. Due to dataset limitations, the current framework focuses on tourism-derived instability signals rather than a fully controlled causal specification. Therefore, the constructed index should be interpreted as a proxy instability indicator reflecting tourism-system uncertainty.

4.4. Target Construction

Two target variables are defined corresponding to the two modeling stages. The Stage 1 target is the “Geopolitical Risk Index (GRI)”. Since geopolitical risk is not directly available, it is constructed using Equation (2), with the growth terms introduced as presented in Equation (4):
$$GRI_t = w_1 \, SD_t + w_2 \, TN\_growth_t + w_3 \, TS\_growth_t$$
where $SD_t$ is the shock indicator, $TN\_growth_t$ is the tourist volatility, $TS\_growth_t$ is the spending volatility, and $w_1$, $w_2$, $w_3$ are the component weights. The Stage 2 target is “Tourism Demand”. The primary dependent variable is $TD_t = TN_t$. This design allows testing whether the engineered GRI improves the prediction of tourism demand, that is, whether a predictive association exists between instability and tourism outcomes. Consistent with the predictive scope of the study, it does not establish a causal pathway.
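The weighted combination can be sketched as follows. The weights and input values are hypothetical, since the source does not report them here, and the sketch uses the growth terms as given without deciding whether they should enter signed or as magnitudes.

```python
def build_gri(shock, tn_growth, ts_growth, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum GRI_t = w1 * SD_t + w2 * TN_growth_t + w3 * TS_growth_t."""
    w1, w2, w3 = weights
    return [
        w1 * sd + w2 * tn + w3 * ts
        for sd, tn, ts in zip(shock, tn_growth, ts_growth)
    ]


# Hypothetical inputs for three consecutive periods.
shock = [0, 1, 0]
tn_growth = [0.05, -0.60, 0.80]
ts_growth = [0.04, -0.50, 0.70]

gri = build_gri(shock, tn_growth, ts_growth, weights=(0.5, 0.25, 0.25))
```

Because every component is derived from tourism variables, an index built this way is partly determined by the same series it is later used to predict.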

4.5. Experimental Overview

The experiment follows a two-stage machine learning pipeline. Stage 1 trains models to predict the constructed Geopolitical Risk Index using engineered features. Stage 2 then uses the predicted or actual GRI alongside other variables to predict “Tourists_Number (TN)”. Five algorithms are implemented, as described in Section 3.2.
The performance is evaluated using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R2 (R-square), also called the coefficient of determination, a statistical measure that indicates how well a model explains the variability of the dependent variable. The chronological split ensures realistic forecasting. The experiment demonstrates whether instability-derived features (GRI) improve tourism demand prediction, validating the conceptual model.
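The three metrics can be computed directly from their definitions, as in the minimal sketch below. The sample values are hypothetical; the second prediction set is included to show that R2 becomes negative when a model performs worse than predicting the mean of the evaluated data.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def r2(actual, predicted):
    """1 - SS_res / SS_tot; negative when worse than predicting the mean."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]
good = [1.5, 2.0, 2.5, 4.0]  # close to the actual values
bad = [4.0, 3.0, 2.0, 1.0]   # reversed trend

good_scores = (mae(actual, good), rmse(actual, good), r2(actual, good))
bad_r2 = r2(actual, bad)
```

MAE and RMSE are expressed in the units of the target, so their magnitudes differ between the index-scale Stage 1 target and the tourist-count Stage 2 target, whereas R2 is unit-free.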

5. Experimental Analysis and Presentation of the Results

The experimental analysis first considers the outcome of feature engineering, which transformed raw tourism variables into dynamic indicators capturing instability and behavioral patterns. Key features include lag variables, growth rates, and volatility measures derived from rolling standard deviations. Additionally, shock indicators (e.g., COVID period) were introduced to represent abrupt disruptions. These engineered features enabled the construction of a proxy Geopolitical Risk Index (GRI), converting static tourism data into time-sensitive predictors reflecting fluctuations in tourist numbers and spending behavior.

5.1. Exploratory Analysis

The details of yearly trends within the datasets reveal clear patterns in risk, security perception, and tourism demand. Figure 3a,b provides an essential exploratory analysis for understanding the temporal behavior of the dataset before model development. Figure 3a, which presents the average geopolitical risk and Security Perception Proxy by year, reveals clear fluctuations corresponding to disruption periods, with higher risk values aligning with lower security perception. This inverse relationship validates the constructed proxy variables and confirms that volatility-based features meaningfully capture instability patterns.
Figure 3b, illustrating total tourism demand by year, shows noticeable declines during high-risk periods and recovery trends in more stable years. This pattern supports the hypothesized relationship between instability and tourism demand. These figures are necessary because they visually validate the feature engineering process, confirm temporal dependencies, and justify the inclusion of risk proxies in the predictive models. They ensure that the data exhibits meaningful patterns suitable for machine learning analysis.

5.2. The Results of the Prediction of Geopolitical Risk

The results show that the Extra Trees Regressor performs best in predicting the Geopolitical Risk Index, with the lowest MAE (0.168) and RMSE (0.209) and the highest R2 (0.184). Other models such as KNN (MAE = 0.180, RMSE = 0.211, R2 = 0.170) and AdaBoost (MAE = 0.181, RMSE = 0.211, R2 = 0.170) exhibit similar but slightly lower performance. Gradient Boosting also does fairly well (MAE = 0.177, RMSE = 0.214, R2 = 0.147), but with less explanatory power (see Table 1). However, the relatively low R2 values for all models suggest that predicting the geopolitical risk is still an open challenge due to its indirect and complex nature. The Ridge Regression baseline performs poorly, with a much higher MAE (0.270) and RMSE (0.315) and a negative R2 of −0.85, meaning that it predicts worse than the test-set mean and indicating that linear models are not suitable for capturing instability-driven dynamics.
Figure 4a presents the scatter plot of actual versus predicted Geopolitical Risk Index values using the Extra Trees model. A moderate positive relationship is observed, indicating that the model captures general risk trends. However, the wide dispersion of points, particularly in the lower and mid-range values, suggests prediction uncertainty and aligns with the relatively low R2 (~0.18). This indicates that while the model detects overall patterns, it struggles with precise estimation due to the indirect construction of the risk index. Figure 4b illustrates the feature importance results from the Extra Trees model. The most influential predictors include lagged variables such as Tourists_Spending_lag1, Overnight_Stay_lag1, and Tourists_Number_lag1, indicating that past tourism behavior strongly drives the constructed risk index. Temporal features (e.g., YEARS) and spending-related variables also contribute significantly, while regional variables have lower importance. This confirms that volatility and lag effects are key determinants in modeling geopolitical risk proxies.
The residual vs. predicted plot for the Stage 1 Tourism Instability Proxy prediction with the Extra Trees model is shown in Figure 5a. The residuals are spread around the zero-reference line, which indicates that the model is able to capture the general tendency of the instability proxy without severe systematic bias. However, there is evident dispersion in the mid and high prediction regions, which implies that prediction uncertainty increases for some tourism-instability conditions. This is in line with the relatively modest R2.
The pattern of residuals also indicates that the model does not simply memorize the data, as residuals are scattered and not perfectly clustered around zero. A few larger negative residuals are observed at higher predicted values, indicating occasional over-estimation during times sensitive to instability. However, the lack of strong linear residual structures indicates that the ensemble model is able to capture the embedded nonlinear relationships among the tourism-system variables. In conclusion, the residual analysis indicates moderate predictive power, but also points out the inherent uncertainty in modeling tourism-instability dynamics.
Figure 5b shows the histogram of the residuals for the prediction of the Stage 1 Tourism Instability Proxy. The residuals are centered around zero, indicating the model predictions are well balanced with no large systematic under-prediction or over-prediction. Most of the residuals are moderate, suggesting that the prediction errors are still reasonably well-behaved despite the complexity of the construction of the instability proxy. The histogram also shows that the Extra Trees model gives relatively stable prediction behavior for the evaluated tourism periods.
However, the distribution shows some slight asymmetries and wider tails in some regions, which is indicative of occasional larger deviations of the predictions during periods of high market volatility. This observation is not surprising as the instability proxy is indirectly constructed from the tourism-system dynamics rather than directly capturing geopolitical events. Hence, the residual spread reflects the noise and partial endogeneity of the constructed proxy. Notwithstanding these limitations, the residual distribution confirms that the model is still able to keep an acceptable predictive consistency and offers interesting exploratory insights into the tourism-instability behavior.

5.3. The Results of the Prediction of Tourism Demand

In tourism demand prediction, the Extra Trees Regressor again achieves the best performance, with an R2 = 0.577, MAE = 1120.573, and RMSE = 2749.096, indicating strong explanatory power. KNN also performs well with an R2 = 0.485, MAE = 1287.985, and RMSE = 3035.790, followed by AdaBoost (R2 = 0.462, MAE = 1262.692, RMSE = 3102.071) (see Table 2). Gradient Boosting shows slightly lower performance (R2 = 0.429, MAE = 1387.188, RMSE = 3195.806), while the Ridge baseline records the weakest results with R2 = 0.424, MAE = 1713.186, and RMSE = 3208.847. These results confirm that tourism demand is highly nonlinear and influenced by multiple interacting factors, including the engineered GRI. The higher predictive performance observed in Stage 2 suggests that the engineered instability proxy is associated with tourism demand patterns within the predictive modeling framework.
Figure 6a illustrates the scatter plot of actual versus predicted tourism demand using the Extra Trees model. A strong positive relationship is observed, indicating that the model captures the general trend of tourism demand effectively. Compared to Stage 1, the predictions are more closely aligned with actual values, particularly in the lower and mid-range demand levels. However, some dispersion remains at higher values, suggesting slight underestimation or overestimation during peak demand periods. This aligns with the relatively high R2 of approximately 0.58, confirming improved predictive performance. Figure 6b presents the feature importance results, highlighting that Province_Makkah is the most influential predictor, followed by Geopolitical Risk Index and Security Perception Proxy. Lag variables such as security_t_x_arrivals_lag and Overnight_Stay_lag1 also contribute significantly. This confirms that both location-specific factors and the engineered risk-related variables play a critical role in explaining tourism demand variations.
Figure 7a shows the residual versus predicted plot for the Stage 2 tourism demand prediction using the Extra Trees model. The residuals are generally distributed around the zero-reference line, indicating that the model captures the overall trend of tourism demand with relatively stable predictive behavior. Compared to Stage 1, the residuals are more concentrated around zero at low and moderate prediction levels, which reflects the better predictive performance obtained in tourism demand forecasting. This is in agreement with the higher R2 value found for Stage 2, indicating a better explanatory power of the ensemble-based model. However, at higher predicted demand values, several large residual deviations are visible, suggesting that the model sometimes underestimates or overestimates tourism demand during peak tourism periods or under disruption-sensitive conditions. The increasing residual dispersion for larger prediction ranges could indicate non-linear volatility and temporal fluctuations in the tourism system. Nevertheless, such variations do not indicate a severe systematic residual structure, which suggests that the model captures important tourism-demand dynamics with acceptable generalization performance.
Figure 7b shows the residual distribution histogram for the Stage 2 tourism demand prediction. The residuals are strongly concentrated around zero, indicating that most of the tourism demand predictions have relatively small forecasting errors. This suggests that the Extra Trees model produces stable and accurate predictions on a large portion of the dataset. The concentration of residuals near the center supports the robustness of the ensemble learning framework in modeling tourism demand behavior under different tourism conditions. However, the distribution of residuals is also positively skewed, with some large positive residuals stretching into the higher error ranges. This suggests that some periods of tourism demand, especially years of high demand or high volatility, are still difficult to predict accurately. Such behavior is expected in tourism forecasting because tourism demand is affected by several interacting economic, behavioral, seasonal, and uncertainty-sensitive factors. Overall, the residual distribution indicates that the proposed framework has a strong predictive capability while still reflecting the inherent complexity of tourism-demand dynamics.
The model comparison results demonstrate distinct performance differences between Stage 1 (risk prediction) and Stage 2 (tourism demand prediction). In Stage 1, predicting the Geopolitical Risk Index remains challenging, with relatively low explanatory power across all models. The Extra Trees model performs best with R2 = 0.184, MAE = 0.168, and RMSE = 0.209, followed closely by KNN (R2 = 0.170, MAE = 0.180, RMSE = 0.211) and AdaBoost (R2 = 0.170, MAE = 0.181, RMSE = 0.211) (see Table 3). Gradient Boosting shows slightly lower performance (R2 = 0.147), while the Ridge baseline performs poorly with a negative R2 = −0.846, confirming that linear models cannot capture the nonlinear nature of the engineered risk index.
In Stage 2, predicting tourism demand yields significantly improved results. The Extra Trees model again achieves the best performance, with R2 = 0.577, MAE = 1120.57, and RMSE = 2749.10, indicating strong predictive capability. Other models, such as KNN (R2 = 0.485), AdaBoost (R2 = 0.462), and Gradient Boosting (R2 = 0.429), perform moderately well, while Ridge regression remains the weakest (R2 = 0.424). Overall, these results confirm that while geopolitical risk is inherently difficult to predict, it plays a significant role in explaining tourism demand, and ensemble-based models are most effective in capturing these complex relationships.

6. Discussion

In the face of the increasing complexity of global tourism dynamics, particularly in the context of geopolitical uncertainty, data-driven approaches are needed to support efficient analysis and forecasting. Traditional tourism models are often based upon external indices or linear assumptions, which ignore non-linear volatility and behavioral responses. Understanding demand fluctuations is critical in the context of Saudi Arabia’s Vision 2030, where tourism is a key pillar of the economy. However, the challenge remains the lack of explicit geopolitical indicators in tourism datasets. This research justifies the need to build proxy-based indicators such as the Geopolitical Risk Index (GRI) and integrate them into machine learning frameworks to increase predictive accuracy and support evidence-based decision-making.

6.1. Principal Findings

The results demonstrate that Extra Trees Regressor achieves the best performance in both stages. In risk prediction, it records R2 = 0.184, MAE = 0.168, and RMSE = 0.209, outperforming KNN (R2 = 0.170) and AdaBoost (R2 = 0.170). However, the low R2 values indicate that geopolitical risk is difficult to predict due to its indirect construction. In tourism demand prediction, performance improves significantly, with Extra Trees achieving R2 = 0.577, MAE = 1120.57, and RMSE = 2749.10. These results are important as they confirm that while risk is complex to model, it substantially contributes to explaining tourism demand variations.
This research introduces a novel approach to constructing a Geopolitical Risk Index using internal tourism variables such as volatility, growth rates, and shock indicators. Unlike existing studies that rely on external geopolitical indices [1,3], this approach transforms observable tourism behavior into a measurable proxy for instability. This contribution aligns with the need for localized and dataset-specific risk modeling, particularly in contexts where external risk data is unavailable. By leveraging tourism indicators, the study provides a scalable and adaptable framework for modeling geopolitical uncertainty within tourism datasets.
The study demonstrates the effectiveness of machine learning algorithms, particularly ensemble models, in capturing nonlinear relationships between risk and tourism demand. Consistent with prior work highlighting the superiority of ML approaches in tourism forecasting [4,5], this research confirms that models such as Extra Trees and Gradient Boosting outperform traditional linear models. The integration of engineered features, including lag variables and volatility measures, enhances predictive performance and provides a comprehensive framework for tourism analytics. This contribution strengthens the shift toward data-driven and AI-based tourism modeling.
The research empirically validates the conceptual relationship between geopolitical risk and tourism demand. While Stage 1 results show that risk is difficult to predict, Stage 2 results confirm that it significantly improves demand prediction. This finding supports existing literature emphasizing the impact of geopolitical risk on tourism behavior [1,3]. By demonstrating this relationship using real data and machine learning models, the study provides practical evidence that risk-related variables are essential for understanding tourism fluctuations, particularly in regions exposed to geopolitical uncertainty.

6.2. Comparative Findings with Previous Research

The comparative results highlight clear differences in modeling objectives, data scale, and predictive performance across studies. Prior works such as Alsulami et al. [2] report very high accuracy (MAE = 0.0133, MSE = 0.0007, R2 = 0.9601), with RF/GB variants reaching R2 = 0.8736, largely due to normalized datasets and controlled settings (see Table 4). Similarly, Dimitriadou et al. [9] achieved R2 = 0.9, while hybrid deep learning models [14] reported RMSE ≈ 22 and R2 = 0.85. However, these studies do not incorporate geopolitical risk modeling.
In contrast, this study introduces a more complex and realistic framework. While Stage 1 shows modest performance (R2 = 0.184, MAE = 0.168), Stage 2 achieves stronger results (R2 = 0.577, RMSE = 2749.1, MAE = 1120.57) on real-scale tourism data. Because the targets and data scales differ, these figures are not directly comparable with those of prior studies. The contribution of this study is conceptual: it integrates an engineered Geopolitical Risk Index and demonstrates its predictive contribution to tourism demand, which is absent in prior studies.

6.3. Limitations and Recommendations for Future Work

Despite its contributions, the study has several limitations. First, the Geopolitical Risk Index is indirectly constructed from tourism variables, which may not fully capture real-world geopolitical dynamics such as political conflicts or international relations. Second, the dataset is limited to structured tourism indicators and does not include external variables such as economic conditions, weather data, or global risk indices. Third, the relatively low R2 in risk prediction indicates that the proxy may not fully represent the underlying complexity of geopolitical instability. Finally, the analysis is based on a single-country dataset, limiting generalizability.
Another limitation is that the proposed GRI is partially endogenous to tourism-system behavior because it is constructed from tourism-related volatility indicators. Consequently, the framework may reflect autoregressive tourism instability patterns rather than purely exogenous geopolitical shocks. Future studies should integrate external geopolitical indices, economic uncertainty measures, and security datasets to improve the independence and interpretability of the risk construct.
The present study is designed as an exploratory predictive analytics framework rather than a causal inference study. Consequently, machine learning performance metrics, feature importance measures, and predictive relationships should not be interpreted as evidence of causal effects. Future studies employing exogenous shocks, event-study analysis, VAR frameworks, Granger causality testing, or quasi-experimental designs may provide stronger causal interpretation.
Future research should focus on integrating external datasets, such as global geopolitical risk indices, weather conditions, and economic indicators, to enhance model accuracy. Incorporating real-time data sources, including social media sentiment and search trends, can further improve the representation of security perception and tourist behavior. Advanced deep learning models, such as LSTM and transformer-based architectures, can be explored to better capture temporal dependencies. Additionally, extending the framework to multi-country datasets would improve generalizability and enable comparative analysis. Finally, combining machine learning with structural equation modeling (SEM) could provide deeper insights into causal relationships.

7. Conclusions

This study presents a comprehensive machine learning framework for analyzing the relationship between geopolitical risk and tourism demand using the Saudi Arabia Tourism Dataset (2015–2024). By addressing the absence of direct geopolitical variables, the research introduces a novel Geopolitical Risk Index constructed from tourism-based indicators such as volatility, growth rates, and shock events. This approach enables the transformation of abstract geopolitical uncertainty into measurable and model-ready features. The experimental results demonstrate that ensemble-based machine learning models, particularly Extra Trees Regressor, consistently outperform other algorithms in both risk prediction and tourism demand forecasting. While the prediction of geopolitical risk remains challenging, as reflected by relatively low R2 values, the inclusion of the engineered risk index significantly improves tourism demand prediction. The findings suggest that instability-sensitive tourism indicators may provide useful predictive information for tourism demand modeling. Furthermore, the study highlights the nonlinear and complex nature of tourism systems, where demand is influenced by multiple interacting factors, including behavioral responses and temporal dependencies. The findings emphasize that machine learning models are well-suited for capturing such complexity, offering superior performance compared to traditional linear approaches. Finally, this research contributes to the growing body of knowledge on data-driven tourism analytics by integrating feature engineering, machine learning, and conceptual modeling. It provides practical insights for policymakers and tourism planners, particularly in managing uncertainty and supporting sustainable tourism development under evolving geopolitical conditions.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset for this study is available online, and it is also cited in the paper; the link to the dataset is provided in the reference list.

Acknowledgments

The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Kiani, D. Tourism as a Catalyst for Economic Diversification in Saudi Arabia: Vision 2030 and Beyond. J. Econ. Financ. Account. Stud. 2026, 8, 13–25.
  2. Alsulami, A.G.; Alharbi, A.; Khan, U.A.; Al-Hejri, A.M.; Alshamrani, S.S.; Almotiri, J. Predicting tourism growth in Saudi Arabia with machine learning models for vision 2030 perspective. Sci. Rep. 2026, 16, 2556.
  3. Louati, A.; Louati, H.; Alharbi, M.; Kariri, E.; Khawaji, T.; Almubaddil, Y.; Aldwsary, S. Machine learning and artificial intelligence for a sustainable tourism: A case study on Saudi Arabia. Information 2024, 15, 516.
  4. Alzahrani, A.; Alshehri, A.; Alamri, M.; Alqithami, S. AI-driven innovations in tourism: Developing a hybrid framework for the Saudi tourism sector. AI 2025, 6, 7.
  5. Tooba, I.K. Saudi Arabia Tourism Dataset (2015–2024). Kaggle 2024. Available online: https://www.kaggle.com/datasets/toobaik/saudi-arabia-tourism-dataset-20152024 (accessed on 25 April 2026).
  6. Hailemariam, A.; Ivanovski, K. The impact of geopolitical risk on tourism. Curr. Issues Tour. 2021, 24, 3134–3140.
  7. Papagianni, E.; Evgenidis, A.; Tsagkanos, A.; Megalooikonomou, V. Tourism demand in the face of geopolitical risk: Insights from a cross-country analysis. J. Travel Res. 2024, 63, 2094–2119.
  8. Huo, H.; Li, Q. Influencing factors of the continuous use of a knowledge payment platform—Fuzzy-set qualitative comparative analysis based on triadic reciprocal determinism. Sustainability 2022, 14, 3696.
  9. Dimitriadou, A.; Gogas, P.; Papadimitriou, T. Tourism and uncertainty: A machine learning approach. Curr. Issues Tour. 2025, 28, 2278–2298.
  10. Qin, F.; Bi, J.W.; Li, H.; Xu, H. Tourism demand forecasting using social media data: A deep learning–based ensemble model with social media communication conversion rates. Ann. Tour. Res. 2025, 115, 104058.
  11. Fu, X.; Qin, Y. A Hybrid Machine Learning Framework for Tourism Demand Forecasting. Discov. Artif. Intell. 2026, 6, 63.
  12. Huang, J.; Zhang, C. Daily tourism demand forecasting with the iTransformer model. Sustainability 2024, 16, 10678.
  13. Lee, G.C. A Data-Driven Approach to Tourism Demand Forecasting: Integrating Web Search Data into a SARIMAX Model. Data 2025, 10, 73.
  14. Zhang, Y.; Tan, W.H.; Zeng, Z. Tourism demand forecasting based on a hybrid Temporal neural network model for sustainable tourism. Sustainability 2025, 17, 2210.
  15. Chen, J.; Li, C.; Huang, L.; Zheng, W. Tourism demand forecasting: A deep learning model based on spatial-temporal transformer. Tour. Rev. 2025, 80, 648–663.
  16. Sun, K.; Chi, J.; Tao, M.; Saadaoui, J. What are the Implications of Geopolitical Risks on Travel and Leisure Firms? Financ. Res. Lett. 2025, 91, 109419.
  17. Raheem, I.D.; le Roux, S. Geopolitical risks and tourism stocks: New evidence from causality-in-quantile approach. Q. Rev. Econ. Financ. 2023, 88, 1–7.
  18. Hu, M.; Li, M.; Chen, Y.; Liu, H. Tourism forecasting by mixed-frequency machine learning. Tour. Manag. 2025, 106, 105004.
  19. Georgakis, A.; Profilidis, V.; Botzoris, G.N. Forecasting tourism demand using search engine data. In Proceedings of the 10th International Congress on Transportation Research (ICTR2021), Rhodes, Greece, 1–3 September 2021.
  20. Alshammari, B.; South, R.B.; Raleigh, K. Saudi Arabia outbound tourism: An analysis of trends and destinations. J. Policy Res. Tour. Leis. Events 2025, 17, 824–846.
  21. OECD. OECD Tourism Trends and Policies 2024. Available online: https://www.oecd.org/en/publications/oecd-tourism-trends-and-policies-2024_80885d8b-en/full-report/component-57.html (accessed on 26 April 2026).
  22. Mastelini, S.M.; Nakano, F.K.; Vens, C.; de Leon Ferreira, A.C.P. Online extra trees regressor. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 6755–6767.
  23. Sudhamathi, T.; Perumal, K. Ensemble regression based Extra Tree Regressor for hybrid crop yield prediction system. Meas. Sens. 2024, 35, 101277.
  24. Antolini, F.; Cesarini, S. Predicting Domestic Tourists’ Length of Stay in Italy leveraging Regression Decision Tree Algorithms. Electron. J. Appl. Stat. Anal. 2024, 17, 621.
  25. Ozaslan, I.N.; Degirmenci, A.; Karal, O. Tourism demand forecasting for Turkey by using adaboost algorithm. In 2022 Innovations in Intelligent Systems and Applications Conference (ASYU); IEEE: Piscataway, NJ, USA, 2022; pp. 1–5.
  26. He, M.; Qian, X. Forecasting tourist arrivals using STL-XGBoost method. Tour. Econ. 2026, 32, 408–436.
  27. Tsai, J.K.; Hung, C.H. Improving AdaBoost classifier to predict enterprise performance after COVID-19. Mathematics 2021, 9, 2215.
  28. Suzanti, I.O.; Kamil, F.I.; Rochman, E.M.S.; Azis, H.; Suni, A.F.; Rachman, F.H.; Solihin, F. Imbalanced Text Classification on Tourism Reviews using Ada-boost Naïve Bayes. J. ELTIKOM J. Tek. Elektro Teknol. Inf. Dan Komput. 2025, 9, 91–97.
  29. Novita, D. Comparison of K-Nearest Neighbor and Neural Network for Prediction International Visitor in East Java. BAREKENG J. Ilmu Mat. Dan Terap. 2024, 18, 2070.
  30. Webb, T.; Lee, M.; Schwartz, Z.; Vouk, I. Beyond accuracy: The advantages of the k-nearest neighbor algorithm for hotel revenue management forecasting. Tour. Econ. 2024, 30, 1216–1236.
  31. Nagy, B.; Oltean, F.D.; Gabor, M.R. Entrepreneurship or Resources for a Better Tourism? A K Nearest Neighbors and Decision Tree Dynamic Analysis for Romania. 2023. Available online: http://hdl.handle.net/20.500.14044/34995 (accessed on 29 April 2026).
  32. Tapak, L.; Abbasi, H.; Mirhashemi, H. Assessment of factors affecting tourism satisfaction using K-nearest neighborhood and random forest models. BMC Res. Notes 2019, 12, 749.
  33. Sattayanuchit, W.; Wongkhunnen, W.; Chaisaengpratheep, N. Application of Gradient Boosting Regression to Evaluate the Impact of Family Tourism on Children’s Experiential Learning in Nakhon Ratchasima Province. J. Multidiscip. Acad. Res. Dev. (JMARD) 2025, 7, 35–53.
  34. Anshori, M.Y.; Katias, P.; Herlambang, T.; Meutia, N.S.; Othman, Z.B.; Azmi, M.S. Predicting hotel revenue using gradient boosting regression and support vector regression: A comparative analysis. J. Revenue Pricing Manag. 2025, 1–10.
  35. Wang, Y. Evaluation of Tourist Satisfaction Based Light Gradient Boosting Machine Technique. In 2024 International Conference on Intelligent Algorithms for Computational Intelligence Systems (IACIS); IEEE: Piscataway, NJ, USA, 2024; pp. 1–5.
  36. Liu, C. Scenic area data analysis based on NLP and ridge regression. In 2021 IEEE International Conference on Electronic Technology, Communication and Information (ICETCI); IEEE: Piscataway, NJ, USA, 2021; pp. 270–277.
  37. Hu, X.; Li, W. Group penalized Poisson regressions forecast daily tourism demand. Int. J. Mach. Learn. Cybern. 2026, 17, 87.
  38. Vasenska, I. Comparative analysis of machine learning and deep learning models for tourism demand forecasting with economic indicators. FinTech 2025, 4, 46.
Figure 1. The proposed conceptual model.
Figure 2. The research methodology flow.
Figure 3. Yearly risk and security perception (a) and tourism demand (b).
Figure 4. (a) Scatter plot of the geopolitical risk index; (b) feature importance.
Figure 5. Residual distribution of the Stage 1 Tourism Instability Proxy prediction.
Figure 6. (a) Scatter plot of tourism demand; (b) feature importance.
Figure 7. (a) Residuals versus predicted values; (b) tourism demand distribution.
Table 1. The training result of Geopolitical Risk.

| Model | MAE | RMSE | R² |
|---|---|---|---|
| Extra Trees | 0.168468 | 0.209492 | 0.184437 |
| KNN | 0.180302 | 0.211289 | 0.170379 |
| AdaBoost | 0.180597 | 0.211334 | 0.170026 |
| Gradient Boosting | 0.176862 | 0.214185 | 0.147484 |
| Ridge Baseline | 0.270414 | 0.315196 | −0.84623 |
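
The three metrics reported in the tables follow their standard definitions, and a short Python sketch can make clear why R² can fall below zero, as it does for the Ridge baseline in Table 1: R² compares a model's squared error with that of always predicting the mean of the observed values, so any model that does worse than that constant predictor scores negative. The function name and the toy numbers below are illustrative assumptions, not data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R2) using the standard definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    y_bar = sum(y_true) / n
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Toy values chosen only to show the behavior of R2.
y_true = [2.0, 4.0, 6.0, 8.0]
mean_only = [5.0, 5.0, 5.0, 5.0]   # always predicts the observed mean
reversed_fit = [8.0, 6.0, 4.0, 2.0]  # systematically wrong predictions

baseline = regression_metrics(y_true, mean_only)
worse = regression_metrics(y_true, reversed_fit)
print("mean-only predictor:", baseline)
print("worse-than-mean predictor:", worse)
```

The mean-only predictor lands at R² = 0 by construction, and the second predictor lands below it, which is the situation a negative R² in a results table indicates.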
Table 2. The training result of Tourism Demand.

| Model | MAE | RMSE | R² |
|---|---|---|---|
| Extra Trees | 1120.573 | 2749.096 | 0.577487 |
| KNN | 1287.985 | 3035.79 | 0.484767 |
| AdaBoost | 1262.692 | 3102.071 | 0.462023 |
| Gradient Boosting | 1387.188 | 3195.806 | 0.42902 |
| Ridge Baseline | 1713.186 | 3208.847 | 0.424351 |
Table 3. Model comparison.

| Task | Model | MAE | RMSE | R² |
|---|---|---|---|---|
| risk_prediction | Extra Trees | 0.168468 | 0.209492 | 0.184437 |
| risk_prediction | KNN | 0.180302 | 0.211289 | 0.170379 |
| risk_prediction | AdaBoost | 0.180597 | 0.211334 | 0.170026 |
| risk_prediction | Gradient Boosting | 0.176862 | 0.214185 | 0.147484 |
| risk_prediction | Ridge Baseline | 0.270414 | 0.315196 | −0.846233 |
| tourism_demand_prediction | Extra Trees | 1120.57 | 2749.1 | 0.577487 |
| tourism_demand_prediction | Gradient Boosting | 1287.99 | 3035.79 | 0.484767 |
| tourism_demand_prediction | KNN | 1262.69 | 3102.07 | 0.462023 |
| tourism_demand_prediction | AdaBoost | 1387.19 | 3195.81 | 0.42902 |
| tourism_demand_prediction | Ridge Baseline | 1713.19 | 3208.85 | 0.424351 |
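
A two-stage arrangement of the kind compared above could be sketched roughly as follows, using scikit-learn's `ExtraTreesRegressor`. This is only a sketch under stated assumptions: the data are synthetic, the feature layout and hyperparameters are invented for the example, and feeding the Stage 1 prediction into Stage 2 as an extra feature is one plausible reading of the pipeline rather than the confirmed implementation.

```python
import math

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic stand-in data (assumption): four base features, a risk index
# that depends on one of them, and a demand series that depends on risk.
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 4))
risk = 0.5 + 0.2 * np.tanh(X[:, 0]) + rng.normal(scale=0.1, size=n)
demand = 5000 + 1500 * X[:, 1] - 3000 * risk + rng.normal(scale=500, size=n)

train, test = train_test_split(np.arange(n), test_size=0.2, random_state=0)


def evaluate(y_true, y_pred):
    """Report the same three metrics used in the tables."""
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": math.sqrt(mean_squared_error(y_true, y_pred)),
        "R2": r2_score(y_true, y_pred),
    }


def make_model():
    return ExtraTreesRegressor(n_estimators=200, random_state=0)


# Stage 1: predict the risk index from the base features.
# Out-of-fold predictions are used on the training rows so that Stage 2
# does not train on Stage 1 outputs that have already seen their targets.
risk_hat_train = cross_val_predict(make_model(), X[train], risk[train], cv=5)
stage1 = make_model().fit(X[train], risk[train])
risk_hat_test = stage1.predict(X[test])

# Stage 2: predict demand from the base features plus the predicted risk.
X2_train = np.column_stack([X[train], risk_hat_train])
X2_test = np.column_stack([X[test], risk_hat_test])
stage2 = make_model().fit(X2_train, demand[train])

results = {
    "risk_prediction": evaluate(risk[test], risk_hat_test),
    "tourism_demand_prediction": evaluate(demand[test], stage2.predict(X2_test)),
}
for task, metrics in results.items():
    print(task, {name: round(value, 3) for name, value in metrics.items()})
```

Because the inputs are synthetic, the printed numbers say nothing about the Saudi dataset; the sketch only shows how the two tasks can share one evaluation routine and how the Stage 1 output can be passed forward without leaking training targets.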
Table 4. The performance comparison with previous studies.
Ref.StudyModelMAERMSE/MSER2
[2]Saudi Tourism MLEnsemble Voting0.0133MSE = 0.00070.9601
RF/GB variants0.03390.8736
[3]Saudi Tourism SpendingRF/KNN0.05879-
[9]Tourism and UncertaintyGBT-15,463.200.9
[12]iTransformerTransformer14.115219.2557-
[14]Hybrid NNhybrid BiLSTM18–19220.85
This Study (Stage 1)GRI PredictionExtra Trees0.1680.2090.184
This Study (Stage 2)Tourism DemandExtra Trees1120.572749.10.577
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zamzami, I.F. Machine Learning-Based Tourism Demand Prediction Using Tourism Instability Indicators. Sustainability 2026, 18, 5503. https://doi.org/10.3390/su18115503
