Article

Environmental Sustainability Indicators and International Tourism Demand: Evidence from Machine Learning and SHAP Analysis

1 Faculty of Economics and Administrative Sciences, Akdeniz University, 07100 Antalya, Türkiye
2 Faculty of Economics and Administrative Sciences, Süleyman Demirel University, 32260 Isparta, Türkiye
3 Faculty of Communication, Süleyman Demirel University, 32260 Isparta, Türkiye
* Author to whom correspondence should be addressed.
Tour. Hosp. 2026, 7(6), 170; https://doi.org/10.3390/tourhosp7060170
Submission received: 24 April 2026 / Revised: 4 June 2026 / Accepted: 9 June 2026 / Published: 11 June 2026

Abstract

This study evaluates the demand dynamics of the 20 leading strategic destinations in the global tourism market by modeling the interactions between traditional macroeconomic determinants and climate-linked environmental sustainability indicators. The primary objective is to assess the predictive capacity of physical and structural environmental factors—including water stress, air pollution, renewable energy adoption, and sanitation infrastructure—relative to established economic metrics like GDP per capita. Employing non-parametric predictive frameworks on a panel dataset of 400 observations (2000–2019), the empirical analysis suggests that tree-based ensemble models, notably Extra Trees (90.54%) and CatBoost (84.75%), yield higher predictive accuracy than conventional multiple linear regression (73.97%). Interpretations derived from cooperative game theory via SHAP analysis suggest that environmental determinants may serve as important predictive drivers of tourism demand. Specifically, variables such as water stress (28.20%), renewable energy share (27.12%), and sanitation infrastructure carry substantial predictive weight, whereas the benchmark macroeconomic indicator (2.30%) exerts a relatively marginal influence within the model architecture. These findings imply that environmental sustainability metrics may capture international tourism demand variations more effectively than traditional economic variables. The results suggest that acute environmental vulnerabilities may be associated with reduced tourism inflows, potentially reflecting limitations in destination sustainability thresholds. Broadly, the evidence is consistent with the notion that contemporary global tourism demand may be increasingly interdependent with ecological resilience and low-carbon transition policies. 
It is important to note that the findings reported here reflect predictive associations derived from machine learning models and should not be interpreted as evidence of causal relationships.

1. Introduction

Beyond its well-documented ecological dimensions, global climate variation increasingly appears to act as a systemic stressor with the potential to disrupt both macroeconomic resilience and sector-level continuity. Given its inherent reliance on environmental predictability, the tourism industry appears highly sensitive to these climate-induced disruptions and is notably exposed to physical risk vectors (Scott, 2021). Extant research indicates that tourists possess significant potential to adapt to climatic changes, often shifting destination choices rather than ceasing travel altogether, yet environmental degradation remains a critical factor (Gössling et al., 2006, 2012). Environmental factors such as extreme temperature, pollution, and water scarcity are becoming increasingly important considerations for global tourist flows and crucial drivers of destination competitiveness (Dube, 2024; Zikirya et al., 2021).
In recent years, the economic effects of weather and climate disasters have become increasingly evident. Economic losses from weather- and climate-related extreme events in the European Union over the past few decades amount to €822 billion, with 25% of these losses recorded in the four years from 2021 to 2024. Each of these four years ranks among the five years with the highest annual economic losses since 1980. Because extreme weather and climate events are likely to keep increasing in severity, their effect on economic activities and service sectors is expected to become more pronounced (European Environment Agency [EEA], 2025). These developments suggest the need for a more advanced approach to investigating the impact of climate-related physical risks on tourism demand.
The nexus between tourism demand and environmental degradation has predominantly been examined through conventional econometric frameworks, primarily focusing on the specific impacts of carbon emissions and energy-intensive variables (Katircioglu, 2014; Paramati et al., 2017). Despite these insights, the traditional preference for linear modeling approaches appears to provide a constrained view of the non-linear, stochastic dynamics interlinking tourism development, macroeconomic variables, and environmental stress (Dogan & Aslan, 2017). Recent findings indicate that the relationship between tourism development and the environment shows nonlinear features, possibly experiencing a change in regimes after exceeding particular thresholds of development (J. Zhang & Lu, 2022). The non-linear nature of the relationship, however, can be explained through the theory of the Environmental Kuznets Curve (EKC), which suggests that environmental performance measures could at first become worse and then get better as maturity rises (Grossman & Krueger, 1995; Stern, 2004). In relation to the discussion on tourism, this path suggests that although high levels of tourism operations could increase environmental pressures during the early phase of developing destinations, these trends seem in line with a possible shift through the adoption of sustainable energy sources, improved sanitation facilities, and sophisticated environmental management practices (Dogan & Aslan, 2017; Katircioglu, 2014; Paramati et al., 2017).
The main purpose of this research is to investigate the vulnerability issues at the macro-level in the top twenty tourism destinations across the world between the years 2000 and 2019. In this regard, this paper attempts to analyze the effect of such vulnerability issues on international tourism demand using state-of-the-art machine learning techniques. Rather than limiting our analysis based on economic indicators, we assess the predictive capability of various environmental factors that have not been sufficiently integrated into machine learning-based multivariate tourism demand forecasting frameworks, despite the existence of a growing body of literature linking tourism to climate, pollution, CO2 emissions, and water stress. It is acknowledged that these environmental indicators represent conceptually distinct dimensions—environmental pressure, energy transition, and development capacity—rather than a single unified sustainability construct. Their collective inclusion reflects the multidimensional nature of destination-level environmental vulnerability, while recognizing that each variable may operate through different mechanisms. Our analysis accounts for potentially non-linear and complex relationships among variables through a novel approach in which tree-based ensemble models are applied to a multi-country panel dataset to capture non-monotonic patterns between environmental indicators and tourism demand, complemented by SHAP-based interpretability analysis.
The main contributions of this study are threefold. First, it suggests that tree-based ensemble models may outperform conventional linear specifications in predicting international tourism demand across a multi-country panel. Second, it provides preliminary evidence that environmental sustainability indicators may carry greater predictive weight than traditional macroeconomic variables. Third, it applies SHAP-based interpretability analysis to transparently decompose the predictive contributions of environmental and economic variables within a black-box modeling framework.

2. Literature Review

From the existing literature based on the determinants of tourist demand and development, it appears that the orientation of research is moving towards sustainability and climatic aspects (Rutty et al., 2021). While international travelers have been known to positively impact the economy of any location (Y. Zhang & Ali, 2024), growth in tourist activity has been linked to a rise in energy consumption in destination economies, potentially resulting in increased environmental impact due to increasing use of fossil fuels (Trinajstić et al., 2022). From the empirical research conducted among the various regional groupings, such as E7 and MENA, it can be seen that growth in tourism puts significant strain on the environment (Onifade et al., 2022; Kuldasheva et al., 2023). In addition, there appears to be a nonlinear association between PM2.5 levels and tourism development, whereby the effect of tourism on air pollution may follow an inverted-U shaped trajectory consistent with the Environmental Kuznets Curve (EKC) hypothesis, with pollution rising during early stages of tourism growth before potentially declining at higher levels of tourism specialization (J. Zhang & Lu, 2022). Meanwhile, studies conducted in high-density tourist locations have indicated that declining environmental quality can serve as a limitation for future tourism activity (Özdemir & Tosun, 2023). Beyond specific threshold levels, the link between tourism and air pollution becomes non-linear, shifting from earlier conditions to more evident degradation patterns (J. Zhang & Lu, 2022). Hence, growth-oriented approaches that overlook environmental sustainability may represent a double-edged sword for long-term tourism demand. 
Environmental quality and resource management have long been recognized as key determinants of destination competitiveness (Dwyer & Kim, 2003; Hassan, 2000), suggesting that environmental sustainability indicators may carry independent explanatory power beyond traditional macroeconomic variables.
The effects of climate change and the marginal effects of environmental variables (air pollution, water stress) on tourists’ preferences have been explored empirically in the literature. A meta-analysis of 290 climate change elasticity coefficients from 34 studies points to an empirical connection between climatic factors and tourist behavior worldwide, implying that climate change tends to shift tourists from tropical, desert, and Mediterranean climates toward temperate and continental areas (Zhou et al., 2024). Panel data econometric findings support this redistribution process, revealing that higher temperatures and changing rainfall patterns affect both the quantity and timing of tourist arrivals in the impacted areas (Tonsakunthaweeteam et al., 2025; Yu et al., 2025). Heat stress in such thermally sensitive environments can exceed the thresholds for comfortable travel, which might require shifting the tourism season from summer to spring and fall (Perry, 2006). Additionally, fine particulate matter (PM2.5) is regarded as an important factor shaping travel decisions. According to a panel analysis of Chinese provinces, PM2.5, PM10, and SO2 pollution negatively affects both domestic and inbound tourism (Zikirya et al., 2021). Meanwhile, a spatial econometric study of 99 countries found that air pollution affects international tourist numbers both directly and through indirect spillovers to neighboring countries, indicating that pollution externalities could affect tourism competitiveness among neighboring destinations (Su & Lee, 2022).
Earlier simulation modelling of global tourist flows similarly suggested that climate change may alter international tourism demand patterns, with income and population effects potentially exceeding climate-induced changes in the medium to long term (Hamilton et al., 2005).
In line with sustainability goals, the scholarly literature presents the adoption of green strategies as a response to these environmental pressures. Tourism has been estimated to account for approximately 8% of global greenhouse gas emissions (Lenzen et al., 2018), a share projected to increase substantially by 2050 in the absence of structural policy interventions (Gössling & Peeters, 2015). Renewable energy use and technological advancement are the main factors that can help decrease carbon dioxide emissions in the tourism sector (Guo & Chai, 2025; Omarova et al., 2025). Green investments aligned with Sustainable Development Goals (SDGs) have been shown to positively influence tourism demand, with SDGs serving both as a direct driver and as a moderating factor in economic growth–tourism relationships (Liu et al., 2025; Islam et al., 2026). Additionally, clean energy transitions, particularly solar energy adoption, have been empirically linked to higher tourism growth, suggesting that renewable infrastructure investments may enhance destination attractiveness (Tai & Javed, 2025). Nonetheless, uniform adoption cannot be assumed: whereas developed nations tend to implement sustainable management measures more actively, developing countries may adopt such changes at varying rates due to their higher environmental vulnerability and macroeconomic sensitivity (Shang et al., 2023; Nyiwul et al., 2024). Therefore, the effects of tourism and air transportation on PM2.5 emissions may require sector-specific policy interventions, given that these effects tend to follow nonlinear patterns depending on the regional and developmental context (Ergen & Aslan, 2026; J. Zhang & Lu, 2022). Because climate indicators are often assessed under the assumption of linearity, their non-linearity and variability should be reconsidered (J. Zhang & Lu, 2022).
Early reviews of the tourism demand forecasting literature demonstrate that most empirical studies have relied predominantly on linear econometric specifications with a limited set of explanatory variables (Lim, 1997; Witt & Witt, 1995), with no single forecasting method consistently outperforming others across different contexts.
Collectively, the reviewed studies suggest that while the tourism–environment nexus has received growing attention, several methodological gaps remain. Most existing studies tend to rely on linear econometric specifications and single environmental indicators, potentially overlooking the complex interactions among multiple environmental stressors. More recent reviews suggest that while forecasting approaches have grown more diversified over time (Song et al., 2019; Song & Li, 2008), the integration of multi-dimensional environmental sustainability indicators into machine learning-based forecasting frameworks appears to remain limited.
Moreover, machine learning approaches that simultaneously incorporate both macroeconomic and multi-dimensional environmental variables appear to be underrepresented in the tourism demand forecasting literature (Dimitriadou et al., 2025; Fu & Qin, 2026; Law et al., 2019; Sun et al., 2019). The present study attempts to address these gaps by applying tree-based ensemble models alongside SHAP-based interpretability analysis to a multi-country panel dataset encompassing a broad range of environmental sustainability indicators.

3. Methodology and Research Design

3.1. Research Hypotheses

After conducting a comprehensive review of literature and developing a conceptual framework based thereon, the following research hypotheses have been formulated to examine the asymmetric impacts of climate change and environmental sustainability indicators on international tourism demand through machine learning models:
H1. 
Environmental sustainability indicators may possess substantial explanatory power in predicting international tourism demand.
Studies on environmental quality and tourism demand show that environmental attributes play an important role in determining tourism demand. In particular, evidence of a marked fall in tourism demand following environmental degradation is consistent with this role (Özdemir & Tosun, 2023). In addition, studies showing that macroeconomic growth and tourism activities can contribute to environmental degradation indicate that economic attributes alone may not be adequate to explain the dynamics of sustainable tourism. For instance, the finding that economic growth, tourism income, and air cargo volume increase PM2.5 among EUROCONTROL member nations suggests that economic growth figures alone may not capture the full extent of environmental threats (Ergen & Aslan, 2026). Also, empirical evidence that environmental vulnerability is a statistically significant determinant of the adoption of sustainable tourism accounting tools indicates that environmental attributes hold independent explanatory power in sustainable tourism policy design, beyond what economic indicators such as GDP alone can account for (Nyiwul et al., 2024). In this study, H1 is assessed by examining the out-of-sample R² scores of the models and by comparing the SHAP-based feature importance weights of environmental variables against the macroeconomic benchmark variable (GDP per capita).
H2. 
Environmental fragility indicators (water stress, air pollution, and rising temperatures) may negatively affect destination sustainability thereby being negatively associated with international tourism demand in a pattern that may be asymmetric in nature.
Environmental issues such as water scarcity and air pollution may negatively affect destination sustainability and resource availability, which could be associated with reduced demand for tourism. According to Olmedo (2026), Spanish tourism destinations are mostly located in areas of high water scarcity, and the peak tourism season coincides with the driest months of the year. Su and Lee (2022) also found that poor air quality (PM2.5) lowers international tourist inflows and might cause negative spillovers on tourist movement in neighboring destinations. In addition, according to Zikirya et al. (2021), air pollutants such as PM2.5, PM10, and SO2 can have negative impacts on tourism movement. Increasing temperatures and climate-related stress could also lower tourism revenues through changes in attractiveness perceptions and decision-making processes (Tonsakunthaweeteam et al., 2025). These patterns appear broadly consistent with the theoretical framework provided by the EKC hypothesis, which posits that non-linear dynamics and structural turning points may characterize the developmental–environmental relationship (Grossman & Krueger, 1995; Stern, 2004). Moreover, the link between tourism development and air pollution seems to be non-linear; findings from cities in China reveal an inverted U-shaped relationship, which implies that pollution rises while tourism development is relatively low and might decline beyond a certain level of tourism development (J. Zhang & Lu, 2022). It should be noted, however, that neither asymmetry nor threshold effects are formally tested in this study. H2 is assessed through the directional SHAP contributions of the water stress, air pollution, and temperature anomaly variables. Negative SHAP values for high feature values of these variables are interpreted as consistent with a suppressive predictive association with tourist arrivals.
H3. 
Better use of renewable energy and improved sanitation infrastructure may enhance destination attractiveness and thereby positively influence international tourism demand.
It seems that the utilization of solar energy is positively correlated with tourism growth. This hypothesis is based on empirical data from European nations’ panels, which show that for each additional one percent utilization of solar energy, there is an increase of around 0.37 percent in tourism demand (Tai & Javed, 2025). In addition, green innovation has also been shown to have a positive impact on tourism demand due to the contribution of its role in improving the competitiveness of the destination via sustainable technology and environmental initiatives (Islam et al., 2026). Moreover, the association between tourism development and environmental quality might not be linear, since better environmental conditions can enhance destination appeal. Regarding sanitation facilities, from an extensive study of different nations, it appears that unsatisfactory sanitation and hygiene conditions lower tourist satisfaction and negatively impact destination competitiveness, and hence betterment in this aspect may aid international tourism demand (Vašaničová & Melnyk, 2026). Altogether, based on these aspects, it seems that better use of renewable energy sources and sanitation facilities may positively influence international tourism demand, as stated in H3. In this study, H3 is assessed by examining whether high values of renewable energy share and sanitation infrastructure variables are associated with positive SHAP contributions in the best-performing model.
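The signed SHAP contributions referred to in H1 to H3 are Shapley values from cooperative game theory. As a minimal sketch of what a positive or negative contribution means, the snippet below computes exact Shapley values for a toy two-feature function by enumerating feature subsets. The toy function, the feature labels, and the single-baseline value function are illustrative assumptions; this is neither the study's fitted model nor the tree-specific algorithm that SHAP libraries use.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are set to their baseline value
    (a simplifying assumption made for this sketch).
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    p = len(x)

    def value(subset):
        # Prediction when only the features in `subset` take the instance's values.
        z = baseline.copy()
        idx = np.array(subset, dtype=int)
        z[idx] = x[idx]
        return predict(z)

    phi = np.zeros(p)
    for j in range(p):
        others = [i for i in range(p) if i != j]
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p!
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                phi[j] += weight * (value(subset + (j,)) - value(subset))
    return phi


def toy_model(z):
    # Hypothetical demand function: feature 0 lowers the prediction,
    # feature 1 raises it, and the two interact.
    return 10.0 - 2.0 * z[0] + 3.0 * z[1] + 0.5 * z[0] * z[1]


x = [2.0, 1.0]          # instance to explain (made-up values)
baseline = [0.0, 0.0]   # reference point (made-up values)
phi = shapley_values(toy_model, x, baseline)
```

In this toy case the first feature receives a negative contribution and the second a positive one, and the contributions sum to the difference between the prediction for the instance and the prediction at the baseline. That sign logic is what the directional reading of SHAP values in H2 and H3 relies on.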

3.2. Research Design and Sample

The physical threats of climate change could pose a serious challenge to the sustainable operation of the tourism industry, in addition to their economic consequences (Scott, 2021; Yu et al., 2025). Because the tourism industry relies heavily on weather and environmental infrastructure, it has been found to be susceptible to the effects of climate change (Gössling et al., 2006; Dube, 2024). Prior research shows that, to quantify climate variability, researchers often rely on a single temperature-related indicator, even though multiple environmental variables shape tourists’ perceptions (Gössling et al., 2012; Ibragimov et al., 2022). Moreover, the non-linear relationship between tourism development and environmental outcomes further highlights the inadequacy of traditional linear models in capturing the complex interplay between tourism and climate-related variables (J. Zhang & Lu, 2022). Thus, the development of more comprehensive methodological frameworks may assist in overcoming existing limitations and facilitate an empirical evaluation of climate change adaptation (Jopp et al., 2010; Kaján & Saarinen, 2013).
The research sample comprises twenty prominent tourism destinations characterized by substantial international tourist flows: the USA, Germany, Austria, UAE, UK, China, France, the Netherlands, Spain, Italy, Japan, Canada, Malaysia, Mexico, Poland, Portugal, Saudi Arabia, Thailand, Türkiye, and Greece. These twenty destinations were selected on the basis of their consistently high volumes of international tourist arrivals as reported by the World Bank, representing the leading strategic destinations in the global tourism market, as well as the availability of sufficiently complete data across all study variables for the period 2000–2019. Country-level heterogeneity is addressed through the inclusion of country-specific fixed effects, transformed into dummy variables via One-Hot Encoding. This approach allows the models to account for time-invariant, destination-specific characteristics—such as geographic location, institutional quality, and cultural factors—that may systematically influence tourism demand but are not captured by the observed environmental and macroeconomic variables.
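The One-Hot Encoding of country effects described above can be sketched as follows. This is a minimal illustration using pandas; the column names and numeric values are made up, and the study's actual preprocessing code is not given in the text.

```python
import pandas as pd

# Tiny illustrative panel; the GDP figures are placeholders, not study data.
panel = pd.DataFrame({
    "country": ["Spain", "Spain", "Japan", "Japan"],
    "year": [2000, 2001, 2000, 2001],
    "gdp_per_capita": [14.7, 15.4, 39.2, 34.4],
})

# One dummy column per country; each row has a 1 in its own country's column.
encoded = pd.get_dummies(panel, columns=["country"], prefix="country", dtype=int)
```

Each dummy column lets a model learn a destination-specific shift that does not vary over time, which is the role the country fixed effects play here.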
The study utilizes a balanced panel dataset consisting of 400 observations covering the period 2000–2019 for the 20 leading strategic destinations in the global tourism market. Missing observations accounted for approximately 1.53% of the total dataset (49 out of 3200 data points), concentrated in three variables: tourist arrivals (29 observations), sanitation infrastructure (13 observations), and water stress (7 observations). These gaps were filled using linear interpolation between adjacent observed values. No extrapolation beyond the observed range was performed. A systematic overview of the variables and their respective data sources is detailed in Table 1. Furthermore, to mitigate potential estimation bias, country-specific fixed effects (time-invariant characteristics) were transformed into dummy variables utilizing the One-Hot Encoding method.
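A minimal sketch of the gap-filling rule, assuming it is applied separately within each country's series: linear interpolation fills values that lie between two observed points, while gaps at the start or end of a series stay missing because no extrapolation is performed. The column names and numbers are hypothetical.

```python
import numpy as np
import pandas as pd

# Two placeholder countries with gaps inside and at the edges of the series.
panel = pd.DataFrame({
    "country": ["A"] * 5 + ["B"] * 5,
    "year": list(range(2000, 2005)) * 2,
    "arrivals": [10.0, np.nan, 14.0, np.nan, np.nan,
                 np.nan, 20.0, np.nan, np.nan, 26.0],
})
panel = panel.sort_values(["country", "year"])

# limit_area="inside" restricts filling to gaps bounded by observed values,
# so leading and trailing gaps are left as NaN (no extrapolation).
panel["arrivals_filled"] = panel.groupby("country")["arrivals"].transform(
    lambda s: s.interpolate(method="linear", limit_area="inside")
)
```

Grouping by country keeps values from one destination from being interpolated into another's series.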
Within the environmental risk framework, annual mean temperature anomalies serve as the primary climatic indicator. These climatological metrics were retrieved and computationally processed through the Google Earth Engine cloud-computing platform (Gorelick et al., 2017), utilizing the ECMWF ERA5-Land reanalysis database (Muñoz Sabater, 2019). For the empirical analysis, temperature anomalies were calculated as deviations from the historical climatological mean of the 1981–2010 reference period. Additionally, spatial aggregation was executed at the national level to maintain geographic consistency across the selected destinations.
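The anomaly definition above amounts to subtracting each country's 1981–2010 mean from its annual mean temperature. The following sketch assumes national annual means are already available in a table; the values and column names are invented, and the Earth Engine extraction step is not reproduced.

```python
import pandas as pd

# Made-up annual mean temperatures (degrees Celsius) for one placeholder country.
temps = pd.DataFrame({
    "country": ["X", "X", "X", "X"],
    "year": [1981, 2010, 2018, 2019],
    "temp_c": [10.0, 12.0, 11.5, 12.5],
})

# Climatological mean per country over the 1981-2010 reference period.
in_reference = temps["year"].between(1981, 2010)
baseline = temps.loc[in_reference].groupby("country")["temp_c"].mean()

# Anomaly = annual value minus that country's reference-period mean.
temps["anomaly_c"] = temps["temp_c"] - temps["country"].map(baseline)
```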
The data on international tourist inflows, air pollution (PM2.5), water stress, renewable energy, sanitation, energy consumption, and GDP per capita were obtained from the World Bank database. The selection of variables was informed by prior studies examining the relationship between air pollution and tourism demand (Zikirya et al., 2021; Su & Lee, 2022) and those addressing the role of water-related pressures in shaping the vulnerability of tourist destinations (Perry, 2006; Dube, 2024). Furthermore, there are findings suggesting that the use of renewable energy sources could assist in mitigating carbon emissions in economies driven by tourism activities (Kuldasheva et al., 2023; Omarova et al., 2025). Beyond these thematic motivations, the inclusion of PM2.5, tourism income, and GDP as a combined set of indicators is supported by studies indicating that tourism activity and economic expansion are associated with rising particulate matter concentrations, underscoring the relevance of these variables in capturing tourism-induced environmental degradation (Ergen & Aslan, 2026).
It should be acknowledged that the environmental indicators employed in this study represent conceptually heterogeneous constructs. Water stress and air pollution (PM2.5) capture environmental pressure and degradation; renewable energy share reflects low-carbon energy transition; and sanitation infrastructure represents basic development and public health capacity. While these variables are collectively analyzed under the broad umbrella of environmental sustainability, they operate through distinct mechanisms and may influence tourism demand through different pathways. This heterogeneity is acknowledged as a conceptual limitation, and future research may benefit from analyzing these dimensions separately. It is worth noting that GDP per capita in the present study represents destination-level economic size rather than origin country income, which is the more conventional income proxy in tourism demand modelling (Lim, 1997; Rosselló Nadal & Santana Gallego, 2022).
While multicollinearity does not affect the predictive ability of machine learning models (Shmueli, 2010), VIF analysis was nonetheless conducted to ensure transparency and to address potential concerns regarding variable redundancy. All VIF values remained below the commonly accepted threshold of 5, with the highest values observed for water stress (3.90) and air pollution PM2.5 (3.88). Specifically, the VIF values were as follows: water stress (3.90), air pollution PM2.5 (3.88), energy usage (3.16), GDP per capita (2.79), sanitation infrastructure (2.00), renewable energy share (1.27), and temperature anomaly (1.06). It should be noted, however, that VIF captures only linear dependence among predictors and may not fully reflect non-linear associations present in the data. Furthermore, while multicollinearity does not compromise predictive accuracy, correlated predictors in tree-based models may distribute SHAP-based importance scores across related variables. The relative SHAP weights of water stress and air pollution PM2.5—the two variables with the highest VIF values—should therefore be interpreted with appropriate caution. The stability of the top environmental predictors across multiple random seeds, as reported in the robustness checks, provides additional confidence in the overall feature importance rankings.
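For reference, the VIF of a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on all the others. The sketch below implements this with NumPy on synthetic data; the data are random draws chosen only to show one correlated pair, not the study's variables.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out


rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)             # independent of a
c = a + 0.5 * rng.normal(size=200)   # linearly related to a
vifs = vif(np.column_stack([a, b, c]))
```

The correlated pair receives clearly higher VIF values than the independent column, which mirrors why the two highest-VIF variables in the study warrant cautious interpretation of their individual SHAP weights.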

3.3. Time-Series Split and Cross-Validation

Applying machine learning to time-series and panel data carries a risk of data leakage, in which future observations inadvertently inform model training. This issue can be mitigated by dividing the dataset chronologically so that out-of-sample generalization is assessed on later periods (Fu & Qin, 2026). Accordingly, the period 2000–2015 serves as the training set and the period 2016–2019 as the test set. This separation by period seems to provide a more credible means of identifying structural changes over time, rather than only capturing temporary changes in the factors affecting tourism demand (Dimitriadou et al., 2025). Out-of-sample validation is crucial for evaluating the predictive accuracy and methodological credibility of the applied machine learning framework (Hastie et al., 2009). This design is intended to support the model’s capacity to capture potentially complex patterns between environmental risk indicators and international tourism demand. Such a validation strategy seems consistent with rigorous analytical standards in recent tourism forecasting literature (e.g., Chen et al., 2025) and with the goal of obtaining reliable and generalizable estimates beyond the initial training sample (James et al., 2021).
During hyperparameter optimization (GridSearchCV), conventional K-fold cross-validation can break the intrinsic temporal dependency in time-series data because it assigns observations to folds without regard to their chronological order. This is an important concern given the potentially complex and non-monotonic patterns in tourist demand fluctuations related to climatic factors (Gössling et al., 2012) and in the connection between tourism growth and environmental degradation (J. Zhang & Lu, 2022). Maintaining the chronological sequence of observations is therefore considered important. To address these complexities and mitigate data leakage, an expanding-window 5-fold cross-validation protocol is implemented. Because the training window grows in each iteration while validation always uses later observations, this approach facilitates a more robust evaluation of the model’s out-of-sample generalization performance. The optimal hyperparameters derived from this chronological validation strategy are detailed in Table 2.
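The chronological split can be illustrated with a skeleton panel of 20 destinations and 20 years. The country labels are placeholders; only the year boundaries (2000–2015 for training, 2016–2019 for testing) come from the text.

```python
import pandas as pd

# Skeleton of a balanced panel: 20 placeholder countries x 20 years.
countries = [f"C{i:02d}" for i in range(20)]
panel = pd.DataFrame(
    [(c, y) for c in countries for y in range(2000, 2020)],
    columns=["country", "year"],
)

# Every training observation precedes every test observation in time.
train = panel[panel["year"] <= 2015]
test = panel[panel["year"] >= 2016]
```

With this layout the 400 observations divide into 320 training rows (16 years) and 80 test rows (4 years).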
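One way to build expanding-window folds for a panel is to split on calendar years rather than on row positions, so that all countries' observations for a given year land in the same fold. The sketch below is an assumption about how such folds could be constructed; the text does not state the exact fold boundaries used, and the helper name is hypothetical.

```python
import numpy as np


def expanding_window_year_folds(years, n_splits=5):
    """Yield (train_idx, val_idx) pairs with a growing training window.

    Folds are defined on unique years, so each validation block
    contains only years later than every training year.
    """
    years = np.asarray(years)
    uniq = np.unique(years)
    block = len(uniq) // (n_splits + 1)
    if block == 0:
        raise ValueError("not enough distinct years for the requested folds")
    first_val = len(uniq) - n_splits * block
    for start in range(first_val, len(uniq), block):
        train_years = uniq[:start]
        val_years = uniq[start:start + block]
        train_idx = np.flatnonzero(np.isin(years, train_years))
        val_idx = np.flatnonzero(np.isin(years, val_years))
        yield train_idx, val_idx


# Training period only: 20 placeholder countries x years 2000-2015.
years = np.tile(np.arange(2000, 2016), 20)
folds = list(expanding_window_year_folds(years, n_splits=5))
```

Under these assumptions the first fold trains on 2000–2005 and validates on 2006–2007, and the last trains on 2000–2013 and validates on 2014–2015. An iterable of index pairs like this can be supplied as the `cv` argument of scikit-learn's `GridSearchCV`.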

3.4. Algorithm Selection and Constraints

The structural relationship between climate variability and tourism demand appears to exhibit a complex, non-linear trajectory, thereby necessitating careful methodological consideration. Empirical evidence suggests that tourist responses to environmental changes are inherently multifaceted and highly context-dependent, a dynamism that potentially confounds the explanatory power of strictly linear demand functions (Gössling et al., 2012). Furthermore, meteorological variables seem to exert varying influences across specific temperature thresholds, indicating that traditional linear specifications might fail to adequately capture the true nature of these climatic sensitivities (Ibragimov et al., 2022).
The predictive ability and reliability of the developed models are assessed with evaluation measures that are widely accepted in the literature (Kuhn & Johnson, 2013). For regression models, error-based measures such as MAE and RMSE indicate how far the predicted values deviate from the actual values (James et al., 2021). Additionally, Mean Squared Error (MSE) is frequently used because squaring the residuals penalizes larger forecasting errors more heavily, thereby providing a more conservative estimate of model accuracy. Furthermore, the coefficient of determination (R²), which represents the proportion of the variation in the response variable that is explained by the predictor variables, is used to assess model fit (Hastie et al., 2009). However, to account for potential overfitting as the number of independent variables (k) increases, the Adjusted R² metric is often incorporated to penalize unwarranted model complexity. The mathematical formulations of these metrics are presented below:
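$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$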
Here, n denotes the total number of observations, k represents the number of independent variables included in the model, yᵢ refers to the observed values of the dependent variable, ŷᵢ indicates the predicted values generated by the model, and ȳ corresponds to the arithmetic mean of the observed values.
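The five metrics can be computed directly from their definitions, as in the following sketch. The function name `regression_metrics` and the four-point toy series are illustrative assumptions and are unrelated to the study's data.

```python
from math import sqrt


def regression_metrics(y_true, y_pred, k):
    """Return R2, adjusted R2, MSE, RMSE, and MAE for k predictors."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    mse = ss_res / n
    mae = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n
    return {"r2": r2, "adj_r2": adj_r2, "mse": mse, "rmse": sqrt(mse), "mae": mae}


# Toy example: four observations, one predictor.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0], k=1)
print(metrics)
```

Note that R² computed on a held-out set can be negative whenever the residual sum of squares exceeds the total sum of squares, that is, when the model predicts worse than the test-set mean.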
To relax the restrictions imposed by strict parametric assumptions, MLR is complemented by tree-based algorithms, namely Random Forest, Extra Trees, and CatBoost. Empirical evidence indicates that the relationship between tourism growth and the environment often follows a U-shaped or inverted U-shaped pattern, which underscores the need for methodological flexibility (J. Zhang & Lu, 2022). Adding to this complexity, the potential for bidirectional causality across tourism demand, economic growth, and air pollution seems to exceed the analytical boundaries of standard linear specifications (Ergen & Aslan, 2026). Consequently, the integration of these machine learning techniques holds the potential to capture the complex, underlying dynamics more comprehensively than traditional approaches.
Environmental variables, such as water scarcity and air pollution, frequently exhibit high variability and outliers across national settings. Moreover, studies reveal that the effect of air quality on tourism demand is highly heterogeneous across regions, which may undermine the accuracy of linear regression models (Su & Lee, 2022). Thus, decision-tree-based algorithms are used in the current study because of their robustness to outliers and their greater ability to detect complex, non-linear relationships. In particular, CatBoost is favored for its native handling of categorical data, which allows the dummy variables for each country to be used directly.
The interpretation of the current models should be guided by specific boundary conditions related to spatiotemporal aggregation. Utilizing annualized data inherently restricts the observation of intra-year volatility and the rapid behavioral responses to transient weather extremes, such as heatwaves, droughts, and wildfires (Perry, 2006). This lack of high-frequency inputs thereby bounds the empirical framework’s sensitivity to acute climate shocks. In parallel, the spatial dimension of the dataset presents another structural constraint. Consolidating environmental indicators into national averages tends to obscure critical cross-sectional variations. Recognizing the strong spatial dependence and heterogeneity associated with air quality, macro-level metrics may not adequately capture localized spillover dynamics (Su & Lee, 2022). Evaluating the tourism–environment relationship strictly through aggregated data likely misses region-specific threshold effects, reinforcing the need for caution when extrapolating localized interactions from exclusively macro-level metrics (J. Zhang & Lu, 2022).
It is important to clarify that the machine learning models employed in this study are optimized for predictive accuracy rather than causal identification. Accordingly, SHAP values quantify the marginal contribution of each feature to the model’s predictions and do not establish causal relationships between environmental variables and tourism demand. The terms “effect” and “influence” used throughout this paper should be understood within this predictive context, not as claims of causal directionality. It is further acknowledged that the predictive framework adopted here does not account for potential reverse causality, whereby tourism activity may itself influence environmental conditions, nor for omitted variable bias or policy feedback effects. These limitations are inherent to the cross-sectional panel structure of the dataset and represent important avenues for future research employing dedicated causal identification strategies. It should be noted, however, that the study’s predictive rather than causal orientation means that reverse causality may not invalidate the findings per se; rather, it limits their causal interpretation. The environmental variables used as predictors may capture both direct environmental effects on tourism and indirect signals of broader destination quality, and disentangling these pathways represents an important avenue for future research.
ARIMA-based models were not incorporated because they are designed for univariate time series and cannot accommodate the multi-country panel structure of this study. Although XGBoost and LightGBM, along with Ridge Regression, were also estimated during the model development process, they were excluded from the main analysis due to their inferior out-of-sample performance. Specifically, XGBoost yielded an R² of 0.2229 and a negative adjusted R² of −0.1806, suggesting substantial overfitting given the limited sample size, while LightGBM achieved an R² of 0.6176 and Ridge Regression yielded an R² of 0.7455. These performance levels provided no meaningful improvement over the benchmark models already included in the study. Future research employing larger datasets may benefit from revisiting these algorithms.
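The gap between the two XGBoost figures follows from the adjusted R² formula. As a worked check, assume the test set contains n = 80 observations (20 destinations over 2016–2019) and, purely as an illustrative assumption that is not stated in the text, k = 27 predictors:

```latex
\text{Adjusted } R^2
  = 1 - \frac{(1 - 0.2229)(80 - 1)}{80 - 27 - 1}
  = 1 - \frac{0.7771 \times 79}{52}
  \approx 1 - 1.1806
  = -0.1806
```

Under these assumptions the penalty factor (n − 1)/(n − k − 1) is roughly 1.52, which is large enough to push a modest positive R² below zero. A small test sample combined with many predictors can therefore yield a negative adjusted value even when the unadjusted R² is positive.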
In the process of cross-validation, even though the expanding window technique is very useful for reducing data leakage, the drawback of having a limited sample in the first few iterations becomes quite evident. The limited sample at this stage poses risks for the development of complex tree-based models and their possible underfitting in the earlier stages of modeling. In addition, it must be emphasized that interactions among variables in the analysis of relationships between tourism demand, energy transition, and environmental metrics display significant heterogeneity on a spatial basis, which implies that robustness checks are imperative for this type of analysis (Tai & Javed, 2025). However, the stability of the parameters obtained through these models seems confirmed by their predictions in out-of-sample tests.

4. Results and Discussion

4.1. Descriptive Statistics

The descriptive statistics of the data used in developing machine learning models have been analyzed prior to examining their forecast accuracy. In essence, the empirical evidence used for this research is a balanced panel dataset consisting of 400 observations from 20 selected strategic tourism destinations over the period of 2000 to 2019. The descriptive statistics for the data are summarized in Table 3 below.
Table 3 indicates structural heterogeneity across the macroeconomic and environmental independent variables. After logarithmic transformation, the dependent variable, international tourist arrivals, has a mean of 17.359 and ranges from 15.375 to 19.199. The transformation appears effective in bringing the distribution closer to normality and in reducing the influence of outliers on the prediction models. The macroeconomic variable, GDP per capita, also shows asymmetric variation, with values ranging from $969 to $64,746. Such wide variation in tourism volumes and economic capacity is common in multi-country panels, since tourism behavior may differ considerably across income groups (Khanna & Sharma, 2023).
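To give the log-scale figures a more intuitive reading, they can be converted back to arrival counts. The sketch below assumes a natural logarithm, which this section does not state explicitly, so the resulting magnitudes are indicative only.

```python
from math import exp

# Log-scale summary statistics as reported for international tourist arrivals.
log_arrivals = {"minimum": 15.375, "mean of logs": 17.359, "maximum": 19.199}

# Back-transform, assuming the natural logarithm was applied.
arrivals = {label: exp(value) for label, value in log_arrivals.items()}

for label, value in arrivals.items():
    print(f"{label}: about {value / 1e6:.1f} million arrivals")
```

The back-transformed mean of logs is the geometric mean of arrivals, not the arithmetic mean, so it will generally sit below the simple average of the raw counts.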
The distributional properties of the climate and environmental variables point to noticeable differences in sustainability performance among destinations. For example, water stress shows a very large standard deviation (404.02) and a maximum of 1866.67; together with notable dispersion in air quality (PM2.5) and renewable energy shares, this indicates a departure from normality. These features imply structural complexity in the dataset. Previous studies support this view, arguing that environmental factors influence the tourism sector in heterogeneous and non-linear ways (Gössling et al., 2012; Su & Lee, 2022). Accordingly, the characteristics of the empirical dataset justify the use of algorithms that can capture this complexity through flexible specifications rather than strict parametric and distributional assumptions.

4.2. Model Performance Results

Following hyperparameter optimization of the machine learning algorithms under time-series constraints, the empirical performance of the models was evaluated using an external test set (2016–2019 period) that was not used during the training phase. In accordance with the sampling validity criteria emphasized in the literature, the predictive power of the models was compared using the Coefficient of Determination (R²), Adjusted R², Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Squared Error (MSE). Table 4 presents the comparative performance results of the four basic algorithms on the test set.
As detailed in Table 4, the Extra Trees algorithm exhibits the highest predictive efficacy, achieving an explanatory power of 90.5%. This relative superiority may stem from the algorithm’s inherent mechanism of randomizing node splits and feature selection, which may aid in filtering noise within the panel data. Consequently, this stochastic approach potentially mitigates overfitting vulnerabilities while facilitating a more flexible mapping of non-linear dependencies. As suggested in the literature, critical thresholds and non-linear dynamics may characterize the tourism–environment nexus, particularly when destination carrying capacities are exceeded (J. Zhang & Lu, 2022). It should be noted that while the EKC framework is broadly consistent with these patterns, it is not formally tested in the present study. Therefore, randomized tree-based ensemble methods appear well suited for capturing complex and potentially non-linear patterns in the data. Furthermore, the robust performance of the CatBoost model—yielding an RMSE of 0.3071—suggests the empirical utility of advanced gradient boosting frameworks. The algorithm’s native capability to internally process categorical features likely enhances its predictive accuracy by effectively controlling for dataset-specific structural heterogeneity.
A notable empirical observation in this analysis is the robust predictive performance of the benchmark Multiple Linear Regression (MLR) model. Interestingly, MLR (73.9%) outperformed the inherently flexible Random Forest algorithm (57.9%). This relative superiority implies that the underlying data structure may be predominantly driven by strong, monotonic linear trends rather than complex non-linear interactions. Such a dynamic appears consistent with the literature indicating that structural macroeconomic determinants—such as income levels, trade openness, and institutional stability—exert a foundational and direct influence on tourism demand, thereby reinforcing the explanatory validity of linear frameworks (Krasniqi et al., 2023; Liu et al., 2025). Furthermore, the sustained predictive power of these macroeconomic variables seems to corroborate recent evidence suggesting that general economic prosperity and sustainable development-oriented growth trajectories serve as primary catalysts for expanding tourism volumes (Islam et al., 2026; Y. Zhang & Ali, 2024).
The visual diagnostics presented in Figure 1 effectively corroborate the quantitative performance metrics detailed in Table 4. The superior predictive capabilities of the Extra Trees and CatBoost algorithms are visually apparent in the “Actual vs. Predicted” scatter plots, where a dense concentration of observation points aligns tightly along the 45-degree reference line, indicating a high degree of predictive accuracy. Furthermore, even across high-volume tourism destinations within the external test cohort, prediction errors remain relatively constrained. This suggests that these tree-based architectures successfully capture the complex time-series dynamics inherent in the panel data without succumbing to overfitting. Finally, an examination of the residual plots reveals that the error terms for both models exhibit a largely symmetrical and random dispersion around the zero axis. Such a distribution implies an absence of systematic bias and suggests that the fundamental assumptions of homoscedasticity are reasonably satisfied.
Diagnostic evaluations of the optimal models suggest no substantial evidence of heteroskedasticity or systematic bias based on visual inspection of residual plots, though formal statistical tests of these assumptions were not conducted. To further substantiate these findings, robustness checks employing second-generation panel data estimators were incorporated, a methodological standard widely supported in recent literature to ensure statistically reliable and unbiased predictions (Guo & Chai, 2025; Pata & Tanriover, 2023). Conversely, the residual distributions for both the MLR and Random Forest algorithms exhibit pronounced deviations from the reference line, particularly among high-demand observations. Such predictive discrepancies indicate that these models may struggle to fully capture the non-linear behavioral patterns and perceptual shifts characteristic of high-volume destinations. Theoretical precedents corroborate this limitation, suggesting that tourist responses to climatic variables are inherently non-linear, perception-driven, and strictly contingent upon specific meteorological thresholds (Gössling et al., 2012; Ibragimov et al., 2022).

4.3. Variable Importance and SHAP Analysis

While tree-based ensemble algorithms often exhibit superior predictive capabilities, they are frequently characterized as “black-box” models due to their inherent structural complexity, which typically precludes the direct interpretation of variable interactions through singular coefficients. To navigate this interpretability constraint and to empirically delineate the potential influence of climate change indicators on international tourism demand, this analysis incorporates global feature importance metrics derived from the top-performing Extra Trees and CatBoost architectures. Furthermore, to facilitate a more granular understanding, these metrics are augmented with SHapley Additive exPlanations (SHAP), a framework grounded in cooperative game theory. Given the algorithmic divergences between their respective learning mechanisms, Table 5 presents a comparative evaluation of the primary drivers shaping the predictive outputs of each model.
The percentage weights reported in Table 5 represent each variable’s normalized built-in feature importance score derived from the CatBoost model (PredictionValuesChange), which measures how much, on average, the prediction changes when the value of a given feature changes, normalized to sum to 100%. These differ from the SHAP summary plots presented in Figure 2 and Figure 3, which capture the direction and magnitude of each feature’s contribution to individual predictions.
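The distinction between per-prediction attributions and a global percentage ranking can be illustrated with an exact Shapley computation on a toy model. This sketch uses the textbook Shapley definition with a single-baseline value function. It is not the tree-specific algorithm implemented in SHAP libraries, and the normalized shares it prints are not CatBoost's PredictionValuesChange. The model `toy_model`, its coefficients, and the input values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline) to each feature."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their observed value, the rest the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


# Hypothetical toy model: three features and one interaction term.
def toy_model(z):
    return 17.0 - 0.5 * z[0] + 0.4 * z[1] + 0.1 * z[2] + 0.2 * z[0] * z[1]


x = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

# Signed contributions for this single prediction.
print([round(p, 3) for p in phi])

# Unsigned shares normalized to 100%, in the spirit of a global importance table.
total = sum(abs(p) for p in phi)
shares = [100 * abs(p) / total for p in phi]
print([round(s, 1) for s in shares])
```

The signed values show which features push this particular prediction up or down and sum to the difference between the prediction and the baseline output. The normalized shares discard the sign, which is why a percentage table alone cannot indicate the direction of an association.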
An examination of the variable importance metrics in Table 5 reveals that the predictive architecture of the Extra Trees algorithm is predominantly driven by country-specific fixed effects. This operational behavior resembles the mechanics of a fixed-effects estimator in traditional panel data econometrics, suggesting that the model accounts for variance primarily through historical, destination-specific tourism baselines. Conversely, the CatBoost framework—benefiting from its native capability to process categorical variables efficiently—appears more adept at integrating dynamic environmental and climatic determinants into its decision hierarchy. Consequently, formulating climate-related policy recommendations may rest on a more robust and transparent empirical basis when grounded in the analytical outputs of the CatBoost algorithm. In terms of hypothesis testing, H1 appears to be supported by the finding that environmental variables collectively account for a substantially larger share of predictive importance than GDP per capita in the CatBoost model. H2 appears to be supported by the negative SHAP contributions associated with high values of water stress and air pollution variables. H3 appears to be supported by the positive SHAP contributions associated with high values of renewable energy share and sanitation infrastructure.
According to the feature importance metrics derived from the CatBoost architecture, the variables with the highest predictive weights for the international tourist arrivals appear to be water stress (28.20%), renewable energy share (27.12%), and sanitation infrastructure (13.61%). Interestingly, the relatively marginal predictive contribution of GDP per capita (2.30%) suggests that environmental and infrastructural sustainability indicators possess substantial explanatory capacity regarding global tourism demand. It should be noted, however, that this finding is specific to the current model configuration and sample context. The relatively low predictive weight of GDP per capita within this framework does not imply that macroeconomic determinants are generally less important for tourism demand; rather, it suggests that within this particular multi-country panel and model architecture, environmental variables may provide additional predictive information beyond what GDP alone captures. This empirical outcome appears consistent with the theoretical premise that escalating environmental pressures may constrain the long-term sustainability of destinations (Gössling et al., 2006). Furthermore, contemporary evidence suggests that acute water scarcity, ecological fragility, and general infrastructural deficits can exert considerable downward pressure on tourism inflows (Dube, 2024). In this context, the prominent predictive role of sanitation infrastructure is further corroborated by recent literature highlighting basic sanitation and hygiene resilience as critical prerequisites for sustaining destination competitiveness and tourist confidence (Vašaničová & Melnyk, 2026). This interpretation appears broadly consistent with findings suggesting that environmental degradation may escalate in a non-linear fashion once specific thresholds of tourism development are breached (J. Zhang & Lu, 2022), which may partly account for the relatively high predictive weight of water stress within the current analytical framework.
Empirical outputs suggest that the algorithmic decision-making process is significantly driven by variations in air quality (PM2.5) and energy consumption patterns. Specifically, escalating PM2.5 concentrations appear to exert a deterrent effect on international tourist arrivals, thereby appearing consistent with Hypothesis 2 (H2). This empirical finding aligns with established literature demonstrating that air quality deterioration not only directly suppresses tourism flows but is inherently linked to broader energy-driven emissions (Su & Lee, 2022; Zikirya et al., 2021). Furthermore, the non-linear and threshold-dependent nature of these results is theoretically consistent with the Environmental Kuznets Curve (EKC) framework, which posits that environmental quality and sectoral expansion interact through complex, structural turning points (Grossman & Krueger, 1995; Stern, 2004). The observation that tourism-induced economic expansion can precipitate higher energy demand—paradoxically exacerbating local PM2.5 pollution if not managed sustainably—suggests that destinations may be approaching critical ecological thresholds (Ergen & Aslan, 2026; J. Zhang & Lu, 2022). Conversely, the substantial predictive weight attributed to the renewable energy share (27.12%) implies that transitioning toward low-carbon infrastructure potentially enhances destination attractiveness. This observation appears consistent with Hypothesis 3 (H3), suggesting that broader renewable energy integration may support both international tourism demand and environmental quality (Guo & Chai, 2025; Omarova et al., 2025).
An examination of the SHAP summary graph (Figure 2) illustrates the varying marginal contributions of environmental determinants to predicted international tourist arrivals, revealing patterns that may be consistent with non-linear and asymmetric dynamics. It is important to emphasize that SHAP values reflect the contribution of each feature to the model’s predictions and should not be interpreted as evidence of tourist behavioral responses or decision-making mechanisms. The directional patterns observed in the SHAP plots indicate predictive associations within the model architecture, not causal pathways through which environmental conditions influence tourist choices. Specifically, elevated feature values—represented by the red data points—for water stress and air pollution consistently shift the model output in a negative direction. This pattern is broadly consistent with the theoretical premises of the Environmental Kuznets Curve (EKC) framework, suggesting that destinations may encounter critical ecological thresholds where environmental degradation begins to suppress demand (Grossman & Krueger, 1995; Stern, 2004). It should be emphasized, however, that the EKC hypothesis is not formally tested in this study and is invoked here solely as a theoretical lens. This visual evidence implies that heightened environmental vulnerability may weaken destination competitiveness, thereby suppressing tourism inflows; a dynamic that appears consistent with Hypothesis 2 (H2) (Özdemir & Tosun, 2023; Su & Lee, 2022). Conversely, high values for the renewable energy and sanitation infrastructure variables pull the SHAP distributions toward the positive spectrum. This trajectory suggests that investments in sustainable infrastructure and low-carbon energy transitions have the potential to enhance destination attractiveness and stimulate tourism demand. 
Such an empirical outcome aligns closely with contemporary literature emphasizing the strategic benefits of clean energy and sustainable practices, thereby substantiating Hypothesis 3 (H3) (Islam et al., 2026; Tai & Javed, 2025).
An analysis of the SHAP summary graph (Figure 3) corresponding to the Extra Trees (ET) algorithm appears to reveal a structural divergence in its underlying learning architecture. Specifically, the model’s predictive mechanism seems to be predominantly driven by variables representing high-market-share destinations within the global tourism landscape, such as France, the United States, Spain, and Italy. Elevated feature values for these specific destination identifiers (visually denoted by the red data points) consistently shift the SHAP distributions in a positive trajectory. This dynamic implies that the ET architecture potentially captures destination-specific fixed effects, historical brand equity, and persistent demand patterns more aggressively than it weights environmental or climatic indicators. Such algorithmic behavior is conceptually analogous to traditional dynamic panel methodologies, wherein unobserved qualitative heterogeneity, tourist habit persistence, and expectation formations are frequently captured through specific baseline proxies (Krasniqi et al., 2023). Consequently, despite the ET model yielding a superior aggregate predictive performance (R²), its heavy reliance on inherent destination characteristics provides a robust methodological justification for favoring the CatBoost framework. This dominance of country fixed effects in the Extra Trees model raises an important methodological concern: the algorithm may be learning primarily from persistent, country-specific tourism baselines rather than from the dynamic environmental variables of interest. In such a case, the model would effectively approximate a country-level mean prediction, limiting its capacity to inform environmental policy. This further reinforces the rationale for prioritizing the CatBoost model for substantive interpretation.
CatBoost’s demonstrated sensitivity to dynamic environmental determinants makes it significantly more appropriate for formulating sustainability-focused policy recommendations.
It should also be noted that SHAP-based feature importance rankings may vary across different model runs or alternative specifications. To assess the stability of the reported rankings, the CatBoost model was re-estimated across multiple random seeds, and the top environmental features (water stress, renewable energy share, and sanitation infrastructure) consistently appeared among the most important predictors, suggesting that the reported rankings are reasonably robust to minor specification changes. Additionally, the predictive importance of environmental variables in the CatBoost model may partly reflect cross-sectional variation across countries rather than within-country temporal dynamics. Future research employing within-country panel designs would help disentangle these two sources of variation. A further ablation check excluding country fixed effects indicated that environmental variables (R² = 0.68) substantially outperform GDP per capita alone (R² = −0.03), as reported in Section 5.

5. Conclusions

This study contributes to the extant literature by evaluating the impact of environmental sustainability and fragility indicators on international tourism demand through the deployment of advanced machine learning algorithms across global destinations. The contributions of this study are explicitly predictive in nature: the primary empirical contribution lies in demonstrating the superior out-of-sample predictive performance of tree-based ensemble models, while the secondary contribution involves the use of SHAP-based interpretability analysis to transparently identify the relative predictive importance of environmental versus macroeconomic variables. These findings reflect predictive associations and do not constitute causal claims. More specifically, the findings suggest that within this particular model configuration and sample context, environmental sustainability indicators may carry greater predictive weight than GDP per capita—a result that highlights the potential role of ecological conditions as predictive drivers of destination attractiveness. This finding should not be generalized to imply that macroeconomic determinants are universally less important for tourism demand, as the relative predictive importance of variables may vary across different model specifications, samples, and contexts. The empirical outcomes suggest that traditional linear specifications may be methodologically inadequate for capturing the intricate, multidimensional relationships between climatic/environmental factors and tourist behavior. Conversely, flexible tree-based architectures demonstrate superior explanatory efficacy. The findings suggest that tourism demand may respond to environmental disturbances in a complex and potentially non-linear manner, a pattern that appears consistent with non-linear dynamics observed in related literature. 
Consequently, these results appear consistent with Hypothesis 1 (H1), suggesting that environmental sustainability metrics may possess substantial predictive value in modeling global tourism demand.
An important finding of the study in this regard is the negative predictive association between environmental vulnerability and destination competitiveness. In particular, rising water scarcity, declining air quality, and inadequate sanitation facilities are found to be negatively associated with predicted international tourist arrivals, suggesting that these environmental pressures may limit the sustainability threshold of destinations. The SHAP patterns further suggest that this negative association may intensify in a non-linear manner as environmental degradation increases. Such a non-linear trajectory is broadly consistent with the theoretical premises of the Environmental Kuznets Curve (EKC) framework, suggesting that tourism–environment interactions may involve structural turning points where ecological depletion begins to erode destination attractiveness (Grossman & Krueger, 1995; Stern, 2004). It should be noted, however, that the EKC hypothesis is not formally tested in this study; the framework is invoked as a theoretical lens consistent with the observed predictive patterns. These dynamics suggest that destination managers and policymakers may benefit from considering a gradual shift away from volume-centric growth strategies toward resilience-oriented planning frameworks anchored in robust resource management and environmental sustainability principles.
Conversely, the analytical framework indicates that the integration of renewable energy and the enhancement of sanitation infrastructure are positively associated with international tourism inflows. Destinations actively transitioning toward low-carbon energy architectures may more effectively mitigate environmental pressures, potentially enhancing their attractiveness to international visitors. These findings suggest that strategic investments in sustainable infrastructure and clean energy transitions may not only improve aggregate environmental performance but may also represent potential competitive advantages that support tourism demand. Such outcomes appear consistent with Hypothesis 3 (H3), suggesting a theoretical link between destination sustainability and competitive resilience. These findings should, however, be interpreted as predictive associations derived from machine learning models rather than confirmed causal mechanisms. Establishing causal directionality would require dedicated identification strategies such as instrumental variable approaches or natural experiments, which represent important avenues for future research.
Despite its empirical contributions, this study acknowledges certain methodological constraints. The reliance on annual-frequency macro-data inherently limits the capacity to capture the immediate repercussions of seasonal anomalies and short-term extreme weather shocks. Furthermore, the aggregation of environmental indicators at the national level represents a more significant limitation than it may initially appear. Tourism activity is spatially concentrated within specific resorts, coastal zones, and urban centers, whereas PM2.5, water stress, and temperature anomaly data are averaged at the national level. This mismatch between the spatial scale of tourism activity and that of the environmental indicators may substantially attenuate the true relationships being estimated, and future research would benefit considerably from sub-national or destination-level environmental data. Moreover, the predictive importance of environmental variables observed in this study may partly reflect cross-sectional variation across countries rather than within-country temporal dynamics, since the panel structure combines both between-country and within-country variation. Disentangling these two sources of variation through within-country fixed-effects designs or first-differenced specifications represents an important direction for future research. Additionally, the high out-of-sample predictive performance of the models may partly reflect the learning of country-specific tourism baselines through dummy variables, rather than transferable tourism–environment relationships. The generalizability of these findings to out-of-sample destinations should therefore be interpreted with caution. Consequently, future research endeavors could significantly benefit from the integration of high-frequency temporal datasets and granular spatial analyses, potentially augmented by geospatial satellite imagery, to derive a more nuanced understanding of the climate–tourism nexus. 
Future research could further strengthen these findings by incorporating lagged environmental variables to account for delayed effects on tourism demand and by conducting subgroup analyses that separate developed and developing economies. Within the present study, the robustness of the findings was assessed through two complementary approaches. First, a sensitivity analysis excluding all linearly interpolated observations yielded R² values of 0.9285 (Extra Trees) and 0.8486 (CatBoost), which remain consistent with the main analysis. Second, SHAP-based feature importance rankings were assessed across multiple random seeds, with the top environmental predictors remaining stable across specifications. These checks collectively suggest that the reported findings are not materially sensitive to data imputation or random initialization. The research design incorporates country fixed effects as dummy variables to control for time-invariant destination characteristics, consistent with established practice in destination-level tourism forecasting (Nag & Sarkar, 2024). As an additional robustness check, when country fixed effects are excluded, environmental variables collectively achieve an R² of 0.68, whereas GDP per capita alone yields an R² of −0.03, further supporting the argument that environmental variables carry meaningful predictive information beyond country-specific baseline effects. The weak performance of destination GDP is consistent with the tourism demand literature, in which income as a demand driver typically refers to origin-country income as a proxy for tourists’ purchasing power rather than to destination-country GDP (Lim, 1997; Witt & Witt, 1995; Song et al., 2012). Moreover, when destination-level fixed effects are included in panel specifications, destination GDP is typically absorbed by these controls (Rosselló Nadal & Santana Gallego, 2022).
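A negative out-of-sample R², such as the −0.03 reported for GDP per capita alone, may look surprising at first. The sketch below assumes the standard definition of the coefficient of determination, 1 − SS_res/SS_tot (the section does not state which implementation was used), and applies it to invented numbers to show that the score falls below zero whenever predictions are worse than simply predicting the mean of the held-out targets.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical held-out targets (log arrivals) and two sets of predictions.
# These numbers are invented and are NOT from the study.
y_test = [16.0, 17.0, 18.0]
mean_only = [17.0, 17.0, 17.0]      # always predicts the held-out mean
slightly_off = [17.2, 17.2, 17.2]   # a constant prediction that misses the mean

r2_mean = r2_score(y_test, mean_only)     # baseline: exactly zero
r2_off = r2_score(y_test, slightly_off)   # worse than the baseline: negative
```

Read this way, an R² close to zero or slightly negative indicates that the predictor adds essentially no out-of-sample information beyond the sample mean.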
In conclusion, this study provides preliminary predictive evidence—within a specific sample of 20 leading global destinations over the period 2000–2019—that environmental sustainability indicators may carry substantial predictive weight in forecasting international tourism demand. The results suggest that destinations facing acute environmental pressures may be associated with lower predicted tourist arrivals, while those investing in renewable energy and sanitation infrastructure tend to show higher predicted demand. These findings should be interpreted as predictive associations within this particular sample context and should not be generalized as evidence of structural transformation in global tourism systems. Causal confirmation and broader generalization await further research employing larger samples, alternative model specifications, and dedicated causal identification strategies.

Author Contributions

Conceptualization, M.E., O.Ö., E.O.E., E.D.Ö. and Ş.Ö.; methodology, M.E., O.Ö., E.O.E., E.D.Ö. and Ş.Ö.; validation, M.E., O.Ö., E.O.E., E.D.Ö. and Ş.Ö.; formal analysis, M.E., O.Ö., E.O.E., E.D.Ö. and Ş.Ö.; investigation, M.E., O.Ö., E.O.E., E.D.Ö. and Ş.Ö.; resources, M.E., O.Ö., E.O.E., E.D.Ö. and Ş.Ö.; data curation, M.E., O.Ö., E.O.E., E.D.Ö. and Ş.Ö.; writing—original draft preparation, M.E., O.Ö., E.O.E., E.D.Ö. and Ş.Ö.; Writing—review and editing, M.E., O.Ö., E.O.E., E.D.Ö. and Ş.Ö. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data were derived from the following resources available in the public domain: World Bank Open Data for macroeconomic, environmental quality, and tourism demand indicators (https://data.worldbank.org/ (accessed on 10 February 2026)), and the Google Earth Engine platform for the retrieval and processing of ECMWF ERA5-Land reanalysis climatological datasets (https://developers.google.com/earth-engine/datasets/catalog/ECMWF_ERA5_LAND_MONTHLY_AGGR (accessed on 13 February 2026)).

Acknowledgments

The authors would like to acknowledge Google Earth Engine for providing the cloud-based computational platform used for the acquisition and processing of large-scale geospatial data. This research was conducted under the Earth Engine non-commercial research/education license. Furthermore, the authors express their gratitude to the Copernicus Climate Change Service (C3S) for making the ECMWF ERA5-Land reanalysis datasets openly available.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation | Definition |
|---|---|
| EKC | Environmental Kuznets Curve |
| SDG | Sustainable Development Goals |
| GDP | Gross Domestic Product |
| RMSE | Root Mean Squared Error |
| MAE | Mean Absolute Error |
| MSE | Mean Squared Error |
| MLR | Multiple Linear Regression |
| ET | Extra Trees |
| SHAP | SHapley Additive exPlanations |

References

  1. Chen, J., Li, C., Huang, L., & Zheng, W. (2025). Tourism demand forecasting: A deep learning model based on spatial-temporal transformer. Tourism Review, 80(3), 648–663.
  2. Dimitriadou, A., Gogas, P., & Papadimitriou, T. (2025). Tourism and uncertainty: A machine learning approach. Current Issues in Tourism, 28(14), 2278–2298.
  3. Dogan, E., & Aslan, A. (2017). Exploring the relationship among CO2 emissions, real GDP, energy consumption and tourism in the EU and candidate countries: Evidence from panel models robust to heterogeneity and cross-sectional dependence. Renewable and Sustainable Energy Reviews, 77, 239–245.
  4. Dube, K. (2024). Evolving narratives in tourism and climate change research: Trends, gaps, and future directions. Atmosphere, 15(4), 455.
  5. Dwyer, L., & Kim, C. (2003). Destination competitiveness: Determinants and indicators. Current Issues in Tourism, 6(5), 369–414.
  6. Ergen, H., & Aslan, A. (2026). Examining the impact of air transportation, tourism, and economic growth on PM2.5 pollution: A comparative analysis of EUROCONTROL member nations. Letters in Spatial and Resource Sciences, 19(8), 8.
  7. European Environment Agency (EEA). (2025). Economic losses from weather- and climate-related extremes in Europe. Available online: https://www.eea.europa.eu/en/analysis/indicators/economic-losses-from-climate-related?activeAccordion=ecdb3bcf-bbe9-4978-b5cf-0b136399d9f8 (accessed on 31 March 2026).
  8. Fu, J., & Qin, J. (2026). Tourism demand forecasting with multi-source data: A hybrid framework integrating denoising, signal decomposition, and machine learning. Management System Engineering, 5, 6.
  9. Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202, 18–27.
  10. Gössling, S., Bredberg, M., Randow, A., Sandström, E., & Svensson, P. (2006). Tourist perceptions of climate change: A study of international tourists in Zanzibar. Current Issues in Tourism, 9(4–5), 419–435.
  11. Gössling, S., & Peeters, P. (2015). Assessing tourism’s global environmental impact 1900–2050. Journal of Sustainable Tourism, 23(5), 639–659.
  12. Gössling, S., Scott, D., Hall, C. M., Ceron, J. P., & Dubois, G. (2012). Consumer behaviour and demand response of tourists to climate change. Annals of Tourism Research, 39(1), 36–58.
  13. Grossman, G. M., & Krueger, A. B. (1995). Economic growth and the environment. The Quarterly Journal of Economics, 110(2), 353–377.
  14. Guo, Y., & Chai, Y. (2025). Toward green tourism: The role of renewable energy for sustainable development in developing nations. Frontiers in Sustainable Tourism, 4, 1512922.
  15. Hamilton, J. M., Maddison, D. J., & Tol, R. S. J. (2005). Climate change and international tourism: A simulation study. Global Environmental Change, 15(3), 253–266.
  16. Hassan, S. S. (2000). Determinants of market competitiveness in an environmentally sustainable tourism industry. Journal of Travel Research, 38(3), 239–245.
  17. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer.
  18. Ibragimov, K., Perles-Ribes, J. F., & Ramón-Rodríguez, A. B. (2022). The impact of climate change on tourism demand: Evidence from Kazakhstan. Anatolia, 33(2), 293–297.
  19. Islam, H., Unurlu, Ç., Işık, C., Saha, S., Khoshnood, M., & Sarker, N. K. (2026). Green innovation and tourism demand: The moderating role of sustainable development goals (SDGs). Sustainable Development, 1–17.
  20. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). Linear regression. In An introduction to statistical learning. Springer.
  21. Jopp, R., DeLacy, T., & Mair, J. (2010). Developing a framework for regional destination adaptation to climate change. Current Issues in Tourism, 13(6), 591–605.
  22. Kaján, E., & Saarinen, J. (2013). Tourism, climate change and adaptation: A review. Current Issues in Tourism, 16(2), 167–195.
  23. Katircioglu, S. T. (2014). International tourism, energy consumption, and environmental pollution: The case of Turkey. Renewable and Sustainable Energy Reviews, 36, 180–187.
  24. Khanna, R., & Sharma, C. (2023). Does financial development raise tourism demand? A cross-country panel evidence. Journal of Hospitality & Tourism Research, 47(6), 1040–1070.
  25. Krasniqi, S., Dreshaj, K., & Shala Dreshaj, F. (2023). Determinants of tourism demand in selected countries of meta: Empirical panel analysis. DETUROPE—The Central European Journal of Regional Development and Tourism, 15(1), 23–46.
  26. Kuhn, M., & Johnson, K. (2013). Applied predictive modeling. Springer.
  27. Kuldasheva, Z., Ahmad, M., Salahodjaev, R., & Fahlevi, M. (2023). Do tourism and renewable energy influence CO2 emissions in tourism-dependent countries? International Journal of Energy Economics and Policy, 13(6), 146–152.
  28. Law, R., Li, G., Fong, D. K. C., & Han, X. (2019). Tourism demand forecasting: A deep learning approach. Annals of Tourism Research, 75, 410–423.
  29. Lenzen, M., Sun, Y. Y., Faturay, F., Ting, Y. P., Geschke, A., & Malik, A. (2018). The carbon footprint of global tourism. Nature Climate Change, 8, 522–528.
  30. Lim, C. (1997). Review of international tourism demand models. Annals of Tourism Research, 24(4), 835–849.
  31. Liu, S., Islam, H., Ghosh, T., Sibt e Ali, M., & Afrin, K. H. (2025). Exploring the nexus between economic growth and tourism demand: The role of sustainable development goals. Humanities and Social Sciences Communications, 12, 441.
  32. Muñoz Sabater, J. (2019). ERA5-Land monthly averaged data from 1950 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS).
  33. Nag, A., & Sarkar, S. (2024). Integrating choice freedom, economic health, and transportation infrastructure to forecast tourism demand: A case study of Bishnupur and its alignment with sustainable development goals. Transport Policy, 147, 198–214.
  34. Nyiwul, L., Shakya, M., Lamichhane, A., & Hu, Z. (2024). Adoption of tools for sustainable tourism development: Role of environmental vulnerability. Journal of Policy Research in Tourism, Leisure and Events, 16(3), 349–371.
  35. Olmedo, F. G. (2026). Tourism and water stress: A worrying convergence in time and place. In J. Martínez-Valderrama, & J. Olcina Cantos (Eds.), The labyrinth of desertification. Springer.
  36. Omarova, A., Saubetova, B., Abubakirova, A., Saimagambetova, G., & Pernebekkyzy, N. (2025). Renewable energy consumption, tourism and climate change relationship in developed countries. International Journal of Energy Economics and Policy, 16(1), 218–224.
  37. Onifade, S. T., Gyamfi, B. A., Bekun, F. V., & Altuntaş, M. (2022). Significance of air transport to tourism-induced growth hypothesis in E7 economies: Exploring the implications for environmental quality. Tourism: An International Interdisciplinary Journal, 70(3), 339–353.
  38. Özdemir, D., & Tosun, B. (2023). Determinants of tourism demand in context of environmental quality. Advances in Hospitality and Tourism Research (AHTR), 11(2), 294–316.
  39. Paramati, S. R., Alam, M. d. S., & Chen, C. F. (2017). The effects of tourism on economic growth and CO2 emissions: A comparison between developed and developing economies. Journal of Travel Research, 56(6), 712–724.
  40. Pata, U. K., & Tanriover, B. (2023). Is the load capacity curve hypothesis valid for the top ten tourism destinations? Sustainability, 15(2), 960.
  41. Perry, A. (2006). Will predicted climate change compromise the sustainability of Mediterranean tourism? Journal of Sustainable Tourism, 14(4), 367–375.
  42. Rosselló Nadal, J., & Santana Gallego, M. (2022). Gravity models for tourism demand modeling: Empirical review and outlook. Journal of Economic Surveys, 36(5), 1358–1409.
  43. Rutty, M., Steiger, R., Demiroglu, O. C., & Perkins, D. R. (2021). Tourism climatology: Past, present, and future. International Journal of Biometeorology, 65, 639–643.
  44. Scott, D. (2021). Sustainable tourism and the grand challenge of climate change. Sustainability, 13(4), 1966.
  45. Shang, Y., Bi, C., Wei, X., Jiang, D., Taghizadeh-Hesary, F., & Rasoulinezhad, E. (2023). Eco-tourism, climate change, and environmental policies: Empirical evidence from developing economies. Humanities and Social Sciences Communications, 10, 275.
  46. Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289–310.
  47. Song, H., Dwyer, L., Li, G., & Cao, Z. (2012). Tourism economics research: A review and assessment. Annals of Tourism Research, 39(3), 1653–1682.
  48. Song, H., & Li, G. (2008). Tourism demand modelling and forecasting—A review of recent research. Tourism Management, 29(2), 203–220.
  49. Song, H., Qiu, R. T. R., & Park, J. (2019). A review of research on tourism demand forecasting: Launching the Annals of Tourism Research Curated Collection on tourism demand forecasting. Annals of Tourism Research, 75, 338–362.
  50. Stern, D. I. (2004). The rise and fall of the environmental Kuznets curve. World Development, 32(8), 1419–1439.
  51. Su, Y., & Lee, C. C. (2022). The impact of air quality on international tourism arrivals: A global panel data analysis. Environmental Science and Pollution Research, 29, 62432–62446.
  52. Sun, S., Wei, Y., Tsui, K. L., & Wang, S. (2019). Forecasting tourist arrivals with machine learning and internet search index. Tourism Management, 70, 1–10.
  53. Tai, Y., & Javed, M. A. (2025). Energy integration as a catalyst for tourism growth and economic stability: Evidence from European nations. International Journal of Energy Research, 2025, 8843761.
  54. Tonsakunthaweeteam, S., Pongsakornrungsilp, S., Pongsakornrungsilp, P., Ketkaew, K., Rattanapan, P., & Grusevaja, M. (2025). Climate and visa exemption policies on international tourism revenue: A cross-country panel study. Journal of Human, Earth, and Future, 6(4), 900–915.
  55. Trinajstić, M., Cerović, L., & Krstinić Nižić, M. (2022). Tourism demand and energy consumption in the service sector: Panel analysis of selected EU countries. Ekonomski Pregled, 73(3), 371–389.
  56. Vašaničová, P., & Melnyk, K. (2026). Regional and income-based disparities in health and hygiene: Evidence from the travel & tourism development index. Hygiene, 6(1), 11.
  57. Witt, S. F., & Witt, C. A. (1995). Forecasting tourism demand: A review of empirical research. International Journal of Forecasting, 11(3), 447–475.
  58. Yu, Y., Gong, Y., Tao, Y., Won, D., & Zhang, G. (2025). An examination of the impact of climate change on international inbound tourism: Insights from China. International Journal of Tourism Research, 27, E70057.
  59. Zhang, J., & Lu, Y. (2022). Exploring the effects of tourism development on air pollution: Evidence from the panel smooth transition regression model. International Journal of Environmental Research and Public Health, 19(14), 8442.
  60. Zhang, Y., & Ali, Q. (2024). Socio-economic determinants of sustainable tourism and their nexus with energy, environment, and economy (3ES): A panel data analysis. Energy Strategy Reviews, 56, 101577.
  61. Zhou, W., Faturay, F., Driml, S., & Sun, Y. Y. (2024). Meta-analysis of the climate change-tourism demand relationship. Journal of Sustainable Tourism, 32(9), 1762–1783.
  62. Zikirya, B., Wang, J., & Zhou, C. (2021). The relationship between CO2 emissions, air pollution, and tourism flows in China: A panel data analysis of Chinese provinces. Sustainability, 13(20), 11408.
Figure 1. Model Performance Plots.
Figure 2. SHAP Summary Plot for the CatBoost Model.
Figure 3. SHAP Summary Plot for the Extra Trees Model.
Table 1. Variable Definitions and Data Sources.

| Variable (Feature) | Description / Measurement Unit | Data Source |
|---|---|---|
| Tourist_Arrivals_Log (Dependent Variable) | Natural logarithm of international tourist arrivals. | World Bank |
| Temperature_Anomaly | Annual mean surface temperature anomaly (°C). | ECMWF ERA5-Land (Google Earth Engine) |
| Air_Pollution_PM2.5 | Population exposure to average PM2.5 air pollution (µg/m³). | World Bank |
| Water_Stress_Percentage | Ratio of freshwater withdrawals to available freshwater resources (water stress, %). | World Bank |
| Renewable_Energy_Percentage | Share of renewable energy in total final energy consumption (%). | World Bank |
| Sanitation_Percentage | Percentage of population with access to basic sanitation services (%). | World Bank |
| Energy_Usage | Energy use per capita (kg of oil equivalent). | World Bank |
| GDP_Per_Capita_USD | GDP per capita (USD) representing destination economic size. | World Bank |
Table 2. Hyperparameter Selection Parameters.

| Model | Optimized Parameters (Best Parameters) |
|---|---|
| Multiple Linear Regression | fit_intercept: True |
| Random Forest | max_depth: None, min_samples_leaf: 1, n_estimators: 250 |
| Extra Trees | max_depth: None, min_samples_leaf: 1, n_estimators: 100 |
| CatBoost | depth: 4, iterations: 500, learning_rate: 0.05 |
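For readers who wish to reuse the settings in Table 2, they can be held in a plain configuration mapping. This is a minimal sketch: the dictionary name `BEST_PARAMS` and the helper `describe` are hypothetical, and the commented usage assumes the models come from scikit-learn and CatBoost, whose estimator constructors accept keyword arguments with these names. The section itself does not state which libraries were used.

```python
# Best parameters as reported in Table 2, keyed by model name.
BEST_PARAMS = {
    "Multiple Linear Regression": {"fit_intercept": True},
    "Random Forest": {"max_depth": None, "min_samples_leaf": 1, "n_estimators": 250},
    "Extra Trees": {"max_depth": None, "min_samples_leaf": 1, "n_estimators": 100},
    "CatBoost": {"depth": 4, "iterations": 500, "learning_rate": 0.05},
}

def describe(model_name):
    """Render one model's settings as a keyword-argument string."""
    params = BEST_PARAMS[model_name]
    return ", ".join(f"{key}={value!r}" for key, value in params.items())

# Assumed usage (requires scikit-learn / catboost, which are not imported here):
#   ExtraTreesRegressor(**BEST_PARAMS["Extra Trees"])
#   CatBoostRegressor(**BEST_PARAMS["CatBoost"])
print(describe("CatBoost"))
```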
Table 3. Descriptive Statistics of the Research Dataset.

| Variables | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|
| Tourist_Arrivals_Log (Dependent Variable) | 17.359 | 0.941 | 15.375 | 19.199 |
| Temperature_Anomaly (°C) | 0.454 | 0.534 | −1.160 | 2.368 |
| Air_Pollution_PM25 (µg/m³) | 22.249 | 13.955 | 6.226 | 65.813 |
| Water_Stress_Percentage (%) | 154.165 | 404.028 | 2.892 | 1866.670 |
| Renewable_Energy_Percentage (%) | 11.535 | 8.480 | 0.000 | 35.700 |
| Sanitation_Percentage (%) | 96.174 | 6.738 | 56.820 | 100.000 |
| Energy_Usage (kg of oil equivalent) | 3861.590 | 2296.110 | 898.075 | 11,703.275 |
| GDP_Per_Capita_USD | 27,304 | 16,322.8 | 969.200 | 64,746.500 |
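Because the dependent variable is the natural logarithm of arrivals, the figures in Table 3 are easier to read once converted back to visitor counts. The sketch below uses only values reported in Tables 3 and 4; the variable names are illustrative. Note that exponentiating the mean of the logs gives the geometric mean of arrivals rather than the arithmetic mean.

```python
import math

# Summary statistics of Tourist_Arrivals_Log as reported in Table 3.
log_stats = {"mean": 17.359, "min": 15.375, "max": 19.199}

# exp() inverts the natural logarithm. exp(mean of logs) is the geometric
# mean of arrivals, not the arithmetic mean.
arrivals = {name: math.exp(value) for name, value in log_stats.items()}

# An error of `rmse_log` on the log scale corresponds to a multiplicative
# factor of exp(rmse_log) on the original scale (Extra Trees RMSE, Table 4).
rmse_log = 0.2419
relative_error = math.exp(rmse_log) - 1.0

for name, value in arrivals.items():
    print(f"{name}: about {value / 1e6:.1f} million arrivals")
print(f"typical multiplicative error: about {relative_error:.0%}")
```

On this reading, the sample spans destinations with roughly five million to over two hundred million annual arrivals, and a log-scale RMSE near 0.24 corresponds to a typical error of a little over a quarter of the predicted value.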
Table 4. Out-of-Sample Prediction Performance Comparison of Machine Learning Models.

| Model | R² Score | Adjusted R² | RMSE | MAE | MSE |
|---|---|---|---|---|---|
| Extra Trees | 0.9054 | 0.8563 | 0.2419 | 0.1869 | 0.0585 |
| CatBoost | 0.8475 | 0.7683 | 0.3071 | 0.2370 | 0.0943 |
| Multiple Linear Regression | 0.7397 | 0.6045 | 0.4012 | 0.2851 | 0.1610 |
| Random Forest | 0.5794 | 0.3610 | 0.5100 | 0.2863 | 0.2601 |
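The columns of Table 4 are linked by two standard identities: RMSE is the square root of MSE, and adjusted R² equals 1 − (1 − R²)(n − 1)/(n − p − 1). The sketch below checks both against the reported figures. The evaluation sample size n = 80 and predictor count p = 27 are assumptions inferred by the editor, not values stated in this section; they would correspond to a 20% hold-out of the 400 observations and to seven continuous predictors plus twenty country dummies, and they happen to reproduce the reported adjusted values to four decimals.

```python
import math

def adjusted_r2(r2, n_obs, n_features):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_features - 1)

# ASSUMPTION: n and p are not stated in this section; 80 and 27 are inferred
# values that reproduce the adjusted R^2 column of Table 4.
N_TEST, N_FEATURES = 80, 27

# (R2, reported adjusted R2, RMSE, MSE) per model, copied from Table 4.
table4 = {
    "Extra Trees": (0.9054, 0.8563, 0.2419, 0.0585),
    "CatBoost": (0.8475, 0.7683, 0.3071, 0.0943),
    "Multiple Linear Regression": (0.7397, 0.6045, 0.4012, 0.1610),
    "Random Forest": (0.5794, 0.3610, 0.5100, 0.2601),
}

for model, (r2, adj_reported, rmse, mse) in table4.items():
    adj = adjusted_r2(r2, N_TEST, N_FEATURES)
    print(f"{model}: adjusted R2 {adj:.4f} (reported {adj_reported}), "
          f"sqrt(MSE) {math.sqrt(mse):.4f} (reported RMSE {rmse})")
```

The large gap between R² and adjusted R² follows directly from the ratio (n − 1)/(n − p − 1): with many dummy variables relative to the evaluation sample, the penalty is substantial, which is why Random Forest drops from 0.5794 to 0.3610.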
Table 5. Comparison of Variable Importance for the Extra Trees and CatBoost Models.

| Rank | CatBoost Variables | Weight (%) | Extra Trees Variables | Weight (%) |
|---|---|---|---|---|
| 1 | Water Stress | 28.20 | Country Fixed Effect: France | 18.13 |
| 2 | Renewable Energy | 27.12 | Country Fixed Effect: USA | 11.51 |
| 3 | Sanitation | 13.61 | Country Fixed Effect: Spain | 8.74 |
| 4 | Country Fixed Effect: France | 5.84 | Sanitation | 7.93 |
| 5 | Energy Usage | 5.68 | Country Fixed Effect: Italy | 7.29 |
| 6 | Country Fixed Effect: Japan | 4.07 | Country Fixed Effect: Poland | 6.57 |
| 7 | Air Pollution | 3.99 | Country Fixed Effect: Mexico | 6.27 |
| 8 | GDP per Capita | 2.30 | Country Fixed Effect: China | 5.67 |
| 9 | Country Fixed Effect: Portugal | 2.05 | Country Fixed Effect: Japan | 4.40 |
| 10 | Country Fixed Effect: Türkiye | 1.65 | Renewable Energy | 4.25 |
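The top-10 weights in Table 5 can be summed by variable type to compare how the two models distribute importance. The weights below are copied from the table; the three-way grouping in `group_of` (including the choice to count Energy Usage as environmental) and the shortened "FE:" labels are editorial assumptions made only for this illustration.

```python
# Top-10 importance weights (%) copied from Table 5.
# "FE:" abbreviates "Country Fixed Effect:" for brevity.
catboost = [
    ("Water Stress", 28.20), ("Renewable Energy", 27.12), ("Sanitation", 13.61),
    ("FE: France", 5.84), ("Energy Usage", 5.68), ("FE: Japan", 4.07),
    ("Air Pollution", 3.99), ("GDP per Capita", 2.30), ("FE: Portugal", 2.05),
    ("FE: Türkiye", 1.65),
]
extra_trees = [
    ("FE: France", 18.13), ("FE: USA", 11.51), ("FE: Spain", 8.74),
    ("Sanitation", 7.93), ("FE: Italy", 7.29), ("FE: Poland", 6.57),
    ("FE: Mexico", 6.27), ("FE: China", 5.67), ("FE: Japan", 4.40),
    ("Renewable Energy", 4.25),
]

def group_of(name):
    """Hypothetical three-way grouping used only for this illustration."""
    if name.startswith("FE:"):
        return "country fixed effects"
    if name == "GDP per Capita":
        return "macroeconomic"
    return "environmental"

def group_totals(rows):
    """Sum the listed weights within each group, rounded to two decimals."""
    totals = {}
    for name, weight in rows:
        key = group_of(name)
        totals[key] = totals.get(key, 0.0) + weight
    return {key: round(value, 2) for key, value in totals.items()}

print("CatBoost:", group_totals(catboost))
print("Extra Trees:", group_totals(extra_trees))
```

Under this grouping, the listed CatBoost weights are dominated by environmental variables, whereas the listed Extra Trees weights are dominated by country fixed effects. This contrast is in line with the caveat in the discussion that high predictive performance may partly reflect learned country-specific baselines.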
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Oruç Erdoğan, E.; Özdemir, O.; Erdoğan, M.; Durmuş Özdemir, E.; Özdemir, Ş. Environmental Sustainability Indicators and International Tourism Demand: Evidence from Machine Learning and SHAP Analysis. Tour. Hosp. 2026, 7, 170. https://doi.org/10.3390/tourhosp7060170

AMA Style

Oruç Erdoğan E, Özdemir O, Erdoğan M, Durmuş Özdemir E, Özdemir Ş. Environmental Sustainability Indicators and International Tourism Demand: Evidence from Machine Learning and SHAP Analysis. Tourism and Hospitality. 2026; 7(6):170. https://doi.org/10.3390/tourhosp7060170

Chicago/Turabian Style

Oruç Erdoğan, Eda, Ozan Özdemir, Murat Erdoğan, Eren Durmuş Özdemir, and Şefika Özdemir. 2026. "Environmental Sustainability Indicators and International Tourism Demand: Evidence from Machine Learning and SHAP Analysis" Tourism and Hospitality 7, no. 6: 170. https://doi.org/10.3390/tourhosp7060170

APA Style

Oruç Erdoğan, E., Özdemir, O., Erdoğan, M., Durmuş Özdemir, E., & Özdemir, Ş. (2026). Environmental Sustainability Indicators and International Tourism Demand: Evidence from Machine Learning and SHAP Analysis. Tourism and Hospitality, 7(6), 170. https://doi.org/10.3390/tourhosp7060170
