1. Introduction
The volume of U.S. citizens traveling abroad carries profound economic, political, and cultural implications, not only for the United States but also for destination countries worldwide. International travel demand directly influences the tourism and hospitality sector, shaping both inbound and outbound transportation flows [1]. For stakeholders such as government policy-makers, transportation authorities, service providers, and international marketers, the ability to understand volumes, shifts, and uncertainties in travel demand is essential for informed strategic decision-making. In an era defined by increasing global connectivity [2] and the rising contribution of international tourism to national and global GDP [3], accurate forecasting of outbound travel has become critical for maintaining market competitiveness and ensuring sustainable growth. Capturing the underlying structure of historical travel demand data often requires advanced statistical and computational tools capable of identifying latent patterns, which may be deterministic, stochastic, or a combination of both. Effectively modeling this inherent complexity is therefore essential for producing reliable forecasts, underscoring the need for modern modeling techniques that can uncover complex data patterns and inform future travel demand forecasting.
Despite its importance, forecasting international travel demand remains challenging. Demand patterns are highly sensitive to structural shifts, including economic downturns and geopolitical events, and global disruptions such as the COVID-19 pandemic have severely altered mobility and reshaped tourism dynamics [4,5,6,7,8]. Traditional time series models, although effective for modeling linear dependencies and seasonal cycles, often fall short in accommodating the non-linear, non-stationary, and multidimensional dynamics that characterize real-world travel behavior [9,10]. Their reliance on stationarity assumptions, limited handling of exogenous shocks, and inability to scale efficiently with high-frequency or high-volume data pose substantial risks of bias, particularly during periods of rapid change and uncertainty. Forecasting from non-stationary time series data is therefore challenging: it requires careful attention to the major patterns in the historical data and to how those patterns carry over to the unobserved future. It is fundamentally important to employ forecasting methods that can both capture historical data patterns and adequately reflect future uncertainty. Compared with modern machine learning (ML) approaches, traditional time series forecasting methods exhibit several limitations. Their inherently linear structure restricts their ability to model non-linear and complex temporal dependencies, and their heavy reliance on stationarity assumptions typically requires substantial data pre-processing. For instance, the Auto Regressive Integrated Moving Average (ARIMA) model incorporates external variables only in a limited manner, making it less suitable for high-dimensional data.
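As a concrete illustration of the stationarity requirement noted above, the following sketch (Python with NumPy; the series is synthetic and purely illustrative) shows how ARIMA's differencing step, the "I" in ARIMA(p, d, q), removes a deterministic trend:

```python
import numpy as np

def difference(series: np.ndarray, d: int = 1) -> np.ndarray:
    """Apply d rounds of first-differencing: the 'I' step in ARIMA(p, d, q)."""
    for _ in range(d):
        series = np.diff(series)
    return series

# A series with a linear trend is non-stationary in the mean.
# One round of differencing reduces it to a constant (stationary) series;
# a second round reduces it to zeros.
trend = 50.0 + 2.0 * np.arange(12)   # hypothetical monthly volumes
once = difference(trend, d=1)        # constant slope, 2.0 everywhere
twice = difference(trend, d=2)       # all zeros
```

Real travel demand series also carry stochastic and seasonal components, so in practice the differencing order is chosen with unit-root tests rather than by inspection.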
Recent advances in data science provide a promising pathway to overcome these limitations. ML and deep learning (DL) methods offer the ability to capture non-linear relationships among variables [11,12], uncover latent patterns through automated feature extraction [13,14], and adapt dynamically by modeling long-term temporal dependencies [15]. A growing body of research highlights the strong performance of algorithms such as Random Forest (RF) and Gradient Boosting, as well as recurrent neural networks including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRUs), in improving forecasting accuracy in tourism domains [16,17,18,19]. This suggests the need to further investigate the applicability of these advanced ML methods to forecasting problems in travel research, in particular by systematically comparing the performance of different forecasting models on travel demand data. This study is motivated by the urgent need for resilient and flexible forecasting models capable of addressing the dynamic and interconnected nature of global travel demand [20,21]. By conducting a systematic comparison of traditional time series methods (e.g., ARIMA), classical ML models (e.g., Random Forest, Gradient Boosting), and advanced DL architectures (e.g., LSTM, GRU), this research contributes to identifying the most effective and adaptive forecasting strategies for travel demand. Unlike much of the existing literature, which tends to focus on inbound flows or domestic mobility, this study emphasizes outbound U.S. travel across global regions, a dimension that remains under-explored yet critical for understanding international tourism flows.
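To make such a side-by-side comparison concrete, the sketch below (Python with scikit-learn and NumPy) benchmarks a linear autoregressive baseline against two tree-based ensembles on a common hold-out window. The synthetic monthly series, 12-month lag window, and hyperparameters are illustrative assumptions, not the study's actual configuration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression

def make_lagged(y, n_lags=12):
    """Turn a univariate series into a supervised (X, y) lag matrix."""
    X = np.column_stack([y[i : len(y) - n_lags + i] for i in range(n_lags)])
    return X, y[n_lags:]

rng = np.random.default_rng(42)
t = np.arange(240)  # 20 years of synthetic monthly observations
y = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, t.size)

X, target = make_lagged(y, n_lags=12)
split = len(target) - 24                      # hold out the last 24 months
X_tr, X_te = X[:split], X[split:]
y_tr, y_te = target[:split], target[split:]

models = {
    "linear (AR-style)": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}
rmse_by_model = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse_by_model[name] = float(np.sqrt(np.mean((pred - y_te) ** 2)))
```

A design choice worth noting: because tree-based models cannot extrapolate beyond the range of their training targets, trending series like this one tend to favor the linear baseline unless the trend is removed or encoded as a feature.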
From a practical perspective, the findings of this study have important implications for transportation planners, tourism authorities, and policy-makers concerned with managing international mobility under uncertainty. Accurate forecasts of outbound travel demand can support infrastructure investment, airline capacity planning, marketing allocation, and policy formulation related to visa processing and international coordination. By providing region-specific forecasting insights, this study offers decision-makers actionable evidence to support data-informed planning in an increasingly volatile global travel environment. From a methodological standpoint, this study contributes a structured comparative framework for evaluating forecasting performance across heterogeneous regions using multiple model families. Rather than advocating a single universal approach, the analysis emphasizes how model suitability depends on regional data characteristics, including scale, volatility, and temporal structure. By systematically benchmarking statistical, ML, and DL models within a unified experimental design, this work advances empirical understanding of model selection in outbound travel demand forecasting and establishes a foundation for future extensions by incorporating additional data sources and hybrid modeling strategies.
2. Literature Review
The economic shocks and subsequent structural shifts in global mobility patterns have underscored the need for robust, adaptive forecasting approaches capable of addressing both short-term shocks and long-term transformations. In response, recent studies have increasingly turned to advanced time series models and ML methods to capture the complex, non-linear, and non-stationary dynamics inherent in time series data. For example, Bontempi et al. [22] document how tree-based ensembles and neural networks adapt to evolving data-generating processes without requiring strict parametric assumptions. Similarly, Lai et al. [23] demonstrate that DL architectures, including recurrent and convolutional neural networks, substantially outperform traditional models when time series exhibit non-linearity and non-stationarity.
A prominent research direction centers on the integration of DL architectures to improve forecasting accuracy. For example, Chen et al. [24] proposed a spatial–temporal transformer network that simultaneously models temporal evolution and spatial interactions, while Zhang et al. [25] introduced a hybrid BiLSTM–transformer framework to account for both short-term volatility and long-term structural patterns. Similarly, ensemble DL strategies, such as the bagging-based multivariate approach by Sun et al. [26], demonstrate how combining multiple learners can significantly enhance robustness and reduce predictive variance. In addition, Lim and Zohren [27] show that attention-based DL models can capture complex temporal dependencies and regime shifts that are difficult to model using classical approaches. These studies underscore a broader shift in the time series literature toward flexible, data-driven modeling frameworks that explicitly accommodate non-linearity, temporal dependence across multiple scales, and evolving data-generating mechanisms. This growing body of work highlights the potential of modern ML approaches not only to improve predictive accuracy but also to enhance model robustness, motivating their adoption in increasingly complex real-world time series forecasting.
Another stream of research emphasizes the fusion of multi-source data to enrich forecasting models. For instance, Lee [20] incorporated web search trends as exogenous variables into an SARIMAX framework, highlighting the utility of behavioral information as a leading indicator of demand shifts. In a complementary study, Colladon et al. [28] applied semantic and social network analysis to online travel forums, showing that unstructured digital footprints can provide early-warning signals of changes in travel demand behavior. These studies collectively highlight the inadequacy of traditional forecasting methods, which often fail to capture the richness and heterogeneity of contemporary travel demand dynamics. Addressing the challenge of structural breaks and crisis-induced volatility has also become a central focus. For example, Liu et al. [29] introduced the BayesBag method, integrating bootstrap aggregation with Bayesian inference to improve model stability under uncertainty. Hybrid CNN–LSTM approaches, such as the post-pandemic recovery forecasting for Vietnam in [30], further demonstrate the potential of adaptive architectures to outperform conventional models in turbulent contexts. These contributions underscore the importance of resilient methods capable of adjusting to abrupt disruptions.
More recently, advances in data augmentation and transformer-based architectures have further expanded methodological horizons. Diao et al. [31] leveraged spatio-temporal GANs to generate virtual samples for transformer models, effectively mitigating data sparsity issues. Similarly, Li et al. [32] proposed a hybrid framework integrating time series decomposition with a temporal fusion transformer optimized via Bayesian search, achieving superior accuracy and interpretability. Expanding on this line, Yi et al. [33] integrated calendar-based encodings into a transformer encoder–decoder architecture, enhancing both interpretability and predictive performance. Together, these innovations reflect a growing trend toward combining generative, predictive, and interpretable frameworks to address both data limitations and practical decision-making needs.
As evidenced by [24,25,26], recent literature shows rapid adoption of advanced ML and DL methods for time series forecasting, particularly in the context of tourism demand prediction. Yet, two critical gaps remain: (1) most studies concentrate on inbound tourism or regional-level demand, leaving outbound travel demand comparatively underexplored; and (2) although ensemble and hybrid models have shown strong potential, their application to outbound U.S. travel demand forecasting is still limited. By addressing these gaps, the present study advances the literature by systematically evaluating a range of forecasting approaches for U.S. outbound travel, providing both methodological contributions and actionable insights for policy-makers, transportation authorities, and the tourism industry.
5. Results
This section presents and analyzes the empirical results.
Table 6 summarizes model performance across all evaluation metrics. Overall, the LSTM model achieves the highest predictive accuracy, as evidenced by its lowest RMSE, MAE, and normalized RMSE values. This suggests a strong capability in capturing complex, non-linear temporal dependencies. However, these gains in accuracy come with increased model complexity (a large number of parameters to train), highlighting a trade-off between fit and parsimony that is particularly relevant in contexts where interpretability and computational efficiency are critical.
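The three metrics reported in Table 6 can be computed as follows. Note that the normalization convention for RMSE is an assumption here (dividing by the mean of the actuals), since several conventions exist, and the passenger figures are purely illustrative:

```python
import numpy as np

def rmse(actual, pred):
    """Root mean squared error."""
    a, p = np.asarray(actual, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((a - p) ** 2)))

def mae(actual, pred):
    """Mean absolute error."""
    a, p = np.asarray(actual, dtype=float), np.asarray(pred, dtype=float)
    return float(np.mean(np.abs(a - p)))

def nrmse(actual, pred):
    """Normalized RMSE; dividing by the mean of the actuals is one common
    convention (dividing by the range of the actuals is another)."""
    return rmse(actual, pred) / float(np.mean(actual))

actual = np.array([120_000, 135_000, 150_000, 142_000])  # illustrative volumes
pred = np.array([118_000, 140_000, 147_000, 145_000])
```

Normalized RMSE is what makes cross-region comparisons meaningful here, since raw RMSE scales with each region's passenger volume.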
Notably, simpler models such as ARIMA and Random Forest demonstrate competitive accuracy while maintaining lower complexity. We also note that modeling the aggregated data without separating regions would introduce substantial heterogeneity, as evidenced by the markedly different mean levels of outbound passenger volumes across regions. For this reason, we further analyze model performance across eight regions: Europe, the Caribbean, Asia, South America, Central America, Oceania, the Middle East, and Africa. Given the heterogeneity in travel demand patterns driven by seasonal, economic, cultural, and geopolitical factors, this regional breakdown offers insight into the adaptability of each model.
Model performance varied considerably across regions, reflecting differences in data scale, variability, and underlying travel dynamics. Overall, the comparative analysis highlights the importance of aligning model complexity with data characteristics and operational constraints.
For Africa, with results reported in Table 7, the ARIMA(1, 1, 1) model achieved the best performance across all key error metrics, demonstrating a strong capacity to capture the underlying linear structure of passenger flows. Both XGBoost and Random Forest showed moderate results, with the latter slightly outperforming the former. Deep learning models, particularly LSTM, performed poorly, as evidenced by high RMSE and MAE values. This suggests overfitting due to limited data volume and variability, emphasizing the limitations of data-intensive models in low-sample contexts. While ARIMA offered simplicity and accuracy, it lacked the flexibility to account for non-linear effects, whereas tree-based models provided adaptability at the cost of computational efficiency.
In contrast, the results for Asia, shown in Table 8, exhibited distinct patterns favoring deep learning approaches. Both LSTM and GRU achieved the best predictive accuracy, benefiting from their ability to capture non-linear temporal dependencies and long-term relationships inherent in the data. ARIMA remained a robust yet simpler alternative, performing adequately for relatively stable time series but failing to respond effectively to non-linear shifts. XGBoost and Random Forest produced intermediate results, suggesting that their performance could improve with feature augmentation or hyperparameter tuning. These results illustrate how model suitability depends on the complexity and temporal structure of regional data.
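The gating mechanism that lets GRUs retain long-term dependencies can be illustrated with a single-cell forward pass in NumPy. This is a didactic sketch with random, untrained weights (the PyTorch-style gate convention is assumed), not the study's trained model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU step: update gate z, reset gate r, candidate state h_tilde.
    W, U, b hold the three gates' parameters, keyed 'z', 'r', 'h'."""
    z = sigmoid(W["z"] @ x + U["z"] @ h + b["z"])      # how much old state to keep
    r = sigmoid(W["r"] @ x + U["r"] @ h + b["r"])      # how much old state feeds the candidate
    h_tilde = np.tanh(W["h"] @ x + U["h"] @ (r * h) + b["h"])
    return (1 - z) * h_tilde + z * h                   # convex mix of old and new state

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.normal(size=(n_hid, n_in)) * 0.1 for k in "zrh"}
U = {k: rng.normal(size=(n_hid, n_hid)) * 0.1 for k in "zrh"}
b = {k: np.zeros(n_hid) for k in "zrh"}

h = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):    # run five time steps
    h = gru_step(x, h, W, U, b)
```

Because the update gate z can stay close to 1, the hidden state can pass through many steps nearly unchanged, which is why GRUs handle long-term dependencies better than a plain recurrent layer.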
For the Caribbean, shown in Table 9, Random Forest delivered the lowest RMSE and normalized RMSE, confirming its superior ability to model non-linear interactions without substantial overfitting. Deep learning models such as LSTM and GRU provided moderate performance. ARIMA underperformed, reflecting its inability to represent pronounced seasonality and volatility in regional travel patterns. These findings highlight the efficacy of ensemble tree-based methods in small- to medium-scale datasets where non-linearity is present but data are insufficient for deep architectures.
In Europe, where the results are given in Table 10, LSTM achieved the lowest RMSE (219,030), while GRU obtained the lowest normalized RMSE (0.13), indicating that both models effectively captured complex seasonal and trend components in the data. However, their computational demands and tuning requirements limit their scalability for operational use. ARIMA, by comparison, underperformed due to its rigid linear assumptions, underscoring its limited adaptability to intricate time series dynamics. The results suggest that while deep learning models excel in high-variability environments, practical applications must balance predictive power against resource and interpretability considerations.
For the Middle East, XGBoost emerged as the top-performing model, yielding the lowest RMSE, normalized RMSE, and MAE values, as shown in Table 11. Its gradient boosting framework effectively modeled non-linear dependencies and complex regional variations. Random Forest performed moderately well, outperforming ARIMA but trailing XGBoost. LSTM, on the other hand, recorded the weakest performance, likely due to overfitting in a relatively small dataset. While XGBoost achieved the highest accuracy, its computational intensity and limited interpretability suggest that marginal gains over simpler alternatives should be carefully evaluated in applied settings.
In Oceania, as shown in Table 12, ARIMA achieved the lowest RMSE and MAE, demonstrating strong predictive accuracy and reliable tracking of observed data trends. Random Forest also performed competitively, while GRU and XGBoost showed higher errors and lower stability. Although ARIMA’s linear structure effectively captured the dominant temporal trends, its inability to model complex dependencies may restrict its usefulness in more data-rich or volatile contexts. Nonetheless, for small or stable datasets, ARIMA remains a parsimonious and effective forecasting tool.
Finally, the comparative results for South America are displayed in Table 13. XGBoost outperformed all other models, achieving the lowest RMSE (23,702.39), normalized RMSE (0.17), and MAE (18,465.63). ARIMA provided a consistent baseline with clear interpretability but lagged in predictive precision. Both LSTM and GRU underperformed, likely due to limited temporal depth and data complexity. These outcomes reaffirm the balance required between accuracy, model complexity, and interpretability when selecting predictive approaches for regional forecasting.
Taken together, the results reveal that model performance is context-dependent. ARIMA remains effective for stable or low-variability series, tree-based ensemble methods (XGBoost, Random Forest) excel in moderately complex and non-linear contexts, and deep learning models (LSTM, GRU) are most suitable when sufficient data are available to support their representational capacity. The comparative findings underscore the necessity of tailoring model selection to regional data characteristics, ensuring that predictive accuracy is achieved without sacrificing scalability or interpretability.
Figure 3 and Figure 4 illustrate substantial regional heterogeneity in U.S. outbound travel demand and corresponding differences in model performance. Across all regions, outbound passenger volumes exhibit distinct magnitudes, volatility levels, and temporal patterns, underscoring the importance of region-specific analysis. No single forecasting model consistently outperforms others across all regions, highlighting the limitations of a uniform modeling strategy. In regions characterized by higher variability and more complex demand dynamics, such as Europe and Asia, machine learning and deep learning models more effectively track observed fluctuations, while in regions with lower passenger volumes or more stable patterns, simpler models such as ARIMA remain competitive and, in some cases, superior.
The comparative results further demonstrate that model effectiveness depends critically on data richness and structural complexity. Deep learning architectures, particularly LSTM and GRU, tend to provide improved predictive accuracy in regions with sufficient data and pronounced non-linear temporal dependencies, whereas ensemble tree-based methods offer strong performance in moderately complex settings. Conversely, in data-sparse or relatively stable regions, traditional time series models yield robust and interpretable forecasts with lower computational cost. Together, these findings emphasize that outbound travel demand forecasting benefits from a flexible, region-adaptive modeling framework that balances predictive accuracy, computational efficiency, and interpretability rather than relying on a single universal approach.
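At its simplest, such a region-adaptive framework reduces to selecting, independently for each region, the candidate model with the lowest validation error. The sketch below uses placeholder error values chosen to loosely mirror the reported regional winners; they are not the study's actual figures:

```python
# Hypothetical normalized validation RMSEs per region. In practice these
# would come from backtesting each candidate on that region's hold-out window.
validation_rmse = {
    "Europe":      {"ARIMA": 0.21, "RandomForest": 0.17, "LSTM": 0.13},
    "Africa":      {"ARIMA": 0.15, "RandomForest": 0.18, "LSTM": 0.34},
    "MiddleEast":  {"ARIMA": 0.25, "XGBoost": 0.16, "LSTM": 0.31},
}

def select_per_region(scores):
    """Pick the lowest-error model independently for each region."""
    return {region: min(models, key=models.get)
            for region, models in scores.items()}

chosen = select_per_region(validation_rmse)
```

Using normalized rather than raw errors for the selection step keeps the comparison fair across regions with very different passenger volumes.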
6. Discussion
The empirical results reveal substantial heterogeneity in model performance across regions, underscoring that forecasting accuracy is strongly conditioned on data scale, variability, and temporal structure. Ensemble tree-based models, such as XGBoost and Random Forest, consistently performed well in regions characterized by moderate data volume and non-linear seasonal patterns, while deep learning models exhibited advantages primarily in high-traffic regions with richer temporal dynamics. These findings indicate that no single modeling approach dominates across all contexts, highlighting the importance of aligning model complexity with regional data characteristics.
Deep learning architectures, including LSTM and GRU, demonstrated improved predictive accuracy in regions with sufficient historical depth, where long-term temporal dependencies and evolving demand patterns are more pronounced. Conversely, their weaker performance in data-sparse regions suggests sensitivity to sample size and an increased risk of overfitting, revealing the practical limits of highly parameterized models in constrained settings. Tree-based ensemble methods offered a more stable balance between flexibility and robustness in several regions, making them strong alternatives when deep architectures are ill-suited to the available data.
Feature design also played a critical role in shaping model performance. The inclusion of temporal indicators and lagged passenger volumes enhanced forecast accuracy by enabling models to better capture seasonal persistence and trend evolution. These results highlight that careful data organization and feature engineering remain essential, even when advanced learning algorithms are employed. From an applied perspective, the region-specific insights derived from this comparative analysis can support more targeted decision-making in capacity planning, marketing allocation, and policy design. By demonstrating how forecasting performance varies across regional contexts, this study provides practical guidance for selecting appropriate modeling strategies in real-world travel demand applications.
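The feature design described above, calendar indicators plus lagged passenger volumes, can be sketched with pandas. The column names, lag set, and synthetic data are illustrative assumptions, not the study's exact specification:

```python
import numpy as np
import pandas as pd

def build_features(df: pd.DataFrame, lags=(1, 2, 12)) -> pd.DataFrame:
    """Add calendar indicators and lagged passenger volumes to a monthly frame.
    Expects a DatetimeIndex and a 'passengers' column (names are assumptions)."""
    out = df.copy()
    out["month"] = out.index.month                 # seasonal indicator
    out["year"] = out.index.year                   # trend indicator
    for lag in lags:
        out[f"lag_{lag}"] = out["passengers"].shift(lag)
    return out.dropna()                            # drop rows lacking full lag history

idx = pd.date_range("2015-01-01", periods=36, freq="MS")
df = pd.DataFrame({"passengers": np.arange(36) * 1000 + 50_000}, index=idx)
features = build_features(df)
```

The 12-month lag gives each row access to the same month a year earlier, which is what lets tree-based learners pick up seasonal persistence without an explicit seasonal model.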
7. Conclusions and Future Work
This study demonstrates that the suitability of forecasting models for outbound travel varies significantly across regions, depending largely on the volume of data and the complexity of underlying travel patterns. In data-rich, high-variability regions such as Europe and Asia, deep learning models (LSTM, GRU) delivered the highest predictive accuracy, while ensemble tree-based methods, particularly XGBoost and Random Forest, led in the Caribbean, the Middle East, and South America, effectively capturing seasonal variation and non-linear dynamics. In contrast, for regions with lower passenger volumes or more stable patterns, such as Africa and Oceania, simpler models like ARIMA proved more reliable, offering stable forecasts while mitigating the risk of overfitting in data-sparse environments.
The results emphasize the importance of region-specific model selection, where the balance between model complexity and data availability must be carefully managed. While sophisticated models are advantageous in data-rich contexts, simpler, more interpretable models offer better generalizability in regions with limited data. This trade-off between bias and variance is critical for ensuring robust and practical forecasting outcomes.
The findings of this study have direct implications for industry practitioners and policymakers. For airline revenue management, the demonstrated strength of adaptive machine learning models enables more responsive capacity planning, allowing carriers to optimize seat inventory and route frequency in anticipation of demand fluctuations across destination regions. Airport authorities and ground handlers can leverage regional forecasts to allocate staffing and terminal resources more efficiently, particularly during seasonal peaks.
From a policy perspective, destination countries can utilize these forecasting frameworks to anticipate U.S. tourist inflows, informing visa processing capacity, border control staffing, and tourism infrastructure investments. In addition, national tourism organizations can apply region-specific forecasts to allocate marketing budgets strategically, targeting promotional efforts toward periods and regions where demand elasticity is highest. At the macroeconomic level, improved forecasting accuracy supports more reliable projections of tourism-related foreign exchange earnings, informing fiscal planning in tourism-dependent economies.
Looking ahead, future research can build on these findings in several ways. First, incorporating additional features, such as economic indicators, political events, and environmental factors like weather patterns, could significantly enhance predictive performance, especially in regions characterized by high variability. Second, exploring ensemble learning techniques, such as stacking or blending, offers the potential to combine the strengths of various models. Such hybrid approaches could adaptively apply simpler models in data-scarce settings and more complex algorithms in data-rich contexts to yield consistently strong results.
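The stacking idea outlined above can be sketched with scikit-learn's StackingRegressor, where a meta-learner blends out-of-fold predictions from heterogeneous base models. The base learners, meta-learner, and synthetic data below are illustrative choices, not a prescription from this study:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
# Synthetic lag-style features and target; real inputs would be the lagged
# volumes and calendar indicators discussed earlier.
X = rng.normal(size=(200, 6))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0, -1.0]) + rng.normal(0, 0.1, 200)

stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
        ("ridge", Ridge(alpha=1.0)),
    ],
    final_estimator=Ridge(),       # meta-learner weights the base predictions
    cv=5,                          # out-of-fold predictions avoid target leakage
)
stack.fit(X[:160], y[:160])
pred = stack.predict(X[160:])
holdout_rmse = float(np.sqrt(np.mean((pred - y[160:]) ** 2)))
```

Because the meta-learner can learn to down-weight whichever base model performs worse on a given dataset, such a blend could in principle apply simpler models in data-scarce regions and more complex ones in data-rich regions, as envisioned above.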