Article

High-Accuracy ETA Prediction for Long-Distance Tramp Shipping: A Stacked Ensemble Approach

1. College of Navigation, Jimei University, Xiamen 361021, China
2. Law School, Institute of Maritime Law, University of Southampton, Southampton SO17 1BJ, UK
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2026, 14(2), 177; https://doi.org/10.3390/jmse14020177
Submission received: 17 December 2025 / Revised: 6 January 2026 / Accepted: 13 January 2026 / Published: 14 January 2026
(This article belongs to the Section Ocean Engineering)

Abstract

The Estimated Time of Arrival (ETA) of vessels is a vital operational indicator for voyage planning, fleet deployment, and resource allocation. However, most existing studies focus on short-distance liner services with fixed routes, while ETA prediction for long-distance tramp bulk carriers remains insufficiently accurate, often resulting in operational inefficiencies and charter party disputes. To fill this gap, this study proposes a data-driven stacking ensemble learning framework that integrates Light Gradient-Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), and Random Forest (RF) as base learners, combined with a Linear Regression meta-learner. This framework is specifically tailored to the unique complexities of tramp shipping, advancing beyond traditional single-model approaches by incorporating systematic feature engineering and model fusion. The study also introduces the construction of a comprehensive multi-dimensional AIS feature system, incorporating baseline, temporal, speed-related, course-related, static, and historical behavioral features, thereby enabling more nuanced and accurate ETA prediction. Using AIS trajectory data from bulk carrier voyages between Weipa (Australia) and Qingdao (China) in 2023, the framework leverages multi-feature fusion to enhance predictive performance. The results demonstrate that the stacking model achieves the highest accuracy, reducing the Mean Absolute Error (MAE) to 3.30 h—a 74.7% improvement over the historical averaging benchmark and an 11.3% reduction compared with the best individual model, XGBoost. Extensive performance evaluation and interpretability analysis confirm that the stacking ensemble provides stability and robustness. Feature importance analysis reveals that vessel speed, course stability, and remaining distance are the primary drivers of ETA prediction. 
Additionally, meta-learner weighting analysis shows that LightGBM offers a stable baseline, while systematic deviations in XGBoost predictions act as effective error-correction signals, highlighting the complementary strengths captured by the ensemble. The findings offer practical value for port scheduling, resource allocation, and maritime logistics management.

1. Introduction

As the backbone of global trade, maritime transport moves nearly 80% of the world’s cargo due to its extensive network coverage and cost efficiency [1]. Within this system, tramp shipping (also known as irregular shipping), in contrast to liner shipping with its fixed routes and schedules, operates on a non-scheduled, demand-driven chartering basis tailored to the transport of bulk commodities such as iron ore, coal, and grains [2]. This operational flexibility makes tramp vessels the primary carriers of global raw materials and resources. However, the absence of fixed sailing schedules also makes predicting vessel Estimated Time of Arrival (ETA) significantly more challenging than in the highly structured liner shipping sector.
It is particularly important in tramp shipping to accurately predict ETA because of its direct implications for cargo readiness, port resource planning, and charter party performance. Even minor inaccuracies in arrival time can delay cargo operations, disrupt berthing assignments, increase storage and demurrage costs, and weaken supply chain coordination. Contractual risks further heighten the importance of ETA accuracy. Under the Laydays and Canceling Date (LAYCAN) clause defined in Article 97 of China’s Maritime Code, a vessel must arrive at the designated port before the agreed canceling date; otherwise, the charterer may cancel the contract. Therefore, ETA deviations in tramp shipping are not only operationally disruptive but may also directly jeopardize charter party fulfillment, highlighting the centrality of arrival-time controllability in maritime commercial practice.
Furthermore, uncertainty in ETA can also trigger cascading disruptions throughout port logistics systems, leading to substantial economic losses. Studies indicate that even in the well-structured container liner sector, discrepancies between Actual Time of Arrival (ATA) and scheduled ETA can range from 30 to 40 h [3]. In the more unpredictable tramp bulk segment, deviations are often larger and produce more severe consequences. Port operations—such as berth allocation, equipment deployment, manpower scheduling, and yard planning—are typically the first to be impacted [4,5]. Large ETA deviations result in idle pre-allocated resources, unexpected vessel clustering, reduced berth turnover, and ultimately port congestion.
At the strategic level, the Development Research Center of the State Council (DRC) observes that global supply chains are undergoing a shift towards a more regionalized and multipolar structure, posing new challenges for maritime logistics [6]. Liner shipping, with its fixed routes and schedules, offers economies of scale but lacks the flexibility to adjust promptly to evolving regional trade patterns. In contrast, tramp shipping provides strong adaptability due to its flexible voyage planning, but its inherent schedule irregularity leads to weak ETA predictability and reduced port coordination efficiency. Consequently, reliable ETA prediction remains a persistent challenge for the maritime sector, despite International Maritime Organization (IMO) requirements that vessels report ETA in advance [7].
In this context, digital transformation offers a promising path forward. Leveraging the Internet of Things and artificial intelligence technologies enables the development of dynamic, data-driven ETA prediction models that enhance the controllability of tramp vessel operations. This study focuses on ETA prediction for bulk carriers operating on a long-distance route from Port Weipa, Australia, to Qingdao, China. To address the challenges posed by long voyage durations and high operational uncertainty, a stacking ensemble machine-learning framework is proposed. Using historical Automatic Identification System (AIS) trajectory data combined with multiple voyage-related features, the model conducts multi-feature fusion to generate high-precision ETA predictions. The main innovative contributions of this study are as follows:
1. Development of a data-driven stacking ensemble framework.
The study introduces a robust ETA prediction model that integrates Light Gradient-Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), and Random Forest (RF) through a Linear Regression meta-learner. This framework is specifically tailored to the operational characteristics of long-haul bulk carriers, overcoming the limitations of single-model approaches commonly used in prior research.
2. Construction of a comprehensive multi-dimensional AIS feature system.
A systematic feature engineering pipeline is developed, incorporating baseline, temporal, speed-related, course-related, static, and historical behavior features. This six-category, 72-feature set captures both instantaneous navigation states and longer-term behavioral patterns, enabling richer ETA inference than existing studies.
3. Extensive performance evaluation and interpretability analysis.
Through multi-metric benchmarking, spatial prediction assessment, residual analysis, and feature importance interpretation, the study demonstrates that the stacking model achieves superior accuracy (Mean Absolute Error (MAE) 3.30 h), stability, and robustness. The meta-learner’s weighting strategy is analyzed, revealing how complementary error structures among base models enhance ensemble performance.
4. Operational insights for maritime logistics and port management.
By reducing ETA uncertainty by approximately 75% compared with conventional benchmarks, the proposed model provides substantial practical value for berth scheduling, resource allocation, and supply chain coordination in long-distance bulk shipping.
The remainder of this paper is structured as follows. Section 2 reviews relevant literature. Section 3 describes the methodology, including data preprocessing, feature engineering, model construction, and the evaluation criteria. Section 4 presents the experimental results. Section 5 discusses the limitations of the study and proposed directions for future research. Finally, Section 6 concludes by summarizing the key findings and implications.

2. Literature Review

The core focus of this paper concerns the application of stacking models, ETA prediction methods, the practical challenges of ETA in port planning, and the factors influencing ETA accuracy. Accordingly, the literature is reviewed from these four perspectives to identify existing research gaps.

2.1. Application of Stacking Models

Stacking represents an advanced ensemble learning technique in which predictions from multiple heterogeneous base learners are combined through a meta-learner. This method has demonstrated superior performance in areas involving high-dimensional, nonlinear, and multi-patterned data.
In environmental science, stacking is used to synthesize multi-source geospatial information for risk assessments. Shojaeian et al. developed a hybrid stacking–Principal Component Analysis (PCA) system incorporating six models to achieve highly accurate flood susceptibility mapping [8]. In civil engineering, stacking has shown strong predictive capability in structural analysis and signal processing tasks [9]. Shu et al. introduced a stacking framework optimized via Bayesian Optimization (BO) that significantly improved the prediction accuracy of reinforced concrete shear capacity [10]. In healthcare, Nguyen and Byeon demonstrated the utility of stacking combined with Local Interpretable Model-agnostic Explanations (LIME) for diagnosing depressive symptoms in Parkinson’s patients, addressing challenges arising from overlapping clinical features [11]. In the energy sector, Cao et al. formulated an LSTM–Informer hybrid backed by stacking algorithms for multi-timescale photovoltaic power forecasting, resulting in substantial improvements in prediction reliability [12].
Collectively, these studies highlight stacking’s ability to integrate complementary strengths across models, reduce overfitting, and enhance robustness—properties particularly valuable for maritime ETA prediction, where data are noisy, highly nonlinear, and influenced by numerous external factors.

2.2. ETA Prediction Methods in the Shipping Industry

Research on vessels’ ETA prediction generally falls into two methodological categories: trajectory-based models and feature-based regression models.
Trajectory-based approaches estimate ETA by first predicting a vessel’s future sailing path and then converting that trajectory into travel time [13]. For example, Alessandrini et al. applied Dijkstra’s algorithm to derive optimal navigation routes [14], while Wu et al. developed a multi-scale visibility graph method suitable for autonomous long-distance navigation [15]. Subsequent studies incorporate more complex navigational and environmental factors. For instance, Park et al. embedded AIS data into a Reinforcement Learning (RL) framework and used Bayesian sampling to estimate speed over ground (SOG), a key input for converting geometric routes into time estimates [16]. Ogura et al. proposed a two-stage scheme explicitly accounting for weather impacts on routing and speed [17], and Li et al. introduced deep RL with artificial potential fields to enhance dynamic routing performance [18]. While effective for structured navigation scenarios with predictable sailing corridors, these methods depend heavily on accurately forecasting the entire future trajectory. This requirement compounds prediction errors and renders such approaches unsuitable for tramp shipping, where routing flexibility and operational variability undermine trajectory predictability.
Feature-based regression methods, by contrast, treat ETA prediction as a supervised learning task that maps extracted features to arrival times [19]. Early efforts using logistic regression, Classification and Regression Trees (CART), and RF relied primarily on static attributes or coarse environmental variables [20,21]. However, these static models perform poorly in dynamic maritime environments, motivating the shift toward machine learning algorithms capable of exploiting the continuously updated position, speed, and heading information provided by AIS. Research has since evolved toward dynamic AIS-driven modeling, employing algorithms ranging from Support Vector Machines (SVM) and Neural Networks (NN) to deep learning architectures [22]. Noman et al. showed that Gated Recurrent Units (GRU)-based recurrent models effectively capture temporal dependencies and outperform Gradient-Boosting Decision Trees (GBDT) and Multi-layer Perceptron (MLP) for inland ETA prediction [23]. Bourzak et al. further compared MLP, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNNs), and Transformer architectures, identifying Bidirectional Long Short-Term Memory (BiLSTM) as the most effective for sequence-based ETA estimation [24].
Despite these advances, existing methods remain limited in their applicability to long-distance tramp bulk carriers, whose voyages are far longer, less structured, and influenced by more volatile operational conditions than those of liner vessels. This highlights a need for modeling frameworks that are robust to nonlinearity, high variability, and sparse external information.

2.3. ETA Situation in Port Operations

Within the operational framework of ports, the Estimated Time of Arrival (ETA) of vessels serves as a fundamental input for critical decision-making processes, including berth allocation, resource scheduling, and labor planning. The prevailing practice is that ports depend almost exclusively on ETA information provided by the incoming vessels themselves. This information is typically communicated through established channels such as ship agents, direct emails, or standardized messaging within AIS transmissions [20,25]. The content of these ETAs is often derived from manual estimates made by the ship’s master, based on experience and prevailing conditions at the time of reporting [13].
This operational model, however, is characterized by inherent uncertainty. The dynamic and complex nature of maritime voyages means that the initially reported ETA is frequently subject to revision. Factors such as adverse weather, delays at preceding ports, or changes in sailing speed necessitate continuous updates, making the ETA a variable rather than a fixed parameter during a vessel’s journey [13,20]. Consequently, this variability directly impacts port planning efficiency.
The reliance on these volatile, externally provided ETAs presents a well-acknowledged operational scenario for port authorities. The discrepancy between planned and actual arrival times can disrupt meticulously crafted schedules. This often manifests in two contrasting inefficiencies: under-utilization of resources (e.g., quay cranes, pilots, and stevedores standing idle due to delays) or over-congestion at the terminal when multiple vessels arrive earlier than anticipated [26,27]. Such inaccuracies can extend beyond terminal operations, ultimately hindering port competitiveness and the efficiency of interconnected multimodal transport chains [28]. Thus, the management of ETA uncertainty is a recognized and persistent aspect of daily port logistics.
This prevailing situation underscores a clear gap between existing operational practices and the level of information reliability required for optimized, resilient port planning. It establishes a direct practical imperative for the development of more accurate, stable, and data-driven ETA prediction methodologies [13,27].

2.4. Factors Affecting Vessel ETA

There is a broad consensus in the literature that ETA is shaped by a complex interplay of vessel characteristics, operational decisions, environmental conditions, and route-specific factors [29]. Based on an extensive synthesis of prior research, these determinants can be grouped into the major categories summarized in Table 1. Within the Macro route structure category, a precise operational definition of the voyage endpoint (e.g., port boundaries or anchorage areas) is crucial. Ambiguity in this geographical delimiter directly affects the calculation of the final leg’s duration, impacting ETA accuracy. Therefore, standardizing these definitions is an important prerequisite for reliable prediction [30].
For ocean-going tramp bulk carriers, these factors are amplified by several intrinsic operational characteristics—such as the absence of fixed schedules and predetermined routes, frequent market-driven changes in loading/discharging ports, diverse navigational environments across long-distance routes, and limited transparency regarding commercial decisions. Such variability results in highly nonlinear and unpredictable ETA behavior, posing significant challenges for data-driven prediction models that rely on stable patterns or consistent route characteristics [27,44].

2.5. Research Gap

Although substantial progress has been made in ETA prediction research, existing studies largely rely on data from liner shipping, where fixed routes, stable schedules, and recurring voyage patterns provide a supportive environment for trajectory-based and feature-based modeling. These assumptions break down in tramp shipping, where routing is market-driven rather than predetermined, schedules are irregular or nonexistent, and operational decisions change dynamically during the voyage. As a result, existing models tend to exhibit poor generalization performance when applied to long-distance, variable tramp operations.
Additionally, many recent studies enhance ETA models by incorporating external datasets, such as high-resolution weather fields, port congestion indicators, or commercial operation logs. While such information improves accuracy, it is often unavailable or unreliable for tramp vessels, whose commercial activities are not systematically recorded or disclosed. In contrast, AIS trajectories inherently encode the vessel’s navigational behavior—an integrated reflection of engine settings, environmental forces, operational decisions, and traffic interactions. However, how to systematically mine and fuse these latent signals remains underexplored.
Lastly, although stacking models have demonstrated exceptional performance in diverse engineering fields, their application to long-distance ETA prediction for tramp bulk carriers has not been fully investigated. The potential of stacking to exploit complementary residual structures, enhance robustness, and reduce prediction uncertainty represents an important opportunity.
To address these gaps, this study proposes a tailored stacking ensemble framework designed for the complex, nonlinear, and high-uncertainty operational environment of tramp bulk carriers, aiming to significantly improve the accuracy and reliability of long-distance ETA prediction.

3. Methodology

To address the operational complexity and variability inherent in long-distance tramp bulk carrier voyages, this study develops a stacked ensemble learning framework for ETA prediction using AIS data. Through systematic preprocessing of historical AIS records and targeted feature engineering, the proposed model learns the nonlinear relationships between a vessel’s navigational state and its remaining sailing time, enabling accurate end-to-end ETA forecasting.
As illustrated in Figure 1, the methodological workflow comprises four core phases:
The data preprocessing phase begins with cleaning raw AIS messages to remove errors, outliers, and duplicate records. Additional filtering ensures temporal continuity and operational relevance, after which vessel trajectories are segmented into individual voyages. This process establishes a high-quality dataset suitable for subsequent modeling.
Subsequently, feature engineering is performed to construct a comprehensive set of features that capture voyage efficiency, vessel motion dynamics, and spatiotemporal context. These include speed- and course-related descriptors, distance-based indicators, and rolling-window statistics designed to characterize short- and mid-term behavioral patterns. Collectively, these engineered attributes provide the model with rich information regarding sailing behavior and its link to ETA.
Following this, the model construction and training phase establishes a stacked ensemble framework, comprising LightGBM, XGBoost, and RF as base learners. Their outputs are integrated by a Linear Regression meta-learner, which synthesizes complementary predictive patterns and enhances model robustness and generalization. Hyperparameter tuning of all learners is performed to achieve optimal predictive performance.
Finally, performance evaluation is carried out using four complementary metrics: MAE, Root Mean Square Error (RMSE), Symmetric Mean Absolute Percentage Error (sMAPE), and the coefficient of determination (R²). Together, these indicators evaluate both absolute error magnitude and the model’s explanatory power across diverse operational conditions.
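For concreteness, the four metrics above can be written out in a few lines of NumPy; the sketch below is an illustrative implementation (the function name and returned dictionary layout are my own, not from the paper):

```python
import numpy as np

def eta_metrics(y_true, y_pred):
    """Compute MAE, RMSE, sMAPE, and R^2 for ETA predictions (in hours).

    MAE and RMSE measure absolute error magnitude, sMAPE gives a
    scale-free percentage error, and R^2 measures explanatory power.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # Symmetric MAPE: bounded in [0, 200] percent.
    smape = 100.0 * np.mean(2.0 * np.abs(err) / (np.abs(y_true) + np.abs(y_pred)))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "sMAPE": smape, "R2": r2}
```

Reporting all four together guards against a model that minimizes average error while producing occasional large deviations, which RMSE penalizes more heavily than MAE.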

3.1. Data Preprocessing

AIS data often contains noise, discontinuities, missing values, and redundant records resulting from equipment malfunction, signal interference, and transmission delays. To ensure data quality and improve the reliability of subsequent ETA prediction, this study implements a systematic preprocessing pipeline to transform raw AIS messages into a high-quality vessel trajectory dataset. As depicted in Figure 2, the preprocessing consists of three main components: data cleaning, trajectory completion, and trajectory compression.

3.1.1. Data Cleaning

A four-step cleaning procedure is applied to remove erroneous and implausible records while preserving genuine vessel movements.
The procedure commences with time standardization, wherein all timestamps are converted to the standard format “YYYY-MM-DD hh:mm:ss”. Records with missing or invalid timestamps are removed to ensure temporal consistency. After this step, the number of retained AIS records (N_rec) in Figure 2 is 835,203. Subsequently, duplicate removal is executed by applying two rules: (a) for identical Maritime Mobile Service Identity (MMSI)–timestamp pairs, only the earliest message is retained; (b) records where SOG exceeds 1 knot while reported coordinates remain unchanged from the previous message are identified as anomalous duplicates and removed. After removing duplicates, N_rec = 835,101.
The cleaning process continues with physical threshold filtering, discarding data points that fall outside the valid physical ranges defined by maritime operation standards (Table 2), including invalid positions, unrealistic speeds, and out-of-range vessel dimensions. After filtering, N_rec = 831,676.
The final step addresses the removal of trajectory jump points caused by AIS noise. A composite detection rule is applied: (a) instantaneous speeds exceeding 16 knots are removed; (b) heading changes between consecutive points greater than 180° are filtered out; (c) large spatial deviations inconsistent with vessel dynamics are eliminated. After removing jump points, N_rec decreases to 822,650 high-quality records.
Overall, 98.5% of the original dataset is preserved, indicating that the cleaning rules effectively removed invalid data while maintaining the integrity of vessel trajectories.
Figure 3 compares the AIS trajectories before and after cleaning. A zoomed-in view of a representative region (125–135° E, 5° S–10° N, marked by blue rectangles) is included to demonstrate specific examples of cleaning, where overlapping and erroneous points are most prevalent. This comparison clearly shows that the cleaning procedure effectively preserves genuine vessel movements while eliminating erroneous data points, thereby ensuring a reliable foundation for subsequent analysis.
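As a rough illustration of the four-step procedure, the pandas sketch below applies simplified versions of the same rules to a toy table. The column names (mmsi, ts, lat, lon, sog, cog) and the exact thresholds are assumptions for illustration, not the paper’s schema or Table 2 values:

```python
import pandas as pd

def clean_ais(df: pd.DataFrame) -> pd.DataFrame:
    """Simplified four-step AIS cleaning sketch (illustrative schema)."""
    df = df.copy()
    # Step 1: time standardization -- parse timestamps, drop invalid ones.
    df["ts"] = pd.to_datetime(df["ts"], errors="coerce")
    df = df.dropna(subset=["ts"]).sort_values(["mmsi", "ts"])
    # Step 2: duplicate removal -- keep earliest message per (MMSI, timestamp),
    # then drop records moving (>1 kn) whose coordinates did not change.
    df = df.drop_duplicates(subset=["mmsi", "ts"], keep="first")
    same_pos = (df.groupby("mmsi")["lat"].diff() == 0) & \
               (df.groupby("mmsi")["lon"].diff() == 0)
    df = df[~((df["sog"] > 1.0) & same_pos)]
    # Step 3: physical threshold filtering (valid position/speed ranges;
    # the 16-kn speed cap mirrors the jump-point rule, stand-in for Table 2).
    df = df[df["lat"].between(-90, 90) & df["lon"].between(-180, 180)]
    df = df[df["sog"].between(0, 16)]
    # Step 4: jump-point removal -- heading changes > 180° between fixes.
    dcog = df.groupby("mmsi")["cog"].diff().abs()
    df = df[~(dcog > 180)]
    return df.reset_index(drop=True)
```

Each step shrinks the record count monotonically, matching the N_rec counts reported for the real dataset.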

3.1.2. Data Completion

AIS signals may contain irregular reporting intervals or missing values, especially in open-ocean segments. To restore temporal continuity and preserve navigational semantics, this study adopts a stratified completion strategy, combining forward–backward filling for static fields with adaptive interpolation for kinematic variables. Static or categorical fields (e.g., MMSI, IMO, vessel dimensions, navigation status) are filled using forward–backward propagation, ensuring consistency without altering vessel identity or voyage descriptors. Dynamic numerical fields (e.g., latitude, longitude, SOG, heading) are reconstructed using adaptive linear interpolation when the interval between adjacent AIS messages exceeds a predefined threshold. For two valid AIS points (t_1, v_1) and (t_2, v_2), the interpolated value v(t) at any intermediate time t (t_1 < t < t_2) is calculated by Equation (1):
v(t) = v_1 + ((v_2 − v_1) / (t_2 − t_1)) · (t − t_1)
Interpolation significantly improves temporal completeness. For example, in the representative trajectory (MMSI: 538005339 [49]), the number of records increases from 7399 to 9357 after interpolation (Figure 4), reconstructing otherwise missing navigational segments. This yields uniformly spaced, continuous trajectories suitable for extracting rolling-window features (e.g., average SOG or acceleration), critical components for accurate ETA prediction.
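Equation (1) and the gap-triggered densification it supports can be sketched as follows; the gap threshold and resampling step below are illustrative values, not the paper’s calibrated settings:

```python
def interpolate_value(t1, v1, t2, v2, t):
    """Linear interpolation of a kinematic field between two fixes (Equation (1))."""
    return v1 + (v2 - v1) / (t2 - t1) * (t - t1)

def fill_gaps(times, values, max_gap=1800.0, step=600.0):
    """Insert interpolated samples wherever the interval between consecutive
    AIS fixes exceeds max_gap seconds, resampling every `step` seconds.
    Times are in seconds; thresholds are illustrative assumptions."""
    out_t, out_v = [times[0]], [values[0]]
    for (t1, v1), (t2, v2) in zip(zip(times, values),
                                  zip(times[1:], values[1:])):
        if t2 - t1 > max_gap:
            t = t1 + step
            while t < t2:  # densify the gap with synthetic fixes
                out_t.append(t)
                out_v.append(interpolate_value(t1, v1, t2, v2, t))
                t += step
        out_t.append(t2)
        out_v.append(v2)
    return out_t, out_v
```

Applied per dynamic field (latitude, longitude, SOG, heading), this yields the uniformly spaced trajectories needed for rolling-window features.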

3.1.3. Trajectory Compression

To mitigate data redundancy and minimize the risk of model overfitting, this study adopts a dual-criterion trajectory compression method that integrates geometric feature preservation with temporal coverage guarantee [51]. In the spatial dimension, the Douglas–Peucker (DP) algorithm is applied to retain key geometric points whose perpendicular deviation from the connecting chord exceeds a predefined tolerance. This ensures that essential navigational behaviors, such as course alterations and maneuvering segments, are preserved. The perpendicular distance d is computed using Equation (2):
d = |(b_x − a_x)(a_y − p_y) − (a_x − p_x)(b_y − a_y)| / √((b_x − a_x)² + (b_y − a_y)²)
where (p_x, p_y) is the evaluated point, and (a_x, a_y) and (b_x, b_y) denote the endpoints of the chord.
In the temporal dimension, a fixed-interval sampling scheme is introduced to maintain temporal consistency by uniformly selecting points throughout the voyage. The final compressed trajectory is obtained by merging the spatially and temporally selected points, thereby preserving both geometric structure and time-series integrity.
Following parameter calibration, the DP tolerance is set to ε = 0.01° (approximately 1.1 km), and the temporal sampling interval to T = 30 min. A non-recursive implementation of the DP algorithm improves computational efficiency for large-scale datasets. All static vessel attributes and dynamic motion parameters are maintained to avoid information loss.
Across all voyages, only 8.23% of the original AIS points are retained, resulting in a compression rate of 91.77%. This substantial compression rate primarily reflects the removal of redundant data commonly found in long-distance voyages, rather than the exclusion of features vital for ETA prediction. The dual-criterion method is specifically designed to preserve the geometric and temporal anchors that define a voyage profile. As shown in Figure 5, while redundant points in straight-line segments are removed, all turning and maneuvering points—critical for understanding vessel behavior—are retained. This approach ensures that the compression process improves data quality for model training by increasing information density, while maintaining the integrity of the key features necessary for ETA prediction. The resulting compact representation provides a high-quality, informative input for downstream ETA modeling.
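A minimal stack-based (non-recursive) Douglas–Peucker pass over the spatial criterion might look like the following sketch; it implements Equation (2) directly and returns the indices of retained points (the fixed-interval temporal sampling and the merge step are omitted for brevity):

```python
import math

def perp_distance(p, a, b):
    """Perpendicular distance from point p to the chord a-b (Equation (2))."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    num = abs((bx - ax) * (ay - py) - (ax - px) * (by - ay))
    den = math.hypot(bx - ax, by - ay)
    # Degenerate chord: fall back to the point-to-point distance.
    return num / den if den > 0 else math.hypot(px - ax, py - ay)

def douglas_peucker(points, eps=0.01):
    """Stack-based DP simplification over (x, y) tuples.

    eps is the tolerance in degrees (0.01 deg ~ 1.1 km, as in Section 3.1.3).
    Returns the sorted indices of retained key geometric points.
    """
    keep = {0, len(points) - 1}
    stack = [(0, len(points) - 1)]
    while stack:
        i, j = stack.pop()
        if j <= i + 1:
            continue
        # Find the interior point farthest from the chord (i, j).
        d_max, idx = 0.0, None
        for k in range(i + 1, j):
            d = perp_distance(points[k], points[i], points[j])
            if d > d_max:
                d_max, idx = d, k
        if d_max > eps:  # keep the point and recurse on both sub-chords
            keep.add(idx)
            stack.extend([(i, idx), (idx, j)])
    return sorted(keep)
```

Because only points deviating more than ε from the local chord survive, straight open-sea legs collapse to a few anchors while turning segments are preserved, consistent with the ~92% compression rate reported above.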

3.2. Feature Engineering Extraction

Feature engineering plays a pivotal role in transforming raw AIS data into informative variables that capture underlying navigation patterns and support accurate ETA prediction [52,53]. In this study, a comprehensive feature set is systematically designed to reflect the operational characteristics of long-distance tramp bulk carriers. The constructed features are organized into six categories, as summarized in Appendix A, Table A1.
Specifically, Baseline features offer the fundamental physical information directly related to transit time estimation. Temporal features encode periodic patterns, such as seasonal patterns, weekday–weekend differences, and holiday effects, that may influence vessel behavior and port operational efficiency. Course-related features quantify directional alignment and course stability, enabling the model to detect route deviations arising from traffic avoidance, meteorological disturbances, or other navigational decisions.
Speed-related features constitute the largest group and characterize both short-term and long-term vessel motion dynamics. These include instantaneous acceleration and rolling statistics of SOG computed over 6, 12, 24, and 48 h windows, allowing the model to distinguish stable open-sea cruising from low-speed sailing in congested or restricted waters. Static features incorporate vessel geometry, capturing inherent differences in maneuverability and ensuring that predictions remain consistent with physical constraints. Finally, historical behavioral features summarize long-term navigational tendencies unique to each MMSI, enabling the model to exploit vessel-specific patterns learned from past voyages.
Through this structured feature engineering process, a total of 72 features are generated across six dimensions. Collectively, these features capture instantaneous navigation states, temporal context, route-following behavior, dynamic motion patterns, vessel-specific attributes, and historical sailing habits. This multi-layered representation provides a robust and information-rich input foundation for the stacked ensemble ETA prediction model.
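The speed-related group described above can be sketched with pandas time-based rolling windows; the column name 'sog', the returned feature names, and the use of only mean/std statistics are illustrative simplifications of the full 72-feature set:

```python
import pandas as pd

def speed_features(traj: pd.DataFrame) -> pd.DataFrame:
    """Instantaneous acceleration plus rolling SOG statistics over
    6/12/24/48 h windows, assuming a datetime-indexed trajectory with
    an 'sog' column in knots (illustrative schema)."""
    traj = traj.sort_index().copy()
    # Elapsed hours between consecutive fixes.
    dt_h = traj.index.to_series().diff().dt.total_seconds() / 3600.0
    traj["accel"] = traj["sog"].diff() / dt_h  # knots per hour
    for h in (6, 12, 24, 48):
        win = f"{h}h"  # time-based window over the datetime index
        traj[f"sog_mean_{h}h"] = traj["sog"].rolling(win).mean()
        traj[f"sog_std_{h}h"] = traj["sog"].rolling(win).std()
    return traj
```

Low rolling means with high rolling standard deviations flag congested or restricted waters, whereas stable open-sea cruising shows the opposite pattern.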

3.3. Model Construction

3.3.1. Stacking Model

To address the performance limitations and instability often observed in single-model ETA prediction approaches, this study constructs a two-stage stacking ensemble framework tailored to the characteristics of long-distance tramp bulk carriers. As illustrated in Figure 6, the first layer integrates three heterogeneous base learners (LightGBM, XGBoost, and RF) to capture diverse learning patterns and error structures. The second layer employs a Linear Regression meta-learner, which fuses the base-model outputs to generate the final ETA prediction.
The proposed stacking ensemble framework is implemented using Python 3.9. The implementation employs widely used machine learning libraries: scikit-learn (version 1.2.2) for constructing the modeling pipeline and for the Linear Regression meta-learner; XGBoost (version 1.7.6) and LightGBM (version 4.1.0) as the gradient-boosting base learners; and the Random Forest Regressor from scikit-learn. The development environment utilized is PyCharm 2025.1, and all experiments are conducted on a workstation equipped with a 12th Gen Intel(R) Core (TM) i5-12500H (3.10 GHz) processor and 16 GB of RAM.
The workflow is organized into two phases: a training phase and a prediction phase. During training, BO, combined with 3-fold cross-validation and MAE as the evaluation criterion, is applied to tune the hyperparameters of the base learners (Figure 7). To prevent data leakage, a 5-fold cross-validation scheme is then used to generate out-of-fold predictions, which constitute the meta-train matrix Z_train. Paired with the true labels y_train, the matrix Z_train is used to train the Linear Regression meta-learner.
In the prediction phase, each trained base model produces a prediction for the test set. The final base-model outputs are obtained by averaging predictions across folds, forming the meta-feature matrix Z_test, which is subsequently passed to the trained meta-learner to yield the ensemble's ETA estimates.
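The two-phase workflow above can be sketched compactly with scikit-learn. The base learners below are stand-ins (the paper uses LightGBM, XGBoost, and RF), and test-time predictions come from models refit on the full training set, a simplification of the paper's averaging across fold models:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic regression data standing in for the AIS feature matrix / ETA target.
X, y = make_regression(n_samples=400, n_features=10, noise=5.0, random_state=0)
X_train, y_train, X_test, y_test = X[:320], y[:320], X[320:], y[320:]

base_models = [
    GradientBoostingRegressor(random_state=0),         # boosting stand-in
    RandomForestRegressor(n_estimators=100, random_state=0),
]

# Training phase: 5-fold out-of-fold predictions form the meta-train matrix
# Z_train, so the meta-learner never sees leaked in-fold predictions.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
Z_train = np.column_stack(
    [cross_val_predict(m, X_train, y_train, cv=cv) for m in base_models]
)
meta = LinearRegression().fit(Z_train, y_train)

# Prediction phase: stack base-model test predictions into Z_test,
# then let the meta-learner produce the final ETA estimate.
Z_test = np.column_stack(
    [m.fit(X_train, y_train).predict(X_test) for m in base_models]
)
eta_pred = meta.predict(Z_test)
```

The learned `meta.coef_` values play the same role as the weighting coefficients analyzed in Section 4.3.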

3.3.2. Base Model and Meta-Model Selection

After extensive experimentation and comparative evaluation, this study selects LightGBM, XGBoost, and RF as the base learners due to their complementary strengths and proven robustness in large-scale, nonlinear, and high-dimensional learning tasks—characteristics aligned with AIS-based ETA prediction.
LightGBM achieves high efficiency and strong baseline accuracy through histogram-based computation and leaf-wise tree growth [54]. XGBoost augments this with second-order gradient optimization and explicit regularization, enabling more precise loss minimization [55]. RF, based on bootstrap aggregation, introduces model diversity and enhances robustness by reducing variance, effectively counterbalancing the overfitting tendencies of boosting algorithms [56,57].
Collectively, these models represent complementary learning biases (efficiency, precision, and robustness), forming a diversified base layer capable of capturing complex data patterns.
A Linear Regression model is selected as the meta-learner to integrate the base learners’ predictions [36]. Its simplicity reduces the risk of overfitting given the low dimensionality of meta-features, while its convex optimization yields interpretable coefficients that clarify the relative contribution of each base model. Section 4 further analyzes these coefficients and their implications.

3.3.3. Model Training and Optimization Strategy

To ensure robust predictive performance, this study adopts a combined optimization strategy involving BO and cross-validation. BO serves as the primary hyperparameter tuning method, efficiently navigating the parameter space by updating its search direction based on prior evaluations. A limited evaluation budget of 50–100 iterations is configured for BO. This range is determined by the dimensionality of the hyperparameter search space (seven key hyperparameters), ensuring thorough exploration while maintaining computational tractability. The budget also keeps the approach scalable: the primary computational cost grows with the number of evaluations (each requiring model training), not with the complexity of the BO meta-algorithm itself. While per-evaluation training time will increase with larger multi-route datasets, the BO overhead remains negligible; moreover, evaluations are parallelizable, so with a fixed iteration count the overhead remains manageable in large-scale applications. With 3-fold cross-validation and MAE minimization as the objective, the algorithm optimizes key hyperparameters such as the learning rate, tree depth, and number of estimators for LightGBM and XGBoost, as well as tree quantity and depth for RF.
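The tuning loop (3-fold CV, MAE as the objective, a bounded evaluation budget) can be sketched as follows. The paper uses Bayesian Optimization; `RandomizedSearchCV` is shown here only as a lightweight stand-in with the same interface (a BO library such as scikit-optimize's `BayesSearchCV` could be swapped in), and the model and parameter grid are illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data standing in for the AIS feature matrix / remaining-time target.
X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)

param_space = {
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "max_depth": [3, 5, 7],
    "n_estimators": [100, 200, 400],
}
search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_space,
    n_iter=10,                          # the paper budgets 50-100 evaluations
    scoring="neg_mean_absolute_error",  # maximizing this minimizes MAE
    cv=3,
    random_state=0,
)
search.fit(X, y)
best_params = search.best_params_
```

The fixed `n_iter` budget is what bounds total cost: each iteration trains one candidate model under 3-fold CV, regardless of the search strategy used.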
A 5-fold cross-validation scheme is strictly employed during meta-feature generation to avoid data leakage. All meta-features provided to the meta-learner are out-of-sample predictions generated on held-out folds, ensuring that the meta-learner is trained on unbiased and representative information.
Together, the BO-driven hyperparameter tuning and the multi-level cross-validation framework provide a robust foundation for constructing a high-performing and reliable stacking ensemble model for long-distance ETA prediction in tramp shipping.

3.4. Evaluation Criteria

To comprehensively evaluate the performance of the ETA prediction model, four complementary evaluation metrics are employed: MAE, RMSE, sMAPE, and R². Their mathematical formulations and interpretive meanings are summarized in Table 3.
Collectively, these metrics provide a multi-dimensional assessment of model quality, capturing absolute and squared-error behavior, scale-independent proportional errors, and overall explanatory strength. All experimental results reported in Section 4 are evaluated within this unified, predefined framework to ensure rigorous and comparable performance benchmarking.
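As a concrete reference, the four metrics can be computed as follows. The sMAPE shown is the standard symmetric form; Table 3 gives the paper's exact definitions:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(
        np.abs(y_pred - y_true) / ((np.abs(y_true) + np.abs(y_pred)) / 2.0)
    )

# Illustrative remaining-time values (hours), not the paper's data.
y_true = np.array([100.0, 80.0, 60.0, 40.0])
y_pred = np.array([102.0, 78.0, 63.0, 39.0])

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # version-safe RMSE
r2 = r2_score(y_true, y_pred)
s = smape(y_true, y_pred)
```

MAE and RMSE are in hours; RMSE penalizes large errors more heavily, which is why the paper reads the RMSE-to-MAE ratio as an indicator of large-error control.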

4. Experiments and Results

To evaluate the capability of AIS-based models for long-distance ETA prediction, this study selects the shipping corridor between Port Weipa (Australia) and Qingdao Port (China). This 3500-nautical-mile route traverses the Coral Sea, the Indonesian Archipelago, and the South China Sea, areas characterized by monsoon variability, complex navigational conditions, and high vessel density. These characteristics make the route an ideal testbed for assessing the performance of a data-driven stacking ensemble framework. More importantly, it serves as a representative case for long-distance tramp bulk shipping, encapsulating its core attributes: market-driven irregular schedules, absence of fixed service frequency, and operations across diverse oceanic regions. Consequently, a model developed and validated on this route is inherently designed to address the fundamental challenges of the domain.

4.1. Dataset

The dataset is sourced from the AIS database of the Shipping Big Data Platform at Jimei University and includes bulk carrier voyages between Weipa and Qingdao from 17 December 2022 to 31 December 2023. A total of 86 complete voyages are retained after preprocessing. The vessels operating on this route are mainly Capesize and Panamax bulk carriers whose principal dimensions, obtained by matching MMSI codes against the Clarksons World Fleet Register, range from 180 to 292 m in length and 28 to 45 m in breadth. The spatial distribution of these trajectories is illustrated in Figure 8.
Following the preprocessing and feature engineering procedures outlined in Section 3, model inputs are represented as a set of D-dimensional feature vectors X = {x_1, x_2, ..., x_n}, x_i ∈ R^D, where each vector encodes the vessel's navigational state at a specific timestamp based on the six feature groups described in Section 3.2. The corresponding prediction targets are defined as Y = {y_1, y_2, ..., y_n}, where each y_i represents the remaining sailing time, defined as the duration required for the vessel to reach Qingdao from its current location.
To support rigorous performance evaluation, the dataset is randomly partitioned into a training set ( S train ) and a test set ( S test ) using a 4:1 split ratio. The training set is used to fit and optimize the base learners and the meta-learner, while the test set is reserved exclusively for out-of-sample validation. Importantly, all AIS records belonging to the same voyage are assigned entirely to either the training or test set to eliminate data leakage and ensure that evaluation results reflect realistic prediction conditions for unseen voyages.
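The voyage-level 4:1 split described above maps directly onto a grouped split, where the voyage identifier is the grouping key so that no voyage straddles the train/test boundary. The toy table and column names below are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Toy AIS table: 10 voyages with 50 records each.
df = pd.DataFrame({
    "voyage_id": np.repeat(np.arange(10), 50),
    "sog": np.random.default_rng(0).normal(12, 0.5, 500),
})

# 4:1 split at the voyage level, not the record level.
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(gss.split(df, groups=df["voyage_id"]))
train, test = df.iloc[train_idx], df.iloc[test_idx]

# Leakage guard: no voyage appears on both sides of the split.
assert set(train["voyage_id"]) & set(test["voyage_id"]) == set()
```

A plain random row-level split would scatter records of the same voyage across both sets and inflate apparent accuracy; grouping by voyage keeps the evaluation honest for unseen voyages.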

4.2. Comparative Analysis

4.2.1. Performance Comparison Across Evaluation Criteria

The predictive performance of the stacking ensemble is benchmarked against a baseline model and its three constituent base learners. Historical Averaging (Baseline Model), a non-learning approach, establishes a simple benchmark for ETA prediction [58]. For each vessel (identified by MMSI), the model computes the mean of the maximum remaining sailing times across all its voyages in the training set. This vessel-specific historical mean is then used as the ETA prediction for all the vessel’s voyages in the test set, without accounting for real-time navigational dynamics. LightGBM, XGBoost, and RF, which function as the base learners within the stacking ensemble, are also evaluated as standalone machine learning models. This comparison quantifies the performance improvement achieved through the integration of the meta-learner, in contrast to the use of any individual base learner.
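The historical-averaging baseline reduces to a per-vessel groupby: the maximum remaining sailing time of each training voyage, averaged over that vessel's voyages. A minimal sketch with illustrative values:

```python
import pandas as pd

# Toy training records: two vessels, three voyages, remaining time in hours.
train = pd.DataFrame({
    "mmsi":        [1,     1,     1,     1,     2,     2],
    "voyage_id":   [10,    10,    11,    11,    20,    20],
    "remaining_h": [300.0, 150.0, 280.0, 140.0, 310.0, 155.0],
})

# Max remaining time per voyage (the voyage's total duration at departure)...
per_voyage = train.groupby(["mmsi", "voyage_id"])["remaining_h"].max()
# ...averaged per vessel to give one constant ETA prediction per MMSI.
baseline_eta = per_voyage.groupby("mmsi").mean()
```

Because the prediction is a single constant per vessel, it ignores the vessel's current position and speed entirely, which is exactly why its RMSE and R² degrade so sharply in Table 4.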
Figure 9 presents the performance of all models across four complementary evaluation metrics: MAE (with standard deviation), RMSE, sMAPE, and R². To highlight the superiority of the proposed stacking model, its performance on each metric is indicated by a red dashed line in the respective subplot, against which all other models are compared. The error bars on the MAE represent the standard deviation, emphasizing prediction stability, a key focus of our evaluation. These visual comparisons are further supported by the numerical results reported in Table 4.
Figure 9 and Table 4 collectively demonstrate that the stacking ensemble model consistently outperforms all benchmark and individual models across all dimensions of evaluation, including accuracy, stability, goodness-of-fit, and large-error control.
In terms of accuracy, the stacking model achieves an MAE of 3.30 h, representing a 74.7% improvement relative to the historical averaging benchmark (13.02 h) and outperforming all individual models (XGBoost: 3.72 h; LightGBM: 4.12 h; RF: 4.42 h). Although the historical averaging method yields a slightly lower sMAPE (4.44%) than the stacking model (4.52%), this advantage is misleading, as its constant-value predictions artificially suppress relative error. Among learning-based models, the stacking ensemble achieves the best sMAPE, while XGBoost exhibits the highest relative error (6.71%).
Regarding stability, the stacking model records the lowest MAE_std (4.80 h), indicating the most consistent predictive behavior. RF, by contrast, displays substantial volatility (MAE_std = 8.60 h), even exceeding its own MAE, reflecting limited robustness.
For goodness-of-fit, the stacking model achieves an R² of 0.9902, explaining 99.02% of the variance and approaching the theoretical upper limit. Both LightGBM and XGBoost also perform strongly (>0.98). The historical averaging method, however, yields a negative R² (−0.0593), indicating performance worse than a simple mean-based predictor.
In terms of large-error control, the stacking model also obtains the lowest RMSE (4.54 h), while the historical averaging method produces the highest RMSE (17.26 h), reflecting frequent and substantial errors. Among the individual learners, XGBoost performs best, achieving a favorable RMSE-to-MAE ratio, an advantage further strengthened within the stacking ensemble.
Across all evaluation dimensions, the stacking ensemble delivers optimal or near-optimal performance, validating the effectiveness of heterogeneous learners through a meta-learner and demonstrating its reliability for long-distance vessel ETA prediction.

4.2.2. Comparative Analysis of Prediction Accuracy in Spatial Distribution

Figure 10 presents scatter plots comparing predicted and true ETA values for each model, offering a spatial perspective on prediction accuracy and error dispersion. The historical averaging method is excluded due to its degenerate constant-value output. The reference line ŷ = y indicates ideal predictions, while the ±20% and ±50% error bands provide intuitive accuracy thresholds.
Key observations include: (i) LightGBM and XGBoost show similar performance, with most predictions concentrated within the ±20% band. Slight dispersion appears when the remaining time exceeds 200 h (Figure 10a,b); (ii) RF exhibits markedly higher vertical dispersion, particularly between 100 and 200 h, with several predictions falling outside the ±50% band (Figure 10c), reflecting weaker stability; (iii) The stacking ensemble demonstrates the tightest clustering around the ideal line, with uniformly distributed residuals, fewer outliers, and strong alignment across the full prediction spectrum (Figure 10d).
These spatial distribution results confirm that the stacking model provides the most stable and reliable ETA predictions, thereby supporting its practical deployment for voyage monitoring, route planning, and navigational decision-making.

4.2.3. Comparative Analysis of Residual Error Structures

Residual plots in Figure 11, mapping predicted values (x-axis) against residuals (true minus predicted values, y-axis), further characterize model error behavior. These diagnostics help identify systematic bias, heteroscedasticity, and variability patterns.
The following observations emerge: (i) All models display residuals centered around zero, with no obvious structural patterns, satisfying basic regression assumptions. (ii) LightGBM displays generally uniform residuals, but with slight overestimation bias (mean = 1.12 h) and moderate variability (std = 5.51 h). Occasional large errors (>±20 h) occur for long-range predictions (Figure 11a). (iii) XGBoost exhibits slightly lower variability (std = 5.00 h) and more concentrated residuals, although errors increase at higher predicted remaining times (Figure 11b). (iv) RF displays the highest variability (std = 6.84 h) and occasional extreme negative errors (below −40 h), indicating instability and heteroscedasticity (Figure 11c). (v) The stacking ensemble provides the most consistent residual distribution, with minimal bias (mean = 0.92 h) and the lowest variability (std = 4.44 h). Residuals cluster tightly within ±5 h across all prediction horizons (Figure 11d).
In summary, these findings confirm that the stacking ensemble significantly enhances error stability and reduces residual variance, reinforcing its suitability for long-distance ETA prediction under complex navigational conditions.

4.2.4. Feature Importance Analysis and Model Interpretability

To gain insight into model decision mechanisms, feature importance scores derived from the trained LightGBM model are analyzed. The aggregated feature-category importance values, presented in Figure 12, exhibit a coherent and physically interpretable structure.
The three most influential categories are velocity-related features (22.20%), course-related features (21.27%), and baseline features (20.05%). Jointly, they constitute the primary predictive pillars of the model. These findings reaffirm established maritime reasoning, whereby ETA is governed mainly by current vessel speed, course stability, and remaining distance.
Historical behavioral features (19.87%) contribute meaningfully by capturing short-term fluctuations in speed and heading, which offer contextual insights into the navigational rhythm. However, in long-distance tramp shipping, their predictive utility is lower than that of real-time SOG and distance. This is due to the significant variability of tramp shipping routes and port calls, which diminishes the consistency of vessel-specific patterns across voyages. Over long distances, macro-scale progress (captured by real-time features) becomes the dominant factor in ETA prediction. Nonetheless, historical features still provide valuable information for near-term ETA adjustments, consistent with real-world maritime operations where factors such as current motion and recent vessel behavior are continuously assessed by ship captains.
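Aggregating per-feature importances into the category shares shown in Figure 12 is a one-line groupby once each feature is mapped to its category. The importance values and the category map below are illustrative stand-ins, not the paper's actual scores:

```python
import pandas as pd

# Hypothetical per-feature importance scores from a trained tree model
# (e.g., LightGBM's feature_importances_).
importances = pd.Series({
    "sog": 120.0, "sog_rolling_mean_24h": 90.0,        # speed-related
    "cog_deviation": 100.0, "cog_rolling_std_24h": 80.0,  # course-related
    "distance_to_go": 110.0,                            # baseline
})

# Feature -> category mapping, mirroring the six-group schema in Section 3.2.
category = {
    "sog": "speed", "sog_rolling_mean_24h": "speed",
    "cog_deviation": "course", "cog_rolling_std_24h": "course",
    "distance_to_go": "baseline",
}

# Category share of total importance, in percent.
shares = (importances.groupby(category).sum() / importances.sum() * 100).round(2)
```

Summing raw importances within a category and normalizing by the total yields percentages directly comparable across categories of different sizes.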
Operationally, the reduction in ETA error from 13.02 h (baseline) to 3.30 h (stacking) significantly narrows uncertainty for voyages exceeding 3500 nautical miles. Such improvements directly enhance berth allocation, equipment planning, manpower scheduling, and inland logistics coordination—ultimately strengthening the resilience and efficiency of maritime supply chains.

4.3. Interpretation of the Meta-Learner Weighting Mechanism

To further elucidate the internal decision-making process of the stacking ensemble, this section analyzes the optimal weighting coefficients estimated by the meta-learner. Table 5 reports the contribution of each base model to the final ensemble prediction. Several key insights emerge from this analysis. First, the stacking ensemble does not perform a simple average of the base learners. Instead, it constructs an optimized linear combination that capitalizes on the complementary predictive behaviors of the constituent models. Notably, as shown in Table 5, all estimated weighting coefficients are statistically significant (p < 0.001), confirming that the meta-learner's allocation (a dominant positive weight to LightGBM, a corrective negative weight to XGBoost, and a supplementary positive weight to RF) is robust rather than an artifact of chance. Second, although XGBoost yields the best individual performance (MAE = 3.72 h, R² = 0.9807), the meta-learner assigns the highest positive weight to LightGBM (ω1 = 0.984) and a negative weight to XGBoost (ω2 = −0.187). This allocation indicates that the meta-learner does not merely select the strongest individual model; rather, it prioritizes the configuration that yields the best error synergy and generalization performance.
A plausible explanation is that LightGBM, while slightly less accurate than XGBoost in isolation, produces more stable and consistent residual patterns, making it a robust predictive baseline. In contrast, the systematic deviations present in XGBoost's predictions exhibit structured relationships with LightGBM's residuals. The meta-learner exploits this relationship by treating XGBoost as an error-correction signal: assigning a negative coefficient enables the ensemble to reverse the direction of these deviations and fine-tune LightGBM's predictions. Meanwhile, RF (ω3 = 0.202) contributes positively by supplying additional complementary information that correlates with the target variable.
In conclusion, the meta-learner’s weighting pattern demonstrates a key advantage of stacking ensembles: their ability to integrate models with diverse error structures to produce superior predictions. The coordinated interplay among the base models—led by LightGBM’s stability, enhanced by XGBoost’s corrective deviations, and supported by RF’s complementary signals—explains the ensemble’s heightened generalization capability. This asymmetric and at times counterintuitive weighting scheme highlights the sophistication of the stacking strategy and its effectiveness in achieving high-precision ETA prediction for long-distance tramp vessels.

5. Discussion

This study proposes a data-driven ETA prediction framework for tramp shipping by integrating historical AIS trajectory data with a tree-based stacking ensemble model. The empirical validation on the Weipa–Qingdao iron-ore route demonstrates significant improvements in ETA prediction, reducing the Mean Absolute Error (MAE) by 74.7% compared to the historical averaging benchmark. While the framework achieves high predictive accuracy and stability across various evaluation metrics (e.g., RMSE, sMAPE, R²), the following limitations must be addressed in future research.
Firstly, although the model performs well overall, it struggles with accurately predicting ETA during the final port-approach phase. Specifically, the model’s relative error during this phase (sMAPE = 29.08%) indicates difficulty in capturing critical port-access processes, such as pilotage delays, queuing, and weather-induced traffic restrictions, using AIS data alone. This limitation highlights the need to integrate additional data sources—such as meteorological conditions, real-time fairway status, and port congestion indicators—to improve near-port prediction accuracy.
Secondly, this study focuses on a single long-distance route (Weipa-Qingdao), which serves as a representative case for long-distance tramp bulk shipping. However, the generalizability of the model to other routes, vessel types, and geographical regions remains untested. Future work should evaluate the model’s performance across a broader range of routes with varying operational characteristics. This would involve adapting the feature engineering schema to capture route-specific geography while maintaining the flexibility of the framework to handle diverse operational conditions.
Furthermore, the model’s ability to generalize across multi-port voyage structures and different vessel types needs to be explored. For example, ships with different operational profiles (e.g., container ships versus bulk carriers) may exhibit distinct navigational behaviors that could influence ETA prediction. Future studies could assess how well the model adapts to these variations and fine-tune the feature extraction process accordingly.
Another key limitation concerns the computational efficiency of the model in real-time applications. Although the study demonstrates the framework’s promising performance, operational deployment would require addressing engineering challenges such as model lightweighting and improving computational efficiency. This includes optimizing the model for integration with Vessel Traffic Services (VTS), port community systems, and commercial fleet-management platforms, ensuring scalability and responsiveness in dynamic operational environments.
Finally, there is a need for further refinement of the interpretability of the model. While the meta-learner’s weighting strategy has been analyzed, deeper exploration of how specific feature combinations influence ETA predictions could provide valuable insights into the operational decision-making process. Future research could explore methods for improving the transparency of the model and how these insights can be translated into actionable strategies for port and fleet operators.

6. Conclusions

By integrating historical AIS trajectory data with a tree-based stacking ensemble model, this study proposes a data-driven ETA prediction framework for tramp shipping. Using the Weipa–Qingdao iron-ore route as a representative case, the model reduces the MAE from 13.02 h (historical averaging benchmark) to 3.30 h, an accuracy improvement of approximately 74.7%. Across all evaluation metrics, including RMSE, sMAPE, and R², the stacking ensemble consistently achieves optimal or near-optimal performance, demonstrating high predictive accuracy, strong stability, and robust generalization under realistic tramp-shipping conditions.
Beyond performance improvement, the study provides interpretable insights into the determinants of ETA. Feature importance analysis reveals that velocity-related, course-related, and baseline features collectively form the core predictive foundation, accounting for more than 63% of total importance. This ranking is fully aligned with maritime navigation principles, wherein ETA is primarily governed by a vessel’s speed, course stability, and remaining distance. Historical behavioral features also show substantial influence, underscoring the value of recent navigational trends as contextual indicators of short-term motion dynamics. These findings confirm that AIS trajectories alone contain rich, physically meaningful information sufficient to support high-precision ETA estimation in tramp operations.
The interpretability analysis of the meta-learner further illustrates the sophistication of the stacking mechanism. Although XGBoost exhibits the best standalone accuracy, the meta-learner assigns LightGBM the dominant positive weight (0.984) and XGBoost a negative weight (−0.187). This asymmetric allocation indicates that the ensemble optimizes complementary error structures rather than relying on the numerically strongest single model. LightGBM provides a stable predictive baseline, while systematic deviations in XGBoost outputs act as effective error-correction signals when negatively weighted. RF contributes additional positively correlated patterns. Together, these interactions explain the ensemble’s enhanced predictive power and demonstrate the methodological value of exploiting heterogeneous residual structures.
From an operational perspective, reducing ETA uncertainty from roughly half a day to approximately three hours offers substantial practical benefits. More accurate arrival forecasts enable ports to optimize berth planning, equipment utilization, and labor scheduling, while inland logistics networks gain more reliable lead time for coordinating downstream transport and inventory operations. The proposed method, therefore, enhances predictability and operational resilience across the maritime supply chain, particularly for bulk trades characterized by long distances, variable routing conditions, and limited schedule regularity.
Overall, this study demonstrates that high-quality ETA prediction for tramp shipping can be achieved through deep extraction of AIS-derived navigational signals combined with a carefully designed stacking ensemble. The findings contribute both methodological advances and practical insights that support more intelligent, data-driven maritime logistics management.

Author Contributions

P.H.: Writing—Review and Editing, Supervision, Project Administration, Funding Acquisition, Conceptualization. J.C.: Writing—Original Draft, Writing—Review and Editing, Methodology, Formal Analysis, Investigation, Data Curation, Visualization. J.W.: Writing—Review and Editing, Writing—Original Draft, Methodology, Validation, Formal Analysis, Project Administration. H.C.: Writing—Review and Editing, Supervision, Methodology, Validation, Data Curation. P.Z.: Writing—Review and Editing, Supervision, Methodology, Funding Acquisition, Conceptualization, Project Administration. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Social Science Fund of China (Grant No. 23&ZD138) and the Start-up Grant for Natural Sciences of Jimei University (Grant No. ZQ2025035).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors gratefully acknowledge the financial support from the National Social Science Fund of China (Grant No. 23&ZD138) and the Start-up Grant for Natural Sciences of Jimei University (Grant No. ZQ2025035). We also extend our sincere thanks to the anonymous reviewers for their valuable comments and suggestions, which greatly improved the quality of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ETA: Estimated time of arrival
LAYCAN: Laydays and Canceling Date
ATA: Actual time of arrival
DRC: Development Research Center of the State Council
IMO: International Maritime Organization
AIS: Automatic identification system
LightGBM: Light Gradient-Boosting Machine
XGBoost: Extreme Gradient Boosting
RF: Random Forest
MAE: Mean Absolute Error
PCA: Principal Component Analysis
BO: Bayesian Optimization
LIME: Local Interpretable Model-agnostic Explanations
RL: Reinforcement Learning
SOG: Speed over ground
CART: Classification and Regression Trees
SVM: Support vector machines
NN: Neural network
GRU: Gate Recurrent Unit
GBDT: Gradient-Boosting Decision Trees
MLP: Multi-layer Perceptron
CNN: Convolutional Neural Networks
RNNs: Recurrent Neural Networks
BiLSTM: Bidirectional Long Short-Term Memory
COG: Course Over Ground
RMSE: Root Mean Square Error
sMAPE: Symmetric Mean Absolute Percentage Error
R²: Coefficient of Determination
MMSI: Maritime Mobile Service Identity
DP: Douglas–Peucker (Algorithm)

Appendix A

Table A1. Feature categories and definition.
Baseline features (9):
- distance_to_go: The remaining great-circle distance (in nautical miles) from the vessel's current position to the destination port.
- SOG: Instantaneous SOG (knots), directly obtained from the AIS message.
- COG: Instantaneous COG (degrees), representing the vessel's actual direction of movement relative to true north.
- heading: Instantaneous true heading (degrees), representing the direction the vessel's bow is pointed.
- bearing_to_dest: The initial compass bearing (degrees) from the current position to the destination waypoint.
- cog_deviation; heading_deviation: The absolute angular difference (degrees) between bearing_to_dest (ideal) and COG/heading (actual).
- cog_rolling_std_window: The standard deviation of COG values within a short, immediately preceding adaptive time window.
- acceleration: The instantaneous rate of change of SOG, calculated as the derivative between consecutive AIS messages (knots per hour).
Temporal features (5):
- hour: The hour of the day extracted from the AIS timestamp (integer, 0 to 23).
- day_of_week: The day of the week extracted from the AIS timestamp (integer, e.g., Monday = 0, Sunday = 6).
- is_weekend/is_holiday: A binary indicator (0 or 1) where 1 denotes Saturday or Sunday/a public holiday in the voyage region.
- month: The month of the year extracted from the AIS timestamp (integer, 1 to 12).
Course-related features (32):
- cog_rolling_mean/std/min/max_6H/12H/24H/48H: The mean/standard deviation/minimum/maximum of COG over the preceding 6/12/24/48 h window.
- heading_rolling_mean/std/min/max_6H/12H/24H/48H: The mean/standard deviation/minimum/maximum of heading over the preceding 6/12/24/48 h window.
Speed-related features (16):
- sog_rolling_mean/std/min/max_6H/12H/24H/48H: The mean/standard deviation/minimum/maximum of SOG over the preceding 6/12/24/48 h window.
Static features (2):
- length; breadth: The Length Overall (LOA) or breadth (beam) of the vessel, obtained from the vessel's static database (meters).
Historical features (8):
- historical_sog_mean/std/min/max: Mean/standard deviation/minimum/maximum of SOG across all historical voyages of the vessel (identified by MMSI).
- historical_cog_mean/std: Circular mean/standard deviation of COG across all historical AIS positions of the vessel.
- historical_heading_mean/std: Circular mean/standard deviation of heading across all historical AIS positions of the vessel.
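The two geometric features in Table A1, distance_to_go and bearing_to_dest, can be sketched with the haversine formula and the initial-bearing formula. The Earth-radius constant and the example coordinates below are illustrative approximations:

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles

def distance_to_go(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))

def bearing_to_dest(lat1, lon1, lat2, lon2):
    """Initial compass bearing in degrees, normalized to [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    y = math.sin(dl) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

# Rough example: near Weipa (~12.6 S, 141.8 E) to near Qingdao (~36.1 N, 120.3 E).
d = distance_to_go(-12.6, 141.8, 36.1, 120.3)
b = bearing_to_dest(-12.6, 141.8, 36.1, 120.3)
```

The great-circle distance is shorter than the ~3500-nautical-mile sailing route, which must detour through the Indonesian Archipelago rather than follow the geodesic.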

References

  1. Zhang, Y.; Zhai, Y.; Chen, J.; Xu, Q.; Fu, S.; Wang, H. Factors Contributing to Fatality and Injury Outcomes of Maritime Accidents: A Comparative Study of Two Accident-Prone Areas. J. Mar. Sci. Eng. 2022, 10, 1945. [Google Scholar] [CrossRef]
  2. Kendall, L.C. Liner Service and Tramp Shipping; Springer: Dordrecht, The Netherlands, 1986; pp. 5–11. [Google Scholar] [CrossRef]
  3. Lloret-Batlle, R.; Lin, S.; Guo, J. Container Ship Operating Trajectory and Arrival Time Prediction Based on Machine Learning. Logist. Technol. 2023, 42, 99–106. [Google Scholar] [CrossRef]
  4. Di Francesco, M.; Fancello, G.; Serra, P.; Zuddas, P. Optimal Management of Human Resources in Transhipment Container Ports. Marit. Policy Manag. 2015, 42, 127–144. [Google Scholar] [CrossRef]
  5. El Mekkaoui, S.; Benabbou, L.; Berrado, A. Machine Learning Models for Efficient Port Terminal Operations: Case of Vessels’ Arrival Times Prediction. IFAC-PapersOnLine 2022, 55, 3172–3177. [Google Scholar] [CrossRef]
  6. Wei, J.; Wang, C. Building World-Class Supply Chain Master Enterprises; China Economic Times: Beijing, China, 2025; p. A02. [Google Scholar]
  7. Veenstra, A.; Harmelink, R. On the Quality of Ship Arrival Predictions. Marit. Econ. Logist. 2021, 23, 655–673. [Google Scholar] [CrossRef]
  8. Shojaeian, A.; Shafizadeh-Moghadam, H.; Sharafati, A.; Shahabi, H. Extreme Flash Flood Susceptibility Mapping Using a Novel PCA-Based Model Stacking Approach. Adv. Space Res. 2024, 74, 5371–5382. [Google Scholar] [CrossRef]
  9. Wang, J.; Wang, Y.; Liu, G.; Chen, G. A Model Stacking Algorithm for Indoor Positioning System Using WiFi Fingerprinting. KSII Trans. Internet Inf. Syst. 2023, 17, 1200–1215. [Google Scholar] [CrossRef]
  10. Shu, J.; Yu, H.; Liu, G.; Yang, H.; Chen, Y.; Duan, Y. BO-Stacking: A Novel Shear Strength Prediction Model of RC Beams with Stirrups Based on Bayesian Optimization and Model Stacking. Structures 2023, 58, 105593. [Google Scholar] [CrossRef]
  11. Nguyen, H.V.; Byeon, H. Prediction of Parkinson’s Disease Depression Using LIME-Based Stacking Ensemble Model. Mathematics 2023, 11, 708. [Google Scholar] [CrossRef]
  12. Cao, Y.; Liu, G.; Luo, D.; Bavirisetti, D.P.; Xiao, G. Multi-Timescale Photovoltaic Power Forecasting Using an Improved Stacking Ensemble Algorithm Based LSTM-Informer Model. Energy 2023, 283, 128669. [Google Scholar] [CrossRef]
  13. Yoon, J.-H.; Kim, D.-H.; Yun, S.-W.; Kim, H.-J.; Kim, S. Enhancing Container Vessel Arrival Time Prediction through Past Voyage Route Modeling: A Case Study of Busan New Port. J. Mar. Sci. Eng. 2023, 11, 1234. [Google Scholar] [CrossRef]
  14. Alessandrini, A.; Mazzarella, F.; Vespe, M. Estimated Time of Arrival Using Historical Vessel Tracking Data. IEEE Trans. Intell. Transp. Syst. 2019, 20, 7–15. [Google Scholar] [CrossRef]
  15. Wu, G.; Atilla, I.; Tahsin, T.; Terziev, M.; Wang, L. Long-Voyage Route Planning Method Based on Multi-Scale Visibility Graph for Autonomous Ships. Ocean Eng. 2021, 219, 108242. [Google Scholar] [CrossRef]
  16. Park, K.; Sim, S.; Bae, H. Vessel Estimated Time of Arrival Prediction System Based on a Path-Finding Algorithm. Marit. Transp. Res. 2021, 2, 100012. [Google Scholar] [CrossRef]
  17. Ogura, T.; Inoue, T.; Uchihira, N. Prediction of Arrival Time of Vessels Considering Future Weather Conditions. Appl. Sci. 2021, 11, 4410. [Google Scholar] [CrossRef]
  18. Li, L.; Wu, D.; Huang, Y.; Yuan, Z.-M. A Path Planning Strategy Unified with a COLREGS Collision Avoidance Function Based on Deep Reinforcement Learning and Artificial Potential Field. Appl. Ocean Res. 2021, 113, 102759. [Google Scholar] [CrossRef]
  19. Dhivyabharathi, B.; Kumar, B.A.; Vanajakshi, L. Real Time Bus Arrival Time Prediction System under Indian Traffic Condition. In Proceedings of the 2016 IEEE International Conference on Intelligent Transportation Engineering (ICITE), Singapore, 20–22 August 2016; IEEE: New York, NY, USA, 2016; pp. 18–22. [Google Scholar]
  20. Pani, C.; Vanelslander, T.; Fancello, G.; Cannas, M. Prediction of Late/Early Arrivals in Container Terminals—A Qualitative Approach. Eur. J. Transp. Infrastruct. Res. 2015, 15, 536–550. [Google Scholar]
  21. Mohd Salleh, N.H.; Riahi, R.; Yang, Z.; Wang, J. Predicting a Containership’s Arrival Punctuality in Liner Operations by Using a Fuzzy Rule-Based Bayesian Network (FRBBN). Asian J. Shipp. Logist. 2017, 33, 95–104. [Google Scholar] [CrossRef]
  22. Parolas, I. ETA Prediction for Containerships at the Port of Rotterdam Using Machine Learning Techniques; Port Rotterdam Authority: Rotterdam, The Netherlands, 2016. [Google Scholar]
  23. Noman, A.A.; Heuermann, A.; Wiesner, S.A.; Thoben, K.-D. Towards Data-Driven GRU Based ETA Prediction Approach for Vessels on Both Inland Natural and Artificial Waterways. In Proceedings of the 2021 IEEE Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; IEEE: New York, NY, USA, 2021; pp. 2286–2291. [Google Scholar]
  24. Bourzak, I.; El Mekkaoui, S.; Berrado, A.; Caron, S.; Benabbou, L. Deep Learning Approaches for Vessel Estimated Time of Arrival Prediction: A Case Study on the Saint Lawrence River. In Proceedings of the 2023 14th International Conference on Intelligent Systems: Theories and Applications (SITA), Mohammedia, Morocco, 19–20 October 2023; IEEE: New York, NY, USA, 2023; pp. 1–7. [Google Scholar]
  25. Pani, C.; Fadda, P.; Fancello, G.; Frigau, L.; Mola, F. A Data Mining Approach to Forecast Late Arrivals in a Transhipment Container Terminal. Transport 2014, 29, 175–184. [Google Scholar] [CrossRef]
  26. Van der Steeg, J.-J.; Oudshoorn, M.; Yorke-Smith, N. Berth Planning and Real-Time Disruption Recovery: A Simulation Study for a Tidal Port. Flex. Serv. Manuf. J. 2023, 35, 70–110. [Google Scholar] [CrossRef]
  27. El Mekkaoui, S.; Benabbou, L.; Berrado, A. Deep Learning Models for Vessel’s ETA Prediction: Bulk Ports Perspective. Flex. Serv. Manuf. J. 2023, 35, 5–28. [Google Scholar] [CrossRef]
  28. Lloret-Batlle, R.; Lin, S.; Guo, J. Cross-Pacific Vessel Estimated Time of Arrival and Next Destination Prediction with Automatic Identification System Data. Transp. Res. Rec. 2025, 2679, 67–80. [Google Scholar] [CrossRef]
  29. Jiang, S.; Liu, L.; Peng, P.; Xu, M.; Yan, R. Prediction of Vessel Arrival Time to Port: A Review of Current Studies. Marit. Policy Manag. 2025, 1–26. [Google Scholar] [CrossRef]
  30. Issa-Zadeh, S.B.; Garay-Rondero, C.L. Decarbonizing Seaport Maritime Traffic: Finding Hope. World 2025, 6, 47. [Google Scholar] [CrossRef]
  31. Barreiro, J.; Zaragoza, S.; Diaz-Casas, V. Review of Ship Energy Efficiency. Ocean Eng. 2022, 257, 111594. [Google Scholar] [CrossRef]
  32. Pietrzykowski, Z.; Wielgosz, M. Effective Ship Domain—Impact of Ship Size and Speed. Ocean Eng. 2021, 219, 108423. [Google Scholar] [CrossRef]
  33. Duan, K.; Huang, F.; Zhang, S.; Shu, Y.; Dong, S.; Liu, M. Prediction of Ship Following Behavior in Ice-Covered Waters in the Northern Sea Route Based on Hybrid Theory and Data-Driven Approach. Ocean Eng. 2024, 296, 116939. [Google Scholar] [CrossRef]
  34. Chen, P.; Yang, F.; Mou, J.; Chen, L.; Li, M. Regional Ship Behavior and Trajectory Prediction for Maritime Traffic Management: A Social Generative Adversarial Network Approach. Ocean Eng. 2024, 299, 117186. [Google Scholar] [CrossRef]
  35. Lee, H.-T.; Kim, M.-K. Optimal Path Planning for a Ship in Coastal Waters with Deep Q Network. Ocean Eng. 2024, 307, 118193. [Google Scholar] [CrossRef]
  36. Lei, J.; Chu, Z.; Wu, Y.; Liu, X.; Luo, M.; He, W.; Liu, C. Predicting Vessel Arrival Times on Inland Waterways: A Tree-Based Stacking Approach. Ocean Eng. 2024, 294, 116838. [Google Scholar] [CrossRef]
  37. Xin, R.; Pan, J.; Yang, F.; Yan, X.; Ai, B.; Zhang, Q. Graph Deep Learning Recognition of Port Ship Behavior Patterns from a Network Approach. Ocean Eng. 2024, 305, 117921. [Google Scholar] [CrossRef]
  38. Verschuur, J.; Koks, E.E.; Hall, J.W. Port Disruptions Due to Natural Disasters: Insights into Port and Logistics Resilience. Transp. Res. Part D Transp. Environ. 2020, 85, 102393. [Google Scholar] [CrossRef]
  39. Hasheminia, H.; Jiang, C. Strategic Trade-off between Vessel Delay and Schedule Recovery: An Empirical Analysis of Container Liner Shipping. Marit. Policy Manag. 2017, 44, 458–473. [Google Scholar] [CrossRef]
  40. Stevens, S.C.; Parsons, M.G. Effects of Motion at Sea on Crew Performance: A Survey. Mar. Technol. SNAME News 2002, 39, 29–47. [Google Scholar] [CrossRef]
  41. Chen, Y.; Lou, N.; Liu, G.; Luan, Y.; Jiang, H. Risk Analysis of Ship Detention Defects Based on Association Rules. Mar. Policy 2022, 142, 105123. [Google Scholar] [CrossRef]
  42. Gong, X.; Jiang, H.; Yang, D. Maritime Piracy Risk Assessment and Policy Implications: A Two-Step Approach. Mar. Policy 2023, 150, 105547. [Google Scholar] [CrossRef]
  43. Stanivuk, T.; Lalic, B.; Amizic, Z. General Assessment of the Impact of the War in Ukraine on the Shipping Industry Using Parametric Methods. Trans. Marit. Sci. 2023, 12, 1–6. [Google Scholar] [CrossRef]
  44. Jiang, J.; Yang, C. Research on Numerical Simulation of Large Bulk Carrier Speed Prediction. Ship Eng. 2024, 46, 29–34. [Google Scholar] [CrossRef]
  45. National Geospatial-Intelligence Agency. Department of Defense World Geodetic System 1984: Its Definition and Relationships with Local Geodetic Systems; National Geospatial-Intelligence Agency: Springfield, VA, USA, 2014; pp. 1–207. [Google Scholar]
  46. IMO Resolution A.917(22); Guidelines for the Onboard Operational Use of Shipborne Automatic Identification Systems (AIS). International Maritime Organization: London, UK, 2002.
  47. Fancello, G.; Pani, C.; Pisano, M.; Serra, P.; Zuddas, P.; Fadda, P. Prediction of Arrival Times and Human Resources Allocation for Container Terminal. Marit. Econ. Logist. 2011, 13, 142–173. [Google Scholar] [CrossRef]
  48. Pallotta, G.; Vespe, M.; Bryan, K. Vessel Pattern Knowledge Discovery from AIS Data: A Framework for Anomaly Detection and Route Prediction. Entropy 2013, 15, 2218–2245. [Google Scholar] [CrossRef]
  49. International Maritime Organization. International Convention for the Safety of Life at Sea (SOLAS), 1974, as Amended: Consolidated Edition; IMO Publication Sales Number: IE110E 2020; International Maritime Organization: London, UK, 2020. [Google Scholar]
  50. UNCTAD. Review of Maritime Transport 2023; UN: New York, NY, USA, 2023. [Google Scholar]
  51. Lu, D.; Jiang, W.; Chen, Y.; Zhang, L.; Jia, W.; Wang, H.; Chen, M. DP Compress: A Model Compression Scheme for Generating Efficient Deep Potential Models. J. Chem. Theory Comput. 2022, 18, 5559–5567. [Google Scholar] [CrossRef]
  52. Chen, Y.; Duan, W.; He, Y.; Wang, S.; Fernandez, C. A Hybrid Data Driven Framework Considering Feature Extraction for Battery State of Health Estimation and Remaining Useful Life Prediction. Green Energy Intell. Transp. 2024, 3, 100160. [Google Scholar] [CrossRef]
  53. Shi, R.; Zhang, L.; Lin, F.; Ning, J.; Jia, L.; Lee, K.Y. Annotated Survey and Perspectives on Rail Transport Energy System RAMS Evaluation Technology. Green Energy Intell. Transp. 2024, 3, 100164. [Google Scholar] [CrossRef]
  54. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS), Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  55. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794. [Google Scholar]
  56. Zhou, Z.-H. Machine Learning; Springer Nature: Cham, Switzerland, 2021. [Google Scholar]
  57. Zhu, G.; Chen, J.; Liu, X.; Sun, T.; Lai, X.; Zheng, Y.; Guo, Y.; Bhagat, R. Intelligent Lithium Plating Detection and Prediction Method for Li-Ion Batteries Based on Random Forest Model. Green Energy Intell. Transp. 2025, 4, 100167. [Google Scholar] [CrossRef]
  58. Chu, K.-F.; Lam, A.Y.S.; Tsoi, K.H.; Huang, Z.; Loo, B.P.Y. Deep Encoder Cross Network for Estimated Time of Arrival. IEEE Access 2023, 11, 76095–76107. [Google Scholar] [CrossRef]
Figure 1. Flowchart of the prediction process.
Figure 2. Data preprocessing framework.
Figure 3. AIS trajectories before (a) and after (b) cleaning. The blue rectangles highlight a selected region (125–135° E, 5° S–10° N) for the zoomed-in view, illustrating the specific cleaning effects.
Figure 4. Trajectory comparison for MMSI 538005339 before (a) and after (b) interpolation.
Figure 5. Zoomed-in comparison of AIS trajectory before and after compression for vessel MMSI 538005339 (125° E–135° E and 10° S–0°).
Figure 6. Stacking model diagram.
Figure 7. Hyperparameter optimization framework.
Figure 8. Trajectory of bulk carriers from Port Weipa (Australia) to Qingdao Port (China).
Figure 9. Multi-dimensional performance comparison of all models. The red dashed line indicates the performance of the proposed stacking ensemble model, and error bars on the MAE show the standard deviation from cross-validation.
Figure 10. Scatter plot comparison of predicted values versus actual values for each model.
Figure 11. Residual plots for each model.
Figure 12. Feature category importance analysis for LightGBM models.
Table 1. Factors influencing vessel ETA and supporting references.
Category | Factors | Refs.
Static attributes | Vessel type, dimensions, deadweight, design speed, engine power. | Barreiro et al. [31]; Pietrzykowski and Wielgosz [32].
Dynamic states | SOG, Course Over Ground (COG), heading, draught, distance to destination. | Duan et al. [33].
Macro route structure | Voyage distance, great-circle path, choke points (e.g., canals). | Chen et al. [34]; Lee and Kim [35].
Micro-traffic environment | Port congestion, anchorage queue, and fairway traffic. | Lei et al. [36]; Xin et al. [37].
Natural factors | Wind, waves, currents, visibility, extreme events. | Bourzak et al. [24]; Verschuur et al. [38].
Commercial decisions | Economic speed, destination change. | Hasheminia and Jiang [39].
Navigation decisions | Speed reduction or rerouting for avoidance. | Stevens and Parsons [40].
Emergencies | Regional conflict, piracy, and port strikes. | Chen et al. [41]; Gong et al. [42]; Stanivuk et al. [43].
Table 2. Data cleaning thresholds.
Parameter | Valid Range | Unit | Refs.
Longitude | [−180, 180] | Degree | National Geospatial-Intelligence Agency [45].
Latitude | [−90, 90] | Degree | National Geospatial-Intelligence Agency [45].
COG | [0, 360) | Degree | IMO (2002) [46].
Heading | [0, 360) | Degree | IMO (2002) [46].
SOG | [0, 16] | Knots | Fancello et al. [47]; Pallotta et al. [48].
Length | [78, 342] | m | IMO (2020) [49]; UNCTAD [50]; Jiang and Yang [44].
Breadth | [14.3, 63.5] | m | IMO (2020) [49]; UNCTAD [50]; Jiang and Yang [44].
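The Table 2 thresholds translate directly into a record-level validity filter. A hedged sketch, assuming a dict-per-record layout with illustrative field names (`lon`, `lat`, `cog`, `heading`, `sog`, `length`, `breadth`):

```python
def is_valid_record(rec):
    """Return True if an AIS record falls inside the Table 2 thresholds."""
    return (
        -180.0 <= rec["lon"] <= 180.0
        and -90.0 <= rec["lat"] <= 90.0
        and 0.0 <= rec["cog"] < 360.0       # half-open: due north reported as 0, not 360
        and 0.0 <= rec["heading"] < 360.0
        and 0.0 <= rec["sog"] <= 16.0       # knots, bulk-carrier service range
        and 78.0 <= rec["length"] <= 342.0  # metres
        and 14.3 <= rec["breadth"] <= 63.5  # metres
    )

# Two hypothetical AIS points: one plausible, one with an invalid COG and SOG
records = [
    {"lon": 141.8, "lat": -12.6, "cog": 310.5, "heading": 311.0,
     "sog": 12.4, "length": 229.0, "breadth": 32.3},
    {"lon": 141.8, "lat": -12.6, "cog": 360.0, "heading": 311.0,
     "sog": 102.3, "length": 229.0, "breadth": 32.3},
]
cleaned = [r for r in records if is_valid_record(r)]
print(len(cleaned))  # → 1
```

In practice the same predicate can be applied row-wise to the full AIS stream before interpolation and compression.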
Table 3. Performance evaluation metrics.
Metric | Formula | Interpretation
MAE | $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$ | Average absolute prediction error; smaller values indicate higher accuracy.
RMSE | $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$ | Penalizes large errors more heavily than MAE; smaller values indicate better predictive performance.
sMAPE | $\mathrm{sMAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\frac{\left|y_i - \hat{y}_i\right|}{\left(\left|y_i\right| + \left|\hat{y}_i\right|\right)/2}$ | Scale-independent percentage error; smaller values indicate higher predictive accuracy.
$R^2$ | $R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$ | Quantifies goodness of fit; values close to 1 indicate strong explanatory power, while $R^2 \le 0$ implies performance worse than a simple mean predictor.
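The Table 3 definitions can be computed directly. Below is a plain-Python sketch of the four metrics (not the authors’ implementation), using the absolute-value form of the sMAPE denominator:

```python
import math

def eta_metrics(y_true, y_pred):
    """MAE, RMSE, sMAPE (%), and R² as defined in Table 3."""
    n = len(y_true)
    errs = [yt - yp for yt, yp in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errs) / n
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    smape = (100.0 / n) * sum(
        abs(e) / ((abs(yt) + abs(yp)) / 2.0)
        for e, yt, yp in zip(errs, y_true, y_pred)
    )
    y_bar = sum(y_true) / n
    ss_res = sum(e * e for e in errs)             # residual sum of squares
    ss_tot = sum((yt - y_bar) ** 2 for yt in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, smape, r2

# Toy arrival times in hours (illustrative values, not from the dataset)
mae, rmse, smape, r2 = eta_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 310.0])
print(round(mae, 2), round(rmse, 2), round(r2, 4))  # → 10.0 10.0 0.985
```

Because MAE and sMAPE use absolute errors while RMSE squares them, a gap between MAE and RMSE (as in Table 4) signals the presence of a few large errors.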
Table 4. Multi-dimensional performance metrics for all models.
Algorithm | MAE (h) | MAE_std (h) | sMAPE (%) | $R^2$ | RMSE (h)
Historical Averaging Benchmark | 13.02 | 11.33 | 4.44 | −0.0593 | 17.26
LightGBM | 4.12 | 5.80 | 5.30 | 0.9804 | 5.56
XGBoost | 3.72 | 5.71 | 6.71 | 0.9807 | 5.01
RF | 4.42 | 8.60 | 5.98 | 0.9746 | 6.89
Stacking | 3.30 | 4.80 | 4.52 | 0.9902 | 4.54
Table 5. Meta-model weighting coefficients.
Weighting Symbol | Corresponding Name | Weighting Coefficient | p-Value
ω0 | Intercept term | 0.156 | <0.001
ω1 | LightGBM | 0.984 | <0.001
ω2 | XGBoost | −0.187 | <0.001
ω3 | RF | 0.202 | <0.001
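With the fitted coefficients in Table 5, the meta-learner’s final prediction is a linear blend of the three base-learner outputs. A sketch of that combination (function and variable names are illustrative):

```python
# Linear meta-learner coefficients from Table 5
W0, W_LGBM, W_XGB, W_RF = 0.156, 0.984, -0.187, 0.202

def stacked_eta(lgbm_pred, xgb_pred, rf_pred):
    """Final ETA (hours) as the Table 5 weighted blend of base predictions."""
    return W0 + W_LGBM * lgbm_pred + W_XGB * xgb_pred + W_RF * rf_pred

# Hypothetical base-learner predictions (hours to arrival) for one sample
blended = stacked_eta(250.0, 252.0, 249.0)
print(round(blended, 2))
```

The negative XGBoost coefficient does not imply XGBoost harms accuracy on its own; in a linear blend the meta-learner can use one highly correlated base prediction to offset bias shared with the others.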
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Huang, P.; Cai, J.; Wang, J.; Chen, H.; Zhang, P. High-Accuracy ETA Prediction for Long-Distance Tramp Shipping: A Stacked Ensemble Approach. J. Mar. Sci. Eng. 2026, 14, 177. https://doi.org/10.3390/jmse14020177

