As summarized in Figure 1, the methodology of this study consists of two parts: a preprocessing phase and an experimental phase. The preprocessing phase yielded a total of 773 ship trajectories from raw data collected as separate CSV files for each day between 1 January and 31 March. Each trajectory comprises a different number of timestamped points, each with its own state vector values. Ship trajectories are characterized by six metrics: duration, distance, number of timestamped points, average duration between timestamped points, average SOG, and average COG. A total of 18,649 “trajectory segments” were sampled to compose the segment database (SD), of which 3730 were reserved for testing. The same training–test split is used in every experiment so that results remain comparable. Finally, the results of the experimental phase are grouped below by research objective. Although the dataset contains multiple vessels navigating within the same waterway, each trajectory is modeled independently, based solely on its own historical AIS data. Vessel-to-vessel interactions and potential collision scenarios are not explicitly modeled, as the primary objective is to forecast single-vessel trajectories.
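The six trajectory descriptors listed above can be illustrated with a minimal Python sketch. The record layout `(unix_time_s, lat, lon, sog_knots, cog_deg)`, the function names, the haversine distance, and the circular mean for COG are all assumptions made for illustration; the text does not state how the study computed distance or averaged COG.

```python
import math

# Mean Earth radius expressed in nautical miles (1 nm = 1.852 km exactly).
EARTH_RADIUS_NM = 6371.0088 / 1.852

def haversine_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance between two positions, in nautical miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))

def circular_mean_deg(angles_deg):
    """Mean heading in degrees that respects the wrap-around at 360 (assumed choice)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

def summarize_trajectory(points):
    """points: time-ordered list of (unix_time_s, lat, lon, sog_knots, cog_deg)."""
    n = len(points)
    duration_s = points[-1][0] - points[0][0]
    # Sum the leg lengths between consecutive AIS points.
    distance_nm = sum(
        haversine_nm(p[1], p[2], q[1], q[2]) for p, q in zip(points, points[1:])
    )
    return {
        "n_points": n,
        "duration_s": duration_s,
        "distance_nm": distance_nm,
        "avg_gap_s": duration_s / (n - 1) if n > 1 else 0.0,
        "avg_sog_kn": sum(p[3] for p in points) / n,
        "avg_cog_deg": circular_mean_deg([p[4] for p in points]),
    }

# Hypothetical two-point trajectory heading due north (not data from the study).
demo = [(0, 0.0, 0.0, 10.0, 0.0), (60, 1 / 60, 0.0, 12.0, 0.0)]
print(summarize_trajectory(demo))
```

A circular mean is used here because an arithmetic mean of headings such as 350° and 10° would give 180°, which points in the opposite direction; whether the study handled this case is not stated.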
4.1. Evaluation of Hypotheses H1 to H4
Table 4 presents the descriptive statistics and the results of the normality assessments for the performance metric distributions (MDs). Shapiro–Wilk tests were conducted for each group and showed that all twelve distributions deviated significantly from normality (p < 0.0001), a finding further supported by the skewness and kurtosis values. The assumption of normality was therefore not met, even with the large sample size (n = 3730), and non-parametric statistical methods were adopted to ensure the robustness of the analysis. Specifically, the Friedman test, the non-parametric alternative to repeated measures ANOVA, was employed to evaluate differences across the three groups within each hypothesis. The results revealed statistically significant differences for Hypothesis-1 (
), Hypothesis-2 (
), Hypothesis-3 (
), and Hypothesis-4 (
), providing strong evidence of differences among the groups within each hypothesis.
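The testing procedure described above (Shapiro–Wilk per group, then a Friedman test across the three paired groups of one hypothesis) might be sketched as follows with SciPy. The error arrays are synthetic, right-skewed placeholders and the group names are hypothetical; none of the numbers correspond to the study's results. The sketch assumes that row i of every array refers to the same test segment, which is what the Friedman test requires.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 3730  # test-set size reported in the text

# Synthetic placeholder errors for three configurations evaluated on the
# same test segments (hypothetical data, NOT the study's measurements).
errors = {
    "group_a": rng.lognormal(mean=-3.5, sigma=0.6, size=n),
    "group_b": rng.lognormal(mean=-3.4, sigma=0.6, size=n),
    "group_c": rng.lognormal(mean=-3.3, sigma=0.6, size=n),
}

def describe(x):
    """Descriptive statistics plus the Shapiro-Wilk normality test for one group."""
    w_stat, p_norm = stats.shapiro(x)
    return {
        "mean": float(np.mean(x)),
        "std": float(np.std(x, ddof=1)),
        "median": float(np.median(x)),
        "skewness": float(stats.skew(x)),
        # fisher=False gives Pearson kurtosis, for which a normal distribution equals 3.
        "kurtosis": float(stats.kurtosis(x, fisher=False)),
        "shapiro_W": float(w_stat),
        "shapiro_p": float(p_norm),
    }

summary = {name: describe(x) for name, x in errors.items()}
for name, row in summary.items():
    print(name, {k: round(v, 4) for k, v in row.items()})

# Friedman test: non-parametric comparison of three or more paired samples.
chi2, p_friedman = stats.friedmanchisquare(*errors.values())
print(f"Friedman chi-square = {chi2:.2f}, p = {p_friedman:.3g}")
```

Pearson kurtosis is requested explicitly so that the values can be read against the "kurtosis > 3" threshold used in the text; SciPy's default reports excess kurtosis, for which the normal reference is 0.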
All twelve distributions are leptokurtic (kurtosis > 3), indicating heavier tails compared to a normal distribution and a higher likelihood of extreme values. Among them, exhibits the highest kurtosis value, suggesting the most pronounced tail heaviness and the greatest presence of outliers. Additionally, the distributions with positive skewness, where the median is lower than the mean of the distribution, indicate that while most values are relatively low, a few large values are pulling the mean upward.
In evaluating the distributions for each hypothesis by considering multiple metrics—such as mean, standard deviation, median, skewness, and kurtosis—it is evident that the distributions with the smallest mean values— for H1, for H2, for H3, and for H4—also generally exhibit favorable characteristics across other metrics. These distributions not only demonstrate the lowest means, indicating better overall performance, but also show lower variability and consistent central tendencies, making them the most stable and reliable choices within each hypothesis.
The evaluation of the performance metrics across different forecast horizons and historical time points reveals nuanced insights into the optimal input length for accurate vessel trajectory prediction. For Forecast 1, 2, and 4, the distributions with two historical time points (, , and ) consistently demonstrate the lowest mean error values along with favorable stability indicators such as lower standard deviations and relatively balanced skewness. This suggests that for these horizons, a shorter historical window is sufficient to capture the vessel’s movement dynamics effectively, leading to more reliable and computationally efficient predictions. Conversely, Forecast 3 shows a distinct pattern where the distribution with four historical points () outperforms the others, indicating that the model benefits from a longer temporal context at this forecast horizon to better accommodate more complex or less predictable vessel behavior.
Although distributions with the smallest mean errors generally indicate better average performance, a closer look at kurtosis and skewness reveals the presence of heavier tails and outliers, particularly in some cases such as , which exhibits high kurtosis, suggesting more extreme values despite the low mean. This highlights the importance of considering not only the central tendency but also the variability and tail behavior of the distributions when selecting the optimal historical length. Hence, while shows promising average accuracy for Forecast 4, practitioners should be cautious of potential outliers that could impact operational reliability. Balancing mean error and distribution shape characteristics leads to a more robust model choice, underscoring the need for adaptive historical input lengths tailored to specific forecast horizons to optimize both accuracy and stability in vessel trajectory prediction.
One of the trajectories whose evaluation metrics are detailed in Table 4 was selected to illustrate the distance errors at the first, second, third, and fourth forecast horizons, as shown in Figure 4. The sequence of blue carets marks the current time point at the top, with the historical time points arranged consecutively below. Black filled circles denote the ground-truth positions used for comparison, while three sets of predicted positions are overlaid on the topographical map as red, green, and blue filled circles. Red circles correspond to predictions based on two historical data points, green circles to predictions based on three historical points, and blue circles to predictions based on a single historical data point.
As seen in Table 4, predictions incorporating two historical time points (red) generally achieve lower distance errors compared to those based on a single point (blue), demonstrating improved accuracy. Additionally, predictions utilizing three historical points (green) occasionally show further refinement but with diminishing returns, consistent with the observed mean and variability metrics. This suggests that while increasing the number of historical points can enhance prediction accuracy, the marginal benefit decreases beyond two points for most forecast horizons. These visual and statistical insights support the selection of optimal historical context lengths depending on the forecast horizon and model complexity.
4.3. Evaluation of Hypotheses H9 to H12
The evaluation results presented in Table 6 demonstrate the effectiveness of the proposed model that integrates SOG and COG. Compared to the reference model, the proposed model yields significantly lower error values across all four hypotheses (H9–H12), with all differences statistically significant at
.
For example, in H9 and H10, the mean distance errors were reduced from to and from to , respectively—indicating a substantial improvement in short-term forecasts. Although the improvements in H11 and H12 are smaller in magnitude, the consistent advantage across forecast horizons demonstrates the robustness of the proposed approach.
These findings validate the motivation behind this study: while SOG and COG contain valuable navigational signals, their benefits can only be fully realized when they are properly encoded into the model. The proposed model incorporates these features in a more structured and context-aware manner, leading to measurable performance gains. This highlights the importance of architectural design in leveraging auxiliary motion attributes such as speed and course for trajectory prediction.
Vessel trajectory prediction fundamentally involves forecasting a vessel’s future positions based on sequential, timestamped AIS data collected over time. Over recent decades, a wide spectrum of methodologies has emerged to extract meaningful patterns and trends from vessel trajectory data. The Automatic Identification System (AIS) serves as the predominant data source for these investigations owing to its high reliability, extensive global coverage, and rich informational content. Originally developed for vessel traffic management, AIS transmits real-time data at fixed intervals, including critical positional parameters (latitude and longitude) and dynamic movement metrics such as speed over ground (SOG) and course over ground (COG).
Seminal studies leverage AIS data to elucidate regional collision patterns, thereby delineating high-risk zones and furnishing crucial insights to bolster maritime safety [25,26]. The majority of existing research concentrates on the four principal AIS attributes, namely location, speed over ground (SOG), course over ground (COG), and timestamps [10,11,12,13,14,15,16,17,18], while a subset restricts analysis exclusively to positional coordinates [6,7,8,9]. Going beyond these attributes, Bi et al. [27] augmented AIS-based models with environmental variables, including wind and wave data, to enhance prediction precision.
A considerable segment of the literature targets coastal zones, where factors such as rising sea levels and coastal erosion introduce complexities to long-term trajectory forecasting. For example, Slaughter et al. [11] conducted long-term trajectory predictions by employing all four core AIS attributes. Coastal environments present particular challenges for short-term prediction tasks due to their dynamic nature, elevated traffic densities, and swiftly fluctuating environmental conditions. Models designed for these areas must adapt quickly to real-time updates. Studies such as [6,15,17,21] focused on short-term trajectory predictions in coastal regions. Among them, Sekhon et al. [21] relied only on location data, whereas Alam et al. [10] and Cheng et al. [13] achieved better results by incorporating longer historical sequences, albeit with a trade-off between accuracy and real-time performance.
Research efforts have also expanded into archipelagic environments, characterized by intricate traffic flows and challenging natural topographies. Li et al. [12], for instance, analyzed medium-to-short-term trajectory predictions across both coastal and archipelagic zones. Inland waterways, exemplified by rivers, introduce further complexities arising from narrow, sinuous channels, fluctuating water levels, and robust currents [7,9,16].
Recently, deep learning techniques have surged in prominence within maritime trajectory prediction, with attention mechanisms playing a pivotal role in augmenting model efficacy. Attention mechanisms enable models to selectively concentrate on the most salient portions of sequential inputs, a critical capability for time-series forecasting tasks. For instance, Zhou et al. [19] applied attention to pedestrian trajectory prediction, while Messaoud et al. [20] used it in vehicle forecasting. In the maritime domain, attention-based LSTM architectures have proven effective in capturing long-term dependencies [7,11,15,21]. Transformer-based models [12], CNN-GRU hybrids [27], and spatiotemporal graph networks [14] further show how attention helps improve predictive accuracy.
Despite the growing popularity of attention mechanisms, some researchers prefer simpler, more computationally efficient models that omit them [8,10,13,18]. These models offer advantages in terms of speed but often struggle to identify the most informative features in high-dimensional settings.
The experimental results demonstrate that the proposed model outperforms recent studies in terms of distance error, expressed here in nautical miles. Alam et al. [10] reported distance errors of approximately 370 m, 742 m, and 1.2 km for forecast horizons of 10, 20, and 30 min, respectively, corresponding to roughly 0.2, 0.4, and 0.65 nautical miles. Slaughter et al. [11] achieved a mean error of 0.88 km (0.475 nautical miles). Li et al. [12] reported a best position error of 0.0006 degrees, corresponding to approximately 0.036 nautical miles (latitude) or 0.028 nautical miles (longitude) around Zhoushan. Zhang et al. [14] reported an average of 0.8 nautical miles over ten sample routes, while Wang et al. [15] achieved 508 m (0.274 nautical miles) for a 1 min horizon. You et al. [16] reported an RMSE of 0.00386 degrees in an inland scenario.
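The unit conversions behind these figures can be reproduced with a short Python sketch. The function names are hypothetical, and the degree conversions rely on the standard approximation that one arc-minute of latitude equals one nautical mile; the longitude conversion additionally depends on a reference latitude, which the text does not state, so no longitude value is asserted here.

```python
import math

METERS_PER_NM = 1852.0  # exact, by international definition of the nautical mile

def meters_to_nm(meters):
    """Convert a distance in meters to nautical miles."""
    return meters / METERS_PER_NM

def lat_degrees_to_nm(delta_deg):
    """Approximate a latitude difference in degrees as nautical miles (1 arc-minute ~ 1 nm)."""
    return delta_deg * 60.0

def lon_degrees_to_nm(delta_deg, reference_lat_deg):
    """Approximate a longitude difference as nautical miles at a given latitude.

    Meridians converge toward the poles, so the east-west distance shrinks
    by cos(latitude). The reference latitude is an input assumed by the caller.
    """
    return delta_deg * 60.0 * math.cos(math.radians(reference_lat_deg))

# Values quoted in the comparison above, converted to nautical miles.
for label, meters in [("370 m", 370), ("742 m", 742), ("1.2 km", 1200),
                      ("0.88 km", 880), ("508 m", 508)]:
    print(f"{label:>8} = {meters_to_nm(meters):.3f} nm")
print(f"0.0006 deg latitude = {lat_degrees_to_nm(0.0006):.3f} nm")
```

Errors reported only in degrees without a stated direction or latitude, such as an RMSE in degrees, cannot be converted unambiguously and are better compared with that caveat in mind.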
In comparison, our proposed model achieved significantly lower mean distance errors, ranging between 0.017 and 0.042 nautical miles across all tested forecast horizons and input feature combinations (see Table 6). These results highlight the effectiveness of our attention-based architecture, particularly in capturing the latent dynamics conveyed by SOG and COG, which have often been underutilized in the literature despite their recognized potential.
While acknowledging the inherent variability across studies—stemming from differences in forecast horizons, regional focus, input features, and error metrics—we took careful measures to ensure a fair and context-aware comparison. To minimize inconsistencies, all reported errors from the literature were converted to nautical miles, providing a consistent basis for evaluation. Moreover, rather than merely aggregating the overall results, our approach includes hypothesis-driven evaluations that examine the influence of different input configurations, including SOG and COG, under controlled experimental settings. This design choice strengthens the validity of comparisons and offers a deeper understanding of the proposed model’s performance in relation to prior works.