Article

MART: Ship Trajectory Prediction Model Based on Multi-Dimensional Attribute Association of Trajectory Points

1 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
2 School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2025, 14(9), 345; https://doi.org/10.3390/ijgi14090345
Submission received: 31 July 2025 / Revised: 30 August 2025 / Accepted: 5 September 2025 / Published: 7 September 2025
(This article belongs to the Topic Artificial Intelligence Models, Tools and Applications)

Abstract

Ship trajectory prediction plays an important role in numerous maritime applications and services. With the development of deep learning, prediction methods based on Automatic Identification System (AIS) data have become one of the hot topics in maritime traffic research. However, current models typically concatenate dynamic information with distinct meanings (such as position, ship speed, and heading) into a single integrated input when processing trajectory points, which makes it difficult for the models to grasp both the correlations between the different types of dynamic information and the specific information contained in each type itself. To address this insufficient modeling of the relationships among dynamic information in ship trajectory prediction, we propose the Multi-dimensional Attribute Relationship Transformer (MART) model. The model introduces a simulated-trajectory training strategy that yields an Association Loss (AssLoss) for learning the associations among different types of dynamic information, and it uses a Distance Loss (DisLoss) to integrate the relative distance information of the attribute embeddings, helping the model understand the relationships among different values within each type of dynamic information. We test the model on two AIS datasets, and the experiments show that it outperforms existing models. In the 15 h long-term prediction task, MART improves prediction accuracy by 9.5% on the Danish Waters Dataset and by 15.4% on the Northern European Dataset compared with the best competing model. This study reveals the importance of the relationships between attributes and the relative distances of attribute values in spatiotemporal sequence modeling.

1. Introduction

In the past few decades, maritime situational awareness and maritime surveillance have gradually become research hotspots, with the core objective of using multi-source data to achieve the fullest possible perception of maritime activities. Maritime safety is a particularly important field within this area. Various types of data are related to maritime safety, and the most important is the data provided by the Automatic Identification System (AIS) [1]. With the promotion and popularization of AIS equipment by the International Maritime Organization (IMO), the number of ships equipped with AIS has been continuously increasing, accumulating a large amount of AIS data. These data contain key information such as longitude (lon), latitude (lat), speed over ground (sog), course over ground (cog), Maritime Mobile Service Identity (MMSI), ship type, and motion state. Mining and utilizing this information provide data support for fields such as ship collision avoidance and anomaly detection. Among these uses, ship trajectory prediction is an important way of exploiting AIS data.
Ship trajectory prediction refers to using machine learning, deep learning, or other related technologies to predict the future trajectory of a ship based on historical trajectory data and environmental information [2]. Accurate ship trajectory prediction can be used for maritime traffic management [3], including tasks such as route planning [4,5,6], destination and arrival time prediction [7], maritime search and rescue operations [8,9], and maritime abnormal traffic detection [10,11,12], providing technical support for intelligent maritime supervision systems [13]. Mainstream ship trajectory prediction methods fall into two broad types. The first type relies on a ship motion model as prior knowledge, combined with filtering or clustering methods and subsequent sampling to obtain the final posterior prediction. The simplest and most common motion model is the Nearly Constant Velocity (NCV) model [14], but it does not account for complex situations. Later, the Kernel Density Estimation (KDE) method [15] was proposed for trajectory prediction; by marking the source trajectory and time in the historical data, it can predict multiple possible paths. Based on knowledge of historical trajectories, Mazzarella et al. [16] proposed a Particle Filter (PF) method: the trajectory to be predicted is matched against historical trajectories, the NCV model is used if matching fails, and the PF is used for subsequent prediction if it succeeds. Rong et al. [17] proposed a Gaussian process model that decomposes ship motion into horizontal and vertical directions, calculates the position probabilities in these two directions, and updates the parameters of the Gaussian model from historical ship trajectories. However, the limitations of these methods are quite obvious. Their predictions are based on a trajectory motion model, which cannot explain the existence of turning points, and the prediction time grows as the amount of data increases. In addition, filtering methods are prone to error propagation and are not suitable for long-term prediction. The second type comprises the neural network-based methods that have emerged in recent years. With the extensive application of neural network models across various fields and their achievement of state-of-the-art results, ship trajectory prediction has shifted from kinematic models to neural network and deep learning models. Given the sequential nature of ship trajectories, the structure of the Recurrent Neural Network (RNN) is well suited to learning these features, and its variant, the Long Short-Term Memory (LSTM) model, emerged to address the long-term dependency problem in RNNs. Ma et al. [18] used RNN and LSTM models for multi-trajectory prediction, and the experimental results showed that the LSTM model performed better. Park et al. [19] inferred ship intentions through trajectory prediction to prevent collisions; they used a Bi-LSTM model to learn trajectory characteristics from noisy trajectory points and achieved more accurate predictions than LSTM and GRU models. Owing to the success of the seq2seq approach in machine translation, some scholars have also applied it to trajectory prediction. Forti et al. [20] used LSTM as both encoder and decoder for long-term trajectory prediction and achieved better results than the Ornstein–Uhlenbeck stochastic process method. Capobianco et al. [21] instead selected a Bi-LSTM encoder and added an attention mechanism to address long-distance dependencies, improving prediction accuracy. The Transformer model is another outstanding contribution to deep learning. Its attention mechanism attends to every token of the input sequence simultaneously and produces outputs in parallel, which enables the Transformer to learn long-term dependencies in the data while accelerating training, and it has achieved state-of-the-art results in multiple fields. For example, Nguyen et al. [22] adapted the Transformer into TrAISformer: they gridded the area, embedded the input attributes and concatenated them as the model input, and obtained the predicted trajectory through probability sampling, which alleviated the multi-modal and heterogeneous problems in ship trajectories to a certain extent. The multi-modal problem of long-term prediction is illustrated in Figure 1: starting from a known trajectory, there are multiple possible future headings for the ship. In the figure, the blue trajectory represents the real trajectory, while the yellow and red dashed lines stand for other possible navigation routes. In this paper, we attempt to address the multi-modal problem through a probabilistic sampling method.
In ship trajectory prediction, due to difficulties in data collection, most research focuses mainly on the dynamic information of ships, such as position, speed, and course. A small number of studies combine ship dynamic information with other environmental factors, such as nearby ships, for prediction; however, this paper focuses on the application of ship dynamic information, so no comparison is made here. Most ship trajectory prediction methods likewise use these four types of dynamic information (lat, lon, sog, and cog) as model input in their experiments [23,24]. To handle the multi-modal problem, many studies [18,19,20] attempt to merge trajectories with similar features into one category through clustering or similar methods, helping the model achieve better predictions in the form of labels and the like. However, this approach is difficult to apply to vast areas with complex trajectories, whereas Nguyen et al. [22] handle the multi-modal problem relatively well by sampling over embedded vectors. Yet because the dynamic attributes are only embedded separately and then concatenated as the input, the model in effect interprets speed, course, and position as one combined feature and is unable to understand the relationships between the attributes. As a result, when encountering unfamiliar trajectories, the model cannot reason from the information of the trajectory itself but instead matches it against the training data, leading to serious prediction errors. In addition, the Transformer was originally designed for NLP tasks: when text is transformed through embedding, the embedding encodings of vocabulary items carry, in theory, no inherent relationships between words. However, when the position, sog, and cog attributes of a trajectory are transformed into embedding encodings, the relative distances between embedding encodings actually correspond to the relative distances between the underlying attribute values, which deserves attention when applying embedding to trajectories.
In order to address the above challenges, we propose the Multi-dimensional Attribute Relationship Transformer (MART) model. Our main contribution lies in the improvement of the loss function. We propose two loss calculation methods, namely Association Loss (AssLoss) and Distance Loss (DisLoss), which, respectively, enhance the model’s understanding of the relationships between different attributes and the relationships among different values within attributes. Existing methods like TrAISformer treat trajectory attributes as separately embedded inputs that are then concatenated, which means the model fails to explicitly learn the inherent physical relationships between them (e.g., how speed and course dictate future position). This limitation leads to significant prediction errors, particularly for simple, straight-line trajectories in sparsely represented areas, where the model cannot rely on memorized patterns from the training data. Our proposed AssLoss directly addresses this by training the model on simulated trajectories, forcing it to learn this physical correlation to minimize loss. Furthermore, standard embedding approaches do not enforce the fact that adjacent physical locations have correspondingly close vector representations, especially in areas with sparse data, leading to poor generalization. The proposed DisLoss overcomes this by modifying the loss function to reward predictions that are close to the true value in the embedding space, thereby creating a more semantically meaningful and continuous vector space that better reflects real-world distances.

2. Methodology

2.1. Problem Statement

In current ship trajectory prediction models, the mainstream approach is to improve the model to better memorize the trajectory shape. However, this is often limited by the trajectory distribution in the training set. We have selected the test trajectories shown in Figure 2 for demonstration. The test trajectories are drawn with blue lines in the figure and their names are indicated with red letters. Both trajectories are located in the marginal areas of the training range and have simple trajectory shapes.
We use different points of the test trajectory as starting points and let the model predict the subsequent trajectories; the portion of each trajectory before the starting point is input into the model. The prediction results are shown in Figure 3. The green line in the figure represents the predicted trajectory of TrAISformer, the solid red line represents the input trajectory of the model, and the dashed blue line represents the real trajectory to be predicted.
As can be seen from the figure, the sog and cog of this trajectory are highly consistent with the real situation of the trajectory. Therefore, the positions at future moments calculated according to the trajectory sog and cog always coincide with the real positions of the trajectory. However, when the model makes predictions, the predicted points at the next moment of the input points always deviate greatly from the real positions.
Moreover, in the embedding layer of the model, due to the insufficient distribution of training data in some areas, the embedding vectors corresponding to these areas are not well trained. This is clearly shown in Figure 4. The y-values in the figure are the cosine similarities computed, within the latitude vector space, between latitude vectors that are adjacent in the real world. For the former area, where there is a sufficient amount of data within the training scope, the vectors corresponding to adjacent latitudes have relatively high cosine similarities. However, for the latter area, because some positions contain little trajectory data, the cosine similarities of some adjacent latitude embedding vectors are close to 0. This is also why the model performs poorly when predicting in areas with sparse trajectories: the model does not understand the actual meanings of the position vectors corresponding to these areas.

2.2. Model Structure

In order to achieve better prediction of ship trajectories, since the Transformer can fully learn the long-term dependencies among sequential data and enable efficient parallel computing, we use it as the basic architecture of the model to construct a model based on the association of multi-dimensional attributes of trajectories. The structural diagram of this model is shown in Figure 5.
The embedding layer maps the attribute information into vector space: the four attributes of each input trajectory point are mapped to their corresponding vector spaces through their respective embedding layers and concatenated to obtain the vector representation of the trajectory point.
After the real trajectory and the simulated trajectory are mapped to the vector space through embedding, positional encoding is added. The resulting vectors containing positional information are input into the Blocks layer, which consists of masked multi-head self-attention and Feed Forward Network (FFN) layers. For the input sequence vector $x$, the first step is to transform it into $Q$, $K$, and $V$ through three transformation matrices:
$$Q = x W^{Q}, \quad K = x W^{K}, \quad V = x W^{V}$$
Then, the obtained Q ,   K ,   V matrices will enter the attention layer, and the attention calculation formula is as follows:
$$Z = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$
In the formula, $d_k$ is the vector dimension and $Z$ is the attention output. After the matrix product of $Q$ and $K$ is computed and scaled, the softmax converts the result into weights that sum to 1. Finally, after passing through the Blocks composed of multiple attention layers and fully connected layers, the probabilities of the predicted trajectory points are obtained through the final linear layer and softmax.
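For concreteness, the following is a minimal PyTorch-style sketch of the masked scaled dot-product attention described above; it is not the authors' implementation, and the tensor names and shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(x, w_q, w_k, w_v, mask=None):
    """Single-head masked self-attention over an embedded trajectory sequence.

    x:           (batch, seq_len, d_model) embedded trajectory points
    w_q/w_k/w_v: (d_model, d_k) projection matrices
    mask:        optional (seq_len, seq_len) causal mask with -inf above the diagonal
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # Q = xW^Q, K = xW^K, V = xW^V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5    # QK^T / sqrt(d_k)
    if mask is not None:
        scores = scores + mask                       # prevent attending to future points
    weights = F.softmax(scores, dim=-1)              # rows sum to 1
    return weights @ v                               # Z
```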
For the predicted probabilities of the output, we re-cut them into the probabilities of the predicted points of the simulated trajectories and the probabilities of the predicted points of the real trajectories. For the part of the simulated trajectories, we only calculate the loss values of the latitude and longitude to obtain the AssLoss. For the part of the real trajectories, we calculate the sum of the loss values of the four attributes to obtain the DisLoss. By summing these two types of losses, we obtain the final loss of the model, and the calculation formula is as follows:
$$\mathrm{Loss} = \mathrm{DisLoss} + \mathrm{AssLoss}$$

2.3. Distance Loss

This trajectory prediction model and mainstream text generation models both use the Transformer structure, but the two take completely different types of input data. When the vocabulary obtained after word segmentation of text and the longitude and latitude of a trajectory are projected through embedding, the embedding encodings of the former carry no inherent relationships between words, whereas those of the latter exhibit a fairly obvious correlation. The specific situation is shown in Figure 6.
Therefore, the characteristics of the embedding encodings of the attributes of the trajectory points can be applied to the loss function. The most commonly used loss function in the mainstream text generation models is the cross-entropy function, which is Equation (4), as follows:
$$\mathrm{CrossEntropy} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} p_{i,k}\,\log \hat{p}_{i,k}$$
In this formula, $N$ denotes the number of samples, $K$ denotes the number of categories, $p_{i,k}$ represents the true probability distribution, and $\hat{p}_{i,k}$ represents the predicted probability distribution. In text generation, one-hot vectors are used as the category distribution mainly because of the mutual exclusivity between words in the vocabulary. However, the output vocabulary distribution learned by the model is not always a unimodal distribution like a one-hot vector; it may instead assign probability to multiple plausible alternative words. Such a weight distribution cannot be predefined by us and must be learned by the model itself. In trajectory prediction, Figure 7 illustrates possible prediction scenarios. The black line represents the input trajectory, the green lines represent possible predicted results, and the yellow points represent the true trajectory points. Among the multiple predicted results in the figure, the best prediction is naturally the true result, followed by the closer purple position, and the worst is the farthest red position. Therefore, the closer a predicted result is to the true result, the smaller its loss should be. Thus, in the cross-entropy function, predicted classes that are closer to the true class should receive a higher target probability. We can achieve this goal by modifying the true class distribution.
For the new true category probability distribution, we adopt a normal distribution to obtain the weights corresponding to each category. Figure 8 shows this probability distribution. In this figure, the weight corresponding to the true embedding encodings is 0.8, the weights corresponding to two adjacent embedding encodings are 0.1 each, and the weights of the remaining parts are still close to 0. The reason for this distribution is that the current grid cells are approximately 1 km in both length and width. Including additional adjacent areas would result in excessive prediction errors. By modifying the true category distribution in this way, the model can consider nearby points of the true position as potential alternatives during the prediction process, thereby enhancing the model’s generalization ability.
We refer to the improved loss function as DisLoss, and its calculation formula is as follows:
$$
\begin{aligned}
GD(i_{\mathrm{target}}, \sigma = 0.5) &= \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x - i_{\mathrm{target}})^{2}}{2\sigma^{2}}\right) \\
h_{\mathrm{lat}}^{\mathrm{True}}, h_{\mathrm{lon}}^{\mathrm{True}}, h_{\mathrm{sog}}^{\mathrm{True}}, h_{\mathrm{cog}}^{\mathrm{True}} &= \mathrm{split}(h^{\mathrm{True}}) \\
l_{\mathrm{lat}}^{\mathrm{True}}, l_{\mathrm{lon}}^{\mathrm{True}}, l_{\mathrm{sog}}^{\mathrm{True}}, l_{\mathrm{cog}}^{\mathrm{True}} &= \mathrm{split}(l^{\mathrm{True}}) \\
\mathrm{DisLoss} &= \mathrm{CE}\big(l_{\mathrm{lat}}^{\mathrm{True}}, GD(h_{\mathrm{lat}}^{\mathrm{True}})\big) + \mathrm{CE}\big(l_{\mathrm{lon}}^{\mathrm{True}}, GD(h_{\mathrm{lon}}^{\mathrm{True}})\big) + \mathrm{CE}\big(l_{\mathrm{sog}}^{\mathrm{True}}, GD(h_{\mathrm{sog}}^{\mathrm{True}})\big) + \mathrm{CE}\big(l_{\mathrm{cog}}^{\mathrm{True}}, GD(h_{\mathrm{cog}}^{\mathrm{True}})\big)
\end{aligned}
$$
In the formula, $GD$ denotes the Gaussian distribution used to generate the target weights. $\sigma$ and $i_{\mathrm{target}}$ are the standard deviation and the mean of the Gaussian distribution, respectively; the former is a given value (0.5 in this paper), and the latter is the embedding encoding index corresponding to the true value. $h$ and $l$ are the true probability distribution and the predicted probability distribution, respectively, "True" refers to the real trajectory, and CE denotes the cross-entropy loss function.
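The following is a minimal PyTorch sketch of how the Gaussian-smoothed target distribution and DisLoss above could be computed; the function names, bin counts, and tensor layout are illustrative assumptions rather than the authors' code. With $\sigma = 0.5$ and normalization, the center bin receives roughly 0.8 of the mass and each neighboring bin roughly 0.1, matching Figure 8.

```python
import math
import torch

def gaussian_targets(true_idx, num_bins, sigma=0.5):
    """Soft target distribution GD(i_target, sigma) over discretized bins.

    true_idx: (batch,) integer bin indices of the ground-truth values.
    Returns:  (batch, num_bins) rows shaped like a Gaussian and normalized to sum to 1.
    """
    bins = torch.arange(num_bins, dtype=torch.float32)            # bin index axis x
    diff = bins.unsqueeze(0) - true_idx.unsqueeze(1).float()      # x - i_target
    weights = torch.exp(-diff ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
    return weights / weights.sum(dim=1, keepdim=True)

def soft_cross_entropy(logits, soft_targets):
    """Cross entropy against a soft (non one-hot) target distribution."""
    log_probs = torch.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

def dis_loss(logits_by_attr, true_idx_by_attr, sigma=0.5):
    """Sum of soft cross entropies over the four attributes (lat, lon, sog, cog)."""
    total = 0.0
    for attr, logits in logits_by_attr.items():
        targets = gaussian_targets(true_idx_by_attr[attr], logits.size(-1), sigma)
        total = total + soft_cross_entropy(logits, targets)
    return total
```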

2.4. Association Loss

In the model prediction, the four attributes lat, lon, sog, and cog are selected as inputs mainly because the two position attributes represent the position information of the ship that we need to predict, while sog and cog represent the motion state of the ship. For the simplest NCV model, the position at the next moment can be predicted from the current position, speed, and time. Because the original trajectory is interpolated at equal time intervals, the time information is already contained in the trajectory data as hidden information. What we therefore need is for the model to learn the relationships among these four attributes, so that it can predict the position at the next moment and the corresponding motion state according to the likely target of the ship. However, when we applied the trained ship trajectory prediction model in testing, we found that the model does not always understand the relationship between sog, cog, and position: it treats the four attributes only as one complex feature and ignores the internal connections among them.
Therefore, we add simulated random trajectory data to the training data to help the model understand the impact of heading and speed information on the ship’s navigation, as shown in Figure 5. Both the simulated trajectory and the real trajectory go through the same model structure. However, unlike for the real trajectory, for the simulated trajectory, there is no need to predict the sog and cog. Only the prediction loss of longitude and latitude needs to be calculated to enable the model to understand the relationships among the four attributes. The loss used to assist the model in understanding the correlations of the attributes is called AssLoss, and its calculation formula is as follows:
$$
\begin{aligned}
h_{\mathrm{lat}}^{\mathrm{Sim}}, h_{\mathrm{lon}}^{\mathrm{Sim}} &= \mathrm{split}(h^{\mathrm{Sim}}), \qquad l_{\mathrm{lat}}^{\mathrm{Sim}}, l_{\mathrm{lon}}^{\mathrm{Sim}} = \mathrm{split}(l^{\mathrm{Sim}}) \\
\mathrm{AssLoss} &= \mathrm{CE}\big(l_{\mathrm{lat}}^{\mathrm{Sim}}, h_{\mathrm{lat}}^{\mathrm{Sim}}\big) + \mathrm{CE}\big(l_{\mathrm{lon}}^{\mathrm{Sim}}, h_{\mathrm{lon}}^{\mathrm{Sim}}\big)
\end{aligned}
$$
In the formula, $h$ and $l$ have the same meanings as above, and "Sim" denotes the simulated trajectory. The simulated trajectories are generated as described below and summarized in Algorithm 1 (pseudocode); a runnable sketch follows Algorithm 1. We set the range for generating simulated trajectory points, the maximum speed, the maximum heading, and the length of a single trajectory. The steps are as follows:
  • Randomly initialize the position, speed, and heading of the starting trajectory point;
  • Calculate the position of the next trajectory point at the next moment based on the speed and heading, while randomly generating its new speed and heading;
  • Repeat step 2 until the generated trajectory reaches the set length.
Algorithm 1: Generate_Simulated_Traj(ROI, sog_max, cog_max, T_len)
 Description: Generate a simulated trajectory Traj_sim.
 Input: the boundary of the area ROI, the maximum sog sog_max = 30,
    the maximum cog cog_max = 360, the trajectory length T_len.
 Output: Traj_sim.
 // Generate the origin position
 lat_0, lon_0 = random_point()
 // Generate the other points
 for i in 0 : T_len − 1 do
    // Randomly generate the speed and course
    sog_i, cog_i = random_motion()
    lat_{i+1}, lon_{i+1} = cal_point(lat_i, lon_i, sog_i, cog_i)
    P_i = (lat_i, lon_i, sog_i, cog_i)
 end
 Return Traj_sim = P_{0 : T_len−1}
The parameters for the simulated trajectory generation were chosen to reflect realistic maritime conditions. A maximum speed over ground (sogmax) of 30 knots is a reasonable upper limit for the cargo ships and tankers present in our datasets. It is also the maximum value of sog in the preprocessed dataset.
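The following is a runnable Python sketch of Algorithm 1, assuming a simple flat-earth dead-reckoning update over a fixed time step; the step length, the position-update formula, and the function name are illustrative assumptions, not the authors' exact implementation.

```python
import math
import random

def generate_simulated_traj(roi, sog_max=30.0, cog_max=360.0, t_len=20, dt_hours=1.0):
    """Generate one simulated trajectory of t_len points inside roi.

    roi: (lat_min, lat_max, lon_min, lon_max) in degrees.
    Returns a list of (lat, lon, sog, cog) tuples.
    """
    lat_min, lat_max, lon_min, lon_max = roi
    lat = random.uniform(lat_min, lat_max)          # random origin position
    lon = random.uniform(lon_min, lon_max)
    traj = []
    for _ in range(t_len):
        sog = random.uniform(0.0, sog_max)          # knots
        cog = random.uniform(0.0, cog_max)          # degrees, 0 = north
        traj.append((lat, lon, sog, cog))
        # Dead-reckon the next position: 1 degree of latitude is about 60 nautical miles.
        dist_nm = sog * dt_hours
        lat += (dist_nm * math.cos(math.radians(cog))) / 60.0
        lon += (dist_nm * math.sin(math.radians(cog))) / (60.0 * math.cos(math.radians(lat)))
    return traj
```

During training, the lat/lon predictions on these simulated points are compared against the dead-reckoned ground truth to compute AssLoss, forcing the model to relate sog and cog to the resulting position change.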
There are several differences between DisLoss and AssLoss, as outlined below:
  • In terms of application: DisLoss is used for real trajectory prediction, while AssLoss is used for simulated trajectory prediction.
  • In terms of weight assignment: The weight distribution of DisLoss adopts a normal distribution, whereas for AssLoss, the weight corresponding to the true value is 1 and the weights for all other values are 0.
  • In terms of purpose: DisLoss aims to teach the model about the relative distances between attribute values, while AssLoss aims to force the model to learn the physical association between motion attributes (SOG, COG) and the resulting position change.

3. Experiment

In this section, we evaluate the MART model on two real AIS datasets and compare the experimental results with those of other models. In addition, we conduct ablation experiments to evaluate the impact of each method on the results. To facilitate reproduction of the proposed model, we have made the code and dataset open-source at https://github.com/sgxzz1/MART (accessed on 1 September 2025).

3.1. Datasets

The dataset information is shown in Table 1. The first dataset covers the Danish area and was open-sourced at https://github.com/CIA-Oceanix/TrAISformer (accessed on 1 September 2025). The other dataset was downloaded from the official website of the Danish Maritime Authority and preprocessed. Both datasets contain AIS data of cargo ships and tankers. For the former dataset, the time range is from 1 January 2019 to 31 March 2019, and the regional range is from (55.5° N, 10.3° E) to (58° N, 13° E). The data from 1 January to 10 March are used as the training set, the data from 11 March to 20 March as the validation set, and the remaining data as the test set. In total, this dataset contains 13,679 ship trajectories. For the latter dataset, the time range is from 1 September 2023 to 29 February 2024, and the regional range is a rectangular area from (51° N, 1° W) to (60° N, 21.2° E). The data are randomly divided into training, validation, and test sets at a ratio of 8:1:1. In total, this dataset contains 78,647 ship trajectories. For convenience, the former dataset is called Area 1 and the latter Area 2. The data preprocessing method follows [22]. The data of the two areas are shown in Figure 9, and the preprocessed trajectory points are shown in Figure 10, where we have selected three trajectories and distinguished them with different colors.

3.2. Model Parameters

In the result display, all models with the Transformer architecture use eight Transformer blocks, and each layer uses eight-head attention. The unit grid size for longitude and latitude is 0.01°, the unit for sog is 1 knot, and the unit for cog is 5°. The embedding lengths $e_{\mathrm{lat}}$, $e_{\mathrm{lon}}$, $e_{\mathrm{sog}}$, and $e_{\mathrm{cog}}$ corresponding to these units are 256, 256, 128, and 128, respectively. The discretization of continuous values inevitably leads to a loss of precision. Within the training area, a 0.01° latitude cell corresponds to a length of approximately 1.11 km, and a 0.01° longitude cell corresponds to approximately 0.56 km to 0.7 km, so the maximum theoretical error caused by precision loss is about 0.65 km. Although information loss exists, this granularity is acceptable for our long-term prediction task. The length of the input historical trajectory is 3 h, and the future trajectory for the following 15 h is predicted. We train our model on a single NVIDIA A10 GPU. Training the model for Area 1 takes about 1 h, while training for Area 2 takes about 9 h. The difference in training time between the two regions has two main causes. First, the number of trajectories in Area 2 is nearly six times that of Area 1; with the increase in data volume, the model also requires more training epochs to converge. Second, since the scope of Area 2 is larger, its latitude and longitude embedding tables are larger, which in turn increases the number of model parameters.
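To make the discretization and embedding concatenation concrete, the following is a small PyTorch-style sketch under the stated grid sizes (0.01°, 1 knot, 5°) and embedding dimensions (256/256/128/128); the module structure, bin arithmetic, and clamping behavior are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TrajPointEmbedding(nn.Module):
    """Discretize (lat, lon, sog, cog) and concatenate per-attribute embeddings."""

    def __init__(self, lat_bins, lon_bins, sog_bins=31, cog_bins=72):
        super().__init__()
        self.emb_lat = nn.Embedding(lat_bins, 256)
        self.emb_lon = nn.Embedding(lon_bins, 256)
        self.emb_sog = nn.Embedding(sog_bins, 128)   # 0-30 knots, 1-knot bins
        self.emb_cog = nn.Embedding(cog_bins, 128)   # [0, 360) degrees, 5-degree bins

    @staticmethod
    def discretize(value, lower, step, num_bins):
        idx = ((value - lower) / step).long()
        return idx.clamp(0, num_bins - 1)            # keep indices inside the grid

    def forward(self, lat, lon, sog, cog, lat0, lon0):
        # lat/lon/sog/cog: float tensors of shape (batch, seq_len)
        i_lat = self.discretize(lat, lat0, 0.01, self.emb_lat.num_embeddings)
        i_lon = self.discretize(lon, lon0, 0.01, self.emb_lon.num_embeddings)
        i_sog = self.discretize(sog, 0.0, 1.0, self.emb_sog.num_embeddings)
        i_cog = self.discretize(cog, 0.0, 5.0, self.emb_cog.num_embeddings)
        # Concatenate to a 768-dimensional trajectory-point representation.
        return torch.cat([self.emb_lat(i_lat), self.emb_lon(i_lon),
                          self.emb_sog(i_sog), self.emb_cog(i_cog)], dim=-1)
```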

3.3. Evaluation Criteria

3.3.1. Haversine Distance

For the prediction error of each step t, due to the high calculation efficiency of the Haversine formula and the small error in short distance calculation, the Haversine distance between the real point and the predicted point is used to represent the prediction error:
$$d_t = 2R \arcsin\!\left(\sqrt{\sin^{2}\!\left(\frac{\phi_2 - \phi_1}{2}\right) + \cos(\phi_1)\cos(\phi_2)\sin^{2}\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)$$
In the formula, R represents the radius of the Earth, and ϕ 1 , ϕ 2 , λ 1 , λ 2 represent the latitudes and longitudes of the predicted point and the actual point, respectively.
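A short Python sketch of the Haversine distance above, taking R as the mean Earth radius (6371 km); this value and the function name are assumptions for illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))
```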

3.3.2. Fréchet Distance

The performance of trajectory prediction can also be evaluated by calculating the similarity between the real trajectory and the predicted trajectory. In this paper, the Fréchet distance is used to compute the distance between two trajectories, and its calculation formula is as follows:
$$\mathrm{Frechet}_t\big(A_t, B_t\big) = \min_{\mu}\ \max_{a \in A_t}\ d\big(a, \mu(a)\big)$$
where μ : A t B t refers to a one-to-one continuous mapping from a point on trajectory A to a point on trajectory B . Here, d denotes the distance calculated between point pairs, and the Haversine distance is used for this purpose.
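Since the trajectories are discrete point sequences, the Fréchet distance is typically evaluated in its discrete form. The following dynamic-programming sketch uses the `haversine_km` function given above as the point distance; treating the continuous Fréchet distance via its discrete counterpart is an assumption about the implementation.

```python
def discrete_frechet_km(traj_a, traj_b):
    """Discrete Fréchet distance (km) between two lists of (lat, lon) points."""
    n, m = len(traj_a), len(traj_b)
    ca = [[-1.0] * m for _ in range(n)]      # memo table of coupling costs

    def c(i, j):
        if ca[i][j] >= 0:
            return ca[i][j]
        d = haversine_km(*traj_a[i], *traj_b[j])
        if i == 0 and j == 0:
            ca[i][j] = d
        elif i == 0:
            ca[i][j] = max(c(0, j - 1), d)
        elif j == 0:
            ca[i][j] = max(c(i - 1, 0), d)
        else:
            ca[i][j] = max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)
        return ca[i][j]

    return c(n - 1, m - 1)
```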
Due to the diversity of ship trajectories, we adopt a multiple-sampling strategy: several different trajectories are generated by sampling, and the best one is selected as the prediction result. The process of multiple sampling is as follows (a minimal sketch is given after the list):
  • Based on the probability distribution output by the model, random sampling is performed to obtain a predicted trajectory point.
  • This newly predicted point is used as input to continue predicting the probability distribution of the next point, and sampling is conducted again.
  • Repeat this process until a complete predicted trajectory is generated.
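The loop below is a minimal sketch of this autoregressive sampling procedure; `predict_probs` and `sample_point` are placeholder callables standing in for the actual model interface, which is an assumption rather than the authors' code.

```python
def sample_trajectories(predict_probs, sample_point, history, num_future_steps, num_candidates=16):
    """Draw several candidate futures by repeated probabilistic sampling.

    predict_probs(traj) -> per-attribute probability distributions for the next point
    sample_point(probs) -> one (lat, lon, sog, cog) tuple sampled from those distributions
    history             -> list of observed (lat, lon, sog, cog) points
    """
    candidates = []
    for _ in range(num_candidates):
        traj = list(history)
        for _ in range(num_future_steps):
            probs = predict_probs(traj)       # step 1: model outputs a distribution
            traj.append(sample_point(probs))  # step 2: sample the next point and feed it back
        candidates.append(traj[len(history):])
    return candidates
```

At evaluation time, the best candidate is the one with the smallest error under the metrics above.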

3.4. Comparative Experiment

We compared MART with different trajectory prediction models, including seq2seq [20], seq2seq_attn [21], and TrAISformer [22]. To ensure the input formats were as consistent as possible, we added embedding layers to all tested models. All models were evaluated in two areas, and the experimental results are presented in Figure 11. In these figures, we use the same symbols to represent the prediction results of the same model in different regions, and the error values are calculated using the evaluation criteria proposed above. Table 2 shows the prediction errors of each model at the fifth, tenth, and fifteenth hours in both areas. In these two areas, the MART model outperformed all other models. When the Haversine distance was used as the evaluation metric, it achieved a 9.5% higher accuracy than the second-best model in the 15 h prediction task for Area 1, and a 15.4% accuracy improvement in the same prediction task for Area 2. In tests across multiple time intervals and different areas, the average error of this model was reduced by approximately 10–15%. When the Fréchet distance was adopted as the evaluation metric, the prediction errors of all models increased further; however, the MART model still remained the optimal one. Specifically, it achieved a 5.5% accuracy improvement in Area 1 and an 11.7% accuracy improvement in Area 2. In tests spanning multiple time intervals and different areas, the average error of this model was reduced by approximately 5–12%. From the above comparison results, it can be seen that the MART model achieved better performance both in the prediction of trajectory points at a single moment and in the fitting of the entire trajectory, which further proves the effectiveness of the AssLoss and DisLoss modules.
Next, we conducted a paired samples t-test to evaluate whether the prediction accuracy of the MART model is statistically significantly better than that of the other comparison models. For each model, an error sequence $E_i = \{d_1, d_2, \ldots, d_t, \ldots, d_T\}$ was obtained, where $d_t$ is the average distance between all predicted positions and the corresponding real positions at time $t$. For two models M1 and M2, the null hypothesis (the accuracy of M1 is not significantly better than that of M2) and the alternative hypothesis (the accuracy of M1 is significantly better than that of M2) are defined as follows:
$$H_0: \mu_{E_i} \ge \mu_{E_j}$$
$$H_1: \mu_{E_i} < \mu_{E_j}$$
The significance level is set to 0.05, and the comparison results of the prediction accuracy of the different models are shown in Table 3 and Table 4. A "YES" indicates that the alternative hypothesis is accepted; otherwise, the null hypothesis is accepted. It can be observed that the prediction accuracy of MART is significantly better than that of the other models.
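A sketch of this test using SciPy; the one-sided alternative corresponds to H1 above, and the error arrays are placeholders for the per-step error sequences of the two models being compared.

```python
from scipy import stats

def compare_models(errors_m1, errors_m2, alpha=0.05):
    """One-sided paired t-test: is M1's mean per-step error significantly lower than M2's?

    errors_m1, errors_m2: per-step mean errors d_1..d_T for the two models (same length T).
    Returns True ("YES" in Tables 3 and 4) if the null hypothesis is rejected at level alpha.
    """
    t_stat, p_value = stats.ttest_rel(errors_m1, errors_m2, alternative="less")
    return p_value < alpha
```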
The improvement in accuracy is primarily attributed to the innovative methods we proposed. For illustrative purposes, we selected representative trajectories from two areas for presentation and explanation, as shown in Figure 12. With DisLoss, we provided the model with prior knowledge of the relative distances between different values within the same attribute. Meanwhile, AssLoss enabled the model to learn the correlations between different attributes. In contrast, the other three models exhibited significant deviations when predicting two straight-line trajectories in the test dataset. This discrepancy arises because they failed to capture the semantic meaning of direction, leading to immediate divergence from the true course starting from the first predicted point. Conversely, MART consistently predicted the initial point after the input trajectory accurately. Although it cannot guarantee perfectly accurate long-term predictions, it maintains the general directionality, ensuring that the predicted trajectories remain close to the ground truth.
In the trajectory prediction for narrow waterways and tortuous routes, the MART model can always fit the trajectory route well and make timely direction changes; and even for some trajectories that cannot be fully fitted, it always moves along the course of the real trajectory. This indicates that the model has an accurate understanding of the trajectory destination. In contrast, other models sometimes fail to identify the destination clearly, which leads to the phenomenon of stagnation or wandering in the predicted trajectories.
Moreover, since the positions corresponding to adjacent embedding encodings are closer in actual space, we examined whether the vectors of adjacent embedding encodings among the position vectors are also closer in the vector space. To this end, we calculated their cosine similarities, as shown in Figure 13. The formula for calculating cosine similarity is as follows:
$$\mathrm{Sim}\big(A^{j}, A^{j+1}\big) = \frac{A^{j} \cdot A^{j+1}}{\lVert A^{j} \rVert \times \lVert A^{j+1} \rVert} = \frac{\sum_{i=1}^{n} A_{i}^{j} A_{i}^{j+1}}{\sqrt{\sum_{i=1}^{n} A_{i}^{j} A_{i}^{j}} \times \sqrt{\sum_{i=1}^{n} A_{i}^{j+1} A_{i}^{j+1}}}$$
In the formula, $A$ represents the set of embedding vectors for longitude or latitude, $A^{j}$ denotes the vector obtained when index $j$ passes through the corresponding attribute embedding layer, and $A_{i}^{j}$ represents the $i$-th element of $A^{j}$. From the formula, the similarity range is (−1, 1), which corresponds to the y-axis in the figure. The value $j$ on the x-axis indicates the cosine similarity computed between the embedding vectors of the $j$-th and $(j+1)$-th indices. Taking panel (a) as an example, since the latitude in Area 1 is divided into 250 segments, there are 250 indices, yielding 249 cosine similarity values between adjacent vectors, so the x-axis range is (0, 248). The x-axes of the other panels follow the same logic.
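A short sketch of how the adjacent-index cosine similarities plotted in Figure 13 could be computed from a trained embedding weight matrix; the attribute name in the usage comment is an assumption.

```python
import torch
import torch.nn.functional as F

def adjacent_cosine_similarities(embedding_weight):
    """Cosine similarity between embedding vectors of consecutive bin indices.

    embedding_weight: (num_bins, dim) weight matrix of a trained nn.Embedding.
    Returns a (num_bins - 1,) tensor; element j compares index j with index j + 1.
    """
    return F.cosine_similarity(embedding_weight[:-1], embedding_weight[1:], dim=1)

# Example usage (assuming a model with a latitude embedding table named emb_lat):
# sims = adjacent_cosine_similarities(model.emb_lat.weight.detach())
```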
From the similarity graph of the longitude and latitude vector space in Area 1, it can be seen that the vectors corresponding to adjacent position embedding encodings always have a high similarity. In Area 2, except for the MART model, other models cannot learn the spatial vectors of these areas sufficiently due to the small number of trajectories in the marginal areas. As a result, the similarity of the adjacent embedding vectors in these areas is low, which is also the reason why the other models perform poorly in predicting the test trajectories as shown in Figure 12. By adding simulated trajectories, we enable MART to fully learn and understand the position vectors of all areas in the space, so the similarity of adjacent vectors is always high.
The performance variation between Area 1 and Area 2 highlights the model’s response to different data distributions. Area 2, while containing more trajectories overall (78,647 vs. 13,679), covers a significantly larger geographical area, resulting in sparser data coverage in many regions, particularly at the periphery as seen in Figure 9. This sparsity is a key challenge for baseline models, as evidenced by the low cosine similarity of adjacent latitude vectors in these regions for TrAISformer (Figure 13c), indicating poorly trained position vectors. In contrast, MART’s use of simulated trajectories for AssLoss ensures that the model learns the underlying physics of movement even in areas with no real training data. This allows it to maintain high cosine similarity across the entire vector space and results in a more pronounced performance improvement in the sparser, more challenging Area 2 (a 15.4% error reduction vs. 9.5% in Area 1).

3.5. Ablation Experiment

To demonstrate the improvement of each method on the prediction effect, we conducted ablation experiments and the results are presented in Figure 14 and Table 5. As can be seen from the table, compared with the original model, when the Haversine distance is used as the evaluation metric, the DisLoss method has improved the prediction effect in both regions. However, when the Fréchet distance is used instead as the evaluation metric, the improvement effect of DisLoss decreases significantly, and in Area 2, the performance after improvement is even worse than before. The AssLoss method shows the opposite trend: in Area 1, when the former (Haversine distance) is used as the evaluation metric, the AssLoss method achieves better prediction performance within 10 h, but its prediction effect is worse than that of the original model after 10 h. Nevertheless, when the evaluation metric is switched to the latter (Fréchet distance), the error variation of AssLoss is relatively small, and it outperforms the original model in both regions. The MART model, which integrates these two improvement methods, combines the advantages of both: it not only achieves accurate single-point prediction but also performs well in fitting the overall trajectory. This result not only proves the effectiveness of these two improvement methods but also strongly indicates that providing the model with the correlation between attributes and the distance information between attribute values can indeed help the model achieve better prediction results.

4. Discussion

From the result graphs and tables of the comparative experiment, it can be seen that the proposed MART model achieved the best performance in both the original Danish region and the expanded larger region. Moreover, as the prediction horizon increases, the improvement in prediction accuracy becomes more pronounced: at 15 h, the improvement over the second-best model is approximately 10% or more, which demonstrates the superiority and stability of the MART model. In addition, when MART uses real AIS data containing complex maritime transportation patterns, its prediction error within 15 h remains below 10 nautical miles (10 nautical miles ≈ 18.52 km). This result indicates that it can assist maritime search and rescue operations under conditions of low visibility at sea. Additionally, in maritime supervision, it can provide early warnings for potential route conflicts; when applied to abnormal behavior detection, a significant deviation of the actual trajectory from the high-confidence predicted trajectory indicates that an abnormal event may have occurred. Since the two improvement methods proposed above do not modify the structure of the Transformer model, they are in theory transferable, and it is feasible to transplant them into other models, such as recurrent neural networks. However, as these methods are based on probability sampling, transplanting them requires gridding the data, converting continuous values into discrete values to facilitate the final sampling generation. Furthermore, the practical utility of the MART model could be further validated through experiments in more challenging scenarios. The current datasets primarily represent routine voyages; testing the model's performance in congested waterways, under adverse weather conditions, or on abnormal trajectories (e.g., collision avoidance maneuvers) would highlight its robustness. This constitutes a significant avenue for future research but would necessitate the curation of specialized datasets that include contextual labels for such conditions.
While we have sought to enable the model to gain a deeper understanding of the intrinsic properties of route data, it still fundamentally relies on memorizing trajectory routes. Specifically, the probability distribution of its predictions essentially mirrors the distribution of routes present in the training set—making it difficult for the model to predict routes unseen during training. Our effort to enable the model to learn the relationship between position and speed is precisely driven by this limitation. Although AssLoss enhances the model’s predictive capability in data-sparse regions by learning basic motion laws, its predictive performance remains limited in entirely new regions where “no ships have ever passed through.” This is essentially an Out-of-Distribution Generalization (OOD Generalization) problem, which is not only one of the limitations of the current model but also a direction worthy of further research in the future. If it is to be used for more complex trajectory prediction, we need to consider more factors. For example, regarding the AIS data, the datasets from the two regions we currently use are of high quality with little noise, so the data processing methods are relatively simple, requiring only deduplication and removal of obvious outliers. However, when dealing with datasets significantly affected by noise, we need to adopt more complex denoising methods. Since different denoising methods yield varying results in dataset processing [25], this will also require us to conduct comparisons and make decisions in the future. In addition, there are maritime environmental factors. Similar to land traffic prediction, where factors such as the number of lanes and lane-changing directions affect the lane-changing time of vehicles [26], factors like nearby vessels of the target ship and climate conditions may also impact the navigation of the ship. For the impact of weather factors on ship trajectories, we will attempt to collect spatiotemporal information of the region, obtain the corresponding weather conditions based on the location and timestamp information of trajectory points, and then discretize different weather conditions to input them into the model as information and connect them with other information through embedding. Alternatively, we can train different models for different weather conditions. Regardless of the method used, this is highly challenging. For the nearby vessels around the target vessel, current mainstream methods adopt Graph Neural Network (GNN) approaches to extract relevant information [27,28,29]. They treat the target ship and all its surrounding neighboring ships as an entire spatial graph, where each ship serves as a node in the graph. The influence between ships is represented by the distance between them, which enables the models to exhibit stronger robustness in areas with complex maritime traffic. This also constitutes the research direction of our future work; going forward, we will conduct in-depth research on how to integrate the influence of neighboring ships as parameter information with the MART model.

5. Conclusions and Future Work

In this study, we established associations between the four attributes of the trajectory input and the embedding encodings, and provided this information to the model by improving the calculation of the loss function. Meanwhile, we incorporated simulated trajectories into the training to enable the model to understand the associations between the input attributes. Through the improvement of the above methods, we achieved enhancements to the original model and obtained optimal prediction results. By comparing the results of the ablation experiment, we verified the significance of each component of the method in improving the model. Beyond proposing the two methods, what is more important is that our work validates a new research insight: by cleverly designing training tasks and loss functions, physical prior knowledge and the inherent topological relationships of data can be integrated into deep learning models. This enables the model to shift from simply relying on “trajectory pattern memorization” to “understanding motion laws,” which is of great significance for enhancing the model’s generalization ability in data-sparse regions.
In future research, we will further improve the structure of these two modules. For the DisLoss module, the weights are currently assigned with manually set parameters; in subsequent work, we will attempt to let the model learn the weight distribution on its own, so that it can attend to multiple possible routes and handle more complex route-related problems. For the AssLoss module, we will integrate a ship dynamics model to simulate the speed decay that occurs when a ship is turning, which will make the simulated data more realistic and thereby strengthen the ability of AssLoss to help the model learn the deep correlations between attributes. In this paper, we tried to address the multi-modal issue through a probabilistic sampling method; however, the results of probabilistic sampling are closely tied to the distribution of the training set, and it is difficult to model potential trajectories. To address this problem, we plan to proceed from two aspects: the dataset and the model. For the dataset, we will adopt a clustering method to eliminate duplicate trajectories. For the model, we will attempt to combine it with Mixture Density Networks (MDNs) to model potential navigation trajectories and thus handle the multi-modal issue more effectively. Since maritime navigation in complex environments involves factors beyond the target ship itself that our current work has not yet accounted for, we will attempt to integrate these factors with our model to make its predictions more practically meaningful. Meanwhile, we will also seek suitable denoising methods to address potential noise in the source data.

Author Contributions

Conceptualization, methodology, investigation, and writing—original draft, Senyang Zhao; data curation and writing—review and editing, Wei Guo and Yi Liu. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are available within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

AIS  Automatic Identification System
MART  Multi-dimensional Attribute Relationship Transformer
AssLoss  Association Loss
DisLoss  Distance Loss
IMO  International Maritime Organization
lat  Latitude
lon  Longitude
sog  Speed over ground
cog  Course over ground
MMSI  Maritime Mobile Service Identity
NCV  Nearly Constant Velocity
KDE  Kernel Density Estimation
RNN  Recurrent Neural Network
LSTM  Long Short-Term Memory
CE  Cross Entropy Loss
FFN  Feed Forward Network
GNN  Graph Neural Network
OOD Generalization  Out-of-Distribution Generalization
MDNs  Mixture Density Networks

References

1. Chen, C.-H.; Khoo, L.P.; Chong, Y.T.; Yin, X.F. Knowledge Discovery Using Genetic Algorithm for Maritime Situational Awareness. Expert Syst. Appl. 2014, 41, 2742–2753.
2. Tu, E.; Zhang, G.; Mao, S.; Rachmawati, L.; Huang, G.-B. Modeling Historical AIS Data for Vessel Path Prediction: A Comprehensive Treatment. arXiv 2020, arXiv:2001.01592.
3. Liu, C.; Li, Y.; Jiang, R.; Du, Y.; Lu, Q.; Guo, Z. TPR-DTVN: A Routing Algorithm in Delay Tolerant Vessel Network Based on Long-Term Trajectory Prediction. Wirel. Commun. Mob. Comput. 2021, 2021, 6630265.
4. Nguyen, X.-P.; Dang, X.-K.; Do, V.-D.; Corchado, J.M.; Truong, H.-N. Robust Adaptive Fuzzy-Free Fault-Tolerant Path Planning Control for a Semi-Submersible Platform Dynamic Positioning System with Actuator Constraints. IEEE Trans. Intell. Transp. Syst. 2023, 24, 12701–12715.
5. Bi, J.; Cheng, H.; Zhang, W.; Bao, K.; Wang, P. Artificial Intelligence in Ship Trajectory Prediction. J. Mar. Sci. Eng. 2024, 12, 769.
6. Dang, X.K.; Truong, H.N.; Do, V.D. A Path Planning Control for a Vessel Dynamic Positioning System Based on Robust Adaptive Fuzzy Strategy. Automatika 2022, 63, 580–592.
7. Guan, M.; Cao, Y.; Cheng, X. Research of AIS Data-Driven Ship Arrival Time at Anchorage Prediction. IEEE Sens. J. 2024, 24, 12740–12746.
8. Ou, Z.; Zhu, J. AIS Database Powered by GIS Technology for Maritime Safety and Security. J. Navig. 2008, 61, 655–665.
9. Varlamis, I.; Tserpes, K.; Sardianos, C. Detecting Search and Rescue Missions from AIS Data. In Proceedings of the 2018 IEEE 34th International Conference on Data Engineering Workshops (ICDEW), Paris, France, 16–20 April 2018; pp. 60–65.
10. Venskus, J.; Treigys, P.; Markevičiūtė, J. Unsupervised Marine Vessel Trajectory Prediction Using LSTM Network and Wild Bootstrapping Techniques. Nonlinear Anal. Model. Control 2021, 26, 718–737.
11. Olesen, K.V.; Boubekki, A.; Kampffmeyer, M.C.; Jenssen, R.; Christensen, A.N.; Hørlück, S.; Clemmensen, L.H. A Contextually Supported Abnormality Detector for Maritime Trajectories. J. Mar. Sci. Eng. 2023, 11, 2085.
12. Jurkus, R.; Venskus, J.; Markevičiūtė, J.; Treigys, P. Enhancing Maritime Safety: Estimating Collision Probabilities with Trajectory Prediction Boundaries Using Deep Learning Models. Sensors 2025, 25, 1365.
13. Zhang, X.; Fu, X.; Xiao, Z.; Xu, H.; Qin, Z. Vessel Trajectory Prediction in Maritime Transportation: Current Approaches and Beyond. IEEE Trans. Intell. Transp. Syst. 2022, 23, 19980–19998.
14. Rong Li, X.; Jilkov, V.P. Survey of Maneuvering Target Tracking. Part I. Dynamic Models. IEEE Trans. Aerosp. Electron. Syst. 2003, 39, 1333–1364.
15. Ristic, B.; La Scala, B.; Morelande, M.; Gordon, N. Statistical Analysis of Motion Patterns in AIS Data: Anomaly Detection and Motion Prediction. In Proceedings of the 2008 11th International Conference on Information Fusion, Cologne, Germany, 30 June–3 July 2008.
16. Mazzarella, F.; Arguedas, V.F.; Vespe, M. Knowledge-Based Vessel Position Prediction Using Historical AIS Data. In Proceedings of the 2015 Sensor Data Fusion: Trends, Solutions, Applications (SDF), Bonn, Germany, 6–8 October 2015; pp. 1–6.
17. Rong, H.; Teixeira, A.P.; Guedes Soares, C. Ship Trajectory Uncertainty Prediction Based on a Gaussian Process Model. Ocean Eng. 2019, 182, 499–511.
18. Ma, H.; Zuo, Y.; Li, T. Vessel Navigation Behavior Analysis and Multiple-Trajectory Prediction Model Based on AIS Data. J. Adv. Transp. 2022, 2022, 6622862.
19. Park, J.; Jeong, J.; Park, Y. Ship Trajectory Prediction Based on Bi-LSTM Using Spectral-Clustered AIS Data. J. Mar. Sci. Eng. 2021, 9, 1037.
20. Forti, N.; Millefiori, L.M.; Braca, P.; Willett, P. Prediction of Vessel Trajectories from AIS Data via Sequence-to-Sequence Recurrent Neural Networks. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 8936–8940.
21. Capobianco, S.; Millefiori, L.M.; Forti, N.; Braca, P.; Willett, P. Deep Learning Methods for Vessel Trajectory Prediction Based on Recurrent Neural Networks. IEEE Trans. Aerosp. Electron. Syst. 2021, 57, 4329–4346.
22. Nguyen, D.; Fablet, R. A Transformer Network with Sparse Augmented Data Representation and Cross Entropy Loss for AIS-Based Vessel Trajectory Prediction. IEEE Access 2024, 12, 21596–21609.
23. Li, H.; Lam, J.S.L.; Yang, Z.; Liu, J.; Liu, R.W.; Liang, M.; Li, Y. Unsupervised Hierarchical Methodology of Maritime Traffic Pattern Extraction for Knowledge Discovery. Transp. Res. Part C Emerg. Technol. 2022, 143, 103856.
24. Li, H.; Jiao, H.; Yang, Z. Ship Trajectory Prediction Based on Machine Learning and Deep Learning: A Systematic Review and Methods Analysis. Eng. Appl. Artif. Intell. 2023, 126, 107062.
25. Chen, X.; Wu, S.; Shi, C.; Huang, Y.; Yang, Y.; Ke, R.; Zhao, J. Sensing Data Supported Traffic Flow Prediction via Denoising Schemes and ANN: A Comparison. IEEE Sens. J. 2020, 20, 14317–14328.
26. Chen, S.; Piao, L.; Zang, X.; Luo, Q.; Li, J.; Yang, J.; Rong, J. Analyzing Differences of Highway Lane-Changing Behavior Using Vehicle Trajectory Data. Phys. A Stat. Mech. Its Appl. 2023, 624, 128980.
27. Wang, S.; Li, Y.; Xing, H.; Zhang, Z. Vessel Trajectory Prediction Based on Spatio-Temporal Graph Convolutional Network for Complex and Crowded Sea Areas. Ocean Eng. 2024, 298, 117232.
28. Zhang, R.; Chen, X.; Ye, L.; Yu, W.; Zhang, B.; Liu, J. Predicting Vessel Trajectories Using ASTGCN with StemGNN-Derived Correlation Matrix. Appl. Sci. 2024, 14, 4104.
29. Jin, W.Z.; Zhang, X.D.; Tang, H.N. STGDPM: Vessel Trajectory Prediction with Spatio-Temporal Graph Diffusion Probabilistic Model. arXiv 2025, arXiv:2503.08065.
Figure 1. Multi-modal problem presentation.
Figure 2. Test trajectory display.
Figure 3. Testing trajectory prediction results.
Figure 4. TrAISformer vector space similarity graph.
Figure 5. Architecture of the ship trajectory prediction model.
Figure 6. Text and trajectory of the corresponding embedding encoding example.
Figure 7. Schematic diagram of trajectory prediction.
Figure 8. True category probability distribution (σ = 0.5, mean = i_target).
Figure 9. Preprocessed dataset display.
Figure 10. Visualization of the processed AIS data: the trajectories used for demonstration are represented in the form of trajectory points, each with a different color.
Figure 11. Multiple model prediction results: the red line represents seq2seq, the blue line represents seq2seq_attn, the green line represents TrAISformer, and the purple line represents MART.
Figure 12. Display of multi-trajectory prediction results: the solid red line represents the input trajectory, the solid green line represents the predicted trajectory of the model, and the dashed blue line represents the real trajectory to be predicted.
Figure 13. Vector space cosine similarity graph: (a–d), respectively, show the cosine similarity of adjacent longitudes and latitudes in the vector space for different models in different regions.
Figure 14. Multiple model prediction results.
Table 1. Dataset display.

Dataset | Time Range | Spatial Range | Data Volume
Area 1 | 2019.1.1–2019.3.31 | (55.5°, 10.3°)–(58°, 13°) | 13,679
Area 2 | 2023.9.1–2024.2.29 | (51°, −1°)–(60°, 21.2°) | 78,647
Table 2. Multiple model prediction errors (unit: km).

Model | 5 h (Area 1) | 10 h (Area 1) | 15 h (Area 1) | 5 h (Area 2) | 10 h (Area 2) | 15 h (Area 2) | Type
seq2seq | 4.43 | 9.24 | 15.58 | 4.83 | 11.49 | 19.64 | Haversine
 | 5.31 | 11.48 | 22.10 | 5.50 | 13.13 | 22.51 | Fréchet
seq2seq_attn | 4.46 | 8.93 | 15.68 | 4.64 | 10.72 | 18.99 | Haversine
 | 5.22 | 10.41 | 20.27 | 5.43 | 12.47 | 21.66 | Fréchet
TrAISformer | 5.22 | 9.76 | 18.56 | 4.74 | 11.40 | 18.97 | Haversine
 | 6.13 | 12.88 | 30.17 | 5.31 | 12.97 | 21.76 | Fréchet
MART | 4.30 | 8.07 | 14.10 | 4.19 | 9.57 | 16.04 | Haversine
 | 5.06 | 9.74 | 18.99 | 4.97 | 11.24 | 18.82 | Fréchet
Table 3. Statistical comparison of the prediction accuracy of different models (Area 1).

Model | seq2seq | seq2seq_attn | TrAISformer
MART | YES | YES | YES
seq2seq |  | NO | YES
seq2seq_attn |  |  | YES
Table 4. Statistical comparison of the prediction accuracy of different models (Area 2).

Model | seq2seq | seq2seq_attn | TrAISformer
MART | YES | YES | YES
seq2seq |  | NO | NO
seq2seq_attn |  |  | YES
Table 5. Ablation experiment results (unit: km).

Model | 5 h (Area 1) | 10 h (Area 1) | 15 h (Area 1) | 5 h (Area 2) | 10 h (Area 2) | 15 h (Area 2) | Type
Without Improvement | 5.22 | 9.76 | 18.56 | 4.74 | 11.40 | 18.97 | Haversine
 | 6.13 | 12.88 | 30.17 | 5.31 | 12.97 | 21.76 | Fréchet
Only With DisLoss | 4.62 | 8.81 | 13.14 | 4.35 | 10.33 | 17.20 | Haversine
 | 5.27 | 9.41 | 20.51 | 5.33 | 12.90 | 22.42 | Fréchet
Only With AssLoss | 4.64 | 9.26 | 20.64 | 4.67 | 10.40 | 16.74 | Haversine
 | 5.62 | 13.11 | 21.55 | 5.04 | 11.77 | 19.43 | Fréchet
MART | 4.30 | 8.07 | 14.10 | 4.19 | 9.57 | 16.04 | Haversine
 | 5.06 | 9.74 | 18.99 | 4.97 | 11.24 | 18.82 | Fréchet