Application of an Encoder–Decoder Model with Attention Mechanism for Trajectory Prediction Based on AIS Data: Case Studies from the Yangtze River of China and the Eastern Coast of the U.S.

Abstract: With the rapid growth of shipping volumes, ship navigation and path planning have attracted increased attention. To design navigation routes and avoid ship collisions, accurate ship trajectory prediction based on automatic identification system data is required. Therefore, this study developed an encoder–decoder learning model for ship trajectory prediction, to avoid ship collisions. The proposed model includes long short-term memory units and an attention mechanism. Long short-term memory can extract relationships between the historical trajectory of a ship and the current state of encountered ships. Simultaneously, the global attention mechanism in the proposed model can identify interactions between the output and input trajectory sequences, and a multi-head self-attention mechanism in the proposed model is used to learn the feature fusion representation between the input trajectory sequences. Six case studies of trajectory prediction for ship collision avoidance from the Yangtze River of China and the eastern coast of the U.S. were investigated and compared. The results showed that the average mean absolute errors of our model were much lower than those of the classical neural networks and other state-of-the-art models that included attention mechanisms.


Introduction
Since 2002, the International Maritime Organization (IMO) has required that all seagoing ships (>300 GT) and passenger ships be equipped with an onboard automatic identification system (AIS) [1]. This is a transmission and communication technology that enables a ship to transmit AIS information to other ships. This information includes the ship identity, location, speed, and course; that is, the ship navigation behavior and status [2-4]. Based on these data, ships can effectively avoid collisions with other ships. Decisions about collision avoidance must comply with the collision avoidance rules formulated by the IMO in the Convention on the International Regulations for Preventing Collisions at Sea (COLREGs) [5]. The COLREGs specify the risk of collision, which is assessed based on the estimated closest point of approach. The distance to the closest point of approach and the time to the closest point of approach are used as indicators of collision risk. If these two values are less than their thresholds, a risk of ship collision is considered to exist [6]. Therefore, it is necessary to accurately predict the trajectory of ships, to help with ship navigation planning and collision warning [7,8].
To address the above issues, this paper proposes an encoder-decoder model for ship trajectory prediction for collision avoidance, which uses a sequence-to-sequence (Seq2Seq) structure and multi-attention mechanism [29][30][31]. The main advances of this study in the field of machine learning and ship navigation can be divided into two aspects.
First, this study is the first to introduce state-of-the-art attention mechanisms into the field of navigation trajectory prediction, to effectively capture the potential information and correlations in the AIS series data. Second, this study applied the proposed model to case studies of ship collision avoidance, to demonstrate its effectiveness and efficiency. We performed experiments with six case studies of trajectory prediction for ship collision avoidance on the Yangtze River of China and the eastern coast of the United States. The results showed that the mean absolute error (MAE) of our model in trajectory prediction was much lower than those of the classical models, such as back-propagation neural networks (BPNN) and LSTM. Furthermore, our model also outperformed other state-of-the-art models with attention mechanisms for trajectory prediction.
The remainder of this paper is organized as follows: Section 2 reviews related work in the field of ship trajectory prediction. Section 3 summarizes the trajectory prediction model studied in this paper and introduces data preprocessing. In Section 4, we apply the proposed prediction method to real AIS data and summarize the results. Section 5 discusses our conclusions and future work.

Trajectory Prediction Based on Kinematics Models
Regarding kinematics models, most studies estimated the future position of a ship directly from its current position, assuming that its speed and course over ground remain constant [9,10]. These studies also described the uncertainty of the future position of ships based on statistical models [11-13]. On the other hand, ship trajectory prediction can be considered a typical time-series problem; therefore, Kalman filters [14,15] and Markov models [16,17] are used. Perera et al. [15] proposed an extended Kalman filter to formulate the ship position, speed, and acceleration, to predict its trajectory under noisy conditions. Guo et al. [17] divided the designated sea area into grids, with the state of ship position, speed, and direction, and then used a K-order hidden Markov model to establish the state transition matrix for prediction.
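As a minimal illustration of the constant-speed, constant-course assumption behind these kinematic baselines, the following Python sketch advances a position by dead reckoning. The function name `dead_reckon`, the flat-Earth approximation (one nautical mile per arcminute of latitude), and the choice of units are assumptions made for this sketch and are not taken from the cited studies.

```python
import math


def dead_reckon(lat_deg, lon_deg, sog_knots, cog_deg, dt_seconds):
    """Advance a position assuming constant speed and course over ground.

    Assumption: local flat-Earth approximation, where 1 nautical mile equals
    1 arcminute of latitude. Only reasonable for short prediction horizons
    and away from the poles.
    """
    distance_nm = sog_knots * dt_seconds / 3600.0  # distance sailed in nautical miles
    course = math.radians(cog_deg)                 # course measured clockwise from north
    dlat = distance_nm * math.cos(course) / 60.0
    dlon = distance_nm * math.sin(course) / (60.0 * math.cos(math.radians(lat_deg)))
    return lat_deg + dlat, lon_deg + dlon
```

A prediction of this kind cannot anticipate a course change, which is the situation in which learned sequence models are expected to help.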

Trajectory Prediction Based on Machine Learning Techniques
Classical machine-learning methods, such as SVM [19] and clustering algorithms [20], are widely used in ship trajectory prediction. They have improved prediction efficiency and accuracy. At the beginning of the 2000s, Hinton et al. proposed a multi-hidden-layer neural-network model [32]; since then, deep-learning methods have shown advanced performance in the field of machine learning. In trajectory prediction, deep-learning methods have achieved higher prediction accuracy than MLP [21] and BPNN [22]. Since RNN [28] and LSTM [33] have become the most representative methods for time-series classification and prediction, a large number of studies have applied them to ship trajectory prediction [24-26]. Based on an RNN, the encoder-decoder model is considered the standard method for Seq2Seq prediction tasks, because of its excellent performance in machine translation [30] and speech recognition [34], and it can also be applied to trajectory prediction [35]. The Seq2Seq model based on an attention mechanism [31,36,37] has proven its effectiveness in a wide range of prediction tasks. Several studies have applied attention-mechanism-based models to the field of ship trajectory prediction [38-41]. Capobianco et al. [41] proposed an attention-based recurrent encoder-decoder architecture for trajectory prediction with uncertainty quantification and applied it to a case study in the maritime field.
In comparison with previous studies, the model proposed in this study introduces multiple-attention modules of global attention and multi-head self-attention. The global attention mechanism is used to combine the trajectory history and current state information to obtain hidden information from within sequences, and the multi-head self-attention mechanism can capture the spatiotemporal correlations between sequence-feature data, to perform feature fusion for generating new feature representations, as well as effectively capturing potential information and correlations contained in the ship position sequence.

Problem Statement
During marine traffic encounters, an AIS can obtain the state of interaction between the target ship and the surrounding ships. The state of the ship at time t can be expressed as $s_t = (LON_t, LAT_t, SOG_t, COG_t)$, where LON, LAT, SOG, and COG represent longitude, latitude, speed over ground, and course over ground, respectively. $s_{t-1} - s_t$ represents the change in position and state information from time t−1 to time t. The state of the encounter ship can be expressed as $s'_t$, where $s'_t - s_t$ represents the relative position and navigation status information from the encounter ship to the target ship.
As shown in Figure 1, the ship state information at t is used as input to the prediction model, and the location information at t+1 after t is used as the model output. Therefore, we can formulate Output = f(Input), where f(.) represents the prediction function of the ship trajectory obtained using our model.
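The state and relative-state quantities defined above can be sketched in Python as follows. The names `ShipState`, `diff`, and `build_input`, as well as the particular ordering of the features, are hypothetical and chosen only for illustration; the sketch also ignores the wrap-around of course angles at 360 degrees.

```python
from typing import List, NamedTuple


class ShipState(NamedTuple):
    lon: float  # longitude, degrees
    lat: float  # latitude, degrees
    sog: float  # speed over ground
    cog: float  # course over ground, degrees


def diff(a: ShipState, b: ShipState) -> List[float]:
    """Component-wise difference a - b (no angle wrap-around handling)."""
    return [x - y for x, y in zip(a, b)]


def build_input(prev: ShipState, curr: ShipState, encounter: ShipState) -> List[float]:
    """Assemble one input vector for time t (hypothetical feature layout).

    Layout: own state s_t, own change s_{t-1} - s_t, and the state of the
    encounter ship relative to the target ship.
    """
    return list(curr) + diff(prev, curr) + diff(encounter, curr)
```

With this layout, each time step contributes 12 features, and a sequence of such vectors forms the model input.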

Methodology Design of Ship Trajectory Prediction
To solve the trajectory prediction of ships in an encounter situation, we add the relative position and navigation status information of the trajectory of the observed ship and the trajectories of the surrounding ships to our prediction framework. In this study, a sequence model is used to determine the impact of the relative positional changes of the observed ship and the ship sailing on the future navigation trajectory of the observed ship. The attention mechanism is used to dynamically adjust the weight of the sequence information to help the model focus on the important position change information, to dynamically adjust the prediction in the sequence prediction process.
This study proposes a new trajectory prediction structure that uses AIS data to train the model, as shown in Figure 2. The model is composed of three modules: Module 1 is an AIS data processing module, which can effectively improve the data quality and model execution efficiency. Module 2 is the trajectory prediction model developed in this study, which is a deep-learning prediction model with an encoder-decoder structure and an attention mechanism. The encoder structure is a multi-layer LSTM, and the decoder consists of an RNN, a multi-layer LSTM, and a self-attention mechanism, which is described in detail in Section 3.3. The training data from Module 1 are used as input to Module 2 to train the prediction model. Finally, Module 3 is a prediction and validation module. Its main function is to apply the test data of Module 1 to optimize the parameters and verify the prediction model of Module 2.

Design of the Encoder-Decoder Learning Model
The encoder of the proposed model is an LSTM neural network, which maps the input influence onto the sequential context representation. Based on the attention mechanism, the hidden state sequence encoder is combined with the information representation of the context. The decoder of the proposed model is a feature fusion layer that extracts the potential relationships of future ship trajectory state information from the historical and current state information. The weighted representation between the feature vectors of each trajectory is then input into the RNN, so that it can obtain the information representation of the correlation between features in each future prediction step. The RNN in the decoder has a multi-layer structure, to improve the learning ability of the internal sequence information representation. The overall structure is shown in Figure 3.

LSTM-Based Sequence to Sequence
The Seq2Seq model consists of an encoder and a decoder. The encoder uses a recurrent neural network (RNN or LSTM) to encode the input as a vector representation, and the decoder uses another sequential network to decode it. The main task of the encoder is to read the sequence and pass the discovered rules to the decoder. The decoder decodes the received rule information to generate an output sequence. Figure 4 shows the classic Seq2Seq model architecture, where:
• $x_1$ to $x_t$ represent the input feature sequence of the model;
• $h_1$ to $h_t$ are the outputs of each recurrent neural network cell;
• $y_1$ and $y_2$ represent the label sequence of the model output;
• the variable C between the encoder and decoder represents the sequence representation obtained by passing the input feature sequence through the encoder.
The RNN is a feature extractor of global information from the sequence and can be used to process the sequence data. In the RNN, neurons can accept information from other neurons and also their own information, to form a loop structure, as shown in Figure 5a. However, the gradient tends to vanish when the input sequence is long.
To solve the problem of the vanishing gradient, a gating mechanism for forgetting the previously accumulated information is required. An LSTM is a type of RNN based on a gating mechanism. Compared with the traditional RNN, an LSTM introduces a gating mechanism to control the speed of information accumulation. Through the forget gate, input gate, and output gate, it discards part of the previous information and simultaneously adds new information, which effectively mitigates the loss of learned information caused by exploding or vanishing gradients. The unit structure is shown in Figure 5b.
The notation σ represents the sigmoid activation function, $h_{t-1}$ is the output of the previous LSTM unit, $x_t$ indicates the state information of the input at the current time, and $C_{t-1}$ is the internal state of the memory unit at the previous time step. Each memory block has three gates to control the path of the information transmission.
a. Forget gate. $h_{t-1}$, $C_{t-1}$, and $x_t$ are used as inputs to calculate the amount of information $f_t$ (a value between 0 and 1) to be forgotten.
b. Input gate. The input information $i_t$ and the candidate state $\tilde{C}_t$ are obtained from the inputs $h_{t-1}$ and $x_t$ with a sigmoid function and a tanh function, respectively. The new state $C_t$ is then obtained by adding the new information $i_t \cdot \tilde{C}_t$ to the retained information $C_{t-1} \cdot f_t$. The specific equations are (2)-(4).
c. Output gate. $h_{t-1}$ and $x_t$ are input to the sigmoid function to obtain the output information $O_t$. The product of $O_t$ and the activated value of the updated state $C_t$ gives $h_t$, which is the output information at time t.
Equations (1)-(6) introduce the operation of the LSTM unit in detail. The output function of the LSTM unit can be expressed as $h_t = \mathrm{LSTMUnit}(x_t, h_{t-1}, \theta)$, where the LSTMUnit(·) function represents the forget, input, and output operations in Equations (1)-(6), and θ represents the parameters in the LSTM unit.
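The gate operations described above can be sketched as a single LSTM step in plain Python. This is the widely used textbook formulation, in which every gate reads only the concatenation of $h_{t-1}$ and $x_t$; it is assumed here to approximate Equations (1)-(6) and does not feed $C_{t-1}$ into the forget gate. The function names and the layout of `theta` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, b: Vector, v: Vector) -> Vector:
    """Return W v + b for a row-major weight matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_unit(x_t: Vector, h_prev: Vector, c_prev: Vector,
              theta: Dict[str, Tuple[Matrix, Vector]]) -> Tuple[Vector, Vector]:
    """One LSTM step; theta maps a gate name to its (weights, bias) pair."""
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(*theta["f"], z)]      # forget gate
    i_t = [sigmoid(a) for a in affine(*theta["i"], z)]      # input gate
    c_hat = [math.tanh(a) for a in affine(*theta["c"], z)]  # candidate state
    # New cell state: retained old information plus admitted new information.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    o_t = [sigmoid(a) for a in affine(*theta["o"], z)]      # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]      # unit output
    return h_t, c_t
```

Because the cell state is updated additively, the gradient along $C_t$ is scaled by $f_t$ rather than repeatedly multiplied by a full weight matrix, which is why the gating helps with long sequences.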

Attention Mechanism
The attention mechanism module is used in the decoder of the model, to improve the information resource allocation of the model. This enables the model to dynamically adjust the weights of the sequence information and to focus on important positions during the prediction process. The structure of the attention mechanism is shown in Figure 6. Keys = Values = $h_t$ ($t \in \{1, \ldots, N\}$) are the outputs of the LSTM units in the encoder at all time steps, and Query = $h$ is the output of the LSTM layer in the decoder. First, the correlation between $h$ and $h_t$ is calculated using the attention-scoring function s(·). Here, $h$ is calculated as $h = \mathrm{LSTMUnit}(X_t, h_t, \theta)$.
The models commonly used as s(·) include additive models, dot-product models, and scaled dot-product models. In this study, a scaled dot-product model is selected as the score function, that is, $s(h_t, h) = h_t^{\top} h / \sqrt{D}$, where D represents the dimension of the input vector. It makes better use of matrix products during matrix operations, and the scaling mitigates the shrinking of the softmax gradient when D is large.
The softmax function is used to map the output values of the score function to between 0 and 1, which gives the attention distribution $\alpha_t$ of $h$ with respect to $h_t$; $\alpha_t$ indicates the degree of attention paid to the input vector at time t. Finally, the attention output is the weighted sum of the $h_t$ at the corresponding positions, with $\alpha_t$ as the weights.
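The three steps above (scoring, softmax normalization, and weighted summation) can be sketched in plain Python as follows. The function names are hypothetical, and the sketch assumes that the decoder query and the encoder outputs have the same dimension D.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: outputs lie in (0, 1) and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(query: Vector, keys: List[Vector]) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention with Keys = Values.

    query: decoder output h; keys: encoder outputs h_1..h_N.
    Returns the attention weights alpha and the weighted sum of the keys.
    """
    d = len(query)
    scores = [sum(k * q for k, q in zip(key, query)) / math.sqrt(d) for key in keys]
    alpha = softmax(scores)
    context = [sum(a * key[j] for a, key in zip(alpha, keys)) for j in range(d)]
    return alpha, context
```

An encoder output that is more aligned with the query receives a larger weight, so the context vector is dominated by the time steps most relevant to the current prediction step.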

Feature Fusion Layer
To model the potential relationships of the ship trajectory information between the historical state and current state, we propose a feature fusion layer. Its structure is shown in Figure 7a. This layer consists of two parts: a multi-layer perceptron (MLP) and a multi-head self-attention (MHSA) mechanism. The MLP is used for linear mapping of the input sequence information. The MHSA is used for calculating and selecting multiple information points from the input information in parallel (see Figure 7b). The original structure of the self-attention mechanism is shown in Figure 6, with Keys = Values = Query. The MHSA can obtain the dependency information at the input stage, connect the input information, and extract the important features from the input data. Simultaneously, these features are spliced with the linear mapping information, which is extracted by the MLP from the input data. In the calculation formulas, σ denotes the activation function of the fully connected layer, and $W_{MH}$ and $W_m$ are weight parameters. The concat(·) function is used to connect multiple arrays without changing the existing array values. The subscript i of the attention function is the head number of the self-attention mechanism, where K = V = Q = X. Finally, X contains the position and status information of the observed ship at time t and at the previous k time steps, together with that of the encounter ship.
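A rough sketch of such a fusion layer in plain Python is given below. It is only one possible reading of the description: the per-head projection matrices `head_proj`, the use of a sigmoid as σ, and the exact places where $W_{MH}$ and $W_m$ are applied are assumptions of this sketch, and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax_rows(S: Matrix) -> Matrix:
    out = []
    for row in S:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention(X: Matrix) -> Matrix:
    """Scaled dot-product self-attention with K = V = Q = X."""
    d = len(X[0])
    scores = [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X] for q in X]
    return matmul(softmax_rows(scores), X)


def feature_fusion(X: Matrix, head_proj: List[Matrix],
                   W_mh: Matrix, W_m: Matrix) -> Matrix:
    """Splice multi-head self-attention features with MLP features, row by row."""
    heads = [self_attention(matmul(X, P)) for P in head_proj]  # one output per head
    concat_heads = [[v for h in heads for v in h[r]] for r in range(len(X))]
    mhsa = matmul(concat_heads, W_mh)                          # mix the heads
    mlp = [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in matmul(X, W_m)]
    return [a + b for a, b in zip(mhsa, mlp)]                  # per-row concatenation
```

Each row of the result holds the attention-based features followed by the MLP features for the same time step, so later layers see both representations side by side.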

Data Description
The AIS data used in this study were mainly collected from onboard AIS equipment in the Yangtze River delta region of China and the eastern coastal region of the United States. In this experiment, we collected a large amount of AIS data and selected six case studies of collision avoidance from the two regions. The research subject data from each region were collected on different dates. Table 1 provides detailed information about the collected data, and Figure 8 shows the specific location of each collision avoidance case.
In the experiment, the navigation trajectory sequence was sequentially sampled with a set sliding window length, and the samples were randomly divided into training (60%) and testing (40%) sets. The processing of experimental datasets for each situation was the same. The training set was used to train and determine the weights, biases, and other parameters of the model. After training, the test set was used to evaluate the proposed model and other comparison models.
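The sampling and splitting procedure can be sketched in Python as follows. The function names, the fixed random seed, and the default window lengths of 10 input steps and 10 output steps (taken from the experimental setting reported with the evaluation criteria) are choices made for this sketch.

```python
import random
from typing import List, Sequence, Tuple


def sliding_windows(track: Sequence, n_in: int = 10, n_out: int = 10) -> List[Tuple[list, list]]:
    """Slide a window over one trajectory and cut (past, future) sample pairs."""
    samples = []
    for start in range(len(track) - n_in - n_out + 1):
        past = list(track[start:start + n_in])
        future = list(track[start + n_in:start + n_in + n_out])
        samples.append((past, future))
    return samples


def split_samples(samples: list, train_ratio: float = 0.6, seed: int = 0) -> Tuple[list, list]:
    """Randomly divide the samples into training and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]
```

Note that overlapping windows from the same trajectory can end up on both sides of a random split, so the test error measured this way reflects interpolation within a known trajectory more than generalization to an unseen one.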

Criterion of Model Evaluation
The mean square error (MSE) was used as the loss function to quantify the difference between the predicted and real values. After completing the model construction, we used the MAE and average displacement error based on the Haversine distance (HADE) to evaluate the model. In these metrics, p is the total number of AIS data samples for training or testing, $\hat{y}_i$ is the estimated value of the ship trajectory longitude and latitude, $Y_i$ is the measured value of the navigation longitude and latitude of the ship, r is the Earth's radius, $\widehat{lat}_i$ and $\widehat{lon}_i$ represent the predicted latitude and longitude, and $lat_i$ and $lon_i$ represent the true latitude and longitude, respectively. In the experiment, the ship status information for the 10 time steps preceding the current time t was used as input to the model, and the geographic location information of the 10 time steps after the current time t was predicted as the output.
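For reference, the three criteria can be sketched in Python as follows. MSE and MAE follow their usual definitions, and HADE is taken here to be the mean Haversine distance between predicted and true positions; the Earth radius of 6371 km, the kilometre unit, and the function names are assumptions of this sketch.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def mse(pred: Sequence[float], true: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float,
                 r: float = EARTH_RADIUS_KM) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def hade_km(pred: List[Tuple[float, float]], true: List[Tuple[float, float]]) -> float:
    """Average displacement error over (lat, lon) pairs."""
    dists = [haversine_km(p[0], p[1], t[0], t[1]) for p, t in zip(pred, true)]
    return sum(dists) / len(dists)
```

MSE and MAE are computed in degrees and are therefore hard to interpret physically, whereas HADE expresses the same error as a distance on the Earth's surface.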

Model Parameter Setting
The Adam adaptive learning rate optimization algorithm [42] was used to update the network parameters of the model structure. In the LSTM layer part of the sequence information extraction, the number of LSTM layers was between one and three. The optimal number of hidden units in each LSTM layer was searched within the range [32, 320]. The batch size in the experiment was 512, the number of training simulations was 200, and the number of heads for the multi-head self-attention mechanism was 2. To prevent model overfitting, a dropout mechanism [43] and a regularization term were used in the training process. The search range and step size of each parameter are shown in Table 2. After comparison experiments with multiple sets of hyperparameters, using the encoder-decoder model based on the multi-module attention mechanism, the parameters for the two different trajectory regions (Situation 1 and Situation 2) were chosen as shown in the last two columns of Table 2.

Introduction of Baseline Methods
In the comparisons, we employed four baseline methods, as follows: (1) The BPNN has the classic three layers: input layer, hidden layer, and output layer (see Figure 9a). In its network structure, the neurons are connected from the input layer to the output layer; (2) LSTM is a classic sequence prediction model. The structure is shown in Figure 9b, and the unit structure of each LSTM cell can be found in Figure 5b; (3) DANAE: denoising autoencoders (DAE) were proposed by Vincent et al. [44] and are used for prediction tasks, and DANAE is a deep denoising autoencoder used for attitude estimation [45,46]; (4) EncDec-ATTN is a deep learning method used for ship trajectory prediction based on recurrent neural networks and was proposed by Capobianco et al. [28]. This method can learn spatiotemporal correlations from historical ship mobility data and predict future ship trajectories.

Analysis of Model Performance
MSE and MAE were used to evaluate the performance of the model. The experiments were carried out on a Windows 10 system with a 2.90 GHz i5 central processor and 32 GB of memory, and the model was coded using TensorFlow 2.4.0 in Python 3.8. The prediction performance of our model and that of the other models are compared in Table 3, which shows that our model had the lowest MSE, MAE, and HADE. In both regions, the number of LSTM layers was 3, the number of hidden units was 128, the learning rate was 0.002, and the regularization parameter was 0.001. For Head-on Situation 1 from the Yangtze River, the MSE and MAE of the latitude predicted by our model were 2.8857 × 10^−5 and 0.0042, respectively, and the MSE and MAE of the longitude were 2.5220 × 10^−5 and 0.0041, respectively. For Situation 2 from the eastern coastal area, the MSE and MAE of the latitude predicted by our model were 3.8907 × 10^−5 and 0.0044, respectively, and the MSE and MAE of the longitude were 3.0541 × 10^−5 and 0.0042, respectively. Compared with the experimental results of the other models, our model and EncDec-ATTN consistently outperformed the classic network models that do not use attention mechanisms. Moreover, the evaluation scores of our model decreased by 27.5% and 21.4%, respectively, compared with EncDec-ATTN, which indicated a greater advantage for trajectory prediction in this experiment. HADE reflects the displacement error of the predicted position of a ship from its true position in real scenarios, as shown in Figure 10. In terms of the prediction performance of the ship trajectories, our model had the lowest MSE and MAE values. Meanwhile, the HADE was also lower than that of the other prediction models.

Discussion of the Prediction Results
To further evaluate the prediction ability, a comparison of the prediction results of the different prediction methods at different time steps is shown in Figures 11 and 12. The model prediction was evaluated according to the error between the predicted results and the actual longitude and latitude.
In Head-on Situation 1 from the Yangtze River, as shown in Figure 11a,c, where two ships meet and avoid each other, the first ship is sailing along the planned route, and the course is relatively stable. We observed that all models could predict accurately under relatively simple sailing conditions. As shown in Figure 11b, when the first ship began to change course, the error of the trajectory prediction generated by our model was much lower than that of the other models. In Head-on Situation 2 from the eastern coastal region, as shown in Figure 12a-d, during the initial sailing phase with the encounter ship, the first ship encountered the other ship and changed the sailing course. At this time, the prediction results of BPNN, LSTM, and DANAE have large errors. One benefit of the attention mechanism is that our model can predict ship trajectories with less deviation from the actual ship trajectories.
In previous studies [28,38-41], only a single attention mechanism was used to express the historical trajectory and current state information of ships, mainly focusing on hidden information within sequences. However, the correlation of information between the spatiotemporal sequence features is often overlooked. Unlike in previous studies, the proposed model introduces both global attention and multi-head self-attention mechanisms. The global attention mechanism is used to combine the trajectory history and current state information to obtain hidden information between sequences. The multi-head self-attention mechanism can capture the spatiotemporal correlations between the sequence-feature data and extract the fusion features, to generate a new feature representation. The two extracted parts of this information are correlated, to predict the ship trajectory at the next time point.

Analysis of the Attention Mechanism with the Weight Score
Our model includes two attention modules: the global attention mechanism recognizes the information interactions in the input sequence, and the multi-head self-attention mechanism extracts the effective feature information of the output target in the input sequence information. To explore the impact of the attention mechanism on model performance, we compared the performance of the model with and without each attention module, as shown in Table 4. Table 4 and Figure 13 show the comparison results of the models under different attention mechanisms. The Seq2Seq model showed a good prediction accuracy with the attention mechanism. Focusing on the prediction for Head-on Situation 1 from the Yangtze River, the prediction results of the model with a global attention mechanism showed decreases in MAE of 0.0014 and 0.0010 in the longitude and latitude predictions, and a decrease in HADE of 15.12%. The MAE of longitude and latitude predicted by the model with the MHSA mechanism decreased by 0.0024 and 0.0019, respectively, and the HADE decreased by 28%. The results of comparing the MSE, MAE, and HADE showed that our model outperformed the other two models with the attention mechanism.
To explain the internal working of the neural network, we obtained the importance weight vector of the input sequence at each position in the prediction sequence and explored the influence of the attention mechanism on the proposed model. Inspired by Lee et al. [47] for the interpretability of the attention mechanism, we visualized the output of two self-attention heads in the prediction model, and the visualization of the output attention weight is shown in Figure 14.
We visualized the weight values calculated by the attention mechanism for Head-on Situation 2 at different stages of the navigation state and explained the importance assigned by the network to specific trajectory characteristics. The first column in Figure 14 shows the input, target, and prediction sequences of the different models; the second column shows the visualization of the global attention mechanism weight score of our model, and the last two columns show the visualization of the MHSA weight score of the feature fusion module in our model. In the heat maps, each row represents a position in the output sequence, and each column shows the weight assigned to a position in the input sequence. Thus, it can be determined which positions in the input history are considered more important when generating the predicted trajectory for the global attention mechanism (second column). Over time (from left to right), the model can influence the characteristics of the input sequence when generating the output sequence. At different stages of the predicted navigation status, the positions considered by the output series are different, as shown in Figure 14b,f, which pay more attention to the middle and tail segments of the input series, respectively. Figure 14j shows a marginal difference in the weight values of the entire series. For the MHSA mechanism (the third and fourth columns), it can be observed from the figures that the different attention heads focused on different information, as shown in Figure 14k. The input $X_1$ sequence had a greater correlation with $X_1$, $X_7$, and $X_8$, while $X_3$ had a greater correlation with $X_4$ and $X_9$. The MHSA mechanism calculates the correlation representation between AIS information features and generates a better information feature code for the current input sequence by making full use of the position state information in the sequence.
This allows the model to focus on the information of different positions in the input sequence, and it can also alleviate overfitting by integrating different attention heads, to improve the accuracy and robustness of the overall model.

Analysis of Model Validation
For both the Yangtze River delta and eastern coast, we selected two encounter scenarios to verify the effectiveness of the trajectory prediction. Test samples 1 and 2 for the Yangtze River delta contained encounter scenarios of MMSI (414386000, 312958000) on 28 June 2019 and MMSI (413556520, 413585000) on 2 January 2022. Test samples 1 and 2 for the eastern coast contained encounter scenarios of MMSI (372821000, 31100037) on 31 December 2021 and MMSI (316001635, 316044371) on 2 January 2022. The experimental results and comparative analysis are shown in Table 5.
From Table 5, it can be seen that our model achieved the best results for the various evaluation indicators, with the highest accuracy and a good predictive performance. In addition, statistical methods were used to analyze the results, as shown in Figure 15. In terms of the prediction performance of the ship trajectories, our model included an attention module that more effectively extracted important feature information from the trajectory sequences than BPNN, LSTM, Seq2Seq, and DANAE. Compared with the models containing a single attention mechanism, such as Seq2Seq-ATTN, Seq2Seq-MHSA, and EncDec-ATTN, our model effectively extracted correlations between the sequences and features, accounting for multiple attention structures. The experimental results also showed that our model had a good trajectory prediction performance under encounter situations.
Figure caption: panels (a-e) and (f-j) represent the Yangtze River and eastern coastal area, respectively.

Conclusions
To predict the future trajectory of a ship in the case of encounter situations, a high-precision trajectory prediction model based on AIS navigation-history data was proposed. This method uses an LSTM neural-network model to encode and decode trajectory information from sequences. The framework of the proposed model uses the relative navigation state information of the encounter and observation ships as part of the input state information characteristics, to predict the observation ship trajectory. Compared with classical models, the proposed model has a stronger generalizability and better performance. Experiments showed that the attention-based model could effectively capture the potential information and correlations in a ship position sequence, so that the proposed model had a better prediction ability for the curve trajectory segment. This significantly improved on the performance of the existing models, which have strong advantages in trajectory prediction. It provides an effective safeguard for ship intelligent navigation systems, by providing real-time trajectory prediction and developing safe and efficient decision support.
In future work, we plan to consider more influencing factors around ships, in the case of multiple ship collisions, and provide further decision support for our model, for more cases of ship collision avoidance.