An Improved Ship Trajectory Prediction Based on AIS Data Using MHA-BiGRU

Abstract: According to the statistics of water transportation accidents, collision accidents are on the rise as the shipping industry has expanded rapidly and the water transportation environment has become more complex, which can result in grave consequences, such as casualties, environmental destruction, and massive financial losses. In view of this situation, high-precision and real-time ship trajectory prediction based on AIS data can serve as a crucial foundation for vessel traffic services and ship navigation to prevent collision accidents. Thus, this paper proposes a high-precision ship track prediction model based on a combination of a multi-head attention mechanism and a bidirectional gated recurrent unit (MHA-BiGRU) to fully exploit the valuable information contained in massive AIS data and address the insufficiencies in existing trajectory prediction methods. The primary advantages of this model are that it retains long-term ship track sequence information, filters and modifies ship track historical data for enhanced time series prediction, and models the potential association between historical and future ship trajectory status information with the current state via the bidirectional gated recurrent unit. Significantly, the multi-head attention mechanism calculates the correlation between the characteristics of AIS data, actively learns cross-time synchronization between the hidden layers of ship track sequences, and assigns different weights to the result based on the input criterion, thereby enhancing the accuracy of forecasts. The comparative experimental results also verify that MHA-BiGRU outperforms the other ship track prediction models, demonstrating that it is easy to implement, precise, and reliable.


Citation: Bao, K.; Bi, J.; Gao, M.; Sun, Y.; Zhang, X.; Zhang, W. An Improved Ship Trajectory Prediction Based on AIS Data Using MHA-BiGRU. J. Mar. Sci. Eng. 2022.

Introduction
In recent decades, as the shipping industry of China has grown rapidly, water transportation has been confronted with issues such as increasing ship traffic density, more frequent water traffic accidents, and the increasing difficulty of maritime safety supervision, all of which pose obstacles to the sustainable growth of the shipping industry [1,2]. AIS data, whose primary information is spatiotemporal data consisting of ship location and time, provide ship trajectory data that can be used to analyze ship navigation behavior in real time, as well as critical supplementary information in the process of collision avoidance [3,4]. The target ship trajectory can be predicted from known historical location information through full analysis and deep mining of AIS ship behavior data, which can provide a strong reference for the supervision of vessel traffic services (VTS), allowing for the timely detection and resolution of abnormal and nonstandard ship navigation problems [5,6]. Therefore, real-time and accurate ship trajectory prediction can contribute significantly to ensuring water traffic safety and enhancing the efficacy of water traffic management. Methods for ship trajectory prediction can be broadly categorized into two types: kinematic modeling-based approaches and neural network modeling-based approaches [7].
Methods based on kinematic modeling are widely used in ship trajectory prediction, the most common being Gaussian process (GP) regression models and the Kalman filter (KF). With time as the independent variable, Anderson modeled the trajectory as a one-dimensional Gaussian process. This method determines the posterior distribution of the predicted value by extracting the joint prior density and covariance matrix of the observed and predicted values, and it models smooth trajectory estimation with the aid of dynamical systems [8]. Rong et al. regarded the ship's location as following a Gaussian distribution and used GP modeling to forecast the route of a ship [9]. Jiang proposed constructing a polynomial Kalman filter to suit the nonlinear system based on classic Kalman filter theory, compensating for missing track location data and slow updates, and predicting the ship's trajectory from longitude and latitude data [10]. These methods function effectively when the ship's navigation behavior is reasonably steady. However, ship dynamics are typically sensitive to distinct environmental excitations in different areas, which may result in a nonstationary condition and render the prediction less accurate in practice.
The widespread usage of neural networks has ushered in a new stage in ship trajectory prediction. Giulia et al. developed a radial basis neural network for short-term vessel prediction [11]. Zhou et al. built a track prediction model based on a three-layer back-propagation (BP) neural network, the training and prediction results of which meet the standards of the VTS for accuracy, real-time performance, and universality. However, because this model has few hidden units, its expressive capacity is constrained [12]. Liu et al. suggested a trajectory estimation method based on support vector regression and used an enhanced differential evolution approach to optimize the parameters of this model [13]. However, these solutions cannot effectively overcome the problem of long-term sequence dependency.
Because AIS data are typically time series data, it is necessary to evaluate not only the ship trajectory at the present time step but also the previously observed trajectory data in order to anticipate the future ship trajectory. A recurrent neural network (RNN) can be regarded as a representative neural network capable of predicting future data using time series information, despite its gradient-vanishing and gradient-explosion problems [14,15]. To address these gradient problems of RNN, long short-term memory (LSTM) introduces a memory unit and a gate mechanism to replace the hidden layer unit in RNN [16]. Gers et al. then optimized the LSTM by introducing a forget gate, which enables the LSTM to learn to reset itself [17]. The gated recurrent unit (GRU) is an excellent variation on LSTM, in that it only requires an update gate and a reset gate to regulate the information flow [18]. Owing to their effectiveness in time series prediction, RNN and its variants have been applied to ship trajectory prediction in recent years. Ferrandis et al. used the LSTM method to predict the ship trajectory and to solve the gradient-vanishing and gradient-explosion problems of RNN caused by increasing data length [19]. Agarap utilized the GRU method for time series prediction and showed that it performs well and is suitable for time series forecasting [20]. The bidirectional recurrent neural network structure enables the output layer to receive complete past and future information for each point in the input sequence [21]. Gao et al. and Siami-Namini et al. created a bidirectional structure to improve contextual relevance based on the RNN method, which improves the accuracy of ship trajectory prediction compared to RNN alone [22,23]. It is worth mentioning that Stateczny et al. proposed the optimum dataset method, which contributes to comparative navigation and provides a model for big data set processing [24]. After the attention mechanism (AM) was applied in the field of image recognition, Vaswani et al. used this mechanism to replace recurrent neural network modeling and provided a model for machine translation, after which it became prevalent in regression problems [25]. Cheng et al. implemented AM in ship trajectory prediction, with the attention modes enhancing the AIS data characteristics extracted by each block and the attention module classifying these characteristics [26].
However, although these deep-learning approaches based on AIS data perform reasonably well at predicting ship trajectory, issues of insufficient accuracy and real-time performance remain. The primary reason is that the majority of existing approaches for mining AIS data are relatively isolated and overlook elements such as AIS data characteristics and ship track sequence information. Thus, a high-precision ship track prediction model based on a combination of a multi-head attention mechanism and a bidirectional gated recurrent unit (MHA-BiGRU) is developed to solve these issues. The contribution of this model is briefly summarized below. Firstly, this model retains long-term ship track sequence information, filters and modifies ship track historical data for enhanced time series prediction, and models the potential association between historical and future ship trajectory status information with the current state, thereby increasing forecast accuracy. Secondly, an MHA mechanism based on BiGRU is introduced, which not only calculates the correlation between the characteristics of AIS information but also actively learns cross-time synchronization between the hidden layers of the output and input ship track sequences and assigns different weights to the result based on the input criterion, thereby improving the accuracy and robustness of the overall model. Finally, the comparative experimental results in this paper verify that MHA-BiGRU, which fully exploits the advantages of bidirectional RNN, multi-head attention mechanisms, and GRU, outperforms the other seven ship track prediction models, demonstrating that MHA-BiGRU is easy to implement, precise, and reliable.

Materials and Methods
Figure 1 depicts the framework of the proposed method, which consists of four components: data processing, MHA-BiGRU model proposal, MHA-BiGRU model training, and comparison experiments. Specifically, data processing, which includes ship trajectory extraction, missing value recognition and completion, and data cleansing, is a crucial step in deep learning, as the processed data enable improved model performance. An easy-to-implement method that is suitable for quick and concise analysis is proposed by combining the advantages of bidirectional RNN, the multi-head attention mechanism, and GRU, which improves the prediction efficiency and accuracy of the ship trajectory. Then, the structure, application principle, training method, and contribution of the MHA-BiGRU are presented step by step. Finally, in order to demonstrate the effectiveness of the proposed method, it is compared with several other prediction methods.

AIS Data Processing
The AIS is a critical component of modern ship navigation systems; it is widely installed on ships to strengthen their capacity to report location and identify targets. There are two major issues with trajectory prediction using AIS data: time interval inconsistency and measurement error. The former is caused by a variety of circumstances, including variability in the broadcast frequency and packet losses. The latter occurs when the received AIS data value does not match the true value of the sensor at the moment of measurement, and the deviation can be considerable [7,27]. These two issues may result in data loss, sparsity, and offset. Thus, data processing steps such as ship trajectory extraction, missing value recognition and completion, and data cleaning are vital stages in deep learning, as processed data enable model performance to be improved.
AIS data are multidimensional and multiparametric in nature and are used to characterize ship behavior, such as the direction, position, and speed of the ship, as they change over time [28]. The records were first grouped by ship according to the Maritime Mobile Service Identity (MMSI) and then sorted by timestamp. To handle missing, deviating, and sparse AIS data in the original dataset, this section employs the following data processing techniques: ship trajectory extraction, missing value recognition, linear interpolation, and data cleaning.
The method for extracting the ship trajectory is based on time intervals and navigation speed. When the time interval between ship trajectory points reaches 6 h, or the ship navigation speed reaches 0, the ship trajectory points are identified as cut points of the trajectory sequence. Each track point contains the longitude and latitude of the ship, as well as its navigation speed and direction.
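The splitting rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `(timestamp_s, lat, lon, speed)` tuple layout and the names `split_trajectory` and `MAX_GAP_S` are hypothetical, and treating a zero-speed point as a cut point that is itself dropped is only one possible reading of the rule.

```python
MAX_GAP_S = 6 * 3600  # the 6 h threshold from the text, in seconds


def split_trajectory(points, max_gap_s=MAX_GAP_S):
    """Split one ship's time-sorted points into trajectory segments.

    points: list of (timestamp_s, lat, lon, speed) tuples (assumed layout).
    A new segment starts when the gap to the previous point reaches
    max_gap_s; a point with speed 0 closes the current segment and is dropped.
    """
    segments, current = [], []
    prev_t = None
    for t, lat, lon, speed in points:
        # Cut on a long time gap between consecutive reports.
        if prev_t is not None and (t - prev_t) >= max_gap_s and current:
            segments.append(current)
            current = []
        if speed == 0:
            # Cut on a stationary point (assumption: the point is discarded).
            if current:
                segments.append(current)
                current = []
        else:
            current.append((t, lat, lon, speed))
        prev_t = t
    if current:
        segments.append(current)
    return segments
```

Each returned segment can then be processed independently by the later steps (missing value completion and cleaning).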
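Let $p_{t_k}$ and $p_{t_{k+1}}$ be the two data points closest to the deficient (missing) data point at time $t$, with $t_k < t < t_{k+1}$. The completed data can then be obtained by linear interpolation as follows [29]:

$$p_t = p_{t_k} + \frac{t - t_k}{t_{k+1} - t_k}\left(p_{t_{k+1}} - p_{t_k}\right) \quad (1)$$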
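Completing a missing point by linear interpolation between its two nearest known neighbors can be sketched as below. The tuple layout `(timestamp, lat, lon, speed)` and the function name `interpolate_point` are assumptions made for the example.

```python
def interpolate_point(t, p_k, p_k1):
    """Linearly interpolate a track point at time t.

    p_k and p_k1 are (timestamp, lat, lon, speed) tuples (assumed layout)
    that bracket the missing time t, with p_k[0] < t < p_k1[0].
    """
    t_k, t_k1 = p_k[0], p_k1[0]
    if not t_k < t < t_k1:
        raise ValueError("t must lie strictly between the two known timestamps")
    w = (t - t_k) / (t_k1 - t_k)  # fraction of the interval that has elapsed
    values = tuple(a + w * (b - a) for a, b in zip(p_k[1:], p_k1[1:]))
    return (t,) + values
```

Angular fields such as course or heading wrap around at 360°, so they would need separate handling before being interpolated this way; the sketch covers only fields that vary linearly.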

Comparison of GRU and LSTM
To address the gradient-vanishing and gradient-explosion problems of RNN, LSTM introduces a memory unit and a gate mechanism to replace the hidden layer unit in RNN [15,16]. The LSTM modifies the current state of the memory cell and determines the output content via the forget gate, input gate, and output gate, which in this paper correspond to resetting the previous state, writing, and reading the ship track characteristic data sequence, respectively. GRU is a variation of LSTM that requires only an update gate and a reset gate to govern the flow of information. Because it has fewer parameters than LSTM, it is easier to train and can respond more effectively to the influence of this information on the inputs at the current time [18,29]. The LSTM and GRU neural network structures are compared in Figure 2 [17,18]. The calculation process of LSTM is described first.
The amount of memory cell information retained from the previous moment is controlled by the forget gate ($f_t$).
The input gate ($i_t$) controls the amount of information updated by the memory unit. $\tilde{C}_t$ is a candidate vector produced by the tanh layer that will be added to the cell state. The previous cell state $C_{t-1}$ is then combined with $\tilde{C}_t$ to update the cell state $C_t$.
The output gate ($o_t$) controls the amount of information output to the next hidden state. The output value is passed on as the hidden state ($h_t$) to the next unit, completing the calculation.
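For reference, the three gates described above are commonly written as follows. This is the standard LSTM formulation, given here on the assumption that the paper follows it; $[h_{t-1}, x_t]$ denotes concatenation, $\odot$ element-wise multiplication, $\sigma$ the sigmoid function, and $W$, $b$ the weights and biases of each gate.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
```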
The calculation process of GRU is described as follows.
The reset gate ($r_t$) determines how to combine the new input information with the previous memory. When it is turned off, GRU cells effectively forget the previous calculation and return to the state in which they are reading the first input of the sequence, thereby achieving the reset.
The update gate ($z_t$) determines the activation status of GRU cells and the degree to which their content is updated.
The reset gate is applied to the $h_{t-1}$ vector, and the result is spliced (concatenated) with $x_t$. The spliced vector is transformed by the tanh function into a vector with elements between −1 and 1, giving the candidate hidden state value. Through the above steps, the final hidden layer output can be obtained.
Here, [ ] denotes the concatenation (splicing) of two vectors, $\cdot$ denotes matrix multiplication, $\odot$ denotes element-wise multiplication, $W$ and $b$ are the weight and bias terms of the corresponding gates, respectively, and $\sigma$ is the sigmoid activation function.
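A single GRU step can be sketched in plain Python to make the gate interactions concrete. This follows the standard GRU formulation with the convention $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$, which is an assumption about the paper's equations (some libraries swap the roles of $z_t$ and $1 - z_t$). The `params` dictionary layout and all function names are hypothetical.

```python
import math


def sigmoid(v):
    """Element-wise logistic function on a list of floats."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def affine(W, v, b):
    """Return W . v + b, with W given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + bi for row, bi in zip(W, b)]


def gru_step(x_t, h_prev, params):
    """One GRU step; params maps 'z', 'r', 'h' to (W, b) pairs (assumed layout)."""
    W_z, b_z = params["z"]
    W_r, b_r = params["r"]
    W_h, b_h = params["h"]
    hx = h_prev + x_t                   # list concatenation: [h_{t-1}, x_t]
    z = sigmoid(affine(W_z, hx, b_z))   # update gate z_t
    r = sigmoid(affine(W_r, hx, b_r))   # reset gate r_t
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]  # r_t applied to h_{t-1}
    # Candidate hidden state: tanh keeps every element between -1 and 1.
    cand = [math.tanh(a) for a in affine(W_h, reset_h + x_t, b_h)]
    # Blend the old state and the candidate according to the update gate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]
```

With all weights and biases set to zero, both gates evaluate to 0.5 and the candidate is 0, so the step simply halves the previous hidden state, which is a convenient sanity check.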
Overall, as shown in Figure 2, GRU merges $f_t$ and $i_t$ of the LSTM unit into $z_t$, and it merges the hidden state and cell state of the LSTM, using $r_t$ to control how much of the previous state information is ignored, so as to regulate the flow of vessel trajectory information. On this basis, GRU preserves the most critical data in order to avoid information loss during long-term propagation. Because the structure of GRU is simpler than that of LSTM, fewer parameters must be trained, and training is correspondingly faster.

Application of Bidirectional RNN Structure
The bidirectional recurrent neural network structure enables the output layer to receive complete past and future information for each point in the input sequence. More precisely, the forward RNN learns from previous data, while the backward RNN learns from future data, so that each time step makes full use of both preceding and following context. These two outputs are then spliced together as the final output of the whole bidirectional RNN [21,30].
Accordingly, BiGRU is a bidirectional RNN that employs the GRU for each hidden node [31]. BiGRU divides GRU neurons into forward and backward layers that correspond to the positive and negative time directions, respectively.
As shown in Figure 3 [21,29], the current hidden state of BiGRU is determined by the current input $x_t$, the hidden state output by the forward layer, and the hidden state output by the backward layer.
In conclusion, BiGRU enables the modeling of the potential association between historical and future ship trajectory status information with the current state, hence increasing forecast accuracy.

Application of MHA Mechanism
The attention-based model originated in the field of image recognition and can now be used in place of RNN in machine translation. By assigning a different weight to each factor in the input sequence, the attention-based model highlights the most significant influencing factors, thereby increasing the model's accuracy [26]. Let $x_i$ denote the elements of the input sequence. The score of each $x_i$ is mapped into the interval (0, 1) through the normalized exponential (softmax) function, which gives its "weight". The dot-product attention is then the weighted combination of the $x_i$.
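The weighting scheme just described can be illustrated with a small, self-contained sketch. The use of a separate `query` vector to score each input is an assumption made for the example (the text only names the inputs $x_i$), and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Normalized exponential: maps scores to positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, xs):
    """Weight each input vector x_i by softmax(query . x_i) and sum them."""
    scores = [sum(q * a for q, a in zip(query, x)) for x in xs]
    weights = softmax(scores)
    dim = len(xs[0])
    context = [sum(w * x[d] for w, x in zip(weights, xs)) for d in range(dim)]
    return weights, context
```

Inputs that score higher against the query receive larger weights and therefore contribute more to the combined output, which is the sense in which attention "highlights" the most influential factors.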
With attention-based models being widely used in image and natural language processing tasks, the multi-head attention (MHA) mechanism emerged to meet their needs [32]. An MHA is a combination of multiple self-attention structures. Using the query and key, the MHA mechanism calculates the weight coefficient of the corresponding value and then performs a weighted summation. MHA works by applying a linear transformation to the query, key, and value and then feeding them into scaled dot-product attention; this process is repeated a number of times, once per head. The linear transformation parameters $W$ for $Q$, $K$, and $V$ are different for each head; they are not shared. Rather than using simple maximum or average pooling, MHA is used here to process the data from the BiGRU output layer. Thus, the multi-head attention mechanism, which is a combination of multiple attention-based models, can be regarded as a weighting scheme for information: it assigns weights to the hidden layer of BiGRU, so that information sources are used more rationally when making predictions.
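For reference, multi-head attention is usually written as below. This is the standard formulation of Vaswani et al. [25], given on the assumption that the paper uses it unchanged; $d_k$ is the key dimension, $h$ the number of heads, and $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$, $W^{O}$ the per-head and output projection matrices.

```latex
\begin{aligned}
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \\
\mathrm{head}_i &= \mathrm{Attention}\left(Q W_i^{Q},\; K W_i^{K},\; V W_i^{V}\right), \quad i = 1, \dots, h \\
\mathrm{MultiHead}(Q, K, V) &= \mathrm{Concat}\left(\mathrm{head}_1, \dots, \mathrm{head}_h\right) W^{O}
\end{aligned}
```

In the setting described here, $Q$, $K$, and $V$ would all be derived from the BiGRU hidden states, which is what makes the structure self-attention.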

MHA-BiGRU Model
By combining the advantages of bidirectional RNN, the multi-head attention mechanism, and GRU, the MHA-BiGRU model is proposed as an easy-to-implement method suitable for quickly and succinctly analyzing ship trajectory. This model improves the prediction efficiency and accuracy of ship trajectory. This section introduces the MHA-BiGRU model in a hierarchical fashion and demonstrates the benefits of this method. The MHA-BiGRU model retains long-term ship track sequence information, filters and modifies ship track historical data for enhanced time series prediction via GRU, and models the potential association between historical and future ship trajectory status information with the current state via the BiGRU structure, thereby increasing forecast accuracy.
Then, in order to resolve the common problems associated with RNN-based approaches, namely that AIS data are treated in relative isolation and that elements such as AIS data characteristics and ship track sequence information are overlooked, it is essential to implement an MHA mechanism on top of the BiGRU structure. Firstly, this method allows for the calculation of the correlation between AIS information characteristics, such as time, latitude, longitude, speed, course, and heading, and of their importance to the overall result. That is, a weighted representation is obtained from the attention scores and then passed into a feedforward neural network to obtain a new representation that takes into account the correlation between the various parameters.
Secondly, because a fixed-length vector can hardly summarize the complete track sequence information, and information input later into BiGRU dilutes the information of earlier vectors to a certain extent, the accuracy with which a fixed context vector represents the track data gradually decreases. In addition, ship operation in the application scenario changes dynamically with time. To address these issues, the MHA mechanism can actively learn the degree of cross-time synchronization between the hidden layers of the output and input sequences and assign different weights to the result based on the input criterion, thereby improving the accuracy and robustness of the overall model.

Experimental Dataset
The following comparative experiments were performed in a software environment of Python 3.7, Keras 2.1.4, and PyTorch 4.0. The original AIS data for August were collected in the coastal waters near the port entrance of Lianyungang, China. In order to avoid overfitting, the original AIS dataset characteristics of MMSI, Time, Latitude, Longitude, Speed, Course, and Heading were selected for deep learning. The experimental dataset was then created by completing data processing, including data extraction, missing value recognition and completion, data cleaning, and characteristics extraction from the original AIS dataset, and it can be divided into the training set, the validation set, and the test set. The training set was used to train the model and determine its weights, biases, and other parameters. The validation set was used to validate the model's performance and enhance its generalizability by adjusting the hyper parameters. After training, the test set was used to evaluate the final model [33]. The first 80% of this experimental dataset served as the training set, while the other 20% served as the test set. In order to accelerate the convergence of the model and enhance its precision, maximum-minimum normalization was applied, so that all of the values lie within the interval [0, 1].
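The normalization and the chronological 80/20 split described above can be sketched as follows. The function names are hypothetical, and mapping a constant column to zeros is a choice made here to avoid division by zero rather than something the paper specifies.

```python
def min_max_normalize(column):
    """Scale a list of numbers into [0, 1] using its own min and max."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: no spread to scale (assumed convention).
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def chronological_split(samples, train_ratio=0.8):
    """Keep time order: the first part trains, the remainder tests."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]
```

A common refinement, not stated in the paper, is to compute the minimum and maximum on the training portion only and reuse them for the test portion, so that no information from the test data leaks into training.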
Figure 5 depicts the application of the sliding window method for data training in this experiment [34]. The window slides forward one unit at a time; at each position, the trajectory data inside the window serve as the input for forecasting the data that follow, until all of the data in the training set have been traversed, which completes one training epoch.
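A minimal sketch of building input and target pairs with a sliding window is shown below. The names `sliding_windows`, `window`, and `horizon` are assumptions for illustration; the paper does not state the window length it used.

```python
def sliding_windows(series, window, horizon=1):
    """Return (input, target) pairs by sliding a window one step at a time.

    series: time-ordered list of samples (scalars or feature tuples).
    window: number of past samples used as input.
    horizon: number of following samples to predict.
    """
    pairs = []
    for start in range(len(series) - window - horizon + 1):
        x = series[start:start + window]
        y = series[start + window:start + window + horizon]
        pairs.append((x, y))
    return pairs
```

For example, a series of five points with a window of three yields two training pairs, each predicting the point that immediately follows its window.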

Hyper Parameters Setting
Parameter design plays a crucial role in recurrent neural networks. In this experiment, the Adam optimizer was chosen, which combines the benefits of gradient descent with an adaptive learning rate and gradient descent with momentum, so that it not only adapts to sparse gradients but also mitigates gradient oscillation. The initial learning rate was set to 0.002, and the learning rate was reduced adaptively in each training cycle [35], since the model performs better if the learning rate is gradually reduced. Training ends when the loss value approaches 0 or the maximum number of training epochs is reached. Then, MHA-BiGRU was used to compare and validate the fundamental parameters, namely the number of units in the hidden layer and the batch size, in order to determine the optimal parameter combination. The selectable range of the former was {16, 32, 64, 128}, and that of the latter was {16, 32, 64, 128}. In particular, when the number of units in the hidden layer is large, the complexity of the model increases and it is prone to overfitting, but when the number is small, the nonlinear fitting ability may be weakened. Additionally, when the batch size is either too large or too small, the prediction error becomes excessive [36]. After several groups of hyper parameter selection experiments based on MHA-BiGRU, the parameters selected in this experiment are shown in Table 1. To compare prediction performance fairly, the parameters of the other methods were set consistently with those of MHA-BiGRU.

Evaluation Index
This experiment selected mean squared error (MSE) as the loss function of the proposed model, allowing the overall degree of error to be quantified. The more robust the model, the smaller the loss. Mean absolute error (MAE) and root mean squared error (RMSE) served as the evaluation indices for each method. The smaller these values, the more closely the predicted value matches the actual value, and thus the more accurate the prediction. These indices are computed as described below.
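The three indices have standard definitions, which can be written out directly. For $n$ samples with actual values $y_i$ and predictions $\hat{y}_i$: MSE is the mean of $(y_i - \hat{y}_i)^2$, RMSE is its square root, and MAE is the mean of $|y_i - \hat{y}_i|$. The function names below are chosen for this sketch.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
```

Because RMSE squares the errors before averaging, it penalizes occasional large deviations more heavily than MAE, which is why reporting both gives a fuller picture of prediction quality.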

Comparative Experiment Based on the Bidirectional RNN Structure
BiGRU, BiLSTM, GRU, and LSTM were used in the bidirectional RNN structure comparison experiment to compare prediction results. The model with the lower loss on new data has superior generalization performance and is less affected by overfitting. The prediction accuracy of these methods is illustrated in Figure 6: BiGRU is superior to GRU, and BiLSTM is superior to LSTM.

Overall Comparative Experiments
As illustrated in Table 2 and the accompanying figures, all eight methods perform well on the test dataset, and the ship trajectory prediction results are relatively accurate, demonstrating not only that the models were not overfitted but also that all of these models can handle the problem of track prediction well. The model with the lowest loss value has the highest accuracy. As a result, the prediction accuracies of these algorithms are ranked in the following order: MHA-BiGRU > MHA-GRU > MHA-BiLSTM > BiGRU > MHA-LSTM > GRU > BiLSTM > LSTM. These results show that GRU can outperform LSTM in this comparison scenario, both in terms of efficiency and accuracy, regardless of whether the model is combined with a bidirectional structure, the MHA mechanism, or neither. In addition, MHA-GRU is superior to BiGRU, and MHA-LSTM is superior to BiLSTM, which may indicate that the MHA mechanism contributes more to the model's accuracy and robustness than the bidirectional structure.
Finally, and most importantly, the MHA-BiGRU model has the lowest loss, RMSE, and MAE values, indicating that its applicability, accuracy, and validity are superior to those of the other models in the comparison. This is also reflected in Table 2 and in the figures comparing the predicted and original trajectories.

Discussion
By gradually demonstrating the benefits of bidirectional RNN, multi-head attention mechanism, and GRU, the comparative experiment results demonstrated that MHA-BiGRU outperforms other models in terms of effectiveness and accuracy of ship trajectory prediction.

The Contribution of the MHA-BiGRU Model
The LSTM and GRU, excellent variants of RNN, have a gate structure that not only preserves long-term sequence information but also filters and modifies ship track historical data for enhanced time series prediction. In comparison with LSTM, GRU can accomplish the prediction task with fewer model parameters while performing similarly to LSTM [17–19]. This experiment finds that GRU can outperform LSTM in this comparison scenario, both in terms of efficiency and accuracy, regardless of whether the model is combined with a bidirectional structure, the MHA mechanism, or neither. Although GRU outperforms LSTM in this experiment, there is no final conclusion on which is better in general; the choice must be made based on the specific task and dataset.
Gao et al. and Siami-Namini et al. showed that using a bidirectional structure to improve contextual relevance based on the RNN method improves the accuracy of ship trajectory prediction compared to RNN alone [22,23]. Whether combined with LSTM or GRU, this experiment further demonstrated that the bidirectional structure can improve the accuracy of ship trajectory prediction. This finding supports the view that the bidirectional RNN structure can model the potential relationship between past and future ship trajectory status information and the current state in order to increase prediction accuracy.
The MHA mechanism is frequently employed in image recognition and automatic translation. It was combined with a recurrent neural network in this experiment, from which significant conclusions are drawn. The most important results of the comparative experiments demonstrate the advantage of the MHA mechanism in combination with RNN and BiRNN. Additionally, when compared to bidirectional structures, the MHA mechanism contributes significantly more to the model's accuracy and robustness. Thus, the MHA mechanism not only calculates the correlation between the characteristics of AIS information but also actively learns cross-time synchronization between the hidden layers of the output and input sequences, and it assigns different weights to the result based on the input criterion, thereby improving the overall model's accuracy and robustness.
Overall, the most crucial advantage of MHA-BiGRU is that it enables the preservation of long-term sequence information, filters and modifies ship track historical data for improved time series prediction, models the potential relationship between historical and future ship trajectory status information and the current state via a bidirectional structure, and highlights critical ship trajectory prediction information in AIS characteristics and time series dimension via an MHA mechanism.

The Limitations and Future Development
Experiments indicate that the MHA-BiGRU model has high prediction accuracy under normal navigation conditions, as well as good applicability and track prediction reliability. However, as the navigational status of each ship changes over time, the navigational status of other ships will have varying effects on the future course of the ship in question. Additionally, bad weather will affect the ship's navigation, leading to an abnormal ship trajectory. Moreover, AIS data can be supplemented with data from other systems, such as radar, to further increase the model's accuracy. Thus, in order to further investigate whether the model can help correct course and avoid ship collisions under abnormal conditions, it is necessary to combine other ship spatial information and bad weather information to verify the model's performance under abnormal circumstances.

Conclusions
To improve the performance of vessel track prediction compared with some existing approaches, a high-precision method that combines the advantages of bidirectional RNN, the multi-head attention mechanism, and GRU is proposed in this paper. Through comparison experiments, the following conclusions can be drawn:
1. GRU is a variation of LSTM that requires only an update gate and a reset gate to govern the flow of information. Because it has fewer parameters than LSTM, it is easier to train, and it outperforms LSTM in terms of efficiency and accuracy in this experiment.
2. The bidirectional RNN structure enables the modeling of the potential association between previous and future ship trajectory status information with the current state, so as to increase prediction accuracy.
3. The MHA mechanism not only calculates the correlation between AIS information characteristics but also actively learns cross-time synchronization between the hidden layers of the output and input sequences, and it assigns different weights to the result based on the input criterion, improving the accuracy and robustness of the overall model.
4. In general, all evaluation indicators show that the prediction accuracy of MHA-BiGRU is higher than that of the other models in the comparison, implying that the proposed model can effectively improve ship trajectory prediction performance.
In the future, it will be necessary to combine other ship spatial information and bad weather information to verify the model's performance under abnormal conditions in order to further investigate whether the model can correct and avoid ship collisions in extreme circumstances.

Figure 1. Flowchart of the ship trajectory prediction method.

Additionally, to address ship trajectory deviation and sparse data, let $p_i$ be the current track point. If the distance between the current track point $p_i$ and its adjacent track points $p_{i-1}$ and $p_{i+1}$ is greater than the threshold, the adjacent track points $p_{i-1}$ and $p_{i+1}$ should be used as the observation points for linear fitting. When the track is too sparse and a significant amount of data are missing, the sparse ship trajectory is removed and no longer used.
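The deviation check can be sketched as follows. This is one reading of the rule, under stated assumptions: points are `(t, x, y)` tuples in planar coordinates, the point is treated as deviating only when it is farther than the threshold from both neighbors, and "linear fitting" is taken to mean replacing the point with the value on the straight line between its neighbors. The name `repair_outliers` is hypothetical.

```python
import math


def repair_outliers(track, threshold):
    """Replace deviating points by a linear fit through their neighbors.

    track: time-ordered list of (t, x, y) tuples in planar units (assumed).
    threshold: maximum allowed distance to each adjacent point.
    Returns a repaired copy; the input list is not modified.
    """
    fixed = list(track)
    for i in range(1, len(track) - 1):
        prev_p, cur, next_p = track[i - 1], track[i], track[i + 1]
        d_prev = math.hypot(cur[1] - prev_p[1], cur[2] - prev_p[2])
        d_next = math.hypot(cur[1] - next_p[1], cur[2] - next_p[2])
        if d_prev > threshold and d_next > threshold:
            # Position on the line between the neighbors at time cur[0].
            w = (cur[0] - prev_p[0]) / (next_p[0] - prev_p[0])
            x = prev_p[1] + w * (next_p[1] - prev_p[1])
            y = prev_p[2] + w * (next_p[2] - prev_p[2])
            fixed[i] = (cur[0], x, y)
    return fixed
```

With raw latitude and longitude, a great-circle distance would be a more appropriate measure than the planar distance used in this sketch.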

Figure 2. Comparison of LSTM and GRU neural network structures.
BiGRU can be regarded as two single GRUs. The hidden layer state of BiGRU at time $t$ can be obtained as the weighted sum of the forward hidden state $\overrightarrow{h}_{t-1}$ and the backward hidden state $\overleftarrow{h}_{t-1}$, which can be shown as follows:
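A commonly used way of writing this combination is given below. It is offered as an assumption about the form of the missing equation, not as the paper's exact notation: $w_t$ and $v_t$ are the weights of the forward and backward hidden states, $b_t$ is a bias term, and the index $t-1$ in the backward layer refers to the previous step in that layer's own reversed processing order.

```latex
\begin{aligned}
\overrightarrow{h}_t &= \mathrm{GRU}\left(x_t, \overrightarrow{h}_{t-1}\right) \\
\overleftarrow{h}_t &= \mathrm{GRU}\left(x_t, \overleftarrow{h}_{t-1}\right) \\
h_t &= w_t \overrightarrow{h}_t + v_t \overleftarrow{h}_t + b_t
\end{aligned}
```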

Figure 4 vividly illustrates the structure of the proposed model.

Figure 5. Model training over time, where the blue block represents the input training data and the green one represents the prediction result for each batch of input.

Figure 6. Comparative experiment loss value based on the bidirectional RNN structure.

Comparative Experiment Based on the MHA Mechanism
Figure 7 shows the comparative experiment results based on the MHA mechanism. The prediction accuracies of these algorithms compare as follows: MHA-BiGRU > BiGRU, MHA-BiLSTM > BiLSTM, MHA-GRU > GRU, MHA-LSTM > LSTM. The results of these comparative experiments show that combining the MHA mechanism with either RNN or BiRNN is preferable to the same model without the MHA mechanism.

Figure 7. Comparative experiment loss value based on the MHA mechanism: (a) the combination of the MHA mechanism with RNN; (b) the combination of the MHA mechanism with bidirectional RNN.
As shown in Figure 10, MHA-BiGRU comes closest to the original ship's path, which shows that this model's prediction is the most accurate in this comparison.

Figure 8. Loss value of overall comparative experiments.

Figure 9. The comparative prediction results: (a) the predicted longitude; (b) the predicted latitude.

Table 1. The hyper parameters selection of comparative experiments.

Table 2. RMSE and MAE value comparison among all models.