A Ship Trajectory Prediction Model Based on Attention-BILSTM Optimized by the Whale Optimization Algorithm

Abstract: Nowadays, maritime transportation has become one of the most important modes of international trade. However, with the increase in ship transportation, the complex maritime environment has led to frequent traffic accidents, causing huge economic losses and safety hazards. For ships in maritime transportation, collision avoidance and route planning can be achieved by predicting the ship's trajectory, which can warn crews to avoid dangers. Predicting the ship's trajectory more accurately is therefore of great significance for risk avoidance. However, existing ship trajectory prediction models suffer from problems such as poor prediction accuracy, poor applicability, and difficult hyperparameter design. To address these issues, this paper adopts the Bidirectional Long Short-Term Memory (BILSTM) model as the base model, as it considers the contextual information of time-series data more comprehensively. Meanwhile, to improve the accuracy and fit of complex ship trajectories, this paper adds an attention mechanism to the BILSTM model to increase the weight of key information. In addition, to solve the problem of difficult hyperparameter design, this paper optimizes the hyperparameters of the Attention-BILSTM network with the Whale Optimization Algorithm (WOA). In this paper, the AIS data are filtered, and trajectories are completed by the cubic spline interpolation method. Using the pre-processed AIS data, the WOA-Attention-BILSTM model is compared with and assessed against traditional models. The results show that, compared with other models, the WOA-Attention-BILSTM prediction model has high prediction accuracy, high applicability, and high stability, providing an effective and feasible method for ship collision avoidance, maritime surveillance, and intelligent shipping.


Introduction
Maritime trade is the most important form of international trade; according to the 2019 Global Maritime Transport Development Report published by the United Nations Conference on Trade and Development (UNCTAD), maritime trade carries more than 80% of the world's trade volume [1]. The increasing demand for waterway transportation and the growing traffic pressure in complex areas such as coasts and harbors increase the risk of traffic accidents such as ship collisions [2][3][4]. In 2021, there were 126 traffic accidents involving transport vessels in China, resulting in 150 deaths, 46 sunk vessels, and direct economic losses of RMB 226 million. Accident investigation reports show that more than 80% of marine accidents were caused by human factors, and 70% of these collisions were caused by the negligence of lookout personnel [5]. The Internet of Things (IoT) has played a significant role in the transportation domain [6]. At present, ship collision avoidance mainly relies on ARPA radar, the Ship Automatic Identification System (AIS), and other equipment as aids; the analysis of the navigational situation and the formulation of collision avoidance decisions depend on the experience of the crew [7]. Due to crew fatigue and the inexperience of junior crew members, relying only on manual decision making often affects navigational safety. Therefore, the development of intelligent ship navigation and unmanned ships is an inevitable trend.
Ship trajectory prediction is an important method for the intelligent perception of ships. During navigation, ships inevitably interact with other ships in the surrounding area. In this increasingly complex navigation environment, ship trajectory prediction can provide information and reference for ship avoidance decisions, which is key to improving ship perception and ensuring ship safety. Therefore, improving the accuracy of ship trajectory prediction is of great significance for ship collision avoidance, intelligent navigation, and collision crisis warning [8][9][10][11][12][13][14][15].
Much research has been conducted in this area, and methods for modeling and predicting ship trajectories can be broadly divided into methods based on traditional models and methods based on neural network models. Ship trajectory prediction is a nonlinear problem, and for nonlinear problems, a finite element scheme is a good tool [16]. At the early stage of trajectory prediction research, most scholars predicted ship trajectories by constructing physical motion models, including curve models [13], lateral models [17], and ship models [3]. A physical model describes ship operation by analyzing the problem and establishing a series of mathematical formulas, taking into account the influence of various factors during ship navigation as much as possible. Ristic et al. [18] used the adaptive bandwidth kernel density estimation method to statistically analyze historical motion patterns and then used a filter to estimate the ship motion state, achieving good prediction results. The Kalman filter method [19] is able to estimate the future position of a ship's trajectory from a series of noisy measurements when the measurement variance is known, using the dynamic data of the predicted target. Perera et al. [20] improved the systematic noise variance in the Kalman filter so that it changes dynamically with the motion state of the ship, improving the prediction accuracy of the ship's trajectory. Although this type of prediction method can obtain ideal prediction results, it requires the target motion trajectory to satisfy a certain distribution and is more suitable for trajectory prediction research under ideal conditions.
Many scholars choose neural networks as a method for ship trajectory prediction because of their powerful nonlinear fitting ability and parallel computing capability [21], and these prediction methods are becoming more and more popular in the field of ship navigation [15,22,23]. Perera et al. [13] proposed an artificial neural network (ANN) for ship trajectory prediction combined with an extended Kalman filter for ship state prediction. Ma et al. [24] proposed a 4D trajectory prediction model based on a BP neural network to address the problem that traditional trajectory prediction methods cannot meet requirements for high accuracy, multidimensionality, and real-time performance. However, the BP neural network is weak in dealing with nonlinear problems, and its predictions are accurate only when the trajectory is short. Considering that BP neural networks cannot extract the contextual features in the data and cannot fully utilize the historical voyage information of ships, many scholars have turned their attention to recurrent neural networks. A recurrent neural network has memory: its hidden layer neurons remember previous information and apply it to the current output calculation, which is a great advantage for predicting time series data [25]. Gao et al. [26] introduced physical assumptions to balance complexity and accuracy on the basis of LSTM, based on the navigational trajectory characteristics of AIS data, and verified the effectiveness of LSTM for the trajectory prediction problem. Park et al. [5] established a BILSTM based on trajectory clustering and compared it with GRU and LSTM; the experimental results showed that BILSTM has the highest prediction accuracy. Wu et al. [27] proposed CNN-BILSTM to improve the prediction accuracy of the trajectory prediction model: it constructs feature vectors with a CNN to better mine the relationships in the data and feeds the feature vectors as time series data into the BILSTM network for ship trajectory prediction.
For the ship trajectory prediction problem, due to the large mass and inertia of ships, the current motion state of a ship is strongly correlated with its previous motion state. Therefore, effectively extracting context-related information about the ship's motion behavior is crucial to improving trajectory prediction accuracy. To enable LSTM units to retain longer-term context information, many researchers have adopted attention mechanisms to enhance the model's focus on key information, thus preserving more useful information in the same LSTM units. Ma et al. [28] used an Attention-BILSTM to predict collision risk for ships and conducted experiments using real ship trajectory data, demonstrating that the proposed Attention-BILSTM outperforms traditional LSTM in terms of accuracy and stability. Xue et al. [29] proposed a goal-driven self-attentive recurrent network for trajectory prediction, conducted extensive experiments on publicly available datasets, and showed that the approach performs on par with state-of-the-art techniques while reducing model complexity. Jiang et al. [30] proposed a spatial attention mechanism based on LSTM to capture the spatial relationship between prediction targets and also designed a temporal attention mechanism to distinguish the impact of different historical time steps on future trajectory prediction. The proposed model was experimentally verified on public datasets and demonstrated superior performance.
Neural network models have shown good performance in ship trajectory prediction. However, these models have multiple hyperparameters that need to be optimized, which is crucial for prediction performance [31]. Traditional hyperparameter optimization methods usually require a large number of experiments, are inefficient, and are prone to getting stuck in local optima [32]. Therefore, in this study, we use the Whale Optimization Algorithm (WOA), a swarm intelligence optimization algorithm, to find the best hyperparameter combination for the LSTM in order to improve the accuracy and stability of trajectory prediction. The WOA [33] is a global optimization algorithm based on bionics; it is inspired by the feeding behavior of whales and simulates this natural behavior to solve optimization problems. In recent years, the WOA has been widely used in the field of neural network optimization. Han et al. [34] proposed using WOA to optimize ship paths in complex marine environments and compared it with the GWO and PSO algorithms; the results show that WOA has better performance and a very high ability to avoid local optima. Zhang et al. [35] used an improved WOA to optimize a GRU and demonstrated the effectiveness of the proposed model on three datasets. Yang et al. [36] used an improved WOA (IWOA) to establish an IWOA-LSTM for prediction and compared it with 11 models, demonstrating the effectiveness of the proposed model.
In summary, most existing ship trajectory prediction models rely on manual experience to set parameters, which can easily lead to local optima. Meanwhile, the long-term information extraction capability of traditional recurrent neural networks has limitations that need to be addressed.
In this paper, we propose a WOA-Attention-BILSTM model for ship trajectory prediction. Compared to existing methods, our model can selectively utilize valuable features in historical trajectories, allowing the LSTM units to retain more contextual information from past trajectories. At the same time, it has excellent parameter design ability, avoiding the disadvantage that manually set parameters easily fall into local optima. The main contributions of this paper can be summarized as follows:

•
In this paper, we employ a bidirectional LSTM structure to enhance the correlation between motion states at different time points in ship trajectory prediction tasks.

•
As the impact of the ship's motion behavior on future trajectories varies during the sailing process, an attention mechanism is introduced into the model to better capture important information in trajectory sequences and preserve more historical context information.

•
In order to reduce the influence of manually setting parameters on model accuracy and avoid falling into local optima, we utilize the powerful optimization ability of WOA to optimize the model's hyperparameters.

•
The designed model aims to improve the ship's safety awareness and perceive collision risks in advance, leaving enough time for the crew to make decisions and reducing the occurrence of collision accidents. Through experiments on a real AIS dataset near the coast, the proposed model is shown to have high accuracy and strong parameter design capabilities, effectively providing information support for intelligent ship navigation.
The paper is organized as follows: The first part introduces the background, significance, and literature review of the research. The second part describes the trajectory prediction problem and introduces the models and methods involved in this paper. The third part establishes the WOA-Attention-BILSTM model and introduces its process and principles. The fourth part presents the analysis of experiments and results, which begins by detailing the processing method used for the AIS dataset, followed by a comparison of WOA-Attention-BILSTM with Attention-BILSTM, BILSTM, LSTM, BP, and CNN-LSTM; this part also includes visualizations of the predicted trajectories and distance errors for a more comprehensive performance analysis of the model. Finally, the fifth part concludes the paper and suggests future directions for improvement.

Ship Trajectory Pattern Definition
The ship trajectory prediction problem can be abstracted as a ship time series prediction problem. After cleaning and resampling a large amount of AIS data, the ship trajectory data consist of a series of discrete data points in a given time range, as shown in Figure 1.


In our setup, T is a time-ordered observation sequence; the discrete point p_i is a four-dimensional data point, which includes SOG (Speed Over Ground), COG (Course Over Ground), LAT (latitude), and LON (longitude); and t_i is the timestamp corresponding to the four-dimensional discrete data point, as shown in Equation (1):

T = ((p_1, t_1), (p_2, t_2), ..., (p_n, t_n)), p_i = (SOG_i, COG_i, LAT_i, LON_i) (1)

where i is the index of a point in the trajectory sequence, and n is the total number of points in the sequence. Thus, a dataset containing N trajectories can be expressed as {T_1, T_2, ..., T_N}, where p_i is the four-dimensional feature vector defined in Equation (1) and t_i is the observation timestamp of the corresponding AIS data.
The ship trajectory prediction problem can be described as inputting the s actually observed trajectory points into the model, with the aim of letting the model learn the variation function of the ship trajectory and obtain the k predicted trajectory points output by the model, as shown in Equation (2). The mapping relationship can be expressed as a mathematical formula, as in Equation (3),

where x_{i,s} indicates that at the i-th trajectory point, s observation sequences are input to the model for prediction, f_{s,k} indicates that the model predicts k trajectory points from the s input points, and y_{i,k} indicates the k trajectory points predicted by the model at the i-th trajectory point. The data are input to the model for training through a sliding time window, as shown in Figure 2.
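The sliding-window construction described above can be sketched as follows; the array sizes and window lengths here are illustrative, not the paper's experimental settings:

```python
import numpy as np

def make_sliding_windows(track, s, k):
    """Split one trajectory (n x 4 array of SOG, COG, LAT, LON) into
    (input, target) pairs: s observed points -> k future points."""
    X, Y = [], []
    for i in range(len(track) - s - k + 1):
        X.append(track[i:i + s])          # x_{i,s}: s observation steps
        Y.append(track[i + s:i + s + k])  # y_{i,k}: k prediction steps
    return np.array(X), np.array(Y)

# hypothetical trajectory of 10 points with 4 features each
track = np.arange(40, dtype=float).reshape(10, 4)
X, Y = make_sliding_windows(track, s=5, k=2)
print(X.shape, Y.shape)  # (4, 5, 4) (4, 2, 4)
```

Each window slides forward one time step, so consecutive samples overlap by s − 1 points, which is how the model sees every transition in the trajectory during training.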


LSTM and BILSTM
Recurrent Neural Network (RNN) is a class of neural network with short-term memory capability. It has the characteristics of memorability and parameter sharing and can make full use of the historical features of the data, so it gives better results when dealing with time series problems. However, RNN also has its drawbacks: if the delay of the time series is too long, RNN suffers from gradient explosion and vanishing gradients. The Long Short-Term Memory (LSTM) network is a special kind of RNN, which adds memory cells to selectively remember important information and eliminate noisy information, along with three gating units: the forget gate, input gate, and output gate, so that the LSTM can better control the global memory cells and "remember" information for a longer period of time. LSTM has demonstrated excellent performance in processing time series data, making it particularly suitable for handling AIS data with significant temporal patterns and large volumes of data. Through its ability to effectively extract latent change patterns in the information, LSTM is well suited to nonlinear prediction tasks such as ship trajectory prediction.
The structure of the LSTM cell is shown in Figure 3. The input data of the model are the processed ship trajectory time series: x_t is the data at moment t, c is the cell state, h is the hidden state, and c_{t−1} and h_{t−1} are the long-term and short-term memory values at moment t − 1, respectively; the outputs are the memory value c_t and the hidden value h_t at moment t. The forget gate f_t in the unit performs selective memory of information by taking the information of the memory unit c_{t−1} at the previous moment, with f_t as in Equation (4).
where σ is the sigmoid activation function, b_f is the bias term, and w is the matrix of weights between the layer and each gate. The input gate i_t determines the important information to be remembered; its variables are constrained to [0, 1] by the sigmoid activation function, with the output i_t as in Equation (5).
The state change of the cell c_t is determined by the outputs of the forget gate and the input gate, as given by Equation (6),
where tanh denotes the hyperbolic tangent activation function. The output gate o_t performs the transfer of cell state values, with the formula as in Equation (7).
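The gate computations in Equations (4)-(7) can be sketched with NumPy; the weight shapes, random initialization, and hidden size below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following Equations (4)-(7)."""
    z = np.concatenate([h_prev, x_t])     # previous hidden state + current input
    f_t = sigmoid(W["f"] @ z + b["f"])    # Eq. (4): forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])    # Eq. (5): input gate
    c_hat = np.tanh(W["c"] @ z + b["c"])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat      # Eq. (6): cell state update
    o_t = sigmoid(W["o"] @ z + b["o"])    # Eq. (7): output gate
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8                        # 4 AIS features per step; hidden size assumed
W = {k: rng.normal(size=(n_hid, n_hid + n_in)) * 0.1 for k in "fico"}
b = {k: np.zeros(n_hid) for k in "fico"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, b)
print(h.shape)  # (8,)
```

The forget and input gates jointly decide how much of c_{t−1} survives and how much new information enters, which is what lets the cell "remember" over long spans.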
The LSTM has better performance in predicting nonlinear time series data, but it can only learn from previous information, while for the training task of ship trajectory prediction, the ship's motion trajectory is highly coherent, and the ship's position at the next moment is influenced by the combination of the data of the previous moment and the data of the following moment. BILSTM structurally combines the LSTM network in two directions; through the bidirectional structure, the output layer of the network can effectively use the state information of both the preceding cell and the following cell to fully utilize the data for prediction. Its model structure is shown in Figure 4.
The hyperparameters of neural networks are often set by manual experience; thus, the selection of hyperparameters in experiments is somewhat arbitrary, and an unreasonable selection of hyperparameters will affect the accuracy of ship trajectory prediction. Therefore, this paper selects a heuristic algorithm to optimize the network.
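The bidirectional wiring shown in Figure 4 can be sketched as follows; the toy cell below is a stand-in for a full LSTM cell (which would use the gate equations (4)-(7)) and only illustrates how forward and backward hidden states are combined:

```python
import numpy as np

def run_lstm(seq, step_fn, state_size):
    """Run one direction over the sequence, collecting hidden states."""
    h = np.zeros(state_size)
    c = np.zeros(state_size)
    states = []
    for x in seq:
        h, c = step_fn(x, h, c)
        states.append(h)
    return np.stack(states)

def bilstm(seq, fwd_step, bwd_step, state_size):
    """Concatenate forward and (time-realigned) backward hidden states,
    so each output step sees both past and future context."""
    h_fwd = run_lstm(seq, fwd_step, state_size)
    h_bwd = run_lstm(seq[::-1], bwd_step, state_size)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=-1)

def toy_step(x, h, c):
    # placeholder cell: mixes the previous state with the current input
    return np.tanh(0.6 * h + 0.4 * x.mean()), c

seq = np.arange(20, dtype=float).reshape(5, 4)  # 5 steps, 4 AIS features
out = bilstm(seq, toy_step, toy_step, state_size=8)
print(out.shape)  # (5, 16)
```

Note the backward pass is reversed again after running, so the two hidden-state sequences line up step-for-step before concatenation.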


Whale Optimization Algorithm
In this paper, the Whale Optimization Algorithm (WOA) is used to optimize the training hyperparameters, the number of hidden neurons, and the learning rate of the Attention-BILSTM. The WOA [33,37] simulates one of the four predatory behaviors of humpback whales: bubble-net predation. When a humpback whale feeds on a school of fish, one or more whales simultaneously send out bubbles to encircle the fish school like a net and then continue to narrow the bubble net, eventually trapping the school of fish in a small area, flushing it from the bottom up, and swallowing it in one gulp. The WOA has the advantages of strong search capability, fast convergence, and few adjustment parameters, and it is widely used to solve various optimization problems. It consists of three behaviors: "surround the prey", "spiral bubble-net attack on prey", and "search for prey".

1. Surround the prey: After the prey is found, the whales first approach and surround it. The fitness value of each whale in the group is calculated according to the target function, the position of the whale with the highest fitness value is taken as the position of the target prey, and the other whales move according to the position of the target prey. The target function is the function to be optimized, which can be represented by a loss function. In this stage, the position of a whale is updated according to Equations (8) and (9), where X is the position vector of the whale, X* is the position vector of the whale with the highest current fitness, and D is the distance between the whale and the prey. A and C are coefficient vectors specified in Equations (10) and (11), where r_1 and r_2 are random vectors in the interval [0, 1], and a is a vector decreasing from 2 to 0, as in Equation (12), where T_max is the maximum number of iterations.
2. Spiral bubble-net attack on prey: During this stage, the whales continue the behavior of the encircling stage, swimming according to the position of the whale with the highest fitness in the group and continuously encircling and contracting. During this process, the coefficient vector A decreases with the decreasing vector a. In addition, the whales swim in a spiral pattern and blow bubbles; at this point, each whale spirals toward its prey based on the distance between the individual and the whale with the best fitness in the group. The spiral swimming pattern can be expressed by Equations (13) and (14), where D is the distance vector from the whale's current position to the prey, b is a constant that determines the degree of spiraling, and l is a random number in the range [−1, 1]. In this stage, encircling contraction and spiral swimming cannot occur simultaneously, so it is assumed that the two behaviors are executed with equal probability; the integrated position updating process of both is described in Equation (15), where p is a random variable in the interval [0, 1] used to ensure that the two behaviors are carried out with equal probability.
3. Search for prey: In addition to the encircling contraction of whales according to the optimal whale position in the population, there is also a random swimming process: individual whales in this phase move towards the positions of random individuals in the population. This stage is the global search process of the algorithm. The behavior of this stage can be expressed by Equations (16) and (17), where X_rand denotes the current position of a random whale.

According to the principle of the above three stages, the flowchart of WOA is shown in Figure 5.
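The three stages above can be sketched as a minimal WOA; parameter values such as the population size, the iteration budget, and the spiral constant b = 1 are assumptions for illustration, not the paper's settings:

```python
import numpy as np

def woa(fitness, dim, bounds, n_whales=20, t_max=100, seed=0):
    """Minimal Whale Optimization Algorithm: encircling (Eqs. 8-9),
    spiral bubble-net attack (Eqs. 13-14), random search (Eqs. 16-17)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_whales, dim))      # whale positions
    fit = np.apply_along_axis(fitness, 1, X)
    best = X[fit.argmin()].copy()                      # X*: best whale so far
    for t in range(t_max):
        a = 2 - 2 * t / t_max                          # Eq. (12): a decreases 2 -> 0
        for i in range(n_whales):
            r1, r2 = rng.random(dim), rng.random(dim)
            A, C = 2 * a * r1 - a, 2 * r2              # Eqs. (10)-(11)
            if rng.random() < 0.5:                     # Eq. (15): equal probability
                if np.all(np.abs(A) < 1):              # encircle the best whale
                    X[i] = best - A * np.abs(C * best - X[i])
                else:                                  # search: follow a random whale
                    x_rand = X[rng.integers(n_whales)]
                    X[i] = x_rand - A * np.abs(C * x_rand - X[i])
            else:                                      # spiral bubble-net attack
                D = np.abs(best - X[i])                # Eq. (13)
                l = rng.uniform(-1, 1)
                X[i] = D * np.exp(l) * np.cos(2 * np.pi * l) + best  # Eq. (14), b = 1
            X[i] = np.clip(X[i], lo, hi)
        fit = np.apply_along_axis(fitness, 1, X)
        if fit.min() < fitness(best):
            best = X[fit.argmin()].copy()
    return best

# toy objective: sphere function, optimum at the origin
best = woa(lambda x: float(np.sum(x ** 2)), dim=2, bounds=(-5, 5))
print(best)
```

In the hyperparameter-tuning setting, each "position" vector would hold candidate values such as the learning rate and hidden-unit count, and the fitness would be the model's validation error.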


Attention Mechanism
The attention mechanism is inspired by the way humans selectively focus their attention on specific moments and regions amidst large amounts of information in order to emphasize key details while disregarding irrelevant information. LSTM units have limitations in retaining historical trajectory information, particularly when faced with long sequences. The attention mechanism can effectively mitigate this issue by reducing the influence of useless information, thereby enabling better retention of all relevant trajectory data. Figure 6 provides a visual diagram illustrating the attention mechanism, which processes the data with Equations (18)-(20),
where h_{i−1} represents the hidden state of the i-th LSTM unit prior to time j, s_j represents the hidden state of the LSTM unit at the current time j, e_{ij} represents the attention weight assigned to the i-th LSTM unit, α_{ij} represents the normalized weight assigned to the i-th LSTM unit, and c_i represents the final context vector. The score() in Equation (18) represents an attention function, such as tanh, ReLU, or other functions. In this paper, we choose tanh instead of ReLU as the attention function for the ship trajectory prediction problem: because there are many negative values in the training data, the ReLU function may cause many negative values to vanish during processing, which may increase the error in the predicted results. The tanh function can model complex nonlinear relationships, and its output range is constrained to [−1, 1]; this constraint helps to mitigate the issues of gradient explosion and vanishing gradients. This paper applies the attention mechanism to the BILSTM layer, where the data features in the network are assigned different weights according to their importance, with an emphasis on critical information such as latitude and longitude to amplify its contribution to the prediction process.
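A minimal NumPy sketch of Equations (18)-(20); the dot-product form inside tanh and the dimensions below are simplifying assumptions about the score function, not the paper's exact parameterization:

```python
import numpy as np

def attention(h, s_j):
    """Score each hidden state against the current state (Eq. 18, tanh score),
    softmax-normalize (Eq. 19), and form the context vector (Eq. 20)."""
    e = np.tanh(h @ s_j)                   # e_ij: tanh score per time step
    alpha = np.exp(e) / np.exp(e).sum()    # alpha_ij: normalized weights
    c = (alpha[:, None] * h).sum(axis=0)   # c_i: weighted sum of hidden states
    return c, alpha

rng = np.random.default_rng(1)
h = rng.normal(size=(10, 16))  # 10 time steps, 16-dim BILSTM hidden states
s = rng.normal(size=16)        # current hidden state s_j
c, alpha = attention(h, s)
print(c.shape)  # (16,)
```

Because tanh keeps scores in [−1, 1], the softmax weights stay well conditioned, which matches the gradient-stability argument above.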

Construction of the WOA-Attention-BILSTM
Parameters of the Attention-BILSTM model, such as the number of neurons, time step, number of epochs, and learning rate, have a strong influence on the experimental results. For example, the time step is reflected in the model as the number of recurrent units of the Attention-BILSTM. If the time step is set too small, the correlations in the information will be insufficiently represented; if it is set too large, there will be fewer data samples and the neural network will not learn sufficiently. These parameters are often set based on manual experience and tuned in experiments, and inappropriate settings lead to poor model performance. Therefore, in the model architecture design, this paper uses the WOA to optimize the Attention-BILSTM. The specific steps are as follows:
1. Clean the AIS data, fill in the missing data using cubic spline interpolation, and resample the data at a time interval of 60 s;
2. Normalize the data and divide it into training and test sets;
3. Determine the number of iterations, population size, and individual space dimension of the WOA; select the learning rate, batch size, epoch count, and number of hidden layer units as the optimization objects; and set the lower and upper bound for each object;
4. Randomly generate the population and obtain the parameters of the initial Attention-BILSTM;
5. Calculate the Mean Square Error between the target and predicted values for the different individuals and use it as the fitness in training; select the parameter combination with the smallest fitness as the current global optimum. To make the parameters obtained by WOA optimization more suitable for the Attention-BILSTM, the Mean Square Error in this paper is the weighted Mean Square Error of the training and test sets, as shown in Equation (21):
MSE = k · (1/N) Σ_{i=1}^{N} (y_i^train − ŷ_i^train)² + (1 − k) · (1/M) Σ_{j=1}^{M} (y_j^test − ŷ_j^test)²  (21)
where k is the weight value and takes values in [0, 1], N is the total number of samples in the training set, and M is the total number of samples in the test set;
6. Start the iterations to obtain the Attention-BILSTM network parameters and update the Attention-BILSTM with them;
7. Repeat steps 5-6 until the maximum number of iterations is reached;
8. Output the optimal results and use the obtained optimal network hyperparameters to build the Attention-BILSTM for ship trajectory prediction.
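The optimization loop of steps 4-8 can be sketched with a minimal, generic WOA minimizer. This is an illustrative stand-in, not the paper's implementation: in the real pipeline the fitness of step 5 would train an Attention-BILSTM and return the weighted MSE of Equation (21), which is far too expensive for a sketch, so a simple sphere function is used instead, and the bounds are arbitrary.

```python
import math
import random

def woa_minimize(fitness, lb, ub, pop_size=20, iters=50, seed=0):
    """Minimal Whale Optimization Algorithm: each whale either encircles a
    reference position (the best whale when |A| < 1, a random whale when
    |A| >= 1) or spirals around the best whale, each branch chosen with
    probability 0.5; `a` shrinks linearly from 2 to 0 over the iterations."""
    rnd = random.Random(seed)
    dim = len(lb)
    pop = [[rnd.uniform(lb[k], ub[k]) for k in range(dim)] for _ in range(pop_size)]
    best = min(pop, key=fitness)[:]
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters
        for i, x in enumerate(pop):
            A = 2 * a * rnd.random() - a
            C = 2 * rnd.random()
            if rnd.random() < 0.5:
                # exploit (toward best) when |A| < 1, explore otherwise
                ref = best if abs(A) < 1 else rnd.choice(pop)
                x = [r - A * abs(C * r - xi) for r, xi in zip(ref, x)]
            else:
                l = rnd.uniform(-1, 1)  # logarithmic spiral toward the best whale
                x = [abs(b - xi) * math.exp(l) * math.cos(2 * math.pi * l) + b
                     for b, xi in zip(best, x)]
            # clip each coordinate back into its search range
            x = [min(max(xi, lb[k]), ub[k]) for k, xi in enumerate(x)]
            pop[i] = x
            if fitness(x) < fitness(best):
                best = x[:]
    return best

def sphere(x):
    """Toy fitness standing in for the weighted train/test MSE of step 5."""
    return sum(v * v for v in x)

best = woa_minimize(sphere, lb=[-5.0, -5.0], ub=[5.0, 5.0])
print(best, sphere(best))
```

Swapping `sphere` for a function that builds, trains, and scores an Attention-BILSTM from a candidate hyperparameter vector yields the procedure described above.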
The architecture of the prediction model we built is shown in Figure 7.

Model Validation and Results Analysis

Experimental Environment
We used Python 3.7 with TensorFlow 2.3.0 for training, and an Intel(R) Xeon(R) CPU @ 2.00 GHz together with a Tesla T4 GPU to accelerate the computation.

Evaluation Indicators
For the problem of ship trajectory prediction, the predicted results consist of latitude and longitude data with actual physical significance. Therefore, this paper selected the Haversine distance as the metric for measuring the accuracy of the model's predictions. The Haversine distance provides an intuitive evaluation of the prediction results by calculating the distance between each predicted point and its corresponding target value; a smaller Haversine distance indicates a more accurate prediction. The Haversine distance is calculated as Equation (22):
d = 2R · arcsin(√(sin²((λ₂ − λ₁)/2) + cos λ₁ · cos λ₂ · sin²((φ₂ − φ₁)/2)))  (22)
where R is the radius of the Earth, φ₁ is the predicted longitude, φ₂ is the target longitude, λ₁ is the predicted latitude, and λ₂ is the target latitude.
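Equation (22) is straightforward to implement; a minimal sketch with the standard mean Earth radius of 6371 km:

```python
import math

def haversine_km(lon1, lat1, lon2, lat2, R=6371.0):
    """Great-circle distance in kilometres between a predicted point
    (lon1, lat1) and a target point (lon2, lat2), both in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)   # latitude difference
    dlmb = math.radians(lon2 - lon1)   # longitude difference
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

# One degree of latitude is roughly 111 km
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))  # prints 111.2
```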
Meanwhile, the Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE) are also calculated in this paper as evaluation indicators of the ship trajectory prediction model. MAPE, RMSE, and MAE are calculated as Equations (23)-(25):
MAPE = (100%/n) Σ_{i=1}^{n} |(y_i − ŷ_i)/y_i|  (23)
RMSE = √((1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²)  (24)
MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|  (25)
where y_i is the target value of the data set and ŷ_i is the predicted value of the model. MAE and RMSE show the difference between the predicted value and the target value; their values range over [0, ∞), and equal 0 when the predicted value matches the target value. MAPE also shows the degree of difference between the predicted and target values and is suitable for problems where the magnitude of the target variable varies widely.
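The three indicators above translate directly into code; note that MAPE is undefined whenever a target value is exactly zero, so the sketch assumes nonzero targets.

```python
import math

def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def mape(y, yhat):
    # Percentage error relative to the target; assumes no target is zero
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, yhat)) / len(y)

y, yhat = [1.0, 2.0, 4.0], [1.5, 2.0, 3.0]
print(mae(y, yhat), rmse(y, yhat), mape(y, yhat))
```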

Data Processing
According to a study by Karahalios, approximately 49% of accidents take place in coastal areas, with 27% near the coast and 22% in narrow channels [38]. Moreover, a study conducted in Japan revealed that approximately 90% of maritime accidents occur within 37 km (20 nautical miles) of the coast [39]. Since this research can be applied to enhance maritime safety and ship regulation, coastal data were selected for analysis in this paper. The U.S. Coast Guard (marinecadastre.gov) AIS data from January 2022 were selected as the prediction sample. The data structure is shown in Table 1, and the data ranges used in this paper are listed in Table 2. Due to the transmission characteristics of AIS equipment, data preprocessing is necessary to address issues such as missing trajectories, duplicated data, and overly short ship motion trajectories, which can negatively impact the training effectiveness of the model. In this study, we treat the discrete points in a trajectory as a four-dimensional vector consisting of LAT, LON, SOG, and COG. In addition to these features, other information in the AIS data is used as a basis for data filtering during preprocessing. The specific steps of data cleaning are as follows:
1. Remove unrealistic ship information (for example, ship speed over ground is too high, or latitude and longitude fall on land);
2. Delete ships with fewer than 200 records or less than 4 h of trajectory duration;
3. Remove ships whose mixed variance of latitude and longitude is too small.
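The cleaning rules above can be mirrored by a minimal per-ship filter. The field names, the 40-knot speed bound, and the variance threshold are assumptions for illustration (the paper does not state them), and the on-land check of step 1 is omitted because it requires a coastline dataset.

```python
MAX_SOG = 40.0  # knots; an assumed sanity bound on speed over ground

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def keep_ship(records):
    """records: one ship's track as dicts with keys t (epoch seconds), LAT, LON, SOG."""
    if len(records) < 200:                                  # step 2: too few records
        return False
    if records[-1]["t"] - records[0]["t"] < 4 * 3600:       # step 2: under 4 h
        return False
    for r in records:                                       # step 1: unrealistic values
        if not (-90.0 <= r["LAT"] <= 90.0) or not (-180.0 <= r["LON"] <= 180.0) \
                or r["SOG"] > MAX_SOG:
            return False
    mixed_var = (variance([r["LAT"] for r in records])
                 + variance([r["LON"] for r in records]))
    return mixed_var > 1e-6                                 # step 3: near-static ship

# A synthetic 250-point, ~5.5 h track that passes all three rules
moving = [{"t": i * 80, "LAT": 30 + i * 0.001, "LON": -80.0, "SOG": 10.0}
          for i in range(250)]
print(keep_ship(moving))
```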
According to the International Maritime Organization (IMO), AIS data should be collected every six minutes when the ship is at rest, and between a minimum of two seconds and a maximum of three minutes while underway [40]. However, due to physical factors, much of the received AIS data does not meet this standard and cannot be used directly as standard time-series data. This necessitates data interpolation, particularly for missing trajectories, which can significantly affect the accuracy of trajectory prediction. The cubic spline interpolation method has good convergence and smoothness [41][42][43], so it is selected in this paper to restore the missing parts of a ship's trajectory. The data are grouped by the ship's MMSI number for interpolation, and data with a time interval of more than 15 min are segmented and treated as two different trajectories. For a missing data point at time t, the two adjacent points before and after it are selected, with times t_i, t_{i+1} and ship positions (λ_i, φ_i), (λ_{i+1}, φ_{i+1}). The ship longitude at a time t between t_i and t_{i+1} can then be calculated by Equation (26).
where h_i = t_i − t_{i−1}, and M_i is the value of the second-order derivative of the interpolation function λ at the node t_i. With these definitions, Equation (26) takes the standard cubic spline form
λ(t) = M_i(t_{i+1} − t)³/(6h_{i+1}) + M_{i+1}(t − t_i)³/(6h_{i+1}) + (λ_i − M_i h_{i+1}²/6)(t_{i+1} − t)/h_{i+1} + (λ_{i+1} − M_{i+1} h_{i+1}²/6)(t − t_i)/h_{i+1}  (26)
The restoration results are shown in Figure 8. For the interpolated data set, we performed resampling. Since a ship's speed is not very fast, we set the time interval to 60 s with averaging as the sampling method; this preserves the characteristics of the trajectory while reducing the computational effort. Finally, LON, LAT, SOG, and COG are selected from the data set for prediction. Before being input into the model, the data are normalized using the min-max formula of Equation (27), x′ = (x − x_min)/(x_max − x_min), which scales each field to the interval [0, 1] and ensures consistency of magnitudes.
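A self-contained sketch of the interpolation of Equation (26): the second derivatives M_i are obtained with the Thomas algorithm under natural boundary conditions (M_0 = M_{n−1} = 0), and the sample times and longitudes are invented for illustration. In practice, `scipy.interpolate.CubicSpline` provides the same interpolation.

```python
def natural_spline_second_derivs(t, y):
    """Solve the tridiagonal system for the second derivatives M_i of a
    natural cubic spline (M_0 = M_{n-1} = 0) through the points (t_i, y_i)."""
    n = len(t)
    h = [t[i + 1] - t[i] for i in range(n - 1)]
    a, b, c, d = [0.0] * n, [1.0] * n, [0.0] * n, [0.0] * n
    for i in range(1, n - 1):  # interior continuity equations
        a[i], b[i], c[i] = h[i - 1], 2 * (h[i - 1] + h[i]), h[i]
        d[i] = 6 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    for i in range(1, n):      # Thomas algorithm: forward elimination
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    M = [0.0] * n
    for i in range(n - 2, 0, -1):  # back substitution (M[0] = M[n-1] = 0)
        M[i] = (d[i] - c[i] * M[i + 1]) / b[i]
    return M

def spline_eval(t, y, M, x):
    """Evaluate the spline at x on the interval [t_i, t_{i+1}] -- Eq. (26)."""
    i = max(j for j in range(len(t) - 1) if t[j] <= x)
    h = t[i + 1] - t[i]
    return (M[i] * (t[i + 1] - x) ** 3 / (6 * h)
            + M[i + 1] * (x - t[i]) ** 3 / (6 * h)
            + (y[i] - M[i] * h * h / 6) * (t[i + 1] - x) / h
            + (y[i + 1] - M[i + 1] * h * h / 6) * (x - t[i]) / h)

# Fill a missing longitude at t = 90 s from the surrounding fixes (invented data)
t = [0.0, 60.0, 120.0, 180.0]
lon = [-80.000, -80.010, -80.021, -80.030]
M = natural_spline_second_derivs(t, lon)
print(spline_eval(t, lon, M, 90.0))
```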

Prediction Results Analysis
The data used in our experiments are the tanker data in the dataset, resampled at a time interval of 60 s; they comprise 61 tankers with a total of 725,069 records, of which 80% are used as the training set and 20% as the test set.

WOA Hyperparameters Selection and Optimization Results
The initial WOA-Attention-BILSTM model built in this paper consists of an input layer, two BILSTM hidden layers, an attention layer, a fully connected layer, and an output layer; the Adam optimizer is used to update the model parameters. The search iteration and population size settings usually affect the convergence speed and performance of the algorithm. If the number of search iterations is too small, the algorithm will be trapped in local optima; if it is too large, many useless iterations will make the execution time excessively long. Therefore, the number of search iterations in this paper is set to 50. We conducted a sensitivity analysis on the population size, testing four sizes (10, 15, 20, and 30) to find the optimum. As shown in Table 3 and Figure 9, with a population size of 20 the WOA converged faster and achieved the best evaluation indices, so a population size of 20 was chosen in this paper.

The WOA optimizes the learning rate, the number of hidden layer units, the number of fully connected layer units, the number of iterations, and the batch size of the neural network. As the iterations proceed, the parameters of the Attention-BILSTM are continuously adjusted until an optimal result is obtained. The parameter search ranges are determined by the complexity of the ship trajectory problem. The model's error tends to stabilize after about 40 epochs, so the upper limit of the epoch count was set to 80. As changes in ship trajectories are relatively slow, the upper limit of the batch size was set to 512 to allow the model to better learn the long-term features in the data. The search range and the best value of each parameter are shown in Table 4. The performance of the WOA-Attention-BILSTM on the training and test sets is shown in Figure 10. The results show that the WOA-Attention-BILSTM has a low MSE and converges quickly, because ships routinely travel in channels and their trajectories therefore share some similarity.

Prediction Results of WOA-Attention-BILSTM Model
A ship goes through various states along one trajectory, such as going straight ahead, turning, etc. Prediction results are usually better in a straight navigation state, but model performance degrades during turns. To verify the prediction performance of the model under different sailing states, the ship with MMSI 563108500 is selected in this paper; its trajectory contains both straight segments and turns, as shown in Figure 11.

(Excerpt from Table 4: BILSTM hidden layer 1, search range [1, 30], best value 18; BILSTM hidden layer 2, search range [1, 30], best value 15; Dense layer, search range [1, 100], best value 28.)

To verify the effectiveness of the proposed WOA-Attention-BILSTM model, we compare it with the Attention-BILSTM, BILSTM, LSTM, GRU, BP, and CNN-LSTM models. First, the results are visualized and analyzed; the prediction results of each model are shown in Figure 12. When the ship is moving straight, the models can basically fit the ship's motion trajectory, and the WOA-Attention-BILSTM predicted trajectory is the closest to the target trajectory. The BP network has larger errors at turns and fluctuates substantially, with discontinuous and unstable predicted trajectory points. The trajectory points predicted by LSTM, BILSTM, and Attention-BILSTM are continuous but not accurate enough. GRU performs poorly in complex trajectory segments because of its weaker capability to learn long-term dependencies in the sequence. The trajectories predicted by CNN-LSTM fit better thanks to better feature extraction, but the predicted trajectory points sometimes drift continuously, resulting in larger errors. At the first turning point, the BP model produces incoherent predicted trajectories, and its prediction error becomes unstable. At the fourth turning point, due to the long prediction step, the trajectory incoherence problem occurs in all models except WOA-Attention-BILSTM. At the end of the trajectory, only the WOA-Attention-BILSTM model still maintains a low error.
To further analyze the prediction performance of the WOA-Attention-BILSTM model, we examine the evaluation metrics shown in Table 5; the metrics are elaborated in Section 4.2. The Haversine Distance is the primary focus of this paper, as it is the most intuitive and comprehensible measure of accuracy. Table 5 shows that, execution time aside, WOA-Attention-BILSTM has the lowest value on every metric, with an average Haversine Distance of 0.4042 km. Next is the Attention-BILSTM model, which can also achieve good prediction results with manual tuning of parameters. The Haversine Distance of GRU also reaches 0.7 km, but it performs worse on the complex parts of the trajectory, and its error is more unstable at the beginning than that of the other models. The error of the CNN-LSTM model is relatively stable, with a Haversine Distance around 1 km across several experiments. The results of BILSTM and LSTM differ considerably: the BILSTM model learns the information in the ship's historical data better, so its results are good. The Haversine Distance of the BP network is 1.9199 km and relatively unstable; it performs better when the ship is going straight but fluctuates greatly at turns. The improvement brought by the WOA is thus quite evident, and building the Attention-BILSTM with the hyperparameters obtained from the WOA effectively improves the accuracy of the model. The Execution Time in Table 5 is the time the trained model requires to predict 1100 steps. BP is the fastest, and the time of WOA-Attention-BILSTM does not increase significantly, only 0.10165 s more than Attention-BILSTM, which is within an acceptable range. Therefore, the prediction accuracy and performance of WOA-Attention-BILSTM are excellent.
Comparing the Haversine Distance over the first 120 steps of each model, as shown in Figure 13, the Haversine Distance errors of the LSTM-based models fluctuate relatively regularly within a certain range, while the BP model's errors fluctuate greatly and its predicted trajectories are relatively incoherent. The error of WOA-Attention-BILSTM is the most stable, ranging from 0.003 km to 0.4 km.
As a result of the above analysis, it can be concluded that the proposed WOA-Attention-BILSTM model has the lowest error, is stable, achieves a significant improvement in prediction accuracy compared with the other models, and requires little additional time to perform the prediction task. Therefore, the WOA-Attention-BILSTM model has good accuracy and performance on the ship trajectory prediction problem and can satisfy the needs of ship collision avoidance, route planning, and maritime supervision.

Conclusions
This paper proposes a WOA-Attention-BILSTM model for ship trajectory prediction in order to enhance the intelligent perception capabilities of ships. The model employs a bidirectional structure and an attention mechanism, which enable it to capture important information within the trajectory sequence and better retain contextual information in ship trajectories. Additionally, the WOA optimizes the model's hyperparameters and improves its generalization, thereby avoiding the convergence problems caused by manual parameter design. Furthermore, this paper includes the ship's COG and SOG in the feature vector, which facilitates the extraction of multimodal features and improves the accuracy of the model's predictions. To validate the effectiveness of the proposed model, a public AIS dataset was processed using deduplication, filtering, resampling, and missing-value completion, and extensive comparative experiments were conducted on this dataset. The experimental results demonstrate that the proposed model has higher accuracy than the other models, strong parameter optimization capability, and robust generalization ability. Thus, this method can support a ship's intelligent decision-making system and ensure safe navigation. It should be noted that this study uses a single-step prediction model; in the future, multi-step prediction should be considered to reduce the loss of accuracy caused by recursive prediction and further improve trajectory prediction accuracy.

Figure 3. The structure of the LSTM.


Figure 4. The structure of the BILSTM.


The coefficient vector A⃗ determines the swimming direction of the whales: if |A⃗| ≥ 1, a whale searches randomly and swims toward the position of a random whale in the population; if |A⃗| < 1, it encircles and contracts, swimming toward the position of the optimal whale in the population. This behavior makes the convergence of the algorithm more global and reduces the probability of falling into a local optimum.
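The balance between these two behaviors can be checked numerically. In the WOA, A = 2·a·r − a with r uniform in [0, 1) and a shrinking linearly from 2 to 0, so |A| can exceed 1 only in early iterations (exploration) and stays below 1 later (exploitation). This toy sketch is an illustration, not code from the paper.

```python
import random

def coefficient_A(t, max_iter, rnd):
    """Sample the WOA coefficient A = 2*a*r - a at iteration t,
    where a decreases linearly from 2 to 0 over max_iter iterations."""
    a = 2.0 - 2.0 * t / max_iter
    return 2 * a * rnd.random() - a

rnd = random.Random(1)
early = [abs(coefficient_A(0, 50, rnd)) for _ in range(1000)]   # a = 2.0
late = [abs(coefficient_A(45, 50, rnd)) for _ in range(1000)]   # a = 0.2
print(max(early), max(late))  # early |A| can exceed 1; late |A| cannot
```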

Figure 6. The structure of the Attention mechanism.


Figure 11. Trajectory of the target ship. The blue lines are the trajectories of other ships in the dataset, and the red line is the target ship's trajectory.


Figure 12. The visualization and comparison of the models' prediction results.


Figure 13. Comparison of model prediction results by Haversine Distance.


Table 1. Example of AIS data structure.

Table 2. Selected data range.

Table 3. The performance of the WOA-Attention-BILSTM with different population sizes.


Table 4. The search range and results of each parameter in iterations.


Table 5. Metrics of WOA-Attention-BILSTM and other models.
