Online Prediction of Ship Behavior with Automatic Identification System Sensor Data Using Bidirectional Long Short-Term Memory Recurrent Neural Network

The real-time prediction of ship behavior plays an important role in navigation and intelligent collision avoidance systems. This study developed an online real-time ship behavior prediction model by constructing a bidirectional long short-term memory recurrent neural network (BI-LSTM-RNN) that is suited to the time-sequential characteristics of automatic identification system (AIS) data and to online parameter adjustment. The bidirectional structure enhanced the relevance between historical and future data, thus improving the prediction accuracy. Through the “forget gate” of the long short-term memory (LSTM) unit, the common behavioral patterns were remembered and unique behaviors were forgotten, improving the universality of the model. The BI-LSTM-RNN was trained using 2015 AIS data from Tianjin Port waters. The results indicate that the BI-LSTM-RNN effectively predicted the navigational behaviors of ships. This study contributes to the efficiency and safety of sea operations. The proposed method could potentially be applied as the predictive foundation for various intelligent systems, including intelligent collision avoidance, vessel route planning, operational efficiency estimation, and anomaly detection systems.


Introduction
For a ship to avoid collision, it must predict the behaviors of other ships in order to estimate the collision risk. High-precision real-time prediction of ships' navigational behaviors can effectively increase the reliability of collision avoidance decisions and decrease the risk of collisions.
Navigation behavior prediction algorithms can be divided into two categories: offline and online prediction. Numerous offline prediction algorithms are employed in the fields of trajectory estimation and data restoration. In offline prediction, trajectory data are input to a fixed formula or trained model. These algorithms are insufficiently flexible for adaptation to actual predicted data; they also lack timeliness and high efficiency.
Examples of offline prediction studies include the work of Han et al. [1], who predicted a ship's trajectory by establishing a state-switching model and proposed a conflict-free four-dimensional aircraft prediction method based on hybrid system theory. For predicting ship positions, Xu et al. [2] trained a three-layer back-propagation (BP) neural network that accepts the ship's direction and speed as inputs and yields the differences in latitude and longitude as outputs.
Gradient explosion may occur when error parameters are back-propagated. Gradient explosion prevents the training objectives from being met, so a typical recurrent neural network (RNN) does not provide satisfactory results. Therefore, in a bidirectional RNN structure [27], which builds on the traditional RNN concept, forward and backward propagation are used to update the weights effectively, thereby increasing the contextual relevance and prediction precision. Instead of a hidden-layer unit, the bidirectional RNN uses a long short-term memory (LSTM) unit. The LSTM unit effectively filters key features, performs selective memory, and solves the problem RNNs have in processing long sequences. The LSTM algorithm, as described by Sak [28], establishes long-term correlations between input values. By replacing the hidden-layer network in an RNN with an LSTM unit [29,30], the gradient vanishing problem is solved and a new storage unit is created to selectively forget or remember a given operation. In this study, we developed a bidirectional LSTM RNN (BI-LSTM-RNN) combining historical experience and real-time adjustment flexibility that can be utilized to predict the navigation behavior of an unmanned ship during collision avoidance measures against a manned ship. This approach can help a human operator to understand complex conditions at sea and can enhance decision-making to avoid danger.

RNN Structure
RNNs use directed loops to capture the context among input nodes. Unlike a traditional neural network, in which the hidden layer receives input only from the preceding layer, an RNN also feeds the hidden layer with its own previous state. An RNN is a sequence-to-sequence model [25,26,31] that can suitably process sequence data of any length. In processing AIS data, the current state is related only to the previous ship states. The basic network structure of an RNN is depicted in Figure 1. In the figure, t represents the time, x is the input data, O is the output data, S is the network state, W is the state-update weight, V is the weight between the cell and the output, and U is the weight between the input and the cell.
The idea of the RNN structure is to make full use of the information in the preceding sequence. Traditional neural networks assume that all inputs and outputs are independent of each other. However, many tasks, such as those in natural language processing, are judged by context, so this assumption has its limitations. An RNN is called recurrent because successive inputs pass through the same neural network, and the differences between them are carried by the previous state of the hidden layer. In Figure 1, the RNN before unrolling is on the left side and the network structure of the RNN expanded over time is on the right side.
An RNN contains input units labeled $\{x_0, x_1, \ldots, x_t, x_{t+1}\}$, output units labeled $\{y_0, y_1, \ldots, y_t, y_{t+1}\}$, and hidden units labeled $\{s_0, s_1, \ldots, s_t, s_{t+1}\}$. The hidden units perform the most important work [32]. A one-way flow of information occurs from the input units to the hidden units and from the hidden units to the output units (Figure 2). In some cases, the RNN breaks this one-way restriction, and information is returned to the hidden units from the output units. This process is called back-projection, and the input to the hidden layer includes the state of the previous hidden layer. The nodes in a layer can be connected to the next layer or to other units.
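The recurrence described above can be sketched in a few lines of Python. This is a minimal scalar illustration rather than the network used in the study: the tanh activation, the example weights, and the function name rnn_forward are assumptions, while U, W, and V follow the notation of Figure 1.

```python
import math

def rnn_forward(xs, U, W, V, s0=0.0):
    """Run a scalar RNN over the sequence xs.

    s_t = tanh(U * x_t + W * s_{t-1})   (state update)
    o_t = V * s_t                       (output)
    The tanh activation is an assumption made for illustration.
    """
    s = s0
    states, outputs = [], []
    for x in xs:
        s = math.tanh(U * x + W * s)  # the same weights are reused at every step
        states.append(s)
        outputs.append(V * s)
    return states, outputs

# Arbitrary example weights and inputs.
states, outputs = rnn_forward([1.0, 0.5, -0.2], U=0.8, W=0.5, V=2.0)
```

Because each state depends on the previous one, the same input value produces a different output depending on what preceded it in the sequence.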

LSTM Cell Structure
An LSTM unit is a special RNN unit that can solve the long-term dependency problem. In an LSTM unit, the cell state controls the discarding and adding of information through gates to achieve forgetting and memorizing functions [33][34][35]. An LSTM performs selective operations on the learned knowledge using three gate structures: the input, output, and forget gates. The LSTM structure (Figure 3) allows self-looping and real-time updating of weights to prevent gradient vanishing and gradient explosion. The training process of the LSTM network is as follows:
(1) State initialization: The number of neural nodes in the input layer, the number of output nodes, and the number of cell units (k) are determined. The initial state {S} of each cell unit is set to 0, and the link weights ($\omega_{ij}$) of each layer are generated randomly with a mean of 0 within the range ±1. The offset $\theta_j$ is initialized to 0.1. W represents the weight matrix.
(2) The output data (H) are calculated from the input layer according to $\omega_{ij}$ and $\theta_j$, as given by Equation (1).
(3) LSTM unit calculation: The output of the previous unit and the input of the current unit are used as inputs for the sigmoid function of the forget gate (Figure 4), which yields the degree of forgetting applied to the previous unit.
(4) The input gate integrates the $C_{t-1}$ of the previous state with the $C_t$ of the current state to update the cell unit state.
(5) The output value ($o_t$) from the output gate is passed to the status value ($h_t$) of the next unit to complete the training procedure.
(6) Error calculation: The prediction error (e) is calculated from the network output O and the expected output (y), and the error of each batch is returned.
(7) Weight updating: The stochastic gradient descent method is used to minimize the error. For each parameter update, it is unnecessary to traverse the entire training set; only one sample is used to update the parameters. Such an algorithm is more suitable for big data and searches quickly.
Sensors 2018, 18, x 5 of 16
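The gate computations in steps (3) to (5) can be sketched as follows. This is a scalar illustration of the standard LSTM cell equations, not the authors' code: the candidate-state term, the tanh activations, the example weights of 0.5, and the names lstm_step and params are assumptions; only the zero initial state and the 0.1 offset come from step (1).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step. p maps a gate name to (w_x, w_h, bias)."""
    def gate(name, act):
        w_x, w_h, b = p[name]
        return act(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)             # step (3): how much of c_prev to keep
    i = gate("input", sigmoid)              # step (4): how much new content to admit
    c_tilde = gate("candidate", math.tanh)  # candidate content (standard LSTM, assumed)
    c = f * c_prev + i * c_tilde            # step (4): updated cell state C_t
    o = gate("output", sigmoid)             # step (5): output gate o_t
    h = o * math.tanh(c)                    # step (5): state h_t passed to the next unit
    return h, c

# Illustrative parameters: offsets of 0.1 as in step (1), weights chosen arbitrarily.
params = {name: (0.5, 0.5, 0.1)
          for name in ("forget", "input", "candidate", "output")}
h, c = 0.0, 0.0                             # initial states are zero, as in step (1)
for x in [1.0, 0.2, -0.4]:
    h, c = lstm_step(x, h, c, params)
```

When the forget gate output approaches 0, the previous cell state is discarded almost entirely; when it approaches 1, the state is carried forward unchanged, which is what lets the unit retain information over long sequences.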

Bidirectional Structure
If future information can be accessed in advance, the analysis of the current sequence information becomes relatively easy [27,36,37]. Because a standard RNN processes a time series in temporal order, future information is usually ignored. A delay is commonly added between the input and the target so that some future information can be included. However, in practical applications, an excessively long delay yields an inferior prediction result, because the network devotes excessive resources to memorizing large amounts of input information, which diminishes its ability to model the combined knowledge of different input vectors. Therefore, the size of the delay must be adapted manually. In a bidirectional RNN, the training sequence is processed by both a forward and a backward RNN, which are connected to the same output layer. Figure 5 illustrates the structure of a bidirectional RNN.
The structure of the bidirectional RNN provides complete historical and future information for each point in the input sequence to the output layer. Figure 5 displays the bidirectional RNN unrolled over time. Six unique weights are used repeatedly in each time step: (w1, w3) connect the input to the forward and backward hidden layers, (w2, w5) connect each hidden layer to itself, and (w4, w6) connect the forward and backward hidden layers to the output layer.
No information flows between the forward and backward hidden layers, which ensures that the unrolled graph is acyclic (Figure 6).
Figure 6. BI-RNN weight relationship diagram.
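The roles of the six shared weights can be illustrated with a scalar sketch. The tanh activation, the example weight values, and the function name bi_rnn_forward are assumptions made for illustration; the weight roles follow the description above, and plain RNN cells stand in for the LSTM units to keep the example short.

```python
import math

def bi_rnn_forward(xs, w1, w2, w3, w4, w5, w6):
    """Scalar bidirectional RNN using the six shared weights named in the text.

    w1: input -> forward hidden     w3: input -> backward hidden
    w2: forward hidden recurrence   w5: backward hidden recurrence
    w4: forward hidden -> output    w6: backward hidden -> output
    """
    n = len(xs)
    fwd, bwd = [0.0] * n, [0.0] * n
    h = 0.0
    for t in range(n):                 # forward pass: past -> future
        h = math.tanh(w1 * xs[t] + w2 * h)
        fwd[t] = h
    h = 0.0
    for t in reversed(range(n)):       # backward pass: future -> past
        h = math.tanh(w3 * xs[t] + w5 * h)
        bwd[t] = h
    # The two hidden layers never feed each other; they meet only at the output.
    return [w4 * f + w6 * b for f, b in zip(fwd, bwd)]

ys = bi_rnn_forward([0.1, 0.4, -0.3, 0.2], 0.7, 0.3, 0.6, 1.0, 0.2, 1.0)
```

In this sketch the first output already depends on the last input through the backward layer, which is the sense in which each point receives both historical and future context.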

Batch Training Structure
In this study, LSTM units were used in the network processing the time-series data to solve the gradient vanishing problem. Moreover, a bidirectional structure was implemented to enhance the context correlation, yielding the BI-LSTM-RNN. Batch training was performed with batches of 10 sets of data each. The AIS data were arranged in a matrix and randomly batched, as illustrated in Figure 7, to prevent overfitting, eliminate self-correlation in the data, and improve the learning efficiency.
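Random batching of this kind can be sketched as follows. The function name make_batches, the fixed seed, and the handling of a final short batch are assumptions; the text specifies only the batch size of 10 and the random assignment.

```python
import random

def make_batches(samples, batch_size=10, seed=None):
    """Shuffle samples and split them into batches of batch_size.

    A final short batch is kept; whether the study dropped it is not stated.
    """
    rng = random.Random(seed)
    shuffled = list(samples)           # copy so the caller's order is preserved
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size]
            for i in range(0, len(shuffled), batch_size)]

# 35 stand-in samples, identified here only by their index.
batches = make_batches(range(35), batch_size=10, seed=0)
```

Shuffling before batching places samples from different ships and times in the same batch, which is what breaks the self-correlation between consecutive AIS records.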

Navigation Behavior Prediction Model
To verify the validity of the model, 2015 AIS trajectory data from Tianjin Port for 11,032 ships were selected. The data included 36,807,928 coordinate points occupying 8.58 GB. The AIS data from January to October included 29,277,849 points and comprised the training group. The AIS data from November to December included 7,530,103 points and formed the verification group. A flowchart of the complete prediction model is displayed in Figure 8.
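The chronological split into training and verification groups might be expressed as below. The record layout (a dict with "mmsi", "time", "lat", and "lon" keys) and the function name split_by_month are hypothetical; the actual storage structure is the one given in Table 1.

```python
from datetime import datetime

def split_by_month(records, last_training_month=10):
    """Split records into training (January..last_training_month) and verification."""
    train, verify = [], []
    for rec in records:
        target = train if rec["time"].month <= last_training_month else verify
        target.append(rec)
    return train, verify

# Three made-up records for illustration only.
records = [
    {"mmsi": 111, "time": datetime(2015, 3, 1, 12, 0), "lat": 38.90, "lon": 117.80},
    {"mmsi": 111, "time": datetime(2015, 11, 2, 8, 30), "lat": 38.95, "lon": 117.90},
    {"mmsi": 222, "time": datetime(2015, 10, 31, 23, 59), "lat": 39.00, "lon": 118.00},
]
train, verify = split_by_month(records)
```

Splitting by date rather than at random keeps the verification group strictly later than the training group, so the network is evaluated on data it could not have seen.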

Parameter Analysis
When an unmanned ship meets another ship, it must predict the behavior of the incoming ship and adopt an effective collision avoidance strategy. In our BI-LSTM-RNN, the current and historical AIS ship data are taken as input values, and the future ship position is taken as the output value of the network; the output can then be compared with the actual ship position data. The AIS data are multidimensional and multiparametric; for example, they include the ship's direction, position, and speed as they change over time. In the test AIS data, the records were grouped by ship according to the Maritime Mobile Service Identity (MMSI) and sorted by timestamp. The storage structure of the AIS information is displayed in Table 1; the AIS data for June are illustrated in Figure 9.
The position information over time already implies the speed, so speed is not used as a separate input, which helps to avoid overfitting. Therefore, information related to the position, heading, and time is selected for learning, and these quantities form the input-layer ship behavior data. Batch standardization is required before the data enter each layer of the network; in this normalization, $\mu_B$ is the batch mean, $\sigma_B^2$ is the batch variance, and $\gamma$ and $\beta$ are the learned parameters. The output layer yields the ship position data $O(t + 2)$, and the error function is given by Equation (14).
The BI-LSTM-RNN was implemented in Python under the Google TensorFlow machine learning framework. The learning network structure contained one input layer, two hidden layers, two LSTM unit layers, and one output layer. The overall structure of the network training, automatically generated through TensorBoard, is depicted in Figure 10.
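The batch standardization step can be sketched using the standard batch normalization form, y_i = γ (x_i − µ_B) / sqrt(σ_B² + ε) + β. The source defines µ_B, σ_B², γ, and β; the small stability constant ε, its value, the example headings, and the function name batch_normalize are assumptions.

```python
import math

def batch_normalize(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a batch, then scale and shift.

    y_i = gamma * (x_i - mu_B) / sqrt(sigma_B^2 + eps) + beta
    eps is a small constant for numerical stability (assumed value).
    """
    n = len(batch)
    mu = sum(batch) / n                              # batch mean mu_B
    var = sum((x - mu) ** 2 for x in batch) / n      # batch variance sigma_B^2
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in batch]

# Ten made-up heading values in degrees, matching the batch size of 10.
headings = [10.0, 12.0, 15.0, 11.0, 9.0, 14.0, 13.0, 10.5, 12.5, 11.5]
normalized = batch_normalize(headings)
```

With γ = 1 and β = 0 the normalized batch has approximately zero mean and unit variance; during training the network is free to learn other values of γ and β.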

AIS Trajectory Data Value Filter
When the compression algorithm compresses the trajectory data, it first establishes a criterion for judging the value of each point of the ship's trajectory [38] (Figure 11). It removes data with low trajectory value and retains data with high trajectory value, thereby achieving compression. Data with high trajectory value are called ship trajectory feature points, or key feature points. Feature points carry most of the shape of the original trajectory in the AIS ship trajectory data; if such a point is lost, the ability to restore the original trajectory decreases considerably. Points that are not key feature points can be removed to achieve the compression effect. Eliminating some track data inevitably causes distortion, so the threshold is chosen as a trade-off between the compression rate and the distortion rate.
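The text does not name the compression algorithm of [38]. As an illustration of threshold-based feature point retention, the sketch below uses the well-known Douglas-Peucker method as a stand-in; the planar coordinates, the example track, and the threshold value are all assumptions.

```python
def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b (planar)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = (dx * dx + dy * dy) ** 0.5
    if length == 0.0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    return abs(dy * (px - ax) - dx * (py - ay)) / length

def compress(track, threshold):
    """Douglas-Peucker: keep points that deviate more than threshold."""
    if len(track) < 3:
        return list(track)
    dists = [point_line_distance(p, track[0], track[-1]) for p in track[1:-1]]
    k = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[k - 1] <= threshold:
        return [track[0], track[-1]]       # all interior points are low value
    left = compress(track[:k + 1], threshold)
    right = compress(track[k:], threshold)
    return left[:-1] + right               # drop the duplicated split point

# Made-up track with a sharp turn between the third and fourth points.
track = [(0, 0), (1, 0.05), (2, -0.02), (3, 2.0), (4, 2.1), (5, 2.0)]
kept = compress(track, threshold=0.5)
```

A larger threshold removes more points and raises the compression rate at the cost of greater distortion, which is the trade-off described above.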

Results
Four points from the AIS data were used to form a training sample. The first data point was fixed, and the second, third, and fourth points were each taken at an interval of three to five points from the preceding one. This selection disrupted the ship-specific association between the data and increased the sensitivity of the network to the time parameter. The problems of gradient explosion and gradient vanishing were solved through batch training. The trained network model was highly versatile and directly usable; it did not require retraining for specific areas. The BI-LSTM-RNN continued to learn online and adjusted its parameters in the actual application scenario. By using three historical data points at a time, six future ship position points could be predicted. Furthermore, by adjusting the network parameters according to the continuously generated ship data, six new future ship position points could be predicted (Figure 12).
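The construction of a four-point training sample might look like the following. Drawing each gap at random from three to five points is one reading of the description and is an assumption, as are the function name build_sample and the stand-in track.

```python
import random

def build_sample(track, start, rng, min_gap=3, max_gap=5):
    """Pick four points: track[start], then three more at random gaps of 3 to 5 points.

    Returns None when the track is too short for the drawn gaps.
    """
    indices = [start]
    for _ in range(3):
        indices.append(indices[-1] + rng.randint(min_gap, max_gap))
    if indices[-1] >= len(track):
        return None
    return [track[i] for i in indices]

rng = random.Random(42)            # seeded only to make the example repeatable
track = list(range(100))           # stand-in for one ship's time-ordered AIS points
sample = build_sample(track, start=0, rng=rng)
```

Because the gaps vary between samples, the time differences between consecutive points also vary, so the network cannot rely on a fixed sampling period and must use the time parameter.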
The data from Tianjin Port for January-October were used as the training group to train the neural network parameters. The error was stable at approximately 90 m. The training error is displayed in Figure 13.
Previously published trajectory prediction algorithms had two major disadvantages: (1) The prediction algorithms mostly included multiple input points and eigenvalues, but only one output point, thus limiting their prediction capacity and consequently restricting their applicability and accessibility in navigation. In this study, the BI-LSTM-RNN structure used three historical ship position data points, and six points could be continuously predicted. Thereafter, the prediction accuracy gradually decreased. (2) Conventional prediction algorithms possess limited versatility, are based on a fixed mathematical model, and are only valid for learning the training data of a ship in advance. Conventional algorithms cannot be adaptively changed, and precise prediction and judgment of the targeted object are impossible, thus the versatility is weak. The BI-LSTM-RNN can be applied to improve versatility by retaining the navigational habits of ships included in AIS big data and using the forget operation on the individual cases of single-ship data. The object is forecasted online, and the characteristics of the current ship can be memorized for a short period of time. Thus, the navigation behavior of a ship can be precisely predicted. Previously published trajectory prediction algorithms had two major disadvantages: (1) The prediction algorithms mostly included multiple input points and eigenvalues, but only one output point, thus limiting their prediction capacity and consequently restricting their applicability and accessibility in navigation. In this study, the BI-LSTM-RNN structure used three historical ship Previously published trajectory prediction algorithms had two major disadvantages: (1) The prediction algorithms mostly included multiple input points and eigenvalues, but only one output point, thus limiting their prediction capacity and consequently restricting their applicability and accessibility in navigation. 
In this study, the BI-LSTM-RNN structure used three historical ship Trajectory prediction involves grasping the movement of the ship and obtaining reliable collision avoidance decisions in advance. The trajectory prediction algorithm requires accuracy and timeliness. Greater consistency with reality improves the user's likelihood of obtaining the correct conclusion. The time available for making collision avoidance decisions is very limited. Therefore, obtaining the prediction result quickly is necessary. To prove the superiority of the proposed prediction algorithm, AIS data from Tianjin Port from November to December, which included 7,530,103 data points, were used as the verification group. The data were not intercepted when the network was being trained, and all the data were directly taken as input for the navigation behavior of the ship. The ship used the BI-LSTM-RNN, BI-RNN, and LSTM-RNN to compare prediction errors. Figure 14 displays a comparison of the convergence effects of the three algorithms.
advance. Conventional algorithms cannot be adaptively changed, and precise prediction and judgment of the targeted object are impossible, thus the versatility is weak. The BI-LSTM-RNN can be applied to improve versatility by retaining the navigational habits of ships included in AIS big data and using the forget operation on the individual cases of single-ship data. The object is forecasted online, and the characteristics of the current ship can be memorized for a short period of time. Thus, the navigation behavior of a ship can be precisely predicted.
Trajectory prediction involves grasping the movement of the ship and obtaining reliable collision avoidance decisions in advance. The trajectory prediction algorithm requires accuracy and timeliness. Greater consistency with reality improves the user's likelihood of obtaining the correct conclusion. The time available for making collision avoidance decisions is very limited. Therefore, obtaining the prediction result quickly is necessary. To prove the superiority of the proposed prediction algorithm, AIS data from Tianjin Port from November to December, which included 7,530,103 data points, were used as the verification group. The data were not intercepted when the network was being trained, and all the data were directly taken as input for the navigation behavior of the ship. The ship used the BI-LSTM-RNN, BI-RNN, and LSTM-RNN to compare prediction errors. Figure 14 displays a comparison of the convergence effects of the three algorithms. It can be clearly seen from Figure 14 that the neural network predicts the behavior of the ship. When predicted for a period of time, the data of a single ship in the verification dataset are limited, and the data after verification are replaced with the data of the other ship to continue the prediction and verification. Although the prediction accuracy deteriorates suddenly, it always converges in a short time and stabilizes. The convergence velocity, oscillation amplitude, and prediction accuracy of the BI-LSTM-RNN are superior to those of the LSTM-RNN and bidirectional RNN. The experimental results indicate that the BI-LSTM-RNN is trained to predict the position of a single ship in a short time, as illustrated in Figures 15 and 16. It can be clearly seen from Figure 14 that the neural network predicts the behavior of the ship. 
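The three comparison criteria used here (convergence velocity, oscillation amplitude, and prediction accuracy) can be made concrete for a per-batch error curve such as those in Figure 14. The following Python sketch is illustrative only: the paper does not define these quantities formally, so the function name `summarize_error_curve`, the threshold, the window length, and the sample error values are all assumptions.

```python
from statistics import mean


def summarize_error_curve(errors, threshold, window=3):
    """Summarize a per-batch prediction-error curve (hypothetical helper).

    Returns (converged_at, oscillation, accuracy):
      converged_at: index of the first batch that starts a run of `window`
                    consecutive errors all <= threshold, or None if no
                    such run exists
      oscillation:  max - min of the errors from converged_at onward
      accuracy:     mean error from converged_at onward
    """
    for start in range(len(errors) - window + 1):
        if all(e <= threshold for e in errors[start:start + window]):
            tail = errors[start:]
            return start, max(tail) - min(tail), mean(tail)
    return None, None, None


# Made-up error values in meters: a spike after switching to a new ship,
# followed by decay. They are not results from the paper.
errors_m = [80.0, 45.0, 22.0, 12.0, 9.0, 8.0, 7.0, 8.0]
converged_at, oscillation, accuracy = summarize_error_curve(errors_m, threshold=10.0)
print(converged_at, oscillation, accuracy)
```

Under these assumed definitions, a smaller `converged_at` corresponds to a higher convergence velocity, and the same function could be applied to the error curve of each network to compare them on equal terms.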
Figure 16 shows that although each switch to a new ship's data causes the prediction error to rebound, the prediction quickly converges and stabilizes again. After about 15 batches of training, the predictive performance of the BI-LSTM-RNN appeared to be stable. The error of the navigation behavior prediction was 10 m or less, which is within the accuracy of GPS positioning. The BI-LSTM-RNN was then applied as a prediction module.
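Expressing the prediction error in meters requires converting a predicted and an observed AIS position (latitude and longitude) into a distance. A minimal sketch using the haversine great-circle formula follows. The paper does not state which distance formula it used, so the haversine choice, the function names, the mean Earth radius of 6,371,000 m, and the sample coordinates are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumed constant


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_tolerance(predicted, observed, tolerance_m=10.0):
    """True if the predicted (lat, lon) lies within tolerance_m of the observed one."""
    return haversine_m(*predicted, *observed) <= tolerance_m


# Illustrative coordinates near Tianjin Port; not taken from the paper's data.
observed = (38.9700, 117.8000)
predicted = (38.97005, 117.80005)
error_m = haversine_m(*predicted, *observed)
print(f"{error_m:.1f} m, within 10 m: {within_tolerance(predicted, observed)}")
```

At port scale a flat-Earth approximation would give nearly the same number; the haversine form is used here only because it stays valid at any range.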

Conclusions
We inferred the following from our experiment:

1. Selecting an RNN to match the time-series characteristics of AIS big data allows the network to learn the general rules of ship maneuvering and motion.

2. Adding the LSTM unit alleviates the gradient loss (vanishing gradients) caused by very long sequences in recurrent training. The LSTM can remember the common features of the AIS big data and forget individual differences, so the network chooses autonomously what to remember and what to forget.

3. By incorporating a bidirectional RNN structure, the network can learn from historical data and be further optimized using future data, so the current prediction is strongly correlated with its context.

4. The trained BI-LSTM-RNN can accurately predict future ship navigation behavior and adjust its parameters in real time with existing data as input.
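
Conclusions 2 and 3 rest on two mechanisms: the forget gate, which scales how much of the previous cell state is kept, and the bidirectional pass, which pairs a forward-in-time state with a backward-in-time state at each step. The sketch below illustrates both with a scalar (one-unit) LSTM in plain Python. It uses the standard textbook LSTM equations, which may differ in detail from the formulation used in this paper; the weights, function names, and the toy speed sequence are hypothetical, and no training is performed.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a one-unit LSTM; w maps gate name -> (w_x, w_h, bias)."""
    def gate(name, activation):
        w_x, w_h, b = w[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)       # share of the old cell state to keep
    i = gate("input", sigmoid)        # share of the candidate to write
    g = gate("candidate", math.tanh)  # candidate cell content
    o = gate("output", sigmoid)       # share of the cell state to expose
    c = f * c_prev + i * g            # forget old content, add new content
    h = o * math.tanh(c)
    return h, c


def run_lstm(sequence, w):
    """Run the cell over a sequence and return the hidden state at every step."""
    h, c, states = 0.0, 0.0, []
    for x in sequence:
        h, c = lstm_step(x, h, c, w)
        states.append(h)
    return states


def run_bidirectional(sequence, w_fwd, w_bwd):
    """Pair each step's forward state (past context) with its backward state (future context)."""
    forward = run_lstm(sequence, w_fwd)
    backward = run_lstm(sequence[::-1], w_bwd)[::-1]
    return list(zip(forward, backward))


# Hypothetical, untrained weights chosen only to make the sketch runnable.
W_FWD = {"forget": (0.5, 0.1, 1.0), "input": (0.6, 0.2, 0.0),
         "candidate": (0.9, 0.3, 0.0), "output": (0.4, 0.1, 0.0)}
W_BWD = {"forget": (0.3, 0.2, 1.0), "input": (0.5, 0.1, 0.0),
         "candidate": (0.8, 0.2, 0.0), "output": (0.5, 0.2, 0.0)}

speeds = [0.2, 0.4, 0.1, 0.3]  # toy normalized speed values, not AIS data
for t, (h_f, h_b) in enumerate(run_bidirectional(speeds, W_FWD, W_BWD)):
    print(t, round(h_f, 4), round(h_b, 4))
```

In a full model, the paired states at each step would feed an output layer that produces the predicted position. The sketch stops short of that because the output layer and the training procedure are specific to the paper's network.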