1. Introduction
The interaction between ships and waves is of utmost importance for ship handling and safety at sea. Understanding how waves influence a ship’s behavior, stability, and navigational challenges is vital to ensuring the safety of vessels, crew, cargo, and the environment. This understanding helps optimize ship handling, cargo stability, emergency response, and vessel design while minimizing environmental impact. In the maritime industry, accurate predictions and models of ship–wave interactions are essential for enhancing safety, efficiency, and overall operational excellence.
With the increasing demand for ship motion prediction, researchers have been improving traditional prediction methods. Triantafyllou and Bodson (1983) [1] applied Kalman filtering techniques to estimate complex ship motions. However, a significant limitation arose from nonminimum phase characteristics due to the spatial integration of water wave forces, resulting in an irrational and nonminimum phase transfer matrix function. Sutulo (2002) [2] studied ship maneuvering simulations, focusing on two essential approaches, dynamic and kinematic prediction models, with an emphasis on improved kinematic prediction techniques. Rigatos (2013) [3] explored dynamic ship positioning using sensor fusion techniques based on Kalman and particle filtering algorithms, while Perera (2017) [4] used an extended Kalman filter and vector-based algorithms for short-term ship maneuver prediction. Fossen (2018) [5] introduced an exogenous Kalman filter (XKF)-based ship motion prediction method, leveraging real-time Automatic Identification System (AIS) data for visualization and motion prediction. Jiang et al. (2020) [6] studied the scale effects of autoregressive (AR) models in real-time ship motion prediction, enhancing prediction accuracy and providing valuable guidance for real-time prediction. Luo et al. (2020) [7] proposed a vector-analysis-based ship motion and trajectory prediction method that analyzes ship movement vectors and velocities, constructing a concise and efficient prediction model capable of accurately predicting ship motion trajectories in complex marine environments. These methods primarily rely on prior knowledge and existing data for ship trajectory prediction. However, due to the complex and variable nature of the marine environment, their prediction performance has certain limitations.
Researchers have also continued to explore the detection and prediction of ship motion states, gradually shifting from traditional mathematical-model-based prediction methods to machine learning and deep learning methods, which have increased the reliability and performance of their predictions. Shen (2005) [8] explored the application of diagonal recurrent neural networks (DRNN) for predicting the complex motions of large-scale ships affected by nonlinear and random factors, demonstrating improved prediction accuracy and stability. Khan et al. (2007) [9] explored using artificial neural networks, trained with singular value decomposition and conjugate gradient algorithms, to predict ship motion accurately. Ge et al. (2017) [10] presented a BP neural network-based motion attitude prediction method to enhance prediction accuracy and speed for ship motion compensation systems, addressing the challenge of obtaining timely specific motion features. Chen et al. (2020) [11] introduced a convolutional neural network (CNN)-based method for ship movement classification, providing an effective tool for ship trajectory prediction and monitoring. Zhang et al. (2021) [12] introduced a multiscale attention-based LSTM model for ship motion prediction, utilizing attention mechanisms to capture information at different time scales and achieving superior predictive performance. Bassam et al. (2022) [13] explored machine learning-based ship speed prediction, comparing various methods such as random forests and support vector machines, and demonstrated their effectiveness in predicting a ship’s speed to support efficient shipping operations. As technology advances, these methods will be further refined and optimized to provide better support for safe ship navigation. Kong et al. (2022) [14] developed a context-enhanced trajectory-based ship target recognition method that leverages the contextual information of ship trajectory data to improve target recognition accuracy. Abebe et al. (2022) [15] proposed a hybrid ARIMA–LSTM model for ship trajectory planning, offering more accurate trajectory prediction and collision avoidance planning.
In recent years, an increasing number of scholars have used machine learning and deep learning to predict ship motion, among which the LSTM neural network has received particular attention because of its excellent ability to process time series data. However, the features of the wave conditions are usually not taken into account when establishing the database and training the neural network, even though the course stability and motion of a ship under way are significantly influenced by wave conditions [16]. Hence, wave features and ship motion characteristics are combined in this paper to establish the database for neural network training, which aims to improve the accuracy of course prediction for ships sailing in various wave conditions.
In this paper, based on the navigation data of a ship sailing in different wave conditions at various initial forward speeds published in our previous work [17], the navigation data of the autopilot ship in different sea conditions are first classified by the k-means clustering method, and the prediction accuracy of the neural network model trained on the classified data is analyzed. Secondly, the wave and ship motion are used as input features of the training database, and the effect of different input features on the prediction accuracy is studied. Finally, navigation data of an autopilot ship in a series of wave conditions are learned by the neural network simultaneously, and the course of the ship in a new wave condition is predicted using the improved training policy. In this paper, the LSTM neural network model is used to predict the heading change. Through the comparison and discussion of various prediction policies, the influence of the input feature combination, the number of datasets, and the classification of the datasets on the accuracy of the model is discussed. Compared with the traditional LSTM neural network model, the optimized policy in this paper better ensures the prediction accuracy and the reliability of the results. The LSTM-based course prediction process of an autopilot ship in waves is shown in Figure 1.
5. Prediction Based on the LSTM
5.1. Selection Process for Input Length
This section aims to find the optimal time series input length k for predicting the data at timestep k + 1. Five input lengths (k = 250, 500, 750, 1000, and 1500) were chosen to compare the prediction accuracy. Because of the accumulation of errors during the prediction process, the error between the predicted data and the original data mainly appears after t = 30 s. Hence, the predicted and original data between 30 s and 40 s are used for comparison in this paper.
The prediction results obtained by the LSTM with different input lengths are shown in Figure 4, and the performance evaluation is listed in Table 3. It can be seen that the accuracy of the neural network increases with the input length, especially between 250 and 1000. However, the accuracy of the model starts to decrease when the length exceeds 1000, which suggests the existence of an optimal input length corresponding to the best prediction accuracy. The reason is that increasing the input length of the data can lead to better training results, but it also produces more error accumulation as the input length k grows. Hence, k = 1000 is used as the input length for training.
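The k-step input construction described above can be sketched as follows. This is a minimal NumPy sketch of the sliding-window setup, where the function name and the toy heading signal are illustrative assumptions, not the paper's actual preprocessing code.

```python
import numpy as np

def make_windows(series, k):
    """Build (input, target) pairs: each input is k consecutive
    timesteps, and the target is the value at timestep k + 1."""
    X, y = [], []
    for i in range(len(series) - k):
        X.append(series[i:i + k])
        y.append(series[i + k])
    return np.asarray(X), np.asarray(y)

# Toy heading signal standing in for a recorded course time series
heading = np.sin(np.linspace(0, 10, 2000))
X, y = make_windows(heading, k=1000)
print(X.shape, y.shape)  # (1000, 1000) (1000,)
```

Longer windows give the network more context per sample but, as noted above, also compound the accumulated error during recursive prediction.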
5.2. Prediction Based on the Data after k-means Clustering
In this paper, a k-means clustering model based on the data values is used to cluster the navigation data of the autopilot ship in different sea conditions. By continuously tuning the parameters of the clustering model, its classification performance meets the expectation. The k-means model is configured with 3 clusters, a random seed of 123, and 20 iterations of the initial center point selection. After classification, the model assigns a clustering label to each file, effectively dividing the files into groups based on the similarity of their tendencies.
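The clustering configuration above maps directly onto scikit-learn's KMeans, assuming that is the implementation used; the per-file summary features below are synthetic placeholders, not the paper's actual data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row summarizes one navigation file; the three columns are
# hypothetical summary features (e.g. amplitude, period, drift).
rng = np.random.default_rng(0)
summaries = np.vstack([
    rng.normal(0.0, 0.2, (12, 3)),   # stand-in for Group1-like files
    rng.normal(3.0, 0.2, (12, 3)),   # stand-in for Group2-like files
    rng.normal(6.0, 0.2, (10, 3)),   # stand-in for Group3-like files
])  # 34 files in total, as in the paper

# Settings from the text: 3 clusters, random seed 123, and 20
# restarts of the initial-center selection.
km = KMeans(n_clusters=3, random_state=123, n_init=20)
labels = km.fit_predict(summaries)
print(np.bincount(labels))  # sizes of the three clusters
```

Each file's label then determines which group (Group1, Group2, or Group3) it is assigned to.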
The classification results are illustrated in
Figure 5. After clustering, the 34 datasets were divided into three categories, named Group1, Group2, and Group3. Three datasets were randomly taken from each of Group1 (Data1, Data2, Data3), Group2 (Data4, Data5, Data6), and Group3 (Data7, Data8, Data9). Comparing Figure 6a–c, it can be seen that the groups differ significantly in amplitude, period, and overall trend. Three navigation datasets in Group1 were selected for training and prediction, and the values of their initial features are shown in Table 4.
In this section, Data1 and Data2 from Group1 were selected as the training sets. The model was trained separately on each dataset to capture its unique characteristics. Then, Data3 was predicted with each trained model, and the comparisons between the actual and predicted trends are shown in Figure 7a,b. To further compare the model’s performance, Data4 in Group2 and Data7 in Group3 were selected as training sets, respectively, and the corresponding comparisons between the actual and predicted trends are shown in Figure 7c,d. The comparative analysis of the evaluation metrics is presented in Table 5.
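The per-dataset train-then-predict protocol described here can be illustrated with a dependency-free stand-in. The paper trains an LSTM; the sketch below substitutes a least-squares AR model purely to show the protocol (train separately on each dataset, then predict the common test set), and the sinusoidal signals are hypothetical stand-ins for Data1, Data2, and Data3.

```python
import numpy as np

def fit_ar(series, k):
    """Least-squares AR(k) fit, used as a lightweight stand-in
    for the paper's LSTM to illustrate the train/predict protocol."""
    X = np.array([series[i:i + k] for i in range(len(series) - k)])
    y = series[k:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict(series, coef, k):
    X = np.array([series[i:i + k] for i in range(len(series) - k)])
    return X @ coef

# Toy heading signals: same frequency, shifted phase, mimicking
# datasets that fall into the same cluster group.
t = np.linspace(0, 40, 800)
data1 = np.sin(0.5 * t)
data2 = np.sin(0.5 * t + 0.3)
data3 = np.sin(0.5 * t + 0.6)

k = 10
errors = {}
for name, train in [("Data1", data1), ("Data2", data2)]:
    coef = fit_ar(train, k)          # train separately on each dataset
    pred = predict(data3, coef, k)   # then predict the common test set
    errors[name] = float(np.mean((pred - data3[k:]) ** 2))
print(errors)
```

Because the toy signals share one underlying trend, both stand-in models extrapolate well, mirroring the paper's observation that training and test sets from the same cluster give the best results.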
It can be observed from Figure 7 that the prediction performance in Figure 7a,b is better than that in Figure 7c,d. Further combining the differences in the evaluation metrics in Table 5, the model trained on Data2 achieves the best performance, with lower MSE, RMSE, and MAE. This means that the average difference between the predictions of the Data2-trained model and the original data is small, giving better accuracy. In contrast, Data7 produced the worst results in model training: the model has a larger MSE, indicating that its prediction error is large and that there is a significant difference between the prediction and the original data. The performance of the models trained on Data1 and Data4 lies between those of Data2 and Data7.
Some reasons for the extremely poor prediction performance of the Data7-trained model can be found in Figure 7d. Based on the data trends, the gap between Group3 and Group1 is far more significant than the gap between Group2 and Group1. Considering the grouping of the cluster analysis and the trend comparison of each group in Figure 7, the training performance of Data4 is better than that of Data7. Hence, it can be preliminarily concluded that cluster analysis can improve the accuracy of the LSTM neural network model to a certain extent. To further verify this point, this paper adds two more sets of prediction results: the model is trained on Data5 and then predicts Data6, and the same is performed with Data8 and Data9. Figure 8a,b shows the two prediction results.
From Figure 8a, it can be found that the prediction accuracy meets expectations. Further analysis combined with Table 6 shows that the prediction of Data6 achieves an ideal performance, while for the prediction of Data9, although the MSE is not ideal, the values of RMSE and MAE reach a relatively small level. To sum up, when the training and test datasets have large trend gaps, the prediction results are often unsatisfactory, but within a specific range, cluster analysis can optimize this result and improve the accuracy of the LSTM neural network.
5.3. The Effect of Input Features
Each feature group is applied to the model to identify the features that significantly influence the prediction performance. This process involves comparing the prediction results obtained from each feature group and analyzing their respective impacts on the accuracy and reliability of the predictions. The neural network model learns the various features of the navigation data under one sailing condition, the trained model is saved, and the course change under other sailing conditions is then predicted. The MSE, RMSE, and MAE are calculated for each of the six datasets to evaluate the model’s performance.
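The three evaluation metrics used throughout this paper can be computed as follows; this is a minimal NumPy sketch with the function names chosen for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return float(np.sqrt(mse(y_true, y_pred)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Tiny worked example
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])
print(mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

Lower values of all three metrics indicate predictions closer to the original data; MSE penalizes large deviations more strongly, which is why Data9 below can show a poor MSE while its RMSE and MAE remain moderate.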
The prediction results for full features are shown in Figure 9.
Table 7 shows the initial conditions for the selected training and test sets. The prediction results of full features can be used as one of the criteria to judge the prediction performance of each set of feature combinations.
In
Figure 10, it can be seen that after removing the trajectory, speed, and torque features, the gap between the prediction and the original data is significant. Combining the changes in the MSE, RMSE, and MAE values under the various conditions is necessary for further analysis. In Table 8, we can observe that a decrease in MAE, MSE, and RMSE is evident when the torque feature and the force feature are removed from the database, which means the prediction accuracy is improved. Remarkably, removing the force feature had a significant positive impact on the prediction performance, which also means that the force feature can negatively influence the prediction. However, upon removing the motion feature, we observed a slight deterioration in the prediction performance: both MSE and RMSE increased, accompanied by a rise in MAE, indicating the importance of this feature for accurate predictions. Similarly, excluding the speed feature resulted in a substantial decline in the prediction performance, with higher values of MSE, RMSE, and MAE, which shows the necessity of this feature for the accuracy of the predictions. It can also be found that removing the trajectory feature had a noticeable negative impact: the increases in MSE, RMSE, and MAE indicate the significance of this feature. Conversely, removing the influence of the torque feature led to a slight improvement in the model’s predictive performance, as the decrease in MSE, RMSE, and MAE implies.
Our analysis reveals that removing the force features yields the most substantial improvement in the model’s predictive performance, whereas excluding the speed and trajectory features results in a significant decrease in accuracy. Considering these findings, decreasing the influence of the force features is recommended to achieve optimal prediction results. Each type of feature contains multiple dimensions of data; the L2 norm of each type of feature can be taken to compress the dimensionality and reduce the effect of error accumulation.
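The L2-norm compression mentioned above can be sketched in a few lines. The three force components and their values below are hypothetical, not the paper's variable names or data.

```python
import numpy as np

# Hypothetical force time series with three components per timestep
# (e.g. surge, sway, heave components); values are illustrative.
forces = np.array([
    [3.0, 4.0, 0.0],
    [0.0, 0.0, 5.0],
    [1.0, 2.0, 2.0],
])

# Collapse the three components into a single magnitude column by
# taking the L2 norm along the feature axis.
force_norm = np.linalg.norm(forces, axis=1)
print(force_norm)  # [5. 5. 3.]
```

The compressed column replaces the three raw force columns in the training database, reducing the input dimensionality while retaining the overall force magnitude.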
5.4. Comparisons of Multi-Task Learning
In this section, multiple datasets are introduced simultaneously for training, based on the results of the k-means clustering. The aim is to discover whether there is an optimal number of datasets that achieves the best prediction performance. The model is trained with one to six data groups, respectively, and the final prediction results are compared. The six groups of data are named {TrainingD1, TrainingD2, ..., TrainingD6}, their initial features are shown in Table 9, and the number of datasets for training was successively increased. Notably, this paper incorporates the initial features as independent features in the training data. By including the initial features shown in Table 9 as separate feature columns, the model can directly capture the relationship between these features and the others, leading to an improved understanding and prediction performance. Their influence can be assessed by comparing the final prediction results obtained from different numbers of training sets. The prediction results for different numbers of training sets are visually represented in Figure 11, while the specific performance metrics, mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE), are summarized in Table 10.
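Treating the initial features as separate feature columns can be sketched as follows. This is a minimal illustration under assumed names; the wave height and period values, the function name, and the toy signals are placeholders, not the paper's actual initial conditions.

```python
import numpy as np

def add_initial_features(series, wave_height, wave_period):
    """Append a run's constant initial wave features as extra input
    columns, so the model can condition on them directly."""
    n = len(series)
    cols = [series.reshape(n, 1),
            np.full((n, 1), wave_height),
            np.full((n, 1), wave_period)]
    return np.hstack(cols)

# Two hypothetical runs with different initial wave conditions
run1 = add_initial_features(np.sin(np.linspace(0, 5, 100)), 1.2, 6.0)
run2 = add_initial_features(np.cos(np.linspace(0, 5, 100)), 2.0, 8.5)

# Stack the runs into one multi-dataset training set
training = np.vstack([run1, run2])
print(training.shape)  # (200, 3)
```

Because the constant columns differ between runs, a model trained on the stacked data can distinguish which wave condition each sample came from, which is the mechanism this section relies on when training on several datasets at once.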
In the analysis of the results, it was observed that when training was conducted using a single dataset, as depicted in Figure 11a, the predicted outcomes exhibited a substantial margin of error. Nevertheless, Figure 11b showcases a marked improvement: training with two distinct datasets resulted in predicted trends that approached the reference values of MSE, RMSE, and MAE, thereby achieving predictive efficacy comparable to the earlier clustering analysis.
As the experiment progressed from Figure 11c to Figure 11f, with each iteration entailing an incremental increase in the number of training datasets, a discernible enhancement in prediction accuracy was witnessed. The quantification of the prediction errors using MSE, RMSE, and MAE substantiated this observation, with diminishing values signifying predictions closer to the original data. More specifically, when training encompassed two datasets, the accuracy of the neural network model increased significantly. Including three, four, five, and six datasets led to further reductions in the prediction errors, exemplifying a noteworthy gain in precision. However, once the number of datasets reached four, the performance improvement began to flatten out, and the benefit of adding more datasets gradually decreased.
In summary, multi-task learning can significantly improve the prediction performance of the model, but it also brings more computational effort. It is necessary to maintain a balance between enlarging the dataset to improve prediction accuracy and the increased amount of computation. Therefore, in practical applications, choosing the optimal number of datasets enables the model to maximize its performance.
5.5. Optimization Method Prediction
After conducting a comprehensive analysis of the various combinations of input features, it has been observed that the impact of the force features is negative. The force feature has three dimensions of data, so the L2 norm can be taken to compress the dimensionality and reduce the effect of error accumulation. Furthermore, by adopting a multi-task learning methodology, multiple datasets are analyzed in accordance with the established patterns, with particular attention given to the initial wave and ship features. Six datasets from Group2 in Section 5.2 are selected for demonstration. Five groups are employed for training to ensure robustness, while one group serves as the test set, as indicated in Table 11, which includes the compressed force-feature data, thereby minimizing error accumulation. The predicted results are presented in Figure 12.
The comparison in Table 12 reveals that the predicted performance has smaller MSE, RMSE, and MAE values than the previous predictions. Figure 12 illustrates that the overall trend aligns closely with the actual data, demonstrating a high fitting level until around 15 s, with some deviation occurring afterward. Several speculations can explain this observation. Firstly, additional unknown data scenarios might not be adequately learned during the training process, leading to decreased prediction accuracy. Secondly, noise or outliers within the dataset can adversely impact the prediction results. Furthermore, the complexity and feature settings of the model itself may contribute to the gradual increase in deviation in the later predictions. Nonetheless, through continuous optimization and meticulous selection, the current experiment achieved a relatively accurate forecast of trimaran sailing within the initial 15 s.