Research on Accurate Prediction of the Container Ship Resistance by RBFNN and Other Machine Learning Algorithms

Abstract: Resistance is one of the important performance indicators of ships. In this paper, a prediction method based on the Radial Basis Function neural network (RBFNN) is proposed to predict the resistance of a 13500 twenty-foot equivalent unit (13500TEU) container ship at different drafts. A prediction at a draft within the known range is called interpolation prediction; otherwise, it is called extrapolation prediction. First, ship features are extracted to predict the resistance $R_t$. The results show that the RBFNN performs significantly better than the other four machine learning models: backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost). Then, the ship data are made dimensionless, and the same models are used to predict the total resistance coefficient $C_t$ of the container ship. The prediction results show that the RBFNN prediction model still performs well. Good results can be obtained by RBFNN in interpolation prediction, even when only part of the dimensionless features is used. Finally, the accuracy of the RBFNN-based prediction method is greatly improved compared with the modified admiralty coefficient.


Introduction
Ship resistance prediction has always been a hot area of ship research. Researchers usually use approximate methods such as model series data, empirical formula, and parent ship estimation method to predict ship resistance. Many scholars have modified the approximate methods for different ship types and working conditions [1][2][3]. With the development of computer technology, computational fluid dynamics (CFD) technology has been more and more used in ship performance calculation [4][5][6][7][8]. Compared with the traditional resistance prediction methods, CFD technology has higher accuracy and is widely used. Both of these prediction methods have shortcomings. The prediction accuracy of approximate methods needs to be improved, and the CFD technology needs more time and requires high computational resources.
Artificial intelligence (AI) algorithms such as machine learning and deep learning are making their mark in areas such as image recognition and speech synthesis. These algorithms rely on large-scale sample data. They are also increasingly used in shipping [9][10][11][12] and fluid mechanics [13][14][15][16]. The current application of these algorithms in ship resistance prediction remains at simply using maps for prediction. There is a long way to go in the research of using these AI algorithms to predict ship resistance, especially for the resistance prediction of the same ship.
The research aims to establish a prediction model of the model ship resistance at different draft states by using the ship's features and resistance data. Ship resistance is affected by many factors, such as the waterline length and hull shape factors. When the draft changes, these factors change accordingly. This differs from predicting how the resistance changes with speed at a given draft, which can be obtained through spline interpolation; the change of ship resistance with draft is more complicated. The draft at which the resistance is to be predicted falls into one of two states. In one, the draft of the ship is within the known draft range, which is called interpolation prediction; in the other, the draft is outside the known draft range, which is called extrapolation prediction. The Radial Basis Function neural network (RBFNN) is used to establish a resistance prediction model for a 13500 twenty-foot equivalent unit (13500TEU) container ship and is compared with four other machine learning models: a three-layer backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost). The predicted resistance is verified by the towing tank test.
The work of this paper is presented in the following sections. The second section introduces the ship features and the resistance of the 13500TEU model ship, as well as the data dimensionless process. The third section briefly defines different machine learning algorithms for resistance prediction (RBFNN, BPNN, SVM, RF, XGBoost). The fourth section presents the processes of data set division, sample features selection, model parameters selection, and the evaluation criteria of prediction models. The fifth section provides the predicted results and the comparison of different prediction models, as well as the comparison of the prediction accuracy between the RBFNN and the modified admiralty coefficient. The sixth section provides the conclusions.

Ship Features
In this paper, a 13500TEU container ship is selected as the research object. The scale ratio of the model ship is 1:55. The body plan of the container ship is shown in Figure 1. The principal dimensions of the real ship and model ship are shown in Table 1.
J. Mar. Sci. Eng. 2021, 9, x FOR PEER REVIEW
The features of the 13500TEU container ship at different drafts can be obtained from the ship's loading manual. The features of the model ship can be obtained according to the scale ratio, as shown in Table 2. The dimensionless features of the model ship are $T_m/B$, $L_{wl}/\nabla^{1/3}$, $C_b$, $C_p$, $C_m$, $C_w$, $L_{cb}/L_{wl}$, and $L_{cf}/L_{wl}$, shown in Table 3.

Ship Model Resistance
The resistance test of the model ship was conducted at the towing tank laboratory of Huazhong University of Science and Technology (HUST), China. The model ship is shown in Figure 2. The principle and detailed process of the model ship towing test are explained in the paper by Sun et al., 2016 [6]. The resistance of the model ship at different velocities and drafts is shown in Table 4. The model ship resistance curves with velocities at different drafts are shown in Figure 3.
In the dimensionless processing, the model ship resistance $R_t$ is replaced by the resistance coefficient $C_t$ and the model ship velocity is replaced by the Froude number Fn. The definition formulas of these two features, $C_t$ and Fn, are shown as follows.
where $\rho$ represents the density of water, taken as 1000 kg/m$^3$ in this paper, and $g$ is the acceleration of gravity.
The resistance coefficient $C_t$ of the model ship at different Froude numbers Fn and draft-to-breadth ratios $T_m/B$ is shown in Table 5.
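As a minimal sketch of the dimensionless processing, the snippet below uses the conventional definitions Fn = V / sqrt(g L) and Ct = R / (0.5 ρ V² S). The use of the waterline length as the reference length, the wetted surface area `wetted_area_m2`, the kgf-to-newton conversion, the value of g, and all function and parameter names are assumptions made for illustration and are not taken from the tables of this paper.

```python
import math

G = 9.81            # acceleration of gravity, m/s^2 (assumed value)
RHO = 1000.0        # water density used in this paper, kg/m^3
KGF_TO_N = 9.80665  # 1 kgf in newtons (model resistance is given in kgf)


def froude_number(velocity_ms, length_m, g=G):
    """Fn = V / sqrt(g * L); L is assumed to be the waterline length."""
    return velocity_ms / math.sqrt(g * length_m)


def resistance_coefficient(resistance_kgf, velocity_ms, wetted_area_m2, rho=RHO):
    """Ct = R / (0.5 * rho * V^2 * S), with R converted from kgf to N."""
    resistance_n = resistance_kgf * KGF_TO_N
    return resistance_n / (0.5 * rho * velocity_ms ** 2 * wetted_area_m2)


def resistance_from_coefficient(ct, velocity_ms, wetted_area_m2, rho=RHO):
    """Inverse mapping: recover R in kgf from a predicted Ct."""
    return ct * 0.5 * rho * velocity_ms ** 2 * wetted_area_m2 / KGF_TO_N
```

The inverse function corresponds to the step in which a predicted coefficient is converted back to a resistance so that it can be compared with the towing tank measurement.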

RBFNN
RBFNN can be used to solve multi-dimensional fitting problems. The basis functions form the hidden layer of the RBFNN. They perform a nonlinear mapping on the input vector, and the output value is obtained as a linear superposition of the mapped values.
In this paper, the Gaussian function [17] is selected as the basis function. The standard Gaussian function can be expressed by the following formula.
$$\varphi(r) = \exp\left(-\frac{r^2}{2\sigma^2}\right)$$
where $r = \lVert X - c \rVert$ represents the Euclidean distance between the input vector $X$ and the center of the basis function $c$, and $\sigma$ is the standard deviation of the Gaussian function, which reflects how quickly the basis function value decays with the distance from the center point. With $c = 0$, the change of the Gaussian function curve with the standard deviation $\sigma$ is shown in Figure 4.
The performance of RBFNN is affected by the number of center points, the position of the center points, and the width of the basis function, which in this paper is the standard deviation $\sigma$ of the Gaussian function. In this paper, the evolution strategy is used to determine the number of center points and the standard deviation, and the K-means clustering algorithm [18] is used to determine the location of the center points.
RBFNN has been widely used in the field of ships, such as ship profile description [19] and ship seakeeping forecast [20]. The prediction of RBFNN is the cumulative result of the basis functions at different positions. The selection of the basis function affects the result of RBFNN. The Gaussian basis function can approximate many functions well by changing its parameters. The change of ship resistance with a single parameter, such as speed, can also be well expressed by polynomials. This is why RBFNN was chosen as the key research algorithm.
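The forward pass of such a network can be sketched in a few lines. This is an illustrative sketch only: the centers, weights, bias, and function names below are hypothetical, whereas in the paper the centers come from K-means clustering and the number of centers and $\sigma$ from the evolution strategy.

```python
import math


def gaussian_basis(x, center, sigma):
    """Gaussian basis value exp(-r^2 / (2 sigma^2)), r = Euclidean distance."""
    r_squared = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-r_squared / (2.0 * sigma ** 2))


def rbfnn_predict(x, centers, weights, sigma, bias=0.0):
    """Output = bias + weighted sum of the hidden-layer basis values."""
    hidden = [gaussian_basis(x, c, sigma) for c in centers]
    return bias + sum(w * h for w, h in zip(weights, hidden))


# Two hypothetical centers in a two-feature input space.
centers = [(0.0, 0.0), (1.0, 1.0)]
weights = [2.0, -1.0]
print(rbfnn_predict((0.0, 0.0), centers, weights, sigma=0.5))
```

An input that coincides with a center activates that basis function fully (value 1), while distant centers contribute almost nothing when $\sigma$ is small, which is why the width has such a strong effect on the fitted surface.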

Related Machine Learning Algorithms
The BPNN [21] is one of the most widely used artificial neural network models. A typical BPNN comprises three layers: an input layer, a hidden layer, and an output layer. The number of hidden layer nodes affects the prediction accuracy of the BPNN. Usually, an empirical formula is used to determine the number of hidden layer neurons. The expression of the empirical formula is as follows.
$$n_h = \sqrt{n_1 + n_2} + a$$
where $n_h$ is the number of hidden layer nodes; $n_1$ and $n_2$ represent the numbers of input layer and output layer nodes, respectively; and $a$ is an integer between 1 and 100. The empirical formula gives the minimum number of hidden layer nodes, $\sqrt{n_1 + n_2} + 1$. In this paper, in order to reduce the influence of the number of hidden layer nodes on the prediction accuracy, three BPNN models with 4, 8, and 12 hidden layer nodes are used to predict resistance.
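A small helper makes the empirical rule concrete. Rounding to the nearest integer is an assumption, since the text does not say how the square root is converted to a whole number of nodes, and the layer sizes in the usage line are hypothetical.

```python
import math


def hidden_nodes(n_inputs, n_outputs, a=1):
    """Empirical rule sqrt(n_in + n_out) + a, rounded to the nearest integer."""
    return round(math.sqrt(n_inputs + n_outputs) + a)


# Hypothetical network with nine input features and one output.
print(hidden_nodes(9, 1))
```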
SVMs are theoretically well-justified machine learning techniques [22] with their root in structural risk minimization [23], which have also been successfully applied to many real-world domains. SVM realizes nonlinear mapping from low-dimensional space to high-dimensional space through kernel function. The most widely used kernel functions are polynomial kernel functions, radial basis functions, etc. The radial basis kernel function is adopted in this study.
RF [24] and XGBoost are ensemble learning algorithms based on decision trees. Traditional decision trees divide data according to the contribution of attributes. The best feature at the current node is used as the partition attribute. The classification/regression analysis is realized through the continuous division of data. However, the prediction results of a single decision tree usually fail to reach the expected accuracy. The accuracy of prediction can be improved by an algorithm that contains multiple decision trees. RF randomly divides attributes and sample data into several subsets. Each subset is trained by a decision tree. The RF prediction is obtained by aggregating the results of all subsets.
XGBoost contains multiple decision trees, and each new decision tree is used to make up for the fitting error of the previous ones. The prediction result of the XGBoost model can be expressed by the following formula.
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$$
where $\hat{y}_i$ is the predicted value of XGBoost for the $i$th sample $x_i$, $K$ is the number of decision trees, and $f_k$ represents the $k$th decision tree.
BPNN has been used in ship research [25]. BPNN has a strong nonlinear fitting ability. It has been proved in theory that a three-layer BPNN can approximate nonlinear functions to arbitrary precision [26,27]. Several studies [28,29] have shown that the BPNN has an advantage in the prediction task over linear regression and support vector regression (SVR). The SVR algorithm used in this paper has the same kernel function, the Gaussian function, as the RBFNN, although the principles of the two algorithms are different. RF and XGBoost are relatively novel algorithms and have outstanding performance in machine learning competitions. This paper studies the performance of these four algorithms in ship resistance prediction as a comparison.
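The idea that each tree corrects the error left by the previous ones can be illustrated with plain gradient boosting on one-split trees (stumps). This is a simplified stand-in and not XGBoost itself, which additionally uses regularization and second-order gradient information; the function names, the learning rate, and the one-dimensional input are assumptions of this sketch.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on 1-D inputs (squared error)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_boosted_stumps(xs, ys, n_trees=20, learning_rate=0.5):
    """Each new stump is fitted to the residual left by the previous ones."""
    trees = []
    predictions = [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        trees.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
    # The ensemble prediction is the (scaled) sum over all trees.
    return lambda x: sum(learning_rate * tree(x) for tree in trees)
```

The returned model is the sum over the $K$ fitted trees, matching the additive form of the prediction, with the learning rate acting as a shrinkage factor on each tree.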

Data Set Division
In this study, the sample ratio of the training set, the validation set, and the test set is 3:1:1. The data is randomly divided according to different drafts of the model ship for the study purpose. The resistance prediction of the model ship at different drafts is divided into two types: interpolation prediction and extrapolation prediction. The test set draft of the interpolation prediction is within the training set draft range, and the test set draft of the extrapolation prediction is outside the training set draft range. In the interpolation prediction, the model ship data at draft = 0.209 m and draft = 0.275 m are taken as the test set. The distribution of the test set is shown in Figure 5.
In the extrapolation prediction, the model ship data at draft = 0.136 m and draft = 0.291 m are taken as the test set. The distribution of the test set is shown in Figure 6.
The remaining sample data outside the test set is distributed into four subsets using the stratified sampling method in the process of searching for the optimal prediction model. Each subset is used as the validation set once, and at that time the other three subsets are regarded as the training set. The ratio of the training set samples to the validation set samples is 3:1.
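The data set division described in this section can be sketched as follows. The sample layout (a dictionary with a `draft` key), the function names, and the round-robin assignment used as a simple stand-in for stratified sampling are all assumptions of this sketch.

```python
def split_by_draft(samples, test_drafts):
    """Hold out every sample measured at one of the test drafts."""
    test = [s for s in samples if s["draft"] in test_drafts]
    rest = [s for s in samples if s["draft"] not in test_drafts]
    return rest, test


def prediction_type(test_draft, training_drafts):
    """Interpolation if the test draft lies inside the training draft range."""
    if min(training_drafts) <= test_draft <= max(training_drafts):
        return "interpolation"
    return "extrapolation"


def four_fold_splits(samples, n_folds=4):
    """Yield (training, validation) pairs; each fold is the validation set once.

    Samples are sorted by draft and dealt round-robin, so every fold
    receives samples from across the draft range.
    """
    ordered = sorted(samples, key=lambda s: s["draft"])
    folds = [ordered[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        validation = folds[k]
        training = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield training, validation
```

With one fifth of the samples held out as the test set and the rest split into four folds, each training/validation pair has the 3:1 ratio stated above, giving 3:1:1 overall.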

Ship Features and Predicted Values
In this research, there are three schemes of ship features and predicted values, as shown in Table 6.
Table 6. Ship features and predicted value schemes.
In scheme 2, the model ship features and resistance $R_t$ are made dimensionless. The model ship resistance coefficient $C_t$ is predicted by the prediction models (BPNN, RBFNN, SVM, RF, and XGBoost) using these dimensionless features.
The features in scheme 3 are a subset of the dimensionless ship features. Among the three model ship form features, $C_b$, $C_p$, and $C_m$, only two are independent variables, so $C_b$ and $C_p$ are selected as ship features for prediction. The shape features $L_{cb}/L_{wl}$ and $L_{cf}/L_{wl}$ are omitted in scheme 3. The RBFNN is also used to predict the model ship's resistance coefficient. For convenience, the three feature schemes are denoted F1, F2, and F3 in the following text.

Model Parameters Selection
To ensure the best performance of each machine learning prediction model, its parameters need to be tuned manually or automatically. In this paper, evolutionary strategies are used to find the optimal parameters of each machine learning model. In the process of parameter tuning, a certain number of parameter sets are first randomly generated within the search range as the parent set. The offspring set is generated through crossover and mutation of the parent set. The fitness of the parent set and the offspring set is calculated, and the parameter sets with better fitness are selected as the parent set of the next generation. The evolution continues until the number of iterations meets the requirement, and the parameter set with the best fitness is selected as the final parameters of the prediction model. The parameter properties and search ranges of the five models are shown in Table 7.
In order to reduce the influence of the evolution strategy's own parameters on the selection of model parameters, the mutation strength boundary of all the model parameters is set to 2/3 of the search range, and the number of iterations is set to 200.
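The tuning loop can be sketched as a simple (μ + λ) evolution strategy. The population sizes, the uniform crossover, the way the 2/3 bound is applied to the mutation step, and the toy fitness function in the usage lines are assumptions of this sketch; in the paper the fitness would be the MRAE of a trained prediction model.

```python
import random


def evolve(fitness, bounds, n_parents=10, n_children=40, n_iterations=200, seed=0):
    """Minimise `fitness` over box-bounded parameters with a (mu + lambda) scheme."""
    rng = random.Random(seed)
    # Mutation strength is capped at 2/3 of each parameter's search range.
    strength = [(hi - lo) * 2.0 / 3.0 for lo, hi in bounds]
    parents = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_parents)]
    for _ in range(n_iterations):
        children = []
        for _ in range(n_children):
            a, b = rng.sample(parents, 2)
            # Crossover: take each parameter from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: Gaussian step, clipped back into the search range.
            child = [
                min(hi, max(lo, value + rng.gauss(0.0, rng.uniform(0.0, s))))
                for value, s, (lo, hi) in zip(child, strength, bounds)
            ]
            children.append(child)
        # Parents and children compete; the fittest become the next parents.
        parents = sorted(parents + children, key=fitness)[:n_parents]
    return parents[0]


# Toy fitness with a known minimum at (3, -1), standing in for a model's MRAE.
best = evolve(lambda p: (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2,
              [(-10.0, 10.0), (-10.0, 10.0)])
print(best)
```

Because parents survive alongside their offspring, the best fitness never gets worse from one generation to the next, which keeps the result stable over the 200 iterations.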

Evaluation Metric of Prediction Models
In this paper, the Maximum of Relative Absolute Error (MRAE) is used to evaluate the accuracy of prediction models. The formulas of the Relative Absolute Error (RAE) and MRAE are shown as follows.
$$RAE_i = \frac{\left| Value_E - Value_P \right|}{Value_E} \times 100\%$$
$$MRAE = \max_i \left( RAE_i \right)$$
where $Value_E$ is the experimental value, $Value_P$ is the predicted value, and $RAE_i$ represents the RAE of the $i$th sample.
In the training process, the model parameters that minimize the MRAE of the training set and the validation set are taken as the optimal parameters of the prediction model. The prediction model with the optimal parameters is used to predict the resistance of the test set. The complete training process is shown in Figure 7.
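A minimal sketch of the metric, taking the RAE of a sample as the absolute difference between the experimental and predicted values divided by the experimental value, and the MRAE as the largest RAE over all samples. The function names and the numbers in the usage line are hypothetical.

```python
def relative_absolute_errors(experimental, predicted):
    """RAE of each sample in percent: |E - P| / |E| * 100."""
    return [abs(e - p) / abs(e) * 100.0 for e, p in zip(experimental, predicted)]


def mrae(experimental, predicted):
    """Maximum RAE over all samples, in percent."""
    return max(relative_absolute_errors(experimental, predicted))


# Hypothetical experimental and predicted values.
print(mrae([2.0, 4.0], [2.1, 3.0]))
```

Because the metric is a maximum rather than a mean, a single poorly predicted sample determines the score, which makes it a strict criterion for model selection.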

Comparison of the Predicted Results
The number of hidden layer nodes affects the performance of the BPNN. This paper studies the prediction results of BPNN with three different numbers of hidden layer nodes: 4, 8, and 12. The interpolation and extrapolation prediction results of the model ship resistance $R_t$ and model ship resistance coefficient $C_t$ using BPNN are shown in Tables 8 and 9, respectively. The prediction results of these three BPNN models differ little. The average of the three BPNN models' prediction results is therefore taken as the performance of the BPNN and compared with the other prediction models. The MRAE of the training set, the validation set, and the test set of the different ship resistance prediction models is shown in Table 10.
In F1, the sample data is the actual model ship features and model ship resistance. In F2, the sample data was processed in a dimensionless manner. The sample data in F3 is partially dimensionless features. The interpolation and extrapolation prediction results comparison of different prediction models are shown in Figures 8 and 9. The following conclusions can be drawn from the prediction results.

•
The extrapolation prediction accuracy of the model ship resistance is much lower than that of the interpolation prediction.

•
For the interpolation prediction, the dimensionless processing of data can improve the resistance prediction accuracy of all models.

•
Among all the interpolation schemes, using dimensionless ship features to estimate the ship resistance coefficient $C_t$ based on RBFNN has the highest accuracy. The MRAE of the RBFNN prediction model is 1.731%. All interpolation prediction schemes using RBFNN have good performance. The results show that RBFNN is suitable for predicting the model ship resistance with an interpolated draft. The accuracy of using the RF model to predict the ship resistance coefficient is also good, and its MRAE is 2.546%. They were followed by the other prediction models.

The optimal predicted resistance coefficient $C_t$ is converted to the corresponding resistance $R_t$. The experimental values and predicted values of each sample point in the test set are shown in Table 11. The curves of $C_t$ versus Fn and the curves of $R_t$ versus $V_m$ at draft = 0.209 m and draft = 0.275 m are shown in Figures 10 and 11, respectively.

•
Among all the extrapolation schemes, using dimensionless ship features to estimate the ship resistance coefficient $C_t$ based on SVM has the highest accuracy. The MRAE of the model prediction is 6.072%. The MRAE values of the training set, validation set, and test set of this model are not much different, all greater than 6%. The RBFNN model that uses ship features to predict the resistance $R_t$ has an MRAE of 7.718% in the test set, which is slightly inferior to the SVM model and better than the other prediction models.

For the extrapolation prediction, the resistance coefficient $C_t$ obtained by the SVM prediction model is converted to the corresponding ship resistance $R_t$. The performances of the SVM model and the RBFNN model are shown in Figures 12 and 13.
In the actual situation, it is not initially known whether the predicted draft belongs to extrapolation or interpolation. Combining the resistance prediction results under the draft states of interpolation and extrapolation, the RBFNN prediction model based on F1 is the best one among all the models mentioned in this article.

Comparison of the RBFNN and the Modified Admiralty Coefficient
The accurate prediction of ship power/resistance has always been a hot issue for designers. The most commonly used method is the admiralty coefficient, which is expressed as: ships with similar hull form, scale and speed have the same admiralty coefficient. The relationship between ship resistance, effective power and the admiralty coefficient is shown in the following formula.
$$P_e = R v = \frac{\Delta^{2/3} v^3}{C}$$
where $P_e$ is the effective power, $R$ is the ship resistance, $v$ is the ship velocity, $\Delta$ is the displacement, and $C$ is the admiralty coefficient. The admiralty coefficient method, although a useful and simple estimation method, is a somewhat 'blunt instrument' when used in power estimation since it fails to effectively distinguish between the power and hull-related parameters. The effectiveness of the admiralty coefficient formula should therefore be evaluated in engineering practice.
Many researchers [30][31][32] have undertaken a significant amount of work on the estimation method of the power curve. In the paper by Tu in 2018, the fixed power exponent of the admiralty coefficient is replaced by $2/3 \times (C_b + C_w)$, and the power curves of the container ship are estimated by the modified admiralty formula.
The power of the model ship at 0.191 m draft is taken as $P_{s1}$; the power of the model ship at 0.209 m draft, $P_{s2}$, can be estimated by the modified formula. The parameters $C_b$ and $C_w$ belong to the model ship at 0.191 m draft. Similarly, using the data at 0.260 m draft, the power of the model ship at 0.275 m draft can be obtained. At draft = 0.209 m and draft = 0.275 m, the ship power is estimated by the modified admiralty formula, and the resistance of the model ship is estimated by the RBFNN prediction model. The comparison of the absolute MRAE values of the two methods is shown in Table 12. Compared with the previous modified admiralty formula, the prediction accuracy of the RBFNN prediction model has been greatly improved.
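The estimation from one draft to a neighboring draft can be sketched as below. The sketch assumes that the exponent being replaced is the displacement exponent 2/3 of the classical formula, and that both drafts are compared at the same speed with an unchanged admiralty coefficient; neither assumption is stated explicitly in this section. All names and numbers are hypothetical.

```python
def admiralty_power(p_ref, disp_ref, disp_new, exponent=2.0 / 3.0):
    """Scale power at constant speed and constant admiralty coefficient.

    P_new = P_ref * (disp_new / disp_ref) ** exponent
    """
    return p_ref * (disp_new / disp_ref) ** exponent


def modified_exponent(c_b, c_w):
    """Exponent 2/3 * (C_b + C_w), evaluated at the reference draft."""
    return 2.0 / 3.0 * (c_b + c_w)


# Hypothetical reference power, displacements, and form coefficients.
n = modified_exponent(c_b=0.6, c_w=0.8)
print(admiralty_power(p_ref=100.0, disp_ref=8.0, disp_new=9.0, exponent=n))
```

With $C_b + C_w = 1$ the modified exponent reduces to the classical value of 2/3, so the modification only changes the estimate for hull forms that depart from that condition.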

Conclusions and Discussion
This paper uses five machine learning methods (BPNN, RBFNN, SVM, RF, and XGBoost) to predict the model ship resistance of a 13500TEU container ship at different drafts using actual ship data and dimensionless ship data. The resistance prediction has two states, interpolation and extrapolation. The optimal parameters of each model are obtained through evolutionary strategies. From the comparison of the prediction models, the following conclusions can be drawn:

•
The model ship resistance can be predicted more accurately by RBFNN for the interpolation prediction. The prediction accuracy of the RBFNN is further improved by using dimensionless ship features and the total resistance coefficient. Good prediction results can also be obtained by using only a part of the dimensionless features.

•
For the extrapolation prediction, the prediction of $R_t$ using RBFNN has a high accuracy (MRAE of 7.718%), second only to the prediction of $C_t$ using SVM (MRAE of 6.072%). The performance of $C_t$ prediction using dimensionless features with the other models is poor.

•
For the resistance prediction of container ships, the RBFNN using actual ship characteristics is a better choice.

•
Compared with the modified admiralty coefficient, the prediction accuracy of the RBFNN method is significantly improved, and the method has more value in terms of engineering applications.
Each prediction problem has a prediction method that suits it. Reasonable data processing methods can improve prediction accuracy. If data for many different container ships are available, the prediction approach described in this paper can be used to build a ship resistance prediction model to predict the resistance of other container ships.
Fn: Froude number
$R_t$: Total resistance of model ship (kgf)
$C_t$: Coefficient of total resistance of model ship