Data-Driven Prediction of Vessel Propulsion Power Using Support Vector Regression with Onboard Measurement and Ocean Data

The fluctuation of the oil price and the growing requirement to reduce greenhouse gas emissions have forced ship builders and shipping companies to improve the energy efficiency of their vessels. Accurate prediction of the required propulsion power under various operating conditions is essential for evaluating the energy-saving potential of a vessel. Currently, a new ship is expected to use the ISO15016 method to estimate the added resistance induced by external environmental factors when predicting power. However, since ISO15016 usually assumes static water conditions, its accuracy may be low when it is applied to various operating conditions. Moreover, applying the ISO15016 method is time consuming because it is computationally expensive and requires many input data. To overcome these limitations, we propose a data-driven approach to predict the propulsion power of a vessel. In this study, support vector regression (SVR) is used to learn from a large dataset obtained from onboard measurement and the National Oceanic and Atmospheric Administration (NOAA) database. The results show that the data-driven approach outperforms the ISO15016 method, provided that sufficient operational data from the vessel's regular routes are available.


Introduction
The fluctuation of the oil price and unstable shipping rates have forced ship builders and shipping companies to improve the energy efficiency of vessels [1]. Since fuel cost is the largest portion of the operating cost of a vessel, improving energy efficiency can result in large savings in the total operating cost [2]. Achieving good energy efficiency is also a prerequisite for coping with demanding environmental regulations because more energy-efficient vessels consume less fuel and emit less greenhouse gas [3]. Therefore, several technological solutions have been proposed to improve the energy efficiency of vessels [4]. Design optimization technologies such as 'hull form optimization' [5] or 'propeller configuration' [6] aim to design vessels with better fuel efficiency. On the other hand, operational optimization technologies aim to improve the operational performance by finding the optimal speed and optimized voyage routes of vessels [7,8]. For a more comprehensive list of energy-saving technologies, the reader is referred to Tilling et al. [4].
To measure the effectiveness of energy-saving technology, it is necessary to measure the speed/power performance of the vessel [9]. Today, many ship owners leave the responsibility of the delivery trials, including speed-power trials, with the shipyard [10]. However, the speed/power relationship obtained from a testing environment cannot be generalized to realistic operating conditions, and the propulsion performance of the vessel may be affected by several external factors such as wind, tide, wave or hull fouling [11]. Thus, in a dynamic environment, it is difficult to determine exactly whether the increase in energy efficiency is due to newly adopted energy-saving solutions or other factors.
Sensors 2020, 20, 1588
As an internationally recognized standard, ISO15016 is proposed to measure the speed/power relationship from the delivery trials [12]. ISO15016 also introduces several calculation procedures to adjust for the impact of external factors on the propulsion performance. Unfortunately, those procedures are computationally expensive because many input parameters are required and the related equations are complex [13]. As an alternative measure, the Energy Efficiency Operational Indicator (EEOI) [14] was proposed to measure the energy efficiency of the vessel. However, EEOI is just an aggregated index derived from the total amount of fuel used, and it cannot be used to predict the relationship between the propulsion performance and other related factors.
To overcome this limitation, this study proposes a data-driven approach that predicts the propulsion power by learning from data obtained from sensors installed on board the vessel. In this study, support vector regression (SVR), which shows excellent performance in prediction and pattern recognition, is used to learn from the data. For illustration, the data obtained from an actual bulk carrier are used to predict its propulsion power at various speeds and under dynamic operational conditions. The prediction accuracy is also compared with that of ISO15016. The result shows that if the operational data are obtainable, the proposed model outperforms ISO15016.
The remainder of the paper is organized as follows. Section 2 briefly introduces the traditional methods that have been used to predict vessel propulsion power. Section 3 explains the data that are used to train the prediction model. Section 4 explains the model development procedure in detail. Section 5 discusses the prediction result and compares it with other methods. Finally, Section 6 presents the conclusion and future works.

Related Works
Predicting the propulsion power of a vessel at various operating conditions is considered a difficult problem because horsepower can be affected by several external factors such as wind, waves, and hull fouling. Those external factors would increase the vessel resistance, the force working against the vessel movement, requiring more propulsion power to maintain a certain vessel speed. Thus, several methodologies have analyzed the added resistance and speed loss to accurately predict the propulsion power.
As previously noted, the conventional method estimates the added resistance of the vessel and uses this value to predict the required propulsion power [15]. In the standard model, the total resistance of the vessel is determined by the following:

$$R_{total} = R_{friction} + R_{residual} + R_{wind}$$

where $R_{friction}$, $R_{residual}$ and $R_{wind}$ are the resistance induced by the friction of the hull, the residual resistance, and the resistance induced by wind, respectively. $R_{friction}$ is calculated from $R_{hull}$, the size of the hull's wetted area, $R_{fouling}$, the fouling of the hull, and $R_{draft}$, the specific frictional resistance coefficient. The residual resistance $R_{residual}$ is determined by the following:

$$R_{residual} = R_{wave} + R_{eddy}$$

where $R_{wave}$ is the energy loss caused by waves created by the vessel during its propulsion through the water, while $R_{eddy}$ is the eddy resistance, which refers to the loss caused by flow separation that creates eddies, particularly at the aft end of the ship. In calm weather, $R_{wind}$ is proportional to the square of the ship's speed and to the cross-sectional area of the ship above the waterline.
Air resistance normally represents about 2% of the total resistance. Finally, the effective horsepower (EHP) of the vessel at a certain speed is determined by the following:

$$EHP = R_{total} \times SPEED$$

where $SPEED$ is a specific vessel speed and $R_{total}$ is the total resistance obtained from the above equation.
In ISO15016, several procedures are introduced to correct for added resistance [12]. For example, the added resistance due to short-crested irregular waves, $R_{aw}$, can be calculated by the following:

$$R_{aw} = 2\int_{0}^{2\pi}\int_{0}^{\infty} \frac{R_{wave}(\omega, \alpha; V_S)}{\zeta_A^{2}}\, E(\omega, \alpha)\, d\omega\, d\alpha \tag{5}$$

where $R_{wave}(\omega, \alpha; V_S)$ here denotes the added resistance in regular waves, $\zeta_A$ is the wave amplitude, $\omega$ is the circular frequency of the regular wave, $\alpha$ is the angle between the ship heading and the incident regular wave, $V_S$ is the vessel speed through the water, and $E$ is the directional spectrum. For the calculation of $R_{aw}$, several methods such as STAWAVE-1, STAWAVE-2, theoretical methods or the seakeeping model test were proposed in [12]. However, the methods proposed in ISO15016 may not be appropriate for predicting the propulsion performance under actual operating conditions because they require data from the speed trial environment and too many input parameters. Even if such data from operational conditions were available, the analysis would take too much time and cost because the procedure consists of complex equations that are computationally expensive.
Alternatively, Holtrop and Mennen's method [16][17][18] is also widely used to estimate the resistance and propulsion power requirement during the vessel design phase. It is an empirical method that utilizes data accumulated from many model test results. The model coefficients and equations are obtained from statistical analysis and regression applied to these data. Although this method requires less computation and numerical analysis, it provides only a rough approximation of the resistance and propulsion power, which may lead to a less accurate result. Moreover, since the data are obtained from model experiments, the prediction result cannot be applied directly to actual operational conditions.
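The additive resistance decomposition and the power-speed relation described above can be illustrated with a short Python sketch. It is only a minimal illustration: the function names, the SI units (newtons, watts), the knot-to-m/s conversion and the numeric inputs are assumptions chosen for the example and are not values from this study.

```python
KNOT_TO_MS = 0.514444  # 1 knot expressed in metres per second


def total_resistance(r_friction: float, r_residual: float, r_wind: float) -> float:
    """Total resistance (N) as the sum of frictional, residual and wind resistance."""
    return r_friction + r_residual + r_wind


def effective_power(r_total_newton: float, speed_knots: float) -> float:
    """Effective power (W): total resistance (N) times vessel speed (m/s)."""
    return r_total_newton * speed_knots * KNOT_TO_MS


# Hypothetical resistance components, for illustration only.
r_total = total_resistance(r_friction=900e3, r_residual=250e3, r_wind=25e3)
ehp_kw = effective_power(r_total, speed_knots=14.0) / 1e3
print(f"total resistance: {r_total / 1e3:.0f} kN, effective power: {ehp_kw:.0f} kW")
```

The sketch makes the limitation discussed in this section concrete: every resistance component must be known before the power can be computed, which is exactly the information that is hard to obtain under real operating conditions.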
Another stream of works utilizes computational fluid dynamics (CFD) in the analysis of the vessel resistance and the propulsion performance [19][20][21]. CFD is a branch of fluid mechanics that uses numerical analysis and data structures to analyze and solve problems that involve fluid flows. In the vessel design phase, CFD has been utilized to predict the resistance by simulating the flow around the designed hull form. Although a CFD-based model can accurately predict the resistance, the flow calculation is computationally expensive and usually requires several hours to complete. Thus, it is difficult to use CFD to predict the propulsion power under various operating conditions. To overcome the limitations of the above methodologies, this study proposes using a machine-learning technique to predict the propulsion power from data obtained in the actual operating environment.

Data Collection
To predict the propulsion power, a 200,000-ton bulk cargo ship was chosen. The general arrangement of our target vessel is shown in Figure 1, and the detailed specification is listed in Table 1. The target ship is dedicated to iron ore transportation and operates only on two regular routes: from South Korea to the US and from South Korea to Australia. Since it operates on a small number of stable routes, the data quality was thought to be reliable. For the propulsion system, the target ship uses a fixed-pitch propeller, whose pitch angle cannot be changed.
The detailed description of the dataset is shown in Table 2. Most of the data were collected using sensors installed onboard the vessel. The target feature to predict is the shaft horsepower, which indicates the amount of mechanical power that is delivered by the engine to the propeller shaft. For the input features, velocity, draft, shaft revolutions per minute (RPM), sea depth, tide, wave height, and wind vectors are chosen. Sensor information for each data feature is also given in the same table.
The collected data were then processed through the Voyage Data Recorder (VDR) system, which digitizes, compresses and stores the information in an externally mounted protective unit. Under regulations of the IMO (International Maritime Organization), a cargo ship that weighs more than 20,000 tons should be equipped with a VDR. Although the primary purpose of the VDR is to assist marine casualty investigation, the recorded data can also be used for performance monitoring of the vessel. The VDR system installed on the vessel is shown in Figure 2.
Processing the raw data obtained from different sensors requires several computational steps. For example, the shaft horsepower is derived from the torque, which in turn is derived from the measured strain. The loading torque T is calculated from the shaft strain ε, the shear modulus G of the shaft material (Pa) and the shaft diameter D (m). To accurately measure the torque, a strain gauge is attached to the propeller shaft of the vessel. The strain gauge converts the shaft strain into a change in electrical resistance, from which the torque is computed. Several commercial system providers [22,23] as well as software vendors [24] are available for shaft power measurement. For our target ship, the SpecsVision-TPM system [25] is used for measuring and processing the shaft power.
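To make the strain-to-power chain concrete, the following Python sketch shows one common textbook form of the calculation. It is an illustration under stated assumptions, not the formula used by the measurement system on the target ship: it assumes a solid circular shaft and a strain gauge aligned at 45 degrees to the shaft axis, so that the shear strain is twice the measured strain and T = π D³ G ε / 8. The function names and the numeric inputs are hypothetical.

```python
import math


def shaft_torque(strain: float, shear_modulus_pa: float, diameter_m: float) -> float:
    """Torque (N m) of a solid circular shaft.

    Assumption: the gauge measures the principal strain at 45 degrees,
    so shear strain = 2 * strain and T = pi * D^3 * G * strain / 8.
    """
    return math.pi * diameter_m ** 3 * shear_modulus_pa * strain / 8.0


def shaft_power_kw(torque_nm: float, rpm: float) -> float:
    """Shaft power (kW) as torque times angular velocity."""
    omega = 2.0 * math.pi * rpm / 60.0  # angular velocity in rad/s
    return torque_nm * omega / 1e3


# Hypothetical readings, for illustration only.
torque = shaft_torque(strain=1e-4, shear_modulus_pa=80e9, diameter_m=0.5)
power = shaft_power_kw(torque, rpm=70.0)
print(f"torque: {torque / 1e3:.1f} kN m, shaft power: {power:.0f} kW")
```

A hollow shaft or a different gauge arrangement would change the constant factor, which is why commercial systems are calibrated for the specific shaft.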
In addition to the shaft power measurement, the raw data collected from the different sensors were processed by the data acquisition unit (DAU) of the VDR system. The DAU transforms the data of each sensor according to the industrial standard protocol and synthesizes the data from different sensors into a structured data table in which all features are recorded at the same time stamps. The synthesized data are then transmitted through the satellite-based communication system and stored in a cloud database. As a result, one data observation is generated every 10 seconds, and overall 178,000 observations were collected over seven months (from 2016.01 to 2016.07). The data protocol information used to process the sensor data is also shown in Table 2.
To collect the wave height, which cannot be measured directly on board the vessel, the National Oceanic and Atmospheric Administration (NOAA) database is utilized. NOAA collects spectral wave data using accelerometers on board buoys, which measure the heave acceleration of the buoy. A fast Fourier transform is then applied to obtain the wave height. For more detailed information about the wave height measurement of the NOAA database, the reader is referred to [26].
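Combining onboard records with externally sourced wave heights requires aligning two time series with different sampling intervals. The sketch below shows one possible way to do this with pandas, matching each onboard record to the nearest wave observation in time. The matching rule, the column names, the tolerance and the sample values are all assumptions made for illustration; the source does not describe how the two datasets were joined, and spatial matching to a buoy location is not covered here.

```python
import pandas as pd

# Hypothetical onboard records sampled every 10 s.
onboard = pd.DataFrame({
    "time": pd.date_range("2016-01-01 00:00:00", periods=6,
                          freq=pd.Timedelta(seconds=10)),
    "speed_knots": [13.9, 14.0, 14.1, 14.0, 13.8, 13.9],
})

# Hypothetical wave heights reported at a coarser interval.
waves = pd.DataFrame({
    "time": pd.date_range("2016-01-01 00:00:00", periods=2,
                          freq=pd.Timedelta(seconds=30)),
    "wave_height_m": [1.2, 1.5],
})

# Attach the wave observation closest in time to each onboard record.
# Both frames must be sorted by the key column for merge_asof.
merged = pd.merge_asof(
    onboard.sort_values("time"),
    waves.sort_values("time"),
    on="time",
    direction="nearest",
    tolerance=pd.Timedelta(hours=1),
)
print(merged)
```

Records with no wave observation within the tolerance receive a missing value, which makes gaps in the external data visible instead of silently filling them.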

Support Vector Regression (SVR)
SVR is applied as the machine learning model to predict the propulsion power of the vessel. SVR, which was introduced by Drucker et al. [27], is an extension of the support vector machine (SVM) to regression problems. Like SVM, SVR tries to determine the hyperplane that maximizes the margin while keeping the error within a tolerated range.
Suppose that training data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are given. SVR assumes that the relationship between x and y is represented by the following:

$$f(x) = w^{T}\phi(x) + b$$

where $\phi(x)$ denotes a transformation that maps the data into a higher dimensional space in which a linear fit becomes possible, $w$ is the weight vector associated with $\phi(x)$ and $b$ is a bias coefficient. To find a good estimator, SVR tries to find a linear function that is as flat as possible. This can be formulated as the following convex optimization problem:

$$\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2}$$

subject to $\forall n: \lvert y_n - f(x_n) \rvert \le \varepsilon$. As shown in the above equation, SVR tries to minimize the norm of $w$ while ensuring that all residuals are smaller than $\varepsilon$ in absolute value. However, a function $f(x)$ that satisfies this condition may not exist.
To deal with such an infeasible constraint, we slightly modify the problem as follows:

$$\min_{w,\,b,\,\xi,\,\xi^{*}}\ \frac{1}{2}\lVert w \rVert^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \tag{7}$$

subject to:

$$y_i - f(x_i) \le \varepsilon + \xi_i,\qquad f(x_i) - y_i \le \varepsilon + \xi_i^{*},\qquad \xi_i,\ \xi_i^{*} \ge 0.$$

The slack variables $\xi_i$ and $\xi_i^{*}$ serve as a soft margin that allows the regression error of each data point $i$ to exceed $\varepsilon$ by up to $\xi_i$ and $\xi_i^{*}$. The constant $C$ controls the overfitting of the model by imposing a penalty on observations that lie outside the margin $\varepsilon$.
Equation (7) can be solved through its dual problem, which is written as follows:

$$\max_{\alpha,\,\alpha^{*}}\ -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})K(x_i, x_j) - \varepsilon\sum_{i=1}^{n}(\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{n} y_i(\alpha_i - \alpha_i^{*})$$

subject to:

$$\sum_{i=1}^{n}(\alpha_i - \alpha_i^{*}) = 0,\qquad 0 \le \alpha_i,\ \alpha_i^{*} \le C$$

where $\alpha_i$ and $\alpha_i^{*}$ are the Lagrange multipliers. The function $K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$ is called the kernel function, where $\phi(x_i)$ is a transformation that maps $x_i$ to a high-dimensional space. If we assume a linear model, the linear kernel function $K(x_i, x_j) = x_i^{T}x_j$ is utilized. If a non-linear relationship is assumed, the radial basis function (RBF) kernel is widely adopted. The function that is used to predict the value for a new input $x$ is then given by Equation (9):

$$f(x) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*})K(x_i, x) + b \tag{9}$$

The predictive performance of SVR is significantly affected by the hyper-parameters. If we assume RBF as the kernel function, the performance of SVR is affected by the hyper-parameters $C$, $\varepsilon$ and $\sigma$.
Parameters ε and C affect how the training error is penalized. The parameter ε controls the amount of error tolerated by the model. If the residual is larger than ε, the training error is penalized in proportion to C. Thus, too large C values may result in overfitting, while too small C values may result in underfitting of the model. On the other hand, the parameter σ determines the width of the kernel function, that is, how quickly the correlation between two observations decays with their distance. Thus, an optimal set of hyper-parameters among the possible combinations should be found. For a more detailed description of SVR, the reader is referred to [28].
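The roles of σ and ε described above can be illustrated with a short NumPy sketch. The RBF parameterization K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²)) is an assumption, since the exact form is not given in the text; it is a common convention, and under it scikit-learn's `gamma` corresponds to 1 / (2σ²). The function names and sample values are hypothetical.

```python
import numpy as np


def rbf_kernel(x_i, x_j, sigma: float) -> float:
    """RBF kernel under the assumed form exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))


def epsilon_insensitive_loss(y_true, y_pred, epsilon: float) -> np.ndarray:
    """Per-sample loss: zero inside the epsilon tube, linear outside it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)


# A smaller sigma makes the similarity decay faster with distance.
print(rbf_kernel([0.0, 0.0], [1.0, 0.0], sigma=0.6))
print(rbf_kernel([0.0, 0.0], [1.0, 0.0], sigma=2.0))

# Residuals within epsilon are not penalized at all.
print(epsilon_insensitive_loss([10.0, 10.0], [10.5, 12.0], epsilon=1.0))
```

In the soft-margin objective, the sum of these per-sample losses is what gets multiplied by C, which is why a large C pushes the model to fit individual observations more closely.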

Data Preprocessing
Before applying SVR, several preprocessing tasks were performed. In this step, data records that may lower the prediction performance were omitted from the dataset. Firstly, the observations with zero speed were omitted because zero speed means that the vessel is not using propulsion power at all. Moreover, the observations in which the vessel speed is less than 6 knots were also omitted because such a speed indicates that the vessel is near a harbor and is likely to undergo frequent course changes. Such a situation may not be adequate for the propulsion power analysis.
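The speed filter described above amounts to a single boolean selection in pandas. The sketch below uses hypothetical column names and sample values; only the 6-knot threshold comes from the text.

```python
import pandas as pd

# Hypothetical records; the column names are illustrative.
records = pd.DataFrame({
    "speed_knots": [0.0, 3.5, 6.0, 12.4, 14.1],
    "shaft_power_kw": [0.0, 900.0, 2500.0, 9800.0, 12500.0],
})

MIN_SPEED_KNOTS = 6.0  # observations below this speed are omitted

# Keeping speeds >= 6 knots also removes the zero-speed observations.
filtered = records[records["speed_knots"] >= MIN_SPEED_KNOTS].reset_index(drop=True)
print(filtered)
```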
Also, outlier observations with an abnormal shaft power value (the output feature) are omitted from the dataset. To determine the outliers systematically, Chauvenet's criterion is utilized [29]. Given N samples of a dataset, an observation is considered an outlier if N·P(>|z|) < 0.5, where P(>|z|) is the probability that an observation lies more than |z| standard deviations from the mean. An example of outlier detection is shown in Figure 3. As shown in this figure, the outlier that shows a large deviation from the normal shaft power value is omitted from the dataset.
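Chauvenet's criterion as stated above can be sketched in a few lines of Python. The sketch assumes that the tail probability P(>|z|) is taken from a normal distribution, computed as erfc(|z| / √2), and that the sample standard deviation is used; the function name and the sample values are hypothetical.

```python
import math

import numpy as np


def chauvenet_outliers(values) -> np.ndarray:
    """Return a boolean mask that is True where N * P(>|z|) < 0.5."""
    x = np.asarray(values, dtype=float)
    n = x.size
    # Distance of each observation from the mean, in standard deviations.
    z = np.abs(x - x.mean()) / x.std(ddof=1)
    # Two-sided tail probability under a normal distribution (assumption).
    p = np.array([math.erfc(z_i / math.sqrt(2.0)) for z_i in z])
    return n * p < 0.5


# Hypothetical shaft power readings with one abnormal value.
shaft_power = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 10.1, 9.9, 25.0]
mask = chauvenet_outliers(shaft_power)
cleaned = [v for v, is_outlier in zip(shaft_power, mask) if not is_outlier]
print(cleaned)
```

Because a single extreme value inflates both the mean and the standard deviation, the criterion is sometimes applied per speed band or repeated after removal; whether that was done here is not stated in the text.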
Finally, feature scaling is applied to the dataset. Feature scaling is important for SVR because the model relies on distances between observations. If one feature has a larger magnitude than the others, it will dominate the other features when the distance is calculated. In this study, we normalize each feature to the interval from −1 to +1. All the preprocessing procedures were conducted using the pandas module in a Python 3.6 environment.
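Scaling each feature to the interval from −1 to +1 can be written as a min-max transformation in pandas. The sketch below is a minimal illustration with hypothetical column names and values; it assumes no feature is constant (which would cause a division by zero), and in practice the minimum and maximum would be taken from the training data and reused for the test data.

```python
import pandas as pd


def scale_to_unit_interval(df: pd.DataFrame) -> pd.DataFrame:
    """Linearly map every column so that its minimum is -1 and its maximum is +1."""
    col_min = df.min()
    col_max = df.max()
    return 2.0 * (df - col_min) / (col_max - col_min) - 1.0


# Hypothetical features with very different magnitudes.
features = pd.DataFrame({
    "speed_knots": [6.0, 10.0, 14.0],
    "draft_m": [8.0, 13.0, 18.0],
})
scaled = scale_to_unit_interval(features)
print(scaled)
```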

Model Learning
The SVR algorithm was applied to the training dataset. In this study, RBF was chosen as the kernel function. RBF is a widely adopted kernel function because it has a small number of hyper-parameters compared to other kernels, without losing too much prediction performance.

As previously mentioned, finding the optimal hyper-parameters is a crucial step in learning the SVR model. For the SVR model with the RBF kernel function, three hyper-parameters C, ε and σ are required. In this study, to find the optimal hyper-parameter set, every candidate parameter combination was validated by a K-fold cross validation approach. As a result, the optimal hyper-parameters for the training dataset were C = 4950, σ = 0.6 and ε = 1.0. The experiment was conducted with Scikit-learn, a widely used open-source machine-learning library for Python.
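An exhaustive search over hyper-parameter combinations with K-fold cross validation can be sketched with Scikit-learn as follows. This is a minimal sketch on synthetic data, not the configuration used in this study: the grid values, the number of folds, the scoring metric and the helper `sigma_to_gamma` are assumptions. In particular, Scikit-learn's `SVR` takes `gamma` rather than σ, and the conversion gamma = 1 / (2σ²) assumes the RBF form exp(−‖x − x'‖² / (2σ²)).

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR

# Synthetic stand-in for the scaled training data (features already in [-1, 1]).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(120, 3))
y = 5.0 * X[:, 0] ** 3 + 2.0 * X[:, 1] + rng.normal(0.0, 0.1, size=120)


def sigma_to_gamma(sigma: float) -> float:
    """Convert an RBF width sigma to Scikit-learn's gamma (assumed convention)."""
    return 1.0 / (2.0 * sigma ** 2)


# Hypothetical candidate values; every combination is evaluated.
param_grid = {
    "C": [1.0, 10.0, 100.0],
    "epsilon": [0.01, 0.1, 1.0],
    "gamma": [sigma_to_gamma(s) for s in (0.3, 0.6, 1.2)],
}

search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)
best = search.best_params_
print(best)
```

For time-ordered sensor data, shuffled folds can leak information between neighbouring observations, so a time-aware split may be a more conservative choice; the text does not say which variant was used.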

Result
To validate our model, we divided the data into training and testing datasets. The data collected during the first five months were used to train the model, while the data from the remaining two months were used to test the model. To measure the predictive performance of the model, the RMSE (root mean squared error) and the R2 score were used. The evaluation result is shown in Table 3. The R2 score is 89.78%, which indicates a fairly good performance for the regression.
In Figure 4, the actual propulsion power vs. speed record is compared with the values predicted by our SVR model. Each red dot is the actual propulsion power in the testing dataset at a specific speed of the vessel. As shown in this figure, the propulsion power shows a large deviation at the same speed level. This deviation can be explained by the impact of external factors (wind, wave and tide). The blue dots are the values predicted by the SVR model. As shown in this figure, the SVR model shows almost the same pattern as the actual data, which indicates that our SVR model is a reliable tool for predicting the propulsion power of the vessel.
Figure 5 shows the relationship between the speed and the propulsion power of the vessel. The green and black lines are the estimated speed vs. power curves obtained from the ISO15016 method. The green line is obtained when the vessel is loaded with cargo, while the black line is obtained when the vessel is empty. The blue crosses are the speed vs. power relationship predicted by the SVR model while eliminating the impact of external factors. As shown in this figure, the predicted propulsion performance lies between the green and black lines, which indicates that the SVR model reproduces the generic speed/power relationship when it is not affected by the external factors (wind, wave, and tide). The SVR model shows an abnormal pattern when the input speed is over 18 knots because no data measured at vessel speeds over 18 knots are available.
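The two evaluation metrics used above have simple closed forms, shown here as a NumPy sketch. The function names and the sample values are hypothetical and serve only to illustrate the definitions: RMSE is the square root of the mean squared residual, and the R2 score is one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2_score(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
print(rmse(observed, predicted), r2_score(observed, predicted))
```

RMSE is expressed in the units of the target (here, power), whereas the R2 score is unitless, which is why the two are usually reported together.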
We also compared the prediction performance of the proposed model with that of the ISO15016 method by applying both methods to the same testing dataset (2016.06.01~2016.08.01). When applying ISO15016 [12], the wind measure is adjusted based on the Fujiwara method (p.41) and the wave feature is adjusted based on the STAWAVE-2 method (p.45). STAWAVE-2 approximates $R_{aw}$ in Equation (5) by the following:

$$R_{aw} \approx R_{AWML} + R_{AWRL}$$

where $R_{AWML}$ and $R_{AWRL}$ are the mean resistance increases due to wave motion and wave reflection, respectively. They are computed from $r_{aw}(\omega)$ and $\alpha_1(\omega)$, functions of the circular frequency $\omega$ of the regular wave, which are explained in detail in [12]. For a more detailed procedure of STAWAVE-2, the reader is referred to [12]. The comparison result is shown in Table 4. As shown in the table, the SVR-based model outperforms ISO15016 in both R2 score and RMSE. The ISO15016 method underperforms the SVR-based model especially when the vessel is in bad weather conditions.

Conclusions and Future Works
This study proposes a data-driven approach to predict propulsion power. Although several prediction methods are available, most of them are computationally expensive and suffer from low prediction accuracy under actual operating conditions. Instead, we propose to use a machine-learning model that utilizes actual operational data of the vessel. In this study, support vector regression (SVR) is used to learn from 178,000 onboard data observations obtained from a bulk carrier that operates on regular routes. Compared to the conventional methods, the proposed model does not require complex equations and shows superior performance, provided that sufficient operational data from the regular routes are available.
There are, however, further issues to explore to improve the model performance. Currently, data features related to the vessel status, such as ship damage, roughness of the hull, or engine performance degradation, are not addressed. Accommodating such additional features may improve the model performance. Another issue is to compare competing machine-learning algorithms and find the best-performing model.