A Novel Hybrid Fuel Consumption Prediction Model for Ocean-Going Container Ships Based on Sensor Data

Hu, Zhihui; Zhou, Tianrui; Osman, Mohd Tarmizi; Li, Xiaohe; Jin, Yongxin; Zhen, Rong

doi:10.3390/jmse9040449

by Zhihui Hu ¹, Tianrui Zhou ¹, Mohd Tarmizi Osman ¹, Xiaohe Li ², Yongxin Jin ¹,* and Rong Zhen ³

¹ Merchant Marine College, Shanghai Maritime University, Shanghai 200120, China

² College of Power and Energy Engineering, Harbin Engineering University, Harbin 150001, China

³ National-local Joint Engineering Research Center for Marine Navigation Aids Services, Navigation College of Jimei University, Xiamen 361000, China

\* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2021, 9(4), 449; https://doi.org/10.3390/jmse9040449

Submission received: 30 March 2021 / Revised: 13 April 2021 / Accepted: 14 April 2021 / Published: 20 April 2021

(This article belongs to the Section Ocean Engineering)

Abstract:

Accurate, reliable, and real-time prediction of ship fuel consumption is the basis and premise of the development of fuel optimization; however, ship fuel consumption data mainly come from noon reports, and many current modeling methods have been based on a single model; therefore they have low accuracy and robustness. In this study, we propose a novel hybrid fuel consumption prediction model based on sensor data collected from an ocean-going container ship. First, a data processing method is proposed to clean the collected data. Secondly, the Bayesian optimization method of hyperparameters is used to reasonably set the hyperparameter values of the model. Finally, a hybrid fuel consumption prediction model is established by integrating extremely randomized tree (ET), random forest (RF), Xgboost (XGB) and multiple linear regression (MLR) methods. The experimental results show that data cleaning, the size of the dataset, marine environmental factors, and hyperparameter optimization can all affect the accuracy of the model, and the proposed hybrid model provides better predictive performance (higher accuracy) and greater robustness (smaller standard deviation) as compared with a single model. The proposed hybrid model should play a significant role in ship fuel consumption real-time monitoring, fault diagnosis, energy saving and emission reduction, etc.

Keywords:

fuel consumption; real-time prediction; hybrid model; sensor data; hyperparameters optimization

1. Introduction

The maritime transportation industry has played a significant role in the cargo industry as a whole since the development of international trade [1,2], and it also has an important impact on the development of the national economy [3]. The total volume of international seaborne trade has been growing significantly over recent years [4]. In addition, container shipping is important for global seaborne trade, and the quantity of cargo transported by container shipping has been increasing over the past decades [5]. The increased volume of maritime transport consumes a huge amount of fuel, and as the price of fuel continues to increase, the companies operating ships are facing tremendous freight pressure. In fact, fuel costs for tankers and container ships have been estimated to account for 58% and 78% of the total operating costs, respectively [6]. Another side effect of the significant volume of maritime transportation is an increase in ship-induced greenhouse gas emissions. As a consequence, global warming and various air pollution issues have surfaced. Worldwide estimated carbon emissions from ships in 2012 were approximately 938 million tons, representing 2.6% of the global total carbon emissions. If no effective control measures are taken, they are expected to rise by 50% to 250% by 2050 [7]. The literature also shows that greenhouse gases emitted by ships mainly include SO$_{2}$, NO$_{x}$, CO$_{2}$, PM$_{2.5}$, and PM$_{10}$ [8], and technical research aimed at reducing greenhouse gas emissions, such as seawater desulphurization, is also being carried out [9,10,11].

The increase in greenhouse gases and environmental pollution has resulted in the International Maritime Organization (IMO), the member states, and related organizations taking various measures to improve the energy efficiency of ships. In 2009, the IMO issued the Guidelines for Voluntary Use of Energy Efficiency Operational Indicator (EEOI), which applies to all ships and is used to measure the energy efficiency level of operational ships. In addition, the Energy Efficiency Design Index (EEDI) and the Ship Energy Efficiency Management Plan (SEEMP) were launched by the IMO in 2011 for new ships and all ships, respectively. In 2015, the Marine Environment Protection Committee (MEPC) formulated a three-step plan focused on ship energy savings and emission reduction based on fuel consumption data, i.e., gathering the data, analyzing the data, and optimizing the support decision. In the same year, China also issued a document entitled “Code for Smart Ships”, which integrates the collection, analysis, assessment, and support decision of ship fuel consumption data as part of smart energy efficiency. In 2019, the Norwegian government partnered with the IMO to establish the GreenVoyage-2050 project, which aims to transform the shipping industry towards a lower carbon future. The main purpose of all the above measures is to improve energy efficiency and to minimize greenhouse gas emissions from international shipping, and a prerequisite for this objective is the development of an accurate and robust ship energy efficiency prediction model.

Therefore, our main task is to establish a real-time prediction model with high accuracy and robustness based on the collected ship fuel consumption data, and it will be the basis and premise of fuel management and optimization of ship fuel efficiency.

The remainder of this paper is organized as follows. Section 2 reviews the existing studies on ship fuel consumption prediction. Ship fuel consumption data collection and processing are described in Section 3. The methodology, hyperparameter optimization, and error metrics are outlined in Section 4. Section 5 discusses the experimental results, and the conclusions and future work are presented in Section 6.

2. Literature Review

An accurate and robust ship fuel consumption prediction model plays a significant role in the optimization of ship fuel consumption. Currently, there are three main types of ship fuel consumption prediction models, namely the physics-based model, simulation-based model, and data-driven model.

2.1. Physics-Based Ship Fuel Consumption Prediction Model

In the physics-based ship fuel consumption prediction model, the ship’s resistance is calculated through an empirical formula. This is followed by calculating the ship’s fuel consumption based on the principle of equal resistance and thrust, combined with the relationship between thrust and a ship’s fuel consumption rate. The earliest and most classic documentation of the physics-based ship fuel consumption prediction model was published by Holtrop and Mennen in 1982 [12]; however, the ship’s resistance was calculated in calm water, without considering the marine environment. The model was improved by Kwon where marine environmental factors were considered [13]. Subsequent studies on the physics-based ship fuel consumption prediction model have been based on the above mentioned studies [14,15,16,17,18].

The advantages of the physics-based model are that the calculations are relatively simple and the principle of the model is easy to understand; however, it is difficult to accurately depict the impact of environmental factors on the ship fuel consumption using an empirical formula; therefore, the physics-based prediction model approaches are usually less accurate.

2.2. Simulation-Based Ship Fuel Consumption Prediction Model

The widely adopted simulation-based prediction model for ship fuel consumption uses computational fluid dynamics (CFD), an emerging interdisciplinary field of hydromechanics and computer science [19,20,21]. CFD is used to approximate the integral and differential terms of the fluid dynamics control equations into discrete algebraic forms, turning them into algebraic groups of equations. Then, these discrete groups of algebraic equations are solved using computer software in order to obtain numerical solutions at discrete time/space points.

The simulation-based prediction model produces accurate results for ships sailing in calm water; however, the accuracy of a simulation-based prediction model for ships in actual sea conditions is still arguable, since it is still difficult to depict the impact of environmental factors on the ship fuel consumption. In addition, CFD simulations take a relatively long time, making it difficult to satisfy the demand for real-time prediction.

2.3. Data-Driven Ship Fuel Consumption Prediction Model

The data-driven ship fuel consumption prediction model was developed by using data mining, deep learning, ensemble learning, and other methods. This approach is becoming increasingly popular in this field of research since a large number of noon-report data and sensor data are collected and made available.

From the empirical formula, the ship fuel consumption has a cubic relationship with engine speed, whereas the engine speed is related linearly to the voyage speed. Hence, the ship fuel consumption can be related directly to voyage speed. Through this relationship, the ship fuel consumption model can be established using statistical methods that combine the relationship between fuel consumption and voyage speed. Then, the collected data are used to fit the model parameters in order to make the model more realistic.

Yao et al. [22] fitted the daily fuel consumption (y) and speed (v) of container ships and obtained the following relationship: $y = k_{1} v^{3} + k_{2}$. Le et al. [23] collected the noon-report data from more than 100 container ships and classified the ships into five types according to their sizes; ship speed, sailing time, and total fuel consumption were then linearly fitted. Bocchetti et al. [24] conducted experiments on oil tanker fuel consumption data and obtained the sixth power relationship. Bialystocki and Konovessis [25] used the collected data from the noon reports and took environmental factors into consideration; the daily fuel consumption and speed were then fitted to obtain a quadratic relationship. The least absolute shrinkage and selection operator (LASSO) and ridge regression [26,27] techniques were also used to model ship fuel consumption. As compared with the traditional linear regression technique, the prediction performance of the LASSO and ridge regression techniques was better because they compress features and remove collinear features. Furthermore, the low accuracy of the linear regression technique is due to the high-dimensional and nonlinear nature of ship fuel consumption data, which makes it difficult to fit their intrinsic relationships. Therefore, nonlinear models have been gradually applied to ship fuel consumption modeling and have obtained better prediction results [28,29,30].

With the continuous development of machine learning, the use of new technologies for developing ship fuel consumption prediction models is becoming increasingly well researched. The strong nonlinear fitting ability of the artificial neural network (ANN) enables it to be widely used for models with high accuracy. It has been reported that ship fuel consumption models using ANN have produced good prediction results based on noon-report data [23,31,32,33,34], sensor data [29,35,36,37,38,39], and automatic identification system (AIS) data [40]. Another machine learning approach, known as ensemble learning, has been emerging and is gradually being applied in ship fuel consumption models [41,42]. Experimental results have revealed that the accuracy of ensemble learning methods for predicting ship fuel consumption is superior as compared with other algorithms.

From the above literature review of data-driven methods, there is not one algorithm that is applicable to all research datasets. Different algorithms can be more appropriate because they perform better on particular research datasets [43]. In general, statistical methods are suitable for small datasets, whereas deep learning and ensemble learning perform better on large datasets.

2.4. Research Gap and Contributions

In the process of developing a ship fuel consumption model, there are two major components that determine the performance of the model. One component is the quality of the fuel consumption data; there are two main types of ship fuel consumption data, namely noon-report data and sensor data. Noon-report data are filled in by the crew once a day at noon, and therefore it is difficult to use these data for real-time monitoring of ship performance. Sensor data are collected by many sensors with high-frequency acquisition, and therefore these data meet the requirements of real-time monitoring of ship fuel consumption. The second component is the method of ship fuel consumption modeling. From the literature review, it seems that many current modeling methods have been based on a single model rather than on multiple models, although multiple models have been proven to be effective approaches in other fields [44,45].

The main contributions of this study are the following:

(1): A precise and high-frequency ship fuel consumption dataset is obtained via multiple sensors in order to provide substantial high-quality data for the model development.
(2): A novel hybrid fuel consumption prediction model based on multiple models is proposed.

3. Data Collection and Processing

3.1. Data Collection

The ship fuel consumption data were obtained from a container ship from 14 September 2017 to 25 September 2018. The container ship information is shown in Table 1. The ship fuel consumption data records consist of characteristics such as data acquisition time, fuel consumption, Global Positioning System (GPS) speed, trim, mean draft, current speed and direction, wind speed and direction, and wave direction and height, as shown in Table 2. The fuel consumption was collected by an onboard flow meter sensor, and each data record value is the volume of heavy fuel consumed by the ship’s main engine within a 15-min period multiplied by the density of the heavy fuel. The GPS speed was collected by an onboard GPS sensor. Mean draft (mean value of fore draft and aft draft) and trim (fore draft minus aft draft) were obtained by the onboard echosounder sensor. We also acquired wind speed and direction, wave height and direction, and current speed and direction through the onboard radarsonde sensor, wave gauge sensor, and current meter sensor, respectively. Since the collection frequency of the onboard GPS, echosounder, radarsonde, current meter, and wave gauge sensors varies from a few seconds to a few minutes, the values of GPS speed, mean draft, trim, wind speed, wind direction, wave height, wave direction, current speed, and current direction are averaged over each 15-min period in order to be consistent with the collection frequency of fuel consumption. For the convenience of subsequent research, the fuel consumption value within 15 min was converted into daily fuel consumption, E, as follows:

E = \frac{E_{r}}{(15 / 60)} * 24

(1)

where $E_{r}$ is the ship fuel consumption in every 15 min.
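As a quick check of Equation (1), the conversion can be sketched in a few lines of Python. The function name and the `interval_min` parameter are illustrative assumptions and do not come from the original study.

```python
def daily_fuel_consumption(e_r, interval_min=15.0):
    """Scale the fuel burned in one sampling interval to a daily value (Equation (1)).

    e_r          -- fuel consumed during one interval (same mass unit as the result)
    interval_min -- interval length in minutes (15 min in this study)
    """
    hourly = e_r / (interval_min / 60.0)  # consumption per hour
    return hourly * 24.0                  # consumption per day


# Example: 1.25 t burned in 15 min corresponds to 120 t per day.
print(daily_fuel_consumption(1.25))
```

With a 15-min interval the conversion is simply a multiplication by 96, the number of 15-min periods in a day.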

3.2. Data Processing

Data processing is an important step and a prerequisite for developing a ship fuel consumption model, because there are inevitably some errors in the raw data collection process due to data transmission delay, deviation, and/or interruption, etc. [37]. The errors include null data, noisy data, and anomaly data. The following steps were performed to delete the errors in the raw data.

(1): Some of the characteristics in the fuel consumption data contained null data that were deleted to ensure the integrity of the data records.
(2): Characteristic values that fell outside the recognition range were considered to be noisy data and deleted. For example, values for the direction of wind, wave, and current outside 0–360 $^{\circ}$, mean draft over 20 m, trim over 5 m in absolute value, wind speed over 30 m/s, wave height over 10 m, and current speed over 2 kn were all considered noisy data and deleted.
After the data processing on null data and noisy data was completed, 9371 ship fuel consumption data records remained, as shown in Figure 1a.
(3): Unlike null data and noisy data that are relatively easy to find, anomaly data can only be found with the help of existing research and domain knowledge. The process of deleting anomaly data is as follows [46].

Step 1.: Delete any ship fuel consumption data records with ship GPS speed V < 10 kn or ship GPS speed V > 30 kn.
Step 2.: Calculate the ratio k of any two daily fuel consumption data records, $k = E_{i} / E_{j}$ $(i, j = 1, 2, \dots, n)$. If $k < \min ({(V_{i} / V_{j})}^{2}, {(V_{i} / V_{j})}^{4})$ or $k > \max ({(V_{i} / V_{j})}^{2}, {(V_{i} / V_{j})}^{4})$, add 1 to the outlier scores of the $i$-th and $j$-th data records. Traverse all data records and count the total score of each data record.
Step 3.: Sort the outlier scores of the data records in descending order and delete the top 20% of the data records.
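Steps 1 to 3 above can be sketched as follows. This is a minimal illustration, not the authors' implementation: records are assumed to be `(E, V)` tuples of daily fuel consumption and GPS speed, each unordered pair is assumed to be scored once, and all function names are hypothetical.

```python
def speed_filter(records, v_min=10.0, v_max=30.0):
    """Step 1: keep only records whose GPS speed lies within [v_min, v_max] kn."""
    return [(e, v) for e, v in records if v_min <= v <= v_max]


def outlier_scores(records):
    """Step 2: count, for every record, the pairs that violate the speed-ratio bounds."""
    n = len(records)
    scores = [0] * n
    for i in range(n):
        e_i, v_i = records[i]
        for j in range(i + 1, n):          # each unordered pair is visited once
            e_j, v_j = records[j]
            k = e_i / e_j                   # fuel consumption ratio
            r = v_i / v_j                   # speed ratio
            lower = min(r ** 2, r ** 4)
            upper = max(r ** 2, r ** 4)
            if k < lower or k > upper:
                scores[i] += 1
                scores[j] += 1
    return scores


def drop_top_outliers(records, fraction=0.2):
    """Step 3: remove the given fraction of records with the highest scores."""
    scores = outlier_scores(records)
    n_drop = int(fraction * len(records))
    order = sorted(range(len(records)), key=lambda i: scores[i], reverse=True)
    dropped = set(order[:n_drop])
    return [rec for i, rec in enumerate(records) if i not in dropped]
```

The pairwise loop is quadratic in the number of records, which is acceptable for a few thousand records; a vectorized variant would be a natural refinement for larger datasets.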

After deleting anomaly data, the final cleaned data consisted of 7493 reliable ship fuel consumption data records, as shown in Figure 1b. It can be observed from Figure 1 that after deleting anomaly data, the distribution of ship fuel consumption data became more regular. Furthermore, the accuracy of the data when fitted to the GPS speed curve increased from 0.7773 to 0.9179, which indicates that data processing can effectively improve the performance of the model.

3.3. Data Overview

The distribution of the processed fuel consumption data is shown in Figure 2. Figure 2a shows the distribution of fuel consumption and GPS speed. Most of the time, the fuel consumption value is approximately 100–130 t and the GPS speed value is approximately 18–20 kn, which corresponds to the customary speed range of container ships. The distribution of mean draft and trim is shown in Figure 2b. There are mainly three different draft conditions, and their values from small to large correspond to the ship’s empty, ballast, and full load conditions, respectively. The trim is mainly distributed between −0.4 and 1.5 m, and the highest frequency is at about 0.5 m, indicating that the ship is often trimmed by the bow (trim by the bow is positive). The distributions of wind, wave, and current in the marine environment are shown in Figure 2c–e. Their main values are concentrated near 0, which shows that the marine environment was generally in good condition.

4. Methodology

4.1. Overall Framework

The main focus of this study included the following three objectives: fuel consumption data collection, data processing, and data analysis (fuel consumption modeling). Then, the model was applied (fuel consumption optimization), which was the ultimate goal of the study, i.e., to improve energy efficiency, reduce emissions, and protect the marine environment. The details are shown in Figure 3.

The main steps of the study are as follows:

Step 1.: The fuel consumption data of an ocean-going container ship were obtained by different sensors, including fuel consumption, GPS speed, mean draft, etc., for a total of 10 related features. The daily fuel consumption was the output variable, and the remaining variables were used as input variables.
Step 2.: Data preprocessing was performed, including data transformation, null, noisy, and anomaly data deletion, etc., to obtain a high-quality fuel consumption dataset.
Step 3.: The fuel consumption dataset was divided into the training and testing sets according to a certain ratio.
Step 4.: A hybrid fuel consumption prediction model was proposed by integrating ET (extremely randomized tree), RF (random forest), XGB (Xgboost), and MLR (multiple linear regression) methods. Reference models were developed using ET, RF, XGB, MLR, support vector machine (SVM), and artificial neural network (ANN).
Step 5.: A Bayesian optimization method of hyperparameters was used to enhance model performance.
Step 6.: The model performance was evaluated using error metrics.
Step 7.: The best model could be applied to ship route, speed, trim optimization in the future.

4.2. The Related Methods

4.2.1. XGB

The Xgboost (XGB) algorithm was first proposed by Chen and Guestrin [47] and has a wide range of applications in various fields due to its high accuracy, regularization, support for parallel operations, and automatic processing of missing values [48,49]. The XGB algorithm is solved through the following steps [47]:

{\hat{y}}_{i} = \sum_{k = 1}^{K} f_{k} (x_{i}), \quad f_{k} \in F

(2)

where ${\hat{y}}_{i}$ is the predicted value of the i-th sample, K is the total number of trees, $x_{i}$ is the feature vector of the i-th sample, F is the set of trees, and $f_{k}$ is the structure of the k-th tree.

\mathrm{Obj} = \sum_{i = 1}^{n} l (y_{i}, {\hat{y}}_{i}) + \sum_{k = 1}^{K} Ω (f_{k})

(3)

where $l (y_{i}, {\hat{y}}_{i})$ is the loss function and $Ω (f_{k})$ is the regularization term.

Ω (f_{k}) = γ T + \frac{1}{2} λ \sum_{j = 1}^{T} w_{j}^{2}

(4)

where T is the number of leaf nodes, $w_{j}$ is the weight of the j-th leaf node, and $γ$ and $λ$ are weight penalties.

In the process of objective function minimization, each newly added function $f_{t} (x_{i})$ should minimize the loss function, so the t-th round objective function of Equation (3) can be written as:

\mathrm{Obj}^{(t)} = \sum_{i = 1}^{n} l (y_{i}, {\hat{y}}_{i}^{(t - 1)} + f_{t} (x_{i})) + Ω (f_{t})

(5)

The following equation can be obtained by using a second-order Taylor expansion to approximate the value of the loss function:

\begin{matrix} \mathrm{Obj}^{(t)} & ≅ \sum_{i = 1}^{n} [g_{i} f_{t} (x_{i}) + \frac{1}{2} h_{i} f_{t}^{2} (x_{i})] + Ω (f_{t}) \\  & ≅ \sum_{i = 1}^{n} [g_{i} f_{t} (x_{i}) + \frac{1}{2} h_{i} f_{t}^{2} (x_{i})] + γ T + \frac{1}{2} λ \sum_{j = 1}^{T} w_{j}^{2} \\  & ≅ \sum_{j = 1}^{T} [(\sum_{i \in I_{j}} g_{i}) w_{j} + \frac{1}{2} (\sum_{i \in I_{j}} h_{i} + λ) w_{j}^{2}] + γ T \end{matrix}

(6)

where $I_{j} = \{i ∣ q (x_{i}) = j\}$ is the set of samples assigned to the j-th leaf node, and $g_{i} = \partial_{{\hat{y}}_{i}^{(t - 1)}} l (y_{i}, {\hat{y}}_{i}^{(t - 1)})$ and $h_{i} = \partial_{{\hat{y}}_{i}^{(t - 1)}}^{2} l (y_{i}, {\hat{y}}_{i}^{(t - 1)})$ are the first and second derivatives of the loss function, respectively.

Let $G_{j} = \sum_{i \in I_{j}} g_{i}$ and $H_{j} = \sum_{i \in I_{j}} h_{i}$; then, substitute them into Equation (6) to obtain Equation (7) as follows:

\mathrm{Obj}^{(t)} ≅ \sum_{j = 1}^{T} [G_{j} w_{j} + \frac{1}{2} (H_{j} + λ) w_{j}^{2}] + γ T

(7)

Setting the partial derivative of Equation (7) with respect to $w_{j}$ to zero gives the optimal leaf weight:

w_{j} = - \frac{G_{j}}{H_{j} + λ}

(8)

Substitute Equation (8) into Equation (7) to obtain Equation (9) as follows:

\mathrm{Obj}^{(t)} ≅ - \frac{1}{2} \sum_{j = 1}^{T} \frac{G_{j}^{2}}{H_{j} + λ} + γ T

(9)

The greedy algorithm is used to enumerate the feasible split points to split the subtree, so that the model obtains a higher gain and a smaller objective function. The gain is calculated as follows:

\mathrm{Obj}_{Gain} ≅ \frac{1}{2} [\frac{G_{L}^{2}}{H_{L} + λ} + \frac{G_{R}^{2}}{H_{R} + λ} - \frac{{(G_{L} + G_{R})}^{2}}{H_{L} + H_{R} + λ}] - γ

(10)

where $\frac{G_{L}^{2}}{H_{L} + λ}$ is the gain generated after the split of the left sub-tree, $\frac{G_{R}^{2}}{H_{R} + λ}$ is the gain generated after the split of the right sub-tree, and $\frac{{(G_{L} + G_{R})}^{2}}{H_{L} + H_{R} + λ}$ is the gain generated without sub-tree splitting.
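To make Equations (8) and (10) concrete, the sketch below evaluates the optimal leaf weight and the split gain for a handful of samples. It assumes the squared-error loss $l = \frac{1}{2} (y - \hat{y})^{2}$, for which $g_{i} = {\hat{y}}_{i} - y_{i}$ and $h_{i} = 1$; the function names are illustrative and are not part of the Xgboost library.

```python
def leaf_weight(g, h, lam):
    """Optimal leaf weight w_j = -G_j / (H_j + lambda), Equation (8)."""
    return -sum(g) / (sum(h) + lam)


def leaf_score(g, h, lam):
    """Structure score G^2 / (H + lambda) of one leaf."""
    G, H = sum(g), sum(h)
    return G * G / (H + lam)


def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Gain of splitting one leaf into a left and a right leaf, Equation (10)."""
    parent = leaf_score(g_left + g_right, h_left + h_right, lam)
    children = leaf_score(g_left, h_left, lam) + leaf_score(g_right, h_right, lam)
    return 0.5 * (children - parent) - gamma


# Targets [1, 1, 5, 5] with an initial prediction of 0 give g = y_hat - y and h = 1.
g_left, h_left = [-1.0, -1.0], [1.0, 1.0]
g_right, h_right = [-5.0, -5.0], [1.0, 1.0]

print(leaf_weight(g_left, h_left, lam=1.0))    # 2/3
print(leaf_weight(g_right, h_right, lam=1.0))  # 10/3
print(split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0))  # 44/15
```

A positive gain means the split lowers the objective in Equation (9) by more than the penalty $γ$ paid for the extra leaf.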

4.2.2. RF and ET

Random Forest (RF) was proposed by Breiman and was developed based on the bagging technique [50]. The final result of the RF is averaged from the results of many independent decision trees. The calculation Equation is as follows [50]:

y = \frac{1}{m} \sum_{i = 1}^{m} f_{i} (x)

(11)

where m is the total number of trees and $f_{i} (x)$ is the prediction result of the i-th tree.

Extremely randomized tree (ET) is a variant of RF [51]. Its principle is similar to that of RF, with the following two differences:

(1): RF uses bootstrap random sampling to select the training set for each of the decision trees, whereas ET generally does not use bootstrap random sampling.
(2): After selecting the split feature, RF will select an optimal feature value as the split point, which is the same as the traditional decision tree. However, ET will randomly select a feature value to split.
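The second difference can be illustrated with a toy one-feature example. The sketch below is a simplification under stated assumptions: the split quality is measured by the summed squared error of the two children, the ET threshold is drawn uniformly between the minimum and maximum feature value, and the function names are hypothetical.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_cost(x, y, threshold):
    """Total squared error of the two child nodes produced by a threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return sse(left) + sse(right)


def rf_style_threshold(x, y):
    """RF-style: search all midpoints between sorted feature values for the best split."""
    xs = sorted(set(x))
    candidates = [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: split_cost(x, y, t))


def et_style_threshold(x, rng):
    """ET-style: draw the split point at random within the feature range."""
    return rng.uniform(min(x), max(x))


x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
print(rf_style_threshold(x, y))                    # the optimal split lies between 3 and 10
print(et_style_threshold(x, random.Random(0)))     # some random value in [1, 12]
```

The random threshold is cheaper to compute and adds diversity between trees, at the cost of weaker individual splits.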

4.2.3. MLR

Multiple linear regression (MLR) is a statistical analysis method used to determine the interdependent quantitative relationship between two or more variables. Assuming the input variable is $X = (x_{1}, \dots, x_{D})$, its expression is the following [42,52]:

y (x, w) = w_{0} + w_{1} x_{1} + \dots + w_{D} x_{D} = w_{0} + \sum_{i = 1}^{D} w_{i} x_{i}

(12)

where w can be estimated using the least squares (LS) approach as follows [53]:

\hat{w} = \underset{w}{\arg\min} \{\sum_{j = 1}^{N} {(y_{j} - w_{0} - \sum_{i = 1}^{D} w_{i} x_{j i})}^{2}\}

(13)
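Equations (12) and (13) can be sketched with NumPy as follows. This is an illustrative least-squares fit, assuming NumPy is available; `fit_mlr` and `predict_mlr` are hypothetical helper names rather than functions used in the study.

```python
import numpy as np


def fit_mlr(X, y):
    """Estimate (w_0, w_1, ..., w_D) by ordinary least squares, Equation (13)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])   # leading column of ones for the intercept w_0
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def predict_mlr(w, X):
    """Evaluate y(x, w) = w_0 + sum_i w_i x_i, Equation (12)."""
    X = np.asarray(X, dtype=float)
    return w[0] + X @ w[1:]


# Noise-free toy data generated from y = 1 + 2*x1 + 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
w = fit_mlr(X, y)
print(w)                       # approximately [1, 2, 3]
print(predict_mlr(w, [[3, 2]]))
```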

4.2.4. The Hybrid Fuel Consumption Prediction Model

The proposed hybrid fuel consumption prediction model was developed on the basis of the stacking theory method [54,55]. By fusing multiple algorithms into the hybrid model, the advantages of each algorithm were fully utilized to improve the robustness of the model and enhance its generalization ability.

The hybrid model improves the generalization ability by combining a set of single models, rather than selecting the best one among them. The proposed hybrid model is a hierarchical model integration framework and its structure is shown in Figure 4. There are two layers of models in the hybrid model framework. The first-level layer is composed of multiple base models (ET, RF, and XGB in this study). The original training set ($X\_train$) is used, combined with K-fold cross-validation, to train the base models and to generate a new training set ($S\_train$). Subsequently, the new test set ($S\_test$) is generated by using the trained base models to predict the original test set ($X\_test$). The second-level layer is a meta model (MLR in this study). The training set ($S\_train$) is used to train the MLR model; then, the trained model is used to predict the test set ($S\_test$) and obtain the final prediction result. The calculation equation is as follows:

y = h (f_{i} (x))

(14)

where $f_{i} ()$ is the $i$-th base learner and $h ()$ is the meta-learner.
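The two-layer procedure can be sketched with NumPy as below. This is a minimal illustration under several assumptions: `FeatureSubsetModel` is a hypothetical toy base learner standing in for ET, RF, and XGB, the base learners are refitted on the full training set before predicting the test set (one common choice), and any object with `fit` and `predict` methods could be plugged in instead.

```python
import numpy as np


class LeastSquares:
    """Linear model with intercept, used here as the meta-learner (MLR)."""

    def fit(self, X, y):
        A = np.column_stack([np.ones(len(X)), X])
        self.w, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return self.w[0] + np.asarray(X) @ self.w[1:]


class FeatureSubsetModel(LeastSquares):
    """Toy base learner: least squares restricted to a subset of the columns."""

    def __init__(self, cols):
        self.cols = list(cols)

    def fit(self, X, y):
        return super().fit(X[:, self.cols], y)

    def predict(self, X):
        return super().predict(X[:, self.cols])


def stack(base_factories, meta, X_train, y_train, X_test, k=5, seed=0):
    """Build S_train from out-of-fold predictions, S_test from refitted base models."""
    n = len(X_train)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    S_train = np.zeros((n, len(base_factories)))
    S_test = np.zeros((len(X_test), len(base_factories)))
    for m, make in enumerate(base_factories):
        for val_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), val_idx)
            model = make().fit(X_train[train_idx], y_train[train_idx])
            S_train[val_idx, m] = model.predict(X_train[val_idx])   # out-of-fold column
        S_test[:, m] = make().fit(X_train, y_train).predict(X_test)
    meta.fit(S_train, y_train)          # second-level layer: y = h(f_1(x), ..., f_m(x))
    return meta.predict(S_test)
```

Using out-of-fold predictions for `S_train` keeps the meta-learner from being trained on predictions that the base learners made for samples they had already seen.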

The advantages of a hybrid model are better generalization ability, the ability to adapt to more complex tasks, the ability to fit nonlinear relationships, and greater robustness; however, the disadvantages of the hybrid model include difficulty in determining values of the hyperparameters and complex calculations.

4.3. Hyperparameters Optimization and Cross-Validation

The values of the hyperparameters of a model are determined before training, not obtained through training. Therefore, it is necessary to have a set of optimized hyperparameter values in order to improve the prediction performance of the model. Since it is relatively challenging to determine the values of hyperparameters, it is important to choose a reasonable hyperparameter optimization method. There are three main methods for hyperparameter optimization, i.e., grid search, random grid search, and Bayesian optimization [56]. In grid search, a comprehensive search over all the enumerated possibilities is performed. As a result, the best values among all enumerated combinations are obtained, at the cost of a longer runtime. In random grid search, a certain number of random searches are performed on all possible combinations of hyperparameters; therefore, random grid search has the shortest runtime, but there is a low possibility of obtaining the optimal combination of hyperparameters. The Bayesian optimization method lies between the previous two methods: it runs faster than grid search and can also obtain a better hyperparameter combination than random grid search.

To obtain more reasonable hyperparameter values, K-fold cross-validation is usually combined with hyperparameter tuning to jointly train the model. K-fold cross-validation divides the dataset into K parts of equal size, takes one part as the validation set and the other K-1 parts as the training set, and repeats the experiment K times so that each part serves as the validation set once.
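The K-fold splitting described above can be sketched in plain Python. The generator name is hypothetical, and the sketch assumes contiguous folds without shuffling; in practice the records would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) index lists for K-fold cross-validation."""
    # Distribute the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


for train_idx, val_idx in k_fold_indices(10, 5):
    print(val_idx)   # each record appears in exactly one validation fold
```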

4.4. Error Metric

To evaluate the performance of the ship fuel consumption prediction model, four performance metrics were used in this study: R$^{2}$, mean square error (MSE), mean absolute error (MAE), and running time (T), defined as follows [29,41]:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(15)

\mathrm{MSE} = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}

(16)

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - {\hat{y}}_{i}|

(17)

T = t_{end} - t_{start}

(18)

where $y_{i}$ is the true value of ship fuel consumption, ${\hat{y}}_{i}$ is the predicted value, $\bar{y}$ is the average of the true values, and n is the number of data samples. $t_{start}$ and $t_{end}$ are the start and end time of model operation, respectively.

It can be observed from Equation (15) that a larger value of the performance index R$^{2}$ indicates a better model performance. Conversely, a better model performance is indicated by a smaller value of MSE, MAE, and T in Equations (16)–(18).
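For reference, Equations (15) to (18) translate directly into a few lines of Python. The helper names below are illustrative; `timed` is a hypothetical wrapper that measures the running time T of any callable.

```python
import time


def mse(y_true, y_pred):
    """Mean square error, Equation (16)."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error, Equation (17)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination, Equation (15)."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def timed(fn, *args):
    """Return (result, T) with T = t_end - t_start, Equation (18)."""
    t_start = time.perf_counter()
    result = fn(*args)
    t_end = time.perf_counter()
    return result, t_end - t_start


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(mse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))  # 0.25 0.25 0.8
```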

5. Results and Discussion

All experiments were conducted using Python 3.5 running on a 64-bit Windows 10 operating system with an Intel Core i5-7200 CPU and 12.0 GB of memory. To verify the superiority of the proposed hybrid model, it was validated against reference models developed using MLR, SVM, ANN, ET, RF, and XGB.

Due to the different range of characteristic values of ship fuel consumption data, the performances of the models (MLR, SVM, ANN) are affected. Therefore, the data need to be standardized before being used for the models. The equation for data standardization is as follows [29]:

x^{'} = \frac{x - μ}{σ}

(19)

where $μ$ and $σ$ are the mean and standard deviation of each characteristic, respectively.
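Equation (19) can be sketched as follows for a single characteristic. The sketch assumes the population standard deviation (division by n) and uses hypothetical helper names; fitting $μ$ and $σ$ on the training set and reusing them for the test set is a common practice rather than something stated in the text.

```python
import math


def fit_standardizer(column):
    """Return the mean and (population) standard deviation of one characteristic."""
    n = len(column)
    mu = sum(column) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in column) / n)
    return mu, sigma


def standardize(column, mu, sigma):
    """Apply x' = (x - mu) / sigma, Equation (19)."""
    return [(x - mu) / sigma for x in column]


train_speed = [1.0, 2.0, 3.0]
mu, sigma = fit_standardizer(train_speed)
print(standardize(train_speed, mu, sigma))   # zero mean, unit variance
```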

Each ship fuel consumption dataset was divided into a training set and a test set according to the ratio 0.8:0.2. The model was trained using the training set, and the trained model was used to predict the test set in order to obtain the prediction result. To ensure that the training and test sets were the same for each model, the random division state ($random\_state$) of the data needs to be fixed and set to the same value. Simultaneously, in order to reproduce the experimental result, the $random\_state$ of each model also needs to be fixed. Since the training set and the test set were divided according to $random\_state$, the model result for a certain $random\_state$ does not indicate whether the model is suitable; therefore, it is necessary to test with different $random\_state$ values and take the average value as the value of model performance.
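The seeded split and the averaging over several seeds can be sketched in plain Python. The names `split_indices` and `average_score` are hypothetical, and `evaluate` stands for any user-supplied callback that trains a model on the training indices and returns a score on the test indices.

```python
import random


def split_indices(n_samples, test_ratio=0.2, random_state=0):
    """Shuffle record indices with a fixed seed and split them 0.8:0.2."""
    idx = list(range(n_samples))
    random.Random(random_state).shuffle(idx)
    n_test = int(round(n_samples * test_ratio))
    return idx[n_test:], idx[:n_test]          # (train_idx, test_idx)


def average_score(evaluate, n_samples, random_states):
    """Average a score over several random_state values."""
    scores = []
    for seed in random_states:
        train_idx, test_idx = split_indices(n_samples, random_state=seed)
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)


# The same seed always reproduces the same split.
print(split_indices(10, random_state=1) == split_indices(10, random_state=1))  # True
```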

The following experiments were conducted on the influence of data volume, environmental factors, and hyperparameters on the model; the models in the first two experiments were not hyperparameter tuned. At the same time, the performances of different models before and after hyperparameter optimization were also compared. All the experimental results are the average of five experiments.

5.1. The Impact of Data Volume on Model Performance

The most significant difference between noon-report data and sensor data is the data acquisition frequency. Noon-report data are acquired once a day, whereas sensor data are acquired every 15 min; therefore, the amount of sensor data is 96 times the amount of noon-report data over the same duration. Consequently, the effect of modeling fuel consumption with sensor data versus noon-report data can be compared indirectly through the impact of dataset size on model performance.

The data volume was set to 100, 200, 400, 800, 1600, 3200, 6400, and 7493, giving eight different data volumes and a total of 400 (10 × 8 × 5 = 400) experiments. The R², MSE, MAE, and T values of the models under different data volumes are shown in Figure 5a–d. It can be observed from Figure 5a–c that, for data volumes of less than 1000, the R² values of all models increase rapidly as the data volume increases. Noon reports need almost three years to accumulate a dataset of 1000 records, whereas sensor data require only about 10 days. This demonstrates that sensor data are more suitable for fuel consumption prediction modeling than noon-report data. For data volumes between 1000 and 3000, the R² values increase slightly as the data volume increases, and for data volumes of more than 3000, the R² values of all models are basically constant. The model runtime, T, generally increases with the data volume, as shown in Figure 5d. The findings from this experiment can be used as a reference for selecting a model for real-time and online incremental modeling of ship fuel consumption.
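The acquisition-frequency figures quoted in this subsection follow from simple arithmetic, reproduced in the short sketch below. The variable names are illustrative, and a 365-day year is assumed.

```python
MINUTES_PER_DAY = 24 * 60
SENSOR_INTERVAL_MIN = 15

# One sensor record every 15 min versus one noon report per day
sensor_records_per_day = MINUTES_PER_DAY // SENSOR_INTERVAL_MIN
noon_records_per_day = 1

target_records = 1000
days_noon = target_records / noon_records_per_day
days_sensor = target_records / sensor_records_per_day
years_noon = days_noon / 365

print(sensor_records_per_day)  # 96
print(round(years_noon, 2))    # 2.74
print(round(days_sensor, 1))   # 10.4
```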

5.2. The Impact of Marine Environmental Factors on Model Performance

To study the impact of marine environmental factors such as wind, waves, and current on ship fuel consumption, four different ship fuel consumption datasets were designed, i.e., Set 1, Set 2, Set 3, and Set 4. Set 1 covers all environmental factors, whereas Set 2, Set 3, and Set 4 exclude the wind, wave, and current factors, respectively. A total of 200 (10 × 4 × 5 = 200) tests were conducted.

The results of the proposed model and the reference models on the four datasets are shown in Table 3 and Table 4, respectively. As shown in Table 3, the R² values of the proposed model for Set 1, Set 2, Set 3, and Set 4 are 0.9932, 0.9927, 0.9924, and 0.9932, respectively. Thus, the R² value decreases by 0.0005, 0.0008, and 0.0000 when the wind, wave, and current factors, respectively, are missing from the dataset. The MSE of the proposed model increases by 0.6493, 0.9637, and 0.0753 when the dataset lacks the wind, wave, and current factors, respectively. Similarly, the MAE of the proposed model increases by 0.0957, 0.1149, and 0.0083, respectively. This analysis shows that, among the three environmental factors, the wave factor has the greatest impact on the model, followed by the wind and current factors. Because the number of features is reduced, the model runtime, T, is also slightly reduced, as shown in the last column of Table 3. The results of the reference models in Table 4 show similar trends to those in Table 3 regarding the importance of the wind, wave, and current factors.
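The differences quoted above can be recomputed directly from Table 3. The sketch below does this for the proposed model; the dictionary layout and variable names are illustrative, and the numbers are copied from Table 3.

```python
# (R2, MSE, MAE) of the proposed model, copied from Table 3
table3 = {
    "Set 1 (all factors)": (0.9932, 8.0196, 1.7488),
    "Set 2 (no wind)": (0.9927, 8.6689, 1.8445),
    "Set 3 (no wave)": (0.9924, 8.9833, 1.8637),
    "Set 4 (no current)": (0.9932, 8.0949, 1.7571),
}

base_r2, base_mse, base_mae = table3["Set 1 (all factors)"]

# Change relative to Set 1: drop in R2, rise in MSE, rise in MAE
deltas = {}
for name, (r2, mse, mae) in table3.items():
    if name.startswith("Set 1"):
        continue
    deltas[name] = (
        round(base_r2 - r2, 4),
        round(mse - base_mse, 4),
        round(mae - base_mae, 4),
    )

# Rank the ablated datasets by how much the MSE rises
ranking = sorted(deltas, key=lambda name: deltas[name][1], reverse=True)

for name in ranking:
    print(name, deltas[name])
```

The ranking places the dataset without the wave factor first, then the one without wind, then the one without current, which is the ordering stated in the text.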

To further verify the findings of the above experiments, ET and XGB were used to calculate the importance of the different features to the ship fuel consumption model. As shown in Figure 6a,b, the sum of the importance values of GPS speed, trim, and mean draft reached 0.9617 for ET and 0.9385 for XGB. This indicates that these three factors play a leading role in ship fuel consumption modeling. According to the literature [33], the fuel consumption of a ship is approximately a cubic, or even quadratic, function of the GPS speed and varies with the two-thirds power of the draft. For the ET model, the importance values of the environmental factors wind, wave, and current are 0.0135, 0.0208, and 0.0040, respectively. For the XGB model, the importance values of the wind, wave, and current factors are 0.0221, 0.0258, and 0.0136, respectively. Wind resistance is quadratic in wind speed, and wave resistance is quadratic in wave height and the ship’s hydrostatic speed, so both waves and wind lead to increased fuel consumption of the ship [17].

This analysis confirms that the wave factor has the highest importance value among the environmental factors, followed by wind and current. Wind and waves are the more important factors because they reduce the ship’s propeller propulsion efficiency, thereby affecting the ship’s daily fuel consumption. Current has little effect on the ship’s propulsion efficiency; therefore, it has a smaller impact on the ship’s daily fuel consumption.
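The importance values quoted in this paragraph can be cross-checked with a few lines of code. The sketch assumes that the importances are normalized to sum to one, as the `feature_importances_` attribute of scikit-learn tree ensembles is; the grouping key `speed+trim+draft` and the variable names are illustrative.

```python
# Importance values quoted in the text (Figure 6a,b)
importance = {
    "ET": {"speed+trim+draft": 0.9617, "wind": 0.0135, "wave": 0.0208, "current": 0.0040},
    "XGB": {"speed+trim+draft": 0.9385, "wind": 0.0221, "wave": 0.0258, "current": 0.0136},
}
env = ("wind", "wave", "current")

# Total importance per model (expected to be 1.0 if normalized)
totals = {m: round(sum(v.values()), 4) for m, v in importance.items()}

# Combined share of the three environmental factors
env_share = {m: round(sum(v[k] for k in env), 4) for m, v in importance.items()}

# Environmental factors ordered from most to least important
env_rank = {m: sorted(env, key=lambda k: v[k], reverse=True) for m, v in importance.items()}

print(totals)
print(env_share)
print(env_rank)
```

Under this assumption, the three environmental factors together account for 0.0383 of the total importance in ET and 0.0615 in XGB, and both models order them as wave, wind, current.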

5.3. The Influence of Hyperparameters on Model Performance

A challenging but important step in the modeling process is to obtain reasonable values of hyperparameters.

As mentioned in Section 4.3, Bayesian optimization and five-fold cross-validation were used to obtain optimized hyperparameter values for each model. Table 5 lists the hyperparameters to be optimized; ET, RF, XGB, and ANN have more hyperparameters than SVM, and MLR has none. The proposed model has no separate hyperparameter optimization in Table 5 because it is composed of single models, so the hyperparameter values of those single models are also the hyperparameter values of the proposed model.
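To make the role of five-fold cross-validation concrete, the sketch below scores candidate hyperparameter values by their mean validation MSE over five folds. It is not the authors' pipeline and rests on several loudly stated assumptions: the model is a simple one-dimensional k-nearest-neighbor regressor chosen for brevity, the data are synthetic, and a fixed candidate list stands in for the values that a Bayesian optimizer would propose one at a time.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def knn_predict(x_train, y_train, x_query, k):
    """Average the targets of the k nearest training points (1-D feature)."""
    order = sorted(range(len(x_train)), key=lambda j: abs(x_train[j] - x_query))
    return mean(y_train[j] for j in order[:k])


def cv_mse(x, y, k, n_folds=5):
    """Mean validation MSE over the folds for one candidate value of k."""
    fold_errors = []
    for train_idx, val_idx in k_fold_indices(len(x), n_folds):
        x_tr = [x[j] for j in train_idx]
        y_tr = [y[j] for j in train_idx]
        errors = [(knn_predict(x_tr, y_tr, x[j], k) - y[j]) ** 2 for j in val_idx]
        fold_errors.append(mean(errors))
    return mean(fold_errors)


# Synthetic data (not from the paper): noisy cubic relation between speed and fuel
rng = random.Random(42)
speed = [rng.uniform(5.0, 20.0) for _ in range(200)]
fuel = [0.01 * v ** 3 + rng.gauss(0.0, 2.0) for v in speed]

# Stand-in for the candidates a Bayesian optimizer would propose
candidates = [1, 3, 5, 10, 20, 50]
scores = {k: cv_mse(speed, fuel, k) for k in candidates}
best_k = min(scores, key=scores.get)
print(best_k, scores[best_k])
```

A Bayesian optimizer differs from this loop only in how the next candidate is chosen: it fits a surrogate model to the scores observed so far and proposes the value expected to improve on them, whereas the cross-validated objective stays the same.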

To verify the effect of hyperparameter optimization, the performance of the models before and after optimization is compared in Figure 7. Figure 7a shows the R² values, where the blue dotted line and the red line represent the results before and after hyperparameter optimization, respectively. The red line is almost always on the periphery, which indicates that the R² values of the models increased after hyperparameter optimization; in other words, model performance improved. Figure 7b,c show the MSE and MAE values, respectively. The red lines are always located in the inner circle, which indicates that the MSE and MAE values decreased after hyperparameter optimization and again shows that model performance improved. Figure 7d shows the model runtime, T, before and after hyperparameter tuning; after tuning, the runtime increases for all models.

5.4. Performance Analysis of Different Models

The preceding experiments examined the impact of data volume, environmental factors, and hyperparameter optimization. The results revealed that increasing the data volume, including the environmental factors, and optimizing the model hyperparameters all improved model performance to some extent. To determine the most suitable model for ship fuel consumption prediction, Set 1 was chosen as the data source and Bayesian optimization of hyperparameters was performed on each model. The mean and standard deviation (Std) of the error metrics were then used to compare the performance of the models, as shown in Table 6. Among the single models, ET is the best, with the highest R² (mean R² = 0.9938) and the lowest MSE (mean MSE = 7.3496) and MAE (mean MAE = 1.6752). The ANN is one of the most widely used models for ship fuel consumption modeling; however, its predictive performance is lower than that of the ensemble learning methods (ET, RF, and XGB). MLR has the worst predictive performance because it is a linear regression model, whereas the ship fuel consumption data exhibit a nonlinear relationship.
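For reference, the error metrics and the mean and Std summary used in Table 6 can be computed as in the sketch below. It assumes the standard definitions of MSE, MAE, and R² and uses the sample standard deviation across runs; the text does not state which estimator of Std was used. The toy targets, predictions, and per-run values are hypothetical and are not ship data.

```python
from statistics import mean, stdev


def mse(y_true, y_pred):
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def summarize(values):
    """Mean and sample standard deviation across repeated runs."""
    return mean(values), stdev(values)


# Hypothetical targets and predictions for a single run
y_true = [30.0, 32.0, 35.0, 40.0]
y_pred = [31.0, 32.0, 34.0, 41.0]
print(mse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))

# Hypothetical per-run MSE values, summarized as in Table 6
run_mse = [7.1, 7.6, 7.3]
mean_mse, std_mse = summarize(run_mse)
print(mean_mse, std_mse)
```

A lower Std across runs means that the score depends less on the particular train and test split, which is the sense in which robustness is compared in Table 6.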

Comparing the ET model with the proposed model in Table 6 shows that the two have similar mean accuracy: their mean R² values are identical, the proposed model has a lower mean MSE (7.3446 versus 7.3496), and the ET model has a lower mean MAE (1.6752 versus 1.6908). In terms of Std, the proposed model has lower MSE and MAE Std values (0.2982 and 0.0354, respectively) than the ET model (0.3025 and 0.0371, respectively). This indicates that the proposed model is more robust and stable than the ET model. The mean runtimes, T, of all models are less than 80 s, which is much shorter than the 15-min interval at which ship fuel consumption data are collected. This indicates that all of the models meet the requirements of real-time ship fuel consumption prediction. The proposed model exhibited good performance in ship fuel consumption prediction and can therefore serve as a reference for modeling other ship fuel consumption data sources in the future.
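The comparison between the proposed model and ET, together with the real-time criterion, can be checked against Table 6 as follows. The dictionary keys and variable names are illustrative; the numbers are copied from Table 6, and the headroom ratio is simply the 15-min sampling interval divided by the mean runtime.

```python
# Mean and Std values copied from Table 6
table6 = {
    "Proposed": {"mse_mean": 7.3446, "mse_std": 0.2982,
                 "mae_mean": 1.6908, "mae_std": 0.0354, "t_mean": 79.3902},
    "ET": {"mse_mean": 7.3496, "mse_std": 0.3025,
           "mae_mean": 1.6752, "mae_std": 0.0371, "t_mean": 5.3304},
}
SAMPLING_INTERVAL_S = 15 * 60  # sensor data arrive every 15 min


def best(metric):
    """Name of the model with the smallest value of the given metric."""
    return min(table6, key=lambda m: table6[m][metric])


lower_mse = best("mse_mean")
lower_mae = best("mae_mean")
more_stable_mse = best("mse_std")
more_stable_mae = best("mae_std")

# How many times the mean runtime fits into one sampling interval
headroom = {m: SAMPLING_INTERVAL_S / row["t_mean"] for m, row in table6.items()}

print(lower_mse, lower_mae, more_stable_mse, more_stable_mae)
print({m: round(h, 1) for m, h in headroom.items()})
```

Even the slowest entry, the proposed model, finishes more than eleven times within one sampling interval on average, which is consistent with the real-time claim.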

6. Conclusions

In this study, a hybrid method was implemented to develop a ship fuel consumption prediction model based on real-time ship sensor data. The main conclusions are as follows. First, the proposed data processing method effectively improves the quality of ship fuel consumption data: the R² value of the data fitted to the GPS speed curve increased from 0.7773 to 0.9179 after data cleaning. Second, increasing the data volume, including marine environmental factors, and optimizing the hyperparameters were all found to improve the prediction accuracy of the model. Third, the experimental results revealed that the proposed model produced the best prediction results, followed by ET, XGB, RF, ANN, SVM, and MLR. Comparing the proposed model with the best single model (ET), the accuracy of the two methods was similar (mean R² = 0.9938), but the standard deviation values of the proposed model (MSE Std = 0.2982 and MAE Std = 0.0354) were lower, which indicates that the proposed model is more robust and stable than ET. The runtimes of all models were shorter than the ship fuel consumption collection interval (15 min), thus meeting the requirements of real-time ship fuel consumption prediction.

The proposed hybrid model can accurately predict fuel consumption in real time and can also be applied to ship fuel consumption monitoring and fault diagnosis. Shipping companies and related maritime organizations are concerned with how to achieve energy conservation and emission reduction from ships; therefore, in the future, the proposed hybrid model could be applied to optimize ship fuel consumption through, for example, speed optimization, trim optimization, and route optimization. This study has several limitations. The research object is a single container ship, and it is difficult to discover universal laws from one ship’s fuel consumption data; therefore, fuel consumption data for a fleet of container ships need to be collected in the future. In addition, all results and conclusions were obtained at the data-driven level only; we therefore plan to focus on the impact of ship hydrodynamics on fuel consumption in the future.

Author Contributions

Conceptualization, Z.H. and Y.J.; methodology, Z.H. and T.Z.; software, Z.H. and T.Z.; validation, Z.H., T.Z. and Y.J.; formal analysis, Z.H.; investigation, Z.H. and T.Z.; data curation, Z.H. and T.Z.; writing—original draft preparation, Z.H.; writing—review and editing, M.T.O.; visualization, Z.H.; supervision, Y.J. and X.L.; funding acquisition, R.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 52001134; and Navigation College of Jimei University, National-local Joint Engineering Research Center for Marine Navigation Aids Services under Grant JMCBZD202011.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to confidentiality.

Acknowledgments

This study was greatly helped by Yuquan Du from the University of Tasmania, and we would like to express our gratitude to him. The authors would also like to thank the anonymous reviewers and editors for their helpful and constructive comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Shivachev, E.; Khorasanchi, M.; Day, S.; Turan, O. Impact of trim on added resistance of KRISO container ship (KCS) in head waves: An experimental and numerical study. Ocean Eng. 2020, 211, 107594.
2. Dui, H.; Zheng, X.; Wu, S. Resilience analysis of maritime transportation systems based on importance measures. Reliab. Eng. Syst. Saf. 2021, 209, 107461.
3. Akbulaev, N.; Bayramli, G. Maritime transport and economic growth: Interconnection and influence (an example of the countries in the Caspian sea coast; Russia, Azerbaijan, Turkmenistan, Kazakhstan and Iran). Mar. Policy 2020, 118, 104005.
4. Dulebenets, M.A. Minimizing the Total Liner Shipping Route Service Costs via Application of an Efficient Collaborative Agreement. IEEE Trans. Intell. Transp. Syst. 2019, 20, 123–136.
5. Pasha, J.; Dulebenets, M.A.; Kavoosi, M.; Abioye, O.F.; Theophilus, O.; Wang, H.; Kampmann, R.; Guo, W. Holistic tactical-level planning in liner shipping: An exact optimization approach. J. Shipp. Trade 2020, 5, 1–35.
6. Ballou, P.J. Ship energy efficiency management requires a Total Solution approach. Mar. Technol. Soc. J. 2013, 47, 83–95.
7. IMO. Reduction of GHG Emissions from Ships; Technical Report, Third IMO GHG Study 2014 Final Report, MEPC 67/INF.3; IMO: London, UK, 2014.
8. Bagoulla, C.; Guillotreau, P. Maritime transport in the French economy and its impact on air pollution: An input-output analysis. Mar. Policy 2020, 116, 103818.
9. Flagiello, D.; Parisi, A.; Lancia, A.; Carotenuto, C.; Erto, A.; Di Natale, F. Seawater desulphurization scrubbing in spray and packed columns for a 4.35 MW marine diesel engine. Chem. Eng. Res. Des. 2019, 148, 56–67.
10. Flagiello, D.; Erto, A.; Lancia, A.; Di Natale, F. Experimental and modelling analysis of seawater scrubbers for sulphur dioxide removal from flue-gas. Fuel 2018, 214, 254–263.
11. Flagiello, D.; Di Natale, F.; Carotenuto, C.; Erto, A.; Lancia, A. Seawater desulphurization of simulated flue gas in spray and packed columns: An experimental and modelling comparison. Chem. Eng. Trans. 2018, 69, 799–804.
12. Holtrop, J.; Mennen, G.G.J. An approximate power prediction method. Int. Shipbuild. Prog. 1982, 29, 166–170.
13. Kwon, Y.J. Speed loss due to added resistance in wind and waves. Nav. Archit. 2008, 3, 14–16.
14. Fan, A.; Yan, X.; Bucknall, R.; Yin, Q.; Ji, S.; Liu, Y.; Song, R.; Chen, X. A novel ship energy efficiency model considering random environmental parameters. J. Mar. Eng. Technol. 2018, 19, 215–228.
15. Wang, K.; Yan, X.; Yuan, Y.; Jiang, X.; Lin, X.; Negenborn, R.R. Dynamic optimization of ship energy efficiency considering time-varying environmental factors. Transp. Res. Part D Transp. Environ. 2018, 62, 685–698.
16. Wang, K.; Yan, X.; Yuan, Y.; Li, F. Real-time optimization of ship energy efficiency based on the prediction technology of working condition. Transp. Res. Part D Transp. Environ. 2016, 46, 81–93.
17. Yan, X.; Wang, K.; Yuan, Y.; Jiang, X.; Negenborn, R.R. Energy-efficient shipping: An application of big data analysis for optimizing engine speed of inland ships considering multiple environmental factors. Ocean Eng. 2018, 169, 457–468.
18. Li, X.; Sun, B.; Guo, C.; Du, W.; Li, Y. Speed optimization of a container ship on a given route considering voluntary speed loss and emissions. Appl. Ocean Res. 2020, 94, 101995.
19. Sherbaz, S.; Duan, W. Ship Trim Optimization: Assessment of Influence of Trim on Resistance of MOERI Container Ship. Sci. World J. 2014, 2014, 603695.
20. Reichel, M.; Minchev, A.; Larsen, N.L. Trim Optimisation—Theory and Practice. TransNav Int. J. Mar. Navig. Saf. Sea Transp. 2014, 8, 387–392.
21. Moustafa, M.M.; Yehia, W.; Hussein, A.W. Energy efficient operation of bulk carriers by trim optimization. In Proceedings of the 18th International Conference on Ships and Shipping Research, NAV 2015, Lecco, Italy, 24–26 June 2015; pp. 484–493.
22. Yao, Z.; Ng, S.H.; Lee, L.H. A study on bunker fuel management for the shipping liner services. Comput. Oper. Res. 2012, 39, 1160–1172.
23. Le, L.T.; Lee, G.; Kim, H.; Woo, S.H. Voyage-based statistical fuel consumption models of ocean-going container ships in Korea. Marit. Policy Manag. 2020, 47, 304–331.
24. Bocchetti, D.; Lepore, A.; Palumbo, B.; Vitiello, L. A Statistical Approach to Ship Fuel Consumption Monitoring. J. Ship Res. 2015, 59, 162–171.
25. Bialystocki, N.; Konovessis, D. On the estimation of ship’s fuel consumption and speed curve: A statistical approach. J. Ocean Eng. Sci. 2016, 1, 157–166.
26. Wang, S.; Ji, B.; Zhao, J.; Liu, W.; Xu, T. Predicting ship fuel consumption based on LASSO regression. Transp. Res. Part D Transp. Environ. 2018, 65, 817–824.
27. Soner, O.; Akyuz, E.; Celik, M. Statistical modelling of ship operational performance monitoring problem. J. Mar. Sci. Technol. 2019, 24, 543–552.
28. Yuan, J.; Nian, V. Ship energy consumption prediction with Gaussian process metamodel. Energy Procedia 2018, 152, 655–660.
29. Hu, Z.; Jin, Y.; Hu, Q.; Sen, S.; Zhou, T.; Osman, M.T. Prediction of Fuel Consumption for Enroute Ship Based on Machine Learning. IEEE Access 2019, 7, 119497–119505.
30. Petersen, J.P.; Jacobsen, D.J.; Winther, O. Statistical modelling for ship propulsion efficiency. J. Mar. Sci. Technol. 2012, 17, 30–39.
31. Besikci, E.B.; Arslan, O.; Turan, O.; Olcer, A.I. An artificial neural network based decision support system for energy efficient ship operations. Comput. Oper. Res. 2016, 66, 393–401.
32. Du, Y.; Meng, Q.; Wang, S.; Kuang, H. Two-phase optimal solutions for ship speed and trim optimization over a voyage using voyage report data. Transp. Res. Part B Methodol. 2019, 122, 88–114.
33. Yang, L.; Chen, G.; Rytter, N.G.M.; Zhao, J.; Yang, D. A genetic algorithm-based grey-box model for ship fuel consumption prediction towards sustainable shipping. Ann. Oper. Res. 2019.
34. Yan, R.; Wang, S.; Du, Y. Development of a two-stage ship fuel consumption prediction and reduction model for a dry bulk ship. Transp. Res. Part E Logist. Transp. Rev. 2020, 138, 101930.
35. Petersen, J.P.; Winther, O.; Jacobsen, D.J. A Machine-Learning Approach to Predict Main Energy Consumption under Realistic Operational Conditions. Ship Technol. Res. 2012, 59, 64–72.
36. Jeon, M.; Noh, Y.; Shin, Y.; Lim, O.; Lee, I.; Cho, D. Prediction of ship fuel consumption by using an artificial neural network. J. Mech. Sci. Technol. 2018, 32, 5785–5796.
37. Yuan, Z.; Liu, J.; Zhang, Q.; Liu, Y.; Yuan, Y.; Li, Z. Prediction and optimisation of fuel consumption for inland ships considering real-time status and environmental factors. Ocean Eng. 2021, 221, 108530.
38. Kim, Y.R.; Jung, M.; Park, J.B. Development of a Fuel Consumption Prediction Model Based on Machine Learning Using Ship In-Service Data. J. Mar. Sci. Eng. 2021, 9, 137.
39. Karagiannidis, P.; Themelis, N. Data-driven modelling of ship propulsion and the effect of data pre-processing on the prediction of ship fuel consumption and speed loss. Ocean Eng. 2021, 222, 108616.
40. Zheng, J.; Zhang, H.; Yin, L.; Liang, Y.; Wang, B.; Li, Z.; Song, X.; Zhang, Y. A voyage with minimal fuel consumption for cruise ships. J. Clean. Prod. 2019, 215, 144–153.
41. Peng, Y.; Liu, H.; Li, X.; Huang, J.; Wang, W. Machine learning method for energy consumption prediction of ships in port considering green ports. J. Clean. Prod. 2020, 264, 121564.
42. Uyanık, T.; Karatuğ, Ç.; Arslanoğlu, Y. Machine learning approach to ship fuel consumption: A case of container vessel. Transp. Res. Part D Transp. Environ. 2020, 84, 102389.
43. Olson, R.S.; La Cava, W.; Mustahsan, Z.; Varik, A.; Moore, J.H. Data-driven Advice for Applying Machine Learning to Bioinformatics Problems. In Proceedings of the Pacific Symposium on Biocomputing 2018, Kohala Coast, HI, USA, 3–7 January 2018; Volume 23, pp. 192–203.
44. Massaoudi, M.; Refaat, S.S.; Chihi, I.; Trabelsi, M.; Oueslati, F.S.; Abu-Rub, H. A novel stacked generalization ensemble-based hybrid LGBM-XGB-MLP model for Short-Term Load Forecasting. Energy 2021, 214, 118874.
45. Ribeiro, M.H.D.M.; dos Santos Coelho, L. Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series. Appl. Soft Comput. J. 2020, 86, 105837.
46. Meng, Q.; Du, Y.; Wang, Y. Shipping log data based container ship fuel efficiency modeling. Transp. Res. Part B Methodol. 2016, 83, 207–229.
47. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
48. Bai, S.; Li, M.; Kong, R.; Han, S.; Li, H.; Qin, L. Data mining approach to construction productivity prediction for cutter suction dredgers. Autom. Constr. 2019, 105, 102833.
49. Dong, W.; Huang, Y.; Lehane, B.; Ma, G. XGBoost algorithm-based prediction of concrete electrical resistivity for structural health monitoring. Autom. Constr. 2020, 114, 103155.
50. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
51. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42.
52. Bocchetti, D.; Lepore, A.; Palumbo, B.; Vitiello, L. A statistical control of the ship fuel consumption. In Proceedings of the Royal Institution of Naval Architects—Design and Operation of Passenger Ships 2013, London, UK, 20–21 November 2013; pp. 87–92.
53. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2009.
54. Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259.
55. Zhou, Z.H. Ensemble Learning. Encycl. Biom. 2009, 1, 270–273.
56. Wei, L.; Hu, X.; Yi, S. Optimized-XGBoost Early Warning of Generator Front Bearing Fault. J. Syst. Simul. 2020.

Figure 1. Effect of data processing on the ship fuel consumption data: (a) Before removing anomaly data. (b) After removing anomaly data.

Figure 2. Data distribution of the dataset features: (a) Distribution of GPS speed and fuel consumption. (b) Distribution of mean draft and trim. (c) Distribution of wind speed and direction. (d) Distribution of wave height and direction. (e) Distribution of current speed and direction.

Figure 3. Research structure map.

Figure 4. The hybrid model flowchart.

Figure 5. The impact of different data volumes on the model: (a) R². (b) MSE. (c) MAE. (d) T (s).

Figure 6. Feature importance: (a) ET. (b) XGB.

Figure 7. Performance comparison before and after model hyper-parameter optimization: (a) R². (b) MSE. (c) MAE. (d) T (s).

Table 1. Information of the research object, a container ship.

Ship Type	Length/m	Beam/m	Capacity/TEU	Gross Tonnage/t	Built/Year
Container	349	46	10,060	114,394	2007

Table 2. Ship fuel consumption data.

Features	Unit	Input/Output	Sample 1	Sample 2	Sample 3	Sensor Name
Fuel consumption	t/15 min	Output	0.3284	0.3120	0.3593	Flow meter
GPS speed	kn	Input	11.9879	11.9701	4.8693	GPS
Mean draft	m	Input	9.5692	9.5903	9.4224	Echosounder
Trim	m	Input	0.7660	0.7719	2.6120	Echosounder
Wind speed	m/s	Input	2.3216	1.7954	NaN	Radarsonde
Wind direction	degree	Input	310.5	310.5	NaN	Radarsonde
Current speed	kn	Input	0.1	0.1	NaN	Current meter
Current direction	degree	Input	101.7	101.7	NaN	Current meter
Wave direction	degree	Input	242.8	242.8	NaN	Wave gauge
Wave height	m	Input	1.1	1.1	NaN	Wave gauge

Table 3. Effects of wind, wave and current factors on the results of the proposed model.

Methods	Datasets	R²	MSE	MAE (t/day)	T (s)
Stacking	Set 1	0.9932	8.0196	1.7488	4.1553
	Set 2	0.9927	8.6689	1.8445	3.5038
	Set 3	0.9924	8.9833	1.8637	3.9863
	Set 4	0.9932	8.0949	1.7571	3.6782

Table 4. Effects of wind, wave and current factors on the results of the reference models.

Methods	Datasets	R²	MSE	MAE (t/day)	T (s)
ET	Set 1	0.9927	8.6671	1.8038	0.1634
	Set 2	0.9923	9.1759	1.8763	0.1333
	Set 3	0.9916	9.9306	1.9098	0.1502
	Set 4	0.9926	8.8101	1.7987	0.1370
RF	Set 1	0.9906	11.1515	2.0047	0.3449
	Set 2	0.9898	12.0498	2.0911	0.2599
	Set 3	0.9897	12.1662	2.0864	0.3070
	Set 4	0.9904	11.4236	2.0251	0.2844
XGB	Set 1	0.9908	10.8770	2.1565	0.4833
	Set 2	0.9904	11.3566	2.1816	0.3633
	Set 3	0.9901	11.6896	2.2481	0.5087
	Set 4	0.9908	10.9167	2.1393	0.3893
ANN	Set 1	0.9834	19.7553	3.2609	2.6657
	Set 2	0.9798	24.0192	3.6488	2.8197
	Set 3	0.9791	24.8390	3.7201	2.7607
	Set 4	0.9817	21.7287	3.4076	2.7656
SVM	Set 1	0.9830	20.2271	3.2763	0.8833
	Set 2	0.9800	23.6969	3.5838	0.9294
	Set 3	0.9779	26.1999	3.7563	0.9963
	Set 4	0.9821	21.2294	3.3078	0.8206
MLR	Set 1	0.9268	86.8341	7.5772	0.0032
	Set 2	0.9259	87.9067	7.6768	0.0038
	Set 3	0.9250	88.9571	7.6654	0.0036
	Set 4	0.9250	88.8938	7.6727	0.0032

Table 5. Hyper-parameters that need to be optimized.

Models	Hyper-Parameters	Package Version
ET	max_features: [2, 30], n_estimators: [1, 20], max_depth: [2, 20], min_samples_leaf: [1, 9], min_samples_split: [100, 600]	scikit-learn 0.20.1
XGB	max_depth: [2, 20], min_child_weight: [1, 6], gamma: [0.1, 0.8], colsample_bytree: [0.6, 0.1], n_estimators: [100, 800], learning_rate: [0.01, 0.5], subsample: [0.6, 1.0], reg_alpha: [0.05, 3.0], reg_lambda: [0.05, 3.0]	xgboost 1.1.1
RF	max_features: [2, 30], n_estimators: [1, 20], max_depth: [2, 20], min_samples_leaf: [1, 9], min_samples_split: [100, 600]	scikit-learn 0.20.1
ANN	activation: [‘identity’, ‘logistic’, ‘tanh’, ‘relu’], learning_rate: [‘constant’, ‘invscaling’, ‘adaptive’], alpha: [0.000001, 0.01], max_iter: [100, 400], solver: [‘sgd’, ‘adam’], hidden_layer_sizes: [10, 500]	scikit-learn 0.20.1
SVM	C: [0.01, 10]	scikit-learn 0.20.1
MLR	None	scikit-learn 0.20.1

Note: the left and right values in the square brackets represent the minimum and maximum, respectively.

Table 6. Mean and standard deviation (Std) of different model error metrics.

Methods	Statistic	R²	MSE	MAE (t/day)	T (s)
Proposed model	Mean	0.9938	7.3446	1.6908	79.3902
Proposed model	Std	0.0003	0.2982	0.0354	19.5612
ET	Mean	0.9938	7.3496	1.6752	5.3304
ET	Std	0.0003	0.3025	0.0371	2.6599
XGB	Mean	0.9928	8.5844	1.8620	6.7476
XGB	Std	0.0004	0.4950	0.0356	3.3799
RF	Mean	0.9927	8.6332	1.7993	8.5345
RF	Std	0.0003	0.4691	0.0356	3.1417
ANN	Mean	0.9867	15.7299	2.8619	6.5073
ANN	Std	0.0014	1.3770	0.1329	0.8787
SVM	Mean	0.9863	16.2503	2.8369	2.6990
SVM	Std	0.0002	0.5718	0.0470	0.4470
MLR	Mean	0.9268	86.8341	7.5772	0.0032
MLR	Std	0.0022	2.1098	0.1129	0.0017

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Hu, Z.; Zhou, T.; Osman, M.T.; Li, X.; Jin, Y.; Zhen, R. A Novel Hybrid Fuel Consumption Prediction Model for Ocean-Going Container Ships Based on Sensor Data. J. Mar. Sci. Eng. 2021, 9, 449. https://doi.org/10.3390/jmse9040449

