1. Introduction
In recent years, the high demand for transporting goods by sea has made it necessary to expand the world’s fleet [1]. As a result, average fuel consumption has risen rapidly, which in turn has pushed fuel prices up [2]. Nevertheless, this increase in fuel prices was not enough to reverse the aforementioned trend. This is partly understandable, since globalization and the expansion of the fleet also increased the income of shipping companies. Over time, both society and the shipping community became deeply concerned about the growth in emissions produced by commercial shipping [3]. This concern dates back to the “Kyoto Protocol”, which introduced a series of measures that had to be adopted urgently in order to reduce CO2 emissions and thereby restrict the global growth of greenhouse gases [4]. It was only in 2008 that shipping was included in the targets for reducing emissions of CO2 and other greenhouse gases. Since CO2 emissions were expected to grow sharply in the future, shipping could no longer remain unregulated in this respect. Further to the “Kyoto Protocol”, the “Monitoring, Reporting, Verifying” (EU-MRV) regulation has been enforced in the shipping sector, obliging all shipping companies and operators to monitor, report, and verify their ships’ emissions and, consequently, to observe their daily fuel consumption [5]. It should be mentioned that most existing regulations focus on the reduction of CO2, as it is the predominant greenhouse gas. Moreover, studies of vessels’ fuel consumption and emissions have mostly been based on container vessels. According to the International Maritime Organization (IMO), the vessels that produce the highest emissions are those that consume the most fuel. Although they do not represent a large part of the global fleet, container ships of 3000–5000 and 5000–8000 twenty-foot equivalent units (TEUs) and roll-on/roll-off passenger (Ro/Pax) vessels (i.e., ships carrying passengers and wheeled cargo) are the two most fuel-consuming categories. This has been repeatedly attributed to their speed and the time they need to stay in the port area [6].
The shipping industry plays a vital role in economic development, and it is recognized not only as the cornerstone of world trade, as over 90% of the world’s trade is carried by sea, but also as a low-cost means of transportation [6]. However, the growing demand for goods will lead to a rise of the world fleet, a fact that will result not only in an increase in global emissions from seaborne transportation but also in more fossil fuel being required for ships’ operations [7]. It is well known that shipping-related activities rely mainly on fossil fuel consumption, a fact that has a great impact not only on the environment but also on public health [8]. The use of fossil fuels leads to the production of not only the greenhouse gas CO2 but also nitrogen oxides (NOx) and sulfur oxides (SOx), which are related to human fatalities and environmental degradation [
9]. According to the International Council on Clean Transportation for the period 1990–2007, it was observed that seaborne emissions increased from 585 to 1096 million tons [
10]. According to [
11], during 2008, the emissions generated from shipping amounted to 7.40 million tons. In addition to the above, it is noteworthy to mention that shipping emissions represent around 3.30% of the global anthropogenic emissions, and this percentage is expected to be increased by 2050 [
11]. The shipping sector is the second-largest emitter of carbon dioxide compared to other transportation means [
12]. From another point of view, it should be stated that bunkers have always been a key factor of shipping operations, as fossil fuels account for 50–60% of a company’s operational running costs [
13]. Thus, a potential increase in the price of oil constitutes a liability, as it negatively affects the profitability of the shipping company or the operator. There is therefore a clear need for a fuel consumption prediction tool, which would not only provide a competitive advantage but could also increase a company’s revenues through sustained energy savings. Such a tool can be used to optimize the ship’s operations and to improve fuel efficiency [7].
Few studies have so far addressed passenger ship fuel consumption prediction. Bal Beşikçi et al. developed an Artificial Neural Network (ANN) prediction model including seven input variables provided by a noon dataset (speed, trim, draft, weather conditions, quantity of the cargo) in conjunction with the engine’s revolutions per minute (RPM). This ANN-based model was later used to build a decision support system for energy efficiency in real-time ship operations. Furthermore, the ANN prediction model was compared to a Multiple Regression (MR) model, leading to the conclusion that the former has better prediction performance than the latter [14]. Petersen et al. modeled the fuel consumption for real-time conditions through the implementation of an ANN. More specifically, the factors that describe the dynamic state of the vessel (i.e., speed, trim, draft, propeller pitch, and engine’s RPM) were collected through sensors and used as input variables to the model. The results of this research showed that the model can obtain accurate results and can be used for a vessel’s trim optimization [
15]. Pedersen and Larsen presented in their study a neural network for a propulsion power predicting model by taking into account a vessel’s noon report dataset, weather, and onboard measurement data. They concluded that the ANN model is more accurate compared to linear and non-linear models [
16].
Taking all the above into consideration, it can be concluded that the models developed in these studies were mainly based on vessels’ operational data collected either from a vessel’s noon reports or through onboard sensors. The literature review reveals that there are few studies on passenger ship fuel consumption, and all of them involve neural networks. More specifically, in [
14], a neural network is compared with a regression model, while in [
15] and [
16], there is no comparison with other modeling approaches. While neural networks are a prominent forecasting methodology for many applications, a comparison with other models enables the provision of more reliable conclusions regarding identifying more robust models for fuel consumption predictions. Since the passenger ship fuel consumption forecasting problem is a relatively under-examined research field, the authors of this study believe that as a first step to examine this problem, a comparison of models of different types and complexities should take place. Therefore, in the present paper, a novel forecasting hybrid model is proposed, combining shallow and deep learning. The model is compared with eight alternative models. This approach supports the conclusions drawn by the application of this hybrid model. The contribution of the paper concerns also the assessment of several cases of input data, compared to the literature, as studies [
14,
15,
16] involve only one case of inputs. In the present paper, three different test cases of input combinations are formulated and applied. The scope is to investigate how different inputs influence the forecasting accuracy and determine which combination has the highest influence on the forecasting results. Another critical theme in forecasting studies is the evaluation framework. In [
14], the evaluation takes into account two indices, while in [
15,
16], one indicator is used. In this paper, five indices are used, enabling a more robust validation of the applied models. Thus, the present study contributes to the literature in the following ways: (a) it applies a novel model for passenger ship fuel consumption forecasting, which is compared with more alternative approaches than in the literature; (b) more cases of inputs are used for the validation of the proposed methodology than in the literature; and (c) a more extended set of validation criteria is employed than in the literature. Examining passenger ship fuel consumption in a more detailed manner enables more robust conclusions on the validity of the proposed novel hybrid model.
This study focuses on the fuel consumption of Ro/Pax vessels, taking a specific vessel as a case study. This category has been chosen because it significantly raises average port emissions, which do not contribute significantly to the national emissions inventory but are extremely important for emission management in the port’s greater area. In the present paper, a discussion of the factors that affect fuel consumption is provided. Next, the day-ahead fuel consumption forecasting problem is formulated and studied. A hybrid machine learning system that combines a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) and an Elman Neural Network (ENN) is proposed for various cases that differ in terms of the types and number of inputs [17]. This is done in order to fully evaluate the influence of exogenous parameters on fuel consumption. A comparison with other machine learning and time series models is performed, and the superiority of the proposed model is observed.
2. Fuel Consumption
In principle, fuel consumption grows exponentially with a vessel’s velocity, and it directly affects both operational costs and greenhouse gas (GHG) emissions. Moreover, fuel consumption is a complex variable because of the underlying physical principles, which may sometimes lead to disputable and ambiguous results, a fact that makes a generalized explanation virtually impossible [
18]. It should be noted that a vessel’s total fuel consumption can be expressed as the sum of main and auxiliary engine consumptions while the vessel is at port and during its service at sea [
19]. In the following, the types of fuels and the parameters which affect the fuel consumption are presented. Firstly, an important factor that must be taken into consideration is the type of fuel oil used onboard the vessels. Several studies have concluded that a vessel’s energy efficiency and the fuel consumption are related to the type of fuel used. More specifically, according to [
20], variations in fuel consumption are observed during the voyages due to fuel specifications (i.e., sulfur, water, ash quality, etc.). This can also be confirmed by the fact that engine manufacturers admit that the type of fuel in conjunction with operational practices may result in an increment in fuel consumption [
21]. On the other hand, shipping companies and operators are forced to switch bunker fuels in order to meet regulatory requirements. It should be noted that the ISO 8217 international standard classifies marine fuel oils into two distinct categories, distillate and residual fuel oils [22]. Heavy Fuel Oil (HFO) is an industrial fuel, also known as a “refinery residual”, as it is obtained from the refining process and, more precisely, from the distillation of crude oil [23]. HFO accounts for 80% of the global marine fuel oils used, and it is divided into three types according to sulfur content. In addition, HFO is classified by viscosity into HFO 180 and HFO 380 (i.e., 180 mm²/s and 380 mm²/s, respectively) [24]. The distillate marine fuel oils commonly used, on the other hand, are Marine Gas Oil (MGO) and Marine Diesel Oil (MDO), both of which are classified as low in sulfur. Their main difference lies in the fact that MGO has a lower sulfur content and viscosity than MDO. It is essential to state here that these distillate fuels are more expensive than HFOs but lower in sulfur content [25]. Moreover, it should be mentioned that, despite their low sulfur content, distillate marine fuels (MDO and MGO) have higher carbon contents, a fact that can be observed in the emission factors provided by the IMO [
26]. 
Table 1 lists all the aforementioned information regarding the specifications of each fuel and the emissions factors.
From all the above, it is understood that the “bunkering switch” process to lower sulfur content fuel oil will have an impact on shipping companies due to the fact that low-sulfur bunkers are more expensive, which will result in additional costs and an increase of freight rates and ticket fares which will be passed on to customers [
28]. Therefore, the implementation of low-sulfur bunkering will significantly affect short-sea shipping companies, including those owning Ro/Pax vessels, as operators will have to decide whether to increase ticket fares or to reduce daily itineraries. The “bunkering switch” process will increase the costs of a shipping company by around 80%. Another threat that shipping companies may face is fuel oil price volatility. In the case of a fuel price increase, the cost will be passed on to the customer, who may then turn to other, less expensive transportation modes [2]. Another issue is that if the operator chooses to use low-sulfur fuel oil onboard the vessels, this decision will lead to higher CO2 emissions, since both MDO and MGO have higher carbon contents. Thus, switching to fuels with low sulfur content may result in both additional costs and higher carbon emissions.
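The carbon penalty of switching to distillates can be illustrated with a back-of-the-envelope calculation. The sketch below multiplies a fuel mass by a CO2 emission factor. The factors used (3.114 t CO2 per tonne of HFO and 3.206 t CO2 per tonne of MDO/MGO) are the values commonly quoted from the IMO guidelines and are an assumption here; Table 1 remains the authoritative source for this paper. The 100-tonne fuel mass and the function name are hypothetical, and the comparison deliberately ignores the different energy content of the fuels.

```python
# CO2 emission factors in tonnes of CO2 per tonne of fuel burned.
# Assumed values (commonly quoted IMO figures); check them against Table 1.
EMISSION_FACTORS = {"HFO": 3.114, "MDO": 3.206, "MGO": 3.206}


def co2_emissions(fuel_type, fuel_mass_t):
    """Return the tonnes of CO2 emitted by burning fuel_mass_t tonnes of fuel."""
    if fuel_mass_t < 0:
        raise ValueError("fuel mass must be non-negative")
    return EMISSION_FACTORS[fuel_type] * fuel_mass_t


fuel_mass_t = 100.0  # hypothetical consumption, same mass for both fuels
hfo = co2_emissions("HFO", fuel_mass_t)
mgo = co2_emissions("MGO", fuel_mass_t)
increase_pct = 100.0 * (mgo - hfo) / hfo
print(f"HFO: {hfo:.1f} t CO2, MGO: {mgo:.1f} t CO2, increase: {increase_pct:.2f}%")
```

Under these assumptions the difference per tonne of fuel is small in relative terms, but it accumulates over a year of daily itineraries.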
The prediction of fuel consumption plays an important role for the viability of a shipping company, and it is characterized by uncertainties, as it is clearly dependent on the ship’s design, operational performance, and environmental conditions. The actual fuel consumption is monitored onboard the vessel through mass flow meters, which measure fuel oil usage both in main and auxiliary engines right after the arrival and the departure from the port. The ship design factors involve the main dimensions, cargo arrangement, propulsion system, engine specifications (main and auxiliary), propeller design, and hull/steel structure. The operational performance factors involve the performance itself at sea and at port, time in service at port, speed, draft, trim, displacement, hull performance, and drydocking. Finally, the environmental condition factors involve wind speed, wave height, and water and air temperature [
29].
4. Results
The scope of the analysis is to forecast the total ship fuel demand (LSFO) in a day-ahead time frame using various input combinations. The examined time period is 02/01/2018–14/11/2018, i.e., a total of 322 days. 
Figure 8 presents the fuel consumption time series. The period 02/01/2018–12/09/2018 serves as the training set, and the period 13/09/2018–14/11/2018 as the test set. The training set covers almost 80% of the total data, and the remaining 20% forms the test set. The training set is used to derive the optimal parameters of the models (i.e., structure, number of neurons, number of epochs, etc.), while the test set is employed for the comparison of the models. It can be noticed that there are several spikes in the time series, a fact that makes fuel demand forecasting a challenging task.
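Because the data form a time series, the split is chronological rather than random: the most recent observations are held out. A minimal sketch of such a split is given below, assuming only the counts quoted in the text (322 observations, the last 65 of which form the test set). The function name and the placeholder series are hypothetical.

```python
def chronological_split(series, n_test):
    """Hold out the last n_test observations as the test set (no shuffling)."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]


# Placeholder values standing in for the daily fuel consumption series.
series = list(range(322))
train, test = chronological_split(series, n_test=65)

train_share = len(train) / len(series)
print(len(train), len(test), f"{100 * train_share:.1f}%")
```

Keeping the order intact ensures that no model is fitted on days that come after the ones it is asked to forecast.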
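The prediction accuracy is evaluated with a set of mathematical indicators. Let $y_m$ and $\hat{y}_m$ be the actual and predicted fuel demand, respectively, of the $m$-th day of the test set, with $m = 1, 2, \ldots, M$ and $M = 65$. The Absolute Error (AE) is defined as [39]:
$$\mathrm{AE}_m = \left| y_m - \hat{y}_m \right|$$
The Mean Absolute Error (MAE) is the average of all AEs [39]:
$$\mathrm{MAE} = \frac{1}{M} \sum_{m=1}^{M} \left| y_m - \hat{y}_m \right|$$
The Mean Absolute Percentage Error (MAPE) is given by [39]:
$$\mathrm{MAPE} = \frac{100}{M} \sum_{m=1}^{M} \left| \frac{y_m - \hat{y}_m}{y_m} \right|$$
The Root Mean Squared Error (RMSE) is expressed as [39]:
$$\mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{m=1}^{M} \left( y_m - \hat{y}_m \right)^2}$$
The Mean Absolute Range Normalized Error (MARNE) is the mean absolute difference between the actual and forecast fuel demand, normalized to the maximum fuel demand [40]:
$$\mathrm{MARNE} = \frac{100}{M} \sum_{m=1}^{M} \frac{\left| y_m - \hat{y}_m \right|}{\max_{m} y_m}$$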
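The five indicators can be computed in a few lines. The sketch below assumes the standard definitions of AE, MAE, MAPE, and RMSE, and takes MARNE as the mean absolute error divided by the maximum actual value and expressed as a percentage. The function name and the three-point example are hypothetical and are not taken from the paper’s data.

```python
import math


def forecast_errors(actual, predicted):
    """Return AE (per day), MAE, MAPE (%), RMSE and MARNE (%) for a test set."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")
    m = len(actual)
    ae = [abs(y - f) for y, f in zip(actual, predicted)]
    return {
        "AE": ae,
        "MAE": sum(ae) / m,
        # MAPE divides by each actual value, so it breaks down near zero.
        "MAPE": 100.0 / m * sum(e / abs(y) for e, y in zip(ae, actual)),
        "RMSE": math.sqrt(sum(e * e for e in ae) / m),
        # MARNE divides by the maximum actual value instead.
        "MARNE": 100.0 / m * sum(ae) / max(actual),
    }


# Hypothetical values for illustration only.
scores = forecast_errors(actual=[10.0, 20.0, 40.0], predicted=[12.0, 18.0, 40.0])
print({k: v for k, v in scores.items() if k != "AE"})
```

Normalizing by the maximum demand is what keeps MARNE bounded on days when the actual consumption is very small.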
To fully validate the LSTM model, a comparison takes place with Radial Basis Function Neural Network (RBFNN) [
41], General Regression Neural Network (GRNN) [
42], Elman Neural Network (ENN) [
43], Support Vector Regression (SVR) [
44], Group Method of Data Handling (GMDH)-based neural network [
45], Relevance Vector Machine (RVM) [
46], Feed-Forward Neural Network (FFNN) [
47], and Multiple Regression Model (MRM) [
48]. Regarding the MRM, a linear model is assumed. Not only does it assess the effect of two or more independent variables on the dependent variable, but it also predicts the value of the dependent variable from the values of the predictors; hence, its equation is defined by the following formula:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
where $y$ is the response variable, $x_1, \ldots, x_k$ are the explanatory or independent variables, $\beta_0, \ldots, \beta_k$ are the regression coefficients, and $\varepsilon$ is the error term. In order to model the relationship between a dependent variable and its predictors by applying multiple regression analysis, the following assumptions must be tested [49]:
- The dependent and at least two predictor variables must be continuous. 
- Linearity between the dependent variable and each of the predictors. The dependent variable must be expressed as a linear function of the independent variables. 
- The difference between the predicted and the actual values (residuals) must follow the Gaussian distribution. 
- Absence of multicollinearity, which means that the regressors (independent variables) must not be tightly correlated with each other. 
- Absence of autocorrelation, which means that the residuals must be independent of each other. Therefore, the observations must be independent of their past values. 
- The residuals must be homoscedastic (constant variance of errors). 
If any of the aforementioned assumptions is not met, the violation may lead to invalid or misleading results, and therefore the model must be adjusted. However, not all violations have the same impact on the analysis. More precisely, a linearity violation is critical and results in biased predictions, while a violation of the independence of residuals has an impact only on the standard errors. Additionally, a violation of homoscedasticity negatively influences the standard errors and the statistical significance [48]. The least squares method is applied in order to minimize the sum of the squared residuals. Another crucial aspect is the goodness of fit in conjunction with the statistical significance, which reveals whether the regression model adequately describes the set of observations.
In order to perform the multiple regression analysis, the original variables are converted into standardized values (z-scores). The overall MRM accuracy and how well the regression line fits are determined by the coefficient of determination ($R^2$), whose values range between 0 and 1, with values closer to 1 indicating a better fit. Another aspect that must be evaluated is the significance of the model, which is determined by the p-values, taking into consideration that the level of significance is $\alpha = 0.05$. Referring to Table 2, it is observed that the predictor variables “Main Engine & Total LSFO Consumption” and “Miles” are perfectly correlated with each other, a fact which indicates the occurrence of multicollinearity in the regression model. Hence, the independent variable “Miles” is omitted from the MR analysis. Furthermore, a stepwise regression procedure is also applied in order to identify which predictor variables add explanatory power to the model, resulting in an increase of $R^2$. As a result, the multiple regression analysis is divided into two categories according to the selection process (enter and stepwise) by which the predictor variables are entered in the equation. With the enter method, all predictor variables are entered in the equation simultaneously, and the results of the regression analysis are presented in Table 3.
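The standardization step and the correlation screening that led to dropping “Miles” can be sketched as follows. This is a minimal illustration, assuming sample (n − 1) standard deviations and Pearson correlation. The variable names and the four-day values are hypothetical and are not taken from Table 2.

```python
import math


def z_scores(values):
    """Standardize values to zero mean and unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if std == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(v - mean) / std for v in values]


def pearson(x, y):
    """Pearson correlation, computed from the products of paired z-scores."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Hypothetical daily values for illustration only.
miles = [100.0, 120.0, 90.0, 110.0]
me_hours = [10.0, 12.0, 9.0, 11.0]  # exactly proportional to miles
wind = [3.0, 5.0, 4.0, 2.0]

r_collinear = pearson(miles, me_hours)
r_wind = pearson(miles, wind)
print(f"miles vs. ME hours: {r_collinear:.2f}, miles vs. wind: {r_wind:.2f}")
```

A pair of predictors whose correlation is at or near 1 carries the same information twice, which is why one of the two is removed before fitting.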
It is observed that $R^2 \approx 0.83$, which shows that the relationship between the dependent and the predictor variables is strong: about 83% of the variation of the total fuel consumption is linearly explained by the selected independent variables. The remaining 17% of the variation in total fuel consumption can be explained by other factors, such as the vessel’s total resistance, hull roughness, etc. Another factor that must be taken into account is the adjusted $R^2$ (approximately 0.81), which is not much lower than $R^2$, a fact that suggests that the regression model can be generalized to the population. Hence, about 81% of the total variance of the response variable can be explained by the model. Moreover, from the Durbin–Watson (DW) value, it is assumed that there is no linear autocorrelation in the residuals, as it falls within the range of 1.50 to 2.50. The Standard Error (SE) represents the regression error. In our case, the SE corresponds to 1.16% of the variance in the total fuel consumption that cannot be explained by the regression model. Therefore, the SE is low, which indicates that the values are well fitted to the regression line.
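The summary statistics discussed here follow directly from the residuals. The sketch below assumes the usual textbook definitions: $R^2 = 1 - SS_{res}/SS_{tot}$, adjusted $R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)$ for $n$ observations and $k$ predictors, and the Durbin–Watson statistic as the sum of squared successive residual differences divided by the sum of squared residuals. The function names and the five-point example are hypothetical.

```python
def r_squared(actual, fitted):
    """Coefficient of determination of a fitted model."""
    mean = sum(actual) / len(actual)
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    ss_tot = sum((y - mean) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R^2 for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def durbin_watson(residuals):
    """Values near 2 suggest no first-order autocorrelation in the residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    return num / sum(e * e for e in residuals)


# Hypothetical values for illustration only.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.1, 1.9, 3.2, 3.8, 5.0]
residuals = [y - f for y, f in zip(actual, fitted)]

r2 = r_squared(actual, fitted)
adj_r2 = adjusted_r_squared(r2, n_obs=len(actual), n_predictors=1)
dw = durbin_watson(residuals)
print(f"R2={r2:.3f}, adjusted R2={adj_r2:.3f}, DW={dw:.2f}")
```

The adjusted value is always at most $R^2$, and a small gap between the two, as reported in the text, indicates that the predictors are not merely inflating the fit.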
From the ANOVA results in Table 4, we may examine the p-value in order to evaluate the significance of the regression model and whether the predictor variables contribute significantly to the prediction of the total fuel consumption. The model’s p-value is reported as 0.000, while the level of significance is $\alpha = 0.05$; hence, the p-value is much lower than the level of significance. As a result, the developed model is statistically significant. 
Table 5 presents the results of the regression analysis utilizing the enter method. The contribution of the independent variables to the model is indicated by the column “Sig.” More specifically, the predictors’ p-values show that almost all independent variables contribute significantly to the regression model; the only exception is the wind variable, with a p-value of 0.424. Moreover, the Variance Inflation Factor (VIF) values indicate the absence of multicollinearity, as the VIF denotes the occurrence of multicollinearity in the model only when it ranges between 5 and 10.
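For reference, the VIF of a predictor is $1/(1 - R_j^2)$, where $R_j^2$ is obtained by regressing that predictor on all the others. The sketch below is a plain-Python illustration of this definition using ordinary least squares via the normal equations. It is not the statistical package used in the study, and the helper names and sample columns are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_r_squared(y, columns):
    """R^2 of a least-squares fit of y on the given columns plus an intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """Variance Inflation Factor of each predictor column."""
    factors = []
    for j, target in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        unexplained = 1.0 - ols_r_squared(target, others)
        factors.append(float("inf") if unexplained <= 1e-12 else 1.0 / unexplained)
    return factors


# Hypothetical predictor columns for illustration only.
speed = [1.0, 2.0, 3.0, 4.0, 5.0]
passengers = [2.0, 1.0, 4.0, 3.0, 6.0]
print([round(v, 2) for v in vif([speed, passengers])])
```

A VIF of 1 means the predictor is uncorrelated with the others; the larger the value, the more of its variance is already explained by the remaining predictors.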
Furthermore, the B column is used to develop the regression model, as its values are the regression coefficients. These coefficients represent the association between the total fuel consumption and the independent variables. However, from Table 5, it is noted that the predictor “Average Speed” has a negative coefficient, a fact that indicates a negative association between this variable and the dependent variable. This observation can be justified by the fact that a speed reduction can lead to higher fuel consumption due to the increase in hull resistance and, consequently, to an increment in Effective Horsepower [50]. Hence, the MRM has the following form:
Using the stepwise selection process to identify all the explanatory variables that significantly influence the dependent variable, the model summary is derived and presented in Table 6. The results from the regression analysis indicate that the model with the highest $R^2$ (0.829) is the third one, which incorporates the variables average speed, M&E LSFO fuel consumption, and number of passengers, while the variable wind is omitted from the regression analysis, as it does not significantly contribute to the model’s ability to predict the fuel consumption. This observation was also confirmed when applying the enter method.
The coefficients and the variables used in the three regression models are presented in Table 7. Note that only the variables with a p-value of less than 0.05 were entered in the model. More precisely, the t-values show that the strongest predictor is the main engine hours and LSFO total consumption, as fuel consumption is tightly associated with the main engine’s working hours in conjunction with the LSFO fuel usage. In the stepwise process, too, the average speed has a negative association with the dependent variable. The results also revealed the absence of multicollinearity in the regression model.
The scatter plot in Figure 9 depicts the relationship between the actual and predicted values of the fuel consumption (322 observations). The x-axis shows the actual fuel consumption values, while the y-axis shows the predicted values. A positive slope is clearly observed, which reflects a positive relationship between the two. Nevertheless, some outliers can be identified.
The basic model parameters derived from the training process of each model are the following: for the RBF and GRNN, the spread of the radial basis function is set to 1; for the ENN, the number of hidden layer neurons is set to 10; for the SVR, the type of kernel is linear and the kernel scale parameter is set to 1; for the GMDH, the number of layers is set to 4, the number of neurons in each layer is set to 14, and the alpha parameter is set to 0.5; for the RVM, the kernel width is set to 5 and the likelihood is of Gaussian type; for the FFNN, the number of hidden layers is set to 1, the activation function in both the hidden and output layers is the hyperbolic tangent sigmoid, the maximum number of epochs is set to 1000, and the network is trained with the Bayesian regularization back-propagation algorithm [
51]; and for the LSTM, the number of epochs is set to 25, and the initial learning rate is set to 0.005.
The scores of the models on the error indicators are presented in 
Table 8, 
Table 9, and 
Table 10 for Case Study #1, Case Study #2, and Case Study #3, respectively. Instead of utilizing one error metric, a set of metrics leads to more accurate comparisons of the models. An ideal model should lead to lower errors in most, if not all, of the cases. MAE and RMSE measure the difference between the forecast and the actual value. MAPE and MARNE are percentage indicators. MARNE provides a solution to the inherent limitation of MAPE when zero or extremely low values are present in the data set. In this case, MAPE would be very high, a fact that does not provide reliable indication on the forecasting performance.
According to the above regression analysis, the MRM is applicable only for Test Case #2. It can be observed that in all examined cases the proposed LSTM-ENN model results in lower errors. According to 
Table 8, the FFNN and the ENN display robust performance. The highest errors correspond to GRNN and RBF. GMDH and RVM lead to comparable performance. The inputs of Test Case #2 appear more promising for the fuel demand forecasting problem under study. In 
Table 9, it is shown that all error values are lower compared to Test Case #1, apart from those of GMDH. The latter is more robust in Test Case #1, where a reduced set of inputs is used. Moreover, no considerable differences are noticed in the performance of GRNN when the MAPE indicator is used. When considering MAE and RMSE, which measure deviations from the target value, the difference is more visible. As in Test Case #1, ENN and FFNN come next in the ranking after the LSTM-ENN model. SVR is the fourth model that leads to a MAPE below 3%. GMDH comes next in the ranking. RBF and MRM lead to MAPE values above 7%. For most of the models, Test Case #3 leads to lower errors. For instance, the LSTM-ENN model manages to provide even better accuracy. Moreover, ENN and FFNN score MAPE values below 2.50%. On the contrary, the error of RVM increases. As in the previous cases, the RBF neural network shows poor prediction performance, and hence, it is not recommended for the problem under study. 
The findings in Table 8 and Table 9 revealed that the fuel consumption is related more to exogenous factors than to its own preceding values. This is also evident from the results of the correlation analysis presented in Figure 4: the time series does not display strong autocorrelation with its preceding values. Moreover, the forecasting models also depend on the previous-day values of the variables used. Therefore, a model can provide the fuel consumption prediction for the day ahead when several parameters are known; consequently, this can also lead to the prediction of ship-generated emissions. In addition, the total fuel consumption can be expressed as a nonlinear function of operational, design, and environmental parameters (i.e., average vessel speed, wind force, number of passengers, distance, and ME Hours and Total Consumption of LSFO).
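The strength of the dependence on preceding values can be quantified with the sample autocorrelation at a given lag. The sketch below uses the standard estimator (lagged covariance divided by the overall sum of squares). The function name and both example series are hypothetical and do not reproduce Figure 4.

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation of a series at the given positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


# Hypothetical series for illustration only.
trending = [1.0, 2.0, 3.0, 4.0, 5.0]
alternating = [1.0, -1.0, 1.0, -1.0]

print(autocorrelation(trending, lag=1), autocorrelation(alternating, lag=1))
```

Values close to zero at all lags, as the text reports for the fuel series, mean that yesterday’s consumption alone says little about today’s, which motivates the use of exogenous inputs.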
The AE distributions for Test Case #1, Test Case #2, and Test Case #3 are illustrated in Figure 10, Figure 11, and Figure 12, respectively. The bin width on the horizontal axis is set to 2 Mt/h. The number of instances per bin is also shown. Recall that the test set is composed of 65 days. According to the figures, the best forecasters are those for which most instances fall within the [0,2] range. For some models, e.g., RBF, there are instances above 10 Mt/h; these cases represent forecasting failures. This is also the case with the MRM, which provides seven forecasts where the AE exceeds the 10 Mt/h upper threshold.
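The binning behind these histograms can be reproduced as follows. The sketch assumes half-open bins of width 2 starting at zero and counts as failures the errors above the threshold of 10 mentioned in the text. The function names and the five sample errors are hypothetical.

```python
from collections import Counter


def ae_histogram(abs_errors, bin_width=2.0):
    """Count absolute errors per half-open bin [k * width, (k + 1) * width)."""
    counts = Counter(int(e // bin_width) for e in abs_errors)
    return {(k * bin_width, (k + 1) * bin_width): counts[k] for k in sorted(counts)}


def count_failures(abs_errors, threshold=10.0):
    """Number of forecasts whose absolute error exceeds the threshold."""
    return sum(1 for e in abs_errors if e > threshold)


# Hypothetical absolute errors for illustration only.
abs_errors = [0.5, 1.9, 2.0, 3.5, 11.2]
histogram = ae_histogram(abs_errors)
failures = count_failures(abs_errors)
print(histogram, failures)
```

A model whose counts are concentrated in the first bin, with no entries beyond the threshold, corresponds to the preferred forecaster described above.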
The actual and the predicted time series for Test Case #1, Test Case #2, and Test Case #3 are shown in Figure 13, Figure 14, and Figure 15, respectively. It can be observed that the predicted values of the proposed model are very close to the actual values and follow the same trend during the test process (65 observations). Large deviations are displayed by RBF and MRM, which is consistent with the high values of their error metrics. Moreover, the ENN and FFNN are robust models for predicting the fuel demand under different conditions (wind, speed, main engine working hours, and LSFO fuel consumption), because the fuel consumption displays a non-linear relationship with some of these variables, and these models capture and simulate it better.