Predicting External Influences to Ship's Average Fuel Consumption Based on Non-Uniform Time Set

Abstract: Nowadays, the impact of ships on the world economy is enormous, considering that every ship needs fuel to sail from source to destination. Sailing requires a lot of fuel, and therefore there is a need to monitor and predict a ship's average fuel consumption. However, although there are many models available to predict a ship's consumption, most of them rely on a uniform time set. Here we show a model for predicting external influences to a ship's average fuel consumption based on a non-uniform time set. The model is based on the numeric fitting of recorded data. The first set of recorded data was used to develop the model, while the second set was used for validation. Statistical quality measures were used to choose the optimal fitting function for the model. According to these measures, the Gaussian 7, Fourier 8, and smoothing spline fitting functions were chosen as optimal algorithms for model development. In addition to extensive data analysis, an algorithm is given for determining the filter length used in the preprocessing of the raw data. This research is of interest to corporate logistics departments in charge of ensuring adequate fuel for fleets when and where required.


Introduction
The effects of the marine environment and other causes on fuel consumption can be examined through various parameters and various approaches. Approaches can be model-based, which is usual in the references, or signal-based, which is considered in this paper. Model-based approaches tend to estimate the exact fuel consumption in the exact engine operating mode; examples are [1,2]. Such models are useful in, e.g., dual-fuel engines, where it is possible to calculate how much of each fuel is being consumed [1]. The ANNs (Artificial Neural Networks) in [2] need to be trained and re-trained for specific ships over time. However, there is no exact real-time monitoring of the hull status, which can be a problem that increases the calculation error in the time between maintenance overhauls. Namely, during exploitation and over time, the ship hull is affected by fouling, a natural phenomenon in which marine/aquatic vegetation and microorganisms attach to the hull, creating bio-layers that affect the ship's speed and fuel consumption. Other contributing factors may include weather conditions, cargo, propulsion, engine condition, etc. Data trend analysis is a suitable approach for predicting fuel consumption that depends on biofouling layers as environmental contributors. In light of current increased efforts to improve energy efficiency, the above-mentioned topic is timely, especially since it includes the use of hybrid technologies [3,4]. As stated in [4], fuel consumption reduction cannot be established without first exploring standard fuel consumption prediction models.
Instead of focusing on a specific ship, there are a number of references on fuel demand prediction [5–8] in which the focus is on energy efficiency and ecology [9–11], or on predicting global fuel demand.

The fitting functions used in this research are defined as follows. The first fitting function, (1), depends on an independent variable x and constants a, b, and c that have to be determined. The next data fitting function is the so-called exponential of the 1st order, which can be defined as [26]:

f_e1(x) = a_e1 · e^(b_e1·x), (2)

where x is an independent variable, and a_e1 and b_e1 are constant coefficients. The data were also tested using the so-called exponential of the 2nd order, defined as follows [26]:

f_e2(x) = a_e2 · e^(b_e2·x) + c_e2 · e^(d_e2·x), (3)

where a_e2, b_e2, c_e2, and d_e2 are coefficients. The next curve fitting functions considered for the research are taken from the Fourier series [26]:

f_F(x) = a_0 + Σ_{i=1}^{n} [a_i · cos(i·ω·x) + b_i · sin(i·ω·x)], (4)

where the coefficients from (4) are a_0 = (2/T) · ∫_0^T f(x) dx, a_i = (1/T) · ∫_0^T f(x) · cos(i·ω·x) dx, b_i = (1/T) · ∫_0^T f(x) · sin(i·ω·x) dx, ω = 2π/T, and T is the period or data width. In the analysis, the Fourier series of the 1st, 2nd, and 8th order were considered, presented with Equations (5)–(7). The Fourier series of the first order is:

f_F1(x) = a_0 + a_1 · cos(ω·x) + b_1 · sin(ω·x), (5)

the Fourier series of the second order is:

f_F2(x) = a_0 + Σ_{i=1}^{2} [a_i · cos(i·ω·x) + b_i · sin(i·ω·x)], (6)

and the Fourier series of the eighth order is:

f_F8(x) = a_0 + Σ_{i=1}^{8} [a_i · cos(i·ω·x) + b_i · sin(i·ω·x)]. (7)

We likewise considered the Gaussian function [26], defined as:

f_G(x) = Σ_i a_i · e^(−((x−b_i)/c_i)²), (8)

where a_i, b_i, and c_i are Gaussian function coefficients. From (8), a Gaussian function with one term (defined in Matlab as Gaussian 1) was considered in the analysis, which can be defined with the following equation [26]:

f_G1(x) = a_11 · e^(−((x−b_11)/c_11)²), (9)

where a_11, b_11, and c_11 are coefficients of the Gaussian function with one term. Next, a Gaussian function with two terms (Gaussian 2 in Matlab) was considered, defined as [26]:

f_G2(x) = a_12 · e^(−((x−b_12)/c_12)²) + a_22 · e^(−((x−b_22)/c_22)²), (10)

where a_12, b_12, c_12, a_22, b_22, and c_22 are coefficients of the Gaussian function with two terms. Polynomial-based fitting functions were also considered.
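As an illustration of how such fitting functions are determined in practice, a one-term Gaussian like (9) can be fitted by nonlinear least squares. The following is a minimal sketch using SciPy with synthetic data; the coefficient names follow the text, but the data and starting guess are placeholders, not the ship measurements.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss1(x, a11, b11, c11):
    # f_G1(x) = a11 * exp(-((x - b11) / c11)^2), cf. Eq. (9)
    return a11 * np.exp(-((x - b11) / c11) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = gauss1(x, 2.0, 5.0, 1.5) + rng.normal(0.0, 0.05, x.size)  # noisy samples

# Nonlinear least squares; p0 is the initial guess for (a11, b11, c11)
popt, _ = curve_fit(gauss1, x, y, p0=[1.0, 4.0, 1.0])
a11, b11, c11 = popt
```

Higher-order variants (Gaussian 2 through Gaussian 8, the Fourier series) only extend the model function with more terms; the fitting procedure is the same.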
First, the polynomial of the first order was defined as follows [26]:

f_P1(x) = p_01 + p_11·x, (11)

where p_01 and p_11 are first-order polynomial coefficients. Next, the second-order polynomial was defined as [26]:

f_P2(x) = p_02 + p_12·x + p_22·x², (12)

where p_02, p_12, and p_22 are second-order polynomial coefficients. Third-order polynomials were also used in the analysis, defined as [26]:

f_P3(x) = p_03 + p_13·x + p_23·x² + p_33·x³, (13)

where p_03, p_13, p_23, and p_33 are third-order polynomial coefficients. All coefficients from (11), (12), and (13) can be calculated by the least squares method [26]. In the next two equations, (14) and (15), the so-called first and second power model fitting functions were used, which can be described as [26]:

f_pw1(x) = α_11·x^β_11, (14)

where α_11 and β_11 are coefficients of the so-called first-order power model, and [26]:

f_pw2(x) = α_21·x^β_21 + δ_21, (15)

where α_21, β_21, and δ_21 are coefficients of the so-called second-order power model. Furthermore, Equations (16)–(20) use rational fitting functions, defined as follows. The rational fitting function 1/1 can be expressed as [26]:

f_1/1(x) = (z_01 + z_11·x) / (q_01 + x), (16)

where z_01, z_11, and q_01 are coefficients of the rational 1/1 fitting function. The rational fitting function 2/1 can be expressed as [26]:

f_2/1(x) = (z_02 + z_12·x + z_22·x²) / (q_02 + x), (17)

where z_02, z_12, z_22, and q_02 are coefficients of the rational 2/1 fitting function. The rational fitting function 3/1 can be expressed as [26]:

f_3/1(x) = (z_03 + z_13·x + z_23·x² + z_33·x³) / (q_03 + x), (18)

where z_03, z_13, z_23, z_33, and q_03 are coefficients of the rational 3/1 fitting function. Equation (19) describes the 3/2 rational fitting function [26]:

f_3/2(x) = (z_05 + z_15·x + z_25·x² + z_35·x³) / (q_05 + q_15·x + x²), (19)

where z_05, z_15, z_25, z_35, q_05, and q_15 are coefficients of the rational 3/2 fitting function.
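The polynomial fits (11)–(13) reduce to ordinary linear least squares and can be computed directly. A minimal sketch with numpy on synthetic data (not the ship data set):

```python
import numpy as np

# Synthetic data lying exactly on a second-order polynomial
x = np.linspace(0.0, 4.0, 50)
y = 1.0 + 2.0 * x + 0.5 * x ** 2

# np.polyfit solves the least-squares problem; highest degree first
p22, p12, p02 = np.polyfit(x, y, deg=2)
```

The power and rational models, being nonlinear in their coefficients, require iterative solvers instead, which is also why the fit computation can fail to converge (as reported for some rational and Gaussian fits later in the paper).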
Finally, the rational 5/3 fitting function can be expressed as [26]:

f_5/3(x) = (z_08 + z_18·x + z_28·x² + z_38·x³ + z_48·x⁴ + z_58·x⁵) / (q_08 + q_18·x + q_28·x² + x³), (20)

where z_08, z_18, z_28, z_38, z_48, z_58, q_08, q_18, and q_28 are coefficients of the rational 5/3 fitting function.
In the end, the data were fitted using the smoothing spline s, as in (21), for the specified smoothing parameter p and specified weights w_i [26]. The smoothing spline minimizes the expression:

p · Σ_i w_i · (y_i − s(x_i))² + (1 − p) · ∫ (d²s/dx²)² dx. (21)

If the weights are not specified, they are assumed to be 1 for all data points. The parameter p is defined between 0 and 1. For p = 0, a least-squares straight line fit to the data is produced; for p = 1, a cubic spline interpolant is obtained.
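The trade-off in (21) can be sketched with SciPy's smoothing splines. Note that `UnivariateSpline` is parameterized by a residual bound `s` rather than Matlab's p in [0, 1]: `s = 0` interpolates every point (like p = 1), while larger `s` gives a smoother curve (approaching the p → 0 behavior). The data below are synthetic.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 60)
y = np.sin(x) + rng.normal(0.0, 0.1, x.size)

interp = UnivariateSpline(x, y, s=0.0)   # interpolates every point (like p = 1)
smooth = UnivariateSpline(x, y, s=2.0)   # permits residual error, smoother curve
```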
The sheer multitude of fitting functions to be tested against the data requires them to be quantified, and quality measures to be introduced. The following quality measures have been used. The first is the root mean square error (RMSE), a measure of the differences between the values predicted by the fitting function and the observed values, which can be described as follows [27,28]:

RMSE = √( (1/N) · Σ_{i=1}^{N} (X_i − Y_i)² ), (22)

where X is the observed vector, Y the predicted vector, and N the number of data points measured in the observed vector space.
The following three quality measures, (23), (24), and (25), are statistical measures that describe the amount of variation in the dependent variable. The first is the sum of squared estimate errors (SSE), which can be expressed as [27,28]:

SSE = Σ_{i=1}^{N} (y_i − ŷ_i)², (23)

where y_i is the i-th value of the variable to be predicted, and ŷ_i the predicted value of y_i. SSE measures the discrepancy between the data and the estimation model. The second is the total sum of squares (SST), which can be defined as [27–29]:

SST = Σ_{i=1}^{N} (y_i − ȳ)², (24)

where y_i is the i-th value of the variable to be predicted, and ȳ the mean value. SST equals the sum of the squared differences between the observations and their overall mean value. The third quality measure used here was the sum of squares due to regression [27–29]:

SSR = Σ_{i=1}^{N} (ŷ_i − ȳ)². (25)

The following expression is known to hold [27–29]:

SST = SSR + SSE. (26)

Furthermore, the R-square measure can be derived from (23)–(26) and defined as:

R² = 1 − SSE/SST = SSR/SST. (27)

It is known in the literature [20–22] as the coefficient of determination, describing the proportion of dependent-variable variance predictable from the independent variable. In addition, there is an adjusted R-square quality measure, which corrects the possible bias in the R-square measure with respect to the number of samples, described as [27–29]:

R²_adj = 1 − (1 − R²) · (N − 1)/(N − k − 1), (28)

where N is the number of data samples, and k is the number of independent (explanatory) variables. In the next paragraph of this section, the random variable theory used throughout the paper is presented. Consider the random variables X_su, X_a, X_w, X_sp in vector notation:

X_su = [x_su1, …, x_suN]ᵀ, X_a = [x_a1, …, x_aN]ᵀ, X_w = [x_w1, …, x_wN]ᵀ, X_sp = [x_sp1, …, x_spN]ᵀ, (29)

where X_su, X_a, X_w, X_sp represent random variables with N data samples measured in the summer (su), autumn (a), winter (w), and spring (sp) over four years.
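The quality measures (22)–(28) follow directly from their definitions. A minimal numpy sketch with placeholder observed and fitted values (note that the identity (26) holds exactly only for least-squares fits with an intercept; for arbitrary fitted values SST ≈ SSE + SSR need not hold):

```python
import numpy as np

# Placeholder observed values y and fitted values y_hat (not the ship data)
y = np.array([3.0, 4.5, 5.0, 7.5, 8.0])
y_hat = np.array([3.2, 4.4, 5.3, 7.1, 8.2])
N, k = y.size, 1                       # k: number of explanatory variables

sse = np.sum((y - y_hat) ** 2)         # (23) sum of squared estimate errors
sst = np.sum((y - y.mean()) ** 2)      # (24) total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # (25) sum of squares due to regression
rmse = np.sqrt(sse / N)                # (22) root mean square error
r2 = 1.0 - sse / sst                   # (27) coefficient of determination
r2_adj = 1.0 - (1.0 - r2) * (N - 1) / (N - k - 1)  # (28) adjusted R-square
```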
Standard statistical metrics, such as the expectation or average value, standard deviation, standard error, and correlation coefficient, were used to study the random variables. The average value, or expectation, of the random variable x can be represented by [27,29]:

E[x] = (1/N) · Σ_{i=1}^{N} x_i, (30)

where E represents the expectation of the random variable x, and N is the number of samples measured. The standard deviation of the random variable x can be represented by [27–29]:

σ_x = √( (1/N) · Σ_{i=1}^{N} (x_i − E[x])² ), (31)

where σ_x represents the standard deviation of the random variable x. The transformation of coordinates from one coordinate system to another, (32), yields the transformed coordinates x*_a and x*_w of the random variables [27–29]. The statistical metric used to quantify the similarity and dependence between the variables x_a and x_w is the correlation coefficient, which can be calculated as [27–29]:

r = ( Σ_{i=1}^{N} (x_ai − E[x_a]) · (x_wi − E[x_w]) ) / (N · σ_xa · σ_xw), (33)

where r represents the correlation coefficient, N is the number of measurements, and σ_xa and σ_xw represent the standard deviations of the random variables x_a and x_w. Furthermore, a model matrix A can be created as:

A = [X_su X_a X_w X_sp], (34)

where X_su, X_a, X_w, X_sp are the vectors of random independent variables, so that the rows of A are (x_su1, x_a1, x_w1, x_sp1) through (x_suN, x_aN, x_wN, x_spN) (35). Together with Equations (29)–(33), A is transformed into the correlation matrix C_x (36). In the example from Equation (29), the correlation matrix C_x has dimensions (4 × 4) and shows the correlations between the variables.
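A correlation matrix such as C_x can be computed directly from the model matrix. The sketch below uses synthetic seasonal vectors sharing an underlying trend; the sizes and values are placeholders, not the ship's data.

```python
import numpy as np

rng = np.random.default_rng(2)
base = np.cumsum(rng.normal(0.0, 1.0, 100))   # shared underlying trend
x_su = base + rng.normal(0.0, 0.3, 100)       # summer (synthetic)
x_a  = base + rng.normal(0.0, 0.3, 100)       # autumn
x_w  = base + rng.normal(0.0, 0.3, 100)       # winter
x_sp = base + rng.normal(0.0, 0.3, 100)       # spring

A = np.column_stack([x_su, x_a, x_w, x_sp])   # model matrix, cf. (34)
C_x = np.corrcoef(A, rowvar=False)            # 4 x 4 correlation matrix, cf. (36)
```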
To conclude, this section covers the mathematical background of the fitting functions, from (1) to (21) that will be used in the prediction. Equations from (22) to (28) describe the quality measures, which will be used to grade the fitting functions and determine which fitting function is best suited to predicting the average fuel consumption based on the non-uniform data time set. Finally, the theory of random variables is introduced in (29) to (36) that will be useful for identifying the optimal fitting function for prediction.

Methodology, Setup, and Preprocessing
This section will cover the data smoothing methodology, otherwise known as data preprocessing. Owing to the non-uniform set of data and the need to identify the best-suited fitting function, choosing the "right" window to perform the moving average operation or filtering is of paramount importance. There are a number of moving average algorithms available that can be used for smoothing, such as a simple moving average (SMA), a weighted moving average (WMA), an exponential moving average (EMA), and a weighted exponential moving average method (WEMA). All the mentioned moving average algorithms have their advantages and disadvantages, but they all require choosing optimal filter window size, i.e., the number of past and future points that will determine the current point. The preprocessing procedure can be explained in steps, as shown in Figure 1.
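Two of the moving-average variants named above can be sketched as follows; the window length L is precisely the quantity the preprocessing step has to choose. The data here are synthetic stand-ins for the daily consumption signal.

```python
import numpy as np

def sma(x, L):
    # simple moving average over a window of L samples
    return np.convolve(x, np.ones(L) / L, mode="same")

def ema(x, alpha):
    # exponential moving average in its recursive form
    out = np.empty(len(x))
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = alpha * x[i] + (1.0 - alpha) * out[i - 1]
    return out

rng = np.random.default_rng(3)
t = np.linspace(0.0, 6.0, 300)
truth = np.sin(t)                          # synthetic underlying consumption
raw = truth + rng.normal(0.0, 0.3, 300)    # noisy "daily" signal
smoothed = sma(raw, L=21)
smoothed_ema = ema(raw, alpha=0.1)
```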
The first step in the signal processing is noise removal. Although the noise does not exist physically within the average daily fuel consumption data, it exists from the signal processing point of view. Namely, as there are sudden spikes in fuel consumption, comparing raw data would be somewhat misleading. Instead, the moving average or filtering operation is performed on raw data. The first step is to choose the size of the moving average filter (filter length). It has to be noted that ship data are not uniformly collected. Hence, there is no year, season, or month with an equal number of samples. Smoothing at the season level has been performed. The optimal filter length was identified by developing and performing an algorithm.
First, the R-square analysis of the raw data was performed. In the raw data, R-square is, e.g., 0.01636 for the Gauss 1 fitting function. As a low R-square measure implies a high noise level, the raw data should be preprocessed to eliminate the noise. Since we failed to obtain satisfactory results, we proceeded according to the flow diagram given in Figure 1. First, we determined the filter length necessary for the moving average operation, and then performed the moving average to denoise the raw data and obtain an enhanced input to the curve fitting operation.
The influence of the filter length can be seen in the correlation matrix. Correlation matrices were obtained for moving average filter lengths of 52 points over the four years (including trends), 12 points, and 24 points. The closer a correlation is to 1, the higher the similarity; the closer it is to 0, the lower the similarity. As different moving average filter lengths produce different correlation matrices, we implemented an algorithm for identifying the best solution (see below: Algorithm 1 for optimum moving average filter length identification).
Different results were obtained for different seasons (years). An optimum filter length was used for corresponding data in further analysis.
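Algorithm 1 itself is not reproduced in this excerpt. A plausible sketch of such a filter-length search, under our assumption that the score is the mean pairwise correlation between the smoothed series (the paper's actual criterion may differ), could look like this:

```python
import numpy as np

def sma(x, L):
    return np.convolve(x, np.ones(L) / L, mode="same")

def best_filter_length(series, candidates):
    """Return the window length maximizing the mean pairwise correlation
    between the smoothed series (a hypothetical criterion; the paper's
    Algorithm 1 is not reproduced in this excerpt)."""
    best_L, best_score = None, -np.inf
    for L in candidates:
        smoothed = np.column_stack([sma(s, L) for s in series])
        C = np.corrcoef(smoothed, rowvar=False)
        score = C[np.triu_indices_from(C, k=1)].mean()
        if score > best_score:
            best_L, best_score = L, score
    return best_L

rng = np.random.default_rng(4)
t = np.linspace(0.0, 6.0, 200)
s1 = np.sin(t) + rng.normal(0.0, 0.3, 200)   # two seasons sharing one trend
s2 = np.sin(t) + rng.normal(0.0, 0.3, 200)
L_opt = best_filter_length([s1, s2], candidates=[1, 5, 21])
```

With a shared trend and independent noise, longer windows raise the cross-correlation, so the search favors the largest candidate here.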
Data are presented by years (in the first part of the research) and by seasons (in the second part of the research). Half of the data were used to generate the equation and the other half to test the prediction hypothesis.
All the calculations were performed with the Curve Fitting Tool in Matlab [26]. The variables of interest were input as y-data.

Results
The data obtained consist of 829 time samples that are not uniformly sampled. They were obtained from a single ship (5 October 2008–28 August 2012) and refer to fuel consumption over 24 h. We used the first 434 samples (two years) to obtain the fit function, while the remaining 395 samples (two years) were used to identify the fitting function best suited to predicting future fuel consumption. A similar analysis was performed by seasons. The analysis was divided into two parts: analysis by years, known as the vertical analysis, and horizontal analysis, where each year was divided into four seasons, namely summer, autumn, winter, and spring.
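The split described above can be sketched as follows; the timestamps and consumption values here are synthetic placeholders for the ship's non-uniformly sampled record.

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.sort(rng.uniform(0.0, 1421.0, 829))           # non-uniform sample days
fuel = 30.0 + 0.002 * t + rng.normal(0.0, 2.0, 829)  # synthetic 24 h averages

t_fit,  fuel_fit  = t[:434], fuel[:434]   # first two years: model development
t_test, fuel_test = t[434:], fuel[434:]   # last two years: prediction test
```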

Analysis by Year
From the correlation matrix for the first two years, (40), it can be seen that the correlation between the first and the second year was 96.61%. For four years, the smoothed data had the correlation given in (41). As the lowest correlations were between the second and the fourth year, the third and the fourth, and within the fourth year itself, there was evidently a fundamental occurrence in the 4th year, as can be seen from (41). The results obtained suggest that the denoised data are suitable for further research. Table A1 indicates the coefficients obtained. In the case of the fitting function called Rational 1/1 (16), the fit computation did not converge; Matlab stopped fitting because the number of iterations or function evaluations exceeded the specified maximum. The results for the other fitting functions are presented in Table A2. Table A2 shows the results for years 1 and 2, where the fit was examined by SSE, R-square, RMSE, and the adjusted R-square. The table is sorted by R-square (best at the top). Negative results denoted with "*" imply the invalidity of the model. It can be seen that the best fit (other than the smoothing spline curves) was obtained by Fourier 8. Table A3 shows the results of the prediction for the various fitting functions, sorted by R-square in the prediction interval (best at the top). The measure of quality is the R-square coefficient. The best result after the smoothing splines was obtained by Gaussian 7.

Analysis by Seasons
The optimal filter length for the available data set in the summer is 21. Using the result from the algorithm, the correlation matrices for summers 2 and 4 were obtained. The coefficients obtained for the considered fitting functions are given in Table A4 (see Appendix A). As the fit computation for the Gaussian 7 function did not converge, the fitting was discontinued because the number of iterations or function evaluations exceeded the specified maximum.
The obtained measures of similarity with the fitting functions are given in Table A5, sorted by R-square (best in the top row). In spite of the calculation problems, Gaussian 7 yielded the best fit under the R-square criterion. Table A6 shows the R-square results for the domain used to extrapolate the fitting function parameters, the predicted curve, and the total range, sorted by the predicted R-square domain. The Gaussian 7 fitting function was observed to be the best fit in the prediction interval.

Figures 4 and 5 show examples of the results for the summer.
After the summers, the analysis by season turned to the autumns. The algorithm for the optimal length of the moving average filter gave a length of 42 in the case of autumn. The correlation matrix for the first two autumns contains the values 0.9734, 0.9828, 0.9828, and 1.0000; the correlation matrix for all four autumns was computed in the same way. The obtained coefficients are given in Table A7 (see Appendix A). There was a problem with the Gaussian 7 function, where the fit computation did not converge and Matlab stopped fitting because the number of iterations or function evaluations exceeded the specified maximum. Table A8 shows the results for the level of fit, with data sorted by the R-square parameter. The best results (after the smoothing splines) were obtained by Gaussian 8 and Fourier 7. Table A9 shows R-square for autumns 1 and 2, autumns 3 and 4, and autumns 1–4. Data are sorted by prediction interval, with the Gaussian 7 fitting function giving the best prediction (after the smoothing spline functions).

Figures 6 and 7 show examples of the results, with real data illustrated with dots and fitted curves with full lines. In this case, Gaussian 7 is chosen for Figure 6 and the Rational 3/2 function for Figure 7. Figures 6b,c and 7b,c show discontinuities in the middle of the graph.
The obtained coefficients (for fitting functions) for winters 1 and 2 are given in Table A10 (see Appendix A). For the Gaussian 7 fitting function, the fit computation did not converge and the fitting was halted. The results in the table are the last obtained in this case. Table A11, sorted by the R-square parameter, shows quality measures for winters 1 and 2. The best fit was obtained with Gaussian 7 (if the smoothing splines were excluded). Table A12, sorted by the predicted domain, shows R-squares for winters 1-2, 3-4, and 1-4. Gaussian 7 is the best for the prediction (when smoothing splines are excluded). Figure 8 shows an example of the results in the case of the Gaussian 7 fitting function.
In the case of the two springs, the optimal length of the moving average filter was 47.
The results for two springs are given. The obtained coefficients for the fitting functions are presented in Table A13 (see Appendix A).
The results for the quality measures in the case of springs 1 and 2 are shown in Table A14, sorted by the best R-square. Better results are obtained for the rougher smoothing splines (p closer to 1), which is not favorable; smoother splines are preferable, but their results are not as good. Apart from the smoothing splines, the best result was obtained with Gaussian 7. Table A15, sorted by the best fits in the prediction interval, shows the results for springs 1–2, 3–4, and 1–4. Fourier 8 proved the best for prediction purposes in the case of the springs. Figures 9 and 10 show examples of the results for the springs. Fourier 8 clearly exhibits a discontinuity around the middle of the dataset.


Discussion and Conclusions
The paper presents a black-box approach to ship fuel consumption based on signal (daily fuel consumption) analysis. Real data were analyzed. The fitting function should enable easy integration into possible future tools/software, being simple and requiring low processor power. This fitting function should reflect the coupling between the data and their sources. Due to nonlinearities, the various data were transformed to the same coordinates. This is important because monthly, seasonal, or yearly data do not have the same length. The explored approach should include aspects of de-trending, coupling, nonlinearities, unreliable data, and different lengths of data within the same time period. Finally, the fitting function would enable the prediction of fuel consumption and allow it to be differentiated from the actual consumption. If an anomaly is detected, the company could investigate whether the cause is the weather, fouling, or even fraud. During the research, it was assumed that fuel consumption would grow linearly over time. Such a dependence would make it possible to use some linear regression methods (e.g., [20,29]). However, the results did not show that consumption had actually decreased, which could be associated with the following factors:
• non-uniform time sampling (leading to a wrong curve angle between the interpolating points), and
• averaging (which depends on the route the ship was sailing at the time of data acquisition, and the sailing hours on a specific day).
It was obvious from the results of the R-square and adjusted R-square measures that the data were not well conditioned for correct analysis. Hence, different approaches were used. First, the data were preprocessed with filters, which resulted in better R-square and adjusted R-square metrics. The next step was to use a bank of fitting functions to find the best match. Higher-order functions were observed to result in a better fit. The best functions for the model were the smoothing splines, but in the so-called rougher (not smoother) versions (parameter p closer to 1): rougher splines fit the sampling points better, although they are not suitable for real-world applications. Instead, the Gaussian, Fourier, or rational functions were chosen for the prediction model. In this case, the best fit of the predicted data (samples for the 3rd and 4th year/season) was obtained by Gaussian, Fourier, and rational functions that generally tend toward a first-order polynomial. Higher-order functions tend to oscillate more tightly around the sampled data. That reduces the error, and it could lead to the conclusion that such functions (i.e., Gaussian 7) could be used on any ship by simply adjusting the function parameters, which is the advantage of our signal-based approach. The result was obtained easily and quickly, which is an advantage over methods that attempt to estimate real fuel consumption by examining engine characteristics and parameters.
In the analysis, we explored comparisons between years, and comparisons between seasons. The worst correlations were obtained between the second and the fourth year, the third and the fourth, and within the fourth year itself, suggesting a relevant event of some sort in the fourth year. Likewise, the correlation between summers, autumns, and winters was above 80%. On the other hand, springs were less correlated, presumably due to the state of the ship's hull after wintertime, and the great oscillations in weather conditions that typically occur in springtime.
Finally, there were two contributions of this paper. The main contribution, as far as the authors are aware, was that this was the first time that seasons (so called horizontal analysis) were considered for prediction purposes. A minor contribution of the paper was the algorithm for identifying the optimum moving average filter length.
The prediction formula takes into account environmental and biological influences, as well as cargo mass, and all other possible fuel consumption factors. We strongly believe that it could be used to detect potential frauds (fuel theft), which may be of interest to various authorities. Furthermore, the proposed prediction model is simple and fast to use, and can be used to check deviations in fuel consumption by comparing the predicted and deviated consumption. A deviation could be caused by fraud or by environmental factors such as fouling.
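The deviation check described above can be sketched as a simple residual threshold; the 3-sigma rule and all values below are our illustrative choices, not the paper's.

```python
import numpy as np

def flag_anomalies(measured, predicted, n_sigma=3.0):
    # Flag samples whose residual departs from the mean residual by more
    # than n_sigma residual standard deviations.
    residual = measured - predicted
    limit = n_sigma * np.std(residual)
    return np.abs(residual - residual.mean()) > limit

rng = np.random.default_rng(6)
predicted = np.full(100, 30.0)                    # model's predicted consumption
measured = predicted + rng.normal(0.0, 1.0, 100)  # ordinary days
measured[50] += 20.0                              # one suspicious day
flags = flag_anomalies(measured, predicted)
```

In practice the flagged days would then be cross-checked against weather logs, hull-cleaning records, and bunkering receipts before suspecting fraud.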
There are additional differences between this paper and the references. For example, the results in [19] were obtained by excluding suspicious data in the preprocessing stage; we used all available data to obtain our results. Most of the references use a deep learning approach with ANNs [19–23]. However, ANNs, while popular, are not always necessary. Hence, further research could include the identification of factors that cause unplanned fuel consumption, as well as modeling fuel consumption with artificial neural networks (ANNs) and hybrid models of velocity and fuel consumption.

Conflicts of Interest:
The authors declare no conflict of interest.