Prediction of a Ship’s Operational Parameters Using Artificial Intelligence Techniques

Abstract: The maritime industry is one of the most competitive industries today. However, the profit margins of shipping companies tend to shrink as operational costs rise, and this trend does not seem likely to change in the near future. The most important reason for the rise in operating costs is the increase in fuel prices. To compensate, shipping companies can either renew their fleet or use new technologies to optimize the performance of their existing one. The software landscape in the maritime industry has changed and now leans towards the use of Artificial Intelligence (AI) and, more specifically, Machine Learning (ML) for calculating operational scenarios as a way to compensate for the reduction in profit. While AI is a technology for creating intelligent systems that can simulate human intelligence, ML is a subfield of AI that enables machines to learn from past data without being explicitly programmed. ML has been used in other industries to increase both availability and profitability, and it seems that there is also great potential for the maritime industry. In this paper the authors compare the performance of multiple regression algorithms, namely Artificial Neural Network (ANN), Tree Regressor (TR), Random Forest Regressor (RFR), K-Nearest Neighbor (kNN), Linear Regression, and AdaBoost, in predicting the output power of the Main Engines (M/E) of an ocean-going vessel. These regression algorithms were selected because they are commonly used and are well supported by the main software developers in the area of ML. For this purpose, measured values collected from the onboard Automated Data Logging & Monitoring (ADLM) system of the vessel over a period of six months have been used. The study shows that ML, with proper processing of the measured parameters based on fundamental knowledge of naval architecture, can achieve remarkable prediction results. With the proposed method there was a large reduction in both the computational power needed for the calculations and the maximum absolute error of the prediction.


Introduction
The prediction of a ship's crucial operational parameters, such as M/E power, can substantially improve the sustainability of its operation. Required M/E power is directly related to Fuel Oil Consumption (FOC). Predicting the increase in FOC due to the degradation of a ship's hull can help shipowners choose the right time and place for dry docking and thus maximize profitability. In addition, comparing a ship's performance values at any given time with the values achieved in the sea trials makes it possible to "decode" the information in this comparison and accurately predict the performance reduction of a specific component. This constant comparison leads to better control of the service life of a ship.
During the last decades, the maritime industry (shipyards, maritime companies, engineers/researchers) has embarked on a process of continuous research and development of new technologies. This process is even more important for the forthcoming years, in order to comply with progressively stricter environmental regulations and, at the same time, keep operational costs as low as possible so that shipping remains one of the most cost-effective means of transportation. To make this possible, the whole process of ship design and operation is based on a balanced mixture of computational and experimental investigations, which are highly demanding in terms of time and capital cost. After a long and heavy computational period, which relies on theoretical/deterministic models as much as on empirical methods, prototypes have to be constructed and extensive tests have to be carried out. One promising technique that could help reduce the effort required for ship design and cost-effective ship operation is ML, which has already been used successfully in many land-based industrial applications. Building prediction algorithms based on ML can be the answer for improving ship performance in terms of fuel cost and exhaust emissions (Nitrogen Oxides, Sulfur Oxides, etc.). The current study belongs to this field and focuses on predicting the ship's required M/E power as a function of the operational conditions, a prediction that could become a valuable tool for optimizing a ship's operational profile.
In shipping and maritime transport in general, the penetration of AI is still in its infancy, but this seems likely to change. In other transportation sectors, the Internet of Things (IoT) and big data are used to augment a system's intelligence [1], while in sectors like manufacturing and Information and Communication Technology (ICT) the penetration of artificial intelligence is already at a more advanced stage [2,3]. In the international literature, studies on the use of ML in shipping appear more and more often. In 2005 a hybrid model for ship operational performance monitoring was developed, and the uncertainty framework specific to a ship's performance was analyzed [4]. The application of a gray-box modelling approach to the prediction of ship fuel consumption, which can be used as a tool for online trim optimization, was proposed in 2006 [5]. A fuel consumption model that utilizes data obtained from the noon reports transmitted daily back to shore was developed in 2018 [6]. An ML approach for predicting FOC on ships, which trains an ML algorithm with dynamic data from the ships' logging system and uses long time intervals for fuel sums, has been presented [7]. In more recent studies, the efficiency of several multiple regression algorithms in ship FOC prediction under different data sampling frequencies was studied [8]. An estimation model of FOC based on a deep-learning ANN was developed [9]. Finally, FOC optimization studies of a container vessel using regression algorithms, with data selected by correlation analysis and a comparison of the results, have also been carried out [10,11].
The aim of this study is to determine a simple procedure for pre-processing the vast amount of data collected from modern ships' ADLM systems. Although the use of ML in the prediction of operational parameters has been addressed before, this study focuses on reducing the computational work needed for this task. The proposed procedure does not depend on the type of ship and uses a small fraction of the original data set, thus limiting the computational demand. The results are verified for actual use. To this end, we employ different regression algorithms to predict a ship's required main engine power, using data from an ADLM system both in their original form and after a specific preprocessing procedure. The performance of the algorithms is compared in terms of accuracy and required processing power.
The rest of the paper is organized as follows: in Section 2, we present the proposed method of approach. We explain the logical steps that form the data set and the basic working principle of the different regression algorithms that are going to be used. The application of the proposed method in our data set, the benchmark tools for evaluation, and the actual comparison of the results are presented in Section 3. Finally, Section 4 is dedicated to the conclusion of this paper.

Methodology
Two different data sets are used for the procedure of predicting a ship's required main engine power. A raw data set is constructed, with the use of all values gathered in a period of 6 months, and a preprocessed one that is produced from the above with a certain proposed methodology. The results are then benchmarked and compared. The flow diagram of the prediction method is shown in Figure 1.

The significance of predicting a ship's behavior is obvious. Until recently, only white-box model approaches (which rely on physics) have been adopted for this purpose in the maritime industry. This approach has the disadvantage of relying on experience and demanding heavy computational work. These two factors make its use an expert-only privilege. In this paper we propose the use of ML in addition to common and well-known practices in order to achieve comparable results with less computational work and with no need for previous experience. ML is a rapidly growing science with great penetration in human activity. Establishing a framework is essential for the quick adoption of ML in the maritime industry.
The biggest challenge in ML is the pre-processing of the data set that will be used. Many different approaches have been used for this purpose. Purely computational methods like Principal Component Analysis (PCA) have been adopted to form and reduce the vast amount of data that make up data sets. For an industry that traditionally relies on more conventional approaches, we propose a hybrid model of data pre-processing.
The data set is crucial in ML, since choosing the values that will be fed to the ML algorithms is of utmost importance in any ML procedure. The main objective is to achieve acceptable prediction scores with as few values as possible. At the same time, choosing the number of different attributes is equally important so that, with the right regression algorithm and tuning, there is no need for heavy computational power. In most ML applications, the attributes that will form the data set are chosen by the value of the correlation factor [12]. Different wrapper or filter methods are used for the selection of attributes, which increases the computational work. In this paper the selection is instead based on the physical principles that affect a ship's movement at sea:
The first step of the data formation procedure is sampling. The data we used were synchronized to refer to specific time periods (1-min intervals), which is typical for currently deployed ADLM systems. They were then identified and assigned the appropriate unit of measurement.
Attribute selection. From the original attributes (178 in our test case), we selected those that can influence the object of the study (e.g., the speed of the ship). The attributes that were judged not to affect the object of the study at all (e.g., electric oil pressure) were removed, in order to reduce the total volume of data and to properly configure the data set. Of the 178 attributes, 42 were selected.
Feature engineering/data transformation. In the context of data reduction and based on the physical nature of the phenomenon under consideration, it is legitimate to transform attributes in order to either reduce the data set or replace some characteristics with others that better describe the effect on the phenomenon. Based on this, the attributes Wind Direction and Wind Speed were replaced by the Head Wind Speed (HWS). Head Wind Speed (the component of the wind speed that opposes the ship's movement) is the value that determines the resistance of a ship due to the wind according to the literature [13][14][15]. The replacement of values was done using Formula (1), which is presented schematically in Figure 2. Also, the attributes that correspond to a ship's relative movement due to weather conditions, namely Inclinometer_X_max (the maximum trim value of the ship), Inclinometer_X_min (the minimum trim value of the ship), Inclinometer_Y_max (the maximum list of the ship), and Inclinometer_Y_min (the minimum list of the ship), were replaced with the average values, Inclinometer_X_average and Inclinometer_Y_average, respectively.
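Formula (1) itself is not reproduced in this text, so the following is only a minimal sketch of the idea of a head wind component. It assumes a plain cosine projection and assumes that the wind direction is given relative to the bow (0 degrees for wind from dead ahead, 180 degrees for wind from dead astern). The function and parameter names are hypothetical.

```python
import math

def head_wind_speed(wind_speed, relative_wind_dir_deg):
    """Wind component that opposes the ship's motion (assumed cosine projection).

    Assumption: relative_wind_dir_deg is the direction the wind comes from,
    measured from the bow (0 = dead ahead, 180 = dead astern).
    A negative result means the wind pushes the ship from behind.
    """
    return wind_speed * math.cos(math.radians(relative_wind_dir_deg))

# Hypothetical values: a 20-unit wind seen from four relative directions.
for angle in (0, 60, 90, 180):
    print(angle, round(head_wind_speed(20.0, angle), 2))
```

Under this assumption two logged attributes collapse into a single signed value, which is the kind of reduction the transformation step aims for.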
Instance selection. There are many methods in the literature for dealing with missing values (NaN) in a data set [16,17]: deleting the entire line, using a fixed value for all missing values, replacing the missing value with the average value of the column, replacing the missing value with the average value of the class, and predicting the missing value (with the 1-NN algorithm) are some of the most commonly used. In this paper the first method was selected. All the lines (instances) that contained even one missing value (NaN) were deleted. In this way, from the 236,161 lines initially available in our sample, a subset of 39,475 was actually used.
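The deletion of incomplete lines can be illustrated with a short sketch. It assumes that each instance is held as a dictionary of attribute values and that a missing value appears either as `None` or as a floating-point NaN. The attribute names and numbers are hypothetical.

```python
import math

def is_missing(value):
    """True for None or a floating-point NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def drop_incomplete_rows(rows):
    """Keep only the rows in which every attribute has a value."""
    return [row for row in rows if not any(is_missing(v) for v in row.values())]

# Hypothetical instances; two of the four contain a missing value.
rows = [
    {"speed_kn": 12.1, "me_rpm": 61.0, "me_power_kw": 7400.0},
    {"speed_kn": float("nan"), "me_rpm": 60.5, "me_power_kw": 7350.0},
    {"speed_kn": 11.8, "me_rpm": None, "me_power_kw": 7100.0},
    {"speed_kn": 12.4, "me_rpm": 62.0, "me_power_kw": 7600.0},
]
complete = drop_incomplete_rows(rows)
print(len(rows), "->", len(complete))
```

Deleting whole lines is the simplest of the listed options and needs no model of the missing data, at the price of discarding a large share of the instances.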
We also made one more improvement to the initial data set. There is a power limit given by the manufacturer (Hyundai in this case), below which a Diesel engine must not operate for a long time [18]. This limit is in most cases approximately 15-20% of the maximum continuous rating (L1) for electronically controlled engines, or 20-25% for mechanically controlled (camshaft-controlled) ones. Assuming that the crew is aware of these limitations, the operation of a ship's M/E in these conditions can be considered a transient phenomenon. Based on the results of the ship's sea trials, about 57 revolutions per minute are achieved at 25% of the engine power. The condition of the ship (in terms of hull degradation) could not be identified, so 25% of M/E power at the time of the study may correspond to a different performance. With that in mind, all the lines of the data set in which the M/E revolutions per minute (RPM) were less than 50 were excluded, and a data set with 29,067 lines was obtained.
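The low-load exclusion amounts to a single threshold filter. The sketch below uses the 50 RPM cut-off stated above; the attribute names and the sample values are hypothetical.

```python
RPM_THRESHOLD = 50.0  # cut-off stated in the text

def drop_low_rpm_rows(rows, rpm_key="me_rpm", threshold=RPM_THRESHOLD):
    """Keep only rows whose M/E RPM is at or above the threshold."""
    return [row for row in rows if row[rpm_key] >= threshold]

# Hypothetical instances: the first one lies below the cut-off.
rows = [
    {"me_rpm": 48.5, "me_power_kw": 2100.0},
    {"me_rpm": 50.0, "me_power_kw": 2300.0},
    {"me_rpm": 61.2, "me_power_kw": 7400.0},
]
steady = drop_low_rpm_rows(rows)
print([row["me_rpm"] for row in steady])
```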
Data cleansing/"noise" removal. As there is always the possibility of wrong values being captured for several reasons (e.g., temporary malfunction or transmission error in the sensing network), we also processed the available data set to remove such values. To remove the values that refer to transient states (e.g., acceleration), the data were renumbered (maintaining their chronological order) and divided into equal-frequency intervals to alleviate the computational burden. These intervals contain the same number of values, which in our case was 5000. Within these intervals, the average and dispersion of the values were identified with the K-means method [19], always in terms of the actual speed of the ship, a quantity that was considered fundamental to the transience of the phenomenon under consideration. Across all the above intervals, the speed values (mean value ± scatter) lay between a minimum and a maximum value, in our case 9.8384 (position A in Figure 3) and 15.5759 (position B in Figure 3). Lines with speed values outside this range were identified as noise and removed from the data set. The final data set consisted of 39 columns (attributes) and 27,155 rows (instances).
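One possible reading of this step is sketched below. It is not the K-means procedure of the paper: as a simplification, the mean and the population standard deviation of the speed are computed directly for each interval, and the lowest "mean minus scatter" and the highest "mean plus scatter" over all intervals are taken as the accepted speed range. The attribute name is hypothetical, and the interval size is reduced to 4 in the example so that the data stay readable.

```python
import statistics

def speed_bounds(speeds, chunk_size=5000):
    """Return (lowest lower bound, highest upper bound) over all intervals."""
    lows, highs = [], []
    for start in range(0, len(speeds), chunk_size):
        chunk = speeds[start:start + chunk_size]
        if len(chunk) < 2:
            continue  # scatter is not meaningful for a single value
        mean = statistics.mean(chunk)
        scatter = statistics.pstdev(chunk)
        lows.append(mean - scatter)
        highs.append(mean + scatter)
    if not lows:
        raise ValueError("not enough data to estimate speed bounds")
    return min(lows), max(highs)

def remove_speed_noise(rows, speed_key="speed_kn", chunk_size=5000):
    """Drop rows whose speed lies outside the bounds found by speed_bounds."""
    low, high = speed_bounds([row[speed_key] for row in rows], chunk_size)
    return [row for row in rows if low <= row[speed_key] <= high]

# Hypothetical, chronologically ordered speeds; 3.0 mimics a transient state.
speeds = [12.0, 12.2, 11.8, 12.0, 13.0, 13.2, 12.8, 3.0]
rows = [{"speed_kn": s} for s in speeds]
clean = remove_speed_noise(rows, chunk_size=4)
print(len(rows), "->", len(clean))
```

Working on fixed-size intervals keeps each statistic cheap to compute, which matches the stated goal of alleviating the computational burden.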
Normalization. The use of neural networks is particularly facilitated if the values of the input data range between 0 and 1. For this reason, a minimum-maximum normalization was performed, and a data set was constructed with the values of all fields in the range 0-1 using Formula (2):

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (2)$$

where $x$ is the original value of a field, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of that field.

After the completion of the pre-processing, the data were randomly divided into a training data set (consisting of 80% of the lines), which is used for algorithm training, and a validation data set (consisting of the remaining 20% of the lines) for testing the results. The power of the main engine was set as the target variable.
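The normalization and the random 80/20 division can be sketched in a few lines. The sketch assumes the standard minimum-maximum scaling to the range 0-1; the function names, the fixed random seed, and the sample values are assumptions made for illustration.

```python
import random

def min_max_normalize(column):
    """Scale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(x - lo) / (hi - lo) for x in column]

def train_validation_split(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows and split them into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical M/E power values in kW.
power_kw = [5000.0, 6500.0, 8000.0, 9500.0, 11000.0]
scaled = min_max_normalize(power_kw)
train, valid = train_validation_split(list(range(10)))
print(scaled, len(train), len(valid))
```

Changing the `seed` argument gives a different random selection of training lines, which is what a repeated evaluation needs.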
The following algorithms were used for prediction and are graphically presented in Figure 4:
(1) Linear Regression. The linear regression algorithm locates the linear relationship between the dependent variable we are looking for (Y) and one or more independent variables (X). Since this relationship is linear, we can easily determine how the dependent variable changes with the value of the independent variable [20]. The graphical representation of the linear regression model (Figure 4a) is a sloping straight line that represents the correlation between the independent and dependent variables. The red line represents the prediction that minimizes the overall error. All the predicted values lie on this line, and the difference along the Y axis between the line and the actual values is the error.
(2) Decision Tree. This is a tree-like classifier, a structure that is built from nodes. There are three different types of nodes: chance nodes, decision nodes, and end nodes. A chance node represents the probability of a certain result, a decision node represents the decision to be made, and an end node shows the result [21]. The tree acts like a map of all possible outcomes for the question asked. Decisions are made based on the characteristics of the data set. It is essentially a graph that aims to explore all possible solutions to a problem we have posed under specific circumstances. In the example presented in Figure 4b, a decision has to be made on whether a kid is allowed to go out to play. In the left node, the combination of cloudy weather with high temperature is a 'go' criterion. The same decision (to allow the kid to play outside) is reached under completely different weather conditions, sunshine with low wind, in the right node.
(3) kNN. The kNN algorithm stores the available data and categorizes them. It forms data categories based on their common characteristics. New data are classified based on their similarity to the basic characteristics of each category already stored [22]. This process is performed dynamically, which means that when new data appear they can easily be sorted into one of the available categories. In Figure 4c, the kNN algorithm has to classify the new spot as part of category A or category B.
(4) Random Forest. This is based on the concept of ensemble learning, in which multiple classifiers are combined to improve the performance of the model in order to solve a complex problem. It is named a forest because it essentially contains a number of decision trees which, using subsets of the original data, make decisions and predictions in parallel. Once the predictions have been made by the decision trees, it takes their average value to improve the predictive accuracy over the total initial data set [23]. In the example of Figure 4d, which illustrates the classification case, each of the decision trees A, B, ..., N gives a result for the given problem (red), and the result that appears most often is the end result of the Random Forest algorithm.
(5) Neural Network. ANNs are the mathematical counterpart of the biological neurons that make up the human brain. They consist of a number of simple, interconnected processing units, which are organized in layers. The fundamental structure consists of one input layer and one output layer, but there may also be hidden layers in between. These layers consist of a number of units or nodes that are connected to each other in such a way that one unit has links with many other units of the same or another layer [24].
(6) AdaBoost. The Adaptive Boosting (AdaBoost) algorithm is a boosting technique that aims to combine multiple weak classifiers in order to create a strong classifier. An individual (weak) classifier may not be able to accurately predict the class of an object, but when we group many weak classifiers together, each gradually learning from the wrong classifications of the previous ones, we can create a very powerful model.
The above algorithms used the data selected according to the aforementioned procedure, and the performance of the forecasts was measured. The same algorithms were also applied to the original data set (after a mild preprocessing procedure), and the results were compared. In both cases, all of the algorithm parameters stayed the same. The procedure was repeated 10 times with a different selection of data used for training.
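The repeated comparison can be illustrated with a small sketch. It is not the experimental setup of the paper: it uses synthetic data (power assumed to grow with the cube of speed plus noise, with a hypothetical coefficient), a single input attribute, and only two simple regressors written from scratch, a one-feature least-squares line and a kNN regressor that averages the targets of the nearest neighbours. The models keep the same settings in every run, and only the random 80/20 selection changes.

```python
import random
import statistics

def fit_linear(xs, ys):
    """Ordinary least squares for one feature; returns a predict function."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_knn(xs, ys, k=5):
    """kNN regression: average the targets of the k closest training points."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return statistics.mean(y for _, y in nearest)
    return predict

def max_abs_error(model, xs, ys):
    """Largest absolute deviation between prediction and actual value."""
    return max(abs(model(x) - y) for x, y in zip(xs, ys))

def make_data(n, rng):
    """Synthetic stand-in for (speed, power) pairs; not real ship data."""
    xs = [rng.uniform(10.0, 15.5) for _ in range(n)]
    ys = [2.5 * x ** 3 + rng.gauss(0.0, 50.0) for x in xs]
    return xs, ys

def repeated_evaluation(repeats=10, n=300):
    """Average the maximum absolute error of each model over several splits."""
    xs, ys = make_data(n, random.Random(42))
    scores = {"linear": [], "knn": []}
    for run in range(repeats):
        idx = list(range(n))
        random.Random(run).shuffle(idx)  # a different selection in each run
        cut = (n * 8) // 10              # 80% of the lines for training
        tr, va = idx[:cut], idx[cut:]
        x_tr, y_tr = [xs[i] for i in tr], [ys[i] for i in tr]
        x_va, y_va = [xs[i] for i in va], [ys[i] for i in va]
        for name, fit in (("linear", fit_linear), ("knn", fit_knn)):
            model = fit(x_tr, y_tr)      # same settings in every run
            scores[name].append(max_abs_error(model, x_va, y_va))
    return {name: statistics.mean(vals) for name, vals in scores.items()}

results = repeated_evaluation()
print(results)
```

Averaging the score over several random selections reduces the chance that a single favourable split decides the ranking of the algorithms.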

Our Case and the Dataset
To evaluate the proposed approach, we used a dataset acquired from the ship's ADLM system. The ship used as reference in this paper is a crude oil tanker with a displacement of 165,000 tons. The ship is relatively new, since it was commissioned ten years ago. Due to the owner shipping company's strict policy regarding data release, the values describing the ship and routes are slightly modified in this presentation (the accurate values were used for problem formulation). The ship's features are presented in Table 1.
The raw data set was constructed from all values extracted from the ship's ADLM system over a period of 6 months, from mid-February 2020 until the end of July 2020. A total of 178 different attributes were collected. The sampling frequency chosen for forming the data set was one complete set of values every minute, giving a total of 236,161 instances. The ship's ADLM system reads values from each sensor and stores them in a database at a much higher frequency than once per minute. The procedure of recording all of the sensor values introduces time deviations, but these values (in a ship) do not change rapidly enough for the deviations to introduce a significant error. The data set contained many missing values (NaN) due to several factors: sensor anomalies, data entry errors, periods in which the recorded accessory was shut down, etc. The fields of data recorded by the ship's ADLM system, which make up the data set attributes, belong to four main categories.
During this period of 6 months, the ship executed many different itineraries. The routes extend around the perimeter of Africa and through North Europe, the Mediterranean Sea, and the Indian and Pacific Oceans. The significant differences in the geographical areas of the routes resulted in widely varying environmental conditions. The different routes are presented graphically in Figure 5. The routes were executed at various cruising speeds, but the region around 12 knots dominates, as depicted in Figure 6, which presents the distribution and frequency of the different cruising speeds.
Each cruising speed was kept relatively steady throughout each route by the ship's crew, so the speed over ground values show low fluctuation (with the exception of the transitional periods) [25]. Figure 7 presents the actual values of ship speed over the total number of instances used in the hybrid model, in groups of 5000 instances per subfigure (due to a software limitation on the number of instances that can be handled).
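The one-per-minute sampling and the missing values described for the raw data set can be illustrated with a small sketch. The attribute names (`speed_kn`, `me_power_kw`), the sample readings, and the rule of keeping the first reading in each minute are assumptions made for illustration; the paper does not state how the per-minute value is selected.

```python
import math

def one_sample_per_minute(readings):
    """Keep the first reading of each minute.

    `readings` is a list of (timestamp_seconds, values_dict) pairs sorted by time.
    """
    kept = {}
    for ts, values in readings:
        minute = int(ts // 60)
        kept.setdefault(minute, values)   # later readings in the same minute are ignored
    return [kept[m] for m in sorted(kept)]

def count_missing(rows):
    """Count NaN entries per attribute."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            if isinstance(value, float) and math.isnan(value):
                counts[name] = counts.get(name, 0) + 1
    return counts

# Hypothetical high-frequency readings (timestamps in seconds).
readings = [
    (0, {"speed_kn": 12.1, "me_power_kw": 9500.0}),
    (20, {"speed_kn": 12.2, "me_power_kw": 9510.0}),
    (61, {"speed_kn": 12.0, "me_power_kw": float("nan")}),
    (125, {"speed_kn": float("nan"), "me_power_kw": float("nan")}),
]
rows = one_sample_per_minute(readings)
missing = count_missing(rows)
```

Counting missing values per attribute before training helps decide whether an attribute should be dropped or its gaps filled.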

Evaluation Methodology
The accuracy of a prediction is defined as the total number of correct predictions divided by the total number of predictions made for a data set. However, in a regression problem, which seeks to predict the specific value of an attribute, this definition is not sufficient. We have to define measures that quantify the deviation of the forecasts, through which we can compare prediction performance [26]. The most widely used indicators for conducting comparative performance benchmarks are the following:
MAE (mean absolute error) is the average of the absolute differences between predicted and actual values and gives a measure of how far forecasts are from reality. Mathematically it is defined by Equation (3).
MSE (mean squared error) is quite similar to the MAE, the only difference being that the MSE takes the average of the squared differences between the actual and the predicted values (Equation (4)).
RMSE is the square root of the MSE. It is essentially the same measure, but it is expressed in the units of the predicted variable and has prevailed in the literature (Equation (5)).
CVRMSE takes the RMSE one step further by normalizing it by the mean of the dependent variable (Equation (6)).
Finally, R² focuses not so much on the results but on the operation of the algorithm itself. It specifies the degree to which the variations in the values of the dependent variable (prediction target) can be explained by changes in the values of the independent variables (data set) (Equation (7)).
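The five indicators can be computed directly from paired actual and predicted values. The sketch below uses the standard textbook definitions, which are assumed to match Equations (3) to (7); the function name and the sample numbers are illustrative only.

```python
import math

def regression_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE, CVRMSE and R² from paired values."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / n          # mean of |y - y_hat|
    mse = sum(r * r for r in residuals) / n           # mean of (y - y_hat)^2
    rmse = math.sqrt(mse)                             # sqrt(MSE)
    mean_actual = sum(actual) / n
    cvrmse = rmse / mean_actual                       # RMSE / mean(y)
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                        # coefficient of determination
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "CVRMSE": cvrmse, "R2": r2}

# Hypothetical M/E power values (kW), for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(actual, predicted)
```

Because CVRMSE is divided by the mean of the actual values, it is dimensionless and allows comparison between targets of different magnitude; it is only meaningful when that mean is non-zero.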
Table 2 presents the benchmark scores, using the above-mentioned performance indicators, for all the prediction algorithms examined in both cases: using the complete original data set or the modified one (hybrid model). In the case of RFR, Decision Tree, and AdaBoost, using the complete data set gave better results; for these algorithms, the use of the proposed hybrid model for pre-processing the ship's data increased the error in all of the benchmark scores. On the other hand, in the cases of kNN, Linear Regression, and ANN, the results are clearly better with the use of pre-processed selected data (hybrid model). Therefore, it is not possible to determine clearly from these scores which methodology is preferable.
The second comparison criterion was based on real life applications, where a deviation of 3% in the predicted M/E power output is considered acceptable. Therefore, a threshold of 3% error was applied to the predicted values, and the percentage of predicted values whose difference from the actual ones is less than 3% was recorded. The prediction results obtained with the complete data set and with the proposed hybrid model were compared in terms of computational work and error size. The use of the proposed method resulted, in most cases, in a minor deterioration of the percentage of values with an error below 3%, as shown in Figure 9. However, in the case of the kNN algorithm, using the hybrid model (modified data set) leads to a slight improvement in predictive performance, while in the case of the ANN algorithm the improvement is considerably higher, which is clear evidence of the better performance achieved using the hybrid model.
The maximum difference between the predicted and the actual value of M/E power is very important because it represents the worst performance (spike) that we can expect from each method. Figure 10 presents the maximum deviation between the actual and predicted value of M/E power, using either the complete data set or the hybrid model (modified data set). As observed, the hybrid model yields a considerable reduction in the maximum deviation for all algorithms, with the largest improvement obtained for Linear Regression and the best performance observed for the Decision Tree algorithm, whose maximum deviation equals 10.16%. This is a significant performance improvement; however, real life applications require even better predictive accuracy. To this end, further improvements to the hybrid model are required.
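The two criteria discussed here, the share of predictions within the 3% threshold and the maximum deviation, can be sketched as follows. The sketch assumes the error is taken relative to the actual value, which the text does not state explicitly, and that the actual M/E power is non-zero; the sample numbers are invented for illustration.

```python
def percentage_errors(actual, predicted):
    """Absolute error of each prediction as a percentage of the actual value."""
    return [abs(p - a) / abs(a) * 100.0 for a, p in zip(actual, predicted)]

def share_within(actual, predicted, tolerance_pct=3.0):
    """Percentage of predictions whose error is below the tolerance."""
    errors = percentage_errors(actual, predicted)
    return 100.0 * sum(e < tolerance_pct for e in errors) / len(errors)

def max_deviation(actual, predicted):
    """Largest percentage error, i.e. the worst spike."""
    return max(percentage_errors(actual, predicted))

# Hypothetical M/E power values (kW), for illustration only.
actual = [10000.0, 10000.0, 10000.0, 10000.0]
predicted = [10100.0, 9800.0, 10250.0, 11016.0]
share = share_within(actual, predicted)
worst = max_deviation(actual, predicted)
```

The two numbers answer different questions: the share describes typical behaviour, while the maximum deviation is determined by a single worst prediction.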
Finally, the time needed for training and testing the algorithms was recorded and compared. Figure 11 presents the percentage reduction in the time needed by the algorithms to complete the training calculations when using the hybrid model. All calculations in this study were conducted on the same personal computer, under the same conditions, to make the comparison reliable.
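A percentage reduction of this kind can be obtained by timing the same operation on both data sets and expressing the difference relative to the complete data set. In the sketch below, `timed`, `percent_reduction`, and the use of `sorted` as a stand-in workload are assumptions for illustration and do not reproduce the measurements of the study.

```python
import time

def timed(fn, *args):
    """Run fn(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def percent_reduction(t_full, t_hybrid):
    """Positive when the hybrid run is faster, negative when it is slower."""
    return (t_full - t_hybrid) / t_full * 100.0

# Stand-in workloads: a large and a small input play the role of the
# complete and the reduced data set.
_, t_full = timed(sorted, list(range(200_000, 0, -1)))
_, t_hybrid = timed(sorted, list(range(2_000, 0, -1)))
reduction = percent_reduction(t_full, t_hybrid)
```

A negative value corresponds to an increase in time. Single measurements are noisy, so repeating each timing and averaging is advisable in practice.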
The time needed for training was dramatically reduced in all cases using the hybrid model (in the range of 93.37% to 99.77%), with the exception of the ANN algorithm, where an increase of 14.35% is observed. Finally, all algorithms performed better in terms of the computational time needed for testing, as shown in Figure 12. As observed, there is a significant reduction when using the hybrid model for the prediction of M/E power: the reduction in CPU time demand ranges from 50% to 97.89%. Although, as shown in Figure 11, the ANN algorithm needs more time for training with the hybrid model, the situation is reversed when the trained algorithm is used for M/E power prediction. Therefore, the hybrid model offers great potential for reducing the overall computational time required for algorithm training and application in M/E power prediction. This is very important if we also take into account that, in real life applications, the computational power demand of the prediction algorithms is often as significant a parameter as the accuracy of the predictions.
Figure 13 shows, in the same diagram, the actual (true) values of M/E power output (black color) in conjunction with the predicted values (green color) using the hybrid model (modified data set) for all algorithms. As observed, in all cases (prediction algorithms), there is good agreement between actual and predicted values of M/E power over the whole range of the data set used. However, there are many instances where high discrepancies (spikes) are observed for all prediction algorithms examined, and the linear regression algorithm appears to be the least accurate, with the discrepancies between actual and predicted M/E power being relatively high at all the data points examined.

Conclusions
The results presented in this paper indicate that ML can become a valuable tool for the marine industry. Using regression algorithms and the data from a ship's ADLM system, we can predict with good accuracy, and almost in real time, a crucial operational parameter such as the engine power. This knowledge can be used to lower the operational costs of shipping companies. The straightforward pre-processing procedure proposed by the authors for selecting and/or transforming the vast amount of data that ADLM systems collect significantly improves the performance of the prediction algorithms in terms of computational power demand. Specifically, the required computational time is reduced by 50% to 99.97%, while the accuracy is kept at almost the same level, i.e., there is a negligible reduction in the percentage of predicted values with an error below 3%, as shown in Figure 9. At the same time, the hybrid model considerably reduces the maximum deviation of the predicted values of M/E power (spikes) compared to the performance of the same algorithms using the complete original data set. These deviations (spikes) reach relatively high values (in the range of 10%), which are not acceptable in real life applications, although errors above the 3% threshold are observed for a very low percentage of the data points examined, i.e., below 4.46% of the data in the worst case (the linear regression algorithm), as shown in Figure 9. Therefore, further research is needed to overcome these deficiencies of the proposed model and to determine the most suitable prediction algorithm for the estimation of M/E power demand using real-time data obtained by the ADLM system of the ship. In this way, it will become possible to introduce ML in real life applications (like the prediction of the M/E power demand of the vessel according to the operating conditions), where predictive accuracy and computational power demand are crucial parameters.