Towards the Evaluation of Date Time Features in a Ship Route Prediction Model

Abstract: Ship Route Prediction (SRP) is an algorithm that estimates the future position of a ship using historical data extracted from AIS messages. In an SRP task, it is very important to select the set of input features used to train the model. In this paper, we evaluate, through a practical experiment, whether time-dependent features are relevant in an SRP model based on a K-Nearest Neighbors classifier. In practice, we build two models, with and without the Date Time features, and for both models we calculate some performance metrics and the SHAP values. Tests show that although the model with the Date Time features outperforms the other model in terms of evaluation metrics, it does not in the practical experiments.


Introduction
A Ship Route Prediction (SRP) model is a complex system that must cope with the natural variability of ship traffic while selecting the best input features to determine its output. The model makes it possible to predict the movement of ships within a specific area determined by the boundaries of a sea chart. SRP is used in commercial shipping, fishing, and naval activities, as well as in military planning. As pointed out by Montiel et al. in [1], SRP can be used in different scenarios, such as enhancing the safety of a voyage [2], simplifying route planning, avoiding collisions [3,4], and reducing the costs of a voyage [5]. SRP can be used to give information on accepted lanes and sortie areas, and generally provides information crucial to ocean traffic management. Such a module gives details on when a vessel is scheduled to arrive at a certain port and on its next departure location after completing loading/unloading operations at that port.
To perform SRP, statistical data on the movement of the ship must first be collected. Such data can be extracted from AIS messages [6] and include at least the current status of the ship, which contains the ship's coordinates, speed, heading, and size. Other data can be added to the current status of the ship, such as the weather conditions and the current date.
A vast literature exists on SRP [1,7,8]. In this paper, only the most representative papers are described. Algorithms for SRP can be classified into three categories: (1) points-based, which predict only the immediately next position of the ship; (2) trajectory-based, which predict the whole trajectory; and (3) hybrid-based, which combine the previous two categories.
Points-based algorithms split the area to be monitored into non-overlapping cells, all of the same size. Given the current status of a ship, they calculate the probability that each cell of the area will be occupied after a given period of time. This analysis is carried out on the basis of historical data, generally extracted from AIS messages. This class of algorithms is implemented through different approaches, such as neural networks [9][10][11], associative rules [12], kernel density estimation [13], and K-NN [14].
Trajectory-based algorithms exploit historical data to build clusters, in order to extract and classify routes. Routes are then represented through a model, which can be improved through the inclusion of waypoints (harbors, offshore platforms, and entry and exit points in the area). Examples of trajectory-based algorithms exploit the extended Kalman filter [15], a similarity-based approach and kernel-based machine learning methods [16], synthetic route knowledge [17], a data-driven nonparametric Bayesian model based on a Gaussian Process [18], neural networks [19], deep learning [20], and generative models [21].
While building an SRP algorithm, it is very important to choose the input features to provide to the model. None of the previous literature considers the Date Time as a candidate input feature. This paper evaluates whether the use of one or more Date Time input features produces a robust model in practical experiments. Two models are built, both trained and evaluated on the same training and test sets: the first considers the Date Time as an input feature, and the second does not. Some evaluation metrics, including precision, recall, and accuracy, are calculated for both algorithms and then compared. In addition, a practical experiment is run with real data, and both algorithms are used for prediction. The best algorithm is finally identified. The SHAP value [22] is also calculated for each model, to understand the contribution of each feature to the final output.
This work is an improvement of our previous work [23], where a comparison among different multiclass classification algorithms was performed to solve the SRP problem. The SRP algorithm implemented in the previous work was carried out within the Optical/SAR data and System Integration for Rush Identification of Ship models (OSIRIS) project (The OSIRIS project: http://si.isti.cnr.it/index.php/hid-project-category-list/44-project-osirispage (accessed on 10 August 2022)), which aimed at building a maritime surveillance system. With respect to the previous work, the focus here is on the K-Nearest Neighbors classifier, which is used to evaluate the importance of the Date Time feature. The algorithm is also retrained with a bigger dataset, and a practical experiment is described.
The paper is organized as follows: Section 2 briefly describes the OSIRIS-FO project, which is the follow-on of the OSIRIS project, while Section 3 formulates the problem, also describing how the training and test sets are built. Section 4 discusses the results. Finally, Section 5 gives conclusions and possible future improvements.

The OSIRIS-FO Project
The work presented in this paper was implemented within the OSIRIS Follow-On (OSIRIS-FO) project (The OSIRIS-FO project: http://si.isti.cnr.it/index.php/hid-projectcategory-list/206-project-osirisfo (accessed on 10 August 2022)), funded by the European Space Agency. OSIRIS-FO is the enhancement of the previous project, called OSIRIS (The OSIRIS project: http://si.isti.cnr.it/index.php/hid-project-category-list/44-project-osirispage (accessed on 10 August 2022)). The main objective of OSIRIS-FO is the implementation of a system to support decision-making processes for maritime domain awareness. Figure 1 shows the OSIRIS architecture. OSIRIS focuses on a given area of interest (AoI) and processes satellite images within that AoI, to extract ships from such images (Ship Detection). Once detected, ships are also classified (Ship Classification), according to some criteria (e.g., ship length, width, class of ship, etc.). Then, for each ship, the current course over the ground and speed are calculated (Ship Kinematic Estimation). The extracted information and the AIS messages are given as input to Ship Route Prediction, which calculates the next position of the ship after one hour. Finally, Data Fusion recognizes exactly every ship name within the image, by combining data contained in the AIS database with those extracted by Ship Detection [24]. The SRP problem can be solved through the division of the AoI into cells and the subsequent identification of which cell will be occupied by the ship after a certain time interval, given the current status of the ship. This is a multiclass classification problem, where every target class corresponds to a cell of the grid, identified by a row and a column.
As input features, the current status of the ship is considered, which includes the current latitude and longitude, the current speed over the ground, the current course over the ground, the class of the ship, and, optionally, the date. In this paper, only the differences with respect to our previous work [23] are described; for further details, the reader can refer to it.

Methodology
The objective of this study is to test whether adding a Date Time feature as an additional input improves the performance of the SRP algorithm developed in [23]. In particular, the previous model was based on a K-Nearest Neighbors algorithm, which received as input the following features relating to the current state of the ship: position, speed, direction, and class of the ship (small or big). In order to verify whether the addition of a Date Time feature improved the performance of the algorithm, two versions of the KNN were implemented. The first version is trained considering the date as an input feature (KNN With Date, or KNN-WD for short), and the second is not (KNN). The two versions were trained on the same training set, with the only difference that in the case of KNN-WD the information relating to the date was also considered. The performance of the two algorithms is calculated in two ways. The first uses classic performance metrics (precision, recall, and accuracy). The second is based on a more heuristic approach, which implements a practical experiment.
The described approach includes the following steps: (a) data preparation, (b) model training and evaluation, and (c) a practical experiment. This methodology follows the standard approach based on splitting the dataset into two parts, training and test sets, and then using a set of unseen samples for validation. In this paper, data extracted from the practical experiment are used as the validation set. An alternative would be to split the original dataset into three parts: training, test, and validation sets. However, this approach could propagate a possible bias present in the original dataset.

Before describing the methodology in detail, a short glossary of the terms used in this paper is given:
• AIS messages-the messages sent periodically by ships.

• AoI-the Area of Interest. In this paper, the area south of the isle of Malta is considered.
• KNN-the K-Nearest Neighbors algorithm. In this paper, this acronym is used to indicate the algorithm which receives as input the following features: geographical coordinates, speed, course, and ship class.
• KNN-WD-the K-Nearest Neighbors algorithm with Date Time features. In this paper, this acronym is used to indicate the algorithm which receives as input the following features: geographical coordinates, speed, course, ship class, and date time.

Data Preparation
The dataset used to train the two algorithms is built from 188,130,529 AIS messages extracted from Astra Paging, covering a time interval from 26 December 2015 to 24 December 2017 and referring to the AoI near the isle of Malta. An AoI with a cell size of 0.1 × 0.1 degrees was considered.
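Under a 0.1 × 0.1 degree grid, mapping a ship position to its target cell (row, column) is a simple index computation. The sketch below illustrates the idea; the AoI bounds are hypothetical placeholders near Malta, not the exact coordinates used in the project:

```python
# Sketch: map a ship position to a grid cell of the AoI (0.1 x 0.1 degrees).
# The AoI bounds below are hypothetical placeholders, not the project's
# actual coordinates.
AOI_MIN_LAT, AOI_MAX_LAT = 35.0, 36.5   # hypothetical bounds near Malta
AOI_MIN_LON, AOI_MAX_LON = 13.5, 15.5
CELL_SIZE = 0.1

def position_to_cell(lat, lon):
    """Return the (row, column) cell indices for a latitude/longitude pair."""
    if not (AOI_MIN_LAT <= lat < AOI_MAX_LAT and AOI_MIN_LON <= lon < AOI_MAX_LON):
        return None  # position outside the monitored area
    row = int((lat - AOI_MIN_LAT) / CELL_SIZE)
    col = int((lon - AOI_MIN_LON) / CELL_SIZE)
    return row, col
```

Each (row, column) pair corresponds to one target class of the multiclass classification problem.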
To prepare the data for model training and evaluation, a new dataset is created as a cleaned and balanced subset of the original one. The following preprocessing techniques were applied to the raw dataset: (a) cleaning, (b) discretization, (c) normalization, and (d) balancing.

Data Cleaning
Data cleaning involves the deletion of all records which do not satisfy some criteria. In this case, rows matching any of the following conditions were deleted:
• Speed less than 0.1 knots.
• Speed greater than 60 knots.
• Course greater than 360 degrees.
• Records where the MMSI is present only once.
An analysis could be carried out to understand why in some cases the speed is greater than 60 knots or the course is greater than 360 degrees. At the moment, these records have been removed. However, they could instead be replaced by adopting some of the most common techniques used to deal with missing values, such as replacement with the average value. This task is left as future work, if needed. At the end of the cleaning process, 76,039,396 records remained, corresponding to 2397 classes, each representing a different cell of the AoI. The number of classes could be reduced to simplify the problem, for example by grouping adjacent cells. However, this grouping strategy would produce a coarser resolution of the AoI, thus making it quite difficult to detect the exact position of the ship in the ocean. The more cells, the more accurate the predicted position of the ships.
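The cleaning criteria above can be expressed as a simple filter. The sketch below is an illustration only; the record layout (dictionary keys) is a hypothetical assumption:

```python
# Sketch of the cleaning step: drop records that violate the stated criteria.
# The record layout (dict keys) is a hypothetical illustration.
from collections import Counter

def clean(records):
    """Remove records with implausible speed/course and MMSIs seen only once."""
    mmsi_counts = Counter(r["mmsi"] for r in records)
    return [
        r for r in records
        if 0.1 <= r["speed"] <= 60        # knots: drop < 0.1 and > 60
        and r["course"] <= 360            # degrees: drop impossible courses
        and mmsi_counts[r["mmsi"]] > 1    # drop MMSIs present only once
    ]
```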

Discretization
Discretization converts values belonging to a continuous domain into quantized values. Through discretization, the number of possible values assumed by a feature is reduced, so the estimation algorithm can perform better. With respect to the previous work [23], discretization and transformation of the input features were carried out as follows.
Discretization was carried out by dividing the course into 8 clock faces of 45 degrees each. All the possible speeds were grouped into 4 slots (Table 2): slow, medium, high, and very high [15]. Finally, latitude and longitude were also discretized, by setting the latitude to the value of the row in the matrix representing the AoI, and the longitude to the column.
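The course and speed discretization can be sketched as follows. The 45-degree sectors follow directly from the 8 clock faces, while the speed cut points below are illustrative assumptions standing in for the actual slot boundaries of Table 2:

```python
# Sketch of the discretization step. The speed thresholds below are
# illustrative assumptions, not the exact cut points of Table 2.
def discretize_course(course_deg):
    """Map a course in [0, 360) onto one of 8 sectors of 45 degrees each."""
    return int(course_deg % 360 // 45)   # sector index 0..7

def discretize_speed(speed_knots):
    """Map a speed onto one of 4 slots: slow, medium, high, very high."""
    if speed_knots < 5:
        return "slow"
    elif speed_knots < 14:
        return "medium"
    elif speed_knots < 23:
        return "high"
    return "very high"
```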

Normalization
Data normalization involves adjusting values measured on different scales to a common scale. Different techniques exist to perform normalization. In this work, Single Feature Scaling was applied, which converts every value of a column into a number between 0 and 1. The new value is calculated as the current value divided by the maximum value of the column: x_new = x_current / x_max.
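A minimal sketch of Single Feature Scaling applied to one column:

```python
# Single Feature Scaling: divide each value of a column by the column maximum,
# so every value falls into [0, 1].
def single_feature_scaling(column):
    x_max = max(column)
    return [x / x_max for x in column]
```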

Data Balancing
Since the dataset was imbalanced, the number of samples for each target class was set to 1000. The following balancing strategy was adopted: (1) undersample the classes with more than 1 M records through Random Undersampling; (2) undersample the classes with a number of records between 1000 and 1 M through Cluster Centroids; (3) oversample the classes with fewer than 1000 records through Random Oversampling. The final balanced dataset contained 2,397,000 records. Table 3 summarizes the balancing strategy.
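The three balancing rules can be summarized as a per-class strategy selection; a minimal sketch, assuming the thresholds stated above:

```python
# Sketch of the per-class balancing strategy: each target class is resampled
# to exactly 1000 samples, with the technique chosen by the class size.
def balancing_strategy(n_samples, target=1000):
    """Return the resampling technique applied to a class of n_samples records."""
    if n_samples > 1_000_000:
        return "Random Undersampling"
    if n_samples >= target:
        return "Cluster Centroids"
    return "Random Oversampling"
```

In practice, these three techniques correspond to the RandomUnderSampler, ClusterCentroids, and RandomOverSampler resamplers available in the imbalanced-learn library.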

Model Training and Evaluation
The two Machine Learning models were tuned through Grid Search with 5-fold Cross-Validation. Table 4 shows the tuned parameters and the best value for each parameter. The total number of records is 2,397,000: the training set contains 2,157,300 records and the test set the remaining 239,700 records. Table 5 shows the performance of both algorithms. The table shows that, in terms of evaluation metrics, KNN-WD outperforms KNN with a percentage increase of more than 23% in all cases. This means that, theoretically, the use of the Date Time feature should improve the performance of an SRP algorithm.
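For illustration, the core of a K-Nearest Neighbors classifier can be sketched in a few lines of pure Python. This is a from-scratch sketch, not the tuned implementation used in the paper; the Euclidean distance and the default k are assumptions:

```python
# Minimal from-scratch sketch of K-Nearest Neighbors classification:
# predict the target cell of a query by majority vote of its k nearest
# training samples (Euclidean distance on the normalized features).
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=5):
    dists = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]
```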

A Practical Experiment
To test the performance of the algorithms in a real case, a practical experiment was set up. The goal of the experiment was to demonstrate how both algorithms work in a practical scenario. The practical experiment used a satellite image, taken in a period covered by the AIS data with which the algorithm was trained. The image was downloaded from the Copernicus Open Access Hub (https://scihub.copernicus.eu/, accessed on 16 August 2022) (the extracted image is S1A_IW_GRDH_1SDV_20170525T165531_20170525T165556_016741_01BCE3_614D.SAFE, relating to an area south of Sicily and referring to the date 25 May 2017 at 16:55:56). Through the various modules of the OSIRIS project, the current status of each ship was extracted from the image, including position, speed, direction, and class of the ship. The Ship Detection module detected about 100 ships in the image. Then, both the KNN and KNN-WD algorithms were run to calculate the prediction after one hour, and a visual procedure was performed to establish whether the predictions were correct.
Through the Data Fusion module, also developed within the OSIRIS project, a mapping between the AIS data and the ships extracted from the image was carried out. This made it possible to reconstruct the correct path of each ship and to verify the correctness of the values predicted by SRP. Among the 100 ships identified by the Ship Detection module, the Data Fusion module recognized the MMSI of only 71 ships. Among them, only 32 ships had a speed greater than 0.5 knots. Among these, only 26 ships had a sufficient number of AIS messages in the AIS dataset, which permitted us to build the ship trajectory and thus predict the ship position after one hour.
For each ship, the actual position of the ship after one hour was extracted from the AIS dataset and then compared with the position predicted by the two algorithms. Results were classified as follows:
• Correct prediction-the predicted cell contained exactly the actual position of the ship after one hour.
• Acceptable prediction-the predicted cell was adjacent to the actual position of the ship after one hour.

• Wrong prediction-all the other cases.
Table 6 shows the results of the tests for both algorithms. Results shown in Table 6 are in contrast with those shown in Table 5. A possible explanation of this divergence will be discussed in the next section.
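The classification of a prediction depends only on the predicted and actual cell indices; a minimal sketch, where adjacency means the two cells differ by at most one row and one column:

```python
# Sketch: classify a prediction as correct, acceptable, or wrong by comparing
# the predicted cell with the cell actually occupied after one hour.
def classify_prediction(predicted_cell, actual_cell):
    pr, pc = predicted_cell
    ar, ac = actual_cell
    if (pr, pc) == (ar, ac):
        return "correct"
    if abs(pr - ar) <= 1 and abs(pc - ac) <= 1:
        return "acceptable"   # predicted cell adjacent to the actual one
    return "wrong"
```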
The remainder of this section shows the results for some ships. For each ship, the predicted values are represented through squares, and the ship trajectory is shown as dots with the timestamps as labels. Figures 2 and 3 show the predicted cells for ship 26 for KNN and KNN-WD, respectively.


Discussion
Model evaluation showed that KNN-WD outperformed KNN, while the practical experiment demonstrated the contrary.
The previous experiments show that there is a discrepancy between the calculated metrics and the performance of the algorithms in real use cases. In fact, in the real use cases, KNN outperforms KNN-WD, although in terms of performance metrics, KNN-WD outperforms KNN. This is probably due to a different distribution between the data of the training/test set and those of the real experiments.
To better understand the phenomenon, two types of analysis were performed: (a) distribution of the date features, (b) SHapley Additive exPlanations (SHAP) value. The two types of analysis are discussed separately.

Distribution of the Date Features
An analysis was performed of the distribution, in the training set, of the features related to the date, and of where the date of the experiments is placed. Figure 6 shows the day sine, day cosine, hour sine, and hour cosine distributions and, in red, where the date of the real experiments falls.
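The day sine/cosine and hour sine/cosine features suggest a cyclical encoding of the Date Time, in which a periodic quantity is mapped onto the unit circle so that, for example, 23:00 is close to 00:00 and 31 December is close to 1 January. The exact encoding used for the model is not shown in the paper; the sketch below is a common construction:

```python
# A common way to obtain day_sin/day_cos and hour_sin/hour_cos features:
# encode a cyclic quantity as a point on the unit circle, so that the ends
# of the period are close to each other. This is an illustrative assumption,
# not necessarily the exact encoding used in the paper.
import math

def cyclical_encoding(value, period):
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

# day of year in [0, 365), hour of day in [0, 24)
day_sin, day_cos = cyclical_encoding(145, 365)    # around 25 May
hour_sin, hour_cos = cyclical_encoding(16, 24)    # 16:00
```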

The hour cosine associated with the real experiment falls into an interval where there are few samples. This probably means that there are not enough samples in the training set to correctly use the date features in the real experiments. This would justify why KNN outperforms KNN-WD. In other words, the training/test set and the real experiments seem not to follow the same distribution, thus producing overfitting on the training/test set. However, this is not completely true, because the satellite image was extracted from dates available in the AIS dataset.
To solve the previous problem, an alternative could be to search for a dataset with enough samples for all the Date Time features. However, finding such a dataset may not be easy, because a balance in the Date Time features may not correspond to an equal balance in the number of samples per target class. Too many factors would need to be considered, requiring a separate study. This could be investigated as future work.

SHAP Value
SHAP is a method derived from game theory to calculate the contribution of each input feature to the predicted output [22]. Figures 7 and 8 show the contribution of each input feature to the value predicted by KNN and KNN-WD, respectively, for ship 26 of the experiment described in the previous section.
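SHAP values are additive: the prediction equals the base value E[f(x)] plus the sum of the per-feature contributions. The numbers below are made up for illustration and only mirror the shape of Figures 7 and 8, not the actual SHAP output:

```python
# SHAP additivity: prediction = E[f(x)] + sum of per-feature contributions.
# All values below are illustrative, not the paper's actual SHAP output.
base_value = 1200.0                               # E[f(x)]
contributions = {
    "row": 650.0, "speed": 30.0, "class": 5.0,    # positive contributions
    "column": -50.0, "course": -30.0,             # negative contributions
}
prediction = base_value + sum(contributions.values())
```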

In the previous example, KNN predicts 1805 (correct) as the target class, while KNN-WD predicts 1124 (acceptable). Each model starts from a base value, denoted E[f(x)], and then adds or subtracts the contribution of each feature. In KNN, the row is the feature giving the greatest contribution, while in KNN-WD the Date Time features contribute to lowering the value of the predicted class. The presence of many negative contributions in KNN-WD justifies why KNN-WD tends to predict a very low value, unlike KNN, whose prediction is very close to the expected output. In Figure 9, the predicted value is 1438 (acceptable), built as the sum of positive contributions (speed, row, and class) and negative ones (column and course). In Figure 10, the predicted value is 1232 (wrong). Again, the Date Time features contribute to lowering the predicted value, with the exception of hour_sin, which gives a positive contribution.
Both previous examples show that the Date Time features give a negative contribution to the final predicted output. Figures 11 and 12 show the average SHAP values for all the 26 ships considered in the experiment, for KNN and KNN-WD, respectively. In both cases, the row is the most important input feature. In KNN-WD, the day_sin feature is the second most important, thus giving a strong contribution to the final prediction. Then, in both cases, course, column, and speed contribute in descending order of importance. In both cases, the class is in last place.
The previous analysis showed that using Date Time input features strongly affects the behavior of an algorithm, and thus great attention should be paid when using features of this type.

Conclusions and Future Work
This paper discussed the importance of time-dependent input features for a classification problem. A practical experiment was implemented, showing that although KNN-WD outperforms KNN in terms of performance metrics, KNN behaves better than KNN-WD in practical experiments. This is probably due to the fine-grained date feature: the training/test set contained many samples for the considered period of time, while the experiments contained few.
The most important findings of this paper include:
• Special attention should be paid when dealing with Date Time features. In fact, the temporal conditions present in the training set may no longer be true at run-time.

• The calculation of the SHAP value in the experiments showed that the Date Time features greatly influence the determination of the results, almost always giving a negative contribution, which tends to lower the final predicted value.
The described results rely on an empirical analysis, which focuses on a practical experiment. As future work, a more theoretical approach could be defined, to demonstrate the influence of Date Time features on the behavior of a Machine Learning algorithm.
To improve the KNN-WD algorithm, a coarser discretization of dates could be performed, e.g., using the season instead of the day of the year, splitting the 24 h of the day into 4 slots, and so on. This could lead to more samples in each time slot and thus reduce the impact of the date in the SHAP value calculation.
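A minimal sketch of this coarser discretization, assuming meteorological seasons and four six-hour slots (the exact slot boundaries are an assumption):

```python
# Sketch of the coarser Date Time discretization proposed above:
# season of the year instead of day, and one of 4 six-hour slots instead
# of the exact hour. Slot boundaries are illustrative assumptions.
def season(month):
    """Meteorological season from the month number (1-12)."""
    return {12: "winter", 1: "winter", 2: "winter",
            3: "spring", 4: "spring", 5: "spring",
            6: "summer", 7: "summer", 8: "summer",
            9: "autumn", 10: "autumn", 11: "autumn"}[month]

def hour_slot(hour):
    """One of 4 slots: 0 = night, 1 = morning, 2 = afternoon, 3 = evening."""
    return hour // 6
```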
Finally, as pointed out in Section 4.1, a deeper analysis could be carried out to build a training set with a more balanced distribution of the Date Time input features. This should also take into account a balance in the target classes.