How Good Is the STW Sensor? An Account from a Larger Shipping Company

: This paper examines the statistical properties and the quality of the speed through water (STW) measurement based on data extracted from almost 200 container ships of Maersk Line’s ﬂeet for 3 years of operation. The analysis uses high-frequency sensor data along with additional data sources derived from external providers. The interest of the study has its background in the accuracy of STW measurement as the most important parameter in the assessment of a ship’s performance analysis. The paper contains a thorough analysis of the measurements assumed to be related with the STW error, along with a descriptive decomposition of the main variables by sea region including sea state, vessel class, vessel IMO number and manufacturer of the speed-log installed in each ship. The paper suggests a semi-empirical method using a threshold to identify potential error in a ship’s STW measurement. The study revealed that the sea region is the most inﬂuential factor for the STW accuracy and that 26% of the ships of the dataset’s ﬂeet warrant further investigation.


Background
The shipping industry has a strong wish to reduce the environmental effects of their operations and become sustainable. In addition to this, strict regulations on notably carbon emissions make it necessary for shipping companies to closely monitor and evaluate the performance of both the single ship and the entire fleet daily. Increasing use of on-board sensors and improved speed-power performance models have led to a significant increase in the fuel efficiency. The central parameter in this context is the measured speed with which a ship advances. Thus, the necessary power to achieve a given speed is approximately proportional to the speed cubed [1][2][3]. Consequently, it is clear that any uncertainty or error in the measured speed can have a detrimental effect on the outcome of speed-power performance models.
At sea, the speed of an advancing ship is measured either relative to the seabed (speed over ground-SOG) or relative to the surrounding water flowing past the hull (speed through water-STW). Both speed types apply in modern navigation systems [4]. However, for all models addressing problems related to speed-power performance, STW is the measurement that enters the models; obviously noting that the two measurements (STW and SOG) are related through the speed of sea current. Typically, maritime speed logging devices use one of the following measurement principles to obtain the STW: water pressure, electromagnetic induction, or the transmission of low frequency radio waves [5].
The latter refers to as the Doppler velocity log (DVL), which is the speed-log used by the ships dealt with in this paper, and used by the majority of today's merchant ships.
DVLs are installed along the centerline of the hull at a dry accessible location in the bow of the ship, where the flow is as undisturbed as possible and contains few air bubbles. It is evident that DVL sensors measure in a highly fluctuating environment [6]. The flow around a ship is affected by eddies, currents, waves, stratified current layers etc. Furthermore, the sensor itself moves relative to the water due to ship motions [7]. Accurate measurement of ship speed is therefore difficult, notwithstanding STW is the most important parameter for performance analysis. The calculation of the STW accuracy has been a challenging task for the experts of the shipping industry ever since the launch of the first DVL sensor used for underwater navigation [7][8][9][10][11][12][13][14][15][16][17].
Seafarers, marine engineers, and land-based performance analysts recognize that there is error involved in the STW measurement greater than the values indicated by the respective DVL manufacturers. Likewise, there are claims of observed constant offset in the STW measurements [18]. Reported examples from daily operations include also sudden jumps and drifts in the STW signal with measurements constantly decreasing/increasing for relatively long periods (hours), despite a maintained constant speed as secured by crew. Such incidents, apart from bringing mistrust to the crew navigating the ship, they also shift the outcome of any dynamic voyage planning and they finally cause headaches to performance analysts. However, the above-mentioned testimonies will remain unsupported if they are not properly investigated and documented. Individual cases are not enough to hold assumptions and generalize for a larger extent.

Objective and Scientific Contribution of Study
Although all the above-mentioned studies enhanced the literature of Doppler logging devices, there has never been an attempt to quantify the deriving error using real sensor data from a large fleet consisting of ships with different characteristics. This paper constitutes an initial attempt to quantify the error affecting the STW measurement using high-frequency data from approximately 200 container ships sailing for 3 years. As such, it is an account from a large container shipping company with the aim to make a systematic analysis of different sets of operational and environmental data collected and stored in-company and obtained from third parties. The overall objective of the paper is to contribute towards a better understanding of the responsible causes suspect to degrade the STW measurement.

Problem Formulation
Understanding a ship's SOG and STW accurately joins with a range of daily operational challenges on-board the ships and on-shore for any shipping company. One of the main objectives of most shipping companies is to model the current performance of their ships and forecast what will happen in the very near or distant future. For instance, looking at the short-term voyage planning model which provides advice for the crew, an internal company study indicates that the average speed prediction profile to arrive on time in all operational conditions should be less than 1.0 kn RMSE (Root Mean Square Error) for the crew to trust the advice. This leads to several interesting research questions, like:

1.
What parameters and external conditions seem to be the most influential on the STW measurements accuracy? 2.
How often and where are these parameters and conditions observed? 3.
Assuming a given advice quality requirement (threshold), how sensitive are the answers to the above questions?
In this study, three speed signals have been used for the purpose of answering these questions. The first is the STW, referred to by U w , the second is the SOG, given as U g , and the third is the speed of the surface currents projected on the ship's true heading, defined by U c . Those three signals have a theoretical relationship which constitutes the foundations of this study.
Equation (1) assumes that the STW equals the SOG subtracted by the speed of the currents. Two reference frames are used to represent the motion of the ship: the North-East-Down (NED) coordinate frame {n} is the inertial frame used to describe the pose of the ship; the body-fixed coordinate frame {b} is the non-inertial frame fixed to the ship used to describe linear and angular velocities. Theoretically, knowing the true SOG and the true speed of the currents, the true STW could easily be calculated with precision. However, those signals are only provided through sensor measurements or predictions (see Figure 1) which consequently means that there is error involved. Assuming an additive measurement error we obtain the following observed variables: It is assumed that each measured signal is mutually independent and affected by zero mean white Gaussian noise with variance σ 2 g i.e., g ∼ N (0, σ 2 g ), σ 2 w i.e., w ∼ N (0, σ 2 w ), σ 2 c i.e., c ∼ N (0, σ 2 c ) and σ 2 ψ i.e., ψ ∼ N (0, σ 2 ψ ). ψ refers to ship's compass true heading and is furthered analyzed in Section 2.1.

Theoretical Background of Position and Speed Signals
At this point, it is imperative to analyze each error term of Equations (2)-(4). Starting from Equation (2) the GPS gets the data from a GNSS (Global Navigation Satellite System) receiver providing synchronous measurements of ship position (N lat , E lon ) and SOG (U GPS ). Most of the manufacturers of the GPS installed on company-owned ships, claim that the position accuracy accounts for σ p < 2.5 m . Only a few of them indicate the SOG accuracy. In one datasheet the SOG accuracy was registered for σ g < 0.13 kn.
Progressing to Equation (3) regarding DVL accuracy, numerous DVL manufacturers provide diversified information. Once calibrated the DVL measurements may approach an accuracy of 0.1% [19], but a series of factors influence the result. Many DVL manufacturers claim that the accuracy is related to speed intensity followed by the below boundaries.      U DVL < 10 kn σ w = 0.1 kn 10 kn ≤ U DVL < 25 kn σ w = 0.16 kn 25 kn ≤ U DVL < 50 kn σ w = 0.32 kn Below are the factors influencing the DVL measurements, according to [6]: • Water Clarity. The STW measurements depend on acoustic reflection from solid particles in the water such as microorganisms or suspended dirt. In extremely clear water the quantity of scatters may be insufficient for adequate signal return. • Aeration. Aerated water under the transducer may reflect sound energy which could erroneously be interpreted as sea bottom returns. Sailing in heavy weather may be the source of this effect and so could non-laminar flow around the transducer. By placing the transducer near the bow the effect of non-laminar flow is reduced considerably. • Ship's trim and list. Changes in the trim (affects fore/aft speed) and list (affects transverse speed) of the ship will affect the measured speed. (e.g., 5 • trim change foresees 0.4% speed change [6] Lastly, Equation (4) assumes that U COP is subject to two error terms, c and ψ . The first is a combination of prediction error from the grid provided by the external weather provider along with the error from the bi-linear interpolation to match the ship's location to that grid. According to CMEMS (Copernicus Monitoring Environment Marine Service) [20] σ c = 0.37 kn. The latter is the error deriving from the projection of the sea currents speed vector U n c = [U x , U y , 0] T ∈ R 3 to the ship's true heading measurements ψ m . ψ m is a measurement provided by the gyrocompass installed on the ship. Observation at sea indicates that 0.25 • is the error in many typical gyrocompass installations [21]. Consequently, invoking the gyrocompass error into the projection formula, it derives that σ ψ = 0.001 kn. To sum up, the measurements have been identified with error characteristics as stated in Table 1. The present study is trying to showcase how far or close to reality the error values of Table 1 are.

Error Identification in STW Measurement
It is imperative to develop a detailed explanation of how to check potential error in the STW measurement. Based on Equations (1)-(4) we obtain: Equation (5) indicates that there is a speed deviation which we can easily calculate by subtracting the measured STW (U DVL ) from SOG (U GPS ) and it must be equal to the projected speed of the currents (U COP ). According to [22], the range of magnitude of global currents, span from 0.01 kn to 7.78 kn. However, currents generally diminish in intensity with increasing depth, so the higher end of the magnitude range refers to shallow water currents close to shore. According to [23] a characteristic sea surface speed is 0.1 kn to 0.97 kn. More specifically, in 98% of the bins, the mean surface current did not exceed 0.97 kn. After illustrating the GPS coordinates registered from 190 container ships during 3 years of operation in Figure 2, it is evident that liner shipping is following specific seaways unlike bulk and crude oil carriers. In Figure 3 there is an illustration from [22] depicting the global surface currents magnitude during northern hemisphere winter. When comparing the last two figures, it is indisputable that most of the ships' seaways are along one of the streams. Considering the above, in this research study we have assumed that a container ship sailing in steady state is unlikely to confront (from any direction) a surface current speed magnitude U n c = U 2 x + U 2 y greater than 2.0 kn, which is the value of the defined threshold referred to in Section 1.3. Focusing down to a single ship level, the same 2.0 kn threshold can be used to identify if the STW and SOG measurements are experiencing significant errors. The procedure is the following:

1.
Acquire STW measurements, SOG measurements and high resolution (both spatial and temporal) sea currents reanalysis outputs obtained by a weather provider.

2.
Create a scatterplot of SOG vs. STW and inspect the correlation of the two vectors. They are expected to be highly correlated.

3.
Create the speed deviation feature U dev which is a product of subtracting STW from SOG like Equation (5), plot a histogram and a boxplot to check the distribution of U dev and calculate the percentage of |U dev | > 2 kn. 4.
Plot the histogram and boxplot of U COP to check its distribution and identify the statistical characteristics of the sea currents magnitude of the regions where the ship has sailed. Plot the ship seaways on a map to verify if the U COP values reflect the characteristics of the sea region.

5.
Compare U dev and U COP distributions. When the U dev boxplot of one ship's median is close to 0, the inter-quantile range (IQR) is narrow enough to span from −1 to 1 knot, and the whiskers span between −2 and 2 knots, and finally the U COP distribution conveys the same story then it is indicative that the DVL measurements can be trusted. Anything outside these boundaries is questionable and requires further investigation by location.
At this point, it should be noted that the particular threshold value of 2 kn is a decision made by the authors. The investigations made in relation to the strength of average sea current, indicate that the 2 kn threshold is a reasonable choice/decision. It is expected that the outcome of the analysis to some degree depends on the precise value of the threshold and, as such, there is incentive for making a sensitivity study to comprehensively investigate the influence of the threshold value on the outcome and findings. However, this exercise is beyond the scope of the present paper and is left as a work for the future.

Sources
As already introduced, data has been collected from many different sources; both internally (within the company) and externally (open-source as well as paid services). Below are the descriptions of each available data source that has been meticulously merged to end up in the final dataset.

CAMS Data
This study is conducted on sensor data from Maersk's fleet deriving from a system called CAMS (Control Alarm Monitoring System). Thus, CAMS is the source of the main dataset and it includes sensor readings from each ship. CAMS contains 10-min sampled data from all distributed sensors along the ships, and is sent to shore continuously as often as there is satellite connection available on-board each ship. There are around 300 variables available in CAMS but only a subset of these are a priori assumed to affect the STW sensor readings and thus included in this study. In CAMS, each ship is identified by its unique IMO number, and Table 2 illustrates the chosen features for each of the downloaded IMO numbers.

AIS Data
The AIS is an automatic tracking system that uses transceivers on ships and is used by vessel traffic services (VTS). AIS information supplements marine radar, which continues to be the primary method of collision avoidance for water transport. It is confirmed that AIS is using a different set of sensors than the CAMS dataset. This means that all available measurements are comparable with the adequate CAMS measurements. The company does not collect this data, but an external provider does, and it is where the data was sourced from. The AIS dataset is used for comparison and correction purposes mainly due to faulty GPS locations registered on the CAMS dataset. Table 3 show the selected features from the AIS dataset.

IHO Seas Data
There is a report from IHO (International Hydrographic Organization) called "Limits of oceans and seas in digitized, machine readable form" [24] which includes the boundaries of 148 oceans and seas. The given positions were typed into a spreadsheet and were completed to a right ordered polygon by hand using Google Earth. This dataset is the digitized version of the printed "Limits of Oceans and Seas" [25]. The dataset is structured by the columns as described in Table 4 and will help classify the STW accuracy issue into regional categories. In Figure 4 one can see the borders of each sea as reference.

General-Info Data
Following the same classification principle, particular information for each ship of the dataset is provided through an internal company source. For instance, the vessel class and the DVL manufacturers info are some of the features out of a range of valuable categorical variables that has helped classify the data and identify if they relate to the STW accuracy. The classification dataset named from now on the 'general-info', is shown in Table 5.

Hull and DVL Cleaning Events
According to [26] sensor fouling has proved to be one the more influencing factors on the quality of the DVL measurements. Based on this assumption the hull and DVL cleaning events of each ship over time has been included in the study. Especially the timestamps where the cleaning and re-calibration of the DVL take place are expected to be significant and might reveal high correlation with U dev . Table 6 describes the dataset for each ship. The length of the table depends on the amount of cleaning events registered for each ship. MET ocean data is sourced from external providers. In this study, MET ocean data includes information about sea current and waves, and in either case data is collected with credit to the E.U. Copernicus Marine Service Information. Specifically, sea current was obtained from CMEMS [27] while wave data relies on ERA5 [28]. ERA5 is a reanalysis database that provides hourly updates of wave spectra in grid points spaced by 0.5 • latitude and longitude degrees. Table 7 presents the features downloaded and later bi-linearly interpolated using the method described in [29]. The sea current velocity vector measurements, combined with the true heading of the ship as given by compass ψ m , one can find the projected speed of the currents on the ship's body-fixed reference frame U COP .

Data Processing
This section describes the preprocessing of the data. Although this task is an asynchronous process, below is an ordered description of the steps that ended up to the main (final) dataset.

Filtering and Merging
Focus is exclusively on the actual sailing operations. Thus, many of the other parts of shipping operations are filtered out; this includes quay stays, anchoring, maneuvering, maintenance (dock stays), etc. In practice, this was simply done by including data only if the forward speed is above 5 knots, although important information such as cleaning events and dock stays were kept track of. In total, the dataset was reduced to 40% of the original dataset.
Since CAMS is the main dataset, the rest of the data sources are merged to this. Figure 5 illustrates the merging of the rest of the data sources to the CAMS dataset. In cases where there were no exact timestamp match, a tolerance level of a few seconds is given, depending on the importance of the related merged features. For cleaning events, the merging led to a new feature counting the number of days since the last occurrence of each of those events. In cases where the frequency of the merged dataset was lower than that of the main dataset, the merging resulted in missing data which needed to be imputed. Depending on each feature's physical properties, the replacement of the missing data has been handled adequately as described in the next subsection.

Outlier and Missing Data Replacement
Most of the missing values of the main dataset derived either from CAMS or from the merging process. In Figure 6 one can see the missing data matrix for the most significant features of the main dataset. It starts on 01-01-2017 and ends on 22-02-2020 and it sums up to 190 ships and ≈7 million rows. It is illustrated in Figure 6 that from the 7,333,490 rows of the dataset, the GPS coordinates N lat and E lon have several missing values (15%) after validating and correcting the measurements with AIS data. For other parameters (return to the nomenclature for notation), the following can be noted: ψ m has 14% missing values due to malfunction of the compass on some ships, θ m has 97%, φ m has 81%, χ m has 71%, U dev has 0.5% missing values deriving from random iterations of U DVL missing values, and finally U COP has (25%) missing values deriving from N lat , E lon and ψ m . Here, it should be noted that the features, with missing values-percentages higher than >70%, have been dropped entirely from the main dataset.
Each feature of the main dataset has been meticulously checked for outliers. This includes obvious outliers based on physical constraints, drop-outs, spikes and finally repeated values indicating frozen signals. The methodology followed on identifying outliers was based on the work by Dalheim and Steen [30].
After identifying missing values and outliers followed a careful replacement process. In fact, the replacement has been one of the most tedious tasks of this study, because the replaced values had to be validated for all ships. Even a small mistake might lead to wrong conclusions on the analysis and the modeling section. For instance, when latitude and longitude outliers were identified in CAMS, they were replaced with their AIS substitute, since it has been internally investigated that the AIS system uses a dedicated GNSS antenna separate from the one that the ship uses for the GPS coordinates. However, there were cases that even the substituted values did not comply with the validation constraints. In these cases, it was decided to linear interpolate instead of creating an additional filter where the location would be estimated based on a physical model. Table 8 describes all the main features of interest that constitute the main dataset. This dataset has been used for the analysis and modeling part. Some features had to be dropped from the main dataset after the merging, either because of the high number of missing values, or because they were assumed to be less significant for the purpose of this study. Please note that the sampling frequency of ships on the main dataset is uneven, since the maneuvering state has been completely dropped.

Analysis and Modeling
In this section, the main dataset has been analyzed, attempting to answer the research questions. The section starts with the U dev and U COP decompositions over the whole fleet (Sections 4.1 and 4.2) and proceeds with a regression analysis of the main features of interest versus U dev .

U dev Decomposition
In Section 2.2 we assumed that a container ship sailing in open waters is unlikely to confront a surface current speed U n c greater than 2 kn. Hence the threshold of 2 kn is used to identify potential errors of the STW and SOG measurements. It is otherwise expected that U DVL and U GPS are linearly correlated. The scatterplot in Figure 7 depicts the correlation of the two features deriving from the whole fleet. The red-colored dots indicate those indices that are outside the speed deviation boundary of 2 knots (|U dev | > 2 kn). At first glance there seems to be many faulty indices (red dots) in the CAMS dataset. However, when looking at the bivariate kernel density estimate (KDE) plotted on top of the scatterlot, it is evident that the issue is not that severe, since most of the points are concentrated within five contour levels below 2.0 kn within the gray area. Checking on the percentage of points distributed among the 2 kn boundary, on Table 9 one can see that only 2.47% out of ≈7 million points are considered to be either DVL or GPS failure. Even small fractions of DVL failure can contribute to huge costs, especially for companies operating hundreds of ships. However, given that 2.47% out of ≈7 million measurements is experiencing faulty DVL and GPS measurements, one can decompose Table 9 by vessel class and by DVL manufacturer and inspect how much above or below this general percentage boundary each category is sitting. Tables 10 and 11 reports the outcome of this analysis for DVL manufacturer and vessel class. For anonymity, the brand names of the DVL manufacturers have been changed into index numbers. The only DVL manufacturer that is far above the 2.47% boundary layer is M.5. Later, it has been confirmed that this is not a proof of a manufacturer of low accuracy, but this indication is caused by a few ships with miscalibrated or malfunctioning sensors at some period of time. As seen in Table 11, there are a few vessel classes in which the percentage is greater than 2.47% and these classes have been investigated thoroughly. Boxplots are used to illustrate the ship level, as illustrated in Figure 8. It is evident that most of the ships' IQRs span among ±1 kn. Some IQRs are narrower than others, which indicate that there is a stronger correlation between SOG and STW. Strong correlation implies either minimal signals error or that the ships sail in areas with very small or no ocean current. The latter is not likely, since ships from almost all vessel classes sail in at least one sea region where the sea currents happen to be stronger than average. Focusing on the vessel classes categorization, it seems that ships under the same class record similar performance. For instance, the boxplots from class C.16 (gray boxplots) look wider than those from class C.7 (yellow boxplots) meaning that the IQRs of C.7 imply stronger correlation between SOG and STW than those of C.16. Some boxplots from C.7 are shifted to the left, which means that the mean value is negative which implies that the ship is mostly confronted with head currents, where when it is shifted to the right, the mean value is positive, and it implies that the ship is mostly enjoying following currents. The whiskers of C.7 do not exceed ±2 kn where those of C.16 do that in some cases. Additionally, the outliers (which constitute only the 0.7% of the data) of C.7 are more condensed than those of C.16 where one can see a few scattered cases to the right. Adding class C.26 into the comparison (orange boxplots) the IQRs there indicate that the correlation is similar to that of C.16 but the whiskers imply that the variance of the distribution is similar to that of C.7. In a few words, one can say that C.7's measurements seem more trustworthy than those of C.26 which in turn seem more trustworthy than those of C.16 and so on.
Diving a bit deeper and splitting the boxplots using an additional dimension as on the tables earlier, say the DVL manufacturers, one can see the resemblance or differences. Figure 9 depicts the additional split. For instance, when focusing on class C.15, one can clearly see that each ship wears one of the two different DVL manufacturers (NAN refers to missing values). There is a slight difference in the IQR between the two sets of boxplots. Those of M.1 are a bit narrower than the others. Also, all whiskers of M.1 are within the ±2 kn boundary where some of the others span beyond it.
To combine the table information of the boundary percentage of 2.47% with the boxplots, it can be appreciated from Figure 10 in which ships the boundary percentage is exceeded (after the dashed-black-line). Getting back to the C.7-C.16 class example from above, when one checks the percentage of U dev which goes beyond 2 kn in Figure 10, the advantage of C.7 over C.16 class is evident. C.7 ships never exceed the general percentage boundary of 2.47%. On the other hand, there are several C.16 class ships that exceed the percentage boundary layer. This is because some whiskers of C.16 class boxplots of Figure 8 span beyond the ±2 kn boundary limit. Same goes with class C.26 where only one ship exceeds the boundary percentage.
Another categorical feature of the main dataset that is interesting to investigate is the sea region. Due to space limitations, Table 12 shows only the seas where the percentage of |U dev | > 2 kn column was above the 2.47% boundary. It is evident from the percentage column that most of the seas with the highest scores refer to either narrow passages or shallow water regions.   By depicting the seas on the map, the above statement gets even clearer. Figure 11 highlights the sea regions with high percentage boundary in gray scale, same as in Table 12. The greatest amounts are occupied by North Atlantic Ocean, Indian Ocean and Arabian sea but as seen by the color scale, these are the least alarming regions. The percentage is increasing when ships have sailed around Oceania, Philippines, Singapore and in short passages where sea currents are known to be stronger than the rest of the world, such as the English Channel and the Strait of Gibraltar. Figure 11. The IHO seas regions of the world. The regions are gray when the percentage column of |U dev | > 2 kn is greater than 2.47% and it gets darker when it increases.

U COP Decomposition
According to (5), U dev and U COP should be equal. In case there is any deviation between them, it should be due to a mix of measurement and prediction error from GPS, DVL, compass and sea currents. Checking on the percentage of points distributed among the 2 kn boundary for U COP , on Table 13 it is realized that only 0.63% out of ≈7 million points from ships sailing across the global seas has experienced a stronger current than 2 kn on its bow. The percentage compared to the one from U dev is significantly smaller. This means that the predicted sea currents deriving from the external provider, deviate from the ones calculated by the measured GPS and DVL resulting in U dev . The above statement implies that it is either (i) the external provider that is underestimating the truth about the surface sea currents magnitude, (ii) the signal errors provided by the manufacturers of GPS, DVL and gyrocompass sensors are a lot higher than what they disclose, or (iii) there is a deeper layer of sea currents predictions U COP d which resembles more with U dev than with U COP . In other words, case (iii) assumes that the DVL sensor measures the STW by viewing the speed of the currents on a deeper layer rather than the surface. If at some regions the deeper layer differs in magnitude and direction from the surface layer, then case (iii) is the most reasonable explanation. The inference of this is that it could be relevant to make an investigation where depth is considered to be a third dimension on the global sea currents. Although this exercise is left for the future, some initial investigations have been made [31].
Placing the distributions of U dev and U COP side-by-side in Figure 12, one can see the difference between them. Both are symmetrical distributions with similar mean and median, and different variance. Enhanced with the boxplots of Figure 13 it is easy to distinguish a few points. First, the U COP distribution perfectly coincide with the claims from [23,32] which is a confirmation that the sea currents affecting the ships of that study match with the generic global statistics of surface currents. Secondly, there are a lot of outliers on the U dev dis-tribution that are not present in U COP . From equation (5), and based on Table 1 and the above distributions, it is assumed that the error of U dev is greater than that of U COP which means that w and g is greater than ψ and c despite what the manufacturers and weather providers claim. Given that g is low because of the high accuracy of the GPS receiver [33], the highest difference is assumed to be derived from the DVL error w . By using the same approach as with U dev , U COP can be similarly decomposed by category. The distribution by sea name is the most relevant, because sea currents ought to be related to sea regions, as presumed by Figure 3. In Table 14 one can see what the percentage is of U COP that is above or below the initially defined 2 kn boundary. It is a positive indication to see that most of the sea regions from Table 12 that were above the 2.47% boundary percentage are again apparent in Table 14 with the new U COP boundary percentage of 0.63%. However, there are a few exceptions such as the Strait of Gibraltar, which in Table 12 conveys the impression that there are strong sea currents in the region (26.20% of |U dev | > 2 kn), but when compared with U COP of Table 14 the percentage (0.69% of |U COP | > 2 kn) does not validate the assumption. Figure 14 depicts the U dev -U COP differences of all sea names in a boxplot format. Here, it is important to note that in regions with sea currents of high volatility such as the English Channel, it is peculiar to see a few outliers indicating a negative correlation between U COP and U dev . This could be rationalized. The spatial resolution of the sea currents data is a 0.083 • × 0.083 • grid which means that when pointing at the Channel's latitude (≈50 • ) the grid is ≈1.5 km wide. Given the fact that U COP is calculated by spatially interpolating into the two-dimensions grid (1.5 km ×1.5 km) and later temporally interpolating (because of the frequency mismatch of the sensors with the external weather data) within whole hours, when the region is highly volatile in sea currents magnitude and direction (similar to the English Channel [34]), the calculation might generate some additional error and more outliers than usual. In Figure 15 there are two snapshot of the English Channel's surface sea currents speed, one 2 hours after the other. Comparing the images, it is obvious that the magnitude and direction of the surface currents in the region are exposed to large variations. Pinpointing the sea regions by color of intensity in Figure 16 based on Table 14, one can see the differences and similarities of that with Figure 11.  The boxplots of U COP by vessel class and IMO number have been plotted in Figure 17 to inspect and compare with U dev from Figure 8. By a first look at the U COP boxplots, it is clear that the IQRs are more condensed than those of U dev . Additionally, almost all boxplots' mean values are centered around 0 which is more likely than the shifted mean values from the U dev boxplots because almost all ships of the dataset have sailed for a long time window. There are a few exceptions though, such as the second ship from class C. 21 with only a few iteration in the dataset. This ship's boxplot is shifted to the left which implies strong head sea currents for as long as the data was recorded. Comparing the same ship's boxplot from Figure 8, the ship was indeed under strong head currents. The question that rises again is which of the two is closer to what really happened. Lastly, in Figure 17 one can distinguish the vessel classes that sail under calm versus strong sea currents. Class C.27 for instance, has mostly narrow boxplots compared to others. Indeed, C.27 ships sail mostly in Baltic sea where the sea currents are milder. There is one ship in C.27 in which the U dev boxplot's mean does not match with the one from U COP . In particular, the U dev mean is shifted to the left implying head sea currents for its entire journey. That ship's data recordings span from 04/12/2018 to 22/02/2020 and in Figure 18 there is the U dev and the U COP for the whole journey. By comparing the two maps, it is more likely that the bottom is closer to the truth, because the Baltic sea currents look mild in the bottom map. On the contrary, in the upper map it seems like there is a constant head current during the whole journey which is highly unlikely. The apparent ship constitutes a proof that although |U dev | > 2 kn is below 2.47%, the error in STW is still high. This means that the 2 kn boundary initially set as a threshold for this study is quite sensitive in calm seas.

Regression Analysis
Two different correlation analyses have been used: (i) the Pearson correlation and (ii) the Spearman correlation. The first, evaluates the linear relationship, where the latter evaluates the monotonic relationship between two continuous features. A relationship is linear when a change in one feature is associated with a proportional change in the other feature. In a monotonic relationship, the features tend to change together, but not necessarily at a constant rate [36]. Figure 19 illustrates the Pearson and Spearman correlation matrices for the continuous features of the main dataset. As expected, U GPS and U DVL are highly correlated. Also, engine power E pow and engine shaft rotations E rpm are highly correlated with both U GPS and U DVL . Next in correlation intensity is the sea temperature S temp with the latitude N lat which is also reasonable because sea temperature is warmer around the equator. Then, there is the feature that counts the number of days since the birth of the vessel C bov correlating with most of the cleaning events of the hull C ddp , C dd f b , C pp and C hc . This is also straightforward since C bov is a constantly-increasing-over-time feature along with the rest of the cleaning features. Finally is the most unexpected low correlation pair, U dev with U COP . This last one in a perfect world without measurement error, should be 1 since both U dev and U COP represent the same value. Figure 20 depicts the linear relationships of all continuous features versus U dev . On top of each scatterplot of the multi-graph, both the Spearman and Pearson correlation coefficients are indicated as titles. Focusing on the blue lines that represent the slope of each feature with U dev , only on the third scatterplot with U COP is the line tilted, entailing linear relationship. In this scatterplot, it is also peculiar to see the great number of points concentrated around U COP = 0. This is an indication that even when the weather provider reports neutral sea surface currents, the speed sensors convey a different story.
The association between U dev and other explanatory variables such as U GPS is expected to be nonlinear. Due to complexity of the relationship between U dev and the rest of explanatory variables and due to the non-homogeneity of the variance of U dev it has been decided to run a GAMLSS (Generalized Additive Model for Location, Scale and Shape) for each of the continuous features of the main dataset. The GAMLSS is a modern distribution-based approach to (semiparametric) regression. A parametric distribution is assumed for the response (target) variable (in our case U dev ) but the parameters of this distribution can vary according to explanatory variables using linear, nonlinear or smooth functions [37]. Considering the GAMLSS model as: where Y = (y 1 , . . . , y n ) T is the n length vector of the response variable, D(µ, σ, ν, τ) is a four-parameter distribution, µ, σ, ν, τ are the distribution parameters which are all vectors of length n, η k is the predictor vector for k = 1, 2, 3, 4, g k (.) are known monotonic link functions relating the distribution parameters to explanatory variables, X k is a fixed known design matrix of order n × j, β k is the coefficient vector, s kj is a non-parametric smoothing function applied to covariate x kj for j = 1, . . . , J [38]. Parameter µ defines the location (mean), σ the scale (standard deviation) and finally ν, τ define the shape (skewness and kurtosis) of the distribution. In Figure 21 there is the multi-GAMLSS plot. In each sub-graph there is a continuous feature acting as the explanatory variable enhanced with the additive term of cubic smoothing splines with 5 effective degrees of freedom in all four parameters. Since only one explanatory variable is used in the fit, centile estimates for the fitted distribution are also plotted in each sub-graph, showing 0.4%, 2%, 10%, 25%, 50%, 75%, 90%, 98% and 99.6% centiles. The gray shade in between each consecutive boundary represents the segments of the distribution. For instance, 50% of the distribution falls within the darkest gray area.   Figure 21 it is possible to conclude that the only interesting relationships among the available features are U dev , U COP and U GPS . Consequently, Figure 22 applies U dev and U COP as response variables and U GPS as explanatory variable in the GAMLSS model (6) for top and bottom graphs, respectively. The expected behavior of the scatter points of U dev and U COP is a funnel-shaped plot, starting narrow and getting wider with increasing U GPS . However, only by looking at the scatter points it is not clear how to distinguish the shape of the funnel, if any, apart from the nonlinear relationship of U dev and U COP with U GPS . With the centiles included in the graphs, one can compare the distributions of U dev and U COP by location. It is clear that both graphs' medians (thick black middle line) follow the same pattern over whole range of U GPS which indicate a matching pattern of sea current predictions and speed measurements. However, in U dev there is a bigger fluctuation of the median compared to the one from U COP . Also, the centile areas are wider in U dev compared to U COP . This means that the σ of the U dev distribution is always higher than that of U COP within the whole range of U GPS . Consequently, one can assume that the speed sensors can capture the stream of the currents with a higher intensity than expected. Or contrariwise, one can assume that the external weather provider cannot capture the true intensity of the sea currents and how it affects a ship's hull.

Summary and Conclusions
Acquiring high-accuracy speed through water (STW) measurements has been a hot topic in maritime society for many years. STW is one of the most significant signals for driving fuel consumption, costs and greenhouse gas emissions down. The transformation of the physical quantity of water mass passing down the hull of a ship to a digital signal of high accuracy is a complex procedure that ends up in measurements with error. However, fusing a large volume of high-frequency data from engine and motion sensors to evaluate and compare the STW measurements is a process towards the elimination of that error. This study attempted to simplify the identification of a highly erroneous DVL sensor based on data of Maersk Line's fleet. The data originated from 190 container vessels, each one equipped with a continuous monitoring system. Based on measurements from GPS coordinates and compass true heading, wave data from ERA5 [28], sea regions data from IHO seas [24] and sea currents data from E.U. Copernicus Marine Service Information [27] they were merged into the main dataset, encompassing ≈ 7 million 10-min spaced rows to be analyzed.
Essentially, the study focused on an analysis relying on two measures of the same quantity being the sea current. The one measure U dev (U dev = U GPS − U DVL ) was based on ship data, while the other measure U COP came as a numerical estimate (CMEMS, [27]). According to the results of the analysis, 2.47% of the U dev distribution spans above 2 kn with a standard deviation of 0.91 kn. On the contrary, only 0.63% of the U COP distribution was above 2 kn with a standard deviation of 0.56 kn. This indicates that U dev and U COP did not convey the same picture as expected. Given the defined threshold of 2 kn, one can identify a ship with a highly erroneous DVL sensor by checking its U dev distribution and measuring if the number of points above the threshold surpasses the fleet's average percentage boundary of 2.47%. According to the data, 26% of the ships of the fleet exceeds the percentage boundary.
The main dataset upon which the whole study has been-based, encompasses categorical features such as the DVL manufacturer, the sea region and the vessel class. The study showed that the DVL manufacturer was not a feature as significant as initially expected. On the other hand, vessel class proved to be an influential factor of DVL accuracy, mainly because sister ships share the same characteristics in terms of hull and voyage seaways. However, it was pinpointed that each ship should be examined separately and not as part of a class nor as part of a potential defective DVL. Concerning sea regions, it has turned out that some of them were fickler than others. Usually, the narrower and shallower seas are the ones with the highest volatility regarding sea water currents. For those regions, the STW measurements were prone to confront significant errors far from reality. Regions with high volatility such as the English Channel also affected the correlation between U dev and U COP due to the low temporal and spatial resolution of the weather data. In fact, sea region proved to be the most influential categorical feature on the STW accuracy.
The mean values of the boxplots from U dev distribution on some ships were either positive or negative which implies head or following currents adequately. When this happens, one can assume that a ship is a subject of bad or good voyage planning. On the other hand, checking at the U COP distribution of the same ships revealed that the mean values were close to zero as expected, due to the long time window of each ship. Consequently, the study affirmed that any assumption should not be-based exclusively on SOG and STW but it is recommended to incorporate the speed of the currents for confirmation.
The regression analysis showed that U COP was the only continuous feature correlating with U dev . Additionally, it was spotted that the local variance of U dev was always higher than that of U COP within the whole range of GPS speed. The above implies that DVLs capture the stream of the currents with higher intensity than expected.
The availability of a monitoring tool able to determine the trustworthiness of the measured STW could be very useful to enhance both on-board and on-shore voyage data analytics. The correct knowledge of the STW is paramount for the assessment of the ship's resistance, which in turn is a determinant factor in assessing the expected fuel and overall energy consumption during a voyage. Therefore, by improving the reliability of the STW measurement it will be possible to improve the assessment of the fuel/energy consumption. Similarly, by analyzing the STW with the propulsion power it will be possible to assess the increase in resistance, and this could lead to a more precise scheduling of hull/propeller cleaning events.

Future Work
The built dataset will be a great asset for future research and development. The first rational continuation of this study will be to create a model detecting wrong coordinate pairs from the GPS signal and replacing them with the coordinate estimates of the nearest true location. Consequently, this model will improve the weather data quality and sea regions accuracy of the dataset. Additionally, this GPS filtering model can be placed onboard before each ship's CAMS data storage and correct the data in real time right before the coordinates are registered into the system. The above validation model will save a great number of resources from any fleet management team.
The present study has focused on exploring the marginal association between each of the features of the main dataset and U dev . Although this exploratory approach did not reveal any significant dependencies the interpretation can be challenging if confounding factors are present. We therefore aim to supplement this analysis in future research with more sophisticated analyses based on multiple regression techniques while adjusting for error-in-variables and missing data.
An additional project that would enhance the outcome of this study would be an experiment test application where a sonar STW sensor [39] would be installed for a long period in one of the ships with tested good performing STW sensor, crossing multiple volatile sea regions (such as English Channel, Malacca Strait, Bay of Biscay, Strait of Gibraltar, Philippine sea, etc.). Then comparing U dev , U COP with the new sonar-based measurement (U dev−son ), created by subtracting SOG (U GPS ) from sonar STW (U SON ) such that U dev−son = U SON − U GPS , will affirm the two matching signals, revealing where the truth lies upon.
Finally, an extension of the STW estimator built on [4] would be a great step forward. Throughout that paper, it was shown that despite the few simplifying assumptions the evaluation of the designed STW estimator on the full-scale data was positive and it showed the feasibility of using a pure kinematic model to estimate a STW signal. Extending this model to a nonlinear kinetic model by discarding the simplified assumptions will be an interesting approach to drive the error further down.