A Direction-Preserved Vessel Trajectory Compression Algorithm Based on Open Window

Abstract: Ship trajectory data can be used in most marine-related research, and most of these data come from AIS. The large number of ships and the short AIS reporting period produce a huge volume of ship trajectory data, which puts pressure on the related research. This paper proposes a direction-preserved vessel trajectory compression method based on Open Window, which effectively retains the feature points at which the direction changes while limiting the position error. The method works in both offline and online modes. It also considers the velocity threshold and the problem of redundant low-speed points. To verify the effect of the proposed method and its adaptability to data with different characteristics, parallel experiments were performed on port water and coastal water datasets. The results show that our method can compress vessel trajectories while retaining the feature points. Compared with the compression method based on distance thresholds, the vessel trajectory compression time is reduced by up to 87.3% on the port water data. This paper thus provides a new vessel trajectory compression method for marine-related research on vessel trajectories.


Introduction
Automatic identification system (AIS) transponders are designed to automatically provide the position, speed over ground (SOG), course over ground (COG), identification, and other information about a ship to other ships and to coastal authorities. The International Convention for the Safety of Life at Sea (SOLAS) requires AIS to be fitted aboard all ships of 300 gross tonnage and upwards that are engaged on international voyages, cargo ships of 500 gross tonnage and upwards that are not engaged on international voyages, and all passenger ships irrespective of size. The requirement became effective for all ships by 31 December 2004 [1]. For ships that are underway, the AIS dynamic information report interval is between 2 and 10 s depending on the speed and course alteration. For a ship at anchor or moored and not moving faster than 3 knots, the report interval is 3 min. Static information is reported every 6 min or when data have been amended [2]. The short report interval and the considerable number of ships result in a great volume of AIS data. AIS has effectively supported improvements in the safety of navigation [3], and it is important for research on maritime traffic pattern recognition [4][5][6][7], vessel trajectory prediction, and anomaly detection [8][9][10]. In these studies, the huge number of AIS track points makes the storage, request, analysis, and display of AIS tracks difficult. According to statistics of China's coastal AIS data from 00:00 to 01:00 on 1 January 2018, the original data contained 4,178,422 National Marine Electronics Association (NMEA) sentences. After decoding, 2,316,205 dynamic information messages from Class A ship-borne mobile AIS stations and 1,346,644 from Class B stations were obtained, which together represented about 87.66% of the total messages. At this rate, the daily dynamic information would be close to 87,908,376 messages. With such a large amount of data, it is almost impossible to complete clustering and other tasks; as such, AIS trajectories need to be compressed at the beginning of these studies.
Therefore, it is necessary to retain the key feature points and delete the redundant points in the AIS trajectory. Previous studies have mainly used distance thresholds for AIS trajectory compression and for the identification of key feature points, but compression algorithms based on distance thresholds do not perform particularly well in terms of running time and the retention of trajectory direction features. The purpose of this paper is to propose a ship AIS trajectory compression algorithm that retains the direction features of the trajectory and reduces the compression time.
We implemented a direction-preserving ship trajectory compression method based on Open Window. Open Window is an online algorithm that can handle both online AIS data and offline AIS trajectories. Compared with compression based on distance thresholds, compression based on direction thresholds significantly improves the running time of the algorithm and better retains the points at which the trajectory direction changes greatly. Comparison experiments show that, at similar compression rates, the compression time of the proposed method is significantly shorter than that of the Douglas-Peucker (DP) algorithm based on distance thresholds; moreover, the method solves the problem of redundant low-speed points, which arises when compression is based on direction thresholds alone. Overall, the proposed method compresses AIS trajectory data more effectively. The research results have been applied in the macro display of traffic flow and will later be used in research on ship traffic pattern recognition, trajectory prediction, and anomaly detection. Because Open Window is an online algorithm, the results can also be used in applications connected to AIS equipment to compress a ship's AIS trajectory in real time.
The main contributions of this paper can be summarized as follows:
• A direction-preserved vessel trajectory compression algorithm based on Open Window is proposed for the first time. Open Window algorithms can handle offline data and can also compress vessel trajectories online. The method directly calculates the direction difference between the original trajectory segments and the potential compressed segments. Distance-threshold compression, by contrast, judges the direction change in adjacent trajectory segments through the COG. The direct calculation avoids the inaccurate retention of direction changes caused by delayed and erroneous COGs. The vessel trajectory compressed by the method in this paper effectively retains the feature points at which the direction changes while limiting the position error. The results can be applied to ship traffic pattern mining algorithms that rely heavily on the direction information of vessel trajectories, such as clustering, anomaly detection, and classification.
• Certain deficiencies of the direction-preserved compression method are remedied. Vessel trajectories contain a large number of redundant low-speed points, for example at anchor, at moorings, and when sailing at low speed. The direction-preserved compression method is sensitive to direction change, and low-speed points may undergo huge direction changes over a particularly short distance because of position drift. When the direction-preserved compression method is applied alone, this part of the data cannot be compressed. In this paper, the radial distance method is applied to the ship trajectory before the direction-preserved compression method, which sufficiently eliminates the redundant low-speed points of the vessel's trajectory.
• Compared with the position-preserving compression algorithm, the proposed method greatly improves the compression time. The position-preserving compression method needs to recalculate the distance between the original vessel trajectory points and the potential trajectory segment whenever the potential segment changes, so the amount of calculation is larger. The direction-preserved compression method, on the other hand, requires only a one-time calculation of the directions of the original segments. When the potential trajectory segment changes, it only needs to compare the directions of the original segments with the direction of the potential segment, and the original directions do not need to be recomputed. The proposed method therefore greatly decreases the running time of vessel trajectory compression. This is especially evident in online compression, where the real-time requirements on the algorithm are high.
The remainder of this paper is organized as follows. Section 2 reviews the work related to AIS trajectory compression. Section 3 defines the direction-preserved vessel trajectory compression problem and presents the new method. Section 4 reports the experimental results, and Section 5 concludes the paper.

Related Work
Trajectory simplification (TS) algorithms can be divided into batch mode and online algorithms according to whether a complete trajectory is required. Batch mode algorithms need the entire trajectory to be collected before performing any data reduction operations. They are widely used in trajectory approximation and macroscopic traffic flow display, and the batch mode always has a better compression effect than the online algorithm because it has complete trajectory information [11]. The Douglas-Peucker [12] algorithm (also known as the Ramer-Douglas-Peucker algorithm) might be the most well-known polygonal approximation algorithm. The main idea of the algorithm is to move the point with the largest error to the simplified set. This operation is repeated until no point has an error that exceeds the given threshold. Zhang et al. [13] compressed AIS track data based on the Douglas-Peucker algorithm and proposed recommended thresholds under different chart scales. Zhao et al. [14] used an improved Douglas-Peucker algorithm to simplify the straight and curved sections of an AIS trajectory. Zhang et al. [15] obtained the minimum ship domain size from the AIS trajectory as the recommended threshold for trajectory compression, and they used the Douglas-Peucker algorithm for AIS trajectory compression. Wei et al. [16] obtained speed and heading thresholds through statistics and carried out a two-pass approach to simplify the AIS trajectory. The Douglas-Peucker algorithm was also adopted in their approach, and the feature points of speed and heading changes were retained while the track points were compressed. Liu et al. [5] used an improved Douglas-Peucker algorithm to carry out AIS trajectory compression that considered speed and COG thresholds. Cui et al. [17] used the Douglas-Peucker algorithm to simplify the AIS trajectory in different sailing directions and applied the DWT algorithm to evaluate the compression effect of different thresholds, thus obtaining the optimal compression threshold. Heres et al. [18] approximated ship trajectories as sailing plans and compressed AIS trajectories by identifying three ship behaviors: stopping, sailing, and turning.
Online algorithms do not require a complete trajectory and are able to simplify streaming trajectories. One trivial implementation is simply to select location data at a predefined or random interval [19], i.e., the kth simplification. Online algorithms based on Open Window [20,21] are also widely used, thanks to their easy implementation. Open Window approximates as many points as possible with a 2-point segment until the error exceeds the predefined threshold. The last point that holds a legal error is output, and it is selected as the start point of the new open window. Gao et al. [22] adopted an improved sliding window algorithm to realize the online compression of AIS trajectories, and the angle of leeway and drift was considered in the simplification process. Sun et al. [23] combined the sliding window algorithm and the SPM algorithm to carry out an online simplification of AIS trajectories. Zhu et al. [24] employed the sliding window algorithm based on speed and heading thresholds to simplify the AIS trajectory, and their algorithm considered the ship's maneuvering mode.
The above trajectory simplification algorithms were designed with heuristics. They are all, to some extent, greedy methods, which cannot guarantee optimal results. To obtain optimal compression rates under the same threshold, the directed acyclic graph algorithm [25,26] has been used for trajectory simplification in certain studies. The directed acyclic graph algorithm has a certain time advantage in finding the optimal path, but it could take a considerable amount of time to construct the directed acyclic graph according to the threshold.
Two main categories of thresholds are used in trajectory simplification. The first is the distance threshold, on which all of the studies above are based. Distance thresholds are commonly used, but they are not the best choice in certain scenarios [27]. The second is the direction threshold. Cheng et al. [27] proposed the direction-preserving trajectory simplification (DPTS) problem and used a graph algorithm to solve it. Deng [28] realized an online DPTS trajectory simplification algorithm based on SP-Theo. Compared with the distance threshold, the direction threshold has certain advantages in the algorithm's time consumption. The main reason is that the direction of each segment of the original trajectory can be calculated once before compression and saved. In the compression process, only the direction of the simplified segment needs to be calculated and compared with the directions of the original trajectory segments in order to determine whether the threshold is exceeded. With a distance threshold, by contrast, whenever the simplified trajectory segment changes, the distance between each original trajectory point and the simplified segment also changes and needs to be recalculated; as such, the computational complexity is high. However, an AIS trajectory often includes low-speed points. When a ship is sailing at low speed, a slight change in position causes direction instability. In addition, an AIS trajectory contains a large number of stop points, such as those of moored ships, and the DPTS algorithms cannot handle low-speed points particularly well [26].

Problem Definition
An AIS trajectory is represented by a sequence of $n$ points in the form of $((t_0, \lambda_0, \varphi_0, v_0), (t_1, \lambda_1, \varphi_1, v_1), \ldots, (t_{n-1}, \lambda_{n-1}, \varphi_{n-1}, v_{n-1}))$, where $t_i$ is the timestamp, $\lambda_i$ is the longitude at time $t_i$, $\varphi_i$ is the latitude at time $t_i$, and $v_i$ is the speed of the ship at time $t_i$. We define the ship's dynamic information $p_i = (t_i, \lambda_i, \varphi_i, v_i)$ for each $i \in [0, n-1]$. Then, the trajectory $T$ can be expressed as $(p_0, p_1, \ldots, p_{n-1})$. The size of $T$, denoted by $|T|$, is defined as the number of points in $T$. Consider the running example shown in Figure 1. In the figure, the original trajectory $T$ is $(p_0, p_1, \ldots, p_9)$, and the size of the trajectory $|T|$ is 10.
$T'$ is said to be a simplification of $T$ if $T'$ is in the form of $(p_{s_0}, p_{s_1}, \ldots, p_{s_{m-1}})$, where $m \le n$ and $0 = s_0 < s_1 < \cdots < s_{m-1} = n-1$. Note that $p_0$ and $p_{n-1}$ in $T$ must be kept in any simplification of $T$. $T'$ uses $m-1$ segments to represent $T$, which contains $n-1$ segments. In our running example, let $T' = (p_0, p_1, p_2, p_4, p_7, p_8, p_9)$. $T'$ is a simplification of $T$ in Figure 1. The size of $T'$ is 7, and $T'$ uses 6 segments to approximate the 9 segments in $T$. Consider segment $p_2 p_4$ in $T'$, which is used to approximate the sequence of segments between $p_2$ and $p_4$ in $T$. In other words, the segments $p_2 p_3$ and $p_3 p_4$ are approximated by the single segment $p_2 p_4$.
The direction between any two points in an AIS trajectory can be expressed as $\theta(p_i p_j)$, $0 \le i < j \le n-1$, where each direction falls in $(-\pi, \pi]$; an anticlockwise rotation from the positive x-axis is positive, and a clockwise rotation is negative. The angular difference between two directions $\theta_1$ and $\theta_2$, denoted by $\Delta(\theta_1, \theta_2)$, is equal to the angle between the two trajectory segments that does not exceed $\pi$. The direction difference can be calculated by Equation (1).
$$\Delta(\theta_1, \theta_2) = \min\left(|\theta_1 - \theta_2|,\; 2\pi - |\theta_1 - \theta_2|\right) \tag{1}$$
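As a minimal sketch of the angular difference described above, the following Python snippet wraps the difference of two directions so that the result never exceeds $\pi$. The function names are hypothetical and are not taken from the paper; planar coordinates are assumed.

```python
import math


def segment_direction(p, q):
    """Direction of the segment p -> q in (-pi, pi], anticlockwise from +x."""
    return math.atan2(q[1] - p[1], q[0] - p[0])


def angular_difference(theta1, theta2):
    """Angle between two directions, always in [0, pi]."""
    d = abs(theta1 - theta2) % (2 * math.pi)
    return min(d, 2 * math.pi - d)
```

For example, two directions of 3.0 and -3.0 radians point almost the same way (both close to the negative x-axis), so their angular difference is about 0.28 radian rather than 6 radians.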
The distance between any two points in $T$ can be expressed as $\rho(p_i p_j)$, where $0 \le i < j \le n-1$. There are many low-speed points in an AIS trajectory, and adjacent points can be particularly close to each other. Because the ship position is unstable, the angular difference between adjacent segments may be large. As an illustration, Figure 2 shows an AIS trajectory with 23 points and a speed of less than 1.2 knots. There are many locations at which the angular difference is large. For example, the angular difference between $\theta(p_{13} p_{15})$ and $\theta(p_{13} p_{14})$ is 0.93 radian, but the distance $\rho(p_{13} p_{14})$ is only 2.3 m. If only the direction change is considered, $p_{14}$ is a key point and should not be removed. In practice, however, points that are this close together are not particularly meaningful for characterizing ship behavior.
AIS track points contain not only temporal and spatial information but also speed, heading, and other information. These trajectory characteristics need to be preserved as much as possible during trajectory simplification. Assume that a ship is sailing along a straight line and that its speed varies greatly (as shown in Figure 3): the vessel's speed $v_a$ at time $t_a$ is 14.4 knots, and the speed $v_f$ at time $t_f$ is 0.5 knots. If we use segment $p_a p_f$ to approximate the trajectory, the vessel speed $v'_i$ obtained by interpolation at time $t_i$ is 10.7 knots, whereas the real speed is only 1.5 knots; the speed error is therefore about 9.2 knots.
Figure 3. The black dots represent the ship's speeds at the corresponding moments, and the red line represents the linear change in speed from the start point to the end point. After the trajectory is compressed, the ship's speed at $t_i$ is found by linear interpolation (the white circle on the red line).
Given a segment $p_{s_k} p_{s_{k+1}}$ in $T'$, the direction simplification error of $p_{s_k} p_{s_{k+1}}$, denoted by $\epsilon_\theta(p_{s_k} p_{s_{k+1}})$, is defined as the greatest angular difference between the direction of $p_{s_k} p_{s_{k+1}}$ in $T'$ and the direction of a segment in $T$ that is approximated by $p_{s_k} p_{s_{k+1}}$. That is,
$$\epsilon_\theta(p_{s_k} p_{s_{k+1}}) = \max_{s_k \le h < s_{k+1}} \Delta\left(\theta(p_{s_k} p_{s_{k+1}}),\; \theta(p_h p_{h+1})\right)$$
Then, the direction simplification error of $T'$ is defined to be the greatest direction simplification error of a segment in $T'$, that is,
$$\epsilon_\theta(T') = \max_{0 \le k < m-1} \epsilon_\theta(p_{s_k} p_{s_{k+1}})$$
The speed error of a segment $p_{s_k} p_{s_{k+1}}$ in $T'$ can be expressed as $\epsilon_v(p_{s_k} p_{s_{k+1}})$, which is defined to be the greatest difference between the original speed in $T$ and the speed at the corresponding time obtained by interpolation along $p_{s_k} p_{s_{k+1}}$. That is,
$$\epsilon_v(p_{s_k} p_{s_{k+1}}) = \max_{s_k < h < s_{k+1}} \Delta(v_h, v'_h)$$
Then, the speed simplification error of $T'$ is defined to be the greatest speed simplification error of a segment in $T'$, that is,
$$\epsilon_v(T') = \max_{0 \le k < m-1} \epsilon_v(p_{s_k} p_{s_{k+1}})$$
The smallest distance between two adjacent points in $T'$ can be defined as
$$\rho(T') = \min_{0 \le k < m-1} \rho(p_{s_k} p_{s_{k+1}})$$
The problem of simplifying AIS trajectories can be described as follows: given an AIS trajectory $T$ and the error tolerances $\epsilon_\theta$, $\epsilon_v$, and $\epsilon_\rho$, find a simplified trajectory $T'$ that satisfies
$$\epsilon_\theta(T') \le \epsilon_\theta, \quad \epsilon_v(T') \le \epsilon_v, \quad \rho(T') \ge \epsilon_\rho$$
The main steps of our approach are explained in the next sections.

Geographical Coordinate Conversion and Direction and Distance Calculation
The position in an AIS trajectory is given in the form of geographical coordinates, and it is complicated to calculate the direction and distance between two points when using geographical coordinates. Our algorithm frequently calculates the direction of a segment and the distance between adjacent track points; thus, directly using the geographical coordinates would increase the calculation time. The Mercator projection is a cylindrical map projection. After converting geographical coordinates into Mercator projection coordinates, plane geometry can be used to calculate the direction of a segment and the distance between adjacent track points directly. When the earth is regarded as an ellipsoid, an accurate conversion result can be obtained, and the following formulas [29] can be used for coordinate conversion:
$$x = R(\lambda - \lambda_0), \qquad y = R \ln\left[\tan\left(\frac{\pi}{4} + \frac{\varphi}{2}\right)\left(\frac{1 - e\sin\varphi}{1 + e\sin\varphi}\right)^{e/2}\right] \tag{6}$$
$$k = \frac{\sqrt{1 - e^2\sin^2\varphi}}{\cos\varphi} \tag{7}$$
where $e$ is the first eccentricity; $\lambda_0$ is the longitude of an arbitrary central meridian that is usually, but not always, that of Greenwich (i.e., zero); $R$ is the radius of the spheroid; $\lambda$ is the longitude; and $\varphi$ is the latitude. $x$ and $y$ are the Mercator projection coordinates, and $k$ is the scale factor. The simplification of AIS trajectories is generally carried out within a limited range; as such, the ellipsoid can be approximated by a sphere of radius $R$, where $R$ is approximately 6371 km, and the error of approximating the earth as a sphere is small. If we approximate the earth as a sphere, the first eccentricity is 0, and Equations (6) and (7) can be simplified to Equations (8) and (9):
$$x = R(\lambda - \lambda_0), \qquad y = R \ln\tan\left(\frac{\pi}{4} + \frac{\varphi}{2}\right) \tag{8}$$
$$k = \frac{1}{\cos\varphi} \tag{9}$$
In this paper, we use the approximated Equations (8) and (9) to convert the geographical coordinates to the Mercator projection coordinates.
The Mercator projection is an isogonal projection that preserves angles; that is, if two curves intersect at a given angle, the images of the two curves on the map also intersect at the same angle. As such, the segment directions calculated from the plane coordinates are the actual directions. The inverse tangent falls in $[-\pi/2, \pi/2]$, whereas the direction of a trajectory segment ranges over $(-\pi, \pi]$. Therefore, the ordinary inverse tangent cannot meet the requirements, and we need to use the four-quadrant inverse tangent to obtain a segment's direction. The direction between two points in an AIS trajectory can be calculated by the following formula:
$$\theta(p_1 p_2) = \operatorname{atan2}(y_2 - y_1,\; x_2 - x_1)$$
where $(x_1, y_1)$ and $(x_2, y_2)$ are the Mercator projection coordinates of two points in an AIS trajectory.
The Mercator projection distorts distances; as such, the Euclidean distance between two points calculated from the projection coordinates differs from the actual distance. The scale factor of Equation (9) should be considered when calculating the distance between two points in the AIS trajectory. The distance between any two points in the trajectory can be calculated by the following formula:
$$dis(p_1, p_2) = \frac{dis\_mer(p_1, p_2)}{k} = dis\_mer(p_1, p_2)\cos\varphi$$
where $dis\_mer(p_1, p_2)$ is the Euclidean distance between the two points calculated from the projection coordinates, and $dis(p_1, p_2)$ is the actual distance. We can use the average latitude of the studied water area to obtain the value of $k$. For example, if the latitude of the study area is 40° N and the Euclidean distance between two points calculated from the projection coordinates is 1000 m, the actual distance between the two points is 766 m. The higher the latitude, the greater the resulting difference; as such, the difference cannot be ignored.
If the distance threshold is $\epsilon_d$, the threshold itself can be transformed once before compression through Equation (15), which avoids calculating the actual distance and reduces the number of square root operations. This reduces the data processing time.
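The conversion, direction, and distance calculations of this section can be sketched in Python as follows. This is an illustrative sketch under the spherical approximation, not the authors' implementation: the function names are hypothetical, and the squared, latitude-scaled threshold is only one plausible reading of the threshold transformation mentioned above, since the text does not spell that equation out.

```python
import math

R = 6_371_000.0  # sphere radius in metres (about 6371 km, as in the text)


def to_mercator(lon_deg, lat_deg, lon0_deg=0.0):
    """Spherical Mercator coordinates (x, y) in metres."""
    lam = math.radians(lon_deg - lon0_deg)
    phi = math.radians(lat_deg)
    return R * lam, R * math.log(math.tan(math.pi / 4 + phi / 2))


def direction(p1, p2):
    """Four-quadrant direction of the segment p1 -> p2, in (-pi, pi]."""
    return math.atan2(p2[1] - p1[1], p2[0] - p1[0])


def actual_distance(p1, p2, mean_lat_deg):
    """Projected distance corrected by the scale factor k = 1 / cos(lat)."""
    d_mer = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    return d_mer * math.cos(math.radians(mean_lat_deg))


def projected_sq_threshold(eps_d, mean_lat_deg):
    """Assumption: scale the threshold once and square it, so that each
    comparison needs neither a cosine nor a square root."""
    return (eps_d / math.cos(math.radians(mean_lat_deg))) ** 2


def within_threshold(p1, p2, sq_threshold):
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    return dx * dx + dy * dy <= sq_threshold
```

With a mean latitude of 40° N, `actual_distance((0, 0), (1000, 0), 40)` gives roughly 766 m, which matches the worked example in the text.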

Ship Trajectory Pre-Processing
The AIS trajectories that we study always lie within a certain area and a particular period of time. Certain vessels may re-enter the area after leaving, and certain AIS trajectories may lose some points; as such, a trajectory obtained from the MMSI number alone may not be the correct ship trajectory. Before compression, the trajectories of different ships should be separated according to the MMSI number, and then each trajectory should be sliced according to the receiving intervals of adjacent points. The recommendation Technical characteristics for a universal ship-borne automatic identification system using time division multiple access in the VHF maritime mobile band [2] sets out the requirements for AIS report intervals. The broadcasting rate of a ship's dynamic information is related to the status of the ship, the speed of the ship, and whether the course is changing. The specific values are shown in Table 1. In theory, the receiving time interval of adjacent track points of the same ship will not exceed 3 min, which is the report interval of a ship that is at anchor or moored and not moving faster than 3 knots. To verify the actual situation, we randomly selected 100 ship trajectories in the study waters and collected statistics on the receiving time interval. The number of points was 6,371,875. The smallest receiving interval was 0 s; that is, a ship with the same MMSI number had at least 2 points at the same time. The maximum receiving interval was 2,126,683 s, mainly because certain vessels re-entered the area after leaving, which resulted in a long receiving interval between two adjacent track points. In the statistical samples, 95% of the receiving intervals were less than 21 s, 99% were less than 181 s, and 99.9% were less than 360 s. As shown in Figure 4, the receiving time interval of track points is mainly concentrated around 10 s, but there are also some data around 180 s, and the statistical results are basically consistent with the theoretical report interval. Considering both the theoretical report interval and the statistical results, 360 s was chosen as the time threshold for splitting the trajectory: the AIS trajectory is sorted by time, and if the receiving interval of two adjacent points is greater than 360 s, the trajectory is split at that position. The pseudo-code of the trajectory slice is shown in Algorithm 1.
Positioning drift or multiple ships sharing the same MMSI lead to positional noise points in the AIS trajectory, and these noise points need to be removed before trajectory simplification. The distance between adjacent points is calculated to determine whether a point is a noise point. After time slicing, the receiving interval between two trajectory points does not exceed 360 s; if the ship speed is 30 knots, the maximum distance between adjacent trajectory points is 3 nautical miles. Before simplification, the entire trajectory is traversed and the distance between adjacent points is calculated. If the distance exceeds 3 nautical miles, the point is considered an anomaly and eliminated.
The speed over ground in an AIS trajectory has a 0.1-knot resolution and ranges from 0 to 102.2 knots. The field value 1023 indicates that the speed is not available, and the value 1022 indicates that the speed is 102.2 knots or higher. We selected the AIS trajectories of port and coastal waters to study the speed distribution statistically. The number of points in port waters was 11,772,444, of which 69,173 had a field value of 1023 (accounting for 0.59%). As shown in Figure 5, the ship speed in port waters was mainly concentrated between 0 and 2 knots, mainly because there were more moored and anchored ships in port waters and because the speed is also low during berthing and unberthing. When the outliers with a field value of 1023 were not removed from the port water data, the 99.9% quantile of the speed was 102.3 knots. There were many outliers with a field value of 1023 in the port water areas, which distorted the statistical result. After the 1023 outliers were removed, the 99.9% quantile of the speed was 16.9 knots. The number of points in coastal waters was 5,873,830, of which 4710 had a field value of 1023 (accounting for 0.08%). As shown in Figure 6, the speed was mainly between 8 and 12 knots, while a large number of points also had speeds between 0 and 2 knots. The 99.9% quantile of the speed in the coastal water data was 23.1 knots. According to the statistical results, the ship's sailing speed did not exceed 30 knots; as such, 30 knots was taken as the threshold to judge whether the ship's speed was abnormal. In this paper, we directly removed the points with abnormal speed from the trajectory.
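The time slicing and the abnormal-speed filter described above can be sketched as follows. This is a minimal illustration rather than a reproduction of Algorithm 1: the `AisPoint` type and the function names are hypothetical, the points are assumed to belong to one MMSI already, and speeds are assumed to be decoded to knots (so the "not available" field value 1023 appears as 102.3).

```python
from typing import List, NamedTuple


class AisPoint(NamedTuple):
    t: float    # receiving time, seconds
    x: float    # projected x, metres
    y: float    # projected y, metres
    sog: float  # speed over ground, knots


def split_by_time_gap(points: List[AisPoint],
                      max_gap_s: float = 360.0) -> List[List[AisPoint]]:
    """Sort by time, then start a new slice after every gap above max_gap_s."""
    slices: List[List[AisPoint]] = []
    current: List[AisPoint] = []
    for p in sorted(points, key=lambda q: q.t):
        if current and p.t - current[-1].t > max_gap_s:
            slices.append(current)
            current = []
        current.append(p)
    if current:
        slices.append(current)
    return slices


def drop_abnormal_speed(points: List[AisPoint],
                        max_sog: float = 30.0) -> List[AisPoint]:
    """Remove points faster than max_sog, including 'not available' values."""
    return [p for p in points if p.sog <= max_sog]
```

Each slice returned by `split_by_time_gap` can then be filtered and compressed independently. The removal of positional noise points by the 3-nautical-mile rule is not shown here.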

Direction-Preserved Vessel Trajectory Compression Algorithm
The DPTS algorithm is sensitive to direction change. Low ship speed or small changes in position when mooring or anchoring can lead to dramatic changes in direction. Trajectory simplification that uses only direction thresholds retains too many redundant low-speed points, thereby resulting in a low compression rate. The radial distance method is therefore applied to the low-speed points in the AIS trajectory before trajectory simplification. Radial distance is a brute force O(n) algorithm for polyline simplification. It reduces successive vertices that are clustered too closely to a single vertex (called a key). The resulting keys form the simplified polyline. The first and last vertices are always part of the simplification and are thus marked as keys. Starting at the first key (the first vertex), the algorithm walks along the polyline. All consecutive vertices that fall within a specified distance tolerance from that key are removed. The first encountered vertex that lies further away than the tolerance is marked as a key. Starting from this new key, the algorithm walks on and repeats the process until it reaches the final key (the last vertex). As shown in Figure 7, the AIS trajectory is composed of the 7 points $p_0, p_1, \ldots, p_6$; the distance threshold is $\epsilon_\rho$, and the first point $p_0$ and the last point $p_6$ are keys. In Step (a), the distances from $p_1$ and $p_2$ to $p_0$ are less than $\epsilon_\rho$, and the distance from $p_3$ to $p_0$ is greater than $\epsilon_\rho$; as such, $p_3$ is the new key. In Step (b), $p_3$ is the key point, the distance from $p_4$ to $p_3$ is greater than $\epsilon_\rho$, and $p_4$ is the new key. In Step (c), $p_4$ is the key point, the distance from $p_5$ to $p_4$ is less than $\epsilon_\rho$, and $p_6$ is the last point of the trajectory. After the radial distance method has processed the trajectory, the remaining points are $p_0, p_3, p_4, p_6$, as shown in Step (d). The pseudo-code of the radial distance method is shown in Algorithm 2.
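The walk described above can be sketched in a few lines of Python. This is an illustrative version, not the paper's Algorithm 2; the function name is hypothetical, and points are assumed to be planar `(x, y)` tuples with the tolerance in the same unit.

```python
import math


def radial_distance(points, eps_rho):
    """Keep the first and last points; drop every point that lies within
    eps_rho of the most recent key, and promote the first point that lies
    further away to be the new key."""
    if len(points) <= 2:
        return list(points)
    key = points[0]
    kept = [key]
    for p in points[1:-1]:
        if math.hypot(p[0] - key[0], p[1] - key[1]) > eps_rho:
            kept.append(p)
            key = p
    kept.append(points[-1])
    return kept
```

A seven-point trajectory laid out like the example of Figure 7 reduces to four points:

```python
track = [(0, 0), (1, 0), (2, 0), (6, 0), (12, 0), (13, 0), (14, 0)]
print(radial_distance(track, 5))  # [(0, 0), (6, 0), (12, 0), (14, 0)]
```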
Figure 7. Theoretical schematic of the radial distance method for dealing with the low-speed AIS trajectory points. In (a), the distances from $p_1$ and $p_2$ to $p_0$ are less than $\epsilon_\rho$, so these two points are deleted. In (b), $p_3$ is the key point, and no points are deleted. In (c), $p_4$ is the key point, and the distance from $p_5$ is less than $\epsilon_\rho$, so $p_5$ is deleted. In (d), after the radial distance method, the remaining points are $p_0, p_3, p_4, p_6$.
Open window (OW) algorithms anchor the start point of a potential segment and then attempt to approximate the subsequent data series with increasingly longer segments. The algorithm starts by defining a segment between the first data point (the anchor) and the third data point (the float) in the series. As long as the direction errors and speed errors of all the intermediate data are below the thresholds, the float is moved one point up the data series, i.e., the window opens further. When a threshold is about to be exceeded, the data point just before the float becomes the endpoint of the current segment and also the anchor of the next segment. The method continues until the entire series has been transformed into a piecewise linear approximation (Figure 8).
Our algorithm also considers a speed threshold so as to avoid a large difference between the original speed and the speed at the corresponding time obtained by linear interpolation. Assuming that the anchor point is $p_a = \{t_a, x_a, y_a, v_a\}$ and the float point is $p_f = \{t_f, x_f, y_f, v_f\}$, the ship speed at any time between the anchor point and the float point can be obtained through Equation (16), and the speed difference can be obtained by Equation (17).
$$v'_i = v_a + \frac{v_f - v_a}{t_f - t_a}(t_i - t_a), \quad t_a < t_i < t_f \tag{16}$$
$$\Delta(v_i, v'_i) = |v_i - v'_i| \tag{17}$$
The speed difference should be less than the given threshold; otherwise, the AIS trajectory cannot be approximated by the segment. The pseudo-code of the speed error is shown in Algorithm 3.
Algorithm 3: Speed error judgment algorithm
Input: A trajectory $T = \{p_a, p_{a+1}, \ldots, p_f\}$ and the error tolerance $\epsilon_v$
Output: Whether the trajectory violates the error tolerance
The pseudo-code of the Direction-Preserved Vessel Trajectory Compression algorithm is shown in Algorithm 4.
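The speed error judgment can be sketched as follows. This is a hedged illustration of the linear interpolation check described in the text, not the body of Algorithm 3; the function names are hypothetical, and the window is assumed to be a list of `(t, v)` pairs running from the anchor to the float.

```python
def interpolated_speed(t, t_a, v_a, t_f, v_f):
    """Speed at time t on the straight line from (t_a, v_a) to (t_f, v_f)."""
    if t_f == t_a:
        return v_a
    return v_a + (v_f - v_a) * (t - t_a) / (t_f - t_a)


def violates_speed_tolerance(window, eps_v):
    """True if any intermediate point differs from its interpolated speed
    by more than eps_v (same unit as the speeds, e.g. knots)."""
    t_a, v_a = window[0]
    t_f, v_f = window[-1]
    for t, v in window[1:-1]:
        if abs(v - interpolated_speed(t, t_a, v_a, t_f, v_f)) > eps_v:
            return True
    return False
```

With the numbers of the earlier example (14.4 knots at the anchor, 0.5 knots at the float, and a real speed of 1.5 knots where interpolation gives 10.7 knots), the window is rejected for any tolerance below about 9.2 knots.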

Algorithm 4: Direction-Preserved Vessel Trajectory Compression algorithm
Input: A trajectory $T = \{p_0, p_1, \ldots, p_{n-1}\}$ with the same shipid and the tolerance $\epsilon_t$
Output: A simplified trajectory by Direction
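The Open Window procedure with direction and speed checks can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the point layout `(t, x, y, v)`, the helper names, the use of `atan2` on projected coordinates for the segment direction, and the linear speed interpolation between anchor and float are all choices made for this example. Timestamps are assumed to be strictly increasing.

```python
import math
from typing import List, Tuple

# Assumed point layout: (t, x, y, v) = time [s], projected x/y [m], speed.
Point = Tuple[float, float, float, float]

def heading(p: Point, q: Point) -> float:
    """Direction of the segment p -> q in radians, measured from the x axis."""
    return math.atan2(q[2] - p[2], q[1] - p[1])

def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def segment_ok(T: List[Point], a: int, f: int,
               eps_theta: float, eps_v: float) -> bool:
    """Check whether the segment T[a] -> T[f] may replace the points in between."""
    pa, pf = T[a], T[f]
    seg_dir = heading(pa, pf)
    # Direction check: every original sub-segment must stay close to seg_dir.
    for i in range(a, f):
        if angle_diff(heading(T[i], T[i + 1]), seg_dir) > eps_theta:
            return False
    # Speed check: interpolated speed must stay close to the reported speed.
    for i in range(a + 1, f):
        ratio = (T[i][0] - pa[0]) / (pf[0] - pa[0])
        v_interp = pa[3] + ratio * (pf[3] - pa[3])
        if abs(v_interp - T[i][3]) > eps_v:
            return False
    return True

def open_window(T: List[Point], eps_theta: float, eps_v: float) -> List[Point]:
    """Open Window compression: grow the window until a threshold is violated."""
    n = len(T)
    if n <= 2:
        return list(T)
    kept = [T[0]]
    a, f = 0, 2                      # anchor and float indices
    while f < n:
        if segment_ok(T, a, f, eps_theta, eps_v):
            f += 1                   # window opens further
        else:
            a = f - 1                # point before the float becomes the cut point
            kept.append(T[a])
            f = a + 2
    kept.append(T[-1])
    return kept
```

Because each decision only looks at points already received, the same loop can be fed incrementally, which is what makes an Open Window scheme usable in an online setting.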

Results Analysis and Discussion
In this section, we report the results of our experiments and compare three algorithms for solving the trajectory simplification problem: the proposed algorithm, the DPTS algorithm, and the DP algorithm. We used two real AIS trajectory datasets in our experiments, one for port areas and one for coastal areas. The geographical range of the two areas is shown in Figure 9. The two areas were selected mainly because port areas contain more low-speed points, such as ships in berths or anchorages, and ship speed changes frequently there, whereas ship speed and heading in coastal waters were found to be relatively stable. AIS data from different areas help to verify the effect of the trajectory simplification algorithm on different AIS datasets. All of the experiments were run on a Windows 11 platform with an Intel(R) Core(TM) i7-10710U CPU @ 1.10 GHz 1.61 GHz and 16.0 GB RAM; the hard disk model was a SAMSUNG MZVLB1T0HBL-000L7. The statistics of these datasets are summarized in Table 2.

Compression Rate
In this part, we study the effect of $\epsilon_\theta$ on the compression rate, which is defined as $(|T| - |T'|)/|T| \times 100\%$, where $T$ is the set of raw AIS trajectories and $T'$ is the set of the corresponding simplified trajectories. In order to compare the compression rate of the proposed algorithm with that of the DPTS algorithm, which only considers the directional threshold, 14 parallel experiments were conducted to compress the data of the port waters and coastal waters with different thresholds. The results are shown in Table 3 and Figure 10. We obtained the following observations.

First, the compression rate increased significantly when the tolerance was increased slightly from 0, and it increased slowly once the tolerance went above a certain value. This is good, since it implies that, under a direction tolerance, the AIS trajectory data can be simplified significantly with a small error.

Second, at the same tolerance, the compression rate of the proposed algorithm was significantly higher than that of the DPTS algorithm. The compression rate increased by 21% to 31% on the port water datasets and by 10% to 15% on the coastal water datasets. As mentioned above, the main reason is that there are many low-speed points in port waters, and the drift of a ship's trajectory points during low-speed sailing results in great changes in direction. The DPTS algorithm alone cannot remove these redundant points.
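As a small worked example of the compression rate definition, the sketch below computes the rate from point counts. The function name and the example counts are illustrative assumptions, not values from the experiments.

```python
def compression_rate(n_raw: int, n_simplified: int) -> float:
    """Compression rate in percent: (|T| - |T'|) / |T| * 100."""
    if n_raw <= 0:
        raise ValueError("the raw trajectory set must contain at least one point")
    return 100.0 * (n_raw - n_simplified) / n_raw

# Hypothetical counts: 1000 raw points reduced to 50 retained points.
rate = compression_rate(1000, 50)
```

With these hypothetical counts, 950 of 1000 points are removed, which corresponds to a compression rate of 95%.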

Running Time
In this part, we study the running time of the proposed algorithm and the DPTS algorithm with the same threshold, and we also examine the running time of the proposed algorithm, the DPTS algorithm, and the DP algorithm with the same compression rate. The results for the same threshold are shown in Figure 11. We obtained the following observations.

First, the running time of the proposed algorithm was shorter than that of the DPTS algorithm on the port water dataset, but it was longer on the coastal water dataset. The main reason is that there is a large number of low-speed points in port waters, and the proposed algorithm uses the radial distance method to remove this part of the data; as such, the amount of data processed with Open Window is significantly reduced. In coastal waters, however, there are fewer low-speed points, so the removal of low-speed points does not improve the running time.

Second, the running time becomes longer as the threshold increases. This is because, with a smaller $\epsilon_\theta$, it is less likely that a long sequence of segments can be approximated with one segment. Thus, the cost of checking the error of the segment linking the start position and the end position is small.

Third, the running time on the coastal water datasets was longer than on the port water datasets. The main reason, as above, is that ships in coastal waters mostly keep their course; as such, windows grow long and the cost of checking the error of the segment linking the start position and the end position is large.

Because the directional threshold and the distance threshold are different quantities, running times under the same threshold value cannot be compared. Figure 12 therefore shows the running time of the proposed algorithm, the DPTS algorithm, and the DP algorithm at the same compression rate. We found that the running time of the proposed algorithm was significantly shorter than that of the DP algorithm. This was especially the case on the port water datasets when the threshold was small and the compression rate was low. Unlike the direction-based check, the distance-based check must be redone whenever the simplified trajectory segment changes, because the distance between each original trajectory point and the simplified segment changes with it. The computational complexity is therefore high, and the running time of the DP algorithm was longer than that of the proposed algorithm.

We also compared the running time for different data sizes when the compression rate was close to 95%. The results are shown in Figure 13. We found that the running time of both the proposed algorithm and the DP algorithm increased linearly with the data size. The running time of the DP method was 6 to 8 times that of the proposed algorithm on the port water datasets and 3 to 4 times on the coastal water datasets. As the amount of data increases, the running time advantage of the proposed algorithm becomes more obvious.

Position Error and Speed Error
In this part, we study the position error. The average position error is defined as $\sum_{i=0}^{|T|-1} dis(p_i, p'_i)/(|T| - |T'|)$, and the max position error is defined as $\max dis(p_i, p'_i)$, $0 < i < |T| - 1$, where $T$ is the set of raw AIS trajectories, $T'$ is the set of the corresponding simplified trajectories, $p_i$ is a point in the raw AIS trajectories, $p'_i$ is the synchronized point in the simplified trajectories, and $dis(p_i, p'_i)$ is the synchronized Euclidean distance. Figures 14 and 15 show a comparison of the average position error and the distribution of the maximum position error of the three algorithms at close compression rates. It can be seen from the figures that the algorithm based on the distance threshold (DP) performs better in position error. When the compression rate was close to 95%, the average position error changed quickly. Therefore, it is recommended that the compression rate not exceed 95% when the directional threshold is used for trajectory simplification. In the experiment, the threshold corresponding to the 95% compression rate was 0.3 radian for port waters and 0.1 radian for coastal waters.

We also study the speed error. The average speed error is defined as $\sum_{i=0}^{|T|-1} \Delta v_i/(|T| - |T'|)$, and the max speed error is defined as $\max \Delta v_i$, $0 < i < |T| - 1$, where $\Delta v_i$ is the difference between the speed in the raw AIS trajectories and the speed at the corresponding time in the simplified trajectories. Figures 16 and 17 show a comparison of the average speed error and the distribution of the maximum speed error of the three algorithms at close compression rates. The proposed algorithm performs better in speed error on the port water datasets. In terms of the maximum speed error distribution, the distributions of the proposed algorithm and the DP algorithm were relatively concentrated, and, on the port water datasets, the proposed algorithm performed better. The reason is that ship speed changes frequently in port waters. The proposed algorithm considers the speed threshold; as such, it performs better in retaining speed information.
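The error metrics can be sketched as follows. This is an illustrative computation under assumptions: points use a hypothetical `(t, x, y, v)` layout, the synchronized point is obtained by linear interpolation in time along the simplified trajectory, and the average position error is normalised by the number of removed points in the same way as the average speed error. The simplified trajectory is assumed to contain at least two points with strictly increasing timestamps and to span the raw one.

```python
import bisect
import math
from typing import Dict, List, Tuple

# Assumed point layout: (t, x, y, v) = time [s], projected x/y [m], speed.
Point = Tuple[float, float, float, float]

def synchronized(simplified: List[Point], times: List[float],
                 t: float) -> Tuple[float, float, float]:
    """Position and speed on the simplified trajectory at time t."""
    j = bisect.bisect_right(times, t) - 1
    j = max(0, min(j, len(simplified) - 2))   # clamp to a valid segment
    a, b = simplified[j], simplified[j + 1]
    r = (t - a[0]) / (b[0] - a[0])
    return (a[1] + r * (b[1] - a[1]),
            a[2] + r * (b[2] - a[2]),
            a[3] + r * (b[3] - a[3]))

def error_stats(raw: List[Point], simplified: List[Point]) -> Dict[str, float]:
    """Average and max position/speed errors between raw and simplified data."""
    times = [p[0] for p in simplified]
    pos_err, spd_err = [], []
    for t, x, y, v in raw:
        sx, sy, sv = synchronized(simplified, times, t)
        pos_err.append(math.hypot(x - sx, y - sy))
        spd_err.append(abs(v - sv))
    removed = max(len(raw) - len(simplified), 1)  # avoid division by zero
    return {
        "avg_position_error": sum(pos_err) / removed,
        "max_position_error": max(pos_err),
        "avg_speed_error": sum(spd_err) / removed,
        "max_speed_error": max(spd_err),
    }
```

Retained points contribute zero error, so dividing by the number of removed points averages the error over exactly those points that were replaced by interpolation.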

Visualization Analysis of the AIS Trajectory Simplification Performance
With the proposed algorithm, we performed a visualized analysis of the port water datasets and the coastal water datasets. Figures 18 and 19 show the AIS trajectories before and after compression. The compression rate was close to 95%. The charts show that the number of track points decreased significantly. Plotting the simplified trajectory took 0.003 s, whereas plotting the raw AIS trajectory took 0.048 s, which is about 16 times as long. It can also be seen that the traffic flow situation reflected by the simplified trajectory is basically the same as that of the raw AIS trajectory, and that the confusion of trajectory lines caused by inaccurate ship positions is reduced.

Conclusions
The aim of this study was to find a better compression algorithm for AIS trajectories. In this study, a direction-preserved vessel trajectory compression algorithm based on Open Window was proposed. The following conclusions may be drawn from the experimental results:

• The compression rate increases significantly when the tolerance is increased slightly from 0, and it increases slowly when the tolerance is above a certain value. Compared with the DPTS algorithm (with the same tolerance), the compression rate increased by 21% to 31% on the port water datasets and by 10% to 15% on the coastal water datasets.

• The compression time becomes longer as the direction threshold increases. The running time of the proposed algorithm was significantly shorter than that of the DP algorithm, especially on the port water datasets. The running time of the DP method was 6 to 8 times that of the proposed algorithm on the port water datasets and 3 to 4 times on the coastal water datasets.

• The algorithm based on distance thresholds (DP) performed better with respect to position error. When the compression rate was close to 95%, the proposed method's average position error changed quickly; as such, we recommend that the compression rate not exceed 95%. The recommended threshold is 0.3 radian for port waters and 0.1 radian for coastal waters. Because the proposed algorithm incorporates a velocity threshold, it outperforms the other two algorithms in the retention of velocity information.
The results demonstrated that the proposed method can address the AIS trajectory compression problem, and that it greatly improves the processing time while, at the same time, preserving feature trajectory points.
However, in order to improve the general applicability of the algorithm to both online and offline data, we chose the Open Window algorithm for solving the compression problem. The algorithm can be further optimized when dealing with offline AIS trajectory data alone, which can achieve a better compression effect with the same compression threshold. In addition, Open Window is a heuristic algorithm, which cannot guarantee the global optimum; as such, how to find the optimal compression rate while guaranteeing the compression speed is a direction for future research. AIS trajectory compression based on direction thresholds does not perform well in terms of positional error; as such, how to reduce the positional error caused by compression, while retaining the advantages of direction thresholds in running time and in the recognition of direction changes, is also a potential research direction.

Figure 1. A running example of ship trajectory.

Figure 2. Examples of trajectory with a low-speed point. The gray box in the figure includes three trajectory points that are at small distances from each other, but the difference in direction of the trajectory segments is large.

Figure 3. Illustration of the speed error between the original speed and the speed from interpolation. The black dots represent the ship's speeds at the corresponding moments, and the red line represents the linear change of the speed from the starting point to the end point. After compressing the trajectory, the ship's speed at $t_i$ can be found through linear interpolation (the white circle on the red line). The speed error at $t_i$ is $v'_i - v_i$.

1. Conversion of geographical coordinates to the Mercator projection.
2. Direction and distance calculation.
3. Trajectory slices according to the report interval.
4. Trajectory anomaly processing.
5. Low-speed trajectory with Radial Distance.
6. Vessel trajectory compression by direction and speed thresholds based on Open Window.
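The first two steps of this pipeline can be sketched as follows. This is a minimal illustration that assumes a spherical Mercator projection with an Earth radius equal to the WGS 84 semi-major axis; the projection variant actually used is not specified here, and the function names and the direction convention (radians, measured counterclockwise from the x axis) are choices made for this example.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6378137.0  # assumed sphere radius (WGS 84 semi-major axis)

def to_mercator(lon_deg: float, lat_deg: float) -> Tuple[float, float]:
    """Project longitude/latitude in degrees to spherical Mercator x/y in metres."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lam
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y

def direction_and_distance(p: Tuple[float, float],
                           q: Tuple[float, float]) -> Tuple[float, float]:
    """Direction (radians from the x axis) and planar distance from p to q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return math.atan2(dy, dx), math.hypot(dx, dy)
```

Distances measured on the Mercator plane are stretched away from the equator, so a distance tolerance expressed in projected metres is not a true ground distance at higher latitudes.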

Figure 4. Reception interval statistics for neighboring AIS track points.

Algorithm 1: Trajectory slice algorithm
Input: A trajectory $T = \{p_0, p_1, \ldots, p_{n-1}\}$ with the same shipid and the tolerance $\epsilon_{timediff}$
Output: A trajectory array $L$ split by time interval

considers the position anomaly and velocity anomaly.
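The trajectory slice step can be sketched as follows. Only the input and output of Algorithm 1 are given, so the body below is an assumed interpretation: a time-ordered trajectory of one ship is split wherever the gap between consecutive reports exceeds the time tolerance. The point layout `(t, x, y, v)` and the name `max_gap` are hypothetical.

```python
from typing import List, Tuple

# Assumed point layout: (t, x, y, v) = time [s], projected x/y [m], speed.
Point = Tuple[float, float, float, float]

def slice_trajectory(points: List[Point], max_gap: float) -> List[List[Point]]:
    """Split a time-ordered trajectory where consecutive reports are too far apart."""
    slices: List[List[Point]] = []
    current: List[Point] = []
    for p in points:
        # Start a new slice when the reception interval exceeds the tolerance.
        if current and p[0] - current[-1][0] > max_gap:
            slices.append(current)
            current = []
        current.append(p)
    if current:
        slices.append(current)
    return slices
```

Each returned slice can then be compressed independently, so that a long reception gap is never bridged by a single approximating segment.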

Figure 5. Statistical results of the speed distribution in port waters. (a) Frequency distribution chart of the different speeds. (b) Cumulative frequency distribution chart of the different speeds.

Figure 6. Statistical results of the speed distribution in coastal waters. (a) Frequency distribution chart of the different speeds. (b) Cumulative frequency distribution chart of the different speeds.

Figure 7. Theoretical schematic of the radial distance method to deal with the low-speed AIS trajectory points. In (a), the distances from $p_1$ and $p_2$ to $p_0$ are less than $\epsilon_\rho$, so these two points are deleted. In (b), $p_3$ is the key point, and no points are deleted. In (c), $p_4$ is the key point, and the distance from $p_5$ is less than $\epsilon_\rho$, so $p_5$ is deleted. In (d), after the radial distance method, the remaining points are $p_0$, $p_3$, $p_4$, $p_6$.

Algorithm 2: Radial Distance algorithm
Input: A trajectory $T = \{p_0, p_1, \ldots, p_{n-1}\}$ with the same shipid and the tolerance $\epsilon_{RD}$
Output: A simplified trajectory by Radial Distance
key ← 0;
for i ← 1 to n − 1 do
    dis ← distance(T[key], T[i]);
    if dis < $\epsilon_{RD}$ then
        remove

Figure 8 illustrates the Open Window algorithm. The first window opened up to point $p_5$, thus making $p_4$ the cut point. The second window opened up to point $p_6$, thus making $p_5$ the cut point. The third window opened up to point $p_8$, thus making $p_7$ the cut point. The fourth window opened up to point $p_9$, thus making $p_8$ the cut point. Now, the window could open up to the last point. The original trajectory $p_1, p_2, \ldots, p_{10}$ can be approximated by $p_1, p_4, p_5, p_7, p_8, p_{10}$.
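The radial distance step can be sketched as follows. The pseudo-code of Algorithm 2 is incomplete here, so the sketch follows the behaviour described in the caption of Figure 7 and should be read as an assumed interpretation: a point closer than the tolerance to the current key point is dropped, and the first point at or beyond the tolerance becomes the new key point. The `(x, y)` layout and the name `eps_rd` are hypothetical, and the sketch filters whatever points it is given rather than selecting low-speed points itself.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # assumed (x, y) in projected metres

def radial_distance(points: List[Point], eps_rd: float) -> List[Point]:
    """Drop points that lie within eps_rd of the most recent key point."""
    if not points:
        return []
    kept = [points[0]]               # the first point is the initial key point
    for p in points[1:]:
        key = kept[-1]
        dis = math.hypot(p[0] - key[0], p[1] - key[1])
        if dis >= eps_rd:
            kept.append(p)           # far enough: keep it as the new key point
    return kept

# Layout mirroring Figure 7: p1, p2 are close to p0, and p5 is close to p4.
pts = [(0, 0), (1, 0), (2, 0), (10, 0), (20, 0), (21, 0), (30, 0)]
reduced = radial_distance(pts, 5.0)
```

For this layout the retained points are the ones at indices 0, 3, 4 and 6, which matches panel (d) of Figure 7.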

Figure 8. Theoretical schematic of the Open Window method to compress the AIS trajectory based on directional threshold.

Figure 9. The area where the experimental data are located. (a) The coastal waters in eastern Zhejiang. (b) The port area at Yingkou Port.

Figure 10. Comparison of the compression rate between the proposed method (DPTSM) and the DPTS method with the same tolerance. (a) Comparison of the rate in port waters. (b) Comparison of the rate in coastal waters.

Figure 11. Comparison of the running time between the proposed method (DPTSM) and the DPTS method with the same tolerance. (a) Comparison of the running time in port waters. (b) Comparison of the running time in coastal waters.

Figure 12. Comparison of the running time between the proposed method (DPTSM), the DPTS method, and the DP method with the same compression rate. (a) Comparison of the running time in port waters. (b) Comparison of the running time in coastal waters.

Figure 13. Comparison of the running time between the proposed method (DPTSM) and the DP method with the same compression rate (95%) for different data sizes. (a) Comparison of the running time in port waters. (b) Comparison of the running time in coastal waters.

Figure 14. Comparison of the average position error between the proposed method (DPTSM), the DPTS method, and the DP method with the same compression rate. (a) Comparison of the average position error in port waters. (b) Comparison of the average position error in coastal waters.

Figure 15. Comparison of the max position error between the proposed method (DPTSM), the DPTS method, and the DP method with the same compression rate. (a) Comparison of the max position error in port waters. (b) Comparison of the max position error in coastal waters.

Figure 16. Comparison of the average speed error between the proposed method (DPTSM), the DPTS method, and the DP method with the same compression rate. (a) Comparison of the average speed error in port waters. (b) Comparison of the average speed error in coastal waters.

Figure 17. Comparison of the max speed error between the proposed method (DPTSM), the DPTS method, and the DP method with the same compression rate. (a) Comparison of the max speed error in port waters. (b) Comparison of the max speed error in coastal waters.

Figure 18. Visual analysis of the port water datasets. (a) AIS trajectory after compression. (b) AIS trajectory before compression.

Figure 19. Visual analysis of the coastal water datasets. (a) AIS trajectory after compression. (b) AIS trajectory before compression.

Table 1. Class A ship-borne mobile equipment reporting intervals.

Table 2. The datasets used in our experiments.

Table 3. The results of the compression rate by the proposed algorithm and DPTS.