A Trajectory Ensemble-Compression Algorithm Based on Finite Element Method

Abstract: Trajectory compression is an efficient way of removing noise and preserving key features in location-based applications. This paper focuses on the dynamic compression of trajectories in memory, where the compression accuracy changes dynamically with the application scenario. Existing methods can achieve this by adjusting compression parameters. However, the relationship between the parameters and the compression accuracy of most of these algorithms is considerably complex and varies with different trajectories, which makes it difficult to provide a requested accuracy. We propose a novel trajectory compression algorithm based on the finite element method, in which the trajectory is treated as an elastomer and compressed as a whole using elasticity theory, so that trajectory compression can be thought of as deformation under stress. The compression accuracy can be determined by the magnitude of the stress applied to the elastomer. Compared with existing methods, the experimental results show that our method can provide more stable, data-independent compression accuracy under given stress parameters, with reasonable performance.


Introduction
In location-based applications [1], there is a considerable amount of positioning data from the sensors of vehicles, ships, and mobile phones. Taking Zhejiang Province of China as an example, 68 TB of vehicle trajectory data and 19 TB of ship trajectory data are generated every year, which complicates storage, processing, and analysis. Raw trajectories contain a lot of noise, some of which is caused by the signal drift of the sensors and some by the random movement of the moving object. Compressing the raw trajectory can eliminate noise, reduce storage occupation, and improve the efficiency of data queries and processing. In spatiotemporal data mining tasks, noise reduction is helpful for spatiotemporal pattern search in trajectories [2]. Different application scenarios have different requirements on the compression ratio and compression accuracy. A very high compression ratio can be achieved if a vehicle trajectory is compressed based on the road network and only the turning points are retained [3][4][5]. However, when we want to analyze the lane changes of vehicles, we need to reduce the compression ratio and improve the accuracy. For vessel trajectories, since there is no fixed road constraint, the compression accuracy changes dynamically with the requirement [6][7][8][9]: low accuracy suffices for ocean channel analysis, medium accuracy for the periodic analysis of vessels, and high accuracy for the short-term route prediction of a single vessel. In trajectory monitoring applications, there are similar dynamic requirements for the display of trajectories. When the display scale is small, a trajectory with low accuracy and a high compression ratio should be displayed. If the map window is zoomed in, then a higher-accuracy trajectory should be displayed as the window scale increases. Therefore, trajectories can no longer be compressed in a fixed way and stored in advance.
We call this kind of compression dynamic compression, whose accuracy changes dynamically with the requirement.
Trajectory compression tasks are traditionally divided into offline compression and online compression [10]. The former compresses a trajectory after all of it has been obtained, while the latter compresses the trajectory incrementally. In these methods, dynamic compression can be achieved by adjusting compression parameters. However, it is hard to predict what compression accuracy will result from a given compression parameter, and the results differ between trajectories. Our study approaches trajectory compression from another perspective, ensemble compression, which is inspired by a human cognitive ability: people can simplify trajectories without knowing the precise position of each point. As shown in Figure 1, people can draw simplified trajectories (the blue line is the raw trajectory and the red one is the simplified trajectory) by intuition, without accurate calculation of exact positions and distances. This is compression according to the features of the trajectory as a whole, and we call this kind of method ensemble compression. Inspired by this ability, we regard the trajectory as a physical entity, so that trajectory compression can be thought of as the deformation of an elastic object under uniform forces applied around it. The larger the deformation, the more points overlap due to mutual extrusion, which achieves the compression effect. In this research, we implement a trajectory ensemble compression algorithm based on finite element analysis, in which we integrate the main direction of the trajectory to achieve compression while preserving key features. The compression accuracy can be determined by the elastic parameters applied to the elastomer. Compared with existing methods, the experimental results show that our method can provide more stable, data-independent compression accuracy under given parameters, with reasonable performance.
This paper is organized as follows: the related research is summarized in Section 2. Section 3 provides a detailed statement of our algorithm. Section 4 presents the experimental analysis based on real data sets, Section 5 provides a discussion, and a conclusion is given in the final section.

Related Work
Various trajectory compression algorithms are surveyed in [10][11][12]. Based on our review, the existing methods can be divided into four categories: distance-based compression, gesture-based compression, map-constrained compression, and ensemble feature-based compression.

Distance-Based Compression
The distance-based compression algorithms analyze each point in the trajectory in turn and decide whether to keep it according to its location and its distance, direction, and other features relative to adjacent points.
The Douglas-Peucker algorithm [13] was the first widely used trajectory compression algorithm. The algorithm connects the start point and the end point of the original trajectory, and then calculates the distance from each point to this line. If the maximum distance exceeds a threshold, the trajectory is divided into two subsequences at that point, and the above process is performed recursively until no subsequence needs to be divided. A similar distance-based method is the Piecewise Linear Segmentation algorithm [14], in which the most deviating point is selected, threshold parameters determine whether to retain the point, and the process executes recursively. To improve the efficiency of distance-based algorithms, various refinements have been proposed [15][16][17][18].
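As a reference point for the methods surveyed here, the Douglas-Peucker recursion described above can be sketched in a few lines. This is an illustrative Python sketch, not an implementation evaluated in this paper:

```python
import math

def douglas_peucker(points, epsilon):
    """Recursively simplify a polyline: keep an interior point only if
    its perpendicular distance to the start-end chord exceeds epsilon."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy) or 1.0
    # Find the interior point farthest from the chord.
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        px, py = points[i]
        d = abs(dy * (px - x1) - dx * (py - y1)) / norm
        if d > dmax:
            dmax, idx = d, i
    if dmax > epsilon:
        # Split at the farthest point and recurse on both halves.
        left = douglas_peucker(points[: idx + 1], epsilon)
        right = douglas_peucker(points[idx:], epsilon)
        return left[:-1] + right  # drop the duplicated split point
    return [points[0], points[-1]]
```

For example, a nearly straight three-point trajectory collapses to its endpoints, while a sharp corner is preserved.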
Another class of distance-based algorithms supports online compression. Two online algorithms are proposed in [15,19]: the sliding window algorithm and the open window algorithm, which build a window over the point sequence, compress the trajectory inside it, and then repeat the process for the subsequent trajectory data. Ref. [20] proposes the Dead Reckoning algorithm, which predicts the next point from the points in the window and retains points that deviate greatly from the forecast. The Dead Reckoning algorithm is improved by [21], who also propose an algorithm called Squish trajectory compression, which completes the compression by deleting the points with the least information loss in a buffered window. On the basis of these methods, many algorithms have been developed to further improve performance and reduce complexity [22][23][24][25][26].
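The sliding window family works incrementally rather than recursively. A minimal sketch of the idea, with a simplified deviation test and function names of our own choosing, might look like:

```python
import math

def point_segment_dist(p, a, b):
    """Perpendicular distance from p to the line through a and b
    (falls back to point distance when a == b)."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    norm = math.hypot(dx, dy)
    if norm == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / norm

def sliding_window(points, epsilon):
    """Online compression: grow a window from an anchor point; when some
    buffered point deviates from the anchor-to-current chord by more
    than epsilon, emit the previous point and restart the window."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    anchor = 0
    for i in range(2, len(points)):
        if any(point_segment_dist(points[j], points[anchor], points[i]) > epsilon
               for j in range(anchor + 1, i)):
            kept.append(points[i - 1])
            anchor = i - 1
    kept.append(points[-1])
    return kept
```

A straight run of points reduces to its two endpoints, and a right-angle turn keeps the corner point.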

Gesture-Based Compression
Distance-based compression methods are greatly affected by the choice of threshold parameters, and improved methods introduce gesture information to compensate for this deficiency.
Ref. [27] proposes a trajectory compression method that predicts the next point based on the speed and direction of historical data and removes the accurately predicted points. The method is suitable for trajectories with high sampling density.
Stop points, similarity points, and turn points are also important semantic information that can be used in trajectory compression. For RFID location data, Ref. [28] realizes data compression by merging and closing identical location points of different trajectories, and Ref. [29] designs a lossy compression strategy to collapse RFID tuples, which contain information on items delivered at different locations.
Ref. [30] introduces sampling information on the time series. Ref. [31] introduces the main direction of the trajectory to remove noise. In [32], speed information and stop points are introduced to improve compression efficiency. Ref. [33] preserves key points according to the velocity and direction data in the trajectory and realizes data compression with these key points as constraints.

Map-Constrained Compression
Road information can improve the compression ratio, and a large number of trajectory compression methods based on road constraints have been proposed. The map-matching trajectory compression problem was first proposed by [4]: the combined problem of compressing a trajectory while matching it to the underlying road network. The study gives a formal definition of the map-matching compression problem, proposes two naive methods, and then designs improved online and offline algorithms. A path pruning simplification method is proposed by [34], which divides the trajectory simplification process into an edge candidate set stage, a path-finding stage, and a path-refining stage. In the first stage, multiple candidate matching edges are obtained. In the second stage, road matching is performed for each trajectory position with the assistance of the driving direction. In the third stage, the algorithm prunes the path tree and preserves the positions in the trajectory where the direction changes. The algorithm runs on mobile devices, which makes network transmission and central processing more efficient. The compressed points are selected by [5] according to the road network. Ref. [3] proposes a similar map-matching system and implements a trajectory compression algorithm called Heading Change Compression.

Ensemble Compression
Ensemble compression means that, when a trajectory is compressed, other geometric information related to the trajectory is taken into account, such as other similar trajectories, the boundary region of the trajectory, or the space-transformed trajectory.
Compared with distance-based compression algorithms, there are few methods based on ensemble features. Ref. [35] compresses trajectories using convex hulls. The authors establish a virtual coordinate system with the starting point as the origin and a rectangular boundary around the trajectory, and draw two boundary lines in each quadrant according to the direction of the trajectory. The rectangle and boundary lines form a convex hull, and the coordinate points within this constraining convex hull are compressed.
Ref. [36] designs a trajectory similarity measurement method based on interpolation, in which the adopted method is similar to clustering. For each trajectory, a similar reference trajectory is found, and only the points that differ from the reference are retained, while similar points are removed.
In [37], a contour preserving algorithm for trajectory compression is proposed, which can compress the trajectory and keep the contour of the trajectory as much as possible. The algorithm divides the trajectory into multiple open windows, determines the main direction of each open window, and then compresses the trajectory points that deviate from the main direction.
Ref. [38] clusters all of the locations, matches the clustering centers on the road network, and searches for semantic events on the trajectory, such as parking, road switching, and destination arrival, removing random noise by preserving only semantic information points.
Ref. [39] regards trajectories as time series, establishes linear equations of time and position, and maps the positions into the parameter space of the equations by the Hough transformation. Compression is achieved by reducing the three-dimensional data to the Hough space, in which the number of dual points is less than the number of points in the original trajectory.

Preliminary
We give basic concepts related to the algorithm, in which Definitions 1 and 2 describe the input of our algorithm, Definition 3 describes the output, and Definitions 4-9 are the evaluation indices.
Definition 1 (raw trajectory). The raw trajectory can be regarded as a sequence of locations (x_i, y_i) and attributes, as shown in (1). The attributes are only the speed (s_i) and direction (d_i) of the trajectory.
Definition 2 (main direction). A trajectory can be divided into several segments according to its driving direction. The main direction of a vehicle is the direction of its road, while the main direction of a vessel trajectory is fuzzy; different references give different definitions [6][7][8]. Our compression method makes no distinction between vessels and vehicles, and no road network is used, so the main direction is obtained from the raw trajectory alone. Its general definition is shown in (2).
p_i is the i-th point in the trajectory, length is the distance function, and direction is the azimuth function. According to Equation (2), the main direction of a segment composed of n points is the average of the directions of its parts, weighted by the length of each part.
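The length-weighted main direction of Equation (2) can be computed directly. The sketch below uses a north-based azimuth in degrees and naively averages angles, which, as a caveat not discussed in the paper, is ill-defined for directions that wrap around 0/360 degrees:

```python
import math

def main_direction(points):
    """Length-weighted mean azimuth of a trajectory segment, following
    Equation (2): sum(length_i * direction_i) / sum(length_i).
    Azimuth convention: 0 deg = north (+y), 90 deg = east (+x)."""
    num = den = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        length = math.hypot(x2 - x1, y2 - y1)
        azimuth = math.degrees(math.atan2(x2 - x1, y2 - y1)) % 360
        num += length * azimuth
        den += length
    return num / den if den else 0.0
```

For instance, a segment of length 2 heading north followed by a segment of length 1 heading east yields (2*0 + 1*90)/3 = 30 degrees.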

Definition 3 (simplification and approximation).
There are two types of trajectory compression tasks: simplification and approximation. Simplification means that, given a trajectory, a subsequence of the trajectory is generated, as shown in Figure 2A. Approximation means generating a new sequence in which the two endpoints are the same as the original, as shown in Figure 2B.
Figure 2. (A) simplification; (B) approximation.
Compression efficiency refers to the number of bytes that can be compressed per unit time.
Definition 6 (compression accuracy). The DTW algorithm [40] is used to calculate the trajectory distance before and after compression, which reflects the degree of their dissimilarity. The compression accuracy is defined as 1 minus this distance divided by the maximum DTW distance, which is the distance when the maximum compression occurs (only preserving the start and end points). The compression accuracy is between 0 and 1.
Definition 8 (length ratio). The ratio of the sum of the lengths between adjacent points of the simplified trajectory to the sum of the lengths between adjacent points of the original trajectory.
Definition 9 (curvature ratio). Curvature is the sum of the angles between segments; the curvature ratio is the ratio of the angle sum after simplification to the angle sum of the original trajectory.

Algorithm Description
The input of the Ensemble-Compression algorithm includes the original trajectory and a set of elastic parameters, and the output is either of the two compression results, although in real applications the focus is on simplification. Algorithm 1 is shown in the following.

Discretization
The first step after initialization is to discretize the trajectory together with its bounding rectangle. We mesh the minimum bounding rectangle (line 3), merge the mesh nodes and trajectory points (line 4), and then divide the elastomer into small units (line 5), as shown in Figure 3. We use Delaunay triangulation [41] and the two-dimensional advancing front technique (AFT) [42] to complete the discretization.
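The meshing-and-merging step (lines 3-4 of Algorithm 1) can be sketched as below. The grid resolution parameters nx/ny and the duplicate-dropping rule are assumptions for illustration; an actual Delaunay triangulation of the merged node set (e.g., via a geometry library such as scipy) would then produce the triangular elements:

```python
def mesh_nodes(points, nx, ny):
    """Build a regular nx-by-ny grid over the trajectory's minimum
    bounding rectangle and merge it with the trajectory points
    themselves, dropping exact duplicates while keeping order."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    grid = [(xmin + i * (xmax - xmin) / (nx - 1),
             ymin + j * (ymax - ymin) / (ny - 1))
            for i in range(nx) for j in range(ny)]
    seen, merged = set(), []
    for p in grid + list(points):
        if p not in seen:
            seen.add(p)
            merged.append(p)
    return merged
```

For a two-point trajectory and a 2x2 grid, the grid corners coincide with the points, so only four nodes remain; an interior point adds a fifth node.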

Element Analysis
The main tasks of element analysis are to generate the element stiffness matrix and to solve the element matrix equation.
The element stiffness equation represents the relationship between the stress and the displacement of a triangular element, so that we can calculate the displacement of any point in the element given the magnitude and direction of the stress. For a single triangular element, Equation (3) shows the stiffness equation [41].
where t is the thickness of the element, θ is the element's area, and δ^(e) is the element displacement array, as shown in Equation (4). The three nodes of each triangular element after triangulation (Figure 3) are coded as i, j, and m. We take counterclockwise as the forward direction and establish the element displacement array.
F^(e), the stress column matrix of each node, is shown in Equation (5).
D is the elastic matrix, as shown in Equation (6), in which E is the modulus of elasticity and u is Poisson's ratio.
B_i, B_j, and B_m are the strain submatrices of the element nodes, as defined in Equation (7) [41], in which c_x and b_x are the stress coefficient constants.
Equation (4) gives the displacements of the three nodes i, j, and m of the element under the stress, while the displacement of any point (x, y) in the triangular element can be obtained by solving Equation (8).
The six coefficients in the formula can be obtained by the positions and displacements of nodes i, j, and m.
The parenthesized term in Equation (3) is referred to as the stiffness matrix. An entry of the stiffness matrix is the stress that must be applied to a node of the element so that this node has unit displacement while the others are zero.
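For a concrete picture of Equations (3), (6), and (7), the 6x6 stiffness matrix K = t*A*B^T D B of a single plane-stress linear triangle (A being the element area) can be assembled as follows. This is a pure-Python sketch following the standard constant-strain-triangle formulas; the variable names are ours, not the paper's:

```python
def element_stiffness(xy, E=2.0, nu=0.2, t=1.0):
    """6x6 plane-stress stiffness matrix K = t*A*B^T D B for a linear
    triangle with nodes i, j, m given counterclockwise. Defaults match
    the elastic modulus and Poisson's ratio used in the experiments."""
    (xi, yi), (xj, yj), (xm, ym) = xy
    A = 0.5 * ((xj - xi) * (ym - yi) - (xm - xi) * (yj - yi))  # signed area
    b = [yj - ym, ym - yi, yi - yj]   # b_i, b_j, b_m coefficients
    c = [xm - xj, xi - xm, xj - xi]   # c_i, c_j, c_m coefficients
    # Strain-displacement matrix B (3x6), one column pair per node.
    B = [[0.0] * 6 for _ in range(3)]
    for k in range(3):
        B[0][2 * k] = b[k] / (2 * A)
        B[1][2 * k + 1] = c[k] / (2 * A)
        B[2][2 * k] = c[k] / (2 * A)
        B[2][2 * k + 1] = b[k] / (2 * A)
    # Plane-stress elastic matrix D of Equation (6).
    f = E / (1 - nu * nu)
    D = [[f, f * nu, 0.0], [f * nu, f, 0.0], [0.0, 0.0, f * (1 - nu) / 2]]
    # K = t * A * B^T D B
    DB = [[sum(D[r][s] * B[s][col] for s in range(3)) for col in range(6)]
          for r in range(3)]
    return [[t * A * sum(B[s][row] * DB[s][col] for s in range(3))
             for col in range(6)] for row in range(6)]
```

Two sanity properties follow from the theory: K is symmetric, and a rigid-body translation of all three nodes produces zero nodal forces.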
Suppose that the whole is divided into m elements and n nodes; then, the overall node displacement δ and the overall stress matrix F are both 2n × 1 matrices. Equation (9) shows the overall equilibrium equation of the triangular element analysis. The preconditioned conjugate gradient method [43] is used to solve the stiffness matrix equation, and SSOR [43] is chosen as the preconditioner.
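The core loop of the conjugate gradient solver used for Equation (9) is short. The sketch below is the unpreconditioned variant on a small dense matrix; the paper uses the preconditioned method with an SSOR preconditioner on the sparse assembled system, which amounts to replacing the residual r with M^-1 r in the same loop:

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Plain conjugate gradient for a symmetric positive-definite
    system A x = b (here: K * delta = F), starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)            # residual b - A x with x = 0
    p = list(r)            # initial search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x
```

In exact arithmetic the method converges in at most n iterations, which is why the maximum iteration count of 100 used in the experiments is generous for small systems.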

Semantic Polymerization
The displacement of each point can be obtained using the above method. The stress causes spatial competition among the trajectory points. A simplified trajectory can be obtained by screening a subset of trajectory points through a threshold, and an approximate trajectory can be obtained by taking the displaced subset directly.
Based on the method shown in Section 3.4, the trajectory point displacements can be obtained (line 8). We calculate the distance to the adjacent point after displacement (line 9), normalize all of the distances (line 9), and then sort them in ascending order. The points whose distance is less than the percentile threshold after sorting become candidate filter points (lines 10-11). Some points with key semantic information, such as direction or speed, would be deleted if only the distance threshold were used, so we implement a trajectory segmentation method based on the main direction. When two points compete due to close distance, we keep those that deviate from the main direction, as well as stop points. The algorithm is shown in the following.
The input of Algorithm 2 is the trajectory sequence defined by Equation (1), in which each point is a five-dimensional array consisting of coordinates, time, speed, and direction. Algorithm 2 segments the trajectory according to the main direction of Equation (2). The algorithm adds four dimensions to the initial five. The sixth dimension records the length from the end of the last segment to the current point, which is the denominator of Equation (2). The seventh dimension records the product of the direction from the previous point to the current point and the sixth dimension, which is the numerator of Equation (2). The eighth dimension records the ratio of the seventh dimension to the sixth dimension, which is the main direction obtained by taking the current point as the splitting point. The ninth dimension records the difference of the eighth dimension between the current point and the previous point, that is, the deflection of adjacent main directions. We take the position with the largest difference of main direction as the candidate splitting point. If the averages of the main directions on the two sides of the candidate point differ greatly, then the point is regarded as the end point of a new segment.
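Condensing the sixth-to-ninth bookkeeping dimensions of Algorithm 2, a sketch of the candidate-split search might read as follows. This is illustrative Python; the final averaging test on the two sides of the candidate is omitted, and the naive angle averaging shares the 0/360-degree wrap-around caveat noted for Equation (2):

```python
import math

def candidate_split(points):
    """Return the index of the candidate splitting point: the point
    where the running length-weighted main direction of Equation (2)
    deflects most between consecutive prefixes."""
    cum_len = 0.0          # sixth dimension: accumulated length
    cum_wdir = 0.0         # seventh dimension: length-weighted direction
    prev_main = None
    best_idx, best_diff = 1, -1.0
    for k in range(1, len(points)):
        (x1, y1), (x2, y2) = points[k - 1], points[k]
        seg = math.hypot(x2 - x1, y2 - y1)
        az = math.degrees(math.atan2(x2 - x1, y2 - y1)) % 360
        cum_len += seg
        cum_wdir += seg * az
        main = cum_wdir / cum_len if cum_len else 0.0  # eighth dimension
        if prev_main is not None:
            diff = abs(main - prev_main)               # ninth dimension
            if diff > best_diff:
                best_diff, best_idx = diff, k
        prev_main = main
    return best_idx
```

On an L-shaped trajectory heading north and then east, the candidate lands on the first point after the corner, where the running main direction deflects most.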

Experimental Setup
We select the GPS data of taxis in Shanghai [44] and the AIS data of vessels crossing the East China Sea [7] as the experimental data. The taxi data set includes the 24-h trajectories of 4310 taxis with an average sampling interval of 15 s. The vessel data set consists of the 120-h trajectories of 10,927 vessels with an average sampling interval of 10 s.
We compare our algorithm with two baselines [36,45]: the former is an offline compression algorithm (named OVTC), and the latter is an online compression algorithm (named SPM). In [36], various kinds of gesture information are considered in trajectory compression, such as static points, turn points, speed change points, and break points, so the method has many compression parameters, and the optimal interval of these parameters is given. In [45], the sliding window that is popular in online compression is improved, and a dynamically changing reference point is introduced to improve the compression efficiency. The sliding window size and the threshold distance are the key parameters of the algorithm.
In the experiments, we fixed some parameters: an elastic modulus of 2, Poisson's ratio of 0.2, a mass density of 1.15, a maximum iteration number of 100, an error threshold of 10^-6, and a relaxation factor of 1.
The percentiles in the experiment are 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, and 0.95. The external force factors (between 0 and 1) are 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1, and the direction of the force points to the center of gravity of the trajectory. The effects of different parameters on the compression indicators were observed. The experimental environment is an Intel(R) Core(TM) i5 processor with 4 GB memory running Mac Darwin Kernel Version 17.7.0, and the development language is Matlab R2019.

Comparative Study
The first concern is whether stable and data-independent compression accuracy can be obtained for a specific combination of parameters.
In Figure 4, the horizontal axis indexes the 250 groups of parameters, and the vertical axis is the standard deviation of the accuracy obtained by compressing all of the trajectories with the corresponding parameters. The algorithm OVTC has the greatest uncertainty between compression parameters and accuracy, which may be due to the simultaneous use of direction, velocity, and distance thresholds. The algorithm SPM has both low standard deviations (left half) and high standard deviations (right half). The finite element method has a small standard deviation of accuracy under all parameters, which means that, compared with the other two methods, it obtains a predictable accuracy under given compression parameters. Although the standard deviations differ, the average accuracies of the three algorithms are very close: OVTC is 0.95159, SPM is 0.96056, and FEM is 0.943888.
Compression efficiency is another concern. We calculate the minimum, average, and maximum compression rates of the three algorithms under all parameters, as shown in Table 1. Our method has a lower compression rate than the other two. The reason is that the finite element based method needs to solve a system of equations, while the other two methods only filter point by point. The average compression rate of FEM shown in Table 1 is 243.056 kbps, which can meet the needs of the real application scenarios described later (Section 4.2.5). The concurrent service and caching technologies in modern applications also make up for the deficiency in compression rate.

Influence of Percentile and Stress Factor on Compression Ratio and Compression Rate
The two data sets are compressed with different parameters, and Figure 5 shows the results.
Figure 5. (A) Taxi data set; (B) Vessel data set.
Figure 5 shows that the compression ratio increases with the increase of the external force factor and with the decrease of the percentile. The influence of the percentile on the compression ratio is more obvious: when the stress factor remains unchanged, varying the percentile can change the compression ratio by 0.68. Figure 6 shows the effect of different parameters on the compression rate. Different from the compression ratio, the compression rate is affected by both the stress factor and the percentile, and it is close to its maximum value when the stress factor is 0.6.
Figure 6. (A) Taxi data set; (B) Vessel data set.

Compression Ratio and Compression Error
We use the DTW algorithm [40] to calculate the distance of the trajectory before and after compression, which reflects the degree of compression error. The normalized distance can be considered as the relative error ratio caused by compression, that is, the ratio of each DTW distance to the maximum DTW distance. Figure 7 shows the influence of various parameters on the compression error: the error ratio increases with the increase of the percentile. The relationship between the compression ratio and the error ratio was also observed. The stress factor was fixed at 0.6 to observe the change of the error ratio with the compression ratio, as shown in Figure 8. For every 1% increase in the compression ratio, the error ratio increases by 0.47%.

The Influence of Different Parameters on Other Indicators
We study the effect of the algorithm on the length ratio and the curvature ratio. In the experiment, we fixed the stress factor at 0.6, and the results are shown in Figure 9.
Figure 9. The correlation between length ratio, curvature ratio, and compression ratio.
As the compression ratio decreases, the length ratio and curvature ratio increase. Within a compression ratio of [20%, 40%], the length ratio and the curvature ratio are maintained at a high level, which conforms to the geometric significance of simplification [46]. It can also be seen that, even when the compression ratio is small, the length ratio and the curvature ratio remain at a high level, and the distortion of the resulting curve is within a reasonable range.

Application Scenarios
We develop a standalone daemon service that is responsible for the management of multi-source spatial data, including vessel trajectories, vehicle trajectories, and RFID, and that provides data query interfaces to multiple third-party applications. These third-party applications include online management systems, safety early warning systems, waterway management systems, etc. The data query service is required to be generic and application independent. Figure 10 shows the system overview. The trajectory compression service is one of the core services in the system; it realizes concurrent processing and puts compressed trajectories into an LRU-based cache for performance.
These applications submit the moving object id, the time period, and the trajectory precision (between 0 and 1), and the service returns the required trajectory. The precision here is interpreted as the accuracy of our method; that is, the compressed trajectory composed of only the begin and end points of the raw trajectory has the minimum accuracy, and the raw trajectory has the highest accuracy. Through the previous experiments, we can obtain the corresponding compression parameters for each accuracy interval, so that we can select the appropriate parameters to implement the compression process. Figure 11 shows example results for the same trajectory with different precisions in different applications: (A) is the trajectory with a compression ratio of 0.53 at a map scale of 1:200, and (B) is the trajectory with a compression ratio of 0.26 at a scale of 1:500 (zoomed to 1:200 to make it the same size as (A)). The larger the display scale, the smaller the compression ratio, and the more detail can be shown.
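The precision-to-parameter lookup performed by the service can be illustrated as below. The table values are placeholders, not the calibrated intervals from the experiments, and lru_cache merely stands in for the LRU trajectory cache of Figure 10:

```python
from functools import lru_cache

# Hypothetical precision -> (percentile, stress factor) lookup table.
# The concrete values are illustrative placeholders, assumed to be
# calibrated from experiments like those in Section 4.
PARAM_TABLE = [
    (0.25, (0.95, 0.9)),   # low precision: aggressive compression
    (0.50, (0.75, 0.6)),
    (0.75, (0.45, 0.3)),
    (1.01, (0.25, 0.1)),   # high precision: light compression
]

@lru_cache(maxsize=1024)
def params_for_precision(precision):
    """Map a requested precision in [0, 1] to compression parameters;
    repeated requests for the same precision hit the cache."""
    for upper, params in PARAM_TABLE:
        if precision < upper:
            return params
    return PARAM_TABLE[-1][1]
```

A request with precision 0.1 then maps to the most aggressive parameter pair, while a request with precision 1.0 maps to the lightest one.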

Discussion
Although the algorithm can meet the needs of the above applications, it also has some open problems. The most important one is that it is actually a fuzzy compression strategy: it does not accurately determine whether each point is noise. Therefore, it cannot be applied to safety-critical areas without strict theoretical proof. Another problem is that it only considers the main direction, without speed and road network data. In the implementation details, the stress on each triangular element is the same, and the direction always points to the geometric center of the trajectory. It is not clear whether the result would differ if the stress magnitude and direction changed with the point distribution density. All of these need to be studied further in the future.

Conclusions
In this work, a novel trajectory compression algorithm based on the finite element method is proposed, in which the trajectory is regarded as an elastomer that deforms under external forces, and the trajectory is compressed using elasticity theory. A main direction segmentation algorithm is combined with it to achieve compression while preserving key position information. The experiments show that our method can provide a more stable, data-independent compression ratio under given stress parameters.
Accuracy is the only parameter selection basis of current compression algorithms, which is far from enough for rich practical applications. The ensemble compression algorithm based on the finite element method is only a preliminary attempt to realize a dynamic compression service, and providing customized trajectories, rather than a fixed compression method, should become an important research direction in related fields.
Author Contributions: Writing-draft, review and editing, Haibo Chen; software, Xin Chen. All authors have read and agreed to the published version of the manuscript.