Algorithm for Trajectory Simplification Based on Multi-Point Construction in Preselected Area and Noise Smoothing Processing

Huang, Simin; Yang, Zhiying

doi:10.3390/data9120140


by Simin Huang and Zhiying Yang *

College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China

* Author to whom correspondence should be addressed.

Data 2024, 9(12), 140; https://doi.org/10.3390/data9120140

Submission received: 31 October 2024 / Revised: 26 November 2024 / Accepted: 28 November 2024 / Published: 29 November 2024

(This article belongs to the Special Issue IoT and Big Data Applications in Smart Cities: Recent Advances, Challenges, and Critical Issues)


Abstract

Simplifying trajectory data improves the efficiency of trajectory analysis and querying and reduces the communication cost and computational overhead of handling such data. This paper proposes SSFI, a real-time trajectory simplification algorithm based on the spatio-temporal feature information of implicit trajectory points. The algorithm constructs a preselected area using an error measurement method based on the feature information of implicit trajectory points (IED), also proposed in this paper, predicts where upcoming trajectory points will fall, and thereby realizes a one-way error-bounded simplification algorithm. Experiments show that the simplification algorithm improves markedly in three respects: running speed, compression accuracy, and simplification rate. When the trajectory data scale is large, the algorithm performs much better than other line segment simplification algorithms. Because GPS error cannot be avoided, the trajectory is smoothed with a Kalman filter, which effectively reduces the influence of noise and significantly improves the performance of the simplification algorithm. Based on the characteristics of trajectory data, this paper constructs a mathematical model that describes the motion state of objects, so that the Kalman filter outperforms other filters when smoothing trajectory data. A smoothing experiment is carried out by adding random Gaussian noise to the trajectory data; it shows that the Kalman filter under this model performs better than the other filters.

Keywords:

trajectory simplification; data compression; spatio-temporal features; real-time algorithm; bounded error

1. Introduction

According to the most recent report, global sales of GPS sensors reached CNY 809.32 billion in 2021 and are projected to increase to CNY 1961.87 billion by 2028. As the number of sensors supporting location-aware capabilities and data storage increases, the volume of trajectory data generated, stored, and transmitted by these devices is growing substantially. Consequently, researchers are confronted with challenges including low transmission efficiency of trajectory data, high costs, limited storage capacity, and extended query and analysis times. These challenges are particularly acute for lightweight sensors. For instance, based on the methodology outlined in [1], it is assumed that the GPS system requires ten seconds to calculate each trajectory point, and the memory capacity of each GPS unit is sufficient to record the trajectories of only 1000 moving objects over a period of 5 days. Moreover, trajectory data often contain a significant number of redundant points, which exacerbate the burden on data network transmission, particularly for lightweight locators. The presence of redundant trajectory points results in longer upload times and increased power consumption. Furthermore, improving efficiency in querying or analyzing trajectory data is challenging for researchers, and the difficulty of extracting valuable information from large volumes of redundant data is significantly heightened. These issues also constrain the advancement of location-based services [2].

To address the aforementioned challenges, researchers aim to streamline the original trajectory by eliminating redundant points that carry minimal feature information, thereby reducing data size while enhancing the clarity of trajectory features. Based on the existing research literature, trajectory simplification methods are technically classified into three main categories. The first category is the line simplification method, which seeks to minimize the number of trajectory points while maintaining an acceptable error margin. This method is further subdivided into offline simplification and real-time simplification. The second category is map-matching-based simplification, which maps trajectory points onto road segments and represents the original trajectory through the road network structure, thus reducing data volume. The third category is semantic simplification, which transforms the original trajectory into a sequence of Points of Interest (POI) to convey its meaning more comprehensibly [3].

Prominent line simplification algorithms include Douglas–Peucker [4], TD-TR [5], FBQS [6], BQS [7], and OPERB [8]. The map-matching and semantic approaches are orthogonal to line simplification-based methods, and the two may be combined to further improve the effectiveness of trajectory compression.

Due to the constraints of current sensor sampling frequencies, the recorded trajectory comprises discrete points; however, the actual movement trajectory of the object is continuous. Existing trajectory simplification algorithms are based on the premise that trajectories consist of discrete points. Nonetheless, unrecorded trajectory points also contain significant feature information. While these unrecorded points cannot be directly analyzed, leveraging their spatial and temporal characteristics to aid in the simplification of recorded points is a promising approach. Given the potential to reconstruct these implicit trajectory points and utilize their feature information, this paper proposes a novel trajectory simplification algorithm. The characteristics of the proposed algorithm are as follows:

(1): Unlike other simplification algorithms that utilize only the feature information of recorded trajectory points, this algorithm also considers the influence of inflection points and approximately restores unrecorded implicit trajectory points, using their feature information to aid in simplification.
(2): By defining a preselected area and predicting the landing point of the trajectory, a one-way error-bounded simplified trajectory is achieved.
(3): A novel error measurement method is defined to quantify the deviation between the original and simplified trajectories.
(4): The Kalman filter is used to smooth the trajectory, which effectively reduces the influence of noise on the performance of the simplified algorithm and obtains the most realistic performance index of the algorithm.
(5): This algorithm offers high compression rates and efficient real-time simplification while effectively preserving the original trajectory’s contour.

2. Related Work

Initially, the Douglas–Peucker (DP) algorithm [4], proposed by Douglas and Peucker, utilized Euclidean distance as the key feature. It recursively identifies the point with the maximum Euclidean distance and includes it in the simplified trajectory sequence, continuing this process until no trajectory point exceeds a predefined distance threshold. Although straightforward to implement, the algorithm is computationally intensive and fails to preserve the trajectory features adequately.

Subsequently, Xue et al. incorporated temporal feature information to enhance simplification and introduced the TD-TR [5] algorithm, effectively addressing the temporal limitations of the DP algorithm. This algorithm replaces Euclidean distance with Synchronous Euclidean Distance (SED) as the discriminant criterion and ingeniously incorporates timestamp feature information into the simplification process thereby preserving the temporal characteristics of the trajectory. The formula is as follows:

$$SED_{k} = \sqrt{(x_{k} - x_{k}')^{2} + (y_{k} - y_{k}')^{2}}$$

where i and j are the indexes of the trajectory points in the simplified trajectory with i < j, and k is the index of any trajectory point between i and j in the original trajectory. The coordinates $x_{k}'$ and $y_{k}'$ are given by $x_{k}' = x_{i} + \frac{t_{k} - t_{i}}{t_{j} - t_{i}} (x_{j} - x_{i})$ and $y_{k}' = y_{i} + \frac{t_{k} - t_{i}}{t_{j} - t_{i}} (y_{j} - y_{i})$.
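The SED formula above can be sketched in a few lines of Python. The function name `sed` and the `(x, y, t)` tuple layout are assumptions made for this illustration; they are not taken from the TD-TR paper.

```python
import math


def sed(p_i, p_j, p_k):
    """Synchronous Euclidean Distance of p_k relative to the segment p_i -> p_j.

    Each point is an (x, y, t) tuple (an assumed layout); t_i != t_j is required.
    """
    x_i, y_i, t_i = p_i
    x_j, y_j, t_j = p_j
    x_k, y_k, t_k = p_k
    # Fraction of the time interval [t_i, t_j] elapsed at t_k.
    ratio = (t_k - t_i) / (t_j - t_i)
    # Time-synchronized position (x_k', y_k') on the simplified segment.
    x_sync = x_i + ratio * (x_j - x_i)
    y_sync = y_i + ratio * (y_j - y_i)
    return math.hypot(x_k - x_sync, y_k - y_sync)
```

Note that a point lying exactly on the segment can still have a non-zero SED if it was recorded earlier or later than uniform motion along the segment would predict.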

Based on various trajectory features, such as speed [9], angle, and inflection points [10], corresponding trajectory simplification algorithms have been developed. While these algorithms excel in analyzing their specific feature information, they perform optimally only for single-feature data. This paper proposes a novel trajectory simplification approach that leverages feature information by approximately restoring unrecorded trajectory points thereby using their feature information to assist in simplifying the recorded points.

New trajectory simplification algorithms continue to be built on the above feature information. Ji Gu [11] proposed the concept of a safe region using two pieces of feature information, angle and Euclidean distance, and then screened out redundant trajectory points through the safe region to simplify the trajectory, as illustrated in Figure 1. However, the definition of the safe region depends entirely on a single trajectory point of the current input. When subsequent trajectory points fall within this region, it only indicates that the simplification condition is satisfied for the trajectory point used to construct the safe region; the result may not hold for the entire trajectory. Therefore, we argue that this approach is prone to greedy behavior. To solve this problem, this paper proposes the concept of a preselected area in Section 3.3 and constructs the preselected area from multiple trajectory points.

This paper aims to develop a real-time trajectory simplification algorithm optimized for lightweight positioning sensors. The algorithm performs real-time simplification of the original trajectory, minimizing storage space requirements and power consumption, while maximizing the use of feature information. When applied to an analysis, the simplified trajectory ensures that each point retains maximal feature information thereby reducing data size and enhancing data analysis efficiency.

The rest of this paper is organized as follows: We present a new accuracy loss metric, IED, and the compression algorithm SSFI in Section 3. We then construct the Kalman filter to smooth trajectory data in Section 4. Section 5 shows the detailed experimental results and the corresponding analysis. Section 6 reviews the related works and concludes our work.

3. Trajectory Simplification Algorithm SSFI

3.1. Preliminaries

This paper addresses the challenge of real-time trajectory simplification for lightweight locators by constructing a preselected area through the abstraction of trajectory point feature information thereby achieving a one-way error-bounded simplified trajectory. Considering the time and space complexity of the algorithm, the primary objectives are to achieve a lower simplification rate, enhanced simplification accuracy, and reduced execution time. The following section provides a description of the proposed real-time simplification framework.

Definition 1.

Original trajectory: In Euclidean space, the trajectory sequence composed of the position information of all trajectory points. It is denoted by $P[1:n] = \{p_{1}, p_{2}, p_{3}, \dots, p_{n}\}$, where $P[i] = p_{i}(x_{i}, y_{i}, t_{i})$ means that the trajectory point was recorded in the data set at longitude $x_{i}$ and latitude $y_{i}$ at time $t_{i}$. The original trajectory $P[1:n]$ can be understood graphically as a piecewise linear line LP that connects all trajectory points in turn.

Definition 2.

Simplified trajectory: In Euclidean space, the trajectory sequence obtained by applying the trajectory simplification algorithm to the original trajectory. It is denoted by $S[1:m] = \{s_{1}, s_{2}, s_{3}, \dots, s_{m}\}$, where m ≤ n.

Definition 3.

Direction and direction difference: In the trajectory sequence of the original trajectory, the direction of a line segment $p_{i-1} p_{i}$ is defined as the angle through which the x-axis must be rotated counterclockwise to become parallel to the segment. It is recorded as $θ_{i}(p_{i-1} p_{i})$, and its value range is $[0, 2π)$. The direction difference between two trajectory segments is defined as $Δθ(θ_{1}, θ_{2}) = \min\{|θ_{1} - θ_{2}|, 2π - |θ_{1} - θ_{2}|\}$. Figure 2b,c shows the two cases, in which $Δθ(θ_{1}, θ_{2})$ equals $|θ_{1} - θ_{2}|$ and $2π - |θ_{1} - θ_{2}|$, respectively.
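As an illustration of Definition 3, the direction and direction difference can be computed as below. This is a minimal sketch; the function names and the use of `(x, y)` tuples are assumptions of this example.

```python
import math


def direction(p, q):
    """Direction of the segment p -> q, measured counterclockwise from the x-axis.

    The result is reduced modulo 2*pi so that it falls in [0, 2*pi).
    """
    return math.atan2(q[1] - p[1], q[0] - p[0]) % (2 * math.pi)


def direction_difference(theta_1, theta_2):
    """Smaller of the two angles between directions theta_1 and theta_2."""
    gap = abs(theta_1 - theta_2)
    return min(gap, 2 * math.pi - gap)
```

The `min` handles the wrap-around case: directions of 0.1 and 2π − 0.1 radians differ by 0.2 radians, not by 2π − 0.2.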

Definition 4.

Inflection points: In the original trajectory sequence, when the direction difference between two adjacent line segments exceeds the inflection point threshold set by the user, the common endpoint of the two segments is recorded as an inflection point.

Definition 5.

PED error: Given the original trajectory P[s:e] (s < e) and the corresponding line segment $p_{s} p_{e}$, that is, $S\{s_{s}, s_{e}\}$, for any discarded point $p_{m}$ (s < m < e) in P[s:e], the PED of $p_{m}$ is calculated as follows:

$$PED(p_{m}) = \frac{\lVert \vec{p_{s} p_{e}} \times \vec{p_{s} p_{m}} \rVert}{\lVert \vec{p_{s} p_{e}} \rVert} \tag{1}$$

where $\times$ denotes the vector (cross) product and the double vertical bars denote the length of a vector. The PED error schematic diagram is shown in Figure 3; the yellow line segment represents the PED error of the point.
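Equation (1) can be sketched directly with the 2D cross product. The function name `ped` and the handling of a degenerate segment (both endpoints equal) are assumptions of this example.

```python
import math


def ped(p_s, p_e, p_m):
    """Perpendicular Euclidean Distance from p_m to the line through p_s and p_e."""
    ex, ey = p_e[0] - p_s[0], p_e[1] - p_s[1]
    mx, my = p_m[0] - p_s[0], p_m[1] - p_s[1]
    length = math.hypot(ex, ey)
    if length == 0:
        # Degenerate segment: fall back to the plain point-to-point distance.
        return math.hypot(mx, my)
    # In 2D the cross product reduces to a scalar; its magnitude is the
    # area of the parallelogram spanned by the two vectors.
    cross = ex * my - ey * mx
    return abs(cross) / length
```

Because the distance is taken to the infinite line, a point on the extension of the segment has a PED of zero no matter how far beyond the endpoints it lies.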

Definition 6.

IED error-bounded: Given the simplified trajectory $S[1:m] = \{s_{1}, \dots, s_{i}, \dots, s_{j}, \dots, s_{m}\}$, the corresponding original trajectory $P[1:n] = \{p_{1}, p_{2}, p_{3}, \dots, p_{n}\}$, and the determined error threshold $ε$, if $\forall p_{k} \in P,\ f_{IED}(p_{k}) \leq \lim_{Δx_{n} \to 0} \sum_{n = x_{k-1}}^{x_{k}} (Δx_{n} \cdot ε)$, then the simplified trajectory S is considered IED error-bounded. The IED error is described in Section 3.2.

3.2. Error Measurement Method Based on Implicit Trajectory Points

Following trajectory compression, the original trajectory is typically approximated by a series of continuous line segments. Given a fixed simplification rate, a better simplification algorithm will exhibit less accuracy loss in the simplified trajectory. Measuring accuracy loss necessitates an index that adequately reflects the difference between the original and simplified trajectories. Typically, accuracy loss in the simplified trajectory is represented by the deviation between each discarded point and the corresponding simplified line segment.

Historically, trajectory simplification algorithms have commonly employed three error metrics: Perpendicular Euclidean Distance (PED), Synchronous Euclidean Distance (SED), and Fréchet Distance [12].

To address the issue of missing feature information following trajectory simplification and to balance the trade-off between compression rate and accuracy, we propose a more refined accuracy loss metric, termed Implicit Euclidean Distance (IED).

Definition 7.

IED error: By reconstructing the implicit trajectory points that lie between the recorded trajectory points and their preceding adjacent points, the Perpendicular Euclidean Distance (PED) errors of these implicit points are aggregated and recorded as the Implicit Euclidean Distance (IED) error for the trajectory.

Given a simplified trajectory $S = \{s_{1}, \dots, s_{i}, s_{j}, \dots, s_{n}\}$ and its corresponding original trajectory $P = \{p_{1}, p_{2}, \dots, p_{i}, p_{i+1}, \dots, p_{j}, \dots, p_{n}\}$ (i < j), the IED error of the simplified point $p_{k}$ (where $i < k \leq j$) in a trajectory segment $s_{i} s_{j}$ of the simplified trajectory S is calculated as follows:

$$f_{IED}(p_{k}) = \lim_{Δx_{n} \to 0} \sum_{n = x_{k-1}}^{x_{k}} \left(Δx_{n} \cdot d(Δx_{n})\right) = \int_{x_{k-1}}^{x_{k}} |f(k-1, k)| \, dx \tag{2}$$

where $Δx_{n}$ is a length interval in the line segment $p_{k-1} p_{k}$. As $Δx_{n}$ approaches 0, each interval $Δx_{n}$ can be approximately understood as a simplified implicit trajectory point. $d(Δx_{n})$ is the Euclidean distance from the implicit trajectory point in the original trajectory to the simplified trajectory. $f(k-1, k)$ is the function of the straight line determined by the trajectory points $p_{k-1}$ and $p_{k}$ when the simplified trajectory line segment $s_{i} s_{j}$ is taken as the x-axis. The IED error schematic diagram is shown in Figure 4.

Based on the definition of the definite integral, it follows that the Implicit Euclidean Distance (IED) represents the area of the shaded region. The IED results can be obtained by geometric operation. Consequently, the specific formula is shown in Appendix A.
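Since the integrand in Equation (2) is linear, the shaded area can be evaluated in closed form. The sketch below is one way to do this and is not necessarily the formula given in Appendix A: it expresses $p_{k-1}$ and $p_{k}$ in a frame whose x-axis is the simplified segment, then returns a trapezoid area when both points lie on the same side and the sum of two triangles when the segment crosses the axis. The function name `ied` and its argument order are assumptions of this example.

```python
import math


def ied(s_i, s_j, p_prev, p_k):
    """Area between the segment p_prev -> p_k and the line through s_i -> s_j."""
    ux, uy = s_j[0] - s_i[0], s_j[1] - s_i[1]
    norm = math.hypot(ux, uy)
    if norm == 0:
        raise ValueError("s_i and s_j must be distinct points")
    ux, uy = ux / norm, uy / norm

    def to_local(p):
        # (position along the simplified segment, signed offset from it)
        dx, dy = p[0] - s_i[0], p[1] - s_i[1]
        return dx * ux + dy * uy, -dx * uy + dy * ux

    x0, d0 = to_local(p_prev)
    x1, d1 = to_local(p_k)
    width = abs(x1 - x0)
    if d0 * d1 >= 0:
        # Same side of the axis (or touching it): a trapezoid.
        return 0.5 * (abs(d0) + abs(d1)) * width
    # Opposite sides: two triangles meeting where the segment crosses the axis.
    return 0.5 * (d0 * d0 + d1 * d1) / (abs(d0) + abs(d1)) * width
```

For example, with the simplified segment on the x-axis, the original segment from (2, 1) to (4, 3) yields a trapezoid of area 4, while the segment from (2, −1) to (4, 1) crosses the axis and yields an area of 1.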

The Perpendicular Euclidean Distance (PED) error quantifies the deviation between each discarded point and its corresponding line segment by calculating the perpendicular distance from the discarded point to the line segment. However, when the movement direction of the positioning sensor undergoes abrupt changes, PED may struggle to accurately represent the deviation. Particularly in scenarios involving animals tracked by lightweight sensors or tourists wandering in unstructured environments, where there is no road network constraint, the direction of movement can change dramatically. As illustrated in Figure 5, when the object makes a U-turn from point $p_{4}$ to point $p_{5}$, if PED is used as the accuracy measure, the accuracy loss at $p_{4}$ is zero, even though $p_{4}$ is actually quite distant from the simplified line segment. Therefore, a simplification algorithm relying on PED as the accuracy loss index cannot accurately capture the true motion trajectory of the object.

This paper therefore introduces a more refined accuracy index, the Implicit Euclidean Distance (IED), to address the issue of distortion in simplified trajectory accuracy. By incorporating the concept of inflection points, users can adjust the sensitivity of the simplification algorithm to the turning magnitude of the tracked object through an angle threshold, thereby significantly reducing the number of aberrant edges in the simplified trajectory.

3.3. Algorithm Implementation and Its Pseudo-Code

To preserve the crucial spatio-temporal feature information of the original trajectory as much as possible during real-time simplification and to address the issue of limited data universality in data analysis of simplified trajectories, this paper utilizes the feature information of inflection points and implicit trajectory points to construct a preselected area, enabling rapid bounded simplification with one-way error control. Consequently, a trajectory simplification algorithm leveraging the spatio-temporal features of implicit trajectory points is proposed.

To facilitate the efficient and rapid determination of IED error-bounds for the simplified trajectory, this paper introduces a novel concept: $ε$_Area.

Definition 8.

$ε$_Area $R_{ij}$: Given the simplified line segment $s_{i} s_{j}$, the original trajectory P[i:j] (i < j) corresponding to the line segment, and the distance threshold $ε$, circles of radius $ε$ are drawn with $s_{i}$ and $s_{j}$ as their respective centers. The common tangent lines ${Ta}_{ij}$ and ${Ta}_{ij}'$ of the two circles are obtained, and the enclosed area is defined as $ε$_Area $R_{ij}$.

The following property of $ε$_Area $R_{ij}$ then follows directly.

Lemma 1.

Given the simplified line segment $s_{i} s_{j}$, the original trajectory P[i:j] (i < j) corresponding to the line segment, and the distance threshold $ε$, if any discarded trajectory point $p_{k}$ (i < k < j) lies within the $ε$_Area $R_{ij}$ region, then the following condition must be satisfied: $IED(p_{k}) \leq \lim_{Δx_{n} \to 0} \sum_{n = x_{k-1}}^{x_{k}} (Δx_{n} \cdot ε)$. Consequently, the simplified trajectory S[i:j] will be IED error-bounded.
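Membership in the $ε$_Area used by Lemma 1 can be tested with a point-to-segment distance. This sketch assumes the region is read as the capsule bounded by both circles and their two common tangent lines, in which case a point belongs to the region exactly when it lies within $ε$ of the segment. If only the strip between the two tangent lines is intended, the test reduces to checking that the PED is at most $ε$. The function names are hypothetical.

```python
import math


def dist_to_segment(p, a, b):
    """Euclidean distance from point p to the closed segment a -> b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_sq = dx * dx + dy * dy
    if seg_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def in_eps_area(p, s_i, s_j, eps):
    """True if p lies in the capsule of radius eps around the segment s_i -> s_j."""
    return dist_to_segment(p, s_i, s_j) <= eps
```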

As illustrated in Figure 6, the original trajectory P[1:8] is simplified to a simplified trajectory S composed of two trajectory segments, $p_{1} p_{5}$ and $p_{5} p_{8}$. The simplified trajectory $S\{p_{1}, p_{5}\}$ is IED error-bounded. The simplified trajectory $S\{p_{5}, p_{8}\}$ is not strictly error-bounded.

This paper studies the min-# problem: given an original trajectory P with n trajectory points and an error tolerance $ε$, approximate it by a polygonal curve with the fewest trajectory points within the error tolerance. The original trajectory $P[1:n]$ can be divided into at most $2^{n-1}$ groups of different continuous trajectory segments, which means that there are at most $2^{n-1}$ different simplification strategies. Reducing the time complexity of the simplification algorithm to an acceptable range by employing greedy algorithms and related techniques is a significant challenge addressed in this paper. For the specific solution, refer to Definition 9 below. A brief discussion of the comparison algorithms used in this paper can be found in Section 5.2.

In this paper, the starting point of the original trajectory to be simplified is fixed as the anchor point $p_{i}$, and $p_{j}$ is set as the floating point, where j is a variable initialized to (i + 2); the original trajectory to be simplified can then be represented by P[i:j]. When the floating point $p_{j}$ is collected, the direction difference $Δθ(θ_{j-1}, θ_{j})$ between the floating point and the previous adjacent trajectory point is calculated first. If the direction difference is less than the user-defined angle threshold $σ$, the algorithm then determines whether $\forall p_{k} (i < k < j) \in P[i:j],\ IED(p_{k}) \leq \lim_{Δx_{n} \to 0} \sum_{n = x_{k-1}}^{x_{k}} (Δx_{n} \cdot ε)$ holds. If this condition is satisfied, $p_{j+1}$ is set as the new floating point. If it is not satisfied, the trajectory P[i:j] is simplified to the line segment $p_{i} p_{j}$ and the two trajectory points are output; then $p_{j-1}$ is set as the new anchor point, and the floating point $p_{j}$ is reinitialized. If the direction difference between the floating point and the previous adjacent trajectory point is greater than the user-defined angle threshold $σ$, the trajectory P[i:j] is directly simplified to the line segment $p_{i} p_{j}$ and the two trajectory points are output. Then, $p_{j-1}$ is set as the new anchor point $p_{i}$, and the floating point $p_{j}$ is reinitialized. This process continues until the floating point is determined to be the end of the simplified trajectory segment.
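The anchor and floating-point procedure above can be sketched as follows, with every intermediate point rechecked each time the floating point advances. This is an illustrative sketch under several assumptions, not the SSFI implementation. The IED condition is replaced by the sufficient test from Lemma 1, namely that each intermediate point lies within $ε$ of the candidate segment. Each segment is closed at the last floating point that passed both checks, and that point becomes the next anchor, which is one reading of the description. All function names are hypothetical.

```python
import math


def direction(p, q):
    return math.atan2(q[1] - p[1], q[0] - p[0]) % (2 * math.pi)


def direction_difference(theta_1, theta_2):
    gap = abs(theta_1 - theta_2)
    return min(gap, 2 * math.pi - gap)


def dist_to_segment(p, a, b):
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_sq = dx * dx + dy * dy
    if seg_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def simplify_with_rescan(points, sigma, eps):
    """Anchor/floating-point simplification that rescans intermediate points."""
    n = len(points)
    if n < 3:
        return list(points)
    simplified = [points[0]]
    i = 0  # index of the anchor point
    while i < n - 1:
        end = i + 1  # last floating index accepted so far
        for j in range(i + 2, n):
            # Turn between the two most recent original segments.
            turn = direction_difference(
                direction(points[j - 2], points[j - 1]),
                direction(points[j - 1], points[j]),
            )
            if turn > sigma:
                break
            # Rescan every intermediate point against the segment p_i -> p_j.
            if any(dist_to_segment(points[k], points[i], points[j]) > eps
                   for k in range(i + 1, j)):
                break
            end = j
        simplified.append(points[end])
        i = end
    return simplified
```

On the L-shaped input `[(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]` with `sigma = math.pi / 4` and `eps = 0.1`, the sketch keeps the two endpoints and the corner `(2, 0)`. The inner rescan is what makes this variant quadratic in the worst case.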

Whenever a new floating point is loaded to determine whether it is the endpoint of the original trajectory, it is necessary to update the $ε$_Area $R_{ij}$ region between the floating point and the anchor point, and to verify whether $\forall p_{k} (i < k < j) \in$ $ε$_Area $R_{ij}$, so as to ensure that each simplification is error-bounded. However, because the $ε$_Area $R_{ij}$ changes every time a new floating point is loaded, the intermediate trajectory points must be scanned and judged repeatedly. In the worst case, the time complexity reaches $O(n^{2})$, which significantly increases the running time of the algorithm. To solve this problem, this paper defines the concept of $P[i:j]$_PreselectionArea, which makes the algorithm one-way error-bounded. The specific concept is as follows:

Definition 9.

$P[i:j]$_PreselectionArea: Given the trajectory segment P[i:j] (i < j) and the distance threshold $ε$, $P[i:j]$_PreselectionArea = $ε$_Area $R_{i\,i+1}$ $\cap$ $ε$_Area $R_{i+1\,i+2}$ $\cap$ $ε$_Area $R_{i+2\,i+3}$ $\cap \dots \cap$ $ε$_Area $R_{j-1\,j}$. That is, $P[i:j]$_PreselectionArea = $P[i:j-1]$_PreselectionArea $\cap$ $ε$_Area $R_{ij}$.

After determining the anchor point and initializing the floating point, the direction difference $Δθ(θ_{j-1}, θ_{j})$ between the floating point and the previous adjacent trajectory point is determined first. If the direction difference is less than the user-defined angle threshold $σ$ and the floating point is located in the $P[i:j]$_PreselectionArea region, then $\forall p_{k} (i < k < j) \in P[i:j],\ IED(p_{k}) \leq \lim_{Δx_{n} \to 0} \sum_{n = x_{k-1}}^{x_{k}} (Δx_{n} \cdot ε)$. Consequently, it is not necessary to repeatedly determine whether $p_{k}$ (i < k < j) belongs to the $ε$_Area $R_{ij}$ region. The algorithm only needs to determine whether the new floating point is within the $P[i:j]$_PreselectionArea when it is loaded. If it is, the $P[i:j]$_PreselectionArea is updated and the next floating point is loaded. This method avoids repeatedly assessing intermediate trajectory points while the floating point advances, significantly reducing the running time of the simplification algorithm and ensuring one-way error-boundedness.

Figure 7 shows how to update the $P[i:j]$_PreselectionArea correctly.

The SSFI algorithm takes the following inputs: the original trajectory P[1:n], the angle threshold $σ$, and the distance threshold $ε$ set by the user. The output is the simplified trajectory S. When the lightweight sensor samples a trajectory point, Algorithm 1 calculates and filters the direction difference $Δθ(θ_{j-1}, θ_{j})$ between the current input trajectory point and the previous trajectory point. When the direction difference is less than the angle threshold set by the user, the preselected area of the trajectory segment to be simplified is updated until the next trajectory point is loaded. When the direction difference is greater than the angle threshold $σ$, or the trajectory point is not in the preselected area $P[i:j]$_PreselectionArea of the trajectory segment to be simplified, the trajectory segment is simplified to the line segment $p_{i} p_{j}$, and the two points are output to the simplified trajectory sequence S.

Algorithm 1 SSFI pseudo-code

Input: the original trajectory P[1:n], the angle threshold σ, and the distance threshold ε

Output: simplified trajectory S with IED-bounded error

i = 1; S = P[1];
while i ≤ n do {
    AnchorPoint = P[i];
    i = i + 1;
    PreselectionArea.initialize(AnchorPoint, i);
    if (Δθ(θ_{i−1}, θ_{i}) > σ)
        S.append(P[i]);
    else {
        while (P[i] in PreselectionArea and i < n) do {
            PreselectionArea.update(P[i], ε);
            i = i + 1;
        }
        i = i − 1;
    }
    S.append(P[i]);
}
return S;

In this paper, the one-way error-bounded operation of the algorithm is achieved by defining the concept of a preselected area. The algorithm calculates the angle difference, evaluates whether the trajectory point lies in the area, and updates the preselected area as the sensor collects each trajectory point. Each trajectory point is scanned and calculated only once. Thus, the time complexity of the SSFI trajectory simplification algorithm is $O(n)$. Additionally, regardless of the data size of the trajectory segment to be simplified, the algorithm requires only a constant amount of space to store the trajectory points between indices i and j for constructing the preselected area. The space complexity of the algorithm depends solely on the size of the index range (j − i), which remains constant in practical applications. Therefore, the space complexity of the SSFI trajectory simplification algorithm is $O(1)$.

4. Trajectory Data Noise Reduction

When trajectory data are collected, GPS error is difficult to avoid because of the limitations of current positioning technology. Although the deviation distance is very small, it causes frequent angle changes within a small range, which has a great impact on the performance of the trajectory simplification algorithm. Smoothing the trajectory is therefore an effective way to further improve the performance of the trajectory simplification algorithm.

4.1. GPS Error Analysis

In the process of GPS positioning, the positioning accuracy is mainly affected by three kinds of errors [13]: (1) the error related to the location service satellite; (2) observation-related errors; and (3) errors related to the observation station. Type (1) errors are common to all user receivers using location-based services, namely satellite clock errors, ephemeris errors, and delays caused by the ionosphere and troposphere. This kind of error can be eliminated by differential techniques [14]. The second type of error is mainly the propagation delay error, which depends on the distance between the main reference receiver and the user receiver and the performance of the receiver itself. The third type of error is the inherent error of the user receiver, that is, internal noise, channel delay, and multipath effect, which cannot be eliminated. GPS positioning error [15] can be approximated by Formula (3) as follows:

$$σ_{p} = GDOP \cdot σ_{UERE} \tag{3}$$

where $σ_{p}$ is the standard deviation of the GPS positioning error, GDOP is the Geometric Dilution of Precision, and $σ_{UERE}$ is the User Equivalent Range Error (UERE). UERE is a measure of GPS error that considers the combined effect of all error sources. It includes satellite clock error, receiver clock error, signal propagation error (such as ionospheric delay, tropospheric delay, and multipath effect), and receiver noise.

4.2. Trajectory Data Noise Reduction Model

When processing trajectory data, each trajectory point is in fact an “observation” of the “state” of the sampled object. Because of GPS error, the observation data may deviate greatly from the actual motion state of the sampled object.

To obtain the motion state of the object more accurately and remove the noise in the trajectory data, the mainstream method is to compare a trajectory point with the preceding trajectory to see whether there is a significant, unreasonable transient shift. The essence of this idea is to predict the possible position of the next trajectory point based on the preceding trajectory. If the deviation between the sampled trajectory point and the prediction is far greater than expected, the current sampled point is considered a noise point.

This method is very similar in principle to the Kalman filter. The Kalman filter is a linear quadratic estimation algorithm for state estimation of linear dynamic systems [16]. It combines the previous state estimate (i.e., the predicted position of the current trajectory point) with the current observation data (the sampled position of the current trajectory point) to obtain the optimal estimate of the current state. Therefore, even if there is noise interference, the Kalman filter can usually reveal the actual situation effectively and find imperceptible correlations between phenomena.

Therefore, this paper uses the Kalman filter to smooth the trajectory data, reducing the noise caused by GPS error and its effect on the subsequent simplification results.

When using the Kalman filter to smooth the trajectory data, each trajectory point corresponds to each frame of the Kalman filter algorithm. The algorithm processes the latitude and longitude coordinates of the trajectory point during operation, and the whole can be divided into the following two stages:

Prediction stage: According to the past state estimation and the dynamic model of the system, the current state and error covariance are predicted.

Update phase: According to the predicted state and the current observation data, the state estimation and error covariance are updated by Kalman gain.

4.3. Kalman Filter Prediction Phase

The sampled object has state information such as position, speed, and acceleration during motion, and this state information exists objectively regardless of whether it is measured. Using this state information, the state equation of the object can be established, and the state of the next frame can be predicted from the state of the previous frame. The state vector of the object in the kth frame can be expressed as follows:

\vec{x}_{k} = \begin{bmatrix} a_{k} \\ b_{k} \\ \beta_{k} \\ \gamma_{k} \end{bmatrix}

(4)

where $a_{k}$ and $b_{k}$ denote the x and y coordinates of the frame state, and $\beta_{k}$ and $\gamma_{k}$ are the speeds in the x and y directions.

It is assumed that the sampling interval of the positioning sensor is $\Delta t$, so the time difference between consecutive frames of the Kalman filter algorithm is also $\Delta t$. If the object maintains uniform linear motion without external influences (speed changes, steering, etc.), the state of the object in the kth frame is related to its state in the (k − 1)th frame as follows:

\begin{cases} a_{k} = a_{k-1} + \beta_{k-1} \times \Delta t \\ b_{k} = b_{k-1} + \gamma_{k-1} \times \Delta t \\ \beta_{k} = \beta_{k-1} \\ \gamma_{k} = \gamma_{k-1} \end{cases}

(5)

According to the above relation, the state transition matrix F can be obtained:

\vec{x}_{k} = \begin{bmatrix} a_{k} \\ b_{k} \\ \beta_{k} \\ \gamma_{k} \end{bmatrix} = \begin{bmatrix} 1 & 0 & \Delta t & 0 \\ 0 & 1 & 0 & \Delta t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a_{k-1} \\ b_{k-1} \\ \beta_{k-1} \\ \gamma_{k-1} \end{bmatrix} = F \vec{x}_{k-1}

(6)
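As a quick numerical check of Equation (6), consider purely illustrative values that are not taken from the experiments: a sampling interval of $\Delta t = 2$ and a previous state with position (10, 20) and speed (1.5, −0.5). Applying F moves the position along the velocity and leaves the speed unchanged:

```latex
\vec{x}_{k} = F \vec{x}_{k-1} =
\begin{bmatrix} 1 & 0 & 2 & 0 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 10 \\ 20 \\ 1.5 \\ -0.5 \end{bmatrix}
= \begin{bmatrix} 10 + 2 \times 1.5 \\ 20 + 2 \times (-0.5) \\ 1.5 \\ -0.5 \end{bmatrix}
= \begin{bmatrix} 13 \\ 19 \\ 1.5 \\ -0.5 \end{bmatrix}
```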

The above situation only describes the ideal motion state of the object. If additional information is available as input, such as speed changes and steering, the actual motion state of the object can be described more accurately. This additional information can be represented by the control input $\vec{u}_{k}$ and mapped to the state space through the control matrix B, giving the term $B \vec{u}_{k}$. Even with this additional input, the theoretical model cannot represent the real trajectory of the object exactly, because there are always some errors, such as changes in wind resistance while the object moves or changes in speed caused by complex and changeable road conditions. These unpredictable factors affect the accuracy of the predicted state to some extent, so a random term $\vec{w}_{k}$ needs to be introduced to model all such uncertainties. In this paper, $\vec{w}_{k}$ has four error terms, which correspond to the four states of the state equation. In practical applications of the Kalman filter, the process noise $\vec{w}_{k}$ is usually assumed to follow a zero-mean Gaussian distribution, that is, $\vec{w}_{k} \sim N(0, Q_{k})$, where $Q_{k}$ is the estimate of the covariance matrix of the process noise.

After introducing the terms $B \vec{u}_{k-1}$ and $\vec{w}_{k-1}$, the system state equation of the kth frame of the object can be updated as follows:

\hat{\vec{x}}_{k} = F \vec{x}_{k-1} + B \vec{u}_{k-1} + \vec{w}_{k-1}

(7)

where $\hat{\vec{x}}_{k}$ represents the uncorrected predicted value of the object state in the kth frame.

Since current trajectory data sets mostly consist of latitude, longitude, and timestamp, no additional information is available to provide the control input $\vec{u}_{k}$, so $B \vec{u}_{k-1}$ is omitted, and the final object state equation is simplified as:

\hat{\vec{x}}_{k} = F \vec{x}_{k-1} + \vec{w}_{k-1}

(8)

In addition, the covariance between the four states must be calculated. It represents the correlation between the states and is used in the subsequent update to correct the object state. The object state covariance matrix P is:

P_{k} = \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} & \Sigma_{a\beta} & \Sigma_{a\gamma} \\ \Sigma_{ba} & \Sigma_{bb} & \Sigma_{b\beta} & \Sigma_{b\gamma} \\ \Sigma_{\beta a} & \Sigma_{\beta b} & \Sigma_{\beta\beta} & \Sigma_{\beta\gamma} \\ \Sigma_{\gamma a} & \Sigma_{\gamma b} & \Sigma_{\gamma\beta} & \Sigma_{\gamma\gamma} \end{bmatrix}

(9)

where $\Sigma_{aa}$ is the variance of the state variable a, and $\Sigma_{ba}$ is the covariance of the state variables a and b.

Finally, the relationship between the state covariance matrix of the kth frame and that of the (k − 1)th frame can be derived:

\begin{aligned} \hat{P}_{k} &= F \, \mathrm{Cov}(\vec{x}_{k-1}, \vec{x}_{k-1}) \, F^{T} + \mathrm{Cov}(\vec{w}_{k-1}, \vec{w}_{k-1}) \\ &= F P_{k-1} F^{T} + Q_{k-1} \end{aligned}

(10)

where $\hat{P}_{k}$ is the predicted state covariance matrix of the kth frame before the update, $P_{k-1}$ is the corrected covariance matrix of the (k − 1)th frame, and $Q_{k-1}$ is the covariance matrix of the process noise $\vec{w}_{k-1}$ that appears in Equation (8).
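The prediction stage of Equations (6), (8), and (10) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `make_transition` and `predict`, the identity initial covariance, and the value of Q are all assumptions chosen for the example, and the zero-mean noise term of Equation (8) is dropped when propagating the mean.

```python
import numpy as np


def make_transition(dt: float) -> np.ndarray:
    """State transition matrix F of Equation (6) for the state [a, b, beta, gamma]."""
    return np.array([
        [1.0, 0.0, dt, 0.0],
        [0.0, 1.0, 0.0, dt],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ])


def predict(x_prev, P_prev, dt, Q):
    """Prediction stage: propagate the state mean (Equation (8) without the
    zero-mean noise term) and the state covariance (Equation (10))."""
    F = make_transition(dt)
    x_pred = F @ x_prev
    P_pred = F @ P_prev @ F.T + Q
    return x_pred, P_pred


# Illustrative values only (assumptions, not from the paper's experiments).
x_prev = np.array([10.0, 20.0, 1.5, -0.5])   # position (10, 20), speed (1.5, -0.5)
P_prev = np.eye(4)                           # previous corrected covariance
Q = 0.01 * np.eye(4)                         # process noise covariance
x_pred, P_pred = predict(x_prev, P_prev, dt=2.0, Q=Q)
```

Note how the predicted position variance grows from 1 to 5.01: the uncertainty of the speed is carried into the position through F, and Q adds a further small amount on top.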

4.4. Update Phase of Kalman Filter

The trajectory data record the observations returned by the positioning sensor, which mainly consist of latitude, longitude, and timestamp, and reflect the real trajectory of the object to a certain extent. In this paper, $Z_{k}$ represents the kth set of observations returned by the positioning sensor:

Z_{k} = \begin{bmatrix} \mathit{lon}_{k} \\ \mathit{lat}_{k} \end{bmatrix}

(11)

Since GPS error is inevitable, the observation noise $v_{k}$ must also be added to the observation equation. It is likewise Gaussian white noise and obeys the normal distribution $v_{k} \sim N(0, R_{k})$, where $R_{k}$ is the covariance matrix of the observation noise. The observation equation then becomes:

\hat{Z}_{k} = \begin{bmatrix} \mathit{lon}_{k} \\ \mathit{lat}_{k} \end{bmatrix} + v_{k}

(12)

Since the observed values are the first two components of the object state, a matrix H can be used to represent the relationship between the two, and the observation equation is rewritten as follows:

\begin{aligned} \hat{Z}_{k} &= \begin{bmatrix} \mathit{lon}_{k} \\ \mathit{lat}_{k} \end{bmatrix} + v_{k} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} a_{k} \\ b_{k} \\ \beta_{k} \\ \gamma_{k} \end{bmatrix} + v_{k} \\ &= H \hat{\vec{x}}_{k} + v_{k} \end{aligned}

(13)

After the observation equation is obtained, the Kalman gain is required to balance the update between the predicted value obtained from the mathematical model and the observed value returned by the positioning sensor:

K_{k} = \hat{P}_{k} H^{T} (H \hat{P}_{k} H^{T} + R_{k})^{-1}

(14)

Finally, using the Kalman gain, the corrected object state is obtained as follows:

\vec{x}_{k} = \hat{\vec{x}}_{k} + K_{k} (\hat{Z}_{k} - H \hat{\vec{x}}_{k})

(15)

The covariance matrix $P_{k}$ is then updated so that the next frame of the algorithm can use it for prediction:

P_{k} = (I - K_{k} H) \hat{P}_{k}

(16)
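Putting the prediction and update stages together, the loop below is a minimal sketch of how Equations (6) to (16) might be applied to a sequence of (lon, lat) observations. It is not the authors' code: the function name `kalman_filter_track`, the scalar settings `q` and `r` (which build Q = qI and R = rI), the identity initial covariance, and the zero initial speed are all assumptions made for illustration. It runs the forward filter only, one frame per trajectory point.

```python
import numpy as np


def kalman_filter_track(observations, dt=1.0, q=1e-3, r=1.0):
    """Filter (lon, lat) observations with the constant-velocity model.
    q and r are hypothetical scalar settings giving Q = q*I and R = r*I."""
    F = np.array([[1.0, 0.0, dt, 0.0],
                  [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])         # Equation (6)
    H = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])         # Equation (13)
    Q = q * np.eye(4)
    R = r * np.eye(2)
    identity = np.eye(4)

    z = np.asarray(observations, dtype=float)
    # Assumed initialisation: first observation as position, zero speed.
    x = np.array([z[0, 0], z[0, 1], 0.0, 0.0])
    P = np.eye(4)
    filtered = [x[:2].copy()]

    for z_k in z[1:]:
        # Prediction stage
        x_pred = F @ x
        P_pred = F @ P @ F.T + Q                                  # Equation (10)
        # Update stage
        K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)    # Equation (14)
        x = x_pred + K @ (z_k - H @ x_pred)                       # Equation (15)
        P = (identity - K @ H) @ P_pred                           # Equation (16)
        filtered.append(x[:2].copy())
    return np.array(filtered)


# A stationary track with one outlier at index 5 (synthetic data).
track = [(0.0, 0.0)] * 5 + [(10.0, 0.0)] + [(0.0, 0.0)] * 5
filtered = kalman_filter_track(track)
```

On this synthetic track the outlier is pulled back toward the predicted position instead of being reproduced in full, which is the behaviour the noise reduction model relies on.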

As in other applications of the Kalman filter, this paper controls the value of the Kalman gain $K_{k}$ by adjusting the hyperparameters Q and R [17,18]. From Formula (14), the larger the hyperparameter R is, the smaller the Kalman gain $K_{k}$ is, and the more the corrected object state trusts the predicted value of the mathematical model. Conversely, the smaller R is, the larger $K_{k}$ is, and the more the corrected object state trusts the observation of the positioning sensor. The hyperparameters Q and R are not adjusted arbitrarily. If the state dimension of the prediction model is high, a lot of additional information is available, or the environment in which the object moves changes little (such as a car driving on a highway), the corrected state should trust the prediction model more, and Q tends to take a smaller value, and vice versa. If the positioning sensor is highly accurate and its GPS error is very small, the corrected state should trust the observation more, and R tends to take a smaller value [19]. The noise reduction results are shown in Figure 8.

5. Experiments and Analysis

In order to analyze and evaluate the performance of the algorithm, a laptop was used as the experimental hardware platform. The specific configuration is Intel (R) Core (TM) i7-8750H CPU @ 2.20 GHz 2.21 GHz, and the memory is 8 GB. The software environment is the Windows 11 operating system and the Visual Studio 2022 development system. The experimental data come from the T-Drive data set [20], which contains the trajectories of 10,357 taxis driving in Beijing from 2 February to 8 February 2008. The data set contains about 1.5 million points with a total distance of 900 km.

5.1. Noise Reduction Model Analysis

Because the locations of noise in the trajectory data set cannot be determined accurately, the performance of the filters cannot be compared directly. In this experiment, Gaussian noise is therefore artificially added to the T-Drive data set to test how well each filter smooths the trajectory data. After adding Gaussian noise, the data set is represented by the following formula:

X_{i} = \begin{cases} X_{i}^{t} + e_{i}, & \text{randomly } n_{e} \text{ positions from } N \\ X_{i}^{t}, & \text{otherwise} \end{cases}

(17)

where $X_{i}^{t}$ denotes the original trajectory input without noise at the ith position, $e_{i}$ represents random two-dimensional Gaussian noise, and $n_{e}$ represents the number of positions, chosen at random from N, to which noise is added.
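The noise injection of Formula (17) can be sketched with the Python standard library as follows. The function name `add_gaussian_noise`, the per-axis standard deviation `sigma`, the `seed` parameter, and the synthetic straight-line track are assumptions for illustration; the paper does not state the noise variance used.

```python
import random


def add_gaussian_noise(points, n_e, sigma, seed=None):
    """Return a copy of `points` in which n_e randomly chosen positions receive
    zero-mean 2-D Gaussian noise (Formula (17)); all other points are unchanged.
    `sigma` (standard deviation per axis) and `seed` are illustrative parameters."""
    rng = random.Random(seed)
    noisy_idx = set(rng.sample(range(len(points)), n_e))
    noisy = []
    for i, (x, y) in enumerate(points):
        if i in noisy_idx:
            noisy.append((x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)))
        else:
            noisy.append((x, y))
    return noisy, sorted(noisy_idx)


# Same proportions as the experiment: 1000 points, noise added to 100 of them.
clean = [(116.0 + 0.0001 * i, 39.9) for i in range(1000)]   # synthetic track
noisy, idx = add_gaussian_noise(clean, n_e=100, sigma=0.0005, seed=42)
```

Returning the noisy indices alongside the data makes it possible to check later how each filter treated exactly those positions.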

In this experiment, we selected a section of the original trajectory with 1000 trajectory points in the data set and added Gaussian noise to 100 of them. The neighborhood size of the mean filter and the median filter is set to 10 [21]. The particle filter uses the same mathematical model as the Kalman filter given in this paper. In order to compare the smoothing effect of each filter more intuitively, the experiment uses the NoiseError mathematical model to evaluate the smoothing effect.

\mathrm{NoiseError}(x, y, F_{x}, F_{y}) = \frac{\| (F_{x}, F_{y}) - (x, y) \|_{2}}{\| (x, y) \|_{2}}

(18)

where x and y are the original coordinates of the data set, and $F_{x}$ and $F_{y}$ are the results obtained for the data to which Gaussian noise was added according to Formula (17).
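A direct transcription of Formula (18) might look like the sketch below. Two points are assumptions rather than statements from the paper: the 2-norm is taken over the whole coordinate sequence at once, and $F_{x}$, $F_{y}$ are simply treated as the coordinate sequences being compared against the originals. The function name `noise_error` is hypothetical.

```python
import math


def noise_error(x, y, fx, fy):
    """NoiseError of Formula (18): ||(F_x, F_y) - (x, y)||_2 / ||(x, y)||_2.
    Each argument is a sequence of coordinates of equal length; the norm is
    taken over the whole sequence (an assumption about the formula)."""
    num = math.sqrt(sum((c - a) ** 2 + (d - b) ** 2
                        for a, b, c, d in zip(x, y, fx, fy)))
    den = math.sqrt(sum(a ** 2 + b ** 2 for a, b in zip(x, y)))
    return num / den


# One point at (3, 4) whose evaluated position is off by 0.5 in y.
err = noise_error([3.0], [4.0], [3.0], [4.5])   # 0.5 / 5 = 0.1
```

Because the denominator is the norm of the raw coordinates, the value depends on the coordinate origin, so it is only meaningful for comparing filters on the same data.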

It can be seen from Figure 9 that the performance of the mean filter lags behind that of the other three filters. Because the mean filter simply replaces each trajectory point in the data set with the mean of the preceding few trajectory points, it is extremely sensitive to noise interference. Judged by the data, the median filter already performs well: it replaces each trajectory point with the median of its neighborhood and is therefore insensitive to noise. However, it has a serious flaw when used for trajectory data smoothing. The median filter only considers the location of the estimated trajectory points and ignores other variables that are equally important for trajectory data, so a large amount of feature information is lost in the smoothed data. This is unacceptable for trajectory data. Like the Kalman filter, the particle filter needs a mathematical model that describes the actual motion state of the trajectory, and the more accurate the model is, the better both filters perform. As the noise intensity increases, the particle filter overtakes the Kalman filter at a noise intensity of about 32 dB; however, with the development of current positioning sensors, positioning error and noise have been greatly reduced. Moreover, the particle filter consumes far more computational resources than the Kalman filter. Because the trajectory data are very large, the speed advantage of the Kalman filter when smoothing trajectory data is far more important than the slight performance advantage of the particle filter under high-intensity noise, so this paper chooses the Kalman filter to smooth the trajectory.

5.2. Performance Analysis of Simplified Algorithm

To better demonstrate the performance advantages of the SSFI algorithm, this paper introduces several line segment simplification algorithms that also address the min-# problem [22] for comparative analysis.

The Douglas–Peucker (DP) algorithm uses Euclidean distance as its feature: it recursively identifies and retains the trajectory point with the largest Euclidean distance from the current simplified segment, and continues until no discarded point has a distance greater than the specified threshold. The Fast version Bounded Quadrant System (FBQS) algorithm selects up to eight points within a defined quadrant to form a convex polygon and calculates the upper and lower bounds based on the connection between these points and the origin. If the distance between the trajectory and the region falls below the specified error threshold, the region is considered tight and is simplified. The One-Pass Error Bounded (OPERB) algorithm employs local distance detection to evaluate whether the current trajectory point satisfies retention conditions by establishing active and anchor points. OPERB-A introduces a trajectory point interpolation method based on OPERB. By setting the intersection of the extension lines of two separated trajectory segments in the simplified trajectory as the addition point and discarding the intermediate trajectory segment, this method not only effectively improves the simplification rate but also reduces the number of abnormal edges.
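Of the baselines above, DP is the simplest to write down. The following is a minimal textbook-style sketch of the recursive procedure, included only to illustrate the baseline; it is not the implementation benchmarked in the experiments, and the helper name `point_line_distance` is hypothetical. Distances are perpendicular distances to the line through the segment endpoints.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from p to the line through a and b
    (distance to a when a and b coincide)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def douglas_peucker(points, epsilon):
    """Keep the farthest point and recurse while any point lies more than
    epsilon away from the segment joining the current endpoints."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, d_max = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], first, last)
        if d > d_max:
            index, d_max = i, d
    if d_max <= epsilon:
        return [first, last]            # every interior point is within epsilon
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right            # drop the duplicated split point


track = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
simplified = douglas_peucker(track, epsilon=0.5)
```

Because the whole trajectory must be available before the first split, DP is a batch algorithm, which is one reason one-pass methods are preferred for real-time simplification. For very long trajectories an iterative version with an explicit stack would avoid Python's recursion limit.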

Initially, this paper presents a thorough analysis of the complexity associated with the SSFI algorithm and the referenced algorithms, with the results detailed in Table 1. Here, n denotes the number of trajectories in the original data set, and m represents the storage space utilized.

Subsequently, this paper conducts a comparative analysis of the performance advantages and application scenarios of each algorithm based on several performance metrics, including simplification rate, running time, and error accuracy. The trajectory simplification rate, denoted as rr, represents the ratio of the number of points in the simplified trajectory to those in the original trajectory. This metric effectively captures the extent to which the simplification algorithm reduces the original trajectory.

5.2.1. Simplification Rate Analysis

As illustrated in Figure 10a, we assess the simplification rate of each algorithm across varying distance thresholds. The SSFI algorithm exhibits exceptionally high simplification rates across the distance thresholds, while the DP and FBQS algorithms lag significantly behind the other algorithms. As described in the literature, the OPERB algorithm demonstrates an exceptionally high simplification rate, and the OPERB-A algorithm even surpasses the SSFI algorithm in this respect. However, although the interpolation process added by OPERB-A reduces data size, it introduces a risk of trajectory distortion: the inserted trajectory points, derived from the intersection of trajectory extension lines, are neither actual sensor-recorded points nor rigorously computed by mathematical models. Furthermore, the interpolation step inevitably increases running time, a cost that is not worthwhile for a real-time simplification algorithm.

Figure 10b shows the difference in simplification rate before and after the trajectory data set is smoothed. When the distance threshold is small, the improvement brought by trajectory smoothing is particularly obvious, and the improvement for the SSFI algorithm is much greater than for the DP algorithm. This is entirely in line with our expectations. Owing to the good performance of current positioning sensors, the noise they generate does not deviate significantly from the true trajectory. However, the deviation in direction is uncontrollable. As a result, the angular difference between trajectory segments frequently exceeds the angle threshold during the SSFI process, which substantially increases the number of inflection points that must be retained and prevents the simplification rate from decreasing as it otherwise would. Therefore, this paper uses the Kalman filter to smooth the trajectory so that the trajectory direction does not change suddenly, which significantly reduces the simplification rate of the SSFI algorithm.

Considering the specific characteristics of the SSFI algorithm, a comprehensive performance analysis must account not only for the impact of the distance threshold but also for the effect of the angle threshold on the algorithm’s simplification process. Thus, this paper also examines how varying the angle threshold affects the algorithm’s simplification rate, while keeping the error threshold fixed. As depicted in Figure 11a, across different distance thresholds, the trend in the simplification rate of the SSFI algorithm influenced by the angle threshold remains consistent. As the angle threshold increases, the simplification rate decreases. This is because a smaller angle threshold necessitates retaining more inflection points thereby diminishing the effectiveness of the preselected area constructed by the algorithm. A smaller angle threshold makes the SSFI algorithm more sensitive to changes in trajectory direction, which helps to simplify the trajectory while preserving the overall outline of the original trajectory. Consequently, users can adjust the angle threshold to control the simplification accuracy of the SSFI algorithm thereby accommodating various application requirements. As depicted in Figure 11b, we compared the simplification rates before and after smoothing at distance thresholds of 10 and 100. The experimental findings reveal that the post-smoothing process did not modify the influence of the angular threshold on the simplification rate trend.

5.2.2. Error Comparison

To facilitate the evaluation of the algorithm’s error accuracy, we define the average error as the mean Euclidean distance between the discarded trajectory points in each segment of the simplified trajectory and the corresponding segment. A smaller value indicates a lower error between the simplified trajectory and the original trajectory thus reflecting a more accurate approximation. Consequently, the higher the accuracy of the simplified trajectory, the more effectively it can serve as a substitute for the original trajectory.
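The two metrics used in this comparison, the simplification rate rr and the average error, can be sketched as follows. This is one possible reading, stated as an assumption: the distance from a discarded point to its segment is taken as the perpendicular distance to the line through the two retained endpoints. The function names and the `kept_idx` representation (sorted indices of retained points, including the first and last) are hypothetical.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def simplification_rate(original, kept_idx):
    """rr: number of retained points divided by number of original points."""
    return len(kept_idx) / len(original)


def average_error(original, kept_idx):
    """Mean distance between each discarded point and the simplified segment
    that replaces it. Returns 0.0 when nothing was discarded."""
    total, count = 0.0, 0
    for start, end in zip(kept_idx, kept_idx[1:]):
        for i in range(start + 1, end):
            total += point_line_distance(original[i], original[start], original[end])
            count += 1
    return total / count if count else 0.0


track = [(0, 0), (1, 1), (2, 0), (3, -1), (4, 0)]
rr = simplification_rate(track, [0, 4])      # 2 of 5 points kept
err = average_error(track, [0, 4])           # distances 1, 0, 1 to the line y = 0
```

Keeping index 2 as well raises rr to 0.6, and the average over the two remaining discarded points becomes 1.0, which shows why the two metrics have to be read together.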

The results are illustrated in Figure 12a. At lower distance thresholds, the SSFI, OPERB, and OPERB-A algorithms show no significant gap, whereas the average error of the DP and FBQS algorithms is markedly greater. At a distance threshold of 100, the SSFI algorithm demonstrates clear advantages. In summary, the SSFI algorithm outperforms the other algorithms in terms of simplification accuracy.

5.2.3. Time Cost

As illustrated in Figure 13a, when the number of original trajectory segments increases from 1 to 2000 (where the average number of points per trajectory in the data set is approximately 2000), all algorithms, except for the DP algorithm, exhibit similar time complexity trends. The trend of running time increasing with the size of the trajectory remains consistent across these algorithms. The SSFI algorithm demonstrates more pronounced advantages over other algorithms when handling more than 1000 trajectories. This indicates that the SSFI algorithm offers superior running efficiency, which is crucial for the simplification of large-scale trajectory data.

Figure 13b shows the change in the running speed of the DP and SSFI algorithms before and after trajectory smoothing. As shown in the figure, smoothing greatly improves the running speed of the SSFI algorithm. After the trajectory is smoothed, the influence of noise points is removed, so more trajectory points fall in the preselected area during SSFI operation, which greatly shortens the algorithm's processing and thus improves its running speed. For the DP algorithm, trajectory smoothing brings no obvious speed improvement.

To show the advantages of the SSFI algorithm on large amounts of trajectory data more clearly, this paper also compares the time each algorithm takes to simplify the complete T-Drive data set. As shown in Figure 14a,b, as the amount of data increases, the running-speed advantage of the SSFI algorithm becomes very clear. After the trajectory is smoothed, the running speed of the SSFI algorithm improves considerably further.

The objective of trajectory simplification extends beyond merely reducing storage space; it also involves optimizing the efficiency of subsequent analyses performed on the simplified trajectories. Consequently, the quality of a trajectory simplification algorithm cannot be evaluated by relying on a single performance metric in isolation. This experiment provides a comprehensive analysis of the distance threshold, simplification rate, and average error. As illustrated in Figure 15, increasing the distance threshold results in a decreasing simplification rate; however, the average error increases significantly, leading to a corresponding reduction in trajectory feature information. Thus, indiscriminately increasing the distance threshold can substantially lower the simplification rate of the algorithm, but it may also cause significant distortion in the simplified trajectory, thereby adversely affecting the overall quality of the simplification. Therefore, an appropriate distance threshold should be chosen based on the specific application scenario to ensure that the simplified trajectory can effectively substitute for the original trajectory.

The data presented above demonstrates that the SSFI algorithm proposed in this paper performs favorably across the performance indicators of the three compared simplification algorithms, with varying degrees of enhancement observed in trajectory data smoothing for each indicator.

6. Conclusions

Given the constraints of lightweight equipment, applying overly complex simplification algorithms is impractical. Consequently, this paper focuses on line segment simplification algorithms with low time complexity as the primary research objective. However, existing line segment simplification algorithms predominantly focus on the geometric features of trajectory points linked to line segments, often neglecting the comprehensive use of trajectory point feature information and only considering the external contour of these points. In reality, the feature information of most trajectory points is not utilized in the trajectory simplification process. Hence, leveraging the feature information of the trajectory, including that of points not recorded by the positioning sensor, for trajectory simplification has become a primary research focus of the algorithm discussed in this paper. This paper demonstrates that the SSFI algorithm effectively addresses this issue by employing an angle threshold and utilizing its distinctive IED error to construct the preselected area.

Experiments show that the SSFI algorithm is efficient and stable in simplifying trajectories. Because the simplified trajectory carries more spatio-temporal feature information, analysis results based on it will be more accurate, and it will be more widely applicable across different analysis purposes (for example, analyzing the turning frequency of vehicles in each direction at intersections). Retaining more spatio-temporal feature information of the original trajectory while preserving the simplification rate and running time of the algorithm will become the focus of trajectory simplification research [23].

Author Contributions

All authors contributed to the paper. The experimental part was mainly conducted by the first author, S.H. Conceptualization was carried out by S.H. and Z.Y. Verification and final editing of the manuscript were conducted by Z.Y. who played a role of a research director. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data and codes can be requested from the first author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

In order to facilitate the calculation, we give the geometric calculation formula for the IED error as follows:

\int_{x_{k-1}}^{x_{k}} |f(k-1, k)| \, dx = \begin{cases} \dfrac{(d(p_{k-1}) + d(p_{k})) \times l_{p_{k-1} p_{k}} \times |\cos(\theta_{p_{i} p_{j}} - \theta_{p_{k-1} p_{k}})|}{2}, & (\theta_{p_{i} p_{k-1}} - \theta_{p_{i} p_{j}}) (\theta_{p_{i} p_{k}} - \theta_{p_{i} p_{j}}) \geq 0 \quad (A1) \\ \dfrac{l_{p_{k-1} p_{k}} \times |\cos(\theta_{p_{k-1} p_{k}} - \theta_{p_{i} p_{j}})| \times (d(p_{k-1})^{2} + d(p_{k})^{2})}{2 \times (d(p_{k-1}) + d(p_{k}))}, & (\theta_{p_{i} p_{k-1}} - \theta_{p_{i} p_{j}}) (\theta_{p_{i} p_{k}} - \theta_{p_{i} p_{j}}) < 0 \quad (A2) \end{cases}

where $l_{p_{k-1} p_{k}}$ denotes the Euclidean distance between the trajectory points $p_{k-1}$ and $p_{k}$.
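The two cases (A1) and (A2) can be transcribed as the sketch below. Several points are assumptions, since the relevant definitions lie outside this appendix: d(p) is taken to be the distance from p to the simplified segment $p_{i} p_{j}$, all angles are in radians, and angle differences are used as given without wrap-around normalization. The function and parameter names are hypothetical.

```python
import math


def ied_segment_area(d_prev, d_curr, length, theta_seg, theta_ij,
                     theta_i_prev, theta_i_curr):
    """Area between segment p_{k-1} p_k and the simplified segment p_i p_j.

    d_prev, d_curr : d(p_{k-1}) and d(p_k), assumed to be distances to p_i p_j
    length         : Euclidean distance between p_{k-1} and p_k
    theta_seg      : direction of p_{k-1} p_k
    theta_ij       : direction of p_i p_j
    theta_i_prev   : direction of p_i p_{k-1}
    theta_i_curr   : direction of p_i p_k
    """
    # Length of p_{k-1} p_k projected onto the simplified segment.
    proj = length * abs(math.cos(theta_ij - theta_seg))
    same_side = (theta_i_prev - theta_ij) * (theta_i_curr - theta_ij) >= 0
    if same_side:
        # (A1): both points on one side, the region is a trapezoid.
        return (d_prev + d_curr) * proj / 2.0
    # (A2): points on opposite sides, the region is two triangles.
    return proj * (d_prev ** 2 + d_curr ** 2) / (2.0 * (d_prev + d_curr))


# Simplified segment along the x-axis with p_i at the origin (synthetic example).
theta_ij = 0.0
# Same side: p_{k-1} = (1, 1), p_k = (3, 1)
same = ied_segment_area(1.0, 1.0, 2.0, 0.0, theta_ij,
                        math.atan2(1, 1), math.atan2(1, 3))
# Opposite sides: p_{k-1} = (1, 1), p_k = (3, -1)
opposite = ied_segment_area(1.0, 1.0, math.hypot(2, 2), math.atan2(-2, 2), theta_ij,
                            math.atan2(1, 1), math.atan2(-1, 3))
```

In the synthetic example the same-side case gives a 2 by 1 rectangle with area 2, and the opposite-side case gives two right triangles of area 0.5 each, matching what can be read off a sketch of the geometry.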

References

1. He, P.; Klarevas-Irby, J.A.; Papageorgiou, D.; Christensen, C.; Strauss, E.D.; Farine, D.R. A guide to sampling design for GPS-based studies of animal societies. Methods Ecol. Evol. 2023, 14, 1887–1905.
2. Qian, K.; Li, X. A user LBS dual privacy protection scheme based on trajectory similarity. Comput. Simul. 2023, 40, 459–465.
3. Wang, S.; Bao, Z.; Culpepper, J.S.; Cong, G. A survey on trajectory data management, analytics, and learning. ACM Comput. Surv. 2021, 54, 1–36.
4. Wang, P.; Wang, J.; Zhang, L.; Jing, D. An improved Douglas-Peucker NC machining trajectory compression method. Microcomput. Syst. 2024, 1, 1–9.
5. Zhang, Y.; Pan, J.; Zhao, M. Threshold guided sampling method for ship trajectory simplification algorithm. J. Jimei Univ. 2021, 26, 425–432.
6. Liu, J.; Zhao, K.; Sommer, P.; Shang, S.; Kusy, B.; Lee, J.-G.; Jurdak, R. A Novel Framework for Online Amnesic Trajectory Compression in Resource-Constrained Environments. IEEE Trans. Knowl. Data Eng. 2016, 28, 2827–2841.
7. Liu, J.; Zhao, K.; Sommer, P.; Shang, S.; Kusy, B.; Jurdak, R. Bounded Quadrant System: Error-bounded trajectory compression on the go. In Proceedings of the 2015 IEEE 31st International Conference on Data Engineering, Seoul, Republic of Korea, 13–17 April 2015; pp. 987–998.
8. Lin, X.; Ma, S.; Zhang, H.; Wo, T.; Huai, J. One-pass error bounded trajectory simplification. Proc. VLDB Endow. 2017, 10, 841–852.
9. Jiang, H.; Han, D.; Liu, H.; Nie, W. Time Synchronized Velocity Error for Trajectory Compression. Comput. Model. Eng. Sci. 2022, 130, 1193–1219.
10. Zhong, Y.; Kong, J.; Zhang, J.; Jiang, Y.; Fan, X.; Wang, Z. A trajectory data compression algorithm based on spatio-temporal characteristics. PeerJ Comput. Sci. 2022, 8, e1112.
11. Gu, J.; Song, X.; Shi, X.; Chang, D. Online compression algorithm of fishing vessel trajectory based on improved sliding window. J. Shanghai Marit. Univ. 2023, 44, 17–24.
12. Kim, M.W.; Park, S.K.; Ju, J.G.; Noh, H.C.; Choi, D.G. Clean Collector Algorithm for Satellite Image Pre-Processing of SAR-to-EO Translation. Electronics 2024, 13, 4529.
13. Sui, L.; Ma, F.; Chen, N. Mining Subsidence Prediction by Combining Support Vector Machine Regression and Interferometric Synthetic Aperture Radar Data. ISPRS Int. J. Geo-Inf. 2020, 9, 390.
14. Wang, H.; Jin, X.; Hou, C.; Zhou, L.; Xu, Z.; Jin, Z. A micro-nano satellite relative positioning algorithm based on robust adaptive estimation. J. Zhejiang Univ. 2023, 57, 2325–2336.
15. Xu, W.; Yan, C.; Du, W.; Zhang, G.; Wang, T.; Xu, M. Comparative analysis of navigation and positioning performance between GPS system and BDS system. Glob. Position. Syst. 2017, 42, 77–82.
16. Xu, C.; Chen, G.; Hu, N. Beidou/GPS dual-mode data fusion trajectory positioning based on Kalman filter. Metering Test. Technol. 2024, 51, 10–13.
17. Zhao, Y.X.; Hsieh, Y.Z.; Lin, S.S.; Pan, C.J.; Nan, C.W. Design of an IoT-Based Mountaineering Team Management Device Using Kalman Filter Algorithm. J. Internet Technol. 2020, 21, 2085–2093.
18. Xu, D.; Wang, B.; Zhang, L. A New Adaptive High-Degree Unscented Kalman Filter with Unknown Process Noise. Electronics 2022, 11, 1863.
19. Zhou, L.; Zhang, L.; Jin, Y.; Hu, Z.; Li, J. Distributed Cubature Kalman Filter Cooperative Localization Based on Parameterized-belief Propagation. J. Internet Technol. 2022, 23, 497–507.
20. Yuan, J.; Zheng, Y.; Xie, X.; Sun, G. Driving with knowledge from the physical world. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’11, San Diego, CA, USA, 21–24 August 2011.
21. Yao, R.; Wang, F.; Chen, S.; Zhao, S. GroupSeeker: An Applicable Framework for Travel Companion Discovery from Vast Trajectory Data. ISPRS Int. J. Geo-Inf. 2020, 9, 404.
22. Ding, M. Research on Trajectory Big Data Compression Technology. Master’s Thesis, University of Electronic Science and Technology of China, Chengdu, China, 2019.
23. Ru, J.; Wang, S.; Jia, Z.; Wang, Y.; He, T.; Wu, C. Sunshine-Based Trajectory Simplification. IEEE Access 2019, 7, 47781–47793.

Figure 1. Orange area is safe area.

Figure 2. (a) Definition of trajectory segment direction; (b) direction difference is less than π; (c) direction difference is greater than π.

Figure 3. The PED error.

Figure 4. (a) $p_{k}$ and $p_{k-1}$ are located on opposite sides of the simplified trajectory segment. (b) $p_{k}$ and $p_{k-1}$ are located on the same side of the simplified trajectory segment.

Figure 5. Defects of PED.

Figure 6. The ε_Area $R_{ij}$ region determines IED error-bounded.

Figure 7. Updating the $P[i:j]\_PreselectonArea$ area.

Figure 8. Comparison of the trajectory before and after smoothing.

Figure 9. NoiseError performance comparison.

Figure 10. (a) Simplification rate of each algorithm under different distance thresholds; (b) change in algorithm simplification rate after smoothing.

Figure 11. (a) Influence of angle threshold on SSFI simplification rate under different distance thresholds. (b) Influence of angle threshold on the simplification rate of algorithm after smoothing.

Figure 12. (a) The average error of each algorithm under different error thresholds. (b) The average error change in the algorithm after smoothing.

Figure 13. (a) The running speed of each algorithm. (b) The change in the algorithm running speed after smoothing.

Figure 14. (a) The time required for each algorithm to simplify the entire data set. (b) The time required to simplify the entire data set after smoothing.

Figure 15. Relationship between simplification rate, distance threshold, and average distance.

Table 1. Complexity analysis of the simplification algorithms.

Algorithm	Time Complexity	Space Complexity
SSFI	$O (n)$	$O (1)$
DP	$O (n \log (n))$	$O (m)$
FBQS	$O (n)$	$O (1)$
OPERB	$O (n)$	$O (1)$
OPERB-A	$O (n)$	$O (1)$

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
