Simplification and Detection of Outlying Trajectories from Batch and Streaming Data Recorded in Harsh Environments †

Abstract: Analysis of trajectories, such as the detection of outlying trajectories, can produce inaccurate results due to the presence of noise, i.e., outlying point-locations that can change the statistical properties of a trajectory. Some noisy trajectories are repairable by noise filtering or by trajectory-simplification. We herein propose the application of a trajectory-simplification approach in both batch and streaming environments, followed by benchmarking of various outlier-detection algorithms for the detection of outlying trajectories among the simplified trajectories. Experimental evaluation in a case study using real-world trajectories from a shipyard in South Korea shows the benefit of the new approach.


Background and Motivation
The increased use of Global Navigation Satellite Systems (GNSS), such as the Global Positioning System (GPS) [1], has enhanced the ability to generate trajectory data [2]. In both batch [3] and streaming environments [4], this has broadened awareness of, and stimulated interest in, location-based services such as trajectory data mining [5,6]. A trajectory can be defined as a sequence of point-locations; however, some point-locations can be considered noise, that is, random error arising from circumstances such as sensor errors [7] or environmental interference [8]. In some uncontrollable situations, for example, inside an underground structure or in an environment with many high-rise buildings and/or steel structures, current hardware cannot perform accurately. Noise is an outlying point-location that can significantly change the statistical properties of a trajectory, i.e., the corresponding feature vector. Even a single noise point, such as the one shown in Figure 1a, can do so if, for example, it is located very far from the rest of the point-locations of the trajectory. We call a trajectory that contains noise a noisy trajectory, and a trajectory whose noise renders it useless for movement analysis an outlying trajectory. Trajectories with a high amount of noise are common in such environments, so a software-based noise-filtering or trajectory-simplification approach is necessary to compensate for the limits of the hardware. Detection of outlying trajectories is one example of an important trajectory-data-mining analysis [9,10] that can be affected by the noisy trajectory problem.
To handle the noisy trajectory problem, a noise-removal modality that reduces the amount of noise in a trajectory using a filtering or heuristic approach has been proposed by Zheng et al. [5]. Meanwhile, in response to the large number of point-locations, and thereby noise, that can be generated, trajectory-simplification by the reduction of trajectory length has also been proposed [5]. Figure 1b shows that trajectory-simplification reduces the length of a trajectory by including only the essential point-locations that, in sum, approximate or represent the actual trajectory. Simplifying the trajectory reduces the amount of noise, thus improving the precision and recall of outlying-trajectory-detection algorithms. Accordingly, in this paper we propose an improved trajectory-simplification algorithm for both the batch and streaming environments, as well as two scenarios for the determination of the trajectory-simplification parameter.

Running Example
To illustrate the problem, we use a real-world application of a location-based service for monitoring of block transporter movement in the South Korean shipbuilding industry. A large ship is usually made from properly sized parts called blocks. Each block requires a sequence of work, including cutting and forming, block assembly, pre-outfitting and painting, pre-erection, erection, outfitting and painting in a specific work area called a factory [11].
Since each block undergoes several different operations, it should be moved around the shipyard to complete all of the work before the final step. Because a factory is designed for a fixed-position layout, whenever the type of work is changed, a block must be moved from one location to another using very large carrier vehicles called block transporters. According to safety regulation, every block transporter moves at a maximum speed of 30 km per hour. Nonetheless, since many shipbuilding projects run simultaneously, a block transporter must operate on a tight schedule. Thus, monitoring and analysis of block transporter movement patterns is a very important task. Figure 2 shows how one company has chosen to adopt GNSS technology for tracking of such patterns.
Every block transporter is equipped with a device incorporating a GPS receiver and a Bluetooth Low Energy (BLE) module. The GPS receiver module continuously receives GPS signals and updates the point-location data, consisting of latitude, longitude, and timestamp; the BLE module then broadcasts each point-location update in BLE advertising packets. A signaler, the person who moves along with the block transporter to guide its route, uses a mobile application to initialize tracking of the block transporter movement and to start receiving the point-location data from the block transporter device. The application sends position data periodically (every five seconds) to the application server at the company headquarters, but only if the block transporter point-location has changed (e.g., the latitude and longitude differ from the previous point-location data). The sequence of position data collected during the movement of a block from the start to the end location forms a trajectory. Because of the many steel structures and high buildings in the environment, the GPS signal is often deflected to another location, leaving a considerable amount of noise in the trajectory [12]. In this study, a domain expert classified a trajectory as outlying if it has a significant number (in our case, at least two) of either random-jump noise or small-loop noise, as shown in Figure 3a,b, respectively. A trajectory is repairable to the extent that the noise within it can be removed. For this purpose, a trajectory-simplification algorithm must be applied before the data mining phases of trajectory data mining, such as outlier-detection, begin. The issues to be resolved before simplifying, however, are: (1) how exactly trajectory-simplification is to be accomplished so as to reduce noise in a trajectory, and (2) how trajectory-simplification affects precision and recall in the detection of outlying trajectories.

Contributions
In our previous work [13], we proposed: (1) an outlying-trajectory-detection framework that entails preparatory simplification of trajectories; (2) the t-fixed partition (TFP) and k-ahead artificial arc (KAA) algorithms for simplifying trajectory data; and (3) a benchmark of statistics-based, distance-based and density-based outlier-detection algorithms for the detection of outlying trajectories on the basis of a real-world case study of a shipyard in South Korea. However, it is difficult to determine the parameters for a set of trajectories that may have different lengths, and the streaming environments that can arise in the real world are not yet supported.
In this extended work, we make the following new contributions:
1. we introduce two scenarios for the determination of the parameters of our trajectory-simplification algorithms;
2. we introduce a streaming version of our KAA trajectory-simplification algorithm; and
3. we evaluate, by means of a case study, our approach against the state-of-the-art and the improvement in the detection of outlying trajectories brought about by simplified trajectories.
The remainder of this paper is organized as follows. Section 2 discusses several related studies. Section 3 provides a problem statement. Section 4 presents the proposed trajectory-simplification algorithm. Section 5 reports an experimental evaluation based on a real-world case study. Finally, Section 6 concludes the paper.

Trajectory-Simplification Problem
Over the past decades, several studies related to trajectory data mining have been completed [2,4-6]. The mining framework usually comprises several stages, one of which is trajectory data preprocessing [4]. The primary purpose of the preprocessing stage is to generate high-quality trajectory data by selecting data that represent a trajectory, followed by filtering of the remaining noise. Therefore, if we apply a simplification method to trajectory data, separate data selection and noise filtering stages become unnecessary. Trajectory-simplification aims to produce a simplified trajectory by including only the major, important points from the 'raw' trajectory [7]. Afterwards, outlying-trajectory detection is usually included as an essential component of the trajectory data mining framework [4,5].
As for trajectory-simplification, the state-of-the-art has been surveyed in [14] for both batch and streaming (or online) modes. For the batch environment, the survey starts with the well-known Douglas-Peucker (DP) algorithm [15], which has been proven to simplify a line to within a particular error threshold. Then, Keogh et al. [16] and Meratnia et al. [17] introduced a sliding-window-based approach that adds trajectory segments to the resulting simplified trajectory. In [18], a shortest path (SP) algorithm that adds artificial arcs as constraints in a graph is introduced to simplify lines in cartography. Chen et al. [19] proposed a distance function (inspired by edit distance, which is widely used in bio-informatics and speech recognition) to check the similarity between two moving trajectories. Long et al. [20] proposed an error measurement for direction preservation that calculates the angular delta before adding artificial arcs to a graph and then uses the SP approach to simplify the trajectory. A streaming version of [20] has additionally been proposed in [21]. The latest is the Sunshine algorithm [22], a variant of the SP algorithm with the additional requirement of a sunshine duration error. However, owing to their use of an error threshold, both the DP and SP variants sometimes fail to avoid noise that lies far from its 'true' location. Herein, we introduce an alternative sliding-window approach in the TFP algorithm as well as a relaxed, unconstrained version of the SP approach in the KAA algorithm.

Outlying-Trajectory Detection
For detection of outlying trajectories, several studies in the field of data mining have been conducted. An outlier-detection algorithm is used to find a subset of data that is far from the majority of data or cannot meet some statistics requirements. Figure 4 schematizes three popular approaches to outlier-detection [23]:

1. Statistics-based outlier-detection [23] utilizes the statistical properties of the data. For example, if measured data lie far outside the interquartile range (IQR) between Q1 and Q3, they can be considered outliers. This approach can work on a single object by setting the following threshold parameters: (1) the outlier factor $of$; and (2) the extreme value factor $ef$. However, its effectiveness usually fades as the data grow, since the mean and variance usually grow larger;
2. Distance-based outlier-detection ($DB(r, \pi)$) [24] detects an outlier by calculating its distance relative to other objects. This approach can detect a significant global outlier among all data based on the distance threshold $r$ and the outlier fraction threshold $\pi$, but it cannot detect a local outlier within a cluster (e.g., a set of objects that are closer to one another);
3. Density-based outlier-detection ($LOF(n)$) [25] detects a local outlier, i.e., an object that is significantly far from the others within a set of closely related objects, based on the number $n$ of closest neighbors.
Among the three types of outlier-detection methods above, we want to determine, by benchmarking, the best method for detecting outlying trajectories [9,10] after applying a trajectory-simplification algorithm.
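To make the first two approaches concrete, the following minimal Python sketch flags outliers in a list of scalar values with an IQR fence and in a list of feature vectors with a distance-based rule. The function names, the default outlier factor of 1.5, the quartile method (the default of `statistics.quantiles`), and the reading of $DB(r, \pi)$ as "at most a fraction $\pi$ of the data lies within distance $r$" are assumptions made for illustration; they are not taken from the cited implementations, and the extreme value factor is omitted for brevity.

```python
import math
import statistics
from typing import List, Sequence


def iqr_outliers(values: Sequence[float], outlier_factor: float = 1.5) -> List[int]:
    """Return indices of values outside [Q1 - of*IQR, Q3 + of*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default method
    iqr = q3 - q1
    low = q1 - outlier_factor * iqr
    high = q3 + outlier_factor * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def db_outliers(vectors: Sequence[Sequence[float]], r: float, pi: float) -> List[int]:
    """Return indices of objects with too few neighbors within distance r.

    Assumed rule: an object is an outlier when the fraction of the data set
    (excluding the object itself from the count) within distance r is <= pi.
    """
    n = len(vectors)
    flagged = []
    for i, u in enumerate(vectors):
        near = sum(
            1 for j, v in enumerate(vectors) if j != i and math.dist(u, v) <= r
        )
        if near / n <= pi:
            flagged.append(i)
    return flagged
```

For example, `iqr_outliers([10, 11, 12, 11, 10, 12, 11, 100])` flags only the last value, and `db_outliers` with `r = 2` and `pi = 0.1` flags a single point placed far from a tight cluster.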

Preliminaries
First, we introduce several terms used in this paper, and afterwards, we define our problem.

Trajectory Simplification
Definition 1 (Trajectory, Point-Location). A 'trajectory' $tr_i \in TR$ is a sequence of multidimensional point-locations $p_{ij}$, denoted as $tr_i = p_{i1}, p_{i2}, \ldots, p_{in}$; $|tr_i| = n$ denotes the length of trajectory $tr_i$, and $I(p_{ij}) = j$ denotes the index $j$ of the point-location $p_{ij}$ in trajectory $tr_i$. A 'point-location' $p_{ij} = (lat_{ij}, lng_{ij}, ts_{ij})$ is a tuple of location data belonging to the trajectory $tr_i$, presented as a latitude ($lat_{ij}$) and longitude ($lng_{ij}$) pair with its corresponding timestamp $ts_{ij}$.
Definition 2 (Simplified Trajectory, Trajectory-Simplification Algorithm). A 'simplified trajectory' $st_i \in ST$, denoted as $st_i = s_{i1}, s_{i2}, \ldots, s_{im}$, is a subset of the trajectory $tr_i$, iff: $s_{im} = p_{in}$. A 'trajectory-simplification algorithm' is an approach that removes some point-locations from trajectory $tr_i$ to produce a corresponding simplified trajectory $st_i$ such that each retained point-location $p_{ik}$ is mapped to a point-location $s_{ij}$ with $j \le k$. $|st_i| = m$ denotes the length of the simplified trajectory $st_i$, and inversely, $s_{ij}^{-1} = p_{ik}$ denotes the original point-location $p_{ik} \in tr_i$ corresponding to the point-location $s_{ij} \in st_i$.

Definition 3 (Trajectory Stream, Simplified Trajectory Stream). A 'trajectory stream' $trs_i \in TRS$ is an unbounded sequence of multidimensional point-locations denoted as $trs_i = p_{i1}, p_{i2}, \ldots, p_{ij}, \ldots$, and $|trs_i|(ts) = \sum_{j=1}^{ts} |p_{ij}|$ denotes the length of trajectory stream $trs_i$ at time $ts$. A 'simplified trajectory stream' $strs_i \subseteq trs_i$, denoted as $strs_i = s_{i1}, s_{i2}, \ldots$, is a subset of the trajectory stream $trs_i$.
A trajectory can be acquired from the traces of completed moves (e.g., a sequence of GNSS points of a vehicle's movement from the start to the end location), whereas an example of a trajectory stream is the trajectory of a vehicle that is still moving from the start to the end location.
Several measurements have been defined to benchmark the use of trajectory-simplification. First, a compression ratio is used to compare the length between a trajectory and its corresponding simplified trajectory.

Definition 4 (Compression Ratio (CR)). A 'compression ratio' $CR(st_i, tr_i)$ is the length ratio of the simplified trajectory $st_i$ versus its original trajectory $tr_i$.

The second measurement, the total travel distance reduction ratio, compares the total travel distance of a trajectory with that of its corresponding simplified trajectory.

Definition 5 (Spatial Distance (DIST), Trajectory Total Travel Distance (TTD)). A 'spatial distance' $DIST(p_{ia}, p_{ib})$ is the distance between two point-locations $p_{ia}$ and $p_{ib}$, which can be calculated as follows:

$$DIST(p_{ia}, p_{ib}) = R \sqrt{x^2 + y^2}, \quad x = (lng_{ib} - lng_{ia}) \cos\left(\frac{lat_{ia} + lat_{ib}}{2}\right), \quad y = lat_{ib} - lat_{ia},$$

with latitudes and longitudes expressed in radians and $R = 6{,}371{,}000$ meters being the approximate radius of the Earth (this is the so-called Equirectangular approximation for measuring distance in the latitude and longitude coordinate system [26]). Thus, the trajectory total travel distance can be calculated as follows:

$$TTD(tr_i) = \sum_{j=1}^{|tr_i|-1} DIST(p_{ij}, p_{i(j+1)}).$$

Definition 6 (Total Travel Distance Reduction Ratio (TTDRR)). A 'total travel distance reduction ratio' $TTDRR(st_i, tr_i)$ is the total travel distance ratio of the simplified trajectory $st_i$ versus its original trajectory $tr_i$.

For the last measurement, we generalize the Synchronized Euclidean Distance (SED) [27] error measurement and introduce a Time-Synchronized Spatial Distance (TSSD) to measure the spatial distance between two points at identical timestamps.

Definition 7 (Time-Synchronized Spatial Distance (TSSD)). A 'time-synchronized spatial distance' $TSSD(s_{ia}, p_{ib}, s_{ic})$ is the spatial distance between two point-locations $p_{ib}$ and $p'_{ib}$:

$$TSSD(s_{ia}, p_{ib}, s_{ic}) = DIST(p_{ib}, p'_{ib}),$$

where $p'_{ib}$ is the time-synchronized point-location of $p_{ib}$ on the trajectory segment formed by the two point-locations $s_{ia}$ and $s_{ic}$ (see Figure 5). When the movement is contained within a relatively small area (less than or equal to one UTM grid (100,000 m²) [28]), $p'_{ib} = (lat'_{ib}, lng'_{ib}, ts'_{ib})$ can be calculated using linear interpolation. The TSSD is calculated to measure the deviation between a trajectory and its corresponding simplified trajectory. Since the simplified trajectory loses some points from the original trajectory, the TSSD measures the spatial distance by calculating a time-based linear interpolation on the simplified trajectory for each removed point-location in the corresponding trajectory. In Figure 5, the TSSD (marked by the dotted red line) is the distance from $p_{ib}$ to its projection $p'_{ib}$ between point-locations $s_{ia}$ and $s_{ic}$. Here, the simplified trajectory becomes a direct line through $s_{ia}$, $p'_{ib}$ and $s_{ic}$. Finally, we introduce the Average Time-Synchronized Spatial Distance (ATSSD) to measure the average time-synchronized spatial distance between a trajectory $tr_i$ and its corresponding simplified trajectory $st_i$.

Definition 8 (Average Time-Synchronized Spatial Distance (ATSSD)). An 'average time-synchronized spatial distance' $ATSSD(tr_i, st_i)$ is the spatial distance between a trajectory $tr_i$ and its corresponding simplified trajectory $st_i$.

The efficiency of a trajectory-simplification algorithm is defined as the balance among the compression ratio, the total travel distance reduction ratio and the average time-synchronized spatial distance. The resulting simplified trajectory should have the maximum compression ratio while holding the minimum total travel distance reduction ratio and the minimum average time-synchronized spatial distance.
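The measures defined in this subsection can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: point-locations are `(lat, lng, ts)` tuples in degrees and seconds; CR and TTDRR are written in the reduction form `(1 - ratio) * 100`, an assumption chosen so that values closest to 100% are best, as the sensitivity analysis states; ATSSD is assumed to average the TSSD over all point-locations of the original trajectory (retained points contribute zero); and the simplified trajectory is assumed to share its first and last point-locations with the original. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (latitude in degrees, longitude in degrees, timestamp)
EARTH_RADIUS_M = 6_371_000.0


def dist(a: Point, b: Point) -> float:
    """Equirectangular approximation of the distance in meters."""
    lat_a, lng_a = math.radians(a[0]), math.radians(a[1])
    lat_b, lng_b = math.radians(b[0]), math.radians(b[1])
    x = (lng_b - lng_a) * math.cos((lat_a + lat_b) / 2.0)
    y = lat_b - lat_a
    return EARTH_RADIUS_M * math.hypot(x, y)


def ttd(tr: Sequence[Point]) -> float:
    """Total travel distance: sum of distances between consecutive points."""
    return sum(dist(p, q) for p, q in zip(tr, tr[1:]))


def cr(st: Sequence[Point], tr: Sequence[Point]) -> float:
    """Compression ratio in percent (assumed reduction form)."""
    return (1.0 - len(st) / len(tr)) * 100.0


def ttdrr(st: Sequence[Point], tr: Sequence[Point]) -> float:
    """Total travel distance reduction ratio in percent (assumed reduction form)."""
    total = ttd(tr)
    return (1.0 - ttd(st) / total) * 100.0 if total > 0 else 0.0


def synchronized(s_a: Point, p_b: Point, s_c: Point) -> Point:
    """Linearly interpolate the segment s_a -> s_c at the timestamp of p_b."""
    span = s_c[2] - s_a[2]
    f = (p_b[2] - s_a[2]) / span if span else 0.0
    return (s_a[0] + f * (s_c[0] - s_a[0]), s_a[1] + f * (s_c[1] - s_a[1]), p_b[2])


def tssd(s_a: Point, p_b: Point, s_c: Point) -> float:
    """Distance between p_b and its time-synchronized counterpart."""
    return dist(p_b, synchronized(s_a, p_b, s_c))


def atssd(tr: Sequence[Point], st: Sequence[Point]) -> float:
    """Average TSSD over all points of tr (assumed averaging convention)."""
    if len(st) < 2 or not tr:
        return 0.0
    total, seg = 0.0, 0
    for p in tr:
        # Advance to the simplified segment whose time span covers p.
        while seg < len(st) - 2 and p[2] > st[seg + 1][2]:
            seg += 1
        total += tssd(st[seg], p, st[seg + 1])
    return total / len(tr)
```

With a five-point trajectory whose middle point is displaced by 0.001 degrees of latitude and a simplification that keeps only the two end points, `tssd` for the displaced point is roughly 111 m and `cr` is 60%.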

Trajectory Outlier Detection
Definition 9 (Feature Vector). A 'feature vector' is a set of values that can represent the characteristics of a trajectory, denoted as $fv_{name}(tr) = \{v_1, v_2, \ldots, v_x\}$, with $name$ and $|fv_{name}|$ being the name and length of the feature vector, respectively.
Instead of using raw data, an outlier-detection algorithm usually uses a feature vector derived from the data. Here, we use three kinds of feature vectors:

1. A trajectory that contains outlying point-locations, which are point-locations that significantly affect the statistical properties, i.e., the feature vectors, of the trajectory; or
2. a trajectory that does not have enough neighbors with similar feature vectors, either globally or within its local clusters, if any.
The outlying trajectory is basically a trajectory with outlying point-locations or a trajectory that differs significantly from the others in terms of feature vectors [29,30].

Problem Statement
Finally, we define our problem as a trajectory-simplification problem followed by the detection of outlying trajectories, as follows: Given a set of trajectories $TR = \{tr_1, tr_2, \ldots, tr_I\}$, our algorithm discovers a corresponding set of simplified trajectories $ST = \{st_1, st_2, \ldots, st_I\}$; then, an outlier-detection algorithm discovers the set of outlying trajectories $OT = \{ot_1, ot_2, \ldots, ot_L\}$ from among the simplified trajectories $ST$ such that $OT \subseteq ST$. The question is: which approach is best for trajectory simplification and subsequent outlier-detection?

Proposed Approach
In this section, we introduce our framework for trajectory-simplification followed by trajectory outlier-detection. Figure 7 illustrates that our framework entails two steps: (1) trajectory simplification using a trajectory-simplification algorithm, namely the TFP or KAA algorithm, with a streaming version of the KAA algorithm additionally provided to handle trajectory streams; and (2) application of an outlier-detection algorithm for the detection of outlying trajectories. For the trajectory-simplification algorithms, we consider two kinds of environments that are common for acquiring trajectory data, namely the batch and streaming environments.

Batch Processing Environment
In the batch processing environment, trajectories are all collected from the traces of completed moves. Herein, we propose two approaches to simplifying a trajectory, called the t-fixed partition (TFP) and k-ahead artificial arcs (KAA) algorithms.
The t-fixed partition (TFP) algorithm simplifies a trajectory by partitioning it into a fixed number $t$ of partitions. Algorithm 1 proceeds as follows: calculate the partition size $w$ from the trajectory length $|tr_i|$ and the parameter $t$; iteratively add every $w$-th point-location of the trajectory $tr_i$ to the simplified trajectory $st_i$; additionally, add the end point of $tr_i$ to $st_i$; finally, return the simplified trajectory $st_i$ (line 6).
For example, given a trajectory, as shown in Figure 6, that contains nine point-locations $tr_i = p_{i1}, p_{i2}, \ldots, p_{i9}$ and parameter $t = 3$: calculate the partition size $w = 9/3 = 3$; iteratively add every $w$-th point-location from the trajectory $tr_i$ to the corresponding simplified trajectory $st_i$ such that $st_i = p_{i1}, p_{i4}, p_{i7}$; additionally, add the end point $p_{i9}$ to the simplified trajectory $st_i$; and finally, return the simplified trajectory $st_i = p_{i1}, p_{i4}, p_{i7}, p_{i9}$. Figure 8 illustrates how the TFP trajectory-simplification algorithm works to simplify the example trajectory.
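The worked example can be reproduced with a short Python sketch. It is a minimal illustration, not the published Algorithm 1: the function name `tfp_simplify` is hypothetical, and the use of integer (floor) division for the partition size when the length is not divisible by $t$ is an assumption, since the example only covers the exact case 9/3.

```python
from typing import List, Sequence, TypeVar

P = TypeVar("P")


def tfp_simplify(trajectory: Sequence[P], t: int) -> List[P]:
    """Keep every w-th point-location, with w = len(trajectory) // t, plus the end point."""
    if t < 1:
        raise ValueError("t must be a positive integer")
    n = len(trajectory)
    if n == 0:
        return []
    w = max(1, n // t)  # partition size (assumed floor division)
    simplified = list(trajectory[::w])
    if (n - 1) % w != 0:  # the end point was not picked by the stride
        simplified.append(trajectory[-1])
    return simplified
```

Calling `tfp_simplify(["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8", "p9"], 3)` yields `p1, p4, p7, p9`, matching the example. A single pass over the trajectory is enough, which is consistent with the linear complexity attributed to TFP.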

k-Ahead Artificial Arcs (KAA) Algorithm
Inspired by the work in the field of cartography on line simplification based on the constrained shortest path [18], we herein additionally propose a trajectory-simplification approach called the k-ahead artificial arcs (KAA) algorithm. Algorithm 2 shows how the proposed KAA approach proceeds in three steps:
1. (Graph Construction): Convert a trajectory $tr_i$ into a temporary graph $G_i = (V_i, E_i)$ by treating each point-location $p_{ia}$ in trajectory $tr_i$ as a vertex $v_{ia}$ and adding an edge $e_{iab}$ between point-locations $p_{ia}$ and $p_{ib}$ for $a < b \le MIN(a + k, |tr_i|)$ (lines 1-6). The value of $e_{iab}$ is simply the spatial distance between the two point-locations $p_{ia}$ and $p_{ib}$ (e.g., the Equirectangular distance between two GNSS points based on latitude and longitude);
2. (Shortest Path Finding): Calculate the shortest path $sp_i$ between the first vertex $v_{i1}$ and the last vertex $v_{i|tr_i|}$ in graph $G_i$ (line 7). We select the Dijkstra algorithm [31] as our shortest-path-finding algorithm; and
3. (Solution Generation): Finally, the simplified trajectory $st_i$ is the sequence of point-locations $p_{ia}$ included in the shortest path $sp_i$ (lines 8-9).
Since the noise usually exists as a point-location that is located far away from the other points, heuristically, we want to avoid it by means of the resulting simplified trajectory. By adding artificial arcs and running the shortest-path-finding algorithm, in most cases, such noise can in fact be avoided.
For example, consider a trajectory, as shown in Figure 6, containing nine point-locations $tr_i = p_{i1}, p_{i2}, \ldots, p_{i9}$ and parameter $k = 3$. Assign all points $p_{i1}, p_{i2}, \ldots, p_{i9}$ in $tr_i$ as vertices $v_{i1}, v_{i2}, \ldots, v_{i9}$ to the vertex set $V_i$ of graph $G_i$. For each vertex $v_{ia}$, create at most $k$ edges between the vertex $v_{ia}$ and the vertices $v_{ib}$, where $a < b \le MIN(a + k, |tr_i|)$. Given $k = 3$, we then assign the resulting set of edges to the edge set $E_i$ of graph $G_i$, and simultaneously, we calculate the spatial distance for each edge $e_{iab}$ and store it in a temporary distances variable $dist_i$. Afterwards, we run a shortest-path-finding algorithm (we use the Dijkstra algorithm) on the input graph $G_i$ and the temporary distances variable $dist_i$, with start node $v_{i1}$ and end node $v_{i(|tr_i|)}$, to find a shortest path $sp_i$. Using our example trajectory, the corresponding shortest path is $sp_i = v_{i1}, v_{i2}, v_{i5}, v_{i8}, v_{i9}$. The solution is generated by assigning the original point $p_{ia}$ of every vertex $v_{ia}$ in the shortest path $sp_i$ to the simplified trajectory $st_i$. Finally, return the simplified trajectory $st_i = p_{i1}, p_{i2}, p_{i5}, p_{i8}, p_{i9}$. Figure 9 illustrates how the KAA trajectory-simplification algorithm works to simplify the example trajectory.
The main advantage of the TFP algorithm over the KAA algorithm is its complexity, which is $O(n)$ with $n$ equal to the trajectory length $|tr_i|$, whereas the complexity of the KAA algorithm depends on that of the shortest-path-finding algorithm. In our case, we used the optimized Dijkstra algorithm, such that the complexity of our KAA algorithm was about $O(E_i \log V_i)$ at best, with $E_i$ and $V_i$ referring to the numbers of edges and vertices in the graph $G_i$. However, the quality of the KAA algorithm might be better than that of the TFP algorithm, owing to its use of the shortest-path-finding algorithm.
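The three KAA steps (graph construction, shortest path finding, solution generation) can be sketched in Python as shown below. This is an illustrative sketch under assumptions, not the published Algorithm 2: the function name `kaa_simplify` is hypothetical, the k-ahead edges are generated on the fly instead of being stored in an explicit graph, each vertex is connected to the next $k$ vertices inclusive (as the edges $p_{i2} \to p_{i5}$ and $p_{i5} \to p_{i8}$ in the example with $k = 3$ require), and planar Euclidean distance via `math.dist` stands in for the spatial distance so that the example stays self-contained. Any other distance function can be passed in through the `dist` argument.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y); stand-in for a GNSS point-location


def kaa_simplify(
    trajectory: Sequence[Point],
    k: int,
    dist: Callable[[Point, Point], float] = math.dist,
) -> List[Point]:
    """Shortest path from the first to the last point over k-ahead arcs."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    n = len(trajectory)
    if n <= 2:
        return list(trajectory)

    best = [math.inf] * n   # best known distance from vertex 0
    prev = [-1] * n         # predecessor on the best known path
    best[0] = 0.0
    heap = [(0.0, 0)]
    while heap:             # Dijkstra with a binary heap
        d, a = heapq.heappop(heap)
        if d > best[a]:
            continue        # stale heap entry
        if a == n - 1:
            break           # reached the end point
        # k-ahead artificial arcs: a -> a+1, ..., a -> min(a+k, n-1)
        for b in range(a + 1, min(a + k, n - 1) + 1):
            nd = d + dist(trajectory[a], trajectory[b])
            if nd < best[b]:
                best[b] = nd
                prev[b] = a
                heapq.heappush(heap, (nd, b))

    # Solution generation: walk predecessors back from the end point.
    path = []
    v = n - 1
    while v != -1:
        path.append(v)
        v = prev[v]
    path.reverse()
    return [trajectory[v] for v in path]
```

For instance, on the planar trajectory `[(0, 0), (1, 0), (2, 5), (3, 0), (4, 0)]` with `k = 2`, the point `(2, 5)` plays the role of a random jump: every path through it is longer than the path that skips it, so it is dropped from the result. With `k = 1` no arcs are added and the trajectory is returned unchanged.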

The Determination of Trajectory-Simplification Parameters for Batch Processing
The remaining problem with our approach is to determine appropriate values for the parameters $t$ and $k$ of the TFP and KAA algorithms, respectively. It is easy to determine a parameter value for a single trajectory; however, the effectiveness might be reduced if we apply the same value to a set of trajectories with different lengths. If we lower the value of $t$ in the TFP algorithm, the resulting simplified trajectory will be shorter, and vice versa. Conversely, for the KAA algorithm, if we lower the value of $k$, the resulting simplified trajectory will be longer, owing to the reduced number of additional arcs added to the corresponding graph, and vice versa. Therefore, we propose two scenarios for determining our trajectory-simplification parameters:
1. Absolute number scenario. We define the same, absolute value of $t$ and $k$ for all trajectories in $TR$ (e.g., $t = 5$ or $k = 5$);
2. Relative number scenario. We define a different $t$ and $k$ for each trajectory with respect to the trajectory length (e.g., $t = 5\%$ or $k = 5\%$ of the trajectory length $|tr_i|$).
In Section 5.2, we will experiment on a real-world case study using these two scenarios.
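The two scenarios differ only in how the integer parameter handed to TFP or KAA is derived for each trajectory. A minimal Python sketch follows; the helper name `resolve_parameter`, the rounding up of the relative value, and the lower bound of 1 are assumptions made for illustration, since the text does not state how fractional values are handled.

```python
import math


def resolve_parameter(value: float, trajectory_length: int, relative: bool) -> int:
    """Return the integer t or k to use for one trajectory.

    relative=False: `value` is used as-is for every trajectory.
    relative=True:  `value` is a percentage of the trajectory length.
    """
    if relative:
        # Assumed rounding: round up, and never go below 1.
        return max(1, math.ceil(value * trajectory_length / 100))
    return max(1, int(value))
```

For a trajectory of 200 point-locations, the absolute setting `k = 5` stays 5, while the relative setting `k = 5%` becomes 10; for a trajectory of 40 point-locations the relative setting becomes 2.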

Trajectory Simplification in Streaming Environment
The batch environment assumes that the data are complete; conversely, in the streaming environment, new incoming data continuously extend the current data. The trajectory-stream-simplifying algorithm is introduced as a delta function that simplifies a trajectory stream by deciding whether to add new, incoming point-locations to the current trajectory stream without re-computing the whole trajectory stream. Unfortunately, creating a streaming version of our TFP algorithm is not possible, because it always requires prior knowledge of the length of the trajectory. For example (see Figure 8), if $t = 3$, then for a trajectory of length 9, each partition will contain at most 3 points; the partition size therefore cannot be fixed while the trajectory length is still unknown. In contrast, our KAA algorithm does not need prior knowledge of the length of the trajectory, and we therefore introduce a streaming version of the KAA algorithm called StreamKAA.

Streaming k-Ahead Artificial Arcs (StreamKAA) Algorithm
To make the KAA algorithm work in the streaming environment, we introduce a variable $mt_i(ts)$, a temporary trajectory used as a buffer to keep several point-locations at time $ts$. The $mt_i(ts)$ variable holds point-locations that have not yet been added as permanent points to the corresponding simplified trajectory stream $strs_i$. In our case, we assume the length $|mt_i(ts)|$ to be unlimited; however, if we limit $|mt_i(ts)|$ to always be less than the parameter $k$ at any time $ts$, we could expect quality loss in the resulting simplified trajectory stream $strs_i$ owing to the possibility of missing some important point-locations. Algorithm 3, the streaming version of the KAA trajectory-simplification algorithm, proceeds as follows:

1. Add the newly arrived point-location $p_{i(|trs_i|(ts)+1)}$ to the current buffer $mt_i(ts-1)$ (line 1) and calculate a temporary simplified trajectory $tst_i$ using the batch version of the KAA algorithm with $mt_i(ts)$ as input (line 2);
4. Add all point-locations $s_{ia} \in tst_i$ whose original point-location $s_{ia}^{-1}$ does not exist in the new buffer variable $mt_i(ts)$ to the new simplified trajectory $nst_i$ (lines 7-9); then
5. Return the new simplified trajectory $nst_i$ and its corresponding buffer variable $mt_i(ts)$.

Input: a new, incoming point-location $p_{i(|trs_i|(ts))} \in trs_i$ at time $ts$; a parameter $k$; a previous memory variable $mt_i(ts-1)$
Output: a new simplified trajectory $nst_i$; a new memory variable $mt_i(ts)$
Initialize: a new simplified trajectory $nst_i \leftarrow \emptyset$; a new buffer $mt_i(ts) \leftarrow \emptyset$; a temporary simplified trajectory $tst_i \leftarrow \emptyset$; temporary index variables $a, b \leftarrow 0$

Data
We verified the proposed approach by performing experiments using a real-world case study of a shipyard in South Korea. The dataset employed contains 284 trajectories with a total of 15,012 point-locations of block transporter movement. The data were taken from actual block transporter movements based on a weekly movement schedule. Since the planning recurs over a weekly time horizon, we considered a weekly observation (in this case, 5 working days) sufficient for our case study. In our dataset, 21 trajectories (7%) are outlying trajectories according to the domain expert. The domain expert in our case should have the following qualifications:
1. a doctorate in Electrical Engineering or Control System Engineering;
2. expertise in developing logistics analysis and optimization systems for shipyards; and
3. at least 2 years of experience in automation research in heavy industry.
The actual domain expert who worked on this manuscript has all of the above qualifications, in addition to 18 years of experience in automation research, particularly in heavy industry. Figure 11 shows an example of an outlying trajectory, and Figure A1 in the appendix shows the overall trajectories and the outlying trajectories in the data identified by the domain expert.

Sensitivity Analysis
The first experiments tested the sensitivity of the $t$ parameter of the TFP algorithm and the $k$ parameter of the KAA algorithm for simplifying raw trajectories, under two scenarios:
1. the absolute number scenario, with the $t$ parameter of the TFP algorithm (TFP-ABS) and the $k$ parameter of the KAA algorithm (KAA-ABS) tuned within the 5-15 range regardless of the length of the trajectory; and
2. the relative number scenario, with the $t$ parameter of the TFP algorithm (TFP-REL) and the $k$ parameter of the KAA algorithm (KAA-REL) tuned within the 5-15% range relative to the length of the trajectory.
Then, a sensitivity analysis measured the compression ratio and the distance reduction ratio between the raw and simplified trajectories. The Douglas-Peucker (DP) and the Direction-Preserving Trajectory-Simplification (DPTS) algorithms were also employed in our experiments. We varied the epsilon parameter of the DP algorithm within the 0-1.0 range, and the angular direction threshold of the DPTS algorithm within 5-15 degrees. Additionally, the StreamKAA algorithm was included by simulating each trajectory in a trajectory-stream scenario. Figure 12 shows the overall experiment scenario for varying each of the input parameters. Here we try to balance the three metrics CR, TTDRR, and ATSSD. The values of CR and TTDRR should lie as close as possible to 100%, while the ATSSD value should lie as close as possible to 0 m. In Figure 12, the ATSSD values are normalized into the range between 0 and 100% such that the values closest to 0% are considered best unless stated otherwise. The numerical version of Figure 12 is presented in Tables A1-A3 (Appendix A) for the DP and DPTS algorithms, the TFP algorithms, and the KAA algorithms, respectively. We tried to determine, for our dataset, the $k$ and $t$ parameters that balance the maximum compression ratio with the maximum total travel distance reduction ratio and the minimum average time-synchronized spatial distance.
In Figure 12, the algorithms can be categorized into three types of trends. For the DP, DPTS and TFP-ABS algorithms, all of the metrics show an increasing trend as the parameter value $n$ increases. For the KAA-ABS and StreamKAA algorithms, the values of CR and TTDRR are relatively stagnant compared with the increasing value of ATSSD. Finally, the TFP-ABS and TFP-REL algorithms show radical fluctuations in ATSSD and a decreasing trend in the CR and TTDRR metrics. The first category shows that finding the optimal parameter can require an exhaustive search, since the optimal parameter may lie at the far end of the positive integers. In the second category, we can see the convergence of CR and TTDRR regardless of the increasing trend of the ATSSD values. In the last category, however, the fluctuations of ATSSD can make the parameter selection unreliable unless we check every possible value of ATSSD. We consider the algorithms in the second category the most stable among the others.
Table 1 summarizes the best parameters of each algorithm in Figure 12, chosen to determine, for our dataset, the k and t parameters that balance the maximum compression ratio with the maximum total travel distance reduction ratio and the minimum average time synchronized spatial distance. Based on our experiments, the balanced condition for the DP algorithm occurred at eps = 0.6, while the DPTS algorithm is balanced at ae = 9. For the TFP algorithm, it occurred at t = 14 and t = 15% for the absolute number and relative scenarios, respectively. For the KAA algorithm, it occurred at k = 10 and k = 10% for the absolute number and relative scenarios, respectively. In this case, the StreamKAA algorithm performs very close to KAA with the absolute parameter k = 10 (KAA-ABS).

Figure 11, the table format of these charts is shown in Appendix A in Tables A1-A3.

Table 1. Sensitivity analysis results for the best parameter of each algorithm shown in Figure 12. The best algorithm for each measurement is indicated in bold.

Looking at the individual scores, the best algorithms for the CR and TTDRR metrics, having values closest to 100%, are KAA with the absolute value k = 10 and the streaming KAA with k = 10. By contrast, TFP with the absolute value t = 14 has the best score for ATSSD, having the value closest to 0 m. Compared with the previous algorithms, the CR metric improved by 186% ((83.084 − 29.047)/29.047 × 100) and 23% ((83.084 − 67.191)/67.191 × 100) over the DP and DPTS algorithms, respectively. For TTDRR, the improvement is 156% ((43.304 − 16.850)/16.850 × 100) and 289% ((43.304 − 11.132)/11.132 × 100) over the DP and DPTS algorithms, respectively, and ATSSD decreased by 64% ((43.545 − 121.422)/121.422 × 100) and 84% ((43.545 − 281.381)/281.381 × 100) relative to the DP and DPTS algorithms, respectively. Because a lower ATSSD is better, these percentage decreases relative to the previous work are considered improvements. The overall best score, however, shows that KAA with the absolute value k = 10 is the best, with both CR and TTDRR maximized and its ATSSD also minimized among the other algorithms.
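The percentage figures above follow from the usual relative-change formula, (new − old)/old × 100. The short sketch below recomputes them from the values quoted in the text; the helper name `relative_change_percent` is an illustrative assumption. Evaluating the expressions suggests that the reported percentages are truncated rather than rounded (for example, 23.65% is reported as 23%).

```python
def relative_change_percent(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return (new - old) / old * 100.0


# Values quoted in the text (proposed approach vs. DP and DPTS).
cr_vs_dp = relative_change_percent(83.084, 29.047)
cr_vs_dpts = relative_change_percent(83.084, 67.191)
ttdrr_vs_dp = relative_change_percent(43.304, 16.850)
ttdrr_vs_dpts = relative_change_percent(43.304, 11.132)
atssd_vs_dp = relative_change_percent(43.545, 121.422)      # negative: a decrease
atssd_vs_dpts = relative_change_percent(43.545, 281.381)    # negative: a decrease

for name, value in [
    ("CR vs DP", cr_vs_dp),
    ("CR vs DPTS", cr_vs_dpts),
    ("TTDRR vs DP", ttdrr_vs_dp),
    ("TTDRR vs DPTS", ttdrr_vs_dpts),
    ("ATSSD vs DP", atssd_vs_dp),
    ("ATSSD vs DPTS", atssd_vs_dpts),
]:
    print(f"{name}: {value:.2f}%")
```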

Measurements
As for the visual comparison, Figures 13 and A2 illustrate the simplified trajectories produced by each trajectory-simplification algorithm at its best parameter values. The DP and DPTS trajectory-simplification algorithms had good compression scores and total travel distance reduction ratios. However, they have a relatively high average time synchronized spatial distance error, and the visual comparison confirms that they are not robust against noise. Meanwhile, according to the visual comparison results, our algorithms are more robust against noise and at the same time give a moderate compression ratio. The visual comparison results also show that our algorithms outperformed the previous algorithms, since they consider several points at a time to anticipate the occurrence of random jumps and small loops in a trajectory. Additionally, the previous works used error rates as input parameters, such that they sometimes failed to avoid noise in the form of point-locations far from their 'true' locations.

Figure 11 using various algorithms and parameters shown in Table 1.

Outlier-Detection
After applying the trajectory-simplification algorithm, we tested the following three outlier-detection algorithms against the configurations respectively specified:

1. Statistics-based outlier-detection: Following the recommendation of the original work on Tukey's boxplot [28], we use the following parameter settings: outlier factor o_f = 1.5 and extreme value factor e_f = 3. With these settings, a feature-vector value that lies 3 times the interquartile range (IQR) or more above the third quartile, or 3 times the IQR or more below the first quartile, qualifies as an outlier, while a value that lies 1.5 times the IQR or more above the third quartile, or 1.5 times the IQR or more below the first quartile, is called a suspected outlier;
2. Distance-based outlier-detection (DB(r, π)): Given the idea that an outlier should be far from most of the population, and that the domain expert identified 7% of the trajectories as outlying trajectories, we assumed that normal trajectories are trajectories that have a maximum normalized distance of 0.5 (r = 0.5) to 90% of the total number of trajectories (π = 0.9); and
3. Density-based outlier-detection (LOF(θ)): Following the distance-based outlier-detection, we assumed that the maximum local density of a trajectory cluster is 10% of the total number of trajectories, and we set θ = 0.1 × |TR|, where |TR| denotes the total number of trajectories in set TR (our dataset contained 284 trajectories). Thus, we used the 28 (0.1 × 284, rounded down) nearest neighbors of a trajectory to determine the local density of a trajectory cluster.
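As an illustration of the statistics-based configuration above, the following minimal sketch labels values using Tukey's fences, which are placed at multiples of the interquartile range (IQR) beyond the first and third quartiles. The function name, the quartile method (`statistics.quantiles` with `method="inclusive"`), the strict comparisons at the fences, and the sample values are assumptions made for this sketch; it is not the implementation used in our experiments.

```python
import statistics


def tukey_labels(values, outlier_factor=1.5, extreme_factor=3.0):
    """Label each value as 'normal', 'suspected' or 'outlier' using Tukey's fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    labels = []
    for v in values:
        # Strict comparisons are an assumption of this sketch; they avoid flagging
        # every value when the IQR is zero.
        if v < q1 - extreme_factor * iqr or v > q3 + extreme_factor * iqr:
            labels.append("outlier")
        elif v < q1 - outlier_factor * iqr or v > q3 + outlier_factor * iqr:
            labels.append("suspected")
        else:
            labels.append("normal")
    return labels


# Hypothetical feature values: nine regular values, one moderate and one extreme jump.
feature_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 100]
labels = tukey_labels(feature_values)
print(list(zip(feature_values, labels)))
```

For these sample values the quartiles are 3.5 and 8.5 (IQR = 5), so the upper fences lie at 16 and 23.5: the value 20 is labeled as a suspected outlier and 100 as an outlier.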
The statistics-based outlier-detection is used to detect outlying point-locations within a single trajectory. The distance-based and density-based outlier-detection algorithms detect a trajectory that differs significantly from the other trajectories; if the feature vectors of two trajectories differ in length, a dynamic time warping approach [32] is used. We use the input parameters o_f = 1.5 and e_f = 3 for statistics-based outlier detection, r = 0.5 and π = 0.9 for distance-based outlier detection, and θ = 28 for density-based outlier detection. We use the domain expert's manual trajectory classification (whether a trajectory is an outlier or not) as the target class, and compare our outlier-detection results with the target class using the precision and recall metrics. Precision is calculated by dividing the number of true positive observations by the sum of the true positive and false positive observations, while recall is calculated by dividing the number of true positive observations by the sum of the true positive and false negative observations. Figure 14 plots the performance results of the various outlier-detection algorithms. As for the visual comparison results, Figure A3 illustrates the corresponding TFP- and KAA-simplified trajectories for the best parameter values. The outlier-detection results after applying the trajectory-simplification algorithms are quite interesting. The statistics-based outlier-detection precision improved when we increased the parameter of the trajectory-simplification algorithm; however, the recall value varies. The distance-based outlier-detection precision and recall are somewhat similar for all trajectory-simplification algorithms regardless of parameter values. The density-based outlier-detection precision and recall are also quite similar for all trajectory-simplification algorithms.
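The precision and recall definitions given above can be sketched as follows. The function name and the toy label lists are illustrative assumptions; `predicted` stands for the detector's output and `actual` for the domain expert's classification, with True meaning "outlying trajectory".

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two equal-length lists of booleans."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)        # true positives
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)    # false positives
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)    # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy example: six trajectories, two of which the expert marked as outlying.
predicted = [True, True, True, False, False, False]
actual = [True, False, False, True, False, False]
precision, recall = precision_recall(predicted, actual)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this toy example one of the three flagged trajectories is truly outlying (precision 1/3), and one of the two outlying trajectories is found (recall 1/2). Returning 0.0 when a denominator is zero is a convention chosen for this sketch.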
Table 2 summarizes the results of each outlier-detection algorithm shown in Figure 14 at the best parameter of each trajectory-smoothing algorithm shown in Figure 12, indicating, for our dataset, which outlier-detection algorithm performs best on the output of each trajectory-simplification algorithm. The statistics-based outlier-detection precision is almost similar across all trajectory-simplification algorithms; the KAA algorithm with k = 10, followed by the TFP algorithm with t = 15%, performed best, with 25.71% and 8.98% precision, respectively. However, the DPTS algorithm with ae = 9 and TFP with t = 14 have better recall values. For the distance-based and density-based outlier detections, the DPTS algorithm with ae = 9 performed best, followed by the KAA algorithm with k = 10, in both precision and recall. However, due to the small number of outlying trajectories in our dataset, i.e., around 7%, the precision and recall values appear very low: they do not exceed the 30% mark, except for the recall value of the distance-based outlier-detection algorithm. Overall, the DPTS algorithm with ae = 9 still performs best, followed by our KAA algorithm with k = 10.

Table 2. Precision and recall metrics for each outlier-detection algorithm shown in Figure 14 at the best parameter of each trajectory-smoothing algorithm shown in Figure 12. The best trajectory-smoothing algorithm for each outlier-detection measurement is indicated in bold.

Conclusions
In this paper, we presented a system for trajectory simplification and detection of outlying trajectories. We proposed two trajectory-simplification algorithms: The t-fixed Partition (TFP) and k-ahead Artificial Arcs (KAA) algorithms. The streaming version of KAA, StreamKAA, was also introduced to handle streaming environments. A real-world case study from the ship-building industry in South Korea was used as the source of the main dataset. We introduced two scenarios for determining the parameters for our trajectory-simplification algorithm: The absolute number scenario, which uses a fixed number for all trajectories, and the relative number scenario, which uses a number relative to the length of a trajectory.
To evaluate our trajectory-simplification approach, we compared it with the current state-of-the-art: the Douglas-Peucker (DP) algorithm and the Direction-Preserving Trajectory-Simplification (DPTS) algorithm. The experiments were conducted with several parameter values for each algorithm, and the results are presented in Tables A1-A3 (Appendix A). In Table 1, we present the most appropriate parameter value for each algorithm based on our experimental settings. According to our experimental results, in terms of compression ratio (CR) and total travel distance reduction ratio (TTDRR), all of the algorithms performed similarly depending on the parameter value, with our KAA-ABS and KAA-STR approaches performing the best at 83% and 43% for CR and TTDRR, respectively. In terms of average time synchronized spatial distance (ATSSD), our approaches outperformed the state-of-the-art, since they scored at most 117 m, while the minimum ATSSD value of the DP and DPTS algorithms was 121 m. Taking the balance between maximum CR, maximum TTDRR and minimum ATSSD, in our dataset, the KAA algorithm with k = 10, followed by the TFP algorithm with t = 14, performed best for trajectory simplification.
After a trajectory is simplified, outlying trajectories are detected using three popular approaches in data mining: Statistical, distance-based and density-based. The precision and recall are then checked and compared with domain expert identification. We found that in terms of precision, the density-based outlier detection algorithm performed the best compared with the statistics-based or distance-based. If we look at recall value, however, the distance-based outlier detection algorithm outperforms other outlier detection algorithms. Moreover, we also found that the DPTS trajectory-simplification algorithm with ae = 9 and the KAA algorithm with value k = 10, performed best if they are followed by the density-based outlier detection algorithm.
As for the study's limitations, our work currently does not incorporate a routing data layer (e.g., map matching), because the significant amount of noise in outlying trajectories makes it harder to match point-locations to map routing data. In future studies, we plan to incorporate the routing data layer as a second pass for further noise filtering, thus improving the quality of the trajectories.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Table A1. Sensitivity analysis results of the DP and DPTS trajectory-simplification algorithms for the outlying trajectory in Figure 11. The best result of each algorithm is indicated in bold.

Table A2. Sensitivity analysis results of the TFP trajectory-simplification algorithm for the outlying trajectory in Figure 11. The best result of each algorithm is indicated in bold.

Table A3. Sensitivity analysis results of the KAA trajectory-simplification algorithm for the outlying trajectory in Figure 11. The best result of each algorithm is indicated in bold.