Mining Abnormal Patterns in Moving Target Trajectories Based on Multi-Attribute Classification

Abstract: As a type of time series data, trajectory data objectively record the location of an object's activities and the corresponding time. They not only describe the spatial activity trajectory of a moving object but also capture the unique attributes, states, and behavioral characteristics of the object itself, and to a certain extent they reflect the interaction between the object's activities and various elements of its environment. Mining moving target trajectory data to discover implicit, effective, and potentially useful spatiotemporal behavior patterns of moving targets, such as anomalies, is therefore of significant research value. This paper proposes a method for mining abnormal patterns in the trajectories of moving targets based on multi-attribute classification. First, to explore the activity location patterns of single moving targets, a frequent sequence discovery method for moving targets based on sequential patterns is proposed. Next, for moving target trajectory data sets containing multiple attributes, numerical attributes are extracted and the data are clustered by attribute to extract a set of normal behavior patterns of moving targets. Then, the original trajectory data are compared with the combined activity location patterns and normal behavior patterns of the moving target to detect its abnormal behavior. Finally, an incremental anomaly detection scheme is proposed to cope with the fast updates and large volumes of trajectory data sets. The scheme updates the frequencies of moving target activity patterns and the value ranges of normal behavior patterns synchronously with the trajectory data set, meeting the needs of database updates and improving the accuracy and credibility of the results.


Introduction
Following the rapid development of computer networks and wireless communication technologies, mobile communication and computing have found broad applications in various fields. As a result, the volume of moving target trajectory data has grown approximately exponentially. To support these applications more effectively, extracting abnormal patterns, which can significantly affect global decision-making, from moving target trajectory data has become a key area of interest for researchers.
Abnormal pattern mining of moving target trajectories, an important branch of trajectory data mining, uses algorithms to detect data that deviate from normal behavior patterns. The purpose of mining behavior patterns such as anomalies in moving target trajectories is to discover valuable, hidden, and previously unknown patterns in the original trajectory data set of the moving targets. The original trajectory data set of a moving target consists of several trajectories, each composed of several trajectory points; that is, each trajectory can be represented as a set of trajectory points. The attributes of a trajectory include its unique identifier, the name of the moving target, the appearance time of the trajectory, its disappearance time, the duration of its appearance, and the sequence of regions the trajectory passes through. Each trajectory point in turn has attributes including its unique identifier, the name of the moving target, the unique identifier of the trajectory it belongs to, its longitude and latitude, the current time, the velocity, and the region in which the point is located. From these, the feature attribute set of a trajectory can be computed, comprising the regions the moving target passes through, the average velocity, the trajectory appearance time, disappearance time, and appearance duration, the closest distance to other moving targets, and the closest distance to the hotspot area. These seven attributes essentially cover the moving target features that this article examines for anomalies.
Depending on the type of data processed, anomaly detection for moving target trajectory data can generally be divided into static-data-set-oriented and data-stream-oriented methods [1]. Among anomaly detection algorithms for static data sets, Lee et al. [2] proposed the trajectory outlier detection algorithm TRAOD, based on a partition-and-detect framework, in 2008. The algorithm has two stages: partitioning and detection. In the partitioning stage, the minimum description length (MDL) principle is used to divide each trajectory into a set of continuous trajectory segments. In the detection stage, a method based on Hausdorff distance and density identifies outlying trajectory segments, which then serve as the basis for deciding whether a trajectory is abnormal. Although TRAOD solves the problem of discovering abnormal trajectories, computing the distances between trajectory segments becomes very time-consuming when the trajectory data set is large, which degrades the algorithm's efficiency. In 2009, Liu Liangxu et al. [3] proposed an R-tree-based abnormal trajectory detection algorithm that represents the original trajectory by local features built from continuous trajectory points. They defined a distance function based on the matching degree of comparison units and used a distance feature matrix between trajectories to search for all possible matching pairs of comparison units for local and global matching degree calculations. This eliminated a large number of unnecessary distance calculations, improving execution efficiency while still identifying abnormal trajectories. In 2010, Ge et al. [4] proposed the Top-k Evolving Trajectory Outlier Detection method (TOP-EYE), which analyzes the behavior of moving targets to discover the top abnormal trajectories. Unlike previous distance-based trajectory methods, it considers the outlier factors of abnormal trajectories in both spatial distance and motion direction, making trajectory anomaly detection more comprehensive.
In research on anomaly detection for trajectory data streams, Bu et al. [5] processed the data in real time, defined three sliding windows over the trajectory stream (a basic window and left and right sliding windows), and set distance thresholds. By summing the numbers of neighbors of the basic window within the left and right sliding windows, they determined whether the current trajectory segment is an anomaly in the trajectory stream. To speed up anomaly detection in trajectory streams, Cao et al. [6] proposed an algorithm for detecting abnormal moving objects in massive trajectory streams in 2014. Given a set of moving objects and their trajectory streams, anomaly detection is divided, according to the granularity of the anomalies, into point-neighbor-based and trajectory-neighbor-based detection, and whether the current trajectory contains anomalies is determined from the numbers of point neighbors and trajectory neighbors, respectively. In 2018, Katsilieris et al. [7] addressed abnormal behavior of ground moving targets, using prior knowledge such as road network information to detect abnormal behavior automatically and inferring target behavior from the provided trajectory. In 2020, Zhao et al. [8] proposed TADSS, a sparse-subgraph-based abnormal trajectory detection method that measures the time, velocity, and position features of trajectory data with three kernel functions. The weighted kernel functions are fused by linear combination, the fused kernel is used to construct a trajectory feature graph, and the graph is finally divided into multiple subgraphs with traditional graph clustering techniques. This method addresses the tendency of traditional abnormal trajectory detection algorithms to rely on a single feature measure while ignoring the influence of other features, and it can discover hidden abnormal trajectories through comprehensive measurement. Also in 2020, Liu et al. [9] proposed an online abnormal trajectory detection method based on deep generative sequence modeling: the Gaussian Mixture Variational Sequence AutoEncoder (GM-VSAE) captures complex sequential information in trajectories and discovers different types of normal routes, enabling online detection. In 2022, Ahmed et al. [10] proposed a graph-based method for detecting trajectory outliers. In 2023, Jiang et al. [11] proposed a behavior pattern mining algorithm based on multidimensional fusion of spatiotemporal trajectory information to mine frequent target behaviors from complex historical trajectory data. Lan et al. [12] proposed a two-stage framework based on density-based spatial clustering of applications with noise (DBSCAN) for detecting human trajectory anomalies in indoor spaces. Zhou et al. [13] proposed a feature-driven spatiotemporal companion pattern (STCP) mining method that detects the spatiotemporal travel patterns of ships from massive trajectory data in order to understand the motion patterns of ship groups. In 2024, Ouyang et al. [14] proposed a shape-matching-based algorithm for extracting similar line segments, focusing on the shape matching of target trajectories. Wu et al. [15] proposed a mixed spatial and feature anomaly detection method for large trajectory data that addresses the computational cost through purpose-designed data structures.
The research above shows that existing anomaly pattern mining methods for moving target trajectories rest on different principles and requirements, each with its own advantages and disadvantages. Their main common problems are as follows: (1) trajectories containing multiple attributes are examined for anomalies as a whole, ignoring anomalies that may occur in a single attribute dimension; (2) there is no quantitative description of the degree of trajectory anomalies, making it difficult to distinguish the severity of abnormal trajectories; and (3) the dynamic growth of trajectories is not considered, so the anomaly detection model cannot be updated incrementally and the evolution of abnormal behavior cannot be detected, resulting in high time and space overhead.
Existing research on anomaly pattern mining does not examine the potential anomalies in each attribute of multi-attribute trajectories and does not discuss trajectory attributes individually; moreover, trajectories grow dynamically and rapidly, and current research fails to detect how anomalous behavior evolves in such scenarios. This article therefore proposes a trajectory anomaly mining method based on multi-attribute classification, which divides the attributes examined for anomalies into two categories, sequential and numerical, and detects anomalies in them with sequential pattern mining and cluster classification, respectively. The main innovations are as follows: (1) To explore the activity location patterns of single moving targets, a sequence-pattern-based method for discovering frequent sequences of moving targets is proposed using the PrefixSpan algorithm. A frequent sequence mining algorithm is applied to the target's daily sets of activity areas, and, subject to the support threshold, the areas in which the target is frequently active, together with their chronological order, are identified. This provides a data foundation for establishing monitoring and response mechanisms for each moving target. (2) For moving target trajectory data sets containing multiple attributes, numerical attributes are extracted and the data are clustered by attribute using the K-medoids algorithm, with the Canopy clustering algorithm used to predetermine the value of K. This process extracts a set of normal behavior patterns of the moving targets. The original trajectory data are then compared with the combined activity location patterns and normal behavior patterns of each moving target to detect its abnormal behavior. (3) An incremental anomaly detection scheme is proposed to cope with the fast updates and large volumes of trajectory data sets. The scheme updates the frequencies of moving target activity location patterns and the value ranges of normal behavior patterns synchronously with the trajectory data set, meeting the needs of database updates and improving the accuracy and credibility of the results.

Method for Mining Abnormal Patterns of Sequential Attributes
The sequential attribute of a moving target's trajectory is the sequence of regions it passes through. Anomaly detection for this attribute uses sequential pattern mining to extract the set of frequent activity region sequences of the moving target, which serves as the normal pattern for the regions it passes through. The regions traversed by a real-time trajectory of the same moving target are then compared with this normal pattern to judge whether they are normal or abnormal.

Method-Related Definitions
Definition 1. An itemset $I$ is a non-empty set of single items, and a sequence $Q = \langle q_1, q_2, \dots, q_l \rangle$ is an ordered list of itemsets, where each element $q_i$ ($1 \le i \le l$) is an itemset. The length of a sequence is the number of items it contains. Definition 2. Let $\alpha = \langle a_1 a_2 \cdots a_n \rangle$ and $\beta = \langle b_1 b_2 \cdots b_m \rangle$. If there exist integers $1 \le j_1 < j_2 < \cdots < j_n \le m$ such that $a_1 \subseteq b_{j_1}, a_2 \subseteq b_{j_2}, \dots, a_n \subseteq b_{j_n}$, then $\alpha$ is a subsequence of $\beta$; in other words, $\beta$ contains $\alpha$, denoted $\alpha \sqsubseteq \beta$. Definition 3. The support count $sup(\alpha)$ of sequence $\alpha$ is the number of sequences in the database that contain $\alpha$, and the support level $min\_sup$ is a pre-set threshold. If $sup(\alpha) \ge min\_sup$, then $\alpha$ is a sequential pattern in the sequence database, and a sequential pattern of length $l$ is called an $l$-pattern. For example, a sequence made of five itemsets that together contain nine items has length 9. An item that occurs several times in one data sequence still contributes only 1 to its support count, because support counts sequences rather than occurrences. If the support level is set to 2 and a subsequence is contained in two sequences of the database, its support count of 2 satisfies the support level, so it is a sequential pattern. Definition 4. For sequences $\alpha = \langle a_1 a_2 \cdots a_n \rangle$ and $\beta = \langle b_1 b_2 \cdots b_m \rangle$ with $m \le n$, if $b_i = a_i$ for $i \le m - 1$, $b_m \subseteq a_m$, and the items of $b_m$ form a contiguous leading part of $a_m$, then $\beta$ is a prefix of $\alpha$.
Definition 5. For a sequence $\alpha$ and a prefix $\beta$ of $\alpha$, the suffix of $\alpha$ with respect to $\beta$ is the part of $\alpha$ that remains after $\beta$. If the last item of the prefix is only part of an itemset, the remaining items of that itemset are written with a leading "_" in the suffix. Definition 6. Let $S$ be a sequence database. The projected database $S|_\alpha$ of sequence $\alpha$ is the set of suffixes, with respect to prefix $\alpha$, of the sequences in $S$; $sup_{S|_\alpha}(\beta)$ is the support count of sequence $\beta$ in $S|_\alpha$, where $\beta$ is a sequence with prefix $\alpha$; $L$ is the set of frequent sequences mined from $S$; and $f$ denotes the number of occurrences of a sequence. Definition 7. The activity trajectory data set of moving target $O$ within the selected time period $T = [t_s, t_e]$ is $D_O = \{Tr_1, Tr_2, \dots, Tr_n\}$, where each trajectory $Tr_i$ corresponds to a transition sequence of the activity areas of $O$.
The task is to find the longest subsequences $F_O = \{f_1, f_2, \dots, f_m\}$ in $D_O$ whose frequency $\mathrm{freq}(f_j)$ is not less than the frequency threshold $\Delta$, where $n$ is the total number of activity trajectories of target $O$ in the selected time period, $m$ is the total number of frequent subsequences found, $m \le n$, $f_j \sqsubseteq Tr_i$, $i \in \{1, 2, \dots, n\}$, and $j \in \{1, 2, \dots, m\}$. $F_O$ is the set of frequent activity area sequences of moving target $O$ within the selected time period $T$.
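Definitions 1 to 3 can be made concrete with a minimal Python sketch, assuming each itemset is a `frozenset` of item labels and a sequence is a list of itemsets; the function names `length`, `is_subsequence`, and `support` are illustrative, not taken from the paper. A greedy left-to-right scan suffices for the containment test of Definition 2, because matching each itemset of $\alpha$ to the earliest possible itemset of $\beta$ never rules out a later match.

```python
from typing import FrozenSet, Sequence

Itemset = FrozenSet[str]


def length(seq: Sequence[Itemset]) -> int:
    """Definition 1: the length of a sequence is its total number of items."""
    return sum(len(itemset) for itemset in seq)


def is_subsequence(alpha: Sequence[Itemset], beta: Sequence[Itemset]) -> bool:
    """Definition 2: each itemset of alpha must be a subset of a distinct
    itemset of beta, with the matched itemsets in the same order."""
    j = 0  # current position in beta
    for a in alpha:
        # skip itemsets of beta that do not contain a
        while j < len(beta) and not a <= beta[j]:
            j += 1
        if j == len(beta):
            return False  # ran out of beta before matching a
        j += 1  # the next itemset of alpha must match strictly later
    return True


def support(alpha: Sequence[Itemset], database: Sequence[Sequence[Itemset]]) -> int:
    """Definition 3: number of database sequences containing alpha
    (each data sequence counts at most once)."""
    return sum(1 for seq in database if is_subsequence(alpha, seq))


# Hypothetical sequence with five itemsets and nine items
s = [frozenset("a"), frozenset("abc"), frozenset("ac"), frozenset("d"), frozenset("cf")]
print(length(s))                       # 9
print(support([frozenset("a")], [s]))  # 1, although "a" occurs three times in s
```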

Specific Steps of the Method
The frequent activity area sequences of a moving target are obtained by mining frequent sequences for a single target: a frequent sequence mining algorithm is applied to the target's sets of activity areas on a daily basis, and, subject to the support threshold, the areas in which the target is frequently active, together with their chronological order, are identified [16]. The PrefixSpan algorithm [17], a prefix-projection method commonly used in sequential pattern mining, is adopted here. Its main steps are as follows: 1. Identify frequent items: scan the database and find the items that appear at least the set number of times (an item is counted only once per sequence, even if it appears several times in it), giving the set of frequent sequences of length 1. 2. Generate projected databases: build a projected database for each item in the frequent item set obtained in the previous step. 3. Mine frequent sequences recursively: for each prefix formed from the frequent items of step 1, find the frequent sequences with that prefix by constructing its projected database and mining it recursively. 4. Repeat steps 1 to 3 until no further frequent items are found.
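The four PrefixSpan steps can be sketched in Python as follows. This is a simplified illustration rather than the authors' implementation: it assumes every sequence element is a single item (as in region transfer sequences, where each element is one region) and does not handle multi-item itemsets or the "_" notation for partial itemsets; `prefixspan` and its parameters are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Tuple

Seq = Tuple[str, ...]


def prefixspan(db: List[Seq], min_sup: int) -> Dict[Seq, int]:
    """Return every frequent sequential pattern with its support count."""
    patterns: Dict[Seq, int] = {}

    def mine(prefix: Seq, projected: List[Seq]) -> None:
        # Step 1: count each item at most once per (projected) sequence
        counts: Counter = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item, sup in sorted(counts.items()):
            if sup < min_sup:
                continue  # infrequent items cannot extend the prefix
            new_prefix = prefix + (item,)
            patterns[new_prefix] = sup
            # Step 2: projected database = suffix after the first occurrence of item
            new_projected = []
            for suffix in projected:
                if item in suffix:
                    rest = suffix[suffix.index(item) + 1:]
                    if rest:
                        new_projected.append(rest)
            # Step 3: mine the projected database recursively; the recursion
            # stops when no frequent item remains (step 4)
            mine(new_prefix, new_projected)

    mine((), [tuple(s) for s in db])
    return patterns


# Hypothetical region sequences (not Table 1)
db = [("I1", "I2", "I3"), ("I1", "I2"), ("I2", "I3"), ("I1", "I3")]
print(prefixspan(db, min_sup=2))
```

Projecting on the first occurrence of an item is what keeps PrefixSpan from generating candidates: every longer pattern with that prefix must be found inside the projected suffixes.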
Based on the PrefixSpan algorithm described above, the sequence set of frequent active regions of moving targets is mined in the following steps: a. Scan the historical trajectory data and select the trajectories that meet the criteria for the user-selected moving target, time range, and assigned task. A trajectory matches the time range condition if its appearance time lies within the time range selected by the user. b. In each trajectory, the region transfer information in the region attribute is already arranged in ascending chronological order. Using the unique identifier of each qualifying trajectory as the unit, record the regions the target passed through during the selected time period as one activity sequence of that target. c. Build the activity sequence database $S_k$ of the same moving target, analyze $S_k$ with the PrefixSpan algorithm, and obtain the frequent sequence set $FS_k$ of the target, where $k$ denotes the $k$-th moving target that meets the user's filtering criteria. d. From the frequent sequence set $FS_k$ of the $k$-th moving target, remove every sequence that is a subsequence of another frequent sequence to obtain the set of longest frequent sequences $L_k$; $L_k$ is the set of frequent activity sequences of the target.
e. Return to step c and mine the frequent sequences of the next target that meets the user's filtering criteria, until the frequent sequences of all qualifying moving targets have been mined; then terminate the algorithm.
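Steps a, b, and d can be sketched as below, assuming trajectories are dictionaries with the hypothetical keys `target`, `appear`, and `regions`; the PrefixSpan call of step c is omitted. `maximal_sequences` reads "remove the subset sequence" as discarding any frequent sequence contained in a longer frequent sequence, which is an interpretation rather than the paper's code.

```python
from typing import Dict, Iterable, List, Tuple

Seq = Tuple[str, ...]


def activity_sequences(trajectories: Iterable[Dict], target: str,
                       t_start: float, t_end: float) -> List[Seq]:
    """Steps a-b: keep the target's trajectories whose appearance time lies in
    [t_start, t_end]; each contributes its chronologically ordered regions."""
    return [tuple(tr["regions"]) for tr in trajectories
            if tr["target"] == target and t_start <= tr["appear"] <= t_end]


def is_subseq(short: Seq, long: Seq) -> bool:
    """True if the regions of `short` occur in `long` in the same order."""
    it = iter(long)
    return all(region in it for region in short)  # `in` consumes the iterator


def maximal_sequences(frequent: Iterable[Seq]) -> List[Seq]:
    """Step d: discard every frequent sequence contained in a longer one."""
    kept: List[Seq] = []
    for s in sorted(set(frequent), key=len, reverse=True):  # longest first
        if not any(is_subseq(s, k) for k in kept):
            kept.append(s)
    return kept
```

Processing sequences longest first is enough: if a sequence is contained in a longer one that was itself discarded, it is also contained in whatever kept sequence absorbed that longer one.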
Taking moving target $O$ as an example, assume that the activity sequence database $S$ filtered for the specified time range is as shown in Table 1 (where I1 denotes region 1).

Table 1. Activity sequence database $S$ of moving target $O$ (columns: Time Stamp, Activity Sequence).
Set the support level $min\_sup$ to 20%, which corresponds to a support count of 2. The sequence set of frequent activity areas of moving target $O$ is obtained as follows: 1. Scanning $S$ gives the number of visits to each region shown in Table 2. The length-1 prefixes are <I1>, <I2>, <I3>, <I4>, and <I5>, and the support count of each is the number of activity sequences in which the corresponding region is visited by the specified moving target, as shown in Table 2. Sequences that meet the support count are kept, and those that do not are deleted (and removed from the activity sequences). The frequent sequences of length 1 are therefore <I1:6>, <I2:7>, <I3:6>, <I4:2>, and <I5:2>.
2. Mine frequent sequences starting from the length-1 prefixes. The correspondence between each prefix and its suffixes is shown in Figure 1. Prefixes <I4> and <I5> have no suffixes, so they yield no frequent sequences longer than 1.
Prefix <I3> has only the suffix <I5>, whose support count of 1 does not satisfy $min\_sup$; therefore <I3 I5> is not a frequent 2-sequence.

3. Mine frequent sequences for the length-2 prefixes. The correspondence between each prefix and its suffixes is shown in Figure 2. The flowchart of the method for mining frequent activity area sequences of moving targets is shown in Figure 3. Based on the mined sequence set of frequently active regions of a moving target, the main steps of the region anomaly detection method are as follows: 1. Pre-set the trajectory frequency threshold $\Delta$, and use the PrefixSpan algorithm to obtain the set $L$ of frequent activity area sequences of the current moving target. 2. For the newly added trajectory data set $D^{+}$, extract from each real-time trajectory of the current moving target the sequence $r_i$ of regions it passes through. If $D^{+}$ has been fully traversed, go to step (4); otherwise, proceed to step (3). 3. Use dynamic programming to determine whether $r_i$ is a substring of some sequence $l_j$ in $L$. If it is, stop, set $i = i + 1$, and return to step (2). If all of $L$ has been traversed and $r_i$ is not a substring of any element of $L$, the regions traversed by the current trajectory are abnormal: store the trajectory in the area anomaly result table, set $i = i + 1$, and return to step (2). 4. Region anomaly detection is now complete for all trajectories in $D^{+}$, and the contents of the area anomaly result table are displayed visually in the front end.
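Steps 2 to 4 of the region anomaly detection can be sketched as follows, assuming "substring" means a contiguous run of regions and using the standard longest-common-suffix dynamic programming table. The area anomaly result table is modeled as a plain list of trajectory identifiers, and all names are illustrative.

```python
from typing import List, Sequence, Tuple


def is_substring_dp(r: Sequence[str], pattern: Sequence[str]) -> bool:
    """DP test: does region sequence r occur as a contiguous run in pattern?

    dp[i][j] = length of the longest common run ending at r[i-1] and
    pattern[j-1]; r is a substring exactly when some entry reaches len(r).
    """
    if not r:
        return True
    dp = [[0] * (len(pattern) + 1) for _ in range(len(r) + 1)]
    for i in range(1, len(r) + 1):
        for j in range(1, len(pattern) + 1):
            if r[i - 1] == pattern[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] == len(r):
                    return True
    return False


def detect_region_anomalies(new_trajectories: Sequence[Tuple[str, Sequence[str]]],
                            frequent_set: Sequence[Sequence[str]]) -> List[str]:
    """Steps 2-4: a trajectory is abnormal if its region sequence is not a
    substring of any frequent activity area sequence."""
    anomaly_table: List[str] = []  # stands in for the area anomaly result table
    for traj_id, regions in new_trajectories:
        if not any(is_substring_dp(regions, p) for p in frequent_set):
            anomaly_table.append(traj_id)
    return anomaly_table
```

`any` stops at the first matching frequent sequence, which mirrors "stop judging" in step 3.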
Over time, more and more trajectories are added to the moving target trajectory data set. The set $L$ of frequent activity region sequences already obtained for the current moving target may then no longer satisfy the frequency threshold, or new frequent activity sequences not included in $L$ may appear, so that $L$ loses its timeliness [18]. The algorithm therefore includes an update method for the set of frequent activity area sequences to support anomaly detection on incremental data. Let $D$ be the existing trajectory data set of the moving target, $D^{+}$ the new trajectory data set, and $D' = D \cup D^{+}$ the updated data set; let $L$, $L^{+}$, and $L'$ be the corresponding sets of frequently active area sequences of the current moving target; and let $f_{D}(t)$, $f_{D^{+}}(t)$, and $f_{D'}(t)$ be the frequencies of occurrence of a sequence $t$ in $D$, $D^{+}$, and $D'$. Treating these as relative frequencies (the fraction of trajectories that contain $t$), the updated frequency is
$$f_{D'}(t) = \frac{f_{D}(t)\,|D| + f_{D^{+}}(t)\,|D^{+}|}{|D| + |D^{+}|}.$$
The update falls into the following four cases:
1. For a sequence $t$ that is frequent in both the existing data set $D$ and the new data set $D^{+}$, the updated frequency $f_{D'}(t)$ is computed from the stored values $f_{D}(t)$ and $f_{D^{+}}(t)$, and $t$ is added to the updated set $L'$ directly, because a weighted average of two values not less than $\Delta$ is itself not less than $\Delta$.
2. For a sequence $t$ that is frequent in $D$ but not in $D^{+}$, the frequency $f_{D^{+}}(t)$ of $t$ in the newly added data set $D^{+}$ must first be counted; if the resulting $f_{D'}(t) \ge \Delta$, $t$ is added to $L'$.
3. For a sequence $t$ that is frequent in $D^{+}$ but not in $D$, the frequency $f_{D}(t)$ of $t$ in the existing data set $D$ must first be counted; if the resulting $f_{D'}(t) \ge \Delta$, $t$ is added to $L'$.
4. For a sequence $t$ that is frequent in neither $D$ nor $D^{+}$, the updated frequency $f_{D'}(t)$ must be smaller than the pre-set frequency threshold $\Delta$. That is, sequences that are infrequent in both the existing and the new data sets are also infrequent in the merged data set and need not be considered.
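The four update cases can be sketched as below, assuming frequencies are relative frequencies (fractions of trajectories), so that the merged frequency is the size-weighted average of the frequencies in $D$ and $D^{+}$. `freq_in_d` and `freq_in_plus` are hypothetical callbacks standing for the rescans of $D$ and $D^{+}$ that cases 3 and 2 require; case 4 needs no work.

```python
from typing import Callable, Dict, Tuple

Seq = Tuple[str, ...]


def merged_frequency(f_d: float, n_d: int, f_plus: float, n_plus: int) -> float:
    """Relative frequency in the union of D and D+, weighted by |D| and |D+|."""
    return (f_d * n_d + f_plus * n_plus) / (n_d + n_plus)


def update_frequent_set(L: Dict[Seq, float], n_d: int,
                        L_plus: Dict[Seq, float], n_plus: int, delta: float,
                        freq_in_d: Callable[[Seq], float],
                        freq_in_plus: Callable[[Seq], float]) -> Dict[Seq, float]:
    """Return L' (sequence -> updated frequency) from L and L+."""
    updated: Dict[Seq, float] = {}
    for s in L.keys() | L_plus.keys():
        if s in L and s in L_plus:
            # case 1: frequent in both, so frequent in D' without further checks
            updated[s] = merged_frequency(L[s], n_d, L_plus[s], n_plus)
            continue
        if s in L:
            # case 2: frequent only in D, so D+ must be rescanned for s
            f = merged_frequency(L[s], n_d, freq_in_plus(s), n_plus)
        else:
            # case 3: frequent only in D+, so D must be rescanned for s
            f = merged_frequency(freq_in_d(s), n_d, L_plus[s], n_plus)
        if f >= delta:
            updated[s] = f
    # case 4: sequences infrequent in both D and D+ are never examined
    return updated
```

Only cases 2 and 3 touch the raw data, which is where the incremental scheme saves work compared with re-mining $D \cup D^{+}$ from scratch.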

Method for Mining Abnormal Patterns of Numerical Attributes
The numerical attributes of a moving target trajectory are the average velocity, trajectory appearance time, disappearance time, appearance duration, closest distance to other moving targets, and closest distance to the hotspot area of the corresponding activity trajectory. Abnormal patterns are mined by using clustering algorithms to obtain the set of normal activity patterns corresponding to each characteristic attribute of the moving target; the normal activity pattern of each attribute is a numerical value range. The feature attribute values of a real-time trajectory of the same moving target are then compared with these normal patterns to detect abnormal patterns in the six numerical attributes of the current trajectory.

Method-Related Definitions
Definition 8. A trajectory $Tr$ of moving target $O$ is $Tr = \{p_1, p_2, \dots, p_n\}$, where $p_i$ ($i \in \{1, 2, \dots, n\}$) is a trajectory point on $Tr$ and $n$ is the number of trajectory points in the trajectory. Trajectory $Tr$ has attributes including the unique identifier $TID$ of the trajectory, the name $O_{name}$ of the moving target, the appearance time $t_{start}$, the disappearance time $t_{end}$, the appearance duration $t_{dur}$, and the sequence $R$ of regions the trajectory passes through. Trajectory point $p_i$ has attributes including its unique identifier $PID$, the name $O_{name}$ of the moving target, the unique identifier $TID$ of the trajectory it belongs to, its longitude $\lambda_i$ and latitude $\varphi_i$, the current time $t_i$, the velocity $v_i$, and the region $r_i$ in which it is located. Definition 9. The average velocity of moving target $O$ on trajectory $Tr$ is the mean of the velocities of all trajectory points on $Tr$:
$$\bar{v}(Tr) = \frac{1}{n}\sum_{i=1}^{n} v_i.$$
Definition 10. The closest distance from trajectory $Tr$ of moving target $O$ to any other moving target is $D_{obj}(Tr) = \min_j \{d_j\}$, where $d_j$ is the distance between a sampling point of $Tr$ and the simultaneous position of another moving target whose trajectory $Tr'$ co-occurs in time with $Tr$; distances are evaluated at the sampling points within the common occurrence interval:
$$d_j = 2R \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_j - \varphi'_j}{2}\right) + \cos\varphi_j \cos\varphi'_j \sin^2\left(\frac{\lambda_j - \lambda'_j}{2}\right)}\right),$$
where $R$ is the radius of the Earth, $\lambda_j$ and $\varphi_j$ are the longitude and latitude of a sampling point on trajectory $Tr$, and $\lambda'_j$ and $\varphi'_j$ are the longitude and latitude of another moving target (other than $O$) at the same time.
Definition 11. The closest distance from trajectory $Tr$ of moving target $O$ to the hotspot area is $D_{hot}(Tr) = \min_j \{d^{h}_j\}$, where $d^{h}_j$ is the distance between $O$ and the hotspot area at the $j$-th sampling time during the period in which $Tr$ appears:
$$d^{h}_j = 2R \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_j - \varphi_h}{2}\right) + \cos\varphi_j \cos\varphi_h \sin^2\left(\frac{\lambda_j - \lambda_h}{2}\right)}\right),$$
where $R$ is the radius of the Earth, $\lambda_j$ and $\varphi_j$ are the longitude and latitude of a sampling point on $Tr$, and $\lambda_h$ and $\varphi_h$ are the longitude and latitude of the center of the hotspot area.
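Definitions 9 to 11 can be computed with the haversine formula as in the Python sketch below. It assumes coordinates in degrees, a mean Earth radius of 6371 km (so distances are in kilometres), and trajectories stored as hypothetical dictionaries mapping a sampling timestamp to `(lon, lat)`; only timestamps shared by both trajectories count as common sampling times.

```python
import math
from typing import Dict, List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius R, in kilometres
Track = Dict[float, Tuple[float, float]]  # timestamp -> (lon, lat) in degrees


def haversine(lon1: float, lat1: float, lon2: float, lat2: float,
              r: float = EARTH_RADIUS_KM) -> float:
    """Great-circle distance used in Definitions 10 and 11."""
    lam1, phi1, lam2, phi2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((phi1 - phi2) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam1 - lam2) / 2) ** 2)
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))  # clamp rounding above 1


def average_velocity(velocities: Sequence[float]) -> float:
    """Definition 9: mean velocity of the points on one trajectory."""
    return sum(velocities) / len(velocities)


def closest_distance_to_targets(track: Track, other_tracks: List[Track]) -> float:
    """Definition 10: minimum distance over common sampling times;
    returns inf when no other trajectory co-occurs with `track`."""
    dists = [haversine(*track[t], *other[t])
             for other in other_tracks
             for t in track.keys() & other.keys()]
    return min(dists, default=math.inf)


def closest_distance_to_hotspot(track: Track, hotspot: Tuple[float, float]) -> float:
    """Definition 11: minimum distance from the sampling points to the hotspot center."""
    return min(haversine(lon, lat, *hotspot) for lon, lat in track.values())
```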

Specific Steps of the Method
Within a given time interval, this section first uses the Canopy clustering algorithm [19] to determine the number of clusters and then uses the K-medoids algorithm [20] to cluster each numerical characteristic attribute of the moving targets separately. The clustering results form the set of normal behavior patterns of the current moving target, and the corresponding attributes of the trajectory data generated in real time by this target are finally matched against these normal behavior patterns to classify and identify abnormal trajectories. Because the anomaly detection steps are essentially the same for all numerical attributes and only the attribute differs, this section describes in detail the mining of abnormal patterns in the average velocity attribute of moving targets as an example.
Like the classic k-means algorithm [21], the K-medoids clustering algorithm used here requires the value of K to be specified manually, and, as with k-means, the choice of K, the initialization strategy, and the distance metric all need careful consideration [22]. Traditional methods determine K by running multiple trials, computing the errors, and selecting the best value. Such methods require manual intervention and make clustering time-consuming. This method therefore uses the Canopy clustering algorithm to estimate K roughly in advance, taking the number of Canopy sets as the value of K for K-medoids, which reduces the arbitrariness of the choice of K to some extent.

Canopy Clustering Algorithm
The steps of the Canopy algorithm are as follows: 1. Let the sample set be $D$ and choose two thresholds $T_1$ and $T_2$ with $T_2 < T_1$. 2. Pick any sample point $p$ in $D$ as the center of a new Canopy, denote it $c$, and remove it from $D$. 3. Compute the distance $d$ from every point in $D$ to $c$. 4. If $d < T_1$, assign the point to the Canopy of $c$ (weak correlation). 5. If $d \le T_2$, remove the point from $D$ (strong correlation). 6. Repeat steps (2) to (5) until $D$ is empty.
The principle of the Canopy algorithm is simple: it repeatedly traverses the data set. Sample points at a distance $T_2 < d < T_1$ from a center can still serve as centers of new Canopy sets, whereas points with $d \le T_2$ are considered too close to the current Canopy and will not become centers. Note that in the results of the Canopy algorithm, a point may belong to multiple Canopy sets. The process of the Canopy clustering algorithm is shown in Figure 4 and Algorithm 1. Figure 5 illustrates the Canopy algorithm, where points with the same grayscale value belong to the same cluster. A cluster center $c$ is randomly selected and used to create a Canopy set that includes all data points inside its outer circle (solid circle), while the data inside the inner circle (dashed circle) are no longer candidates for center points. After this rough clustering, $K$ preliminary clusters are obtained. Here, the average distance between all data points is used as the Canopy radius $T_1$, and half of that average distance as the radius $T_2$ of the region excluded from candidate centers, that is, $T_1 = 2 \times T_2$.
Taking the mining of abnormal patterns in the average velocity attribute of moving targets as an example, let the existing data set composed of the average velocities of $n$ trajectories be $V_{avg} = \{v_i \mid i = 1,2,\cdots,n\}$. The $n$ data points generate $\frac{n \times (n-1)}{2}$ pairwise distances. For numerical data, the distance between two data points is the absolute value of the difference between their values, $d(v_i, v_j) = \lvert v_i - v_j \rvert$. Therefore, the two thresholds are obtained as follows:

$$T_1 = \frac{2}{n \times (n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \lvert v_i - v_j \rvert, \qquad T_2 = \frac{T_1}{2}$$
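The thresholds can be computed directly from this definition: $T_1$ is the mean of the $n \times (n-1)/2$ pairwise absolute differences and $T_2 = T_1 / 2$. The sketch below assumes plain Python lists of numbers; `canopy_thresholds` and `canopy_thresholds_naive` are hypothetical helper names. Because enumerating all pairs is quadratic, the first function uses a standard sorted-order identity that yields the same sum in $O(n \log n)$ time, which may matter for data sets of the size used in the experiments.

```python
from itertools import combinations


def canopy_thresholds(values):
    """T1 = mean of all n*(n-1)/2 pairwise distances |v_i - v_j|, T2 = T1 / 2."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    v = sorted(values)
    # In sorted order, v[i] exceeds the i values before it and is exceeded by the
    # n-1-i values after it, so the pairwise sum equals sum(v[i] * (2*i - n + 1)).
    total = sum(x * (2 * i - n + 1) for i, x in enumerate(v))
    t1 = total / (n * (n - 1) / 2)
    return t1, t1 / 2


def canopy_thresholds_naive(values):
    """Direct O(n^2) version of the same formula, useful as a cross-check."""
    pairs = list(combinations(values, 2))
    t1 = sum(abs(a - b) for a, b in pairs) / len(pairs)
    return t1, t1 / 2


print(canopy_thresholds([1, 2, 4]))   # (2.0, 1.0)
```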

K-Medoids Clustering Algorithm
Existing clustering algorithms can be divided into five categories: partition-based, hierarchical, density-based, grid-based, and model-based methods [23]. The K-medoids algorithm, also known as the K-center point algorithm, is a partition-based clustering algorithm and can be seen as an improvement of the classical K-means algorithm.
In anomaly pattern mining, the data contain a large number of outliers that lie far from most of the data, and these outliers are precisely what the method needs to find. The K-means algorithm is sensitive to outliers: when abnormal data are assigned to a cluster, they can seriously distort the cluster mean, which in turn affects how other objects are allocated to that cluster. Consider a set of seven points in one-dimensional space, {1,2,3,8,9,10,25}. Intuitively, the most reasonable partition is into two clusters, {1,2,3} and {8,9,10}, with data point 25 excluded as an outlier. However, under the squared error criterion of K-means, the partition {1,2,3,8} and {9,10,25} (squared error 189.67) is preferred over {1,2,3} and {8,9,10,25} (squared error 196). Therefore, because of the outlier 25, the K-means method assigns 8 to a cluster different from the one containing 9 and 10, and the center of cluster {9,10,25} is 14.67, which differs significantly from every element in that cluster.
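The arithmetic of this example can be checked with a few lines of Python. The sketch compares only the two candidate partitions discussed in the text under the K-means squared-error criterion, and contrasts the mean of {9, 10, 25} with its medoid, that is, the member with the smallest total absolute distance to the others. The helper names `sse` and `medoid` are illustrative.

```python
def sse(partition):
    """Within-cluster sum of squared errors around each cluster mean (K-means criterion)."""
    total = 0.0
    for cluster in partition:
        mean = sum(cluster) / len(cluster)
        total += sum((x - mean) ** 2 for x in cluster)
    return total


def medoid(cluster):
    """The actual member minimizing the sum of absolute distances to the other members."""
    return min(cluster, key=lambda m: sum(abs(x - m) for x in cluster))


a = sse([[1, 2, 3], [8, 9, 10, 25]])   # 2 + 194
b = sse([[1, 2, 3, 8], [9, 10, 25]])   # 29 + 160.67
print(round(a, 2), round(b, 2))        # 196.0 189.67
print(round(sum([9, 10, 25]) / 3, 2), medoid([9, 10, 25]))  # 14.67 10
```

The mean 14.67 lies far from every member of its cluster, whereas the medoid 10 is an actual data point next to 9, which is the behavior the K-medoids algorithm relies on.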
Based on this, the K-medoids algorithm does not use the mean of the objects in a cluster as the reference point but selects an actual data point to represent the cluster. It computes the similarity between every other data point and the representative data points, and allocates each point to the cluster of the most similar representative data point. The K-medoids algorithm therefore partitions the data points by minimizing the sum of the differences between all data points and the representative data points of their clusters. Mathematically, it uses an absolute error criterion [24], defined as follows for the data set $D = \{p_i \mid i = 1,2,\cdots,n\}$:

$$E = \sum_{j=1}^{K} \sum_{p \in C_j} \lvert p - o_j \rvert$$

where $E$ is the sum of the absolute errors between all data objects $p$ in the data set and the representative object $o_j$ of the cluster $C_j$ to which they belong. This is the basis of the K-medoids algorithm, which divides the $n$ objects into K clusters by minimizing this value.
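The criterion $E$ can be written down directly for one-dimensional numerical data, with each point assigned to its nearest representative object. This is a minimal sketch assuming the absolute difference as the distance; `assign` and `absolute_error` are hypothetical names.

```python
def assign(points, medoids):
    """Allocate each point to its nearest representative object (ties go to the first listed)."""
    clusters = {o: [] for o in medoids}
    for p in points:
        nearest = min(medoids, key=lambda o: abs(p - o))
        clusters[nearest].append(p)
    return clusters


def absolute_error(points, medoids):
    """E = sum over clusters C_j of the sum over p in C_j of |p - o_j|."""
    clusters = assign(points, medoids)
    return sum(abs(p - o) for o, members in clusters.items() for p in members)


data = [1, 2, 3, 8, 9, 10, 25]
print(assign(data, [2, 9]))          # {2: [1, 2, 3], 9: [8, 9, 10, 25]}
print(absolute_error(data, [2, 9]))  # 20
```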
Partitioning Around Medoids (PAM) [25] is a classic representative of K-medoids algorithms. It clusters the data points with an iterative, greedy strategy.
The main process of the PAM algorithm is as follows:
1. Randomly select K data points as representative data points;
2. Assign each data point in the data set to the nearest representative data point;
3. Randomly select a non-representative data point and use it to replace one of the representative data points;
4. Reassign each data point in the data set, calculate the absolute error $E$ after reallocation, and keep the replacement only if $E$ decreases;
5. Repeat steps (2) to (4) until the absolute error no longer improves.
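A minimal PAM sketch for one-dimensional data follows. As an assumption made here, it tries every (representative, non-representative) swap in turn and keeps any swap that lowers $E$, instead of drawing a single random swap per round, so that "no further improvement" has a precise stopping condition. It also assumes distinct values; `pam` and `absolute_error` are illustrative names.

```python
import random


def absolute_error(points, medoids):
    """E with each point assigned to its nearest representative."""
    return sum(min(abs(p - o) for o in medoids) for p in points)


def pam(points, k, seed=0):
    """Minimal PAM sketch for 1-D numeric data (assumes distinct values)."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)              # step 1: initial representatives
    best = absolute_error(points, medoids)       # step 2: assign and score
    improved = True
    while improved:                              # step 5: stop once E stops improving
        improved = False
        for i in range(k):
            for candidate in points:             # step 3: try each non-representative point
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                e = absolute_error(points, trial)  # step 4: reassign and recompute E
                if e < best:                     # keep the swap only if E decreases
                    medoids, best = trial, e
                    improved = True
    return sorted(medoids), best


print(pam([1, 2, 3, 101, 102, 103], 2))   # ([2, 102], 4)
```

Each pass evaluates up to $K(n-K)$ swaps and recomputes $E$ over all points for each one, so the cost grows quickly with $n$.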
Analyzing the time complexity shows that the PAM algorithm reaches $O(tK(n-K)^2)$, where $t$ is the number of iterations, $n$ is the number of data points in the data set, and $K$ is the number of clusters. When $n$ and $K$ are large, this computational cost becomes quite high. Compared with the $O(nKt)$ complexity of the traditional K-means algorithm, PAM is far less efficient when applied to large data sets.
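As a rough, order-of-magnitude illustration, assume $n \gg K$, a PAM cost of $O(tK(n-K)^2)$, and a K-means cost of $O(nKt)$ with the same number of iterations $t$. The ratio of the two costs is then

$$\frac{tK(n-K)^2}{nKt} = \frac{(n-K)^2}{n} \approx n,$$

so for $n = 10^6$ data points PAM would perform roughly $10^6$ times as much work per iteration as K-means. This is an estimate from the asymptotic bounds only, not a measurement.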
Furthermore, to make the algorithm suitable for large data sets, the second clustering step in this section uses another representative of the K-medoids family based on random search, the CLARANS (Clustering Large Applications based upon RANdomized Search) algorithm [26], to strike a balance between clustering efficiency and accuracy.
Large-scale application clustering, also known as the CLARA (Clustering LARge Applications) algorithm [27], is an improvement of the PAM algorithm for large data sets. Unlike PAM, CLARA does not consider the entire data set; instead, it randomly draws a sample from the data set and selects representative data points from the sample in the same way as PAM. This reduces the time complexity to $O(tK(s^2 + n - K))$, where $s$ is the sample size. The drawback is that, whereas PAM searches for the K representative data points over the whole data set, CLARA searches only within the sample: if one of the K best representative data points is not drawn into the sample, CLARA can never obtain the globally optimal solution. Combining the strengths of PAM and CLARA, the improved CLARANS algorithm still relies on random sampling, but it samples from the whole data set at every step rather than being limited to a fixed sample. In addition, CLARANS further improves on the efficiency of PAM by limiting the number of iterations. The main process of the CLARANS algorithm is as follows:
1. Randomly select K data points as representative data points;
2. Randomly select a representative data point $o_i$ and a non-representative data point $o_{random}$;
3. If replacing $o_i$ with $o_{random}$ as a representative data point yields a lower absolute error $E$, perform the replacement;
4. Repeat steps (2) and (3) until $maxneighbor$ consecutive attempts bring no improvement, which yields a locally optimal set of representative data points;
5. Repeat steps (1) to (4) $numlocal$ times and return the clustering result with the lowest absolute error.
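The CLARANS loop can be sketched as follows for one-dimensional data. This is an illustrative sketch under stated assumptions, not the reference implementation of [26]: `numlocal` and `maxneighbor` follow the parameter names used in this section, the optional `initial` argument (an assumption) lets the first local search start from given representative points, and values are assumed distinct.

```python
import random


def absolute_error(points, medoids):
    """E with each point assigned to its nearest representative."""
    return sum(min(abs(p - o) for o in medoids) for p in points)


def clarans(points, k, numlocal, maxneighbor, initial=None, seed=0):
    """Minimal CLARANS sketch for 1-D numeric data with distinct values."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for run in range(numlocal):                       # step 5: numlocal local searches
        if run == 0 and initial is not None:
            current = list(initial)                   # optional seeded start
        else:
            current = rng.sample(points, k)           # step 1: random start
        cost = absolute_error(points, current)
        j = 1
        while j <= maxneighbor:                       # step 4: stop after maxneighbor failures
            i = rng.randrange(k)                      # step 2: random representative o_i ...
            others = [p for p in points if p not in current]
            neighbor = current[:i] + [rng.choice(others)] + current[i + 1:]  # ... and o_random
            new_cost = absolute_error(points, neighbor)
            if new_cost < cost:                       # step 3: keep the swap if E improves
                current, cost, j = neighbor, new_cost, 1
            else:
                j += 1
        if cost < best_cost:
            best_medoids, best_cost = sorted(current), cost
    return best_medoids, best_cost


print(clarans([1, 2, 3, 101, 102, 103], 2, numlocal=2, maxneighbor=200))
```

On the well-separated toy data `[1, 2, 3, 101, 102, 103]` with K = 2, the search is expected to settle on the medoids 2 and 102 with E = 4; because neighbors are drawn at random, a very small `maxneighbor` could stop the search before it gets there.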
The CLARANS algorithm is described in Algorithm 2. Analyzing its time complexity shows that CLARANS runs in $O(n^2)$. When processing large data sets, $n$ is much greater than $K$, so the complexity of CLARANS is better than that of PAM. At the same time, because CLARANS performs a random search over the whole data set, it avoids the situation in which CLARA, restricted to a sample, settles for a local optimum and loses the global optimal solution. Following reference [26], $maxneighbor = K(n-K) \times 1.25\%$ is used in this algorithm; the specific value rules are described in that reference. In addition, because the coarse Canopy clustering has already produced approximately K clusters and their center points, the first step of this algorithm can directly use the K center points obtained by the Canopy algorithm as the initial representative data points. Therefore, $numlocal = 1$ is taken here.
To summarize, taking the mining of abnormal patterns in the average velocity attribute of moving targets as an example, let the data set composed of the average velocities of $n$ existing trajectories be $V_{avg} = \{v_i \mid i = 1,2,\cdots,n\}$. The main steps of the method are as follows:
1. For the set $V_{avg}$, set the thresholds $T_1$ and $T_2$;
2. Define a set of data points $P$, so that $P = V_{avg}$;
3. Take any data point from the set $P$ as the center point $c_i$ of a Canopy set $C_i$, and remove it from $P$;
4. Calculate the distance $d$ from all points in $P$ to $c_i$;
5. If $d < T_1$, assign the corresponding point to $C_i$;
6. If $d \le T_2$, remove the corresponding point from $P$ and add it to the Canopy set $C_i$ corresponding to $c_i$;
7. Repeat steps (3) to (6) until the set $P$ is empty, thus obtaining K Canopy sets;
8. Define the data point set $Q$, so that $Q = V_{avg}$;
9. Select the center points of the Canopy sets obtained in step (7) as the K representative data points.

The specific flowchart of the method is shown in Figure 6. The remaining five numerical attributes of a moving target trajectory, namely appearance time, disappearance time, appearance duration, closest distance to other moving targets, and closest distance to the hotspot area, are handled by the same steps as the average velocity attribute; only the input data set and the database table that stores the anomaly results need to be replaced with those of the corresponding attribute. The details are therefore not repeated here.
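The average-velocity procedure (Canopy thresholds and centers, CLARANS refinement started from the Canopy centers with $numlocal = 1$, then K normal value intervals and interval matching) can be sketched end to end as follows. This is a hedged illustration, not the system's code: all function names are hypothetical, rounding `maxneighbor` up to at least 1 is an assumption, and taking each cluster's [min, max] range as its normal interval is one plausible reading of "K value intervals".

```python
import math
import random


def canopy_thresholds(values):
    # T1 = mean pairwise |v_i - v_j| (sorted-order identity), T2 = T1 / 2.
    v, n = sorted(values), len(values)
    if n < 2:
        raise ValueError("need at least two values")
    total = sum(x * (2 * i - n + 1) for i, x in enumerate(v))
    t1 = total / (n * (n - 1) / 2)
    return t1, t1 / 2


def canopy_centers(values, t2, rng):
    # Steps 2-7, keeping only the centers: their count is K. Points within T2
    # of a center leave the candidate set; T1 membership is not needed here.
    remaining, centers = list(values), []
    while remaining:
        c = remaining.pop(rng.randrange(len(remaining)))
        centers.append(c)
        remaining = [p for p in remaining if abs(p - c) > t2]
    return centers


def cost(values, medoids):
    # Absolute-error criterion E with nearest-medoid assignment.
    return sum(min(abs(p - o) for o in medoids) for p in values)


def clarans_from(values, medoids, rng):
    # Steps 9-13 with numlocal = 1, starting from the Canopy centers.
    k, n = len(medoids), len(values)
    maxneighbor = max(1, math.ceil(k * (n - k) * 0.0125))  # rounding is assumed
    medoids, best, j = list(medoids), cost(values, medoids), 1
    while j <= maxneighbor:
        candidates = [v for v in values if v not in medoids]
        if not candidates:
            break
        i = rng.randrange(k)
        trial = medoids[:i] + [rng.choice(candidates)] + medoids[i + 1:]
        e = cost(values, trial)
        if e < best:
            medoids, best, j = trial, e, 1      # step 12: accept and reset j
        else:
            j += 1
    return medoids


def normal_intervals(values, medoids):
    # Step 14: nearest-medoid clusters; each cluster's value range is one interval.
    clusters = {}
    for v in values:
        o = min(medoids, key=lambda m: abs(v - m))
        clusters.setdefault(o, []).append(v)
    return sorted((min(c), max(c)) for c in clusters.values())


def is_abnormal(value, intervals):
    # Step 15: a value matching none of the K normal intervals is flagged.
    return not any(lo <= value <= hi for lo, hi in intervals)


def mine_normal_pattern(values, seed=0):
    rng = random.Random(seed)
    _, t2 = canopy_thresholds(values)
    centers = canopy_centers(values, t2, rng)   # K = len(centers)
    medoids = clarans_from(values, centers, rng)
    return normal_intervals(values, medoids)


history = [1, 2, 3, 8, 9, 10, 25]   # toy average velocities
normal = mine_normal_pattern(history)
print(normal)                                          # [(1, 3), (8, 10), (25, 25)]
print([is_abnormal(v, normal) for v in (2.5, 5, 30)])  # [False, True, True]
```

Under this sketch, the other five numerical attributes would simply be passed through `mine_normal_pattern` with their own historical values.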
Similarly, as time passes and more moving target trajectories are added, the normal pattern set of the numerical attributes of moving targets should be updated accordingly. To keep the method timely, and unlike the mining method for abnormal patterns of sequential attributes, this method updates the normal pattern set of numerical attributes with a sliding window: a cycle threshold is set in advance. At the beginning of each new cycle, the normal pattern set of each numerical attribute is recalculated from all data within the window, and it globally replaces the set stored in the database during the previous cycle, meeting the needs of abnormal pattern mining on incremental data.
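The cycle-based update can be sketched as a small class that keeps the data of the most recent cycles and rebuilds the normal pattern set from scratch at the start of each cycle, replacing the stored set wholesale. The class name, the `window_cycles` parameter (the window length in cycles), and the in-memory `patterns` attribute standing in for the database table are all assumptions; `rebuild` can be any function that maps the window's values to a normal pattern set, such as the Canopy plus CLARANS procedure.

```python
from collections import deque


class SlidingWindowNormalPatterns:
    """Hypothetical sketch of cycle-based sliding-window updating.

    window_cycles: number of most recent cycles kept in the window (assumed).
    rebuild: callable turning a list of attribute values into a normal pattern set.
    """

    def __init__(self, window_cycles, rebuild):
        self.cycles = deque(maxlen=window_cycles)   # the oldest cycle falls out automatically
        self.rebuild = rebuild
        self.patterns = None                        # stands in for the stored database copy

    def start_new_cycle(self, cycle_values):
        """Add the new cycle's data, recompute over the whole window, replace globally."""
        self.cycles.append(list(cycle_values))
        window = [v for cycle in self.cycles for v in cycle]
        self.patterns = self.rebuild(window)        # global replacement, no merging
        return self.patterns


# Toy rebuild: a single [min, max] range stands in for the real normal pattern set.
updater = SlidingWindowNormalPatterns(2, lambda vals: (min(vals), max(vals)))
print(updater.start_new_cycle([1, 2]))   # (1, 2)
print(updater.start_new_cycle([5]))      # (1, 5)
print(updater.start_new_cycle([7, 9]))   # (5, 9): the first cycle has left the window
```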

Theoretical Analysis and Comparison
Existing anomaly pattern mining methods for moving target trajectories are based on different principles and requirements, and each has its own advantages and disadvantages. Their main common problems are as follows. They detect anomalies in trajectories containing multiple attributes as a whole and ignore possible anomalies in a single attribute dimension of the trajectory. They also do not consider the dynamic growth of trajectories, so the anomaly detection model cannot be incrementally updated, the evolution of abnormal behavior cannot be detected, and the spatiotemporal overhead is high.
This article divides the anomaly attributes to be detected in trajectory data into numerical and sequential anomaly attributes. For sequential attributes, a frequent sequence discovery method for moving targets based on sequence patterns is proposed. The PrefixSpan algorithm recursively projects the sequence database onto prefixes and mines patterns within the resulting projected subsequences, greatly reducing the search space. Because trajectory data often contain intermittent and discontinuous events, PrefixSpan can handle these complex patterns flexibly through its prefix projection mechanism, and it can mine frequent patterns under different support thresholds to meet different application needs. For numerical attributes, clustering algorithms are used for anomaly detection of moving targets. During clustering, the Canopy clustering algorithm is used to roughly determine the K value of the K-medoids algorithm in advance, that is, the number of Canopy sets is used as the K value of the K-medoids clustering algorithm. Besides reducing computational cost, this can also reduce the chance of the K-medoids algorithm falling into local optima, which improves computational efficiency, makes the clustering results more stable, and simplifies parameter selection. Finally, an incremental sliding window method is adopted to address the dynamic updates and large data volume of trajectory data sets. This method offers better timeliness, and its specific steps are as follows:
1. Preset the update cycle threshold: define how long, or after how many batches of data, the normal pattern set is recalculated.
2. Start a new cycle: whenever a new update cycle begins, process all data within the sliding window. The window includes data from the current cycle and the previous few cycles, so that enough data are available to capture changes in normal behavior patterns.
3. Recalculate the normal pattern set: collect all relevant numerical attribute data within the window and recalculate the normal pattern set of these attributes, including the frequency and value range of each attribute, to reflect the latest behavior patterns.
4. Globally replace the data from the previous cycle: replace the old normal pattern set stored in the database from the previous cycle with the recalculated set. This ensures that the normal pattern set in the database always reflects the latest behavior patterns and adapts to incremental changes in the data.
5. Mine abnormal patterns in incremental data: continue mining abnormal patterns against the new normal pattern set. As new trajectory data are continuously added, the update mechanism of the sliding window keeps the detection method adapted to the latest data and timely.
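The prefix-projection idea behind PrefixSpan can be illustrated with a compact sketch. It is a simplified variant that assumes each sequence is a list of single items (for example, the regions a target visited on one day, in order) rather than a sequence of itemsets, and it takes an absolute support count; the function name `prefixspan` is illustrative, and this is not the implementation used in the experiments.

```python
def prefixspan(sequences, min_support):
    """Simplified PrefixSpan for sequences of single items.

    min_support is an absolute number of sequences. Returns {pattern_tuple: support}.
    """
    results = {}

    def mine(prefix, projected):
        # Count each item at most once per projected sequence.
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, support in counts.items():
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project: keep only the suffix after the first occurrence of `item`.
            new_projected = []
            for seq in projected:
                if item in seq:
                    suffix = seq[seq.index(item) + 1:]
                    if suffix:
                        new_projected.append(suffix)
            mine(pattern, new_projected)   # recurse on the smaller projected database

    mine((), [list(s) for s in sequences])
    return results


daily = [["A", "B", "C"], ["A", "C"], ["B", "C"]]
print(sorted(prefixspan(daily, min_support=2).items()))
# [(('A',), 2), (('A', 'C'), 2), (('B',), 2), (('B', 'C'), 2), (('C',), 3)]
```

No candidate sequences are generated: each recursion only scans the suffixes that follow the current prefix, which is the source of the memory savings discussed in the experiments. Under this sketch's counting convention, a relative threshold such as 20% would correspond to `min_support = math.ceil(0.2 * len(sequences))`.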

Experimental Testing Simulation Implementation and Analysis
By mining co-occurrence patterns in the trajectory data of moving targets, the co-occurrence parameter values of the object association between any two moving targets can be obtained on a daily basis. Based on these data, the strength of association between any two moving targets over any period of time can then be determined, which supports establishing monitoring and response mechanisms for each moving target.
The trajectory data set described in this section contains 12 attributes: trajectory unique identifier, moving target name, task, appearance time, appearance longitude, appearance latitude, disappearance time, disappearance longitude, disappearance latitude, passing area, passing area time, and appearance duration. The first 11 attributes are required for the calculation in this section, and the specific data format is shown in Table 3. Here, the frequency threshold for object association co-occurrence is set to 1, the duration threshold to 0.2 h, the interval distance threshold to 200 km, and the frequency threshold for object location association to 20% of the total number of original trajectories. The data set used in the experiment was provided by the project user and contains 365 days of moving target trajectory data from 1 April 2016 to 31 March 2017, a total of 364,286,923 raw records amounting to approximately 1 TB. After analysis, each object association co-occurrence result has 14 attributes, including a unique identifier for the co-occurrence, the date, the names, tasks, and regions of the two moving targets, and the start time, end time, duration, and interval distance of the two moving targets appearing together. The specific data format is shown in Table 4, and the calculation results are shown in Figure 7.
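The co-occurrence definition itself is given earlier in the paper, so the following is only a hedged sketch of one plausible reading of the thresholds above: two appearance records co-occur if their time intervals overlap for at least 0.2 h and their positions are at most 200 km apart, and a pair of targets is associated on a day if at least one such co-occurrence exists. The record layout `(start_h, end_h, lat, lon)`, the use of great-circle (haversine) distance on a spherical Earth, and the names `co_occur` and `associated` are all assumptions made for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def co_occur(rec_a, rec_b, min_hours=0.2, max_km=200.0):
    """Hypothetical check: overlap of at least min_hours and distance of at most max_km.

    Each record is (start_h, end_h, lat, lon), with times in hours.
    """
    overlap = min(rec_a[1], rec_b[1]) - max(rec_a[0], rec_b[0])
    if overlap < min_hours:
        return False
    return haversine_km(rec_a[2], rec_a[3], rec_b[2], rec_b[3]) <= max_km


def associated(pairs_today, min_count=1):
    """Daily association if at least min_count co-occurring record pairs exist."""
    return sum(co_occur(a, b) for a, b in pairs_today) >= min_count


a = (8.0, 10.0, 30.0, 120.0)   # appears 08:00-10:00 at (30 N, 120 E)
b = (9.5, 11.0, 30.5, 120.0)   # overlaps 0.5 h, about 56 km away
print(co_occur(a, b))          # True
```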
Taking moving targets A and W as examples, their original trajectories are visualized in Figure 8. On the same timeline, the activities of A and W tend to occur together: they co-occur frequently, are relatively close in distance, and sometimes even have overlapping trajectories, indicating a certain level of association. In the experiment, the co-occurrence pattern mining method proposed in this section concludes that there is object association co-occurrence between moving targets A and W, so the experimental results of the method are consistent with the actual situation. The experimental results also show that the stronger the correlation between two moving targets, the more similar their activity data and the more regular their collaborative behavior. Dividing the correlation between two moving targets into multiple levels according to its strength can provide decision-making guidance for formulating further response plans.
In terms of efficiency, the method proposed in this section trades space for time. The calculation results are stored in the database every day; when the user executes a query, the corresponding data are retrieved from the database and only simple low-dimensional operations are performed. The complexity at query time is therefore only O(n), where n is the number of data items in the database. This greatly reduces the time users spend on operations and improves query efficiency.

Experimental Results and Analysis
The data set used in the experiment comprises more than 360 million moving target trajectory records collected over 365 days and provided by the project user. From the analysis of these data, the result data for the frequent activity area sequences of moving targets include six attributes: the unique identifier associated with the target's own location, the name of the moving target, the task, the frequently passed area, the date, and the number of passes. The specific data format is shown in Table 5 and Figure 9. Taking moving target A as an example, the results of analyzing its original trajectory are shown in Table 6. The data in Table 6 show that on this timeline, moving target A consistently tends to appear in the same three regions and, while performing regular tasks, often enters one particular region from another. With the threshold set to 20%, the frequent activity sequences of moving target A consist of two single regions, one two-region sequence, and one four-region sequence. Moreover, in the experiment, the frequent activity region sequence mining method proposed in this section yields a four-region frequent activity sequence for moving target A, with a frequency of two occurrences. The experimental results of the method are therefore consistent with the actual situation.
As can be seen from the above, the results of the mining method for frequent activity area sequence sets indicate the probability of each activity area and the order of activity for a single moving target. This can help users study the behavioral characteristics of moving targets in depth and determine whether their sequential behavior is abnormal, providing data support for formulating response strategies or early warnings. The results of the regional anomaly pattern mining method, combined with a visual display of the moving targets, are shown in Figure 10. Figure 10 displays the movement trajectories of two targets; combining these activity trajectories with the targets' original behavioral characteristics helps determine whether behavioral anomalies occur.
In terms of efficiency, compared with traditional sequence mining methods such as the Apriori algorithm [28], the GSP algorithm [29], and the SPADE algorithm [30], the PrefixSpan algorithm adopted here can significantly reduce memory consumption [31], because it does not need to store candidate sets of frequent sequences, only the projected data. Under the same minimum support threshold, the method proposed in this section takes less time and performs better.
For the mining method of numerical attribute anomaly patterns in moving target trajectories, taking anomalies in the average velocity attribute as an example, the detection results are visualized as shown in Figure 11. In Figure 11, a time period is specified by setting a start time and an end time. Under the 'Abnormal Type' category, 'Trajectory Speed Anomaly' is selected, which displays all data related to 'Trajectory Speed Anomaly' within this time period, including 'Speed', 'Nearest Target Type', and so on. Methodologically, the mining method for numerical attribute anomaly patterns combines Canopy rough clustering with K-medoids secondary clustering. This avoids the human intervention that traditional K-means algorithms require for choosing K, reduces the possible impact of subjectivity on accuracy, and reduces the bias that outliers introduce into the cluster centers. In terms of efficiency, although the time complexity of the K-medoids algorithm is higher than that of the K-means algorithm, the CLARANS algorithm used in this method greatly improves on the efficiency of the traditional K-medoids algorithm and helps keep the clustering results from falling into local optima. In practical application, because the normal behavior pattern set is pre-stored in the database, the response time of the module's queries and calculations is about 10 s, fully meeting the system's needs for real-time computation and display.
To discover valuable, potentially hidden, and previously unknown patterns in the original trajectory data set of moving objects, and to mine, detect, and predict their behavior patterns for real-time systems, this paper adopts PrefixSpan, a sequential pattern mining algorithm based on prefix projection, to mine frequent sequential patterns. It identifies the regions in which a target is frequently active, together with their temporal order, which provides a data foundation for anomaly detection on the sequential attributes of moving objects. The PrefixSpan algorithm offers high timeliness and accuracy. For the numerical data in multi-attribute trajectory data sets of moving objects, the data are classified by attribute and clustered with the K-medoids algorithm, with K predetermined by the Canopy clustering algorithm, to extract the set of normal behavior patterns of moving objects. This reduces, to some extent, the arbitrariness in choosing K. Trajectory data sets are updated rapidly and are large. As time passes and the trajectories of moving objects accumulate, the normal pattern sets of their numerical attributes should be updated correspondingly, and conventional algorithms struggle to uncover such changing data patterns in real time. With the incremental anomaly detection method, the activity location frequencies of moving objects and the value ranges of the normal behavior patterns are updated synchronously with the trajectory data set. In this scenario, the timeliness of the method and the accuracy of its results can be maintained as long as incremental updates are fast enough. This is expected to improve the accuracy and reliability of the results, although it has not yet been verified with quantitative evaluation metrics.

Discussion
In the research process of mining behavior patterns of moving target trajectories, this article still has some shortcomings: (1) In data preprocessing, trajectories with gaps are completed by interpolation, that is, the trajectory points on either side of a break are connected directly. However, since moving targets may stop or circle, polynomial interpolation could be considered to fit and complete the trajectory in subsequent research. (2) When searching for frequent itemsets in trajectory data, the PrefixSpan algorithm can be improved. For example, only the top-k most frequent items could be considered in each projection to reduce unnecessary computation and memory overhead. Additionally, time and space constraints can be incorporated into frequent itemset mining to improve the applicability and efficiency of the algorithm. (3) The numerical attribute anomaly detection method in this article adopts the CLARANS algorithm to balance clustering efficiency and accuracy. However, the time complexity of CLARANS is relatively high; although it only just meets the real-time detection needs of this system, this part of the calculation could be moved to a distributed system in the future, or more efficient clustering algorithms could be designed without sacrificing accuracy. (4) The method has not yet been compared with other methods; it needs to be compared with existing research employing similar but different methods to validate its advantages. (5) Quantitative evaluation metrics need to be established to objectively assess how much the proposed method improves the accuracy and credibility of detection results.

Conclusions
When detecting anomalies in trajectory behaviors that contain multiple attributes, categorizing the anomaly attributes of trajectories into numerical and sequential attributes and applying a corresponding detection method to each avoids the tendency of existing methods to neglect anomalies in a single attribute dimension of a trajectory. To accurately identify abnormal behavior patterns in which moving targets deviate from normal, this paper proposes a method for mining abnormal trajectory patterns of moving targets based on multi-attribute classification, with corresponding anomaly detection methods for sequential and numerical attributes. For sequential attributes, a frequent sequence discovery method for moving targets based on sequence patterns is proposed, which applies a frequent sequence mining algorithm to the daily sets of active regions of a target. Under the support threshold, the regions in which the target is frequently active, together with their temporal order, are identified. For numerical attributes, clustering algorithms are used to mine the set of normal activity patterns corresponding to the feature attributes of moving targets, where the normal activity pattern of each attribute is expressed as numerical value intervals. On this basis, by comparing the feature attribute values of the real-time trajectory of a moving target with its normal patterns, anomalies in the six numerical attributes of the current trajectory are detected and judged. Algorithm experiments were conducted and analyzed on the user's one year of moving target trajectory data. The results showed that the two proposed methods can effectively identify anomalies in sequential and numerical attributes, respectively. In terms of query efficiency, the complexity is only O(N), where N represents the number of data entries in the database, which significantly reduces the time consumed by user operations. Moreover, the response time for queries and calculations is approximately 10 s, fully meeting the system's requirements for real-time computation and display.

Figure 3. Flowchart of the mining method for frequent activity region sequences of moving targets.

From Definitions 8 to 11 above and Definition 7 in Section 2.1.1, the feature attribute set of the trajectory of a moving target can be obtained.
Definition 12. Given the data set $D = \{p_i \mid i = 1,2,\cdots,n\}$, for $\forall p_i \in D$, if $C_i = \{p_j \mid \lVert p_j - c_i \rVert < T_1,\ p_j \in D,\ j \neq i\}$ is satisfied, the set $C_i$ composed of the points $p_j$ that satisfy the condition is called a Canopy set. The set $C$ composed of all $C_i$ contains all Canopy sets, $c_i$ is the center point of the current Canopy set $C_i$, the center points $c_i$ together form the center point set, and $T_1$ is the radius of the Canopy set.
Definition 13. Given a data set $D = \{p_i \mid i = 1,2,\cdots,n\}$, a point $p_j \in D$ that satisfies $\lVert p_j - c_i \rVert \le T_2$, with $T_2 < T_1$ and $j \neq i$, is referred to as a non-Canopy candidate center point, and $T_2$ is the radius of the non-Canopy candidate center point set.

Figure 6. Flowchart for mining numerical anomaly patterns in moving target trajectories.

Figure 7. Sample data from the co-occurrence pattern mining result data set.

Figure 8. Visualization of co-occurrence pattern mining results.

Figure 9. Example of frequently active area sequence result data set.

Figure 10. Results of the mining method for abnormal patterns of moving targets passing through regions.

Figure 11. Results of the mining method for abnormal patterns in the average speed of moving targets.

Table 1. Activity sequence database S of moving target A within the specified time range.

Table 2. Access times of moving target A in different regions within the designated time range.

Algorithm 1: Canopy Clustering Algorithm.

Step 9 (continued): use these Canopy center points as the K representative data points, construct a representative data point set $O$, and remove these K representative data points from the set $Q$;
10. Set the number of neighboring nodes $maxneighbor = K(n-K) \times 1.25\%$ and initialize $j = 1$;
11. Randomly select a data point $o_i$ from the representative data point set $O$ and a data point $o_{random}$ from the set $Q$, and calculate the absolute error criterion $E_{random}$ when $o_{random}$ replaces $o_i$ as a representative data point and the absolute error criterion $E_i$ when $o_i$ is a representative data point;
12. If $E_i > E_{random}$, replace $o_i$ with $o_{random}$ in the representative data point set $O$ and reset $j = 1$; otherwise set $j = j + 1$;
13. Repeat steps (11) and (12) until $j > maxneighbor$; the representative data point set $O$ is then the minimum-cost representative data point set;
14. Using the data points in $O$ as cluster centers, assign each data point in $V_{avg}$ to the cluster of the nearest center, obtaining K clusters, and construct the corresponding normal behavior pattern $N_{avg}$, which consists of K value intervals;
15. Traverse the newly obtained data points and match them against $N_{avg}$. If a data point cannot be matched, its value is determined to be abnormal and stored in the corresponding database table.

Table 3. Data format of the trajectory data set.

Table 4. Data format of the co-occurrence pattern mining result data set.

Table 5. Data format of the frequently active area sequence result data set.

Table 6. Simplified trajectory information of moving target A.