Contrastive Learning for Graph-Based Vessel Trajectory Similarity Computation

Abstract: With the increasing popularity of automatic identification system (AIS) devices, mining latent vessel motion patterns from AIS data has become a hot topic in water transportation research. Trajectory similarity computation is fundamental to many maritime applications, such as trajectory clustering, prediction, and anomaly detection. However, current non-learning-based methods face performance and efficiency issues, while learning-based methods are limited by the lack of labeled samples and explicit spatial modeling, making it difficult to achieve optimal performance. To address these issues, we propose CLAIS, a contrastive learning framework for graph-based vessel trajectory similarity computation. A combined parameterized trajectory augmentation scheme is proposed to generate similar trajectory sample pairs, and a constructed spatial graph of the study region is pretrained to help model the input trajectory graph. A graph neural network encoder is used to extract spatial dependency from the trajectory graph to learn better trajectory representations. Finally, a contrastive loss function is used to train the model in an unsupervised manner. We also propose an improved experiment and three related metrics and conduct extensive experiments to evaluate the performance of the proposed framework. The results validate the efficacy of the proposed framework in trajectory similarity calculation.


Introduction
AIS devices play a crucial role as onboard navigation aids for communication and data exchange between vessels and shores. These devices utilize very-high-frequency (VHF) radio transceivers to broadcast vessel information to nearby vessels or shore-based stations while also receiving AIS data transmitted by other vessels. AIS data provide essential information such as vessel identification, characteristics, real-time position, velocity, heading, and other relevant details, which contribute to maritime traffic management and collision avoidance. In recent years, extensive research has been conducted to extract the spatiotemporal distribution patterns of vessels from historical AIS data, enhancing our understanding of waterway traffic patterns [1]. Among these studies, the analysis of AIS trajectories, derived from real-time position information in AIS messages, has garnered significant attention across various research domains. The computation of AIS trajectory similarity is a crucial task for several maritime applications, including trajectory clustering, prediction, and anomaly detection. This necessitates the development of effective methods and measures for assessing the similarity between AIS trajectories, facilitating improved insights and decision-making in maritime operations [2][3][4].
The current research focus on trajectory similarity computation revolves around utilizing learning-based methods to indirectly calculate trajectory similarity through trajectory representation learning. Feature-based approaches are employed to learn the feature representation of trajectory data, enabling the mapping of trajectories with varying positions, shapes, and lengths to a shared low-dimensional feature space. Subsequently, traditional distance metrics such as Euclidean distance or cosine similarity are utilized to compute the similarity between trajectories. However, the application of learning-based trajectory similarity computation for vessel AIS data is still relatively limited, and there is a dearth of modeling approaches that explicitly capture trajectory spatial dependencies through graph learning and other techniques.
Vessel trajectories depict the continuous movement of vessels in three-dimensional physical space over time. Ideally, a complete vessel trajectory in the real world is a mathematically continuous and smooth curve, referred to as a latent path in this paper. However, in practical situations, shipborne devices are unable to capture a continuous curve that precisely represents the underlying latent path associated with a given trajectory. Instead, they can only gather discrete positions of the vessel at irregular time intervals. As a result, the challenge in trajectory representation learning lies in acquiring a representation vector that effectively captures the genuine underlying latent path, considering the sparsely and irregularly sampled trajectory data points.
In the research on learning-based trajectory similarity, there are mainly two methods: supervised learning and unsupervised learning. Supervised learning aims to utilize neural network methods to fit existing similarity measures in order to improve computational efficiency; on the other hand, unsupervised learning methods do not have existing measures as supervised signals, so each new unsupervised method develops new similarity measures. Unsupervised learning is an important approach in the field of machine learning as it can extract useful information and knowledge from unlabeled data. The automatic learning of general vessel trajectory representations from massive unlabeled AIS trajectory data is of great significance for ship trajectory similarity calculation. First, ship trajectory data often exhibit characteristics such as long time spans, high dimensionality, complexity, and noise [5][6][7][8]. Therefore, effective representation methods are needed in order to reduce the dimensionality and complexity of the data, enabling better exploration of the intrinsic structure and features of vessel trajectory data. Second, in order to enhance the model's ability to measure trajectory distances, new trajectory measurement methods need to be proposed. The development of new trajectory representation learning models can only be achieved through unsupervised learning. Finally, vessel trajectory data have a wide range of applications in practical scenarios, such as route planning, trajectory prediction, trajectory clustering, and anomaly detection. Effective general ship trajectory representations can provide strong support and assurance for these application scenarios. Therefore, learning general trajectory representations from massive unlabeled trajectory data holds significant research value and practical importance.
To address the aforementioned issues, this paper proposes a graph neural network-based framework called CLAIS for the unsupervised learning of optimal ship trajectory representations. By calculating the distances between the learned trajectory representation vectors, the framework determines ship trajectory similarity, which facilitates subsequent research tasks such as ship clustering and anomaly trajectory detection. The contributions of this paper are summarized as follows:

1. A graph-based trajectory contrastive learning framework, CLAIS, is proposed. It constructs similar trajectory samples to learn robust trajectory representation vectors and computes trajectory similarity based on the Euclidean distance between representation vectors, leading to favorable similarity results.

2. A parameterized trajectory augmentation method is introduced to enhance the robustness of the model's trajectory representation learning.

3. Improved evaluation experiments and three evaluation metrics are proposed to verify the performance of the proposed framework in learning trajectory representations and computing ship trajectory similarities.

The remaining structure of this paper is as follows: Section 2 introduces the relevant research; Section 3 provides a detailed description of the proposed model framework; Section 4 presents the proposed improvement experiments and details of the model evaluation experiments; and Section 5 concludes the work of this paper.

Related Work
To perform trajectory similarity computation using trajectory representations, it is crucial to obtain effective trajectory representations. Currently, many popular methods utilize deep neural networks to capture the feature representations of trajectories and map them into a feature space. Yang et al. [9] proposed a deep learning-based trajectory similarity computation model called T3S, which fits different trajectory similarity measures. By employing long short-term memory (LSTM) and self-attention-based networks, T3S can retain the spatial and structural information of trajectories for similarity computation. It can automatically adjust the weights of spatial and structural information based on different similarity measures. Yang et al. [10] introduced TMN, which matches points from one trajectory to points from another trajectory using an attention mechanism that enables cross-trajectory point matching. The trajectory's spatial information is then combined, and a recurrent neural network (RNN) is used to learn the trajectory representation for fitting different similarity measures. Zhang et al. [11] proposed Traj2SimVec for scalable and robust trajectory similarity computation. Traj2SimVec acquires triplet training samples through fast trajectory compression and indexing. It further utilizes sub-trajectory similarity information as auxiliary supervision. Additionally, the framework supports point matching queries by modeling the optimal matching relationships of trajectory points under different distance metrics. Yao et al. [12] introduced TrajGAT, which constructs trajectories as quadtree structures and employs attention heads from the Transformer [13] instead of graph attention networks (GAT) to learn trajectory representations for different similarity measures. However, the aforementioned methods are supervised models that fit existing similarity computation methods. While they improve computational efficiency, they inherit the performance bottlenecks of the similarity measures they fit and therefore cannot exceed them.
In unsupervised representation learning, Yao et al. [14] use sliding windows to extract a set of spatiotemporal invariant features that capture the motion characteristics of trajectories. They convert each trajectory into a feature sequence to describe the object's movement using a feature extraction module. They further employ a seq2seq autoencoder to learn the trajectory representation. Li et al. [15] apply computer vision techniques to measure similarity in vessel trajectories. They propose a similarity measurement method based on a convolutional autoencoder, where vessel trajectories are transformed into trajectory image data, treating each grid as a pixel. They then introduce a grid-based convolutional autoencoder to extract feature vectors from trajectory data to learn the representation of the original vessel trajectories. Fu et al. [16] propose a representation learning framework called Trembr, which models trajectories and road segments separately. They designed an encoder-decoder model called Traj2Vec based on a recurrent neural network. By leveraging the underlying road network and the matched segments obtained using road network matching techniques, they constrain the learning process. They also introduce Road2Vec, a neural network-based approach to learn segment embeddings in the road network, capturing various relationships between road segments. However, the aforementioned models do not explicitly model the spatial structural features of trajectories using a graph, which may compromise the model's performance.
In the field of unsupervised learning, self-supervised learning is currently an important trend and considered the future of unsupervised learning [17,18]. Currently, self-supervised learning models can be mainly classified into two types: generative and contrastive learning models. Traditional self-supervised learning methods based on generative models, such as autoencoders, attempt to generate or model specific parts of input samples by using a limited number of discrete trajectory positions in the input space, thereby inferring an approximate trajectory movement curve of the input sample [19,20]. However, this approach aims to reconstruct the entire input sample, comparing the original input sample with the reconstructed sample to compute the error and train the model. For sequential data such as trajectories or time series, an autoregressive paradigm is often employed, where the prediction result of the current position point $f(x_t)$ is obtained by using the fitted result of the previous position point $f(x_{t-1})$ as the input, i.e., $f(x_t) = f(f(x_{t-1}))$. Generating reconstructed data samples is computationally expensive, and this fine-grained approach is often unnecessary for learning models that can distinguish between different samples. In fact, it can even lead to performance degradation due to excessive focus on sample details.
Fortunately, contrastive learning provides a new approach to address this problem. The core idea of contrastive learning is to compare different trajectory samples and establish relationships among similar samples (i.e., positive samples) and differences among different samples (i.e., negative samples). Through this comparison, the model can learn the intrinsic structure and features of trajectory data, thus generating effective trajectory representations that serve as a basis for subsequent trajectory analysis and processing. As an unsupervised learning method, contrastive learning also does not require labeled data, which reduces the barriers for practical applications and avoids the cost and complexity associated with annotation. In contrast to generative methods that attempt to fully reconstruct the input original trajectories, discriminative methods learn representations through objective functions similar to supervised learning, but the input and supervisory signals during network training come from unlabeled datasets. Discriminative contrastive learning methods based on latent space have shown great potential and achieved state-of-the-art results [21,22]. Contrastive learning directly finds discriminative features that best differentiate different trajectory samples in the feature space by comparing similar and dissimilar samples within the sample set. Compared to generative methods, contrastive learning is more direct, simple, and effective for discriminative tasks.
Currently, existing literature has attempted to apply contrastive learning to the processing and analysis of trajectory data [23,24]. However, there is currently no existing method for calculating similarity of ship AIS trajectories based on contrastive learning. Therefore, this paper combines contrastive learning and graph learning to facilitate ship trajectory representation learning, thereby calculating the distance between ship trajectory representations to obtain vessel trajectory similarity. The CLAIS framework proposed in this article benefits from the following three aspects with respect to performance: first, by incorporating water area pretraining into trajectory graph construction, CLAIS learns the spatial structural representation of water area in advance before modeling trajectories; second, through the proposed parameterized augmentation scheme, CLAIS enhances its robustness against erroneous AIS signals by learning potential signal errors through contrastive learning with real trajectories; and third, by introducing graph neural networks to learn the representation of trajectory graphs, CLAIS strengthens its ability to learn the spatial dependency relationships of trajectories in water areas. Compared to previous research, CLAIS achieves better model performance through improvements in these three aspects.

Methodology
As shown in Figure 1, the CLAIS framework consists of three modules: the regional graph pretraining module in the red box; the vessel trajectory contrastive learning module in the yellow box; and the trajectory graph representation learning module in the green box. After training, the similarity between trajectories is obtained by calculating the Euclidean distance between the trajectory representation vectors output by the trajectory graph representation model.
The regional graph pretraining module, located in the red box, first divides the study area into grids and establishes a spatial structural relationship graph based on the selected grid cells. Then, the spatial dependency of the regional structural graph is pretrained, resulting in pretrained representation vectors for each valid spatial grid.
The vessel trajectory contrastive learning module, highlighted in the yellow box, is the main part of training in the CLAIS framework. It learns trajectory representation vectors that minimize the loss through unsupervised contrastive learning. Initially, the input ship trajectories undergo various types of trajectory augmentation using the proposed parameterized combined augmentation scheme, creating diverse similar samples. The trajectory samples are then processed by the trajectory graph representation learning module to extract trajectory features and encode them into trajectory representation vectors. During the training phase, the trajectory representation vectors are further passed through a non-linear mapper. Finally, the contrastive loss component calculates the error to optimize the model.
The trajectory graph representation learning module, enclosed by the vessel trajectory contrastive learning module, is an independent module that includes a trajectory graph construction component and a proposed graph neural network encoder. The trajectory graph construction component first combines the input trajectories with the grid representation vectors obtained from the regional graph pretraining module to construct a trajectory graph. Then, the graph neural network encoder performs feature extraction on the trajectory graph, generating representation vectors for the trajectories.
Next, we will separately introduce the structural and methodological details of each module.

Regional Graph Pretraining Module
The regional pretraining module consists of two components: the region graph construction component and the node2vec pretraining component.
Figure 2 demonstrates the steps of the regional graph pretraining module. The region graph construction component divides the study area into adjacent, non-overlapping grids of equal size and maps every position point in the historical AIS database to the grid it falls within. After the grid mapping, grid filtering is performed. Specifically, the area covered by the ship trajectory database T is divided into square grids of equal size based on the given area range, with the grid size determined by the experimental parameter "grid_size". The number of position points from T that fall in each grid is then counted. Subsequently, a natural number parameter δ is manually selected as the threshold for the number of historical position points contained in a grid. Removing grids that contain no position points or only a small number of them filters out grids that are not suitable for navigation from a data-driven perspective (such as regions with obstacles like islands or artificial structures) as well as grids that may contain noise points. The selected valid grids make the model training more robust and yield a waterway chart that is more in line with reality, reducing unnecessary computational complexity. Figure 3 shows the heatmap of historical AIS signal position points received by onshore base stations in the gridded Shanghai Port waterway, with a fixed grid size of 0.01°. It can be observed that as the threshold δ gradually increases, the grids containing noise signals (such as grids clearly corresponding to land areas) become fewer. However, a negative consequence is that some normal but sparsely populated water area grids are also filtered out.
Additionally, to handle position points in trajectories that fall in invalid grids, an abstract grid is introduced to represent all the invalid grids when dividing and constructing the water area graph. All trajectory position points falling in invalid grids are assigned to that grid. This grid does not carry the real-world features present in normal valid grids, such as the central position of the grid. Instead, it uses a zero vector in place of a learned embedding representation vector. After completing the spatial grid structure partitioning, CLAIS utilizes node2vec [25] to pretrain on the spatial graph structure. This process generates embedded representation vectors for all valid grids, which are then used to construct trajectory graph features in subsequent steps.
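The grid mapping and threshold filtering described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CLAIS implementation: the function names, the `INVALID` placeholder standing in for the abstract grid, and the choice to keep a grid when its count is at least δ (rather than strictly greater) are all hypothetical. The node2vec pretraining step is omitted.

```python
import math
from collections import Counter

# Hypothetical stand-in for the single abstract grid that absorbs
# every position falling outside the valid grids.
INVALID = None


def grid_index(lon, lat, lon_min, lat_min, grid_size):
    """Map a position to the (column, row) index of its square grid cell."""
    return (math.floor((lon - lon_min) / grid_size),
            math.floor((lat - lat_min) / grid_size))


def valid_grids(points, lon_min, lat_min, grid_size, delta):
    """Return the cells holding at least `delta` historical points.

    Assumption: the text only says that sparsely populated grids are
    removed, so the ">=" comparison here is a guess.
    """
    counts = Counter(grid_index(lon, lat, lon_min, lat_min, grid_size)
                     for lon, lat in points)
    return {cell for cell, n in counts.items() if n >= delta}


def to_grid(lon, lat, lon_min, lat_min, grid_size, valid):
    """Map a trajectory point to its valid cell, or to the abstract grid."""
    cell = grid_index(lon, lat, lon_min, lat_min, grid_size)
    return cell if cell in valid else INVALID
```

Raising `delta` shrinks the returned set, which mirrors the trade-off noted above: fewer noisy land cells survive, but sparse yet legitimate water cells are dropped as well.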

Vessel Trajectory Contrastive Learning Module
After completing the region pretraining, the vessel trajectory contrastive learning module takes the training set of trajectory data as input to train the model. Inspired by previous research on contrastive learning [26], this module applies the self-supervised learning paradigm to learn representations of ship trajectories by leveraging the dissimilarity between data samples as the model loss, aiming to learn the most discriminative trajectory representations.
Specifically, the module learns the representation model through contrastive learning by automatically constructing similar instances (positive samples) and dissimilar instances (negative samples). This is referred to as the trajectory graph representation learning module of CLAIS. It ensures that positive samples projected in the embedding space by the representation model are close in distance, while negative samples are far apart. CLAIS follows the principles of SimCLR and employs a parameterized trajectory augmentation scheme to construct positive and negative sample pairs. The trajectory graph representation learning module encodes trajectories into feature vectors, enabling different positive samples generated from the same trajectory to have closer distances in the feature space. The positive and negative sample pairs are constructed as follows: for a batch of trajectory samples, two different augmentation techniques are randomly applied to a particular trajectory sample, resulting in two trajectory samples that are positive samples to each other, while the remaining random trajectories in the batch serve as negative samples.
The vessel trajectory contrastive learning module consists of two components: the vessel trajectory augmentation component and the contrastive learning training component. Figure 4 illustrates the schematic diagram of this module, with the vessel trajectory augmentation component highlighted in red and the contrastive learning training component highlighted in green. The following sections will provide detailed explanations of each component.
(1) Vessel trajectory augmentation component
As mentioned earlier, generating reliable augmented samples (referred to as views) from input trajectory samples to form similar positive samples is an important step. When it comes to augmenting ship trajectory samples, it is crucial to design reasonable, effective, and interpretable augmentation schemes to facilitate effective trajectory contrastive learning.
In the field of contrastive learning, initially proposed in computer vision, image data can be augmented through techniques such as affine transformations, cropping, color variations, and noise. However, most of these augmentation techniques may not be applicable to trajectory data. Trajectory data exhibit clear sequential and geometric properties, and the aforementioned augmentation techniques either do not apply to trajectories or may disrupt the intrinsic characteristics of the data, making the learned trajectory representations unable to accurately reflect the true attributes of the trajectories.
Due to the constrained nature of ship movement within a limited sea surface, it can be simplified as two-dimensional plane motion. This study focuses solely on the geometric and sequential features of trajectories, thus excluding other information such as timestamps, heading, and speed. Referring to previous research on trajectory augmentation schemes [27,28], this study innovatively proposes a combined vessel trajectory augmentation scheme, which includes two novel augmentation methods tailored to vessel signal trajectories. The parameterized vessel trajectory augmentation scheme proposed in this study consists of two augmentation scales: random position augmentation and random segment augmentation. It also incorporates three augmentation methods: random noise, segment distortion, and downsampling, resulting in a total of five individual augmentation methods. Furthermore, to strengthen the impact of augmentation techniques on the model, this study combines these individual methods in a meaningful way, resulting in six combined augmentation methods. In total, there are eleven individual and combined augmentation methods. The following section will introduce these augmentation methods in detail.
1) Random position noise
Adding noise is the most common and effective means of augmenting data; here it introduces distortions at the scale of individual position points. In the experiments conducted in this study, random positional noise is applied by randomly selecting positions according to the given parameter $r_p$ (position ratio), which represents the proportion of position points in the entire trajectory that will be affected by noise. Since CLAIS adopts a grid-based approach and AIS trajectories can experience significant trajectory drift and positional errors, to enhance the model's robustness against noise, the magnitude of the noise $(d_p^{x_i}, d_p^{y_i})$ is calculated by multiplying a manually set constant "base_distortion" and the experimentally controlled $d_p$ (position distortion) as coefficients. These coefficients are then multiplied by samples drawn from a standard Gaussian distribution $N(0, 1)$. The position of a specific point $(x_i, y_i)$ on a trajectory after introducing noise is given by the following equation:

$$(x_i', y_i') = (x_i + d_p^{x_i},\; y_i + d_p^{y_i})$$

Here, $d_p^{c_i} = d_p \cdot bd \cdot e_{c_i}$, where $bd$ represents the base distortion, set to 0.01° in this study (for both longitude and latitude), and $e_{c_i} \sim N(0, 1)$, $c_i = x_i$ or $y_i$. Note that the longitude and latitude noise of a single position are drawn independently, and the noise applied to different positions is also mutually independent. This design ensures the authenticity and diversity of the learned random positional noise.
In Figure 5, (a) represents an original AIS trajectory segment without obvious positional errors or anomalies before augmentation; (b) shows the trajectory after applying random positional noise. Green dots represent the original normal position points, red dots represent the erroneous position points after introducing noise, and yellow dots represent the corresponding original normal position points before the occurrence of noise. It can be observed that the points affected by noise and the degree of noise in the augmented trajectory are random, ensuring that the generated noise follows a Gaussian distribution in each training iteration. In the figure, $d_p$ is set to 2, indicating that the longitude and latitude noise follows a Gaussian distribution with a standard deviation of 0.02°. $r_p$ is set to 0.2, indicating that approximately one-fifth of the trajectory position points are affected by noise.
2) Random position loss
In Figure 5c, green dots represent normal position points, while yellow dots represent positions where signal loss has occurred. It can be observed that the positions where signal loss occurs are random. In this example, the value of $l_p$ is 0.2, indicating that one-fifth of the position points experience random position loss.
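The two point-scale augmentations described above (random position noise and random position loss) can be sketched as follows. This is an illustrative sketch, not the CLAIS code: the function names are hypothetical, and the way the affected points are chosen (an exact count of `round(ratio * len(traj))` points drawn without replacement) is an assumption, since the text only specifies a proportion.

```python
import random

BASE_DISTORTION = 0.01  # degrees; the bd value stated in the text


def random_position_noise(traj, r_p, d_p, bd=BASE_DISTORTION, rng=random):
    """Perturb a fraction r_p of the points with independent Gaussian offsets.

    Each selected point gets its own longitude and latitude offset,
    d_p * bd * e with e ~ N(0, 1), drawn independently per coordinate.
    """
    chosen = set(rng.sample(range(len(traj)), round(r_p * len(traj))))
    out = []
    for i, (x, y) in enumerate(traj):
        if i in chosen:
            x += d_p * bd * rng.gauss(0.0, 1.0)
            y += d_p * bd * rng.gauss(0.0, 1.0)
        out.append((x, y))
    return out


def random_position_loss(traj, l_p, rng=random):
    """Drop a fraction l_p of the points, chosen uniformly at random."""
    dropped = set(rng.sample(range(len(traj)), round(l_p * len(traj))))
    return [p for i, p in enumerate(traj) if i not in dropped]
```

With `d_p = 2` the per-coordinate offsets have a standard deviation of 0.02°, matching the Figure 5b setting. Passing a seeded `random.Random` instance as `rng` makes a given augmentation reproducible.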
3) Regular downsampling
Downsampling is a common data augmentation technique. Trajectory downsampling simulates a reduction of the trajectory sampling rate by retaining position points at regular intervals and discarding the remaining ones. By applying regular downsampling to trajectories, the model can learn representations that are better suited to scenarios where the data are generated at a lower sampling rate along latent paths. In the augmentation scheme of CLAIS, the downsampling rate is controlled by an integer $i_p$ (position interval), where $i_p$ means that every $i_p$-th position point of the original trajectory is selected to construct a new augmented trajectory. As shown in Figure 5d, green dots represent the retained position points after downsampling, while yellow dots represent the discarded position points. In this example, the sampling interval $i_p$ is set to 2, indicating a downsampling rate of 50%.
4) Random segment distortion
Random segment distortion is an augmentation technique in CLAIS specifically designed for the characteristics of AIS trajectories at the segment scale. It introduces noise of the same direction and magnitude to trajectory segments, visually presenting the effect of distorting a single segment within the trajectory. Due to equipment and system errors, AIS may experience consecutive segments with the same noise. This continuous occurrence of fixed noise can easily mislead the recognition model into believing that the vessel trajectory indeed traverses the positions indicated by the received signals, resulting in significant performance loss. Similar to random positional noise, the random segment distortion augmentation method is controlled by two parameters: $r_s$ (segment ratio) and $d_s$ (segment distortion). The parameter $r_s$ determines the proportion of positions in consecutive segments where the noise occurs. The key distinction is that the added noise is the same for all positions within a segment, rather than multiple independent noises. For a trajectory $T = [\ldots, (x_m, y_m), \ldots, (x_n, y_n), \ldots]$, let $T_S = [(x_m, y_m), \ldots, (x_n, y_n)]$, $T_S \subset T$, be a segment within $T$. After distortion, the distorted segment is denoted as $T_S'$, where for $\forall (x_i, y_i) \in T_S$,

$$(x_i', y_i') = (x_i + d_s^{x},\; y_i + d_s^{y})$$

Here, $d_s^{c} = d_s \cdot bd \cdot e_c$, where $bd$ represents the base distortion and $e_c \sim N(0, 1)$, $c = x$ or $y$. It is important to note that in addition to the parameters mentioned above, CLAIS also includes a parameter $n_s$ to control the number of augmented segments. In Figure 5e, with $d_s$ set to 2 and a distortion occurrence rate of $r_s = 0.2$, when $n_s = 2$, there are two distorted segments. The colors have the same meaning as for random positional noise: green indicates normal positions, red indicates positions after distortion, and yellow represents the corresponding normal positions before the distortion.
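Regular downsampling and random segment distortion, as described above, can be sketched as below. The helper `pick_segments` and its placement strategy are hypothetical: the text does not say how the $n_s$ segments are positioned or whether $r_s$ is the total or per-segment proportion, so this sketch assumes $r_s$ is the total proportion, split evenly, with one segment drawn inside each of $n_s$ equal chunks so that segments cannot overlap. Starting the downsampling from the first point is likewise an assumption.

```python
import random


def regular_downsample(traj, i_p):
    """Keep every i_p-th position point, starting from the first one."""
    return traj[::i_p]


def pick_segments(n_points, ratio, n_s, rng):
    """Choose n_s disjoint index ranges covering about `ratio` of the points.

    Assumes n_points >= n_s and ratio <= 1. The trajectory is split into
    n_s equal chunks and one segment is drawn inside each chunk.
    """
    seg_len = max(1, int(ratio * n_points / n_s))
    chunk = n_points // n_s
    segments = []
    for k in range(n_s):
        start = k * chunk + rng.randint(0, max(0, chunk - seg_len))
        segments.append(range(start, min(start + seg_len, n_points)))
    return segments


def random_segment_distortion(traj, r_s, d_s, n_s=1, bd=0.01, rng=random):
    """Shift each chosen segment by a single shared Gaussian offset."""
    out = list(traj)
    for seg in pick_segments(len(traj), r_s, n_s, rng):
        # One (dx, dy) per segment: every point in it moves identically.
        dx = d_s * bd * rng.gauss(0.0, 1.0)
        dy = d_s * bd * rng.gauss(0.0, 1.0)
        for i in seg:
            x, y = out[i]
            out[i] = (x + dx, y + dy)
    return out
```

The only difference from point-scale noise is where the Gaussian sample is drawn: once per segment here, instead of once per point.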
5) Random segment loss
Random segment loss is another augmentation method in the CLAIS augmentation scheme specifically designed for AIS trajectories at the segment scale. Segment loss is also a common anomaly in AIS data, caused for example by consecutive signal loss due to station malfunctions or by preprocessing programs that remove trajectory segments with abnormal vessel speeds. Incorporating this augmentation technique can greatly improve the model's ability to handle consecutive segment losses and reduce the cases in which the model recognizes disconnected segments as multiple independent trajectories.
Similar to random position loss, the parameter $l_s$ (segment loss) is used to quantify the proportion of lost position points within the entire trajectory, with the lost positions being completely random. Figure 5f illustrates the occurrence of consecutive segment loss when $l_s = 2$ and $n_s = 2$. The two segments of yellow position points demonstrate the consecutive loss of two trajectory segments.
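A minimal sketch of random segment loss follows. As with the other sketches, the function name and the segment placement are assumptions: the text does not specify how the $n_s$ lost segments are positioned, so this version treats `l_s` as the total proportion of lost points, splits it evenly over `n_s` runs, and draws one run inside each of `n_s` equal chunks of the trajectory.

```python
import random


def random_segment_loss(traj, l_s, n_s=1, rng=random):
    """Drop n_s runs of consecutive points covering about l_s of the trajectory.

    Assumes len(traj) >= n_s and l_s <= 1. One run is drawn inside each
    of n_s equal chunks, so the runs never overlap.
    """
    n = len(traj)
    seg_len = max(1, int(l_s * n / n_s))
    chunk = n // n_s
    dropped = set()
    for k in range(n_s):
        start = k * chunk + rng.randint(0, max(0, chunk - seg_len))
        dropped.update(range(start, min(start + seg_len, n)))
    # The surviving points keep their original order.
    return [p for i, p in enumerate(traj) if i not in dropped]
```

Unlike random position loss, the dropped indices form contiguous runs, which is what produces the visible gaps in a trajectory.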

6) Combined augmentations
In addition to the five individual augmentation methods mentioned above, CLAIS innovatively incorporates combined augmentation schemes. These schemes combine the five augmentation methods based on their realism, feasibility, and the purpose and significance of the augmentation operations, resulting in six combined augmentation methods. Figure 6 illustrates the visual effects of these six combined augmentation methods, which are pairwise combinations of two noise (or distortion) operations and three position loss operations:

1. random positional noise + random position loss (Figure 6a);
2. random positional noise + random segment loss (Figure 6b);
3. random positional noise + regular downsampling (Figure 6c);
4. random segment distortion + random position loss (Figure 6d);
5. random segment distortion + random segment loss (Figure 6e);
6. random segment distortion + regular downsampling (Figure 6f).

The parameters used in the augmentation methods in Figure 6 are consistent with their respective methods in Figure 5.
(2) Contrastive learning training component
CLAIS utilizes contrastive loss functions proposed in previous studies [21,29,30]. For a randomly selected mini-batch of $N$ samples, each sample undergoes augmentation operations to generate two positive samples that are similar trajectories, giving $2N$ augmented trajectories in total. A contrastive prediction task is defined, and the loss function for a positive sample pair $(i, j)$ takes the SimCLR form:

$$\ell_{i,j} = -\log \frac{\exp\left(\mathrm{sim}(z_i, z_j)/\tau\right)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\left(\mathrm{sim}(z_i, z_k)/\tau\right)}$$

Here, $\mathbb{1}_{[k \neq i]}$ is the indicator function, which equals 1 when $k \neq i$ and 0 otherwise; $z_i$ and $z_j$ denote the mapped representation vectors of the two views, $\mathrm{sim}(\cdot, \cdot)$ denotes their similarity, and $\tau$ is a temperature parameter. This is the same as the approach described in the literature [31]. The remaining $2(N - 1)$ augmented trajectories in the mini-batch are implicitly considered as negative samples.
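To make the contrastive objective concrete, the following pure-Python sketch computes an NT-Xent-style loss of the kind used by SimCLR, which CLAIS is said to follow. Several details are assumptions rather than facts from the text: cosine similarity as the similarity function, the default temperature `tau=0.5`, the convention that views `2k` and `2k + 1` come from the same trajectory, and averaging the per-view losses over the batch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(z, tau=0.5):
    """Mean NT-Xent loss over 2N projected views.

    Assumed layout: z[2k] and z[2k + 1] are the two augmented views of
    trajectory k; every other view in the batch acts as a negative.
    """
    n_views = len(z)
    total = 0.0
    for i in range(n_views):
        j = i ^ 1  # index of the positive partner of view i
        logits = [cosine(z[i], z[k]) / tau for k in range(n_views)]
        # The indicator 1[k != i] removes the view's similarity to itself.
        denom = sum(math.exp(logits[k]) for k in range(n_views) if k != i)
        total += -(logits[j] - math.log(denom))
    return total / n_views
```

The loss is small when the two views of each trajectory point in the same direction and large when a view is closer to another trajectory's view than to its own partner, which is the behavior the training objective rewards.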

Vessel Trajectory Representation Learning Module
(1) Trajectory graph construction component
The construction of the trajectory graph is an essential step that explicitly transforms the input trajectories into a graph data structure with spatial dependencies. For ship trajectory coordinates, each real trajectory point is mapped to a cell of the divided grid, generating a spatial grid sequence of the same length as the trajectory's sequence of position points. Next, the grid sequence is traversed to remove consecutive repeated grids. Once all the grids representing the trajectory are obtained, the edges of the trajectory graph are generated by connecting these grids.
The edges in the trajectory graph connect two types of neighboring nodes: sequence neighbors and spatial neighbors. For sequence neighbors, the $N_{seq}$ grids immediately before or after the current grid node in the obtained grid sequence are added to its sequence neighbor set. For spatial neighbors, all spatial neighbors of the current grid in the regional graph are added to its spatial neighbor set. The union of the sequence neighbor set and the spatial neighbor set yields the complete neighbor set of the grid node. Bidirectional edges are established between each node and its neighbors, along with self-loops, which defines the adjacency relationships among all nodes.
Once the trajectory graph structure is obtained, the grid embedding vectors generated by the regional graph pretraining are retrieved based on the grid indices. These vectors serve as the node features for the corresponding grids. After the pretrained embeddings have been obtained for all the nodes, the trajectory graph is complete, and the next step of trajectory representation learning begins.
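The construction steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: spatial neighbors are approximated as trajectory nodes lying in the eight surrounding cells (whereas the paper takes them from the regional graph), node features are omitted, and `to_grid`, `build_trajectory_graph`, `origin`, `cell`, and `n_seq` are hypothetical names.

```python
def to_grid(lon, lat, origin, cell):
    """Map a coordinate to an integer (column, row) grid index."""
    return (int((lon - origin[0]) // cell), int((lat - origin[1]) // cell))


def build_trajectory_graph(points, origin, cell, n_seq=1):
    """Return (nodes, edges) for one trajectory.

    nodes: grid cells in visiting order, consecutive repeats removed.
    edges: set of directed (src, dst) node-index pairs holding both
           directions of every link plus one self-loop per node.
    """
    cells = [to_grid(lon, lat, origin, cell) for lon, lat in points]
    # Drop consecutive repeats so each stay inside a cell becomes one node.
    nodes = [c for k, c in enumerate(cells) if k == 0 or c != cells[k - 1]]

    edges = set()
    for i, (cx, cy) in enumerate(nodes):
        edges.add((i, i))  # self-loop
        for j, (ox, oy) in enumerate(nodes):
            if j == i:
                continue
            # Sequence neighbor: within n_seq steps before or after node i.
            is_sequence_neighbor = abs(j - i) <= n_seq
            # Spatial neighbor (assumption): one of the 8 surrounding cells.
            is_spatial_neighbor = abs(ox - cx) <= 1 and abs(oy - cy) <= 1
            if is_sequence_neighbor or is_spatial_neighbor:
                edges.add((i, j))
                edges.add((j, i))  # bidirectional
    return nodes, edges


# A small loop-shaped track on a unit grid; the first two points share a cell.
points = [(0.5, 0.5), (0.6, 0.4), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]
nodes, edges = build_trajectory_graph(points, origin=(0.0, 0.0), cell=1.0)
print(nodes)          # [(0, 0), (1, 0), (1, 1), (0, 1)]
print(sorted(edges))
```

In this toy track the first and last nodes are three steps apart in the sequence, yet they are linked because their cells touch, which is the kind of spatial dependency a purely sequential model would not see.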
(2) Trajectory graph representation learning component
After the previous component has modeled the augmented trajectory as a trajectory graph, the graph is fed into the trajectory graph representation learning component described in this section.
The trajectory graph representation learning component in CLAIS consists of two graph convolutional network (GCN) layers, one graph attention network (GAT) layer, and a two-layer bidirectional gated recurrent unit (GRU) that serves as the readout function.
GCN is responsible for learning the relationships between grid nodes and the local spatial context of the grids. The first graph convolution layer takes the input trajectory graph and propagates and aggregates information through each node's own embedding and the embeddings of its neighboring nodes, producing a first layer of node representations. The second graph convolution layer takes these node representations as input and further propagates and aggregates information to obtain higher-level node representations. Stacking multiple GCN layers allows more complex graph structure features to be learned. Each GCN layer is followed by a batch normalization layer [32,33] and a ReLU function that introduces non-linearity [34,35]. Graph convolution can be formalized as follows:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$$
where $\tilde{A} = A + I_N$, $A$ is the adjacency matrix of the constructed trajectory graph, $I_N$ is the identity matrix representing self-loops, and $\tilde{D}$ is the degree matrix of $\tilde{A}$. $H^{(l)}$ is the input feature matrix of the l-th layer, and $H^{(0)} = X$, i.e., the original input is the pretrained embedding of the grids. $W^{(l)}$ is the parameter matrix of the l-th layer, and $\sigma$ represents the activation function, which is ReLU in our case. Graph convolution can be seen as an effective extension of the Laplacian operator to the graph domain, analogous to its application on images. Further detail is given in [36].
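To make the propagation rule concrete, the sketch below applies one graph convolution step to a three-node path graph in plain Python. It is an illustrative sketch only: batch normalization is left out, the adjacency matrix passed in is assumed to contain no self-loops (they are added inside the function), and the feature and weight values are made up.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, h, w):
    """One step of ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # A~ = A + I: add a self-loop to every node.
    a_tilde = [
        [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)
    ]
    # Degrees of A~, used for the symmetric normalization.
    deg = [sum(row) for row in a_tilde]
    a_hat = [
        [a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, value) for value in row] for row in z]  # ReLU


# Path graph 0 - 1 - 2 with two-dimensional node features.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights keep the arithmetic readable

h1 = gcn_layer(adj, h0, w)  # first layer: one-hop neighborhood
h2 = gcn_layer(adj, h1, w)  # second layer: two-hop neighborhood
print(h1[0], h2[0])
```

With identity weights, node 0 ends up with half of its own feature plus $1/\sqrt{6}$ of node 1's feature after the first layer, and the second call lets information from node 2 reach node 0.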
After the two GCN layers, a GAT layer is used to measure the importance weights between nodes. GAT [37] is a graph neural network that uses attention mechanisms. It adaptively assigns a different weight to each neighboring node, dynamically selecting and focusing on important neighbors to better capture the relationships between nodes and enhance the representation capability of graph data. GAT first performs a linear transformation on the features of each node to generate node representation vectors. Then, it learns the correlations between nodes by calculating attention weights between them. In this case, GAT also uses multiple attention heads, each generating a different set of attention weights to capture information between nodes from different perspectives. In this way, the model can selectively aggregate node features from different attention heads. The graph attention mechanism in GAT is formalized as follows:
$$a_{ij} = \mathrm{softmax}_j\left(\mathrm{LeakyReLU}\left(\vec{a}^{\,T}\left[W h_i \,\|\, W h_j\right]\right)\right)$$
where $a_{ij}$ is the attention weight between nodes i and j, $h_i$ and $h_j$ are the representations of nodes i and j output by the GCN layers, $W$ is a shared parameter matrix that transforms the input features, $[\cdot \| \cdot]$ concatenates the transformed features, and $\vec{a}$ is a linear weight vector, realized here as a single-layer feed-forward neural network that maps the concatenated feature to a real number. Thus, the relevance of nodes i and j is learned through $W$ and $\vec{a}$. LeakyReLU is an empirically motivated nonlinear activation function that is, in effect, a ReLU with a leak for negative inputs. It is formalized as follows:
$$\mathrm{LeakyReLU}(x) = \max(0, x) + \lambda \min(0, x)$$
where $\lambda$ is a small positive leak coefficient. Then, softmax normalizes the scores into the output attention weights $a_{ij}$. After obtaining $a_{ij}$, the final representation from multi-head attention is calculated as follows:
$$h_i' = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{j \in \mathcal{N}_i} a_{ij}^{k} W^{k} h_j\Big)$$
where $\Vert$ represents concatenation, $K$ is the number of attention heads, and $\mathcal{N}_i$ is the neighbor set of node i.
GRU is a type of recurrent neural network used for processing sequential data [38]. A two-layer bidirectional GRU stacks two GRU layers and concatenates the results of the forward and backward passes. This allows the contextual information of nodes at different time steps to be captured and richer node representations to be acquired. The final GRU layer serves as the readout function, aggregating and extracting graph-level information from the node-level representations to generate the final representation of the trajectory graph.
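The attention weights of a single head can be traced with a few lines of plain Python. The sketch below is a hedged illustration: the leak slope of 0.2, the toy features, the weight matrix `w`, and the attention vector `a` are all assumed values chosen for readability, and only one head is shown (a multi-head layer would repeat this per head and concatenate the aggregated outputs).

```python
import math


def leaky_relu(x, slope=0.2):
    """ReLU with a small slope for negative inputs (slope is an assumption)."""
    return x if x > 0 else slope * x


def attention_weights(h, neighbors, w, a, i):
    """Softmax-normalized attention of node i over its neighbors.

    h: node features, w: shared weight matrix (list of rows),
    a: attention vector applied to the concatenation [W h_i || W h_j].
    """

    def transform(vec):
        return [sum(w[r][c] * vec[c] for c in range(len(vec))) for r in range(len(w))]

    wh_i = transform(h[i])
    scores = {}
    for j in neighbors[i]:
        concat = wh_i + transform(h[j])  # list concatenation = [W h_i || W h_j]
        scores[j] = leaky_relu(sum(ak * ck for ak, ck in zip(a, concat)))

    # Softmax over the neighbors, shifted by the maximum for numerical stability.
    top = max(scores.values())
    exps = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [0, 1, 2]}          # node 0 attends to itself and nodes 1, 2
w = [[1.0, 0.0], [0.0, 1.0]]        # identity transform for readability
a = [0.5, -0.5, 1.0, 0.25]

alpha = attention_weights(h, neighbors, w, a, 0)
print(alpha)
```

The raw scores for neighbors 0, 1, and 2 are 1.5, 0.75, and 1.75, so after the softmax node 2 receives the largest share of node 0's attention.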

Experiment
In this section, experiments were conducted to validate the effectiveness of the proposed CLAIS framework. The experimental design included various comparative experiments from different perspectives, aiming to verify the performance of CLAIS and observe the effects of different parameter settings on CLAIS and the comparison models.

Data & Preprocess
The experiments in this paper utilized AIS data from Shanghai Port between August and October 2022, resulting in a total of 46,079 trajectories after preprocessing. For the experiments, a subset of 31,000 trajectories was selected from the dataset. The statistical information of the dataset and the parameters of the preprocessing methods are shown in Tables 1 and 2, respectively.

Experiment Metrics
To evaluate the model's ability to measure trajectory similarity, appropriate metrics are needed to quantify the results generated by the model. Since CLAIS is based on unsupervised learning, there are no real labels or matching samples available to guide the model's error calculation. Inspired by previous research [27,39,40], this paper introduces a new metric called trajectory augmentation invariance, based on self-similarity experiments. This metric quantifies the model's ability to recognize similar trajectories.
First, the concept of self-similarity experiments for evaluating unsupervised learning is introduced, followed by the proposed evaluation metric, i.e., trajectory augmentation invariance.
As mentioned earlier, a well-performing vessel trajectory similarity calculation model should ideally be able to recognize different trajectory sequences sampled from the same underlying path as similar. For this purpose, the self-similarity experiment simulates two different trajectories sampled from the same underlying path by splitting one sampled trajectory into two sub-trajectories. Specifically, given a test trajectory dataset referred to as the trajectory database set D, and an empty set Q, the following steps are performed. We randomly select q trajectories $T_q$ from D and split each trajectory in $T_q$ into two sub-trajectories by alternating over its internal position points (e.g., extracting the odd-indexed points and retaining the even-indexed points, or vice versa). These two sub-trajectories are considered to be twin sub-trajectories of each other. Next, the q odd-indexed sub-trajectories $T_{q \cdot odd}$ are added to the empty set Q, which is referred to as the query set, while the q even-indexed sub-trajectories $T_{q \cdot even}$ are returned to the database set D. This process creates a database set that contains the twin of every trajectory in the query set. The model can then compute the similarity between each trajectory $T_i$ in Q and the trajectories in D.
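The query and database construction just described can be sketched as follows. The helper names, the fixed random seed, and the `truth` list (which records where each query's twin sits in D so that its rank can be looked up later) are assumptions introduced for this example.

```python
import random


def split_twins(trajectory):
    """Split a trajectory into two interleaved sub-trajectories.

    With 1-based point numbering, the first result holds the odd-indexed
    points and the second holds the even-indexed points.
    """
    return trajectory[0::2], trajectory[1::2]


def build_query_and_database(trajectories, q, seed=0):
    """Return (queries, database, truth) for a self-similarity experiment."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(trajectories)), q))

    queries, database, truth = [], [], []
    for index, trajectory in enumerate(trajectories):
        if index in chosen:
            odd_half, even_half = split_twins(trajectory)
            queries.append(odd_half)       # goes to the query set Q
            truth.append(len(database))    # position of the twin inside D
            database.append(even_half)     # twin is returned to D
        else:
            database.append(trajectory)    # unselected trajectories stay in D
    return queries, database, truth


# Five toy trajectories of six points each; two are selected as queries.
trajectories = [[(k, step) for step in range(6)] for k in range(5)]
queries, database, truth = build_query_and_database(trajectories, q=2)
print(len(queries), len(database), truth)
```

Each query then has exactly one known counterpart in the database, which is what makes a ranking-based evaluation possible without labels.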
To avoid interference from the distribution of trajectory lengths, the trajectories in D that do not belong to $T_q$ are also downsampled by keeping every other point, so that they retain half of their trajectory points. Since each of the q trajectories $T_{q \cdot odd}$ in Q and its twin $T_{q \cdot even}$ in D are generated from the same underlying path, a prior assumption is made that these trajectories should be the closest to each other, indicating the highest similarity. Therefore, in the self-similarity experiment, the similarity rankings should place these twins at the top. Based on this, the model's ability to capture trajectory similarity can be evaluated using the following three metrics:
1. Precision P represents the proportion of queries for which the corresponding twin sub-trajectory is ranked first (ordinal number 0 in computer indexing). For all $T_i \in Q$, if the rank calculated by the model is denoted as $r_i$, then the precision P can be calculated as follows:
$$P = \frac{1}{|Q|} \sum_{T_i \in Q} \mathbb{1}_{(r_i = 0)}$$
where $\mathbb{1}_{(r_i = 0)} = 1$ if $r_i = 0$ and 0 otherwise, and $|Q|$ represents the number of elements of Q, i.e., $|Q| = q$.
2. The mean rank $R_\mu$ is the mean of the ranks $r_i$ of the twin sub-trajectories, denoted as follows:
$$R_\mu = \frac{1}{|Q|} \sum_{T_i \in Q} r_i$$
3. The rank standard deviation, denoted by $R_\sigma$, is the sample standard deviation of the ranks $r_i$:
$$R_\sigma = \sqrt{\frac{1}{|Q| - 1} \sum_{T_i \in Q} \left(r_i - R_\mu\right)^2}$$
$R_\sigma$ is a newly proposed evaluation metric in this paper, aiming to observe the stability of the model's ability in similarity computation. If the model is robust in perceiving trajectory similarity in self-similarity experiments, the rankings obtained for the query trajectories should be relatively stable, resulting in a smaller rank standard deviation $R_\sigma$.
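A short worked example may help to fix the three definitions. The sketch below assumes that higher similarity scores mean more similar trajectories and that ties are resolved in favor of the twin; the function names and the sample ranks are made up for illustration.

```python
import statistics


def rank_of_twin(similarities, twin_index):
    """0-based rank of the twin when the database is sorted by similarity.

    Counts how many database entries score strictly higher than the twin.
    """
    target = similarities[twin_index]
    return sum(1 for score in similarities if score > target)


def self_similarity_metrics(ranks):
    """Return (precision, mean rank, rank sample standard deviation)."""
    precision = sum(1 for r in ranks if r == 0) / len(ranks)
    mean_rank = statistics.mean(ranks)
    rank_std = statistics.stdev(ranks)  # divides by |Q| - 1
    return precision, mean_rank, rank_std


# One query against three database entries: the twin (index 2) is second best.
print(rank_of_twin([0.2, 0.9, 0.5], twin_index=2))  # 1

# Ranks obtained for four queries.
precision, mean_rank, rank_std = self_similarity_metrics([0, 0, 1, 3])
print(precision, mean_rank, rank_std)
```

For the ranks 0, 0, 1, 3 this gives a precision of 0.5, a mean rank of 1, and a rank standard deviation of $\sqrt{2} \approx 1.41$, since the squared deviations 1, 1, 0, 4 sum to 6 and are divided by 3.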
However, the aforementioned self-similarity experiment has some limitations. Whether or not augmentation operations are performed, it only evaluates the ability to detect either the twin sub-trajectory of the original trajectory before augmentation or the twin sub-trajectory of the variant trajectory after augmentation. Such an evaluation can be misleading, because the model might mistakenly identify a variant trajectory as a different trajectory altogether. The model's ability to recognize trajectory similarity is judged merely by whether it detects twin sub-trajectories among the original trajectories or among the variant trajectories, without evaluating whether it retrieves the original trajectory from the variant trajectory.
Based on the above, this paper argues that a trajectory similarity computation model should be able to perceive the corresponding original trajectory even when a given trajectory contains noise and positional loss, thereby identifying which original trajectory it belongs to. This requirement implies that a model with good similarity computation performance should recognize not only the twin sub-trajectories of the original trajectory but also the twin sub-trajectories of its variant trajectories, and should be able to identify the original trajectory from the variant trajectories. Therefore, the self-similarity experiment metric needs to evaluate the model's ability to find both the corresponding variant twin sub-trajectory and the original twin sub-trajectory before augmentation. Thus, this paper introduces the concept of trajectory augmentation invariance: for a trajectory in a given trajectory set, a trajectory similarity computation model should be capable of simultaneously finding the original twin sub-trajectory before augmentation and the variant twin sub-trajectory after augmentation.
Therefore, the self-similarity experiments in this paper are improved as follows. Given a trajectory database collection D, an empty augmented trajectory database collection D′, and a query trajectory collection Q, the following steps are performed:
1. Randomly select q trajectories, $T_q$, from D. Divide $T_q$ into two sub-trajectories, $T_{q \cdot odd}$ and $T_{q \cdot even}$, based on the order of their internal positional points.
2. Apply the selected augmentation operations to the trajectories in $T_q$ to create variant trajectories, $T'_q$. Split $T'_q$ into $T'_{q \cdot odd}$ and $T'_{q \cdot even}$.
3. Add $T'_{q \cdot odd}$ to the empty set Q. Place $T_{q \cdot even}$ back into the database collection D and add $T'_{q \cdot even}$ to D′.
4. Subsequently, randomly downsample by half the trajectories in D that do not belong to $T_q$ and place them back into D. Downsample the augmented trajectories and add them to D′.
By following this procedure, the rankings of the corresponding counterpart trajectories in D and D′ can be obtained through the query collection Q.
The aforementioned three evaluation metrics can be improved as follows:
1. Augmentation invariance precision (P) represents the proportion of queries for which the corresponding trajectory is ranked highest in both D and D′. Let us denote the ranks of each $T_i \in Q$ in D and D′ as $r_i$ and $r'_i$, respectively. Then, the augmentation invariance precision can be calculated as follows:
$$P = \frac{1}{|Q|} \sum_{T_i \in Q} \mathbb{1}_{(r_i = 0 \,\wedge\, r'_i = 0)}$$
2. Augmentation invariance mean rank ($R_\mu$) represents the average rank of the corresponding twin sub-trajectory of $T_i$ in both D and D′. It is calculated by averaging the ranks in D and D′ for each $T_i \in Q$ and then averaging over Q:
$$R_\mu = \frac{1}{|Q|} \sum_{T_i \in Q} \frac{r_i + r'_i}{2}$$
3. Augmentation invariance rank standard deviation (rank std), denoted as $R_\sigma$, represents the sample standard deviation of the average ranks $(r_i + r'_i)/2$. It measures the variability in the average ranks of the corresponding twin sub-trajectories in D and D′:
$$R_\sigma = \sqrt{\frac{1}{|Q| - 1} \sum_{T_i \in Q} \left(\frac{r_i + r'_i}{2} - R_\mu\right)^2}$$
Based on the aforementioned three metrics, we conduct augmentation invariance self-similarity experiments to evaluate the proposed CLAIS framework and compare its performance and robustness with the control models.
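The three augmentation invariance metrics can be computed from two rank lists, one for the database of original twins and one for the database of variant twins. The sketch below is an illustration under the reading given in the text (a hit requires rank 0 in both databases, and the per-query rank is the average of the two ranks); the function name and the sample ranks are assumptions.

```python
import statistics


def augmentation_invariance_metrics(ranks_original, ranks_variant):
    """Return (precision, mean rank, rank std) from paired 0-based ranks.

    ranks_original[i]: rank of the original twin of query i in D.
    ranks_variant[i]:  rank of the variant twin of query i in D'.
    """
    pairs = list(zip(ranks_original, ranks_variant))
    # A query counts as a hit only if both twins are ranked first.
    precision = sum(1 for r, r_aug in pairs if r == 0 and r_aug == 0) / len(pairs)
    average_ranks = [(r + r_aug) / 2 for r, r_aug in pairs]
    mean_rank = statistics.mean(average_ranks)
    rank_std = statistics.stdev(average_ranks)  # sample standard deviation
    return precision, mean_rank, rank_std


ranks_d = [0, 0, 2, 0]      # ranks in D
ranks_d_aug = [0, 1, 4, 0]  # ranks in D'
precision, mean_rank, rank_std = augmentation_invariance_metrics(ranks_d, ranks_d_aug)
print(precision, mean_rank, rank_std)
```

Here the second query finds its original twin first but its variant twin only second, so it does not count toward the precision of 0.5; the average ranks are 0, 0.5, 3, and 0, giving a mean rank of 0.875.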

Comparative Baselines and Parameter Setting
To ensure the comprehensiveness and representativeness of the control models, the experiments compare five trajectory similarity calculation methods, including CLAIS. Three of these methods are classical trajectory similarity metrics or computation methods: Fréchet distance, Hausdorff distance, and dynamic time warping (DTW) distance. Additionally, a representative sequence neural network model, LSTM [41], is included. To ensure fair comparisons in the experiments, the LSTM-based method, which is also learning-based, is combined with CLAIS's spatial pretraining method, augmentation scheme, and contrastive learning method. This means that the sequential LSTM replaces the encoder in CLAIS in the model comparison experiments. In the following experiments, CLAIS refers to the complete CLAIS framework, with the graph neural network encoder proposed in this paper. Table 3 provides important parameter settings for the training of both learning models.
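For readers unfamiliar with the classical baselines, the following minimal sketch shows the standard dynamic-programming form of the DTW distance with a Euclidean point cost. It is a textbook formulation written for illustration and is not taken from the experimental code; the toy trajectories are made up.

```python
import math


def dtw(a, b):
    """Dynamic time warping distance between two point sequences."""
    inf = float("inf")
    # cost[i][j]: best alignment cost of the first i points of a
    # with the first j points of b.
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[-1][-1]


track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
resampled = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # one repeated fix
parallel = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]               # shifted by one unit

print(dtw(track, resampled))  # 0.0
print(dtw(track, parallel))   # 3.0
```

The repeated point costs nothing because warping lets one point match several, while the parallel track accumulates one unit of distance per aligned point. The quadratic table also shows why such methods become expensive on long trajectories.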

Model Comparison Experiment
To validate the effectiveness of the CLAIS framework, model comparison experiments were conducted. Using the WSK dataset, database collections of 2 k to 10 k trajectories with a maximum length of 4000 were selected. The query trajectory collection consisted of 100 trajectories. The performance of the different models was tested under different database sizes, using the metrics P, $R_\mu$, and $R_\sigma$. The results of the experiments are shown in Table 4, with the best-performing value of each metric indicated in bold.
From Table 4, it can be observed that among the three traditional methods, DTW, the most effective and commonly used distance measure, achieves the best precision, mean rank, and rank standard deviation. However, both learning methods based on the CLAIS framework significantly outperform the traditional methods represented by DTW. In terms of precision, the LSTM encoder and CLAIS each achieve the best result at some database sizes, demonstrating the good performance of the CLAIS framework with both types of encoders. In terms of mean rank, CLAIS consistently outperforms the LSTM encoder, maintaining an error of less than two ranks under the augmentation invariance experimental conditions even at the 10 k data level. The rank standard deviation further shows that the differences in ranks are small, indicating that the CLAIS model remains stable when dealing with a large amount of error in vessel trajectories at the 10 k data level.

Robustness Experiment
The robustness of the models to parameter variations plays a crucial role in the comprehensive evaluation of model performance. In this experiment, the robustness of the models in measuring trajectory similarity under trajectory errors was observed by varying a single parameter while fixing the others, with a query set size of 100 and a database size of 1 k. The fixed parameter values were the same as the augmentation parameters in Table 3. Due to space limitations, only the variations of five parameters are presented.
Figure 7 illustrates how the different trajectory similarity calculation methods vary with the pos_ratio parameter, which controls the proportion of random position noise in the trajectories. In terms of precision, the traditional calculation methods generally maintain a relatively stable performance as the proportion of noisy positions increases. The learning-based methods exhibit a relatively larger performance decline, but still outperform the non-learning methods. Regarding the mean rank metric, the traditional calculation methods also maintain a relatively stable level. The method based on the LSTM encoder experiences a significant performance decline once pos_ratio exceeds 0.5, while the mean rank of the method based on the GNN encoder increases only slightly and remains the best among all methods.
Figure 8 demonstrates the variations of the models with increasing pos_distort, the parameter that controls the magnitude of the random position noise applied to the points. It can be observed that both methods based on CLAIS exhibit high robustness and are minimally affected by large variations in pos_distort. The traditional methods, on the other hand, may experience significant performance loss due to their own definitions and other factors. For example, the Hausdorff distance is defined as the maximum of the minimum distances between two point sets, so it may not remain stable when pos_distort increases.
The parameter pos_loss controls the proportion of randomly lost positions. In Figure 9, the traditional methods maintain a relatively stable performance, and the mean rank even decreases as the loss increases. The analysis suggests that this may be because many of the lost positions are noisy ones, so that mostly the truly valid trajectory points are measured. As a result, the performance of finding similar trajectories slightly improves.
The parameter seg_ratio controls the proportion of position points involved in segment distortion. As shown in Figure 10, the LSTM experiences a significant performance decline, similar to that under pos_ratio, when a large amount of distortion occurs. In contrast, CLAIS still maintains the best robustness.
The parameter seg_distort controls the magnitude of the segment distortion. As shown in Figure 11, the model performance exhibits a trend similar to that under pos_distort, maintaining overall stability as the seg_distort parameter varies.
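The sensitivity of the Hausdorff distance to large position noise follows directly from its max-of-min definition, as the small worked example below suggests. The function names and the synthetic tracks are assumptions made for illustration; a single strongly distorted point stands in for a large pos_distort value.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest point of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# A straight track and a twin sampled half a step later, slightly offset.
clean = [(float(x), 0.0) for x in range(10)]
twin = [(x + 0.5, 0.1) for x in range(10)]

# The same twin with one position pushed far off the track.
noisy_twin = list(twin)
noisy_twin[4] = (4.5, 5.0)

print(hausdorff(clean, twin))        # about 0.51
print(hausdorff(clean, noisy_twin))  # about 5.02
```

Nine of the ten points are unchanged, yet the distance grows roughly tenfold because the maximum is taken over the worst-matched point alone.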

Grid Size Experiment
The size of the grid is crucial to CLAIS, as choosing it is a common challenge in grid-based approaches. For a given waterway, the width of the channel affects the density of navigation, which in turn influences the extent to which the grid granularity can preserve the geometric features of the trajectories. Finer grids can represent more detailed trajectory shapes, so that the proposed model can distinguish different underlying paths. However, due to computational costs, the grid size cannot be reduced indefinitely. Therefore, different waterways theoretically have different optimal grid sizes that balance computational complexity and performance. To illustrate how the grid size affects the ability of the proposed model to identify different trajectories, we conducted augmentation invariance self-similarity experiments under various grid sizes. The grid size was taken as the variable, the query size and database size were fixed to 1 k and 10 k, respectively, and the remaining model parameters were kept the same as those in Table 3. The average metrics over five runs of the experiments are given in Table 5.
From Table 5, it can be observed that when the grid size is smaller than 0.002, the top-1 precision decreases slightly but remains relatively stable. The proposed model is able to accurately identify the underlying real trajectories corresponding to the variant trajectories; even when the grid size increases to 0.003, the top-1 precision is still above 60%. Furthermore, the mean rank holds up even better than the top-1 precision. For grid sizes under 0.003, the model on average ranks the original trajectory among the top 4 out of 10,000 trajectories when searching with 1000 variant trajectories. However, when the grid size increases to 0.005, the top-1 precision drops to 40.7%, and the mean rank degrades to 8.41. Additionally, the standard deviation of the mean rank shows an increasing trend. This indicates that the grid size parameter does have an impact on the model, but the model can still maintain good performance within an appropriate grid size range. Moreover, depending on the specific requirements of the application, a trade-off can be made between model performance and computational complexity.
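The effect of grid granularity can be illustrated with two synthetic parallel tracks. In the sketch below, the coordinates, the interpretation of the grid size as degrees, and the helper name `grid_sequence` are assumptions for illustration; the two grid sizes are taken from the range discussed above.

```python
import math


def grid_sequence(points, cell):
    """Map points to grid cells and drop consecutive repeats."""
    cells = [(math.floor(lon / cell), math.floor(lat / cell)) for lon, lat in points]
    return [c for k, c in enumerate(cells) if k == 0 or c != cells[k - 1]]


# Two parallel east-west tracks about 0.0012 degrees apart in latitude.
path_a = [(121.5005 + 0.001 * k, 31.2004) for k in range(10)]
path_b = [(121.5005 + 0.001 * k, 31.2016) for k in range(10)]

for cell in (0.001, 0.005):
    seq_a = grid_sequence(path_a, cell)
    seq_b = grid_sequence(path_b, cell)
    print(cell, len(seq_a), seq_a == seq_b)
```

With a 0.001 grid, each track occupies ten cells and the two cell sequences differ, whereas with a 0.005 grid both tracks collapse into the same two cells, so a grid-based model would no longer be able to tell them apart. The finer grid, in turn, needs many more cells to cover the same region.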

Visualization
In the previous experiments, the performance of the different comparison models was quantitatively evaluated, but it is difficult to intuitively understand the differences between the various distances. In this section, visual experiments were conducted to observe whether the results of ship trajectory similarity calculation using CLAIS align with intuition, i.e., whether the trajectories judged most similar are also visually similar and spatially close. The experiment randomly selected 500 trajectories from the dataset and chose two different target trajectories as query trajectories. For all comparison methods, pairwise similarity scores were computed between the query trajectories and the remaining trajectories (excluding themselves) in the dataset. The top 1% (i.e., the top 5) most similar trajectories were visualized. The visualization results, shown in Figure 12, depict the query trajectory in red and the top-k nearest trajectories in blue.
From the figure, it can be observed that because the dataset is located in a harbor area, the trajectories are very close to each other, posing a significant challenge for similarity measurement. All methods obtained trajectory sets with tight spatial distances. Among the traditional methods, DTW achieved the best results, identifying characteristics of the trajectories such as the entrance to the harbor area and the turning points at the mouth of the Huangpu River. The learning-based method with the LSTM encoder successfully recognized the trajectories in the harbor area but made misjudgments at the turning point of the Huangpu River. The proposed CLAIS, on the other hand, demonstrated the strongest ability to identify similar trajectories and successfully recognized the entrance direction and turning points, demonstrating the effectiveness of the proposed framework.

Discussion
In this section, three important experiments were conducted. The first, the model comparison experiment, tested the ability of different models to retrieve the closest trajectories by calculating AIS trajectory similarity under different database sizes. The second, the robustness experiment, examined the performance degradation of each comparison model under different parameter variations. The third, the nearest neighbor experiment, visualized the distribution of the closest trajectories obtained by different models.
In the robustness experiment on the pos_ratio parameter, both learning-based methods experienced a certain decline in precision when the random position noise increased significantly. The analysis suggests that these methods adopt an average pooling approach to obtain the final representation vector, making it difficult to maintain stable feature vectors in the presence of a high proportion of position noise. In terms of mean rank, the significant performance drop of the LSTM encoder may be attributed to its modeling of the trajectory as a sequence of grids while neglecting the spatial features. As the number of noisy positions increases, the LSTM tends to misidentify positions that fall into other grids as belonging to other trajectories, leading to a substantial decrease in identification accuracy. CLAIS, based on the GNN encoder, demonstrates the advantage of incorporating spatial modeling. By performing message passing over the features of neighboring grids, it can identify erroneous positions within a close range and recognize them as part of its own trajectory, exhibiting relatively better performance.
It is worth noting that the changes in pos_ratio and seg_ratio reveal similar patterns. A performance increase exceeding the expected trend occurs when both distortion values are set to 2. This can be attributed to the distortion value being set to 2 during the training phase, which enables the model to fit the scenario with a distortion value of 2 better than other parameter values during inference.

Conclusions
Similarity calculation between vessel trajectories is one of the fundamental and crucial problems in vessel trajectory analysis. It directly impacts applications such as vessel trajectory prediction, route planning, and anomaly detection. Therefore, it holds great significance in domains such as maritime safety and maritime regulation, where vessel trajectory analysis plays a vital role. There is still significant room for improvement in the trajectory representation learning capability of existing ship trajectory similarity calculation methods, especially in terms of explicitly modeling the spatial structure of trajectories. Additionally, with the accumulation of AIS data, a large amount of unlabeled data remains underutilized, resulting in significant waste. There is an urgent need for an AIS trajectory similarity calculation method that offers good ship trajectory representation learning capability and can compute similarity under unsupervised conditions.
This work makes contributions in the following three respects. First, to achieve good performance in region and trajectory learning models, this paper proposes utilizing a historical trajectory database to model water areas as regional graphs and leverages pretraining to learn the spatial dependency relationships of the regional grid. Through regional graph modeling and pretraining, the model has already learned spatial features such as the habitual routes and adjacency relationships of the spatial grid before modeling the trajectory graph. This approach reduces training costs and achieves better performance.
Second, the proposed CLAIS framework enhances the model's ability to capture the underlying real paths of trajectories affected by erroneous AIS signals, such as noise or position loss, through a contrastive learning approach. By applying different augmentation techniques to trajectories, the model generates similar augmented trajectory sample pairs using the proposed parameterized augmentation scheme. These augmented trajectories are then modeled as trajectory graphs, and graph neural networks are used to learn the spatial dependency relationships of the trajectory graphs and improve the model's capability to extract the spatial features of trajectories.
Lastly, this paper proposes three metrics for trajectory similarity experiments under contrastive learning and conducts extensive experiments to validate the model's performance. The experiments are conducted five times with different random seeds, and the average results demonstrate the effectiveness of CLAIS. When the database size is 10 k, CLAIS achieves a top-1 precision of 62%, a mean rank of 1.89, and a rank standard deviation of only 8.695. In comparison, the best-performing traditional method, DTW, achieves values of 39%, 55.46, and 89.71, respectively. Replacing the CLAIS encoder with an LSTM that models only sequential features yields 64%, 3.79, and 14.45 for the corresponding metrics, i.e., comparable precision but a worse mean rank and rank standard deviation.
Although this paper effectively contributes to the research on AIS trajectory similarity computation, there are still aspects that require further investigation. First, this paper only considers the geometric features, directional features, and sequential features of vessel trajectories, while neglecting important information such as speed, heading, timestamps, and seasons, as well as vessel type and tonnage. This may not fully reflect real-world scenarios and can have a significant impact on trajectory analysis. Second, the contrastive learning method in this paper compares trajectory graphs constructed based on the grid features of water areas, overlooking trajectory-level features and water area-level features. It is worth considering future research that incorporates joint contrastive learning for features at different levels to enhance representation learning capabilities. Furthermore, the augmentation scheme in this paper does not take into account the specific AIS signal error conditions in a particular real-world water area. It will be necessary to conduct research in the future that adapts the augmentation scheme parameters based on statistical information about signal errors, making the model more suitable for handling specific erroneous signals in the studied water area.

Figure 2. Overview of the regional graph pretraining module.

Figure 3. Gridification and historical heatmaps of the study area under different signal thresholds in the water domain. The grid size is set to 0.01°, and δ ranges from left to right as 1, 5, and 10.

Figure 5. Trajectory augmentation methods in CLAIS: (a) original trajectory; (b) random position noise; (c) random position loss; (d) regular downsampling; (e) random segment distortion; (f) random segment loss. Green, red, and yellow dots represent normal positions, error positions, and missing normal positions, respectively.

Figure 12. Top five similar trajectories of query trajectory.

Table 1. Statistical information of the dataset.

Table 4. Performance comparison of different database sizes; bold indicates best performance.

Table 5. Experiment on grid size.