Author Contributions
Conceptualization, G.A.-F., S.G., Z.H. and B.B.; Methodology, G.A.-F., S.G., Z.H. and B.B.; Software, G.A.-F., S.G., Z.H. and B.B.; Validation, G.A.-F., S.G., Z.H. and B.B.; Formal analysis, G.A.-F., S.G., Z.H. and B.B.; Investigation, G.A.-F., S.G., Z.H. and B.B.; Resources, G.A.-F., S.G., Z.H. and B.B.; Data curation, G.A.-F. and B.B.; Writing—original draft, G.A.-F., S.G., Z.H. and B.B.; Writing—review & editing, G.A.-F., S.G., Z.H., B.B., P.K., B.S. and S.T.; Visualization, G.A.-F., S.G., Z.H. and B.B.; Supervision, G.A.-F. and S.T.; Project administration, G.A.-F. and S.T.; Funding acquisition, S.T. All authors have read and agreed to the published version of the manuscript.
Appendix A. Post-Processed Features Set
Table A1.
Static vessel information (Ship DB).
| Column | Unit | Description |
|---|---|---|
| mmsi | – | Identifier of the ship |
| ship_type | – | Official ship type |
| ship_group | – | Ship category (sailing, passenger, cargo, other) |
| to_bow | m | Distance from GPS antenna to bow |
| to_stern | m | Distance from GPS antenna to stern |
| to_port | m | Distance from GPS antenna to port side |
| to_starboard | m | Distance from GPS antenna to starboard side |
| crawled | – | Flag indicating crawled data (otherwise AIS) |
Table A2.
Dynamic ship features.
| Column | Unit | Description |
|---|---|---|
| timestamp | ns | Timestamp of the observation |
| mmsi | – | Identifier of the ship |
| traj_id | – | Trajectory identifier |
| time_diff | s | Interpolation time offset |
| draught | m | Ship draught |
| geometry | deg | GeoPandas geometry (EPSG:4326) |
| lon | deg | Longitude |
| lat | deg | Latitude |
| heading | deg | Heading [0–360] |
| course | deg | Course [0–360] |
| status | – | Navigational status |
| speed | m/s | Speed |
| acc | m/s2 | Acceleration |
| angular_difference | deg | Change in course per step [0–180] |
| dist_to_land | m | Distance to shore |
| dist_to_ferry_route | m | Distance to nearest ferry route |
| dist_to_restricted_area | m | Distance to nearest restricted area |
| water_depth | m | Water depth |
| density_all | ships/km2/h | Density of all ship groups (log-scaled) |
| density_sailing | ships/km2/h | Density of sailing vessels (log-scaled) |
| density_cargo | ships/km2/h | Density of cargo vessels (log-scaled) |
| density_other | ships/km2/h | Density of other vessels (log-scaled) |
| density_passenger | ships/km2/h | Density of passenger vessels (log-scaled) |
Table A3.
Pairwise ship-to-ship interaction features.
| Column | Unit | Description |
|---|---|---|
| timestamp | ns | Timestamp of the observation |
| mmsi | – | Identifier of the ego ship |
| mmsi_other | – | Identifier of the target ship |
| dist | m | Distance between ships |
| rel_speed | m/s | Relative speed |
| course_of_rel_motion | deg | Course of relative motion |
| course_diff | deg | Difference in course |
| true_bearing | deg | Bearing from north |
| rel_bearing | deg | Bearing relative to course |
| rel_bearing_cat | – | (bow, stern, starboard, port) |
| tcpa | s | Time to closest point of approach |
| dcpa | m | Distance at closest point of approach |
| collision_risk | [0–1] | Collision risk score |
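The tcpa and dcpa columns above follow the standard closest-point-of-approach construction from relative position and velocity. A minimal sketch in a planar approximation (function and variable names are illustrative, not taken from the paper's codebase):

```python
import math

def cpa(ego_pos, ego_vel, other_pos, other_vel):
    """Time and distance to closest point of approach (planar approximation).

    Positions in metres, velocities in m/s. Returns (tcpa [s], dcpa [m]).
    A negative tcpa means the ships are already diverging.
    """
    # Relative position and velocity of the target with respect to the ego ship
    rx, ry = other_pos[0] - ego_pos[0], other_pos[1] - ego_pos[1]
    vx, vy = other_vel[0] - ego_vel[0], other_vel[1] - ego_vel[1]
    v2 = vx * vx + vy * vy
    if v2 == 0.0:                      # no relative motion: distance stays constant
        return 0.0, math.hypot(rx, ry)
    tcpa = -(rx * vx + ry * vy) / v2   # minimiser of |r + t * v|
    dcpa = math.hypot(rx + tcpa * vx, ry + tcpa * vy)
    return tcpa, dcpa
```

For two ships approaching head-on 1000 m apart at 5 m/s each, this yields tcpa = 100 s and dcpa = 0 m.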
Table A4.
Complete model input feature set and debiasing for Pipeline 1. All features are present at every observation timestamp. Cyclic features use sin/cos encoding; categorical features use one-hot encoding. Features marked “Excluded” are removed under the debiasing procedure (Section 5.2, Equation (26)) to prevent ship-group label leakage in Pipeline 1. The Ship DB dimension columns to_bow, to_stern, to_port, and to_starboard are collapsed into length and width during feature engineering and are therefore covered by their exclusion.
| Feature | Group | Encoding | Pipeline 1 Status |
|---|---|---|---|
| Trajectory features | | | |
| lon | Trajectory | – | Retained |
| lat | Trajectory | – | Retained |
| x | Trajectory | – | Retained (projected) |
| y | Trajectory | – | Retained (projected) |
| rel_x | Trajectory | – | Retained (segment-relative) |
| rel_y | Trajectory | – | Retained (segment-relative) |
| status | Trajectory | one-hot | Retained |
| speed | Trajectory | – | Retained |
| acc | Trajectory | – | Retained |
| course | Trajectory | sin/cos | Retained |
| angular_difference | Trajectory | sin/cos | Retained |
| Static vessel features | | | |
| ship_type | Static | one-hot | Excluded—direct class label |
| ship_group | Static | one-hot | Excluded—direct class label |
| length | Static | – | Excluded—class-discriminative dimension |
| width | Static | – | Excluded—class-discriminative dimension |
| Map/environment features | | | |
| water_depth | Map | – | Retained |
| dist_to_land | Map | – | Retained |
| dist_to_ferry_route | Map | – | Retained |
| dist_to_restricted_area | Map | – | Retained |
| density_all | Map | – | Retained |
| density_own_group | Map | – | Excluded—label-derived proxy |
| Ship-to-ship interaction features | | | |
| dist | Ship2ship | – | Retained |
| rel_speed | Ship2ship | – | Retained |
| course_diff | Ship2ship | sin/cos | Retained |
| rel_bearing | Ship2ship | sin/cos | Retained |
| rel_bearing_cat | Ship2ship | one-hot | Retained |
| tcpa | Ship2ship | – | Retained |
| dcpa | Ship2ship | – | Retained |
| collision_risk | Ship2ship | – | Retained |
| ship_type_other | Ship2ship | one-hot | Excluded—encodes class of interaction partner |
| ship_group_other | Ship2ship | one-hot | Retained |
| Datetime features | | | |
| day_of_year | Datetime | sin/cos | Retained |
| day_of_week | Datetime | sin/cos | Retained |
| hour_of_day | Datetime | sin/cos | Retained |
| month_of_year | Datetime | sin/cos | Retained |
| Total features | 35 full set; 29 retained in Pipeline 1 (6 excluded) | | |
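The sin/cos encoding referenced in the table maps cyclic quantities (course, hour of day, day of year) onto the unit circle so that period boundaries remain adjacent. A minimal sketch; `cyclic_encode` is an illustrative helper, not part of the described pipeline:

```python
import math

def cyclic_encode(value, period):
    """Map a cyclic quantity to a continuous (sin, cos) pair so that
    values near the period boundary (e.g. hour 23 and hour 1) stay close."""
    angle = 2.0 * math.pi * (value / period)
    return math.sin(angle), math.cos(angle)

# Hour 23 and hour 1 are two hours apart on the clock face,
# even though their raw values differ by 22.
s23, c23 = cyclic_encode(23, 24)
s01, c01 = cyclic_encode(1, 24)
```

The Euclidean distance between the encoded pairs reflects the true circular distance, which is what a downstream model sees.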
Appendix B. Clustering Methods: Detailed Descriptions and Formulations
This appendix provides detailed mathematical formulations for the four clustering methods used in Experiment II (
Section 6.3).
The methods were selected for their complementary strengths in capturing the complex, non-linear structures characteristic of maritime navigation behaviour, including arbitrary-shape clusters, hierarchical patterns, and heterogeneous density distributions.
Appendix B.1. kNN-Leiden: Graph-Based Community Detection
The kNN-Leiden method combines
k-nearest neighbour graph construction with the Leiden community detection algorithm [
65].
Leiden improves upon the Louvain algorithm [
69] by guaranteeing that all detected communities are internally connected, preventing the formation of fragmented or isolated clusters—a critical property for maritime trajectory analysis where spatial and temporal continuity defines coherent behavioural patterns.
For a dataset $X = \{x_1, \dots, x_N\}$ in feature space $\mathbb{R}^d$, we construct an undirected weighted graph $G = (V, E, w)$ where:
- Vertices $V$ correspond to trajectory segments, $V = \{v_1, \dots, v_N\}$;
- Edges $E$ connect each vertex to its $k$ nearest neighbours under the distance metric $d(\cdot, \cdot)$;
- Edge weights are computed with a Gaussian kernel of the pairwise distance:
$$w_{ij} = \exp\!\left(-\frac{d(x_i, x_j)^2}{2\sigma^2}\right),$$
where $\sigma$ is a scale parameter (typically set to the median pairwise distance).
The Leiden algorithm iteratively optimises the modularity quality function:
$$Q = \frac{1}{2m} \sum_{i,j} \left[ w_{ij} - \gamma\,\frac{k_i k_j}{2m} \right] \delta(c_i, c_j),$$
where $m$ is the total edge weight, $k_i = \sum_j w_{ij}$ is the weighted degree of node $i$, and $\delta(c_i, c_j) = 1$ if nodes $i$ and $j$ are in the same community and 0 otherwise.
The resolution parameter $\gamma$ controls community size: higher values yield more, smaller communities. The final partition assigns each trajectory segment to exactly one community.
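As a concrete illustration, the modularity of a given partition can be evaluated directly from a weighted edge list. This is a hedged sketch of the quality function only; the actual Leiden optimisation (local moving, refinement, aggregation) is considerably more involved:

```python
def modularity(edges, communities, resolution=1.0):
    """Modularity Q of a partition of an undirected weighted graph.

    edges: iterable of (u, v, w); communities: dict node -> community id.
    Q = (1/2m) * sum_ij [w_ij - resolution * k_i*k_j/(2m)] * delta(c_i, c_j)
    """
    degree = {}
    two_m = 0.0   # 2m: total degree = twice the total edge weight
    intra = 0.0   # each undirected intra-community edge counted once here
    for u, v, w in edges:
        degree[u] = degree.get(u, 0.0) + w
        degree[v] = degree.get(v, 0.0) + w
        two_m += 2.0 * w
        if communities[u] == communities[v]:
            intra += w
    # sum of k_i*k_j over same-community pairs equals, per community,
    # the square of the summed member degrees
    comm_degree = {}
    for node, k in degree.items():
        c = communities[node]
        comm_degree[c] = comm_degree.get(c, 0.0) + k
    expected = sum(d * d for d in comm_degree.values()) / (2.0 * two_m)
    return (intra - resolution * expected) / (two_m / 2.0)
```

For two disjoint unit-weight triangles labelled as two communities, this gives Q = 0.5, and Q = 0 when all nodes share one community, matching the usual sanity checks.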
Table A5 summarises the key hyperparameters for kNN-Leiden clustering.
Table A5.
kNN-Leiden hyperparameters and their interpretations.
| Parameter | Symbol | Description |
|---|---|---|
| n_neighbors | k | Number of nearest neighbours for graph construction |
| resolution | $\gamma$ | Controls granularity of detected communities; higher values yield finer partitions |
| metric | – | Distance metric: cosine for normalised embeddings, euclidean for expert features |
Appendix B.2. HDBSCAN: Hierarchical Density-Based Spatial Clustering of Applications with Noise
Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) [
63] extends the classical DBSCAN algorithm by replacing the fixed density threshold with a hierarchical density model.
This enables detection of clusters with varying densities and provides explicit identification of noise points, essential for maritime anomaly detection.
For each point $x_i$, the core distance is defined as the distance to its $m_{\mathrm{pts}}$-th nearest neighbour:
$$d_{\mathrm{core}}(x_i) = d\bigl(x_i,\, x_i^{(m_{\mathrm{pts}})}\bigr).$$
The mutual reachability distance between points $x_i$ and $x_j$ is:
$$d_{\mathrm{mreach}}(x_i, x_j) = \max\bigl\{ d_{\mathrm{core}}(x_i),\; d_{\mathrm{core}}(x_j),\; d(x_i, x_j) \bigr\}.$$
This distance metric emphasises density by inflating distances in low-density regions.
A Minimum Spanning Tree (MST) is constructed over all points using $d_{\mathrm{mreach}}$ as edge weights. A cluster hierarchy is built by removing edges from the MST in order of decreasing weight (increasing density threshold $\lambda$).
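The two distance definitions above can be sketched in a few lines of brute-force Python. This is for illustration only; real HDBSCAN implementations use spatial indexing rather than the quadratic scan shown here:

```python
import math

def core_distances(points, k):
    """Core distance of each point: distance to its k-th nearest neighbour
    (brute-force O(n^2) illustration)."""
    core = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        core.append(dists[k - 1])
    return core

def mutual_reachability(points, k):
    """Full matrix of d_mreach(a, b) = max(core(a), core(b), d(a, b))."""
    core = core_distances(points, k)
    n = len(points)
    return [[max(core[i], core[j], math.dist(points[i], points[j]))
             for j in range(n)] for i in range(n)]
```

Note how the isolated point inflates distances around itself: two mutually close points keep their small distance, while any pair involving a low-density point inherits that point's large core distance.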
For each candidate cluster $C$ appearing in the hierarchy, its stability is computed as:
$$S(C) = \sum_{p \in C} \bigl( \lambda_{\max}(C) - \lambda_p \bigr),$$
where $\lambda_p$ is the threshold at which point $p$ joins $C$ and $\lambda_{\max}(C)$ is the threshold at which $C$ splits or disappears.
The final flat clustering is obtained by selecting clusters that maximise total stability, using a dynamic programming algorithm to avoid selecting a parent and its child simultaneously. Points not assigned to any cluster are labelled as noise (cluster label $-1$).
Table A6 summarises the key hyperparameters for HDBSCAN clustering.
Table A6.
HDBSCAN hyperparameters and their interpretations.
| Parameter | Symbol | Description |
|---|---|---|
| min_cluster_size | – | Minimum number of points required to form a cluster |
| min_samples | $m_{\mathrm{pts}}$ | Number of neighbours for core distance computation |
| cluster_selection_epsilon | $\varepsilon$ | Optional minimum threshold for cluster separation |
| cluster_selection_method | – | eom (Excess of Mass) or leaf (leaf clusters) |
Appendix B.3. VBGMM: Variational Bayesian Gaussian Mixture Model
VBGMM employs variational Bayesian inference for Gaussian Mixture Model (GMM) with automatic component selection via Dirichlet process priors [
64]. This probabilistic approach provides soft cluster assignments and principled uncertainty quantification, aligning with the framework’s emphasis on uncertainty-aware representations.
A Gaussian Mixture Model assumes that observations $x_i \in \mathbb{R}^d$ are generated from a mixture of $K$ Gaussian components:
$$p(x) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k),$$
where $\pi_k$ are mixing proportions ($\sum_{k=1}^{K} \pi_k = 1$), $\mu_k$ is the mean of component $k$, and $\Sigma_k$ is the covariance matrix of component $k$.
The variational Bayesian approach places priors on all model parameters and approximates the posterior distribution $p(z, \pi, \mu, \Sigma \mid X)$ with a factorised variational distribution $q(z)\,q(\pi, \mu, \Sigma)$. Prior distributions are specified as shown in
Table A7.
Table A7.
Table A7.
Prior distributions for VBGMM parameters.
| Parameter | Prior Distribution | Hyperparameter |
|---|---|---|
| Mixing proportions $\pi$ | $\mathrm{Dir}(\pi \mid \alpha_0)$ | $\alpha_0$: concentration parameter |
| Means $\mu_k$ | $\mathcal{N}\bigl(\mu_k \mid m_0, (\beta_0 \Lambda_k)^{-1}\bigr)$ | $\beta_0$: mean precision |
| Covariances (via precisions $\Lambda_k$) | $\mathcal{W}(\Lambda_k \mid W_0, \nu_0)$ | $W_0, \nu_0$: Wishart parameters |
The Evidence Lower Bound (ELBO) is maximised iteratively:
$$\mathcal{L}(q) = \mathbb{E}_q\bigl[\ln p(X, z, \pi, \mu, \Sigma)\bigr] - \mathbb{E}_q\bigl[\ln q(z, \pi, \mu, \Sigma)\bigr].$$
The algorithm alternates between updating the variational posterior on assignments (computing responsibilities $r_{ik}$) and updating the variational posterior on parameters (computing updated Dirichlet and Gaussian–Wishart parameters).
The Dirichlet process prior with concentration $\alpha_0$ encourages sparsity in the mixing proportions. Components with negligible posterior weight ($\pi_k \approx 0$) are effectively pruned, enabling automatic determination of the effective number of clusters.
The final hard clustering assigns each point to the component with the highest responsibility:
$$z_i = \arg\max_{k}\, r_{ik}.$$
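The responsibility computation and hard assignment can be illustrated for a one-dimensional mixture with known parameters. This is a simplified E-step sketch, not the full variational update:

```python
import math

def responsibilities(x, weights, means, variances):
    """E-step for a 1-D Gaussian mixture: posterior probability r_k that
    observation x was generated by component k (illustration only)."""
    dens = [w * math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    return [d / total for d in dens]

def hard_assign(x, weights, means, variances):
    """Hard clustering: index of the component with the highest responsibility."""
    r = responsibilities(x, weights, means, variances)
    return max(range(len(r)), key=r.__getitem__)
```

For two unit-variance components centred at 0 and 10, a point near 0 is assigned to the first component and a point near 10 to the second, with responsibilities always summing to one.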
Table A8 summarises the key hyperparameters for VBGMM clustering.
Table A8.
VBGMM hyperparameters and their interpretations.
| Parameter | Symbol | Description |
|---|---|---|
| n_components | K | Maximum number of mixture components |
| weight_concentration_prior | $\alpha_0$ | Dirichlet concentration; lower values favour fewer components |
| mean_precision_prior | $\beta_0$ | Prior precision on component means |
| covariance_type | – | Covariance structure: full, tied, diag, spherical |
| weight_concentration_prior_type | – | dirichlet_process (sparse) or dirichlet_distribution (uniform) |
Appendix B.4. FINCH: First Integer Neighbour Clustering Hierarchy
FINCH [
66] is a parameter-free hierarchical clustering algorithm that constructs a cluster hierarchy by iteratively merging clusters based on first-neighbour relations. The algorithm is computationally efficient and requires minimal tuning, making it suitable for exploratory multi-scale analysis of large trajectory datasets.
FINCH constructs a nested hierarchy of partitions through an iterative procedure. Each point is initially assigned to its own cluster: $\mathcal{P}_0 = \{\{x_1\}, \dots, \{x_N\}\}$.
At iteration $t$, for each cluster $C_i \in \mathcal{P}_t$, its first neighbour $\kappa(C_i)$ is defined as the cluster with the smallest minimum pairwise distance:
$$\kappa(C_i) = \arg\min_{C_j \in \mathcal{P}_t,\, j \neq i}\; \min_{x \in C_i,\, y \in C_j} d(x, y).$$
An undirected graph $G_t$ is constructed where vertices are clusters and edges connect each cluster to its first neighbour. Connected components of $G_t$ form the next-level partition $\mathcal{P}_{t+1}$.
The process terminates when $\mathcal{P}_{t+1} = \mathcal{P}_t$ (no further merging is possible), producing a sequence of nested partitions $\{\mathcal{P}_1, \dots, \mathcal{P}_T\}$, where $T$ is the number of hierarchy levels.
Since FINCH generates multiple hierarchy levels, the partition that maximises a validation metric (silhouette score in Experiment II) is selected:
$$\mathcal{P}^{*} = \arg\max_{\mathcal{P}_t} \operatorname{silhouette}(\mathcal{P}_t).$$
The computational complexity is $O(N \log N)$ per iteration, with a typically logarithmic number of iterations, $T = O(\log N)$.
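A single first-neighbour merge step can be sketched with a union-find structure over nearest-neighbour links. This is a brute-force illustration of the grouping rule only; the published FINCH implementation is substantially more efficient:

```python
import math

def first_neighbour_partition(points):
    """One FINCH-style merge step: link every point to its nearest neighbour
    and return the connected components of the resulting undirected graph."""
    n = len(points)
    parent = list(range(n))

    def find(a):                      # union-find with path compression
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(n):
        j = min((j for j in range(n) if j != i),
                key=lambda j: math.dist(points[i], points[j]))
        parent[find(i)] = find(j)     # union i with its first neighbour

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

Two tight pairs far apart collapse into two clusters in a single step, regardless of the gap between the pairs, which is why FINCH needs no distance threshold.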
Table A9 summarises the key hyperparameters for FINCH clustering.
Table A9.
FINCH hyperparameters and their interpretations.
| Parameter | Symbol | Description |
|---|---|---|
| n_neighbors | k | Number of nearest neighbours for initial graph construction (optional) |
| metric | – | Distance metric for pairwise distance computation |
| max_levels | T | Maximum number of hierarchy levels; typically set to null for automatic termination |
Appendix B.5. Method Comparison and Selection Rationale
Table A10 summarises the key characteristics, computational complexity, and suitability of each method for maritime trajectory clustering.
Table A10.
Comparative summary of clustering methods used in Experiment II.
| Method | Paradigm | Cluster Shapes | Complexity | Key Advantage for AIS Trajectories |
|---|---|---|---|---|
| kNN-Leiden | Graph-based | Arbitrary (network communities) | $O(N \log N)$ | Captures route-network structures with heterogeneous density; connectivity guarantee prevents fragmentation |
| HDBSCAN | Density-based | Arbitrary (density-connected) | $O(N \log N)$ | Explicit noise detection for anomalies; multi-scale hierarchy captures local manoeuvres and global patterns |
| VBGMM | Model-based | Ellipsoidal (Gaussian) | $O(NKd^2)$ per iteration | Probabilistic assignments support uncertainty quantification; automatic component selection via Dirichlet process |
| FINCH | Hierarchical | Arbitrary (first-neighbour) | $O(N \log N)$ per level | Parameter-free exploratory clustering; fast and scalable for large datasets |
The four methods provide complementary perspectives on trajectory organisation:
- kNN-Leiden reveals community structures in the trajectory graph, capturing vessel movements organised into route networks and transit corridors;
- HDBSCAN identifies density-separated behavioural modes while explicitly flagging anomalous trajectories as noise, supporting safety-critical applications;
- VBGMM provides probabilistic cluster assignments with uncertainty estimates, enabling soft boundaries between overlapping behaviours (e.g., vessels transitioning between operational modes);
- FINCH offers a parameter-free baseline for exploratory analysis, revealing multi-scale behavioural hierarchies without prior assumptions about cluster count or density.
This multi-method evaluation ensures that conclusions about representation quality (Contribution 6) are robust across different clustering paradigms and not artifacts of a single methodological choice.
Appendix C. Clustering Validation Metrics
Table A11 summarises the mathematical properties, value ranges, and interpretation guidelines for all clustering validation metrics used in Experiment II (
Section 6.3). Metrics are categorised by their underlying assumptions and suitability for different clustering paradigms.
Table A11.
Properties and formulations of clustering validation metrics used in Experiment II. Metrics are categorised by their suitability for density-based, graph-based, or centroid-based clustering methods.
| Metric | Description & Formulation | Value Range | Interpretation | Best Suited for |
|---|---|---|---|---|
| Intrinsic Density/Graph-based Metrics (Primary) | | | | |
| DBCV | Density-Based Clustering Validation [67]. Measures density connectivity within clusters and density separation between clusters using mutual reachability distance. | $[-1, 1]$ | Higher is better. >0: dense, well-separated clusters. | HDBSCAN, density-based methods. Gold standard for arbitrary-shape clusters. |
| Conductance | Graph edge-cut quality [65]. Measures fraction of edges leaving a community relative to community volume. | $[0, 1]$ | Lower is better. <0.1: excellent separation. | kNN-Leiden, graph-based community detection. |
| Modularity | Community structure strength [69]. Quantifies difference between actual and expected edge density within communities. | $[-0.5, 1]$ | Higher is better. >0.3: significant structure. | kNN-Leiden, graph partitioning. Primary optimisation target for the Leiden algorithm. |
| ELBO | Evidence Lower Bound for VBGMM [64]. Variational lower bound on the log-likelihood. | Unbounded (typically negative) | Higher (less negative) is better. Balances fit and complexity. | VBGMM. Native probabilistic model selection criterion. |
| Traditional Metrics (For Completeness, Appendix D) | | | | |
| Silhouette | Centroid-based cohesion and separation [75]. $s = (b - a)/\max(a, b)$, where $a$ is the mean intra-cluster distance and $b$ is the mean nearest-cluster distance. | $[-1, 1]$ | Higher is better. Assumes convex clusters. | K-means, centroid-based methods. Less suitable for arbitrary shapes. |
| Calinski–Harabasz | Variance ratio criterion [76]. Ratio of between-cluster to within-cluster variance. | $[0, \infty)$ | Higher is better. Assumes isotropic Gaussian clusters. | K-means, GMM. Biased towards spherical clusters. |
| Davies–Bouldin | Average similarity between each cluster and its most similar cluster [77]. | $[0, \infty)$ | Lower is better. Assumes spherical, similar-size clusters. | K-means. Not suitable for varying density or arbitrary shapes. |
The choice of primary metrics in Experiment II is aligned with the clustering methods’ optimisation objectives and underlying assumptions. DBCV is the native validation criterion for HDBSCAN and provides a density-aware alternative for evaluating all methods. Conductance and modularity directly measure the quality functions optimised by kNN-Leiden, making them the most faithful diagnostics for graph-based partitions. ELBO is the variational lower bound maximised during VBGMM inference, providing a principled model selection criterion. Traditional metrics (silhouette, Calinski–Harabasz, Davies–Bouldin) assume convex, isotropic clusters and are less informative for maritime trajectory data, which exhibit elongated corridors, branching patterns, and heterogeneous density; these metrics are reported for completeness but not used for primary comparison.
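As an illustration of the graph-based diagnostics, conductance can be computed directly from a weighted edge list. A minimal sketch following the description above (cut weight over community volume); the helper name is illustrative:

```python
def conductance(edges, community):
    """Conductance of a vertex set: total weight of edges leaving the
    community divided by the community's volume (sum of member degrees)."""
    community = set(community)
    cut = 0.0
    volume = 0.0
    for u, v, w in edges:
        if u in community:            # each endpoint inside contributes
            volume += w               # its degree share to the volume
        if v in community:
            volume += w
        if (u in community) != (v in community):
            cut += w                  # edge crosses the community boundary
    return cut / volume if volume else 0.0
```

For a unit-weight triangle {0, 1, 2} with one extra edge to an outside node 3, the community {0, 1, 2} has volume 7 and cut 1, giving conductance 1/7: a well-separated community scores low, as in Table A11.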
Appendix D. Clustering Quantitative Assessment
Table A12 presents the complete set of intrinsic and traditional clustering validation metrics for all four clustering methods applied to learnt embeddings and expert features. Primary metrics (DBCV, conductance, modularity) align with the optimisation objectives of each method and are discussed in detail in
Section 7.2. Traditional metrics (silhouette, Calinski–Harabasz, Davies–Bouldin) are included for completeness but exhibit restrictive geometric assumptions (convexity, isotropy) that limit their validity for maritime trajectory data, as explained in
Appendix C. Bold values with arrows indicate superior performance (
↑ higher is better;
↓ lower is better).
Table A12.
Complete clustering quality metrics for all methods on 50,000 trajectory segments. Primary metrics (DBCV, conductance, modularity) are optimised by the respective clustering algorithms. Traditional metrics (silhouette, Calinski–Harabasz, Davies–Bouldin) assume convex clusters and are less informative for arbitrary-shape maritime patterns. Bold values indicate better performance when comparing embeddings vs. expert features. Metric properties and interpretation guidelines are detailed in
Appendix C.
| Method | Repr. | DBCV ↑ | Cond. ↓ | Modul. ↑ | #Clust. | Silh. ↑ | CH ↑ | DB ↓ |
|---|---|---|---|---|---|---|---|---|
| kNN-Leiden | Embed. | ↑ | 0.186 ↓ | 0.906 ↑ | 47 | 0.040 ↑ | 745 | 2.633 |
| | Expert | | 0.327 | 0.875 | 48 | | 928 ↑ | 2.301 ↓ |
| FINCH | Embed. | ↑ | 0.205 ↓ | 0.756 ↑ | 27 | 0.010 | 941 | 2.695 ↓ |
| | Expert | | 0.206 | 0.671 | 8 | 0.024 ↑ | 2374 ↑ | 2.986 |
| HDBSCAN | Embed. | 0.112 ↑ | 0.479 | – | 3 | 0.008 ↑ | 710 | 3.116 |
| | Expert | 0.042 | 0.311 ↓ | – | 2 | | 1165 ↑ | 2.766 ↓ |
| VBGMM | Embed. | ↑ | 0.193 ↓ | 0.756 ↑ | 28 | 0.058 ↑ | 1251 ↑ | 2.386 ↓ |
| | Expert | | 0.498 | 0.419 | 28 | | 749 | 3.832 |
DBCV (Density-Based Cluster Validation): Embeddings achieve higher (less negative) values for all methods, indicating improved density connectivity within clusters and density separation between clusters. For HDBSCAN, embeddings reach positive DBCV (0.112), signalling dense, well-separated clusters.
Conductance: Lower values indicate cleaner graph cuts. Embeddings outperform expert features for kNN-Leiden (0.186 vs. 0.327), FINCH (0.205 vs. 0.206), and VBGMM (0.193 vs. 0.498). For HDBSCAN, higher conductance (0.479) is expected as the method optimises density stability, not graph boundaries.
Modularity: Higher values signal stronger community structure. Embeddings achieve superior modularity for kNN-Leiden (0.906 vs. 0.875), FINCH (0.756 vs. 0.671), and VBGMM (0.756 vs. 0.419). HDBSCAN does not compute modularity (density-based, not graph-based).
Cluster Count: Embeddings yield finer-grained partitions for FINCH (27 vs. 8 clusters) and HDBSCAN (3 vs. 2), revealing multi-scale behavioural structure. kNN-Leiden and VBGMM produce comparable granularity (47 vs. 48; 28 vs. 28).
Traditional Metrics: Silhouette, Calinski–Harabasz, and Davies–Bouldin exhibit mixed patterns. These metrics assume convex, isotropic clusters and are less informative for maritime trajectories, which exhibit elongated corridors and heterogeneous density (see
Appendix C). For instance, expert features achieve higher Calinski–Harabasz for kNN-Leiden (928 vs. 745) and FINCH (2374 vs. 941), but this does not contradict the superior DBCV/conductance/modularity of embeddings—it reflects the metrics’ differing geometric assumptions.
Appendix E. Cluster Size Distributions
Figure A1 compares cluster size distributions for learnt embeddings and expert features across all four clustering methods. Embeddings consistently yield more balanced partitions, whilst expert features produce skewed distributions dominated by a small number of large clusters. For HDBSCAN, embeddings reduce noise assignments by 48% (26,000 vs. 44,000 segments), indicating improved density structure. For FINCH, expert features collapse into 8 coarse-grained clusters with a dominant cluster containing 26% of all segments, whereas embeddings reveal 27 finer-grained behavioural modes with more uniform size distribution. The quantitative metrics corresponding to these partitions are detailed in
Appendix D.
Figure A1.
Cluster size distributions for embeddings (left column) vs. expert features (right column). Embeddings produce more balanced partitions: kNN-Leiden (a) yields 47 communities with gradual decay, whereas expert (b) exhibits power-law dominance. FINCH (c) reveals 27 multi-scale clusters, whilst expert (d) collapses into 8 coarse modes with one giant cluster (26% of data). HDBSCAN (e) rejects 52% as noise (grey) vs. 88% for expert (f), indicating embeddings form denser, more stable clusters. VBGMM (g) maintains balanced components, whereas expert (h) concentrates 18% of data in a single dominant component.
The balanced cluster size distributions observed for embeddings indicate that GMAE-REx representations support fine-grained behavioural differentiation without collapsing semantically distinct navigation patterns into a few dominant modes. In contrast, expert features exhibit concentration effects where large clusters capture generic transit behaviours, whilst smaller clusters represent edge cases or outliers. This pattern is consistent with the hypothesis that expert-crafted features emphasise kinematic similarities within vessel classes (leading to large homogeneous clusters), whereas learnt embeddings encode context-dependent operational modes that transcend vessel-type boundaries (enabling more granular and balanced partitioning). The substantial reduction in HDBSCAN noise assignments (26,000 vs. 44,000 segments) further demonstrates that embeddings exhibit denser, more coherent manifold structure in high-dimensional space, facilitating density-based community discovery.
Appendix F. UMAP Cluster Projections
Figure A2,
Figure A3,
Figure A4 and
Figure A5 visualise clustering results in two-dimensional UMAP projections for all four methods applied to learnt embeddings and expert features. Each method shows two colourings: cluster assignments (left panels) and ship type (right panels). Embeddings consistently produce well-separated, compact clusters with mixed vessel types within each cluster, indicating organisation by operational behaviour rather than vessel identity. Expert features exhibit substantial overlap between clusters and stronger ship-type segregation, reflecting kinematic similarities within vessel classes. The visual structure corroborates the quantitative metrics in
Appendix D and the cluster size distributions in
Appendix E.
Figure A2.
UMAP projections for kNN-Leiden clustering. Embeddings (a) yield 47 well-separated communities (left panel) with mixed ship types (right panel), whilst expert features (b) produce 48 communities with visible cluster overlap and stronger ship-type segregation. The right panels demonstrate that embedding-based clusters transcend vessel categories, capturing operational modes shared across cargo, passenger, sailing, and other vessel types.
Figure A3.
UMAP projections for FINCH hierarchical clustering. Embeddings (a) reveal 27 distinct spatial regions (left panel) with balanced ship-type mixing (right panel), whereas expert features (b) collapse into 8 large overlapping clusters dominated by a giant central component. The finer granularity of embedding-based partitions reflects multi-scale behavioural structure not captured by expert kinematic features.
Figure A4.
UMAP projections for HDBSCAN density-based clustering. Embeddings (a) identify 3 dense clusters (green, dark red, purple) with 52% noise rejection (grey), whereas expert features (b) produce only 2 small clusters with 88% noise rejection. The substantial reduction in noise for embeddings indicates improved density structure, consistent with the higher DBCV score (0.112 vs. 0.042) in
Table A12.
Figure A5.
UMAP projections for VBGMM probabilistic clustering. Embeddings (a) form 28 compact, well-separated components (left panel) with mixed ship types (right panel), whilst expert features (b) exhibit a dominant central cluster with substantial overlap between components and stronger ship-type segregation. This contrast is discussed in detail in
Section 7.2 and illustrated in
Figure 8 (main text).
The UMAP projections provide visual evidence that embeddings encode context-dependent operational modes rather than vessel-identity-driven kinematic profiles. Across all methods, embedding-based clusters exhibit: (i) spatial compactness with clear inter-cluster separation, (ii) mixed ship-type composition within individual clusters, and (iii) distinct regional organisation in the two-dimensional projection, suggesting that the latent space captures behavioural nuances beyond simple speed/course patterns. In contrast, expert features consistently show (i) overlapping cluster boundaries, (ii) stronger alignment between clusters and vessel types (e.g., cargo-dominated regions vs. passenger-dominated regions), and (iii) diffuse central concentrations, indicating that hand-crafted nautical features emphasise within-class kinematic similarities rather than cross-class behavioural patterns. This visual structure aligns with the superior DBCV, conductance, and modularity scores for embeddings reported in
Table A12, and supports the hypothesis that GMAE-REx representations facilitate behaviour-centric clustering suitable for MASS applications.
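The VBGMM step discussed above can be reproduced in outline with scikit-learn's BayesianGaussianMixture, whose Dirichlet-process prior prunes unused mixture components automatically. The synthetic data and hyperparameters below are illustrative only, not the paper's configuration:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Stand-in for the 128-d GMAE-REx embeddings: three well-separated blobs in 8-d.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(200, 8)) for c in (-3.0, 0.0, 3.0)])

# Dirichlet-process VBGMM: n_components is only an upper bound; components that
# are not supported by the data receive near-zero weight, so the effective
# cluster count is inferred rather than fixed in advance.
vbgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    max_iter=500,
    random_state=0,
).fit(X)

labels = vbgmm.predict(X)                 # hard assignments (argmax of posterior)
effective_k = np.unique(labels).size      # number of components actually used
```

Soft responsibilities are available via `vbgmm.predict_proba(X)`, which is what makes the method's "smoother, overlapping profiles" possible downstream.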
Appendix G. Global SHAP Feature Importance
Figure A6, Figure A7, Figure A8 and Figure A9 present global SHAP feature importance aggregated across all clusters for each of the four clustering methods applied to GMAE-REx embeddings. SHAP values quantify the marginal contribution of each input feature to cluster assignments, providing model-agnostic explainability grounded in cooperative game theory. Temporal encodings (day-of-year, day-of-week, hour-of-day) dominate the rankings for kNN-Leiden, FINCH, and VBGMM, indicating that seasonal and diurnal patterns are primary drivers of behavioural differentiation in the Kiel Fjord dataset. HDBSCAN exhibits a distinct pattern, prioritising ship-to-ship interaction features (relative bearing, distance to land) and density features over temporal encodings, reflecting the method’s focus on local density structure rather than global community organisation. Kinematic features (speed, acceleration, course) and environmental features (water depth, distance to land, traffic density) occupy mid-range positions across all methods, whilst static vessel attributes (ship type, dimensions) contribute at moderate to low levels. Detailed per-cluster SHAP analyses are provided in Appendix H, and the VBGMM results are discussed in detail in Section 7.4.
Figure A6.
kNN-Leiden: Temporal features dominate (top 4 positions).
Figure A7.
FINCH: Day-of-year encodings rank highest.
Figure A8.
HDBSCAN: Interaction features (rel_bearing_1_cos, dist_to_land) dominate.
Figure A9.
VBGMM: Temporal encodings occupy top 4 positions (discussed in Section 7.4).
The divergence between HDBSCAN and the other three methods reveals fundamental differences in clustering paradigms: Graph-based and mixture-based methods (kNN-Leiden, FINCH, VBGMM) partition the embedding space based on global community structure and probabilistic density, leading to temporal-operational clusters that capture when and how vessels navigate (e.g., summer ferry traffic vs. winter commercial operations). Density-based methods (HDBSCAN) identify local density-connected regions, emphasising ship-to-ship interactions and spatial context (relative bearing, distance to land, traffic density) over seasonal patterns. This distinction aligns with the methods’ optimisation objectives: kNN-Leiden maximises modularity (global graph structure), VBGMM maximises ELBO (probabilistic fit), FINCH merges nearest neighbours (hierarchical structure), whilst HDBSCAN maximises cluster stability under varying density thresholds (local persistence).
Across all methods, the importance hierarchy follows a consistent pattern: (1) Temporal features rank highest for 3/4 methods (day_of_year_cos/sin: 0.0046–0.0028 for VBGMM, day_of_week_cos/sin: 0.0034–0.0013), indicating that behavioural modes are strongly time-dependent. (2) Kinematic features occupy mid-range positions (speed: 0.0019–0.0014, course_cos/sin: 0.0019–0.0014, acc: 0.0018–0.0014), suggesting that while speed/heading inform cluster structure, they are not primary differentiators. (3) Environmental features (water_depth: 0.0017–0.0012, dist_to_land: 0.0016–0.0010, density_all: 0.0017–0.0012) and (4) interaction features (rel_bearing_0_cos/sin, dcpa_0, tcpa_0) contribute at moderate levels, except for HDBSCAN where interaction features dominate (rel_bearing_1_cos: 0.0226, 5× higher than temporal features). (5) Static vessel attributes (ship_type, ship_group_cargo, width, length) rank in the lower quartile for all methods, confirming that clusters organise by operational behaviour rather than vessel identity.
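The temporal features that top these rankings (day_of_year_cos/sin, day_of_week_cos/sin, hour-of-day) are, by their naming, sine/cosine encodings of cyclic time variables; a minimal sketch of such an encoding, with the periods assumed rather than taken from the paper:

```python
import numpy as np

def cyclical_encode(value, period):
    """Map a cyclic quantity (hour, weekday, day of year) onto the unit circle,
    so that period boundaries (e.g. 23:00 -> 00:00) remain adjacent rather than
    maximally distant, as they would be under a raw integer encoding."""
    angle = 2.0 * np.pi * np.asarray(value, dtype=float) / period
    return np.sin(angle), np.cos(angle)

# Example columns matching the feature names in the SHAP rankings (assumed):
hour_sin, hour_cos = cyclical_encode(np.arange(24), 24)       # hour of day
dow_sin, dow_cos = cyclical_encode(np.arange(7), 7)           # day_of_week
doy_sin, doy_cos = cyclical_encode(np.arange(1, 366), 365)    # day_of_year
```

Under this encoding, 23:00 and 00:00 are close in feature space, which is what lets the clustering treat late-night and early-morning traffic as similar operational contexts.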
The global SHAP rankings demonstrate that GMAE-REx embeddings preserve multi-faceted behavioural signals from the input feature space, enabling different clustering methods to discover complementary structures: temporal-operational modes (graph/mixture methods) or spatial-interactional patterns (density methods). This flexibility supports diverse maritime applications, from seasonal traffic analysis (VTS, port planning) to collision risk assessment (real-time MASS navigation).
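Global rankings of this kind are conventionally obtained by averaging absolute SHAP values over all samples (and, for per-cluster attributions, over clusters). A minimal numpy sketch, assuming a precomputed SHAP array rather than the paper's exact pipeline:

```python
import numpy as np

def global_shap_importance(shap_values, feature_names):
    """Aggregate per-sample (and, if present, per-cluster) SHAP attributions
    into a global ranking: mean |SHAP| over every axis except the feature axis
    (axis 1), sorted in descending order of importance."""
    sv = np.asarray(shap_values)
    # Accepts (n_samples, n_features) or (n_samples, n_features, n_clusters).
    axes = tuple(i for i in range(sv.ndim) if i != 1)
    importance = np.abs(sv).mean(axis=axes)
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]

# Toy example: the second feature carries the largest attributions.
sv = np.array([[0.1, -0.5, 0.0],
               [-0.2, 0.4, 0.1]])
ranking = global_shap_importance(sv, ["speed", "day_of_year_cos", "acc"])
```

Taking the absolute value before averaging is what makes the ranking direction-agnostic: a feature that pushes some trajectories into a cluster and others out of it still registers as important.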
Appendix H. Per-Cluster SHAP Feature Importance
Figure A10, Figure A11, Figure A12, Figure A13, Figure A14, Figure A15 and Figure A16 present per-cluster SHAP feature importance distributions for all four clustering methods applied to GMAE-REx embeddings. Each subplot displays the top 15 features driving cluster assignments for a single cluster, revealing cluster-specific behavioural signatures. Whilst global SHAP rankings (Appendix G) aggregate importance across all trajectories, per-cluster analysis exposes intra-method heterogeneity. Radar plots comparing cluster centroids on top features are provided in Appendix I.
kNN-Leiden: Temporal features (day_of_year_cos/sin) dominate the top two positions across 45/47 clusters, with importance values of 0.026–0.030. Cluster-specific variations emerge in the kinematic features: Clusters 0–5 prioritise acceleration and course encodings (ranks 3–5), whilst Clusters 20–25 elevate spatial features (dist_to_ferry_route, water_depth). Sample sizes range from the micro-community Cluster 1 to the dominant hub Cluster 0, reflecting hierarchical graph structure.
FINCH: Day-of-year encodings consistently occupy the top two positions across all 27 clusters, with importance 0.028–0.040 (the highest among all methods). Cluster 5 exhibits unique behaviour: rel_bearing_0_cos rises to rank 3 (importance: 0.0236), 2× higher than in other clusters, indicating specialised ship-to-ship interaction patterns. The hierarchical merging strategy produces clusters with strong temporal homogeneity but varying kinematic profiles.
HDBSCAN: Cluster 0 (80% of the non-noise data) prioritises interaction features: rel_bearing_1_cos (0.0332), course_sin (0.0309), and dist_to_land (0.0289), reflecting dense maritime traffic navigation patterns. Cluster 1 (0.07%) is an outlier micro-cluster dominated by collision-risk features: course_diff_1_cos/sin and dcpa_1 (0.0225), likely representing near-miss scenarios. Cluster 2 emphasises spatial constraints (dist_to_restricted_area: 0.0220, rank 2), suggesting restricted-zone navigation awareness. Temporal features rank below position 10 for all clusters.
Figure A10.
kNN-Leiden per-cluster SHAP importance (Clusters 0–23).
Figure A11.
kNN-Leiden per-cluster SHAP importance (Clusters 24–45).
Figure A12.
FINCH per-cluster SHAP importance (Clusters 0–15).
Figure A13.
FINCH per-cluster SHAP importance (Clusters 16–26).
Figure A14.
HDBSCAN per-cluster SHAP importance (3 clusters).
VBGMM: Temporal encodings (day_of_year, day_of_week, hour_of_day) occupy 4–6 of the top 10 positions across all clusters. Cluster 4 exhibits the strongest seasonal dependence: day_of_year_cos (0.0353) and day_of_year_sin (0.0320), both 15% higher than the global mean. Vessel attributes (length, width) consistently rank in the top 5–8 for 25/28 clusters, with importance 0.015–0.021, indicating that Gaussian mixture components partially align with vessel size classes. This contrasts with kNN-Leiden and FINCH, where vessel attributes rank below position 15 in most clusters.
Figure A15.
VBGMM per-cluster SHAP importance (Clusters 0–15).
Figure A16.
VBGMM per-cluster SHAP importance (Clusters 16–27).
Three clustering paradigms emerge: (1) Temporal stratification (kNN-Leiden, FINCH, VBGMM): Day-of-year encodings dominate most clusters (importance 1.5–2.5× kinematic features), organising trajectories by seasonal context before motion characteristics. (2) Density-driven specialisation (HDBSCAN): Interaction features (relative bearing, DCPA, course differences) rank 2–5× higher than temporal features, capturing dense traffic patterns (Cluster 0) and collision-risk scenarios (Cluster 1). (3) Vessel-attribute modulation (VBGMM): Length and width appear in top 10 for 89% of clusters, versus 20% for kNN-Leiden, reflecting probabilistic soft assignment that models vessel-size-dependent behaviour.
Appendix I. Cluster Centroid Feature Profiles
Figure A17, Figure A18, Figure A19 and Figure A20 present radar plots of cluster centroids in the original feature space, visualising normalised mean values for seven key maritime operational features: speed, dist_to_land, water_depth, density_all, dist_to_restricted_area, length, and width. For each cluster, feature values are aggregated as means across all member trajectories, then normalised to [0,1] relative to the global minimum and maximum observed across all clusters within the method. The resulting radar polygons encode geometric behavioural signatures: large filled areas indicate high values across multiple dimensions, whilst asymmetric shapes reveal specialised operational contexts. For kNN-Leiden, FINCH, and VBGMM, only the first 12 clusters (by index) are displayed due to space constraints, whilst HDBSCAN shows all three clusters. These profiles complement the SHAP importance analysis in Appendix H by translating cluster centroids into visual interpretability.
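The aggregation and normalisation just described can be sketched as follows; the helper name is illustrative, and the feature list is the one used in the radar figures:

```python
import numpy as np

RADAR_FEATURES = ["speed", "dist_to_land", "water_depth", "density_all",
                  "dist_to_restricted_area", "length", "width"]

def radar_profiles(X, labels):
    """Per-cluster feature means (centroids), min-max normalised to [0, 1]
    across all clusters of the same method, as plotted in the radar figures."""
    clusters = np.unique(labels)
    centroids = np.vstack([X[labels == c].mean(axis=0) for c in clusters])
    lo, hi = centroids.min(axis=0), centroids.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant features
    return clusters, (centroids - lo) / span

# Toy data: two clusters whose speed levels clearly differ.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, len(RADAR_FEATURES)))
X[:50, 0] += 5.0                      # cluster 0: high speed
labels = np.repeat([0, 1], 50)
clusters, profiles = radar_profiles(X, labels)
```

Because normalisation is relative to the extremes across clusters of the same method, a value of 1.0 on an axis means "highest centroid among this method's clusters", not an absolute physical maximum.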
FINCH: Cluster 7 exhibits extreme normalised values across water depth (≈0.95), distance to land (≈0.90), and vessel dimensions, forming a maximal radar polygon that indicates deep-sea operations by large vessels. Cluster 0 shows minimal coverage (<0.3) across all dimensions, including traffic density. Clusters 2, 3, and 6 form a mid-range group with moderate normalised values (0.4–0.6), likely corresponding to regional ferry routes or coastal fishing activity. Cluster 5 demonstrates asymmetric geometry: high dist_to_restricted_area (0.85) but low density_all (<0.2), indicating isolated navigation in unrestricted open waters away from congested zones.
Figure A17.
FINCH cluster centroid radar profiles (Clusters 0–11 of 27 total).
VBGMM: Vessel size stratification dominates the radar geometry. Clusters 0, 1, and 2 share similar spatial profiles (water depth and distance to land: 0.5–0.7) but exhibit progressive increases in vessel dimensions: length and width rise from 0.3 (Cluster 0) to 0.5 (Cluster 1) to 0.6 (Cluster 2), suggesting that Gaussian mixture components align with ship-size categories (small craft → medium vessels → large ships). Cluster 3 shows low traffic density (0.15) combined with high dist_to_restricted_area (0.7), indicating sparse navigation in permissive maritime zones. Cluster 11 demonstrates balanced normalised values (0.4–0.6) across all features, forming a near-circular radar profile that represents general-purpose maritime operations without behavioural specialisation. Unlike FINCH’s sharp extremes, VBGMM produces smoother, overlapping profiles, reflecting probabilistic soft cluster assignments that model transitional vessel behaviours.
Figure A18.
VBGMM cluster centroid radar profiles (Clusters 0–11 of 28 total).
HDBSCAN: Cluster 0 (80% of the non-noise data) exhibits maximal normalised coverage (0.6–0.9) across all features, forming a large, filled radar polygon that aggregates mainstream maritime traffic. The broad coverage reflects high internal variance: this density-dominant hub encompasses diverse vessel types, operational contexts, and spatial distributions. Cluster 1 (an outlier micro-cluster) shows extreme water depth (1.0) and distance to land (0.95) with minimal traffic density (<0.2), consistent with isolated deep-sea trajectories potentially representing anomalies, measurement errors, or rare offshore operations. Cluster 2 presents mid-range speed (0.5) combined with low spatial and vessel-size values (<0.3), likely representing slow-moving small craft in shallow coastal waters. The density-based paradigm prioritises cluster compactness over granularity, producing fewer, larger aggregations than graph-based (kNN-Leiden: 47 clusters) or mixture-based (VBGMM: 28 clusters) methods.
Figure A19.
HDBSCAN cluster centroid radar profiles (all 3 clusters).
kNN-Leiden: Fine-grained graph partitioning produces highly specialised radar signatures. Cluster 2 (the largest among the displayed clusters) demonstrates extreme normalised values for water depth (0.95) and vessel size (width/length: 0.85), indicating deep-water corridors used by large commercial vessels (cargo ships, tankers). Clusters 0 and 8 show compact, low-magnitude radar polygons (<0.4), representing nearshore operations by small craft with limited spatial range. Cluster 10 exhibits asymmetric geometry: high distance to land (0.8) combined with low water depth (0.3), suggesting navigation along shallow offshore routes (e.g., island-hopping trajectories or archipelago transits). Cluster 7 mirrors FINCH Cluster 7, both displaying maximal depth and distance profiles, confirming consistent identification of deep-sea operational modes across hierarchical (FINCH) and graph-based (kNN-Leiden) paradigms. The modularity optimisation strategy partitions the embedding space into 47 micro-behaviours, capturing operational nuances not resolved by coarser methods.
Three interpretability paradigms emerge from radar geometry:
(1) Extreme-behaviour isolation (FINCH, kNN-Leiden): Sharp, non-overlapping radar profiles with maximal or minimal normalised values enable detection of outlier operational modes (deep-sea routes, micro-density clusters, nearshore anomalies).
(2) Size-stratified soft partitioning (VBGMM): Smooth, graduated profiles along vessel-dimension axes (length, width) indicate latent Gaussian components aligned with ship-size categories, whilst overlapping spatial features (depth, distance) reflect mixed-fleet operations in shared maritime zones.
(3) Density-aggregated summaries (HDBSCAN): Large, filled radar areas with high internal variance prioritise cluster compactness over behavioural granularity, suitable for high-level traffic pattern recognition but limited in discriminating fine operational contexts.
Radar visualisation complements the SHAP feature importance rankings (Appendix G and Appendix H) by providing geometric interpretability of cluster centroids: whilst SHAP quantifies which features drive cluster assignments, radar plots reveal what values those features take within each cluster, enabling domain experts to validate clustering outputs against known maritime operational categories.
Figure A20.
kNN-Leiden cluster centroid radar profiles (Clusters 0–11 of 47 total).
Appendix J. Pairwise Cluster-Assignment Agreement
To assess whether the discovered behavioural groupings are a robust property of the representation space rather than an artefact of a particular algorithm, we compute three standard external cluster-agreement metrics between every pair of the four clustering algorithms, applied to the same fixed pool of 50,000 trajectory segments:
Adjusted Rand Index (ARI) [78]: measures pair-wise label co-assignment, corrected for chance; range [−1, 1], higher is better, 0 = random.
Normalised Mutual Information (NMI) [79]: measures shared information between partitions, normalised by partition entropy; range [0, 1], higher is better.
Fowlkes–Mallows Index (FMI) [80]: geometric mean of pair-wise precision and recall; range [0, 1], higher is better.
All three metrics operate on pair-wise co-membership and are therefore insensitive to cluster relabelling permutations. HDBSCAN noise treatment: HDBSCAN assigns the special label −1 to noise and border points (51.8% of all points for embeddings; 86.7% for expert features), which would invalidate any pairwise comparison. For all pairs that involve HDBSCAN, we therefore exclude noise points and compute metrics only on the non-noise subset; the corresponding labels of the other algorithm on those same rows are used. Pairs between covering algorithms (kNN-Leiden, VBGMM, FINCH) are evaluated on all 50,000 points.
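The three metrics and the noise-masking rule map directly onto scikit-learn; a minimal sketch (the function name and toy labelings are illustrative):

```python
import numpy as np
from sklearn.metrics import (adjusted_rand_score,
                             normalized_mutual_info_score,
                             fowlkes_mallows_score)

def pairwise_agreement(a, b, noise_label=-1):
    """ARI/NMI/FMI between two labelings of the same points.  Rows where either
    labeling carries the noise label (HDBSCAN's -1) are excluded, mirroring the
    non-noise restriction applied to the HDBSCAN pairs in the text."""
    a, b = np.asarray(a), np.asarray(b)
    mask = (a != noise_label) & (b != noise_label)
    a, b = a[mask], b[mask]
    return {
        "ARI": adjusted_rand_score(a, b),
        "NMI": normalized_mutual_info_score(a, b),
        "FMI": fowlkes_mallows_score(a, b),
        "n": int(mask.sum()),
    }

# Identical partitions up to relabelling give perfect agreement; noise rows drop out.
hdbscan_like = [-1, 0, 0, 1, 1, -1]
leiden_like  = [ 2, 5, 5, 3, 3,  4]
result = pairwise_agreement(hdbscan_like, leiden_like)
```

Because all three scores compare co-membership rather than label values, the arbitrary cluster IDs produced by different algorithms need no alignment step.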
Table A13 and Table A14 report all six unique off-diagonal pairs for learnt embeddings and expert features, respectively.
Table A13.
Pairwise cluster-assignment agreement (ARI/NMI/FMI) for learnt embeddings (GMAE-REx encoder, 128-d), computed on the fixed 50,000-segment evaluation pool. Algorithm outputs: kNN-Leiden 47 clusters, VBGMM 28 clusters, FINCH 27 clusters, HDBSCAN 4 clusters (51.8% noise). HDBSCAN pairs are computed on the non-noise points only; see text for noise-handling details. Covering pairs use all 50,000 points.
| Pair | ARI | NMI | FMI | n |
|---|---|---|---|---|
| Covering pairs—all 50,000 points assigned to a named cluster |
| kNN-Leiden/VBGMM | | | | |
| kNN-Leiden/FINCH | | | | |
| VBGMM/FINCH | | | | |
| Mean (covering pairs †) | 0.302 | 0.602 | 0.346 | |
| HDBSCAN pairs—restricted to non-noise HDBSCAN points |
| HDBSCAN/kNN-Leiden | | | | |
| HDBSCAN/VBGMM | | | | |
| HDBSCAN/FINCH | | | | |
Table A14.
Pairwise cluster-assignment agreement (ARI/NMI/FMI) for expert features (74-dimensional hand-crafted feature vector), computed on the same 50,000-segment pool. Algorithm outputs: kNN-Leiden 48 clusters, VBGMM 28 clusters, FINCH 8 clusters, HDBSCAN 3 clusters (86.7% noise). HDBSCAN pairs are computed on the non-noise points only. The covering pair mean NMI is 0.37, which is 38% lower than for learnt embeddings (0.60; Table A13), indicating weaker latent structure in the expert feature space.
| Pair | ARI | NMI | FMI | n |
|---|---|---|---|---|
| Covering pairs—all 50,000 points assigned to a named cluster |
| kNN-Leiden/VBGMM | | | | |
| kNN-Leiden/FINCH | | | | |
| VBGMM/FINCH | | | | |
| Mean (covering pairs †) | 0.125 | 0.365 | 0.216 | |
| HDBSCAN pairs—restricted to non-noise HDBSCAN points |
| HDBSCAN/kNN-Leiden | | | | 6651 |
| HDBSCAN/VBGMM | | | | 6651 |
| HDBSCAN/FINCH | | | | 6651 |
The three covering algorithms—kNN-Leiden (graph community detection), VBGMM (Bayesian mixture), and FINCH (hierarchical first-neighbour)—show moderate-to-strong mutual agreement on the learnt embeddings, with mean NMI = 0.60 across their three unique pairs. This convergence across algorithms with fundamentally different inductive biases (modularity maximisation, generative density modelling, and parameter-free hierarchical partitioning) provides direct evidence that the behavioural structure recovered in the GMAE-REx embedding space is a genuine, algorithm-independent property of the representation rather than an algorithmic artefact.
For expert features, the same three covering pairs yield a mean NMI of 0.37, a 38% reduction relative to learnt embeddings, indicating that the expert feature space contains less consistently recoverable structure. This contrast strengthens the claim that representation learning (Contribution 2–5) yields a more structured and generalisable latent space than hand-crafted feature engineering alone.
HDBSCAN interpretation. After excluding noise points, the HDBSCAN pairs tell qualitatively different stories for the two representation types:
Learnt embeddings: the 24,081 non-noise HDBSCAN points show near-zero agreement with all three covering algorithms (NMI –). HDBSCAN identified only four extremely dense regions in the 128-d embedding space; the structure-based methods (kNN-Leiden, VBGMM, FINCH) further subdivide those same points into 10–30 finer behavioural sub-groups. The disconnect reflects a genuine difference in granularity rather than disagreement about which points are similar.
Expert features: the 6651 non-noise HDBSCAN points show moderate-to-high agreement with FINCH (NMI = 0.55) and kNN-Leiden (NMI = 0.38), but low agreement with VBGMM (NMI = 0.04). With only three dense clusters in a 74-d feature space, HDBSCAN and FINCH (8 clusters) identify compatible coarse groupings, while the 28-component VBGMM partitions the same points into many small Gaussian components that do not align with the dense HDBSCAN regions.
In both cases the noise fraction itself is informative: 51.8% (embeddings) and 86.7% (expert) of segments are categorised as noise by HDBSCAN, confirming that trajectory behaviour is broadly continuous and does not naturally decompose into a small number of sharply separated dense clusters at the chosen minimum-cluster-size parameter.
Appendix K. Preprocessing Pipeline: Processing Latency
To assess the computational cost of the preprocessing pipeline described in Section 3, we ran a dedicated profiling experiment on 100 one-hour AIS message files from the Kiel coastal receiver, processed sequentially on a single core of an AMD EPYC 7713 processor. The 100 files represent a timing benchmark, not the complete two-year study archive; because throughput is reported as a per-unit rate, the estimates transfer directly to any dataset size. The experiment covered the complete pipeline from raw message decoding through Kalman smoothing, PCHIP interpolation, and feature engineering (ship-level kinematic, spatial, traffic-density, and temporal features), through to the CPA-based ship-to-ship interaction features (DCPA, TCPA, relative bearing, and collision risk index).
The 100 files yielded 835.63 h of trajectory data and 1,784,074 ship-to-ship interactions in total. Table A15 reports the measured throughputs.
Table A15.
Measured preprocessing throughput on a single CPU core (AMD EPYC 7713).
| Stage | Total Time (100 Files) | Per-Unit Rate |
|---|---|---|
| Ship-level features | 22.52 min | 26.95 ms per trajectory-minute |
| Ship-to-ship interaction features | 45.18 min | 1.52 ms per interaction |
For a standard 10-min trajectory segment, ship-level processing costs ≈270 ms. The dataset averages ≈35.6 interactions per trajectory-minute, yielding ≈356 interactions per 10-min segment and a further ≈541 ms for ship-to-ship features. End-to-end preprocessing latency per 10-min segment: ≈810 ms on a single CPU core. The subsequent GMAE-REx encoder forward pass (≈1.0 M parameters, sequence length 120) adds negligible overhead. Since vessels are processed independently, the pipeline scales linearly with the number of available CPU cores.
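The per-segment estimate follows directly from the measured rates and totals above; the arithmetic can be reproduced as:

```python
# Per-unit rates from Table A15 and totals from the 100-file benchmark.
SHIP_MS_PER_TRAJ_MIN = 26.95        # ship-level features, ms per trajectory-minute
INTERACTION_MS = 1.52               # ms per ship-to-ship interaction
TOTAL_HOURS = 835.63                # trajectory data yielded by the 100 files
TOTAL_INTERACTIONS = 1_784_074      # ship-to-ship interactions in the benchmark

# Average interaction rate implied by the benchmark totals (~35.6 per minute).
interactions_per_traj_min = TOTAL_INTERACTIONS / (TOTAL_HOURS * 60)

# Cost of one standard 10-min trajectory segment.
ship_ms = SHIP_MS_PER_TRAJ_MIN * 10                               # ~270 ms
interaction_ms = interactions_per_traj_min * 10 * INTERACTION_MS  # ~541 ms
total_ms = ship_ms + interaction_ms                               # ~810 ms
```

A sub-second single-core latency per 10-min segment is what makes the pipeline compatible with the near-real-time requirements of MASS applications.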
The ship-level kinematic features (speed, course, ROT, temporal encodings) are computable from on-board sensor fusion (Global Navigation Satellite System (GNSS), inertial measurement unit, vessel log), so the framework remains applicable even when external AIS reception is limited; only the ship-to-ship interaction features additionally require a surrounding traffic picture from a VTS feed, radar tracker, or shore-based AIS aggregator.
Figure 1.
Different navigationally relevant map layers extracted from OpenStreetMap via the Overpass API [50], colour-coded for visual distinction. Commercial ferry routes (green) are widened to improve visual clarity.
Figure 2.
Qualitative assessment of the smoothing and interpolation pipeline on two representative 10 min AIS trajectory segments (120 observations at 10 s resolution) from two different sailing vessels in the Kiel Fjord. In each panel, blue dots indicate the raw AIS position reports as broadcast and the red line shows the fully preprocessed trajectory after Kalman smoothing and PCHIP interpolation. Sailing vessels are chosen because their wind-driven tacking behaviour produces frequent, pronounced direction reversals within a single 10 min window—the most demanding scenario for a constant-velocity motion model. (a) A sailing vessel exhibiting high positional scatter in the raw reports; the smoother suppresses the noise while the tacking pattern is faithfully preserved in the interpolated output. (b) A sailing vessel combining substantial raw scatter with strong directional complexity; the curvature-dependent gap tolerance (2 min for turning segments) prevents the Kalman model from being applied across extended high-ROT intervals. In both cases the pipeline preserves the qualitative shape of the manoeuvre while effectively removing positional noise.
Figure 3.
Comparison of raw and processed AIS data in the Port of Kiel area. (Left) Raw AIS position reports showing noise, outliers, and irregular sampling. (Right) Processed vessel trajectories after filtering, segmentation, smoothing, and interpolation. Each colour represents a distinct trajectory segment with a unique identifier. The preprocessing pipeline successfully removes outliers, segments continuous vessel movements, and produces clean trajectories suitable for downstream analysis.
Figure 4.
Daily AIS trajectory steps count after interpolation for the years 2022–2023. Data from Kiel University of Applied Science (blue) capture a larger number of vessels when available, resulting in higher daily counts, while data from the DMA (orange) show more consistent coverage across the full two-year period.
Figure 5.
Vessel density in the Kiel area at a spatial resolution of m. The density is computed separately for each ship group (cargo, passenger, sailing, and other) using only AIS trajectories from the training dataset.
Figure 6.
Architecture of GroupMAE-REx for AIS trajectory representation learning.
Figure 7.
Representation learning pipelines.
Figure 8.
UMAP projections of VBGMM clustering (28 components). (a) Learnt embeddings with 28 well-separated clusters with mixed ship types per cluster, organised by operational context. (b) Expert features with 28 overlapping clusters with ship-type segregation, organised by kinematic similarity. Each subfigure shows cluster assignments ((left), coloured by cluster ID) and ship-type distribution ((right), coloured by vessel category). Learnt embeddings form spatially compact, well-separated clusters containing mixed vessel types, indicating organisation by operational navigation modes (e.g., channel transit, port manoeuvring, and anchorage). Expert features produce diffuse, overlapping clusters with greater ship-type segregation, indicating organisation by kinematic profiles (speed/course characteristics). Quantitative metrics in Table 17 confirm superior cluster quality for embeddings in terms of DBCV, conductance, and modularity. This demonstrates that learnt representations capture operational context essential for autonomous navigation, transcending vessel-specific characteristics. Complete projections for all methods are in Appendix F.
Figure 9.
UMAP visualisation of learnt trajectory embeddings coloured by trajectory-level mean values of selected AIS and contextual features. Panel (a) highlights cargo trajectories, which concentrate in two regions on the left side of the embedding space. Panels (b,c) show that both regions correspond to vessels with larger length and width. Panels (d–f) indicate differences in operating context: the left region is associated with deeper waters and larger distances to land and restricted areas, while the upper-left region relates to shallower waters and closer proximity to coastlines and regulated regions. This spatial organisation demonstrates that GMAE-REx embeddings encode operational environmental context alongside vessel characteristics.
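The trajectory-level mean values used to colour the panels of Figure 9 amount to a per-trajectory aggregation. A minimal sketch with a toy frame (the real columns follow the feature set in Appendix A):

```python
import pandas as pd

# Trajectory-level mean features, as used to colour the UMAP panels
# (toy values; column names follow the dynamic feature table).
df = pd.DataFrame({
    "traj_id":      [0, 0, 0, 1, 1],
    "dist_to_land": [120.0, 140.0, 160.0, 40.0, 60.0],  # metres
    "speed":        [5.0, 6.0, 7.0, 2.0, 3.0],          # m/s
})
traj_means = df.groupby("traj_id")[["dist_to_land", "speed"]].mean()
```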
Figure 10.
UMAP visualisation of the learnt trajectory embeddings coloured by ship type. Cargo trajectories in panel (a) mainly occupy the left and upper-left regions. Passenger trajectories in panel (b) appear more often in the lower-right region, consistent with ferry operations close to land (cf. Figure 9e). Sailing trajectories in panel (c) are more frequent in the central region, farther from the coastline. The vessel length and width patterns in Figure 9a,b suggest that passenger and sailing trajectories are associated with smaller physical dimensions. The mixed-type composition of individual clusters in Figure 8 indicates that the embedding organises trajectories by operational behaviour rather than vessel identity.
Figure 11.
Global SHAP feature importance aggregated across all 28 VBGMM clusters. Temporal encodings (day_of_year_cos/sin, day_of_week_cos/sin) dominate the top 4 positions, indicating that seasonal and weekly patterns are primary drivers of cluster differentiation. Kinematic features (length, speed, course, acc) occupy mid-range ranks, whilst environmental features (water_depth, dist_to_land, density_all) and interaction features (dcpa_0, tcpa_0) contribute at moderate levels. This hierarchy suggests that GMAE-REx embeddings encode operational context (when and how vessels navigate) rather than vessel identity (what vessel types navigate). Detailed per-cluster SHAP rankings and comparisons with kNN-Leiden, FINCH, and HDBSCAN are provided in Appendix H.
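The global importances in Figure 11 are the familiar mean-|SHAP| aggregation. As a minimal, self-contained illustration (not the paper's pipeline), the closed form for a linear model with independent features, phi_i(x) = w_i (x_i − E[x_i]), shows how per-sample attributions are aggregated into a global ranking; the weights here are hypothetical:

```python
import numpy as np

# For a linear model f(x) = w·x + b with independent features, SHAP values
# have the closed form phi_i(x) = w_i * (x_i - E[x_i]).  Global importance,
# as plotted in Figure 11, is the mean |phi_i| over the dataset.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
w = np.array([2.0, -1.0, 0.5, 0.0])   # hypothetical surrogate weights

phi = w * (X - X.mean(axis=0))                 # per-sample SHAP values
global_importance = np.abs(phi).mean(axis=0)   # aggregate over samples
ranking = np.argsort(-global_importance)       # most important first
```

For the non-linear cluster-assignment models in the paper, the same aggregation is applied to estimator-specific SHAP values.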
Figure 12.
Per-cluster SHAP feature importance showing the top 15 features for each of the 28 VBGMM clusters. Temporal features are consistently important across most clusters, but specific clusters exhibit elevated importance for kinematic, spatial, interaction, or density features, indicating context-dependent feature salience. This heterogeneity demonstrates that the embedding space supports multiple behavioural facets rather than a single canonical pattern. Detailed cluster-specific analyses and feature interaction studies for all clustering methods are provided in Appendix H.
Figure 13.
VBGMM cluster centroid radar profiles (Clusters 0–11 of 28 total). Each polygon represents the operational signature of one cluster, computed as the mean across all member trajectories and normalised to [0,1] relative to global feature extrema within the method. Large filled areas indicate high values across multiple dimensions, whilst asymmetric shapes reveal specialised operational contexts. These profiles complement SHAP analysis by visualising cluster centroids in the original feature space rather than quantifying feature importance for cluster assignments. Complete profiles for all 28 VBGMM clusters and comparative analyses for kNN-Leiden (47 clusters), FINCH (27 clusters), and HDBSCAN (3 clusters) are in Appendix I.
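The [0,1] normalisation against global feature extrema used for the radar profiles is a per-feature min-max scaling. A minimal sketch with toy centroids (the real ones are per-cluster means over member trajectories):

```python
import numpy as np

# Min-max normalise cluster centroids to [0, 1] against the global feature
# extrema within the method, as for the radar polygons in Figure 13.
centroids = np.array([[2.0, 10.0, 0.1],    # toy values, e.g. speed, dist_to_land, acc
                      [4.0, 30.0, 0.5],
                      [3.0, 20.0, 0.9]])
lo = centroids.min(axis=0)
hi = centroids.max(axis=0)
span = np.where(hi > lo, hi - lo, 1.0)     # guard against constant features
normalised = (centroids - lo) / span
```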
Figure 14.
Spatial collision risk profiles for two contrasting kNN-Leiden clusters. Each cell in the raster shows the mean collision risk (Equation (1)) averaged over all AIS observations from trajectory segments belonging to the cluster; cells with no observations are transparent. (a) Cluster 16: high-risk encounter pattern, with elevated collision risk concentrated in the main channel and ferry-terminal approaches of Kiel Fjord, corresponding to geometrically constrained zones where simultaneous low TCPA and low DCPA are structurally enforced by converging traffic. (b) Cluster 29: low-risk transit pattern, showing near-zero mean collision risk across the full study area, characterising trajectories that consistently operate with either ample temporal or spatial separation from other vessels. The geographic contrast between the two profiles demonstrates that the discovered behavioural clusters encode verifiable, safety-relevant navigational scenarios.
Table 1.
Overview of AIS message classes and commonly used message types [51].
| Message Class | Description | Message IDs |
|---|---|---|
| Class A | Position report | 1, 2, 3 |
| Class A | Static and voyage-related data | 5 |
| Class B | Position report | 18, 19 |
| Class B | Static data report | 24 |
Table 2.
Trajectory table containing dynamic AIS information.
| Column | Description | Manually Entered |
|---|---|---|
| mmsi | MMSI | No |
| timestamp | UTC timestamp of the AIS message | No |
| lon | Longitude (EPSG:4326) | No |
| lat | Latitude (EPSG:4326) | No |
| status | Navigational status | Yes |
| cog | COG | No |
| heading | True heading | No |
| draught | Reported vessel draught | Yes |
Table 3.
Static vessel information table.
| Column | Description | Manually Entered |
|---|---|---|
| mmsi | MMSI | No |
| ship_type | AIS vessel type code | Yes |
| to_bow | Distance from GPS antenna to bow | Yes |
| to_stern | Distance from GPS antenna to stern | Yes |
| to_port | Distance from GPS antenna to port side | Yes |
| to_starboard | Distance from GPS antenna to starboard side | Yes |
Table 4.
Clustering models considered in Pipeline 2 for behavioural analysis.
| Model | Core Assumptions | Fitness for Maritime Behaviour |
|---|---|---|
| HDBSCAN [63] | Clusters are high-density regions separated by low density. No assumptions about cluster number or shape. | Excellent for arbitrary-shape behaviour modes and noise handling. Hierarchical structure enables multi-scale analysis of local manoeuvres and global routes. |
| VBGMM [64] | Gaussian mixture components with VB inference automatically determining effective number of clusters. | High. Mixture components approximate complex distributions. Probabilistic nature valuable for anomaly detection and modelling speed/heading variability. |
| k-NN + Leiden [65] | Dense feature-space regions correspond to densely connected graph communities. | Excellent for interconnected route networks. Connectivity guarantee prevents fragmented clusters and improves stability. |
| FINCH [66] | Shared nearest neighbour relationships form hierarchical multi-scale patterns. | High. Hierarchical approach suits multi-scale behaviours. Parameter-free (except k) and robust for exploratory clustering. |
Table 6.
Post-processed dataset summary used for experiments.
| Item | Value |
|---|---|
| Region of interest | Port of Kiel and surrounding waters (( E, N), ( E, N)) |
| Time span | 730 days, 2022–2023 |
| Temporal resolution (post-processing) | 5 s (fixed temporal grid) |
| Segment length for learning | time steps (10 min) |
| Trajectories (after processing) | 176,787 |
| Unique vessels (MMSI) | 9948 |
| Total interpolated points | 63,448,367 |
| Total segments | 527,225 |
| Interaction range | 2 km |
| Neighbour retention | Top 2 neighbours by collision-risk score |
| Feature definitions | See Appendix A |
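The fixed 5 s temporal grid in Table 6 implies resampling each irregular AIS series onto evenly spaced timestamps. A minimal sketch using linear interpolation on toy values (the paper's exact interpolation scheme may differ):

```python
import numpy as np

# Resample an irregular AIS longitude series onto the fixed 5 s grid
# used in post-processing.
t   = np.array([0.0, 7.0, 19.0, 26.0])          # raw message times (s)
lon = np.array([10.10, 10.11, 10.14, 10.15])    # longitude (deg)

grid = np.arange(0.0, t[-1], 5.0)               # 0, 5, 10, 15, 20, 25 s
lon_grid = np.interp(grid, t, lon)              # linear interpolation
```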
Table 7.
Hyper-parameter search space for encoder selection (Experiment I). Following Contribution 2, optimisation is conducted exclusively on self-supervised validation loss.
| Hyper-Parameter | | Optuna Search Space |
|---|---|---|
| Batch size | | |
| Learning rate | | (fixed) |
| Encoder layers | | |
| Decoder layers | | |
| Model dimension | | |
| FFN dimension | | |
| Attention heads | | |
| Dropout | | |
| Noise (DAE/EAE only) | | |
Table 8.
Sensitivity to group mask ratio (levelA, density4), .
| Mask Ratio (Group Mask) | 0.25 | 0.35 | 0.50 | 0.75 |
|---|---|---|---|---|
| Linear Probe (Acc.) | 0.7447 | 0.7447 | 0.7535 | 0.7479 |
| Fine-tune (Acc.) | 0.8514 | 0.8514 | 0.8548 | 0.8524 |
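Group masking, as varied in Table 8, masks entire feature groups jointly per time step rather than independent scalars. A minimal sketch with a hypothetical grouping and the best-performing mask ratio of 0.5:

```python
import numpy as np

# Group masking: all features of a group are masked together at a time step
# (grouping here is illustrative, not the paper's levelA/levelB schemes).
rng = np.random.default_rng(0)
groups = {"kinematic": [0, 1, 2], "spatial": [3, 4], "interaction": [5, 6]}
x = rng.normal(size=(120, 7))          # (time steps, features)
mask_ratio = 0.5

mask = np.zeros_like(x, dtype=bool)
for cols in groups.values():
    hit = rng.random(x.shape[0]) < mask_ratio      # which steps to mask
    mask[np.ix_(np.flatnonzero(hit), cols)] = True
x_masked = np.where(mask, 0.0, x)      # the reconstruction target stays x
```

By construction, all columns of a group share the same mask at every step, which is what distinguishes group masking from element-wise masking.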
Table 9.
Grouping scheme ablation (group mask rate = 0.5, env scheme = density4, ).
| Group Scheme | levelA | levelB |
|---|---|---|
| Linear Probe (Acc.) | 0.7447 | 0.7474 |
| Fine-tune (Acc.) | 0.8517 | 0.8556 |
Table 10.
Environment scheme ablation (group scheme = levelB, group mask rate = 0.5, ).
| Env Scheme | density4 | geo4 | densitygeo16 | densityhour16 |
|---|---|---|---|---|
| Linear Probe (Acc.) | 0.7447 | 0.7495 | 0.7444 | 0.7484 |
| Fine-tune (Acc.) | 0.8517 | 0.8508 | 0.8513 | 0.8526 |
Table 11.
REx weight ablation (group scheme = levelB, group mask rate = 0.5, env scheme = densityhour16).
| REx Weight | 0.0 | 0.05 | 0.1 | 0.2 |
|---|---|---|---|---|
| Linear Probe (Acc.) | 0.7454 | 0.7510 | 0.7447 | 0.7437 |
| Fine-tune (Acc.) | 0.8515 | 0.8547 | 0.8517 | 0.8513 |
Table 12.
Best hyperparameter configurations and validation accuracy for each encoder (Experiment I). GMAE-REx (Contributions 3–4) achieves the best performance, outperforming all baseline architectures. Trainable parameter counts (Params) are reported for the best configuration found over the hyperparameter space in Table 7.
| HParam | GMAE-REx | DAE | EAE | TCN | Transformer | LiST |
|---|---|---|---|---|---|---|
| Batch size | 32 | 64 | 128 | 32 | 32 | 32 |
| Learning rate | | | | | | |
| Encoder layers | 3 | 4 | 5 | 3 | 3 | 3 |
| Decoder layers | 3 | 4 | 3 | 1 | 1 | 1 |
| Model dimension | 128 | 64 | 64 | 128 | 128 | 128 |
| FFN dimension | 256 | 512 | 512 | – | – | – |
| Attention heads | 4 | 8 | 4 | – | 4 | – |
| Dropout | | | | | | |
| Noise (DAE/EAE only) | – | | | – | – | – |
| Params | ≈1.01 M | ≈0.74 M | ≈0.74 M | ≈25 K | ≈0.58 M | ≈0.60 M |
| Validation Accuracy | % | | | | | |
Table 13.
Clustering methods used for Experiment II and their fitness for maritime trajectory analysis. Detailed descriptions and mathematical formulations are provided in Appendix B.
| Method | Key Characteristics | Fitness for AIS Behaviour |
|---|---|---|
| kNN-Leiden | Graph-based community detection via modularity optimisation with guaranteed connectivity [65]. | Excellent for route-network structures and heterogeneous density; connectivity guarantee reduces fragmentation. |
| HDBSCAN | Hierarchical density-based clustering with explicit noise handling via mutual reachability [63]. | Excellent for arbitrary-shape clusters and anomaly detection; multi-scale structure captures local maneuvers and global patterns. |
| VBGMM | Variational Bayesian Gaussian Mixture with automatic component selection [71]. | High fitness for overlapping behaviours and uncertainty quantification; aligns with Contribution 5. |
| FINCH | Parameter-free hierarchical clustering via first-neighbour relations [66]. | High fitness for exploratory multi-scale analysis; minimal tuning required. |
Table 14.
Distance metrics and optimisation objectives for each clustering method and representation (Experiment II).
| Method | Distance Metric | Optimisation Objective |
|---|---|---|
| kNN-Leiden | Cosine | Modularity |
| HDBSCAN | Euclidean | DBCV |
| VBGMM | Probabilistic | ELBO |
| FINCH | Cosine | Silhouette |
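The cosine distance used by kNN-Leiden and FINCH in Table 14 compares embedding directions and ignores magnitude. A minimal sketch on toy vectors:

```python
import numpy as np

# Cosine distance: d(u, v) = 1 - u·v / (||u|| ||v||).
def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
d_orthogonal = cosine_distance(u, v)   # orthogonal embeddings -> 1.0
d_identical  = cosine_distance(u, u)   # identical embeddings  -> 0.0
```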
Table 15.
Optuna search space for Stage 1 rough hyperparameter optimisation (Experiment II).
| Method | Hyperparameter | Optuna Search Space |
|---|---|---|
| kNN-Leiden | n_neighbors | [10, 50] |
| | resolution | [0.5, 2.0], step 0.1 |
| HDBSCAN | min_cluster_size | [20, 200], step 5 |
| | min_samples | [5, 50] |
| | cluster_selection_epsilon | [0.0, 1.0], step 0.05 |
| VBGMM | n_components | [5, 25] |
| | weight_concentration_prior | [0.001, 1.0] |
| | mean_precision_prior | [0.0001, 0.1] |
| FINCH | n_neighbors | [10, 50] |
Table 16.
Grid search space for Stage 2 fine hyperparameter optimisation (Experiment II). Each range is centred around the Optuna optimum.
| Method | Hyperparameter | Grid Specification |
|---|---|---|
| kNN-Leiden | n_neighbors | Spread: 8, count: 5, global bounds: [3, 150] |
| | resolution | Spread: 0.5, count: 5, global bounds: [0.1, 5.0] |
| HDBSCAN | min_cluster_size | Spread: 20, count: 5, global bounds: [5, 500] |
| | min_samples | Spread: 10, count: 5, global bounds: [1, 100] |
| | cluster_selection_epsilon | Spread: 0.2, count: 5, global bounds: [0.0, 2.0] |
| VBGMM | n_components | Spread: 3, count: 3, global bounds: [2, 30] |
| | weight_concentration_prior | Fixed grid: [0.01, 0.1, 1.0] |
| | mean_precision_prior | Fixed grid: [0.001, 0.01, 0.1] |
| FINCH | n_neighbors | Spread: 10, count: 5, global bounds: [3, 100] |
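The Stage 2 grid specification in Table 16 (spread, count, global bounds around the Stage 1 optimum) can be implemented directly; the centre value below is a hypothetical Stage 1 result, not a value reported in the paper:

```python
import numpy as np

def refine_grid(center, spread, count, lo, hi, integer=False):
    """Stage-2 grid: `count` points spanning center ± spread, clipped to
    the global bounds, as specified in Table 16."""
    pts = np.linspace(center - spread, center + spread, count)
    pts = np.clip(pts, lo, hi)            # enforce global bounds
    if integer:
        pts = np.round(pts).astype(int)
    return np.unique(pts)                 # clipping may create duplicates

# e.g. kNN-Leiden n_neighbors around a hypothetical Stage-1 optimum of 20:
grid = refine_grid(center=20, spread=8, count=5, lo=3, hi=150, integer=True)
```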
Table 17.
Intrinsic clustering metrics on a fixed random sample of 50,000 trajectory segments (Experiment II, Contribution 6). Higher ↑ is better for DBCV and modularity; lower ↓ is better for conductance. Learnt embeddings consistently outperform expert features across kNN-Leiden, FINCH, and VBGMM. Complete metric tables and additional traditional metrics are provided in Appendix D; metric properties and formulations are detailed in Appendix C.
| Method | DBCV ↑ (Expert) | Conductance ↓ (Expert) | Modularity ↑ (Expert) | Communities (Expert) | DBCV ↑ (Learnt) | Conductance ↓ (Learnt) | Modularity ↑ (Learnt) | Communities (Learnt) |
|---|---|---|---|---|---|---|---|---|
| kNN-Leiden | | | | 48 | ↑ | ↓ | ↑ | 47 |
| FINCH | | | | 8 | ↑ | ↓ | | 27 |
| HDBSCAN | | ↓ | – | 2 | ↑ | | – | 3 |
| VBGMM | | | | 28 | ↑ | ↓ | ↑ | 28 |
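The conductance metric reported in Table 17 measures, for a cluster S, the fraction of edge volume that leaves the cluster: phi(S) = cut(S, V\S) / min(vol(S), vol(V\S)), with lower values indicating better-separated clusters. A minimal sketch on a toy 5-node graph (a triangle weakly attached to a pair):

```python
import numpy as np

# Conductance of cluster S = {0, 1, 2} in a small undirected graph.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
S = np.array([True, True, True, False, False])

cut = A[S][:, ~S].sum()                     # edge weight leaving the cluster
vol_S, vol_rest = A[S].sum(), A[~S].sum()   # sums of member degrees
conductance = cut / min(vol_S, vol_rest)
```

In practice the metric is averaged (or volume-weighted) over all clusters of a partition; the formulation here covers a single cluster.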