Article

Representation Learning for Maritime Vessel Behaviour: A Three-Stage Pipeline for Robust Trajectory Embeddings

1 Intelligent Systems, Kiel University, 24118 Kiel, Germany
2 Intelligent Embedded Systems, Kassel University, 34121 Kassel, Germany
3 Database Systems and Data Mining, Kiel University, 24118 Kiel, Germany
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
J. Mar. Sci. Eng. 2026, 14(5), 507; https://doi.org/10.3390/jmse14050507
Submission received: 29 January 2026 / Revised: 28 February 2026 / Accepted: 4 March 2026 / Published: 8 March 2026
(This article belongs to the Special Issue Intelligent Solutions for Marine Operations)

Abstract

The growing complexity of maritime navigation creates safety challenges that drive the shift toward autonomous systems. Modelling maritime vessel behaviour is critical for safe and efficient autonomous operations, and representation learning offers a systematic approach to learning feature embeddings that encode vessel behaviour for improved situational awareness and decision-making. We introduce a three-stage representation learning pipeline and evaluate six architectures on real-world AIS trajectories. The Grouped Masked Autoencoder with Risk Extrapolation (GMAE-REx) combines group-wise masked autoencoding at the semantic feature level with risk extrapolation regularisation, forcing encoders to learn cross-group dependencies between temporal, kinematic, spatial, and interaction features. A Denoising Autoencoder (DAE) and an Evidential Autoencoder (EAE) provide robust and uncertainty-aware baselines. Evaluation uses a dual-pipeline framework on two years of Kiel Fjord AIS data (176,787 trajectories, 527,225 segments). Pipeline 1 applies three-stage representation learning with vessel-type classification as an encoder-selection probe: GMAE-REx achieves 86.03% validation accuracy, outperforming the DAE (85.63%), the EAE (85.56%), and the baselines Transformer (84.93%), TCN (76.27%), and LiST (85.12%). Pipeline 2 applies unsupervised clustering to discover intrinsic behavioural structure: the learnt representations consistently outperform expert features on DBCV, conductance, and modularity metrics, organising trajectories by operational context rather than vessel type. This behaviour-oriented organisation enables cross-vessel knowledge transfer for autonomous navigation, VTS monitoring, and safety analysis.

1. Introduction

Maritime transportation remains the backbone of global commerce, facilitating over 80% of international trade by volume and handling approximately 12.6 billion tonnes of cargo in 2024 [1]. The Automatic Identification System (AIS), mandated by the International Maritime Organisation (IMO) under the International Convention for the Safety of Life at Sea (SOLAS), has transformed maritime domain awareness by providing real-time vessel tracking data at an unprecedented scale [2]. Modern maritime stakeholders, including port authorities, vessel traffic services, and autonomous navigation systems, now have access to millions of AIS messages daily, creating rich spatio-temporal datasets that capture the complex dynamics of vessel movements across diverse operational contexts. The large-scale availability of trajectory data presents a compelling opportunity for understanding maritime behaviour at scale through Machine Learning (ML) approaches that could enhance Situational Awareness (SA), improve safety, and support the emerging paradigm of Automatic Maritime Operations (AMO) [3].
However, the promise of data-driven maritime intelligence remains largely unrealised in operational practice.
Despite the availability of vast unlabelled AIS data, most existing approaches rely on supervised learning paradigms that require extensive manual annotation of vessel behaviours—a process that is expensive, subjective, and inherently limited in scope [4,5].
Moreover, the maritime domain exhibits pronounced environmental heterogeneity: vessel behaviours manifest differently across varying traffic densities, geographical contexts, and temporal conditions. A cargo vessel approaching a busy port navigates fundamentally different conditions than the same vessel transiting open ocean, yet both represent instances of the same underlying navigational objective.
This environmental variation creates substantial challenges for learning representations that generalise beyond the specific conditions encountered during training.
The analysis of maritime trajectory data presents several fundamental challenges that limit the effectiveness of existing solutions. First, environmental heterogeneity and domain shift pose significant obstacles: representations learnt from one operational context—a particular port, traffic regime, or season—may fail to transfer to others, as standard learning methods implicitly assume that training and deployment conditions are drawn from the same distribution [6,7]. Second, unlike domains with homogeneous feature semantics such as pixel grids in computer vision, AIS-derived features exhibit semantic complexity. Temporal features encode when observations occur; kinematic features describe instantaneous motion state; spatial features characterise spatial context; density features quantify traffic conditions; and interaction features capture relationships with nearby vessels.
However, raw AIS broadcasts inherently lack this semantic categorisation, transmitting only basic navigational parameters, including position, speed, course, and heading, without distinguishing their functional roles in behavioural representation. This absence of structural organisation conceals the causal dependencies between feature categories—for instance, how temporal context and traffic density jointly influence kinematic decisions, or how interaction features emerge from the interplay of spatial positioning and relative motion. Consequently, ML models operating on unstructured feature sets may exploit superficial within-category correlations (such as temporal autocorrelation in speed measurements) without discovering the deeper cross-category relationships that characterise coherent navigational strategies [8]. Third, expert annotation of vessel behaviours is expensive, time-consuming, and inherently subjective [9]. The resulting label scarcity limits the applicability of supervised approaches and motivates unsupervised or self-supervised methods that can learn from large-scale unlabelled data.
The transition towards Maritime Autonomous Surface Ships (MASS) amplifies the urgency of developing robust behavioural representations. Autonomous navigation systems must operate reliably across diverse maritime environments, recognising normal navigational patterns, detecting anomalies, and making safe decisions even in conditions not explicitly covered by training data [10]. The European Maritime Safety Agency (EMSA) reported 2590 maritime incidents in 2023, with over 65% attributed to human decision failures related to inadequate SA [11].
While MASS promise to mitigate human error, realising this potential demands representations that remain valid across the distributional shifts between training environments and deployment scenarios—a challenge that conventional supervised learning approaches fail to address [12]. Furthermore, safety-critical maritime decision-making requires not only accurate predictions but also reliable quantification of prediction confidence, enabling autonomous systems to recognise when they encounter unfamiliar conditions and defer to human operators or adopt conservative strategies [13]. Current trajectory analysis methods lack principled mechanisms for distinguishing between uncertainty arising from inherent environmental stochasticity (aleatoric) and uncertainty stemming from insufficient training coverage (epistemic), limiting their applicability in safety-critical autonomous navigation; see [13].
This article addresses the fundamental challenge of learning robust behavioural representations from maritime AIS trajectory data that generalise across diverse environmental conditions. We propose a comprehensive three-stage representation learning pipeline that combines self-supervised pretraining with domain generalisation techniques to address the unique challenges of spatio-temporal maritime data. GMAE-REx (Group-wise Masked Autoencoder (MAE) with REx) combines structured masking at the semantic feature-group level with REx regularisation that penalises reconstruction strategies favouring particular environmental conditions, encouraging representations that work equally well across traffic densities and geographical contexts. We complement this novel architecture with a Denoising Autoencoder (DAE) that provides a strong baseline through reconstruction of noise-corrupted inputs [14] and an Evidential Autoencoder (EAE) that extends the denoising framework with principled uncertainty quantification through evidential regression, enabling distinction between epistemic and aleatoric uncertainties essential for safety-critical applications [13].
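To make the REx regularisation concrete, the following minimal sketch shows a V-REx-style objective applied to per-environment reconstruction risks. The β value and the three-environment example are illustrative assumptions, not the configuration used in our experiments.

```python
from statistics import fmean

def rex_loss(env_losses, beta=10.0):
    """V-REx-style objective: mean per-environment risk plus a
    variance penalty over the per-environment risks."""
    mean_risk = fmean(env_losses)
    # Penalising risk variance discourages reconstruction strategies
    # that favour particular environmental conditions.
    variance = fmean([(r - mean_risk) ** 2 for r in env_losses])
    return mean_risk + beta * variance

# Three hypothetical environments, e.g. traffic-density groups.
total = rex_loss([0.10, 0.12, 0.30], beta=10.0)
```

When all environments incur equal risk, the penalty vanishes and the objective reduces to the ordinary mean reconstruction loss.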
We evaluate learnt representations through two complementary experimental pipelines that together provide comprehensive assessment of representation quality. The first pipeline employs supervised classification using vessel type as the prediction target, with linear probing on frozen encoder representations to isolate representation quality from fine-tuning effects. This pipeline explicitly assesses robustness across environmental conditions through per-environment metrics, worst-group accuracy, and accuracy variance across traffic density quartiles. The second pipeline applies four unsupervised clustering methods (Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), kNN-Leiden, Variational Bayesian Gaussian Mixture Model (VBGMM), First Integer Neighbour Clustering Hierarchy (FINCH)) with intrinsic structure quality metrics (DBCV, conductance, modularity) to assess whether representations naturally organise trajectories into coherent behavioural groups aligned with operational context without explicit supervision.
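The linear-probing protocol of the first pipeline can be illustrated as follows; the embedding dimensionality, class count, and synthetic data below are placeholders standing in for frozen-encoder outputs, not our actual embeddings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Placeholder embeddings standing in for frozen-encoder outputs:
# 200 trajectory segments, 32-dimensional, 3 hypothetical vessel types.
Z_train = rng.normal(size=(200, 32))
y_train = rng.integers(0, 3, size=200)

# The probe is a single linear classifier trained on frozen features;
# its accuracy reflects how linearly separable the representation is,
# isolating representation quality from fine-tuning effects.
probe = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
train_acc = probe.score(Z_train, y_train)
```

Because no gradients reach the encoder, probe accuracy directly compares representations rather than fine-tuning capacity.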
This work contributes several novel insights to the field of maritime trajectory representation learning:
(i)
Comprehensive Processing Pipeline: We present an extensive AIS data processing pipeline that transforms raw positional broadcasts into richly annotated trajectory representations. This includes ship-level features (e.g., kinematic states, temporal encodings, vessel characteristics), ship-to-ship interaction features (e.g., Time to Closest Point of Approach (TCPA), Distance at Closest Point of Approach (DCPA), relative bearing, collision risk indicators), and ship-to-environment features (e.g., water depth from nautical charts, distance to land, proximity to restricted areas, traffic density estimates). The exhaustive nature of this feature engineering process, necessitated by the sparse and noisy characteristics of AIS data, represents a significant methodological contribution that enables downstream representation learning.
(ii)
Three-Stage Representation Learning Framework: We propose a systematic pipeline comprising self-supervised pretraining, linear probe evaluation, and full fine-tuning reference, with hyperparameter optimisation conducted exclusively on self-supervised validation loss to prevent information leakage from downstream tasks.
(iii)
Environment-Aware Self-Supervised Learning: We adapt REx from the domain generalisation literature to maritime trajectory representation learning, demonstrating its effectiveness for learning representations that generalise across traffic densities and geographical contexts.
(iv)
Semantic Feature Grouping: We introduce group-wise masking strategies that respect the semantic structure of AIS-derived features (temporal, kinematic, spatial, density, interaction), forcing models to learn cross-category relationships rather than exploiting within-category correlations.
(v)
Uncertainty-Aware Representations: We integrate evidential learning into the autoencoder framework, providing principled estimates of both epistemic and aleatoric uncertainty essential for safety-critical maritime applications.
(vi)
Comprehensive Dual-Pipeline Evaluation: We establish an evaluation methodology combining supervised linear probing for encoder selection and unsupervised clustering with intrinsic quality metrics (HDBSCAN, kNN-Leiden, VBGMM, FINCH) to assess both discriminative capacity and natural behavioural structure discovery.
(vii)
Reusable and Adaptable Framework: We present a modular representation learning framework that generalises across maritime geographical regions and downstream tasks. The three-stage pipeline can be deployed to different operational contexts by adapting the data processing module to local AIS sources and replacing the probing head (e.g., from vessel-type classification to collision risk regression) whilst preserving the self-supervised pretraining strategy. This design enables transfer learning for region-specific behaviour modelling and task-specific embedding optimisation without requiring large-scale labelled datasets.
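Contribution (iv), group-wise masking, can be sketched in a few lines. The feature names and the 40% mask ratio below are illustrative assumptions, not the exact engineered feature inventory used in our pipeline.

```python
import random

# Hypothetical semantic grouping of engineered AIS features.
FEATURE_GROUPS = {
    "temporal":    ["hour_sin", "hour_cos", "weekday"],
    "kinematic":   ["sog", "cog", "acceleration", "turn_rate"],
    "spatial":     ["water_depth", "dist_to_land"],
    "density":     ["traffic_density"],
    "interaction": ["tcpa", "dcpa", "relative_bearing"],
}

def sample_group_mask(mask_ratio=0.4, rng=random):
    """Mask entire semantic groups, never individual features.

    Reconstructing a fully hidden group forces the encoder to rely on
    cross-group dependencies (e.g. inferring kinematics from traffic
    density and temporal context) instead of within-group correlations.
    """
    groups = list(FEATURE_GROUPS)
    n_masked = max(1, round(mask_ratio * len(groups)))
    masked = set(rng.sample(groups, n_masked))
    return {feat: (g in masked)            # True => feature is hidden
            for g, feats in FEATURE_GROUPS.items() for feat in feats}

mask = sample_group_mask(mask_ratio=0.4, rng=random.Random(42))
```

In contrast, uniform random masking leaves partial within-group information visible, which the model can exploit through simple autocorrelation.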
The remainder of this article is organised as follows. Section 2 reviews related work in maritime trajectory analysis, self-supervised representation learning, and uncertainty quantification. Section 3 presents the comprehensive AIS data processing pipeline, including data sources, standardisation, trajectory extraction, feature engineering, and collision risk computation. Section 4 details the representation learning models, including the unified encoder–decoder architecture, and the specific implementations of GMAE-REx, DAE, EAE, and baseline architectures (Transformer, Temporal Convolutional Network (TCN), Linear Spatial–Temporal Feature Extractor (LiST)). Section 5 describes the dual-pipeline evaluation framework, experimental protocols, and performance metrics. Section 6 presents the experimental setup, dataset characteristics, and reports results for both supervised encoder selection (Experiment I) and unsupervised structural analysis (Experiment II). Section 7 discusses and interprets experimental results across both representation learning pipelines, analyses practical implications, and outlines study limitations. Finally, Section 8 summarises key findings and outlines directions for future research.

2. Background

Maritime transportation operates in a complex, dynamic, and inherently hazardous environment. IMO classifies maritime operations as safety-critical systems where failures can result in loss of life, environmental catastrophes, and significant economic damage [15]. EMSA reported over 2500 maritime incidents annually, with human error accounting for the majority of accidents [11]. These challenges motivate the development of advanced decision-support systems and autonomous capabilities to enhance maritime safety and operational efficiency.
The last decade has witnessed strong efforts towards autonomous vessel navigation. Existing technical studies and prototypical solutions for autonomous and semi-autonomous behaviour are primarily confined to standard assistance systems, such as open-sea autopilots, or focus on short-distance cargo transportation within coastal areas [16].

2.1. Situational Awareness

Situational awareness (SA) is a fundamental concept in autonomous systems, referring to the perception of environmental elements and events, comprehension of their meaning, and projection of their future status [17]. In the context of autonomous maritime navigation, SA encompasses the vessel’s ability to accurately detect and track surrounding entities, understand the maritime traffic situation and environmental conditions, and anticipate potential hazards or conflicts. Achieving robust SA is critical for safe and efficient autonomous vessel operation, as it forms the foundation for informed decision-making and collision avoidance.
Autonomous surface vessel navigation behaviour is based on continuous observation and assessment of the current environmental conditions—which is then turned into concrete navigation actions by an autonomous controller (e.g., [18]). Consequently, situational awareness is a prerequisite for autonomous navigation.
Situational awareness frameworks have been extensively studied across safety-critical domains, including aviation [17], nuclear power operations, and military command and control systems. These frameworks emphasise hierarchical perception, comprehension, and projection capabilities that are equally relevant to maritime autonomous systems.
In the last decade, various contributions have addressed aspects of situational modelling and awareness, including the reconstruction of ship trajectories from spatio-temporal historical data [19], the modelling of ship trajectories in safety-critical conditions—such as collision avoidance manoeuvres—using optimisation heuristics [20], the prediction of trajectories for the own ship and other ships in the vicinity using ML technologies such as artificial neural networks [21], the modelling of current constellations based on observational data [22], and the analysis of ship trajectories [23]. However, no unified approach for maritime navigational situation modelling and analysis has yet been established, partly owing to the high heterogeneity of available input data.

2.2. AIS for SA

The various maritime test environments oriented towards autonomous navigation (i.e., prototype test ships for experimentation purposes such as the “MS Wavelab” in Kiel, Germany [24]) have different sensor constellations, characteristics, and capabilities. The only reference data common to all use cases are AIS, which is an integral component of today’s maritime navigation and safety. AIS is an automatic tracking system used on ships and by Vessel Traffic Services (VTS) to identify and locate vessels by electronically exchanging data with other nearby ships, VTS stations, and satellites. This system allows real-time tracking of ships, providing vital data encompassing their identity, position, speed, and course, thus aiding vessel operators in collision avoidance [25]. Consequently, the IMO has mandated the adoption of AIS for vessels exceeding a certain size [15,26].
Despite its widespread adoption and critical role in maritime safety, AIS has inherent limitations that impact its effectiveness for SA; it is not without vulnerabilities, including the potential for errors and susceptibility to cyberattacks. These concerns have prompted examinations into the accuracy and integrity of position data transmitted via AIS, as in [27]. Jiang et al. [28] used Monte Carlo simulation to model probabilistic distributions of vessel size, vessel speed, and traffic volume for inland water transport. This research sheds light on the challenges of extracting knowledge from AIS data, which arise from factors such as data volume, incompleteness, noise, and unobserved vessels.
Nevertheless, AIS remains a prevailing choice within the research community for tracking maritime vessels efficiently and cost-effectively [29,30].
In terms of safety, AIS data have been used for traffic management, the identification of potentially dangerous locations for maritime transportation, and the provision of relevant information for traffic control. One critical application is the spatial analysis of near collisions to identify hazardous locations. As an example, Wu et al. [31] investigated vessel conflicts in the Southeast Texas waterway, examining features such as vessel size, spatial distribution, time of day, and collision risk levels using the Vessel Conflict Ranking Operator (VCRO) model. This research identified hotspots with high frequencies of vessel conflicts and evaluated the impact of time of day on conflict density in each hotspot. As an alternative, Yoo et al. [32] estimated near-collision density in coastal areas based on AIS data, pinpointing high collision-risk locations using parameters such as DCPA and TCPA derived from ship location, speed, and course data. Shelmerdine et al. [33] leveraged AIS data to generate vessel tracks and density maps, illustrating temporal variations between months and distinguishing between different vessels around the Shetland Islands. The analysis revealed variations in vessel routes, especially around island groups. In a related study, Jensen et al. [34] analysed cargo traffic outside San Francisco Bay, identifying dynamic changes in shipping traffic across seasons, years, and days. The study highlighted the challenge of documenting shifts in shipping traffic effectively.
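The DCPA and TCPA parameters used in such studies follow from the standard constant-velocity closest-point-of-approach formulation, sketched below in a local planar frame (the function name and units are ours, chosen for illustration).

```python
import math

def tcpa_dcpa(own_pos, own_vel, tgt_pos, tgt_vel):
    """Closest point of approach under a constant-velocity assumption.

    Positions in metres (local planar frame), velocities in m/s.
    Returns (tcpa_seconds, dcpa_metres); TCPA < 0 means the closest
    point lies in the past, i.e. the vessels are diverging.
    """
    rx, ry = tgt_pos[0] - own_pos[0], tgt_pos[1] - own_pos[1]
    vx, vy = tgt_vel[0] - own_vel[0], tgt_vel[1] - own_vel[1]
    v2 = vx * vx + vy * vy
    if v2 == 0.0:                       # identical velocities: range is constant
        return 0.0, math.hypot(rx, ry)
    tcpa = -(rx * vx + ry * vy) / v2    # time minimising the relative range
    dcpa = math.hypot(rx + vx * tcpa, ry + vy * tcpa)
    return tcpa, dcpa

# Head-on example: target 2000 m ahead with a 100 m lateral offset,
# both vessels at 5 m/s on reciprocal courses (10 m/s closing speed).
tcpa, dcpa = tcpa_dcpa((0, 0), (0, 5), (100, 2000), (0, -5))
```

In this example the vessels reach their closest point after 200 s at a miss distance of 100 m.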
As a second major aspect, anomaly and novelty detection (e.g., [13,35,36]) has been performed on AIS data with goals such as the identification of conspicuous ships, the detection of deviations from expected manoeuvres, or online observation and assessment. For instance, Chen et al. [37] developed a method to detect and restore anomalies in raw AIS data by exploiting ship manoeuvrability constraints. The approach classifies abnormal AIS trajectory points based on longitude (lon), latitude (lat), speed, acceleration, and heading information. Anomaly detection criteria include maximum and minimum acceleration thresholds, the maximum reachable distance between consecutive points, and the maximum angular displacement, all derived from vessel design specifications. The method was validated on 156 AIS messages from a 110 m cruise vessel operating at 10 s transmission intervals near the Xiamen International Cruise Center, demonstrating effective identification of drift, acceleration, and turning anomalies. In the context of self-adaptive systems, Goller et al. [38] investigated abnormal behaviour detection in maritime traffic by analysing AIS data from the Kiel Fjord and Kiel Canal region. This approach applies external observation metrics—including configuration stability, configuration coherence, and global parameter usage—to monitor collective vessel behaviour without requiring knowledge of individual navigation strategies. The study demonstrated the effectiveness of these metrics in detecting abnormal events such as the 2022 Kiel Canal bridge collision, where configuration stability exhibited a prominent peak immediately following the incident, providing early indicators for potential re-integration decisions in autonomous navigation systems. More recently, Gao et al. [13] proposed a novel framework named Federated Evidential Learning for Anomaly Detection of Ship Trajectories (FEAST) that advances maritime situational awareness through privacy-preserving and uncertainty-aware anomaly detection. This framework combines federated learning with evidential learning and employs a Transformer-based AutoEncoder architecture to model navigation behaviour from high-dimensional AIS data while explicitly quantifying both epistemic and aleatoric uncertainties. Evaluated on one year of AIS data from the Kiel Fjord, FEAST demonstrated superior performance in detecting out-of-distribution anomalies compared to traditional Variational AutoEncoders, providing interpretable uncertainty estimates that enhance SA for autonomous maritime systems.
Recent work has also applied deep learning directly to vessel trajectory classification. Kim et al. [39] proposed a supervised framework that converts AIS position sequences into trajectory images and fine-tunes a deep Convolutional Neural Network (CNN) initialised from ImageNet weights to classify vessel types, addressing domain shift through transfer learning from a large source domain. While this approach demonstrates the effectiveness of convolutional feature extraction for trajectory analysis, it is fully supervised and relies on a labelled training corpus (8679 vessels in total) that the authors themselves note may limit generalisation.

3. Data Processing and Trajectory Retrieval

The Automatic Identification System (AIS) is a maritime communication and tracking system in which vessels periodically broadcast navigational and identification information via Very High Frequency (VHF) radio to nearby ships and coastal authorities. Introduced by the International Maritime Organisation (IMO) in 2000, AIS was primarily designed to enhance navigational safety and support collision avoidance. Since December 2004, AIS transponders have been mandatory for all vessels subject to the requirements of the International Convention for the Safety of Life at Sea (SOLAS) [2,26].
In recent years, AIS data have been increasingly utilised for a wide range of commercial and research applications, including maritime traffic monitoring, route extraction, and vessel trajectory prediction [40]. However, previous studies have consistently reported the inherently noisy nature of AIS data [41]. This issue occurs particularly in manually entered fields, such as destination and draught, which are frequently incomplete, outdated, or erroneous. Furthermore, transmission errors, irregular reporting intervals, and temporary signal loss contribute to additional degradation of data quality.
To effectively exploit the large volume of available AIS data, we propose a preprocessing pipeline designed to extract, clean, and enrich vessel trajectories. The pipeline follows a modular structure, enabling straightforward adaptation to alternative data sources and facilitating the integration of additional features relevant for downstream analysis and prediction tasks.
AIS transmissions are associated with a Maritime Mobile Service Identity (MMSI) that uniquely identifies each vessel. The messages comprise dynamic information, including vessel position, Speed Over Ground (SOG), and Course Over Ground (COG), as well as static and voyage-related attributes such as ship type, vessel dimensions, and reported destination.

3.1. Data Sources and Decoding

The proposed preprocessing pipeline integrates multiple heterogeneous data sources, including raw and decoded AIS messages as well as geospatial map data, to establish a comprehensive foundation for vessel trajectory prediction and interaction modelling. The selected datasets complement each other with respect to temporal and spatial coverage as well as semantic richness.
  • AIS Data Sources
Two independent AIS data sources were utilised to ensure high temporal resolution in the port area while maintaining robust regional coverage.
(i)
Raw AIS Messages (Kiel University of Applied Sciences)
High-resolution AIS data were provided by Kiel University of Applied Sciences [42] and consist of raw AIS messages encoded according to the NMEA 0183 standard defined by the National Marine Electronics Association [43]. Each message contains metadata describing sentence fragmentation, communication channel, and timestamp information, as well as a payload encoding navigational data. The raw message format preserves the original temporal resolution of AIS broadcasts, enabling fine-grained analysis of vessel motion and short-term dynamics. Spatial coverage is strongest in the immediate vicinity of the Port of Kiel and gradually decreases with increasing distance from shore, particularly beyond approximately 10 nautical miles, as vessels leave the effective reception range of shore-based AIS receivers [44]. The dataset spans the years 2022 and 2023 and contains intermittent temporal gaps caused by receiver outages and adverse environmental conditions.
(ii)
Decoded AIS Data (Danish Maritime Authority)
To complement the raw AIS messages, decoded AIS records provided by the Danish Maritime Authority (DMA) [45] were incorporated. These data are distributed as Comma-Separated Values (CSV) files in which AIS messages are already decoded and represented in tabular form. Although records dating back to March 2006 are available, this study exclusively considers data from 2022 and 2023 to ensure temporal consistency across all data sources. Compared to the raw AIS messages, the decoded records exhibit reduced spatial resolution, as latitude and longitude values are rounded. However, they offer broader geographic coverage, with highest density in Danish waters and extended coverage into the western Baltic Sea, including the Port of Kiel.
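As an aside, the envelope of a raw NMEA 0183 AIS sentence can be parsed and checksum-verified with a few lines of standard Python. This sketch handles only the sentence envelope, not the 6-bit payload decoding, and is not the decoder used in our pipeline; the example payload is illustrative.

```python
def parse_aivdm(sentence: str):
    """Parse the envelope of a raw !AIVDM sentence and verify the
    NMEA 0183 checksum (XOR of all characters between '!' and '*')."""
    body, _, checksum = sentence.partition("*")
    calc = 0
    for ch in body[1:]:                     # skip the leading '!'
        calc ^= ord(ch)
    if f"{calc:02X}" != checksum.strip().upper():
        raise ValueError("checksum mismatch")
    fields = body.split(",")
    return {
        "talker": fields[0],                # "!AIVDM"
        "fragments": int(fields[1]),        # sentences in this message
        "fragment_no": int(fields[2]),
        "channel": fields[4],               # VHF channel A or B
        "payload": fields[5],               # 6-bit-armoured navigational data
        "fill_bits": int(fields[6]),
    }

# Build a syntactically valid example sentence (payload is illustrative);
# the checksum is computed with the same XOR rule the parser verifies.
body = "!AIVDM,1,1,,A,13u?etPv2;0n:dDPwUM1U1Cb069D,0"
cs = 0
for ch in body[1:]:
    cs ^= ord(ch)
msg = parse_aivdm(f"{body}*{cs:02X}")
```

Multi-fragment messages (fragments > 1) must be reassembled before payload decoding, which is one reason the fragmentation metadata is retained.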
  • Map and Environmental Data
To enrich AIS trajectories with environmental context, several geospatial datasets were integrated.
(i)
Water Depth (Denmark’s Depth Model)
Bathymetric data were obtained from Denmark’s Depth Model [46], which provides a gridded representation of water depth at a spatial resolution of 50 × 50 m. The dataset is referenced in the Lambert Conformal Conic projection (EPSG:3034), with depth values defined relative to Mean Sea Level (MSL) according to the Danish Mean Sea Level (DKSML) vertical reference system [46]. The model integrates satellite-derived bathymetry, airborne LiDAR measurements, and crowdsourced depth observations to construct a detailed and high-resolution representation of seabed topography.
(ii)
Water Depth (Hydrographic Survey Data)
Additional bathymetric information was sourced from hydrographic surveys conducted by the Federal Maritime and Hydrographic Agency [47]. This dataset covers the North Sea and Baltic Sea at the same spatial resolution of 50 × 50 m and is referenced to the Normalhöhennull (NHN) vertical datum within the ETRS89–UTM32 coordinate reference system. The underlying measurements were collected between 1994 and 2010 as part of joint mapping initiatives involving the Leibniz Institute for Baltic Sea Research [48].
(iii)
Land and Maritime Infrastructure Data
Additional maritime context, including coastline geometry [49], restricted navigation areas, and ferry routes, was extracted from OpenStreetMap through the Overpass API [50]. The Port of Kiel is characterised by dense ferry traffic, comprising multiple local ferry connections operating at regular intervals as well as international ferry routes linking Kiel with destinations such as Oslo and Göteborg. These routes are particularly relevant for modelling vessel interactions and traffic patterns in confined and highly trafficked waters.

3.2. Standardisation

A central objective of the preprocessing pipeline is the integration of multiple AIS data sources into a single, coherent dataset. To this end, all inputs are standardised to a unified schema, ensuring consistency across the two AIS data sources employed in this study. AIS transmissions comprise 27 distinct message types, with the primary distinction being between Class A and Class B messages (see Table 1). Class A messages provide more detailed information and are mandatory for SOLAS-class vessels, including cargo ships, tankers, and commercial passenger vessels. In contrast, recreational craft and smaller commercial vessels are only required to transmit the more limited Class B messages.
Based on these message types, two relational tables are constructed. During this process, all navigational fields are standardised across data sources in accordance with the recommendations of the International Telecommunication Union [51].
The first contains the dynamic trajectory information that lies within the defined geographical area of interest (see Table 2).
The second serves as the vessel database and is constructed from static AIS reports. It stores the vessel type and dimensional information required for geometric and interaction-aware modelling (see Table 3).
The static vessel metadata are further enriched using commercial web services (MyShipTracking) [52] for vessels with missing or incomplete entries. Invalid, inconsistent, or malformed AIS messages are discarded during this stage.
  • Maps
To enrich AIS trajectories with additional environmental and navigational context, several geospatial data sources are incorporated. These include bathymetric information as well as polygon-based map layers extracted from the Overpass API (see Figure 1).
All positional data from these sources are transformed into a common geographic coordinate reference system (EPSG:4326) prior to further processing and feature extraction.

3.3. Trajectory Construction

Following standardisation, individual AIS messages are aggregated per MMSI and transformed into continuous vessel trajectories suitable for downstream analysis and prediction tasks. This stage focuses on organising temporally ordered position reports into coherent motion sequences while mitigating noise, irregular sampling, and artefacts inherent to AIS transmissions. The trajectory construction process consists of spatial and kinematic filtering, temporal segmentation into distinct movement episodes, and smoothing and interpolation of vessel motion. Together, these steps yield structured trajectories that preserve navigational behaviour while remaining robust to measurement errors and reporting inconsistencies.
  • Spatial and Kinematic Filtering
Trajectory construction begins with a naive aggregation of temporally ordered position reports for each unique MMSI. From these preliminary trajectories, physically implausible positions are identified and flagged. Such positions include reports located on land or exhibiting unrealistically high speeds.
While conventional AIS preprocessing pipelines typically employ a speed threshold of 30 kn for commercial vessels [53], this study adopts a threshold of 40 kn to accommodate high-speed patrol vessels operating in the Port of Kiel, which routinely exceed conventional merchant vessel speeds during security and surveillance operations.
For outlier detection, speed is not taken from the transmitted speed-over-ground field but is instead computed from consecutive spatial positions to ensure consistency across data sources.
Importantly, flagged outliers are not immediately removed but are retained as informative cues for subsequent trajectory segmentation. Removing isolated points prematurely can introduce additional artefacts when the underlying issue corresponds to a sustained positional jump rather than a single erroneous report.
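As a minimal sketch of the position-derived speed check described above (function names and the tuple layout are ours; the 40 kn threshold follows the text), the outlier flagging could look like this. Note that flagged reports are only marked, not dropped, mirroring the retention strategy above:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points in metres."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def flag_speed_outliers(reports, max_kn=40.0):
    """Flag reports whose position-derived speed exceeds max_kn.

    `reports` is a time-ordered list of (timestamp_s, lat, lon) tuples.
    Speed is computed from consecutive positions rather than the
    transmitted SOG field. Returns one boolean per report (True = flagged).
    """
    mps_per_kn = 0.514444  # 1 knot in metres per second
    flags = [False] * len(reports)
    for i in range(1, len(reports)):
        t0, la0, lo0 = reports[i - 1]
        t1, la1, lo1 = reports[i]
        dt = t1 - t0
        if dt <= 0:
            flags[i] = True  # non-increasing timestamps are also invalid
            continue
        speed_kn = haversine_m(la0, lo0, la1, lo1) / dt / mps_per_kn
        if speed_kn > max_kn:
            flags[i] = True
    return flags
```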
  • Trajectory Segmentation
AIS transmissions can be manually activated and deactivated, resulting in temporal gaps that lead to erroneous interpolations if not handled appropriately. In addition, AIS transmissions often remain active while vessels are moored, generating misleading motion artefacts during movement analysis.
To address these issues, a four-stage trajectory segmentation is applied to extract valid sub-trajectories for further preprocessing. The segmentation steps are applied iteratively and in the following order:
(i)
Outlier-based segmentation: Flagged outliers are used as potential split points. However, split trajectories are reconnected if the transition between neighbouring points (excluding the outlier) does not result in physically implausible speeds.
(ii)
Geospatial segmentation: Trajectories are split when they reach the boundary of the defined area of interest or enter a berthing area. This step separates local ferry routes and excludes stationary berthing manoeuvres.
(iii)
Observation gap segmentation: Trajectories are split when large temporal gaps between two reports occur. As straight trajectories can be interpolated more reliably than turning trajectories, the tolerated time gap varies between 2 and 5 min depending on the course of the trajectory.
(iv)
Motion-based segmentation: Trajectories are split when a vessel is assumed to be anchored. This condition is met if the vessel exhibits speeds below 0.3 kn for more than one hour or remains within a radius of 100 m for at least 5 min.
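The observation-gap step (iii) can be sketched as follows. This is illustrative only: the paper does not specify its exact turning criterion, so the course-change test and parameter names (`turn_gap_s`, `turn_course_deg`) are our assumptions; only the 2 min and 5 min tolerances come from the text.

```python
def split_on_gaps(points, turn_gap_s=120.0, straight_gap_s=300.0, turn_course_deg=10.0):
    """Split a trajectory at large temporal gaps (segmentation stage iii).

    `points` is a time-ordered list of (timestamp_s, course_deg) samples.
    The tolerated gap shrinks from 5 min on straight runs to 2 min while
    turning, approximated here by the course change across the gap.
    Returns a list of index lists, one per resulting segment.
    """
    segments, current = [], [0]
    for i in range(1, len(points)):
        gap = points[i][0] - points[i - 1][0]
        # wrap-aware course difference in [0, 180]
        course_change = abs((points[i][1] - points[i - 1][1] + 180.0) % 360.0 - 180.0)
        tolerance = turn_gap_s if course_change > turn_course_deg else straight_gap_s
        if gap > tolerance:
            segments.append(current)
            current = [i]
        else:
            current.append(i)
    segments.append(current)
    return segments
```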
  • Smoothing and Interpolation
AIS positions have been shown to deviate substantially from true vessel positions due to measurement noise and transmission errors [44]. To mitigate these effects, a Kalman filter with a constant-velocity motion model is applied to smooth vessel trajectories using position, speed, and course information [54]. Following smoothing, trajectories are interpolated using a Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) [55]. Interpolation is performed on a fixed temporal grid with a step size of 5 s to align all trajectories in time for consistent vessel interaction modelling. As interpolation can reintroduce invalid positions, such as points on land or unrealistically low-speed artefacts, the trajectory segmentation criteria for land proximity and stationary behaviour are re-applied as a final validation step.
All kinematic quantities derived from the trajectory—including speed, course, and Rate-of-Turn (ROT)—are computed from the interpolated position sequence rather than from the transmitted AIS fields. The broadcast ROT, SOG, and COG values are known to be unreliable in practice, as they depend on gyrocompass calibration, transmitter configuration, and update frequency. These frequently deviate substantially from the motion implied by consecutive positional fixes [44,56]. The position-derived kinematic features therefore constitute the sole reference representation used by all downstream components. Furthermore, the observation-gap segmentation step (see Section 3.3) applies a curvature-dependent gap tolerance—splitting trajectories at gaps exceeding 2 min during turning manoeuvres versus 5 min on straight segments—which prevents the constant-velocity Kalman model from being applied across extended high-ROT intervals.
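The resampling step onto the fixed 5 s grid can be sketched with SciPy's shape-preserving PCHIP interpolator (the function name `resample_trajectory` is ours; the Kalman smoothing stage is assumed to have run beforehand):

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def resample_trajectory(t, lat, lon, step_s=5.0):
    """Resample a smoothed trajectory onto a fixed 5 s temporal grid.

    `t` holds strictly increasing timestamps in seconds; lat/lon are the
    Kalman-smoothed coordinates. PCHIP is shape-preserving, so the
    interpolated path does not overshoot between position fixes.
    """
    t = np.asarray(t, dtype=float)
    grid = np.arange(t[0], t[-1] + 1e-9, step_s)
    lat_i = PchipInterpolator(t, lat)(grid)
    lon_i = PchipInterpolator(t, lon)(grid)
    return grid, lat_i, lon_i
```

Interpolating each coordinate independently against time (rather than lat against lon) keeps the method valid for looping or self-intersecting tracks.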
Figure 2 provides a qualitative assessment of the pipeline’s behaviour under high-ROT conditions. Two representative 10 min trajectory segments from sailing vessels are shown; sailing vessels are selected because their frequent tacking manoeuvres—driven by wind conditions—produce multiple direction reversals within a single 10 min window, representing the most demanding case for the constant-velocity smoother. In each panel, blue dots indicate raw AIS position reports and the red line shows the fully preprocessed trajectory after Kalman smoothing and PCHIP interpolation. Across the two examples, the pipeline faithfully preserves the overall shape of the manoeuvres while suppressing positional noise, and the interpolated path connects observed positions without introducing spurious artefacts.
Figure 3 illustrates the cumulative effect of the complete preprocessing pipeline on a sample of 100 trajectories from the Port of Kiel area. The comparison clearly demonstrates the transformation from noisy, unstructured raw AIS position reports to clean, continuous vessel trajectories suitable for downstream machine learning tasks.

3.4. Train/Test Split Strategy

Prior to feature generation, the dataset is partitioned into training and test sets to prevent data leakage. Since vessel interactions are central to both the analysis and prediction tasks, splitting the data by MMSI is infeasible. Instead, the split is performed along the temporal dimension. This strategy introduces a trade-off between maintaining trajectory integrity and ensuring diversity across seasons, weekdays, and spatial regions in both subsets. To minimise trajectory fragmentation, data are split at midnight, a time period characterised by reduced maritime traffic. Individual dates are then randomly assigned to either the training or test set. For the Port of Kiel dataset, this approach results in fewer than 0.5% of trajectories being split.
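A minimal sketch of this date-level assignment (the train fraction and seed are our assumptions; the paper only specifies random assignment of whole dates):

```python
import random

def split_dates(dates, train_frac=0.8, seed=42):
    """Randomly assign whole calendar dates to train or test (split at midnight).

    Assigning entire days keeps trajectories intact except for the small
    fraction crossing midnight; the text reports < 0.5% split trajectories.
    Returns (train_dates, test_dates) as disjoint sets.
    """
    rng = random.Random(seed)
    shuffled = sorted(dates)  # sort first so the split is input-order independent
    rng.shuffle(shuffled)
    n_train = int(round(train_frac * len(shuffled)))
    return set(shuffled[:n_train]), set(shuffled[n_train:])
```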
Additionally, this temporal separation enables the combination of multiple AIS data sources without introducing duplicates or inconsistencies. In particular, high-resolution but temporally incomplete AIS data from Kiel University of Applied Science can be complemented with the lower-resolution but temporally continuous dataset provided by the DMA while maintaining a clean separation between training and test data, as illustrated in Figure 4.

3.5. Feature Generation

To obtain a feature-rich dataset, the processed vessel trajectories are enriched with contextual, kinematic, and interaction-based features.
  • Geospatial Features
To provide additional environmental context, several geospatial features are computed for each trajectory point. These include the distance to the nearest shoreline, ferry route, and restricted navigation area, as well as the local water depth. In addition, historic vessel density maps are constructed from the trajectories in the training set. Separate density maps are generated for each vessel group as well as a global density map comprising all vessels. To improve generalisation, the density maps are smoothed using a Gaussian kernel, as illustrated in Figure 5.
  • Ship-level Features
From the interpolated trajectory positions, additional kinematic features are derived, including speed, acceleration, course, and rate of turn. On a per-sample basis, these derived features closely resemble the transmitted SOG and COG values and often exhibit smoother and more plausible behaviour. An exception occurs at very low speeds, where small positional fluctuations can lead to large variations in the derived course. In these cases, the transmitted course information is retained and used as a more reliable reference.
  • Ship-to-Ship Interaction Features
Interactions between vessels are of particular interest for trajectory prediction and collision avoidance tasks. Therefore, additional interaction features are computed whenever two vessels are within a distance of 2 km of each other. These interaction features include relative motion characteristics such as relative speed, course, and bearing. Relative bearing is especially important for navigation, as it is a determining factor for right-of-way according to the Convention on the International Regulations for Preventing Collisions at Sea (COLREGs).
Another key navigational metric is the Closest Point of Approach (CPA), which is defined as the point at which two vessels would reach their minimum separation distance if they were to maintain their current course and speed. Given the large variation in vessel dimensions within the confined waters of the Port of Kiel, vessel dimensions are incorporated into the computation of the Distance at Closest Point of Approach (DCPA) and the Time to Closest Point of Approach (TCPA) [57]. Since a small DCPA does not necessarily indicate a critical situation, for example, when vessels travel in parallel, a collision risk index based on [58] is employed. This provides a more reliable indication of which vessels are at risk of colliding and is defined as:
$$\mathrm{collision\_risk} = \left(1 - \frac{\mathrm{TCPA}}{W_{\mathrm{tcpa}}}\right) \cdot \left(1 - \frac{\mathrm{DCPA}}{W_{\mathrm{dcpa}}}\right).$$
TCPA and DCPA are normalised using weighting factors and clipped to the interval $[0, 1]$. The weighting parameters $W_{\mathrm{tcpa}}$ and $W_{\mathrm{dcpa}}$ were selected based on interviews with navigation officers reported in the referenced study. The distance weighting factor is set to $500\,\mathrm{m} \times 1.25$, where 500 m is considered a safe passing distance in congested waters, with an additional 25% safety margin. The time weighting factor is set to 10 min and is further adjusted based on the relative angle of approach. Head-on encounters are considered less critical, as they are more easily detected by navigators and onboard systems [58], and are therefore assigned a lower collision risk.
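The risk index above can be computed as follows (a sketch with our function name; the head-on adjustment of the time weighting factor is omitted for brevity):

```python
def collision_risk(tcpa_s, dcpa_m, w_dcpa_m=500.0 * 1.25, w_tcpa_s=600.0):
    """Collision risk index as the product of two normalised, clipped terms.

    Each term is normalised by its weighting factor and clipped to [0, 1],
    so the product is high only when the vessels are both close (small DCPA)
    and soon at their closest point of approach (small TCPA). Defaults follow
    the text: W_dcpa = 500 m + 25% margin, W_tcpa = 10 min.
    """
    t_term = min(max(1.0 - tcpa_s / w_tcpa_s, 0.0), 1.0)
    d_term = min(max(1.0 - dcpa_m / w_dcpa_m, 0.0), 1.0)
    return t_term * d_term
```

The multiplicative form means a small DCPA alone (e.g., vessels travelling in parallel with a large TCPA) does not produce a high risk score, matching the motivation given above.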

4. Representation Learning Models

To comprehensively evaluate the effectiveness of different representation learning paradigms in ship trajectory analysis, this paper presents a systematic comparison of six deep learning models spanning three representative architectural categories:
  • Convolution-based models, which capture local temporal dependencies through sliding receptive fields and are well suited for modelling short-term dynamic patterns;
  • Transformer-based models, which employ attention mechanisms to model long-range temporal dependencies and global temporal relationships;
  • Linear and lightweight sequence models, which enable efficient modelling of global temporal structures with reduced architectural complexity.
To ensure fairness and reproducibility, all models are implemented within a unified encoder–decoder representation learning framework and evaluated under consistent input/output definitions, training protocols, and experimental settings. Within this framework, Section 4.1 first formulates ship trajectory data in a mathematical manner and introduces the unified encoder–decoder architecture adopted in this work, including precise definitions of model inputs, latent feature representations, and reconstruction or prediction objectives. Subsequent subsections then describe the specific implementations and modelling characteristics of different representation learning methods within this unified framework.

4.1. Overall Architecture

  • Ship Trajectory Definition
Ship trajectories are modelled as high-dimensional multivariate time series. Each trajectory sample is represented as a matrix $X \in \mathbb{R}^{L \times F}$, where $L$ denotes the trajectory length and $F$ the feature dimension at each time step, including position, velocity, heading, and other navigation-related attributes. Let $x_t \in \mathbb{R}^{F}$ denote the observation vector at time step $t$; the trajectory can then be expressed as
$$X = \begin{bmatrix} x_1 & x_2 & \cdots & x_L \end{bmatrix}^{\top} \in \mathbb{R}^{L \times F}.$$
During model training, trajectory data are provided in batches. Given a batch size $B$, a batch is denoted as $\mathcal{X} = \{X_i\}_{i=1}^{B}$, and the full dataset is represented as $\mathcal{D} = \{X_i\}_{i=1}^{N}$, where $N$ denotes the total number of trajectory samples. For trajectories of varying length, a fixed length $L$ is adopted and shorter sequences are zero-padded.
  • Encoder and Latent Representation
Within the unified framework, all models share the same high-level structural definition. The encoder, denoted by $E(\cdot)$, maps an input trajectory $X$ into a latent representation space:
$$Z = E(X),$$
where $Z$ represents a low-dimensional or high-level semantic representation of the trajectory, capturing ship motion patterns, temporal dependencies, and multivariate interactions. The primary differences among representation learning methods lie in the design of the encoder architecture, such as convolutional structures, attention mechanisms, or linear mappings.
  • Decoder and Reconstruction Objective
The decoder $D(\cdot)$ takes the latent representation $Z$ as input and produces a reconstructed trajectory $\hat{X}$:
$$\hat{X} = D(Z).$$
During the unsupervised representation learning stage, the model learns discriminative representations by minimising the reconstruction error between the input and reconstructed trajectories. The reconstruction objective is defined as
$$\min_{\theta} \; \mathcal{L}_{\mathrm{ssl}} = \lVert X - \hat{X} \rVert_2^2,$$
where $\theta$ denotes all learnable parameters of the encoder and decoder. This reconstruction-based learning process does not rely on manual annotations and allows effective utilisation of large-scale unlabelled ship trajectory data.
  • Classification Head and Representation Evaluation
To quantitatively assess the quality of the learnt trajectory representations, a lightweight classification head $C(\cdot)$ is introduced on top of the encoder output $Z$, which predicts the behaviour category of each trajectory:
$$\hat{y} = C(Z).$$
In this setting, the classification task is used only during evaluation to examine the discriminative capability of the latent representations in downstream tasks, without altering the unsupervised training objective of the representation learning stage.
  • Training Paradigm and Fine-tuning Strategies
A two-stage training paradigm of “unsupervised pretraining + supervised fine-tuning” is adopted. First, the model is pretrained in an unsupervised manner using the trajectory reconstruction task to learn general-purpose trajectory representations. Subsequently, when a limited amount of labelled data is available, a classification head is added for fine-tuning. Two fine-tuning strategies are considered:
  • Freeze Fine-tuning: the encoder parameters are kept fixed, and only the classification head is optimised;
  • Joint Fine-tuning: the encoder and classification head are jointly optimised in an end-to-end manner.
This unified architecture provides a structurally aligned basis for comparing different representation learning methods, enabling a systematic analysis of their modelling capabilities and applicability in ship trajectory representation learning.

4.2. Temporal Convolutional Network

Temporal Convolutional Networks (TCNs) [59] extend one-dimensional CNNs to time-series modelling by capturing temporal dependencies through convolution operations along the time axis. Compared with recurrent models, convolution-based approaches enable efficient parallel computation and stable training, while shared kernels facilitate modelling recurring local temporal patterns.
Given an input sequence of length $L$, convolution kernel size $K_c$, stride $S$, and padding $P$, the output length after one-dimensional convolution is
$$L' = \left\lfloor \frac{L + 2P - K_c}{S} \right\rfloor + 1.$$
At temporal position $t$, the convolution operation is defined as
$$z_t = \sum_{k=0}^{K_c - 1} W_k \, x_{t \cdot S + k - P}, \qquad t = 1, \dots, L',$$
where $W_k$ denotes the convolution kernel parameters and $z_t$ is the output representation at time step $t$. After linear projection or interpolation, the output sequence can be aligned to the original length $L$ and used as the unified encoder output representation $Z$.
While standard 1D-CNNs are effective at modelling local temporal patterns, their receptive field is limited by the kernel size, restricting the ability to capture long-range temporal dependencies. TCNs address this limitation by introducing dilated convolutions, which enlarge the receptive field without significantly increasing model complexity. For the $l$th TCN layer with dilation factor $d(l)$ and kernel size $K_c$, the dilated convolution at time step $t$ is computed as
$$Z_t^{(l)} = \sum_{k=0}^{K_c - 1} W_k^{(l)} \, Z_{t - d(l) \cdot k}^{(l-1)},$$
where $Z^{(l-1)}$ denotes the feature representation from the previous layer. Causal padding is applied when indices fall outside the valid range to preserve temporal ordering. Under this formulation, the effective receptive field of a single TCN layer is given by
$$R(l) = 1 + (K_c - 1) \, d(l).$$
By stacking multiple dilated convolution layers with exponentially increasing dilation factors (e.g., $d(l) = 2^{l-1}$), the receptive field of a TCN grows exponentially with network depth, enabling modelling of long-range temporal dependencies.
In this work, a stride of $S = 1$ together with appropriate padding is adopted, so that the output length of each layer matches the input length, allowing the TCN encoder to be seamlessly integrated into the unified time-series representation learning framework.
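The exponential growth of the receptive field can be verified with a small helper (our own illustration, summing the per-layer contributions under the dilation schedule named above):

```python
def tcn_receptive_field(kernel_size, num_layers):
    """Total receptive field of stacked dilated causal convolutions.

    Layer l uses dilation d(l) = 2**(l-1), so each layer adds
    (K_c - 1) * d(l) steps and the stack covers
    1 + (K_c - 1) * (2**num_layers - 1) time steps in total.
    """
    field = 1
    for l in range(1, num_layers + 1):
        field += (kernel_size - 1) * 2 ** (l - 1)
    return field
```

For example, with a kernel size of 2, eight layers already cover 256 time steps, i.e., over 21 minutes of trajectory at the 5 s sampling grid used here.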

4.3. Linear Spatial–Temporal Feature Extractor

To enable efficient and practical representation learning for high-dimensional time-series data, this paper adopts Linear Spatial–Temporal Feature Extractor (LiST) [60] as a representative linear sequence modelling approach. LiST is designed to construct a lightweight encoder with spatio-temporal modelling capability while avoiding the high computational overhead of convolution and self-attention, making it well suited for large-scale, high-dimensional ship trajectory analysis.
In contrast to convolution-based or attention-based models, LiST is built entirely upon linear mappings, residual structures, and lightweight attention mechanisms. Its central idea is to apply linear feature transformations separately along the temporal and feature dimensions, thereby explicitly modelling temporal evolution patterns and multivariate interactions. This design substantially reduces model complexity while retaining the ability to capture discriminative spatio-temporal structural information.
Conceptually, LiST employs a dual-branch encoding scheme, corresponding to two different orders of spatio-temporal feature extraction: spatial → temporal and temporal → spatial. Given an input trajectory $X$, the outputs of the two branches are formulated as
$$Z_{\mathrm{ref}} = T(S(X)), \qquad Z_{\mathrm{flip}} = S(T(X)),$$
where $S(\cdot)$ and $T(\cdot)$ denote linear feature extraction modules along the feature and temporal dimensions, respectively. This dual-path design mitigates structural bias introduced by a single modelling order, leading to more robust spatio-temporal representations.
Finally, the outputs of the two branches are fused to produce a unified trajectory representation:
$$Z_{\mathrm{LiST}} = Z_{\mathrm{ref}} \,\|\, Z_{\mathrm{flip}} \quad \text{or} \quad Z_{\mathrm{LiST}} = Z_{\mathrm{ref}} + Z_{\mathrm{flip}},$$
where concatenation ($\|$) preserves richer structural information, while additive fusion offers higher computational efficiency in high-dimensional scenarios.
Overall, LiST does not aim to introduce complex nonlinear modelling, but instead achieves a balance between efficiency, stability, and representational capacity through an all-linear design. As a deployment-friendly spatio-temporal representation learning method, it provides effective latent representations for subsequent classification and analysis tasks.

4.4. Vanilla Transformer

Originally developed for natural language processing, the Transformer architecture is directly applied here to time-series data, where self-attention is performed across all time steps. The Vanilla Transformer [61] models dependencies among all temporal positions through self-attention, without relying on recurrent or convolutional structures. This design enables direct modelling of global temporal dependencies and supports fully parallel computation along the temporal dimension.
Given an input trajectory sequence $X \in \mathbb{R}^{L \times F}$, the feature dimension is first projected to a model dimension $D$ via a linear mapping, and positional embeddings are added to preserve temporal order information:
$$Z^{(0)} = X W_{\mathrm{in}} + E, \qquad W_{\mathrm{in}} \in \mathbb{R}^{F \times D}.$$
In the $l$th Transformer encoder layer, the multi-head self-attention mechanism models interactions among time steps using queries, keys, and values. For the $h$th attention head, the computation is given by
$$Q_h = Z^{(l-1)} W_Q^{(l,h)}, \qquad K_h = Z^{(l-1)} W_K^{(l,h)}, \qquad V_h = Z^{(l-1)} W_V^{(l,h)},$$
and scaled dot-product attention is used to compute temporal correlation weights:
$$A_h = \mathrm{softmax}\!\left(\frac{Q_h K_h^{\top}}{\sqrt{D/H}}\right), \qquad O_h = A_h V_h,$$
where $H$ denotes the number of attention heads and $D/H$ is the dimension per head. The outputs of the attention heads are concatenated and linearly projected to form the output of the multi-head attention sublayer. Together with the Feed-Forward Network (FFN), residual connections, and layer normalisation, they constitute a Transformer encoder layer.
By stacking multiple encoder layers, a temporal latent representation $Z \in \mathbb{R}^{L \times D}$ is obtained as the output of the unified encoder framework.

4.5. Denoising Autoencoder

Denoising Autoencoders (DAEs) [14] introduce noise perturbations at the input level and train an encoder–decoder model to reconstruct the clean sequence from the corrupted input, thereby promoting the learning of stable temporal semantics in the latent representation.
Given a trajectory sequence $X \in \mathbb{R}^{L \times F}$, a noisy input is constructed as
$$\tilde{X} = X + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2 I),$$
where $\sigma$ controls the noise magnitude (dae_noise). When a padding mask is present, noise is applied only to valid time steps.
The DAE employs a Transformer encoder–decoder backbone. Through linear projection and positional encoding, $\tilde{X}$ is mapped to a latent space and encoded into a representation $Z \in \mathbb{R}^{L \times D}$, which is subsequently decoded to obtain the reconstructed sequence $\hat{X} \in \mathbb{R}^{L \times F}$.
The self-supervised objective minimises the denoising reconstruction error over valid time steps:
$$\mathcal{L}_{\mathrm{ssl}}^{\mathrm{DAE}} = \lambda_{\mathrm{rec}} \, \frac{1}{\sum_{t=1}^{L} m_t} \sum_{t=1}^{L} m_t \, \lVert \hat{x}_t - x_t \rVert_2^2,$$
where $m \in \{0, 1\}^{L}$ denotes the valid time-step mask and $\lambda_{\mathrm{rec}}$ is the loss weight.
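A NumPy sketch of the corruption and masked denoising loss (the function name is ours, and `x_hat_fn` stands in for the Transformer encoder–decoder, which is not reproduced here):

```python
import numpy as np

def dae_corrupt_and_loss(x, x_hat_fn, mask, sigma=0.1, seed=0):
    """Denoising step: corrupt valid time steps, reconstruct, masked MSE.

    x: (L, F) clean sequence; mask: (L,) 0/1 validity mask; x_hat_fn maps
    a corrupted (L, F) array to a reconstruction of the same shape.
    Noise is added only where mask == 1, and the loss averages the squared
    reconstruction error over valid time steps only.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=x.shape) * mask[:, None]
    x_tilde = x + noise
    x_hat = x_hat_fn(x_tilde)
    per_step = ((x_hat - x) ** 2).sum(axis=1)  # squared error per time step
    return x_tilde, (per_step * mask).sum() / mask.sum()
```

With an identity "model" the loss reduces to the average injected noise energy, which makes the masking behaviour easy to sanity-check.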

4.6. Evidential Autoencoder

Building upon the DAE, Evidential Autoencoders (EAEs) [13] further model the decoder outputs as parameters of an evidential distribution, thereby constraining not only the reconstruction mean but also the predicted uncertainty structure.
EAE adopts the same input perturbation scheme as DAE (Equation (16)) while producing Normal-Inverse-Gamma (NIG) parameters at the decoder. Given the latent representation $Z \in \mathbb{R}^{L \times D}$ obtained from the Transformer encoder, the EAE decoder outputs four groups of NIG parameters:
$$(\mu, v, \alpha, \beta) = D_{\mathrm{Trm}}(Z), \qquad \mu, v, \alpha, \beta \in \mathbb{R}^{L \times F},$$
where $\mu$ represents the deterministic reconstruction mean and is directly used in downstream tasks, while the remaining parameters characterise the evidence strength and distribution shape.
The self-supervised objective of EAE consists of an evidential negative log-likelihood and a regularisation term:
$$\mathcal{L}_{\mathrm{ssl}}^{\mathrm{EAE}} = \lambda_{\mathrm{nll}} \mathcal{L}_{\mathrm{nll}} + \lambda_{\mathrm{reg}} \mathcal{L}_{\mathrm{reg}},$$
where $\mathcal{L}_{\mathrm{nll}}$ denotes the average NIG negative log-likelihood over valid time steps, and $\mathcal{L}_{\mathrm{reg}}$ penalises excessive confidence. Specifically, the regularisation term is defined as
$$\mathcal{L}_{\mathrm{reg}} = \mathbb{E}\!\left[ (X - \mu)^2 \, (2v + \alpha) \right],$$
with the expectation computed only over valid time steps using the padding mask.

4.7. GroupMAE

GroupMAE is designed for high-dimensional ship trajectory sequences, where $X \in \mathbb{R}^{L \times F}$ and $F$ denotes the feature dimension. Its key characteristic is that masking is performed on feature groups rather than individual feature dimensions, and the selected feature groups are masked consistently across all time steps.
Feature groups $\mathcal{G} = \{g_1, \dots, g_G\}$ are constructed based on feature names, where each $g_j$ corresponds to a subset of feature indices. For each sample, $n_{\mathrm{mask}}$ groups are randomly selected for masking, as determined by the mask rate and the minimum number of masked groups. This results in a binary mask tensor $M \in \{0, 1\}^{L \times F}$, and the masked input is constructed as
$$\tilde{X} = X \odot (1 - M),$$
where ⊙ denotes element-wise multiplication. When a padding mask is present, masking is applied only to valid time steps, preventing padded positions from contributing to training.
GroupMAE adopts a Transformer encoder–decoder architecture to encode $\tilde{X}$ into a latent representation $Z \in \mathbb{R}^{L \times D}$ and decode it to obtain the reconstructed sequence $\hat{X} \in \mathbb{R}^{L \times F}$, as illustrated in Figure 6.
The reconstruction loss is computed only at masked positions:
$$\mathcal{L}_{\mathrm{rec}} = \frac{1}{\lVert M \rVert_1} \left\lVert (\hat{X} - X) \odot M \right\rVert_2^2,$$
where $\lVert M \rVert_1$ denotes the number of masked elements over valid time steps.
To improve robustness under varying operating conditions, an additional Risk Extrapolation (REx) regularisation term is introduced. The core idea is to constrain the variance of reconstruction risk across different environments. Let the environment label of each sample be $e_i \in \{1, \dots, E\}$, either provided by the data or constructed as pseudo-environments within a batch via quantile-based binning of input features.
The reconstruction risk for each environment $e$ is computed as
$$r_e = \mathbb{E}_{i : e_i = e} \left[ \mathcal{L}_{\mathrm{rec}}(X_i) \right],$$
and a variance penalty is applied:
$$\mathcal{L}_{\mathrm{rex}} = \mathrm{Var}\!\left( \{ r_e \}_{e=1}^{E} \right).$$
This regularisation encourages consistent reconstruction performance across environments, thereby discouraging reliance on environment-specific statistical cues.
The final Self-Supervised Learning (SSL) optimisation objective is
$$\mathcal{L}_{\mathrm{ssl}}^{\mathrm{GroupMAE}} = \lambda_{\mathrm{rec}} \mathcal{L}_{\mathrm{rec}} + \lambda_{\mathrm{rex}} \mathcal{L}_{\mathrm{rex}}.$$
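The REx variance penalty itself is simple to compute once per-sample reconstruction losses and environment labels are available; a minimal NumPy sketch (function name ours):

```python
import numpy as np

def rex_penalty(rec_losses, env_ids):
    """REx regulariser: variance of mean reconstruction risk across environments.

    rec_losses: per-sample reconstruction losses in a batch.
    env_ids: the (pseudo-)environment label of each sample.
    Returns Var({r_e}), which is zero exactly when every environment
    is reconstructed equally well on average.
    """
    rec_losses = np.asarray(rec_losses, dtype=float)
    env_ids = np.asarray(env_ids)
    risks = np.array([rec_losses[env_ids == e].mean() for e in np.unique(env_ids)])
    return risks.var()
```

Adding this penalty to the masked reconstruction loss discourages the encoder from trading reconstruction quality in rare operating conditions for gains in common ones.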

5. Representation Learning Pipelines

Since different self-supervised encoders adopt heterogeneous learning objectives with distinct formulations and numerical scales, their self-supervised loss values are not directly comparable and thus cannot be used to quantitatively assess the quality of the learnt latent representations. To address this issue, this paper designs two interconnected yet purpose-specific representation learning pipelines, namely Pipeline 1 for fair encoder evaluation and optimal encoder selection and Pipeline 2 for high-quality representation extraction and structural analysis (see Figure 7).
In this section, the trajectory representation $X \in \mathbb{R}^{L \times F}$ defined in Equation (2) is adopted by default, and the differences in data processing and training strategies across the two pipelines are clarified in detail.

5.1. Dataset Splitting Strategy

Given the complete ship trajectory dataset $\mathcal{D} = \{X_i\}_{i=1}^{N}$, the data are split into three mutually exclusive subsets following a $7{:}2{:}1$ ratio, namely the training set $\mathcal{D}_{\mathrm{tr}}$, the validation set $\mathcal{D}_{\mathrm{va}}$, and the test set $\mathcal{D}_{\mathrm{te}}$. The training set is used for parameter optimisation, the validation set for early stopping and hyperparameter selection, and the test set for final performance evaluation.

5.2. Pipeline 1: Encoder Evaluation and Optimal Hyperparameter Selection

The objective of Pipeline 1 is to compare the quality of latent representations learnt by different self-supervised encoders across a unified downstream task and to select the optimal encoder architecture and its corresponding hyperparameter configuration.
  • Feature Debiasing
In the original feature space, a subset of features is directly or explicitly correlated with ship-type labels. Let the number of such features be denoted as $M$. In Pipeline 1, to prevent label information leakage and ensure a fair comparison among encoders, a feature debiasing operation removes these ship-type-related features from the input trajectories, resulting in the debiased input
$$X' \in \mathbb{R}^{L \times (F - M)}.$$
The specific features removed ($M = 6$) and the full input feature inventory ($F = 35$) are listed in Table A4 in Appendix A.
This operation ensures that the encoders learn representations purely from kinematic and dynamic motion patterns, such that the resulting latent representation $Z = E(X')$, defined analogously to Equation (3), faithfully reflects the temporal modelling capability of the encoder.
  • Stage 0: Self-Supervised Learning Pretraining
In Stage 0, the unified encoder–decoder framework introduced in the previous sections is adopted. Given an input trajectory $X$, the encoder maps the input to a latent representation,
$$Z = E(X),$$
which is subsequently processed by the corresponding decoder $D(\cdot)$ or prediction head, depending on the specific SSL paradigm. Each encoder is trained by minimising its own SSL objective,
$$\min_{\theta_E^{(m)}, \, \theta_D^{(m)}} \mathcal{L}_{\mathrm{ssl}}^{(m)},$$
where $\mathcal{L}_{\mathrm{ssl}}^{(m)}$ denotes the self-supervised loss of the $m$th encoder (e.g., reconstruction loss, denoising loss, masked modelling loss, or evidential loss). This stage is performed without using any ship-type labels. The encoders (TCN, LiST, Transformer, DAE, EAE, and GroupMAE) are trained independently, resulting in pretrained encoder parameters $\theta_E^{(m)}$.
  • Stage 1: Frozen Encoder Supervised Evaluation
In Stage 1, the encoder parameters are frozen,
$$E(\cdot \,;\, \theta_E^{(m)}) \quad \text{fixed},$$
and a lightweight classification head is introduced on top of the latent representation $Z$. The supervised mapping defined in Equation (6) is adopted for ship-type prediction. The classification head is trained by minimising the supervised classification loss,
$$\min_{\theta_C} \mathcal{L}_{\mathrm{cls}},$$
while keeping the encoder unchanged. Model optimisation is performed on the training set $\mathcal{D}_{\mathrm{tr}}$, and the validation set $\mathcal{D}_{\mathrm{va}}$ is used for early stopping and hyperparameter selection. This stage evaluates the discriminative power of the learnt representation $Z$ without altering the encoder, ensuring that performance differences primarily reflect the quality of the encoder itself rather than the capacity of the classifier.
  • Stage 2: Joint Fine-tuning
In Stage 2, both the encoder and the classification head are jointly optimised in an end-to-end supervised manner. Specifically, the following objective is minimised:
$$\min_{\theta_E^{(m)}, \, \theta_C} \mathcal{L}_{\mathrm{cls}}.$$
Training is conducted on the training set $\mathcal{D}_{\mathrm{tr}}$, while the validation set remains responsible for early stopping and model selection. Final performance is reported exclusively on the test set $\mathcal{D}_{\mathrm{te}}$. This stage assesses the adaptability of each encoder under supervised fine-tuning and provides an upper bound on downstream performance.
  • Hyperparameter Optimisation and Encoder Selection
For each encoder architecture, hyperparameter optimisation is performed using the Tree-structured Parzen Estimator (TPE) implemented in the Optuna framework [62]. The optimisation objective is the validation classification accuracy obtained in Stage 1. By searching over the hyperparameter space (e.g., latent dimension, network depth, mask ratio, noise level) and comparing supervised performance across Stage 1 and Stage 2, Pipeline 1 selects the optimal encoder and its corresponding hyperparameter configuration, denoted as
(E^{\ast}, \theta^{\ast}),
where E* represents the selected encoder architecture and θ* denotes its optimal hyperparameter set.
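As a self-contained illustration of this selection step, the loop below uses plain random search as a stand-in for Optuna's TPE sampler, together with a simulated scoring function (`val_accuracy` is a placeholder, not the paper's training code); the best (encoder, configuration) pair plays the role of (E*, θ*):

```python
import random

random.seed(0)

# Illustrative stand-in for the Optuna/TPE search: each encoder shares a
# search space, candidate configurations are scored by validation accuracy,
# and the best (encoder, configuration) pair is kept.
SEARCH_SPACE = {
    "latent_dim": [64, 128, 256],
    "depth": [2, 4, 6],
    "mask_ratio": [0.3, 0.5, 0.7],
}

def val_accuracy(encoder, cfg):
    # Placeholder scorer: in practice, pretrain the encoder with cfg, train
    # the frozen-encoder probe, and return Stage-1 validation accuracy.
    base = {"GMAE-REx": 0.85, "DAE": 0.84, "TCN": 0.76}[encoder]
    bonus = 0.01 if cfg["latent_dim"] == 128 else 0.0
    return base + bonus + random.uniform(-0.003, 0.003)

best_enc, best_cfg, best_acc = None, None, -1.0
for encoder in ["GMAE-REx", "DAE", "TCN"]:
    for _ in range(30):                      # 30 trials per encoder
        cfg = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
        acc = val_accuracy(encoder, cfg)
        if acc > best_acc:
            best_enc, best_cfg, best_acc = encoder, cfg, acc

print(best_enc, best_cfg, round(best_acc, 4))
```

TPE replaces the uniform sampling above with a model of promising regions of the search space, but the selection logic is the same.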

5.3. Pipeline 2: Representation Extraction and Structural Analysis

Pipeline 2 builds upon the optimal encoder E* selected by Pipeline 1 and shifts the focus from encoder comparison to representation analysis. Its objective is to extract high-quality latent representations and investigate their intrinsic structural and interpretable properties.
  • Full Feature Space
Unlike Pipeline 1, Pipeline 2 operates on the complete trajectory representation X ∈ R^{L×F}, including all kinematic, contextual, and ship-type-related features. This enables the encoder to leverage the full richness of the input data and produce representations that are optimally suited for downstream behavioural analysis and clustering tasks.
  • Stage 0: Self-Supervised Learning Pretraining
Using the optimal hyperparameters identified in Pipeline 1, SSL training is performed using the original trajectory representation X with the complete feature space, and only the training set D tr is employed. The encoder is trained by minimising its corresponding self-supervised objective,
Z = E(X), \qquad \min_{\theta_E} \mathcal{L}_{\text{ssl}},
without involving ship-type labels in any part of the training process. Early stopping is applied based on the validation reconstruction loss. Upon convergence, the encoder is used to extract latent representations for all trajectories in the dataset:
Z_i = E^{\ast}(X_i), \qquad i = 1, \ldots, N.
After training, the resulting latent representation Z is used for subsequent unsupervised analyses, including clustering, embedding-space visualisation, and post hoc comparison with ship-type labels, in order to reveal intrinsic ship-type structures and other interpretable motion patterns captured by the learnt representations.
  • Embedding Clustering as a Behavioural Structure Probe
Clustering serves as an operational probe to assess how well different feature spaces capture interpretable behavioural organisation in AIS data. We consider two complementary feature spaces: (i) the learnt embedding space Z ∈ R^D produced by E* (e.g., D = 128) and (ii) an expert-designed feature space derived from commonly used nautical descriptors, including SOG, COG, turn rate, and proximity to ports or fairways. This dual-view design enables direct comparison between the behavioural structures induced by representation learning and those encoded by domain-driven features.
  • Stage 1: Unsupervised Clustering Analysis
To investigate the behavioural structure encoded in the latent space, unsupervised clustering is performed on the extracted representations {Z_i}_{i=1}^{N}. AIS trajectories exhibit non-linear, multi-modal structure: elongated corridors, branching patterns, heterogeneous density, and substantial noise. To accommodate these properties, Pipeline 2 focuses on density-based and graph-based clustering models capable of recovering arbitrarily shaped clusters, robustly handling heterogeneous density, and naturally supporting noise/outlier treatment.
Multiple clustering algorithms are evaluated, including Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) [63], Variational Bayesian Gaussian Mixture Model (VBGMM) [64], k-nearest neighbour graph with Leiden community detection [65], and First Integer Neighbour Clustering Hierarchy (FINCH) [66]. The specific models considered are summarised in Table 4.
For each clustering method, hyperparameter tuning is conducted to optimise the Density-Based Clustering Validation (DBCV) score [67], which measures the quality of density-based cluster separation. The resulting cluster assignments are analysed to assess:
  • The semantic interpretability of discovered behaviour modes;
  • The alignment between learnt clusters and expert-defined vessel categories;
  • The robustness of cluster structure across different clustering algorithms.
  • Clustering Validation Metrics
To compare clusterings across feature spaces and models, we employ two complementary categories of metrics: (i) intrinsic density/graph-based metrics suited to non-spherical clusters (DBCV, Conductance, Modularity), which measure cluster quality without requiring ground-truth labels, and (ii) traditional centroid-based metrics (Silhouette, Calinski–Harabasz, Davies–Bouldin) provided as baseline references, though these assume convex cluster geometry less appropriate for maritime trajectories. The intrinsic metrics are prioritised because clustering serves as an operational probe of representation quality rather than a classification task, focusing on discovering latent behavioural structure naturally encoded in the representation space. Key intrinsic metrics and their interpretations are summarised in Table 5.
  • Dimensionality Reduction for Visualisation
To facilitate visual inspection of the learnt representations, the high-dimensional latent embeddings Z ∈ R^D are projected into a two-dimensional space using Uniform Manifold Approximation and Projection (UMAP) [68]. UMAP is a manifold learning technique that preserves both local and global structure, making it well suited for visualising cluster separation and topological relationships in the embedding space.
The two-dimensional projections are colour-coded by cluster assignment and vessel type, enabling qualitative assessment of representation quality and behavioural coherence.
Table 5. Key clustering metrics used in Pipeline 2, with emphasis on intrinsic density/graph-based metrics.
Metric | Type | What It Measures | Preferred Direction
DBCV [67] | Density-intrinsic | Density connectivity within clusters versus density separation between clusters, using a density-based distance. | Higher is better.
Conductance [69] | Graph-intrinsic | Edge-cut quality: internal connectivity of a cluster relative to its boundary with the rest of the graph. | Lower is better.
Modularity [65,69] | Graph-intrinsic | Strength of community division relative to a random-graph expectation; the primary optimisation objective in Leiden community detection. | Higher is better.
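The two graph-intrinsic metrics in Table 5 can be computed directly from an edge list and a cluster assignment. A minimal sketch on a toy two-community graph (standard definitions; not the paper's evaluation code):

```python
from collections import defaultdict

def conductance(edges, labels, cluster):
    """Cut edges of one cluster divided by min(volume, complement volume)."""
    cut = vol_in = vol_out = 0
    for u, v in edges:
        u_in, v_in = labels[u] == cluster, labels[v] == cluster
        vol_in += u_in + v_in            # each endpoint counts towards volume
        vol_out += (not u_in) + (not v_in)
        if u_in != v_in:
            cut += 1
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0

def modularity(edges, labels):
    """Newman modularity: Q = sum_c (e_c / m - (d_c / 2m)^2)."""
    m = len(edges)
    intra, deg = defaultdict(int), defaultdict(int)
    for u, v in edges:
        deg[labels[u]] += 1
        deg[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    return sum(intra[c] / m - (deg[c] / (2 * m)) ** 2 for c in deg)

# Two triangles joined by one bridge edge: a clean two-community toy graph.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(conductance(edges, labels, "A"))   # 1/7: only the bridge is cut
print(modularity(edges, labels))         # ~0.357: strong community structure
```

Low conductance and high modularity on this toy graph mirror the preferred directions listed in Table 5.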

6. Experiments

This section evaluates self-supervised representation learning methods adapted to AIS-based trajectory data under a unified experimental protocol and analyses the intrinsic structure of the learnt representation space. Because the considered self-supervised objectives are heterogeneous and their loss values are not directly comparable, model selection is performed using a common downstream task (Pipeline 1), followed by an unsupervised structural analysis of the learnt representation (Pipeline 2).

6.1. Experimental Setup

  • Post-Processed Experimental Dataset and Unified Data Loader
We conduct all experiments on a processed AIS trajectory dataset covering the port of Kiel and surrounding waters, extracted from the 2022–2023 AIS streams.
The processed trajectories are resampled onto a fixed temporal grid with a sampling interval of 5 s, and ship-to-ship interactions are computed for vessels within 2 km; to keep the interaction input size fixed, only the two neighbours with the highest collision-risk score are retained per time step. After applying the comprehensive processing pipeline described in Section 3 (Contribution 1), the dataset comprises 176,787 cleaned trajectories from 9948 unique vessels. The complete post-processed feature set used in the experiments, including static ship, dynamic ship, and pairwise ship-to-ship features, is listed in Appendix A.
Table 6 summarises the key characteristics of the post-processed dataset and the sampling strategy employed in both experiments.
The unified data loader constructs learning samples by segmenting the post-processed 176,787 trajectories (already on a 5 s temporal grid) into fixed-length sequences of L = 120  time steps, equivalent to 10 min of vessel movement without overlap. This segmentation procedure yields 527,225 trajectory segments that serve as the fundamental unit of analysis in both experiments.
At each time step, ship-to-ship interaction features are available for vessels within 2 km, with only the two neighbours exhibiting the highest collision-risk score retained to maintain fixed input dimensionality.
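This fixed-size neighbour selection can be sketched as follows; the distances, risk scores, and feature values are illustrative placeholders, not the paper's collision-risk model:

```python
import numpy as np

# Fixed-size interaction encoding at a single time step: keep only the two
# in-range (<= 2 km) neighbours with the highest collision-risk score and
# zero-pad when fewer are present.
MAX_NEIGHBOURS, RANGE_M = 2, 2000.0

def select_neighbours(dists_m, risks, feats):
    """dists_m, risks: shape (N,); feats: shape (N, F) candidate features."""
    in_range = np.where(dists_m <= RANGE_M)[0]
    keep = in_range[np.argsort(-risks[in_range])][:MAX_NEIGHBOURS]
    out = np.zeros((MAX_NEIGHBOURS, feats.shape[1]))
    out[: len(keep)] = feats[keep]          # zero rows remain if < 2 in range
    return out, keep

dists = np.array([500.0, 2500.0, 1200.0, 1900.0])
risks = np.array([0.2, 0.9, 0.7, 0.4])
feats = np.arange(12, dtype=float).reshape(4, 3)   # 3 features per neighbour
out, keep = select_neighbours(dists, risks, feats)
print(keep)   # [2 3]: highest-risk in-range neighbours, by descending risk
```

Note that the highest-risk candidate overall (index 1) is discarded because it lies beyond the 2 km range.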
Shorter segments are zero-padded and accompanied by a binary mask to ensure that padded values do not contribute to training objectives or evaluation metrics.
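The segmentation, zero-padding, and masking steps above can be sketched as follows (segment length as stated in the text; the trajectory data are synthetic):

```python
import numpy as np

SEG_LEN = 120   # 120 steps x 5 s = 10 min per segment

def segment(traj, seg_len=SEG_LEN):
    """Split a (T, F) trajectory into non-overlapping fixed-length segments.
    The trailing partial segment is zero-padded, and a binary mask marks the
    real time steps so padding never enters losses or metrics."""
    T, F = traj.shape
    n_seg = int(np.ceil(T / seg_len))
    segs = np.zeros((n_seg, seg_len, F))
    mask = np.zeros((n_seg, seg_len), dtype=bool)
    for i in range(n_seg):
        chunk = traj[i * seg_len:(i + 1) * seg_len]
        segs[i, : len(chunk)] = chunk
        mask[i, : len(chunk)] = True
    return segs, mask

traj = np.random.default_rng(1).normal(size=(290, 5))   # 290 steps, 5 features
segs, mask = segment(traj)
print(segs.shape, int(mask.sum()))   # (3, 120, 5) 290
```

A masked reconstruction loss would then be computed as `(loss * mask).sum() / mask.sum()`, so the padded tail of the last segment contributes nothing.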
All continuous features are subjected to Z-score standardisation; the mean and standard deviation are computed exclusively from the training set to prevent information leakage and subsequently applied to the training, validation, and test sets.
Periodic angular features, such as course, heading, and relative bearing, are encoded using sine and cosine transforms to avoid discontinuities at wrap-around points (e.g., 0°/360°).
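Both normalisation steps can be sketched as follows (synthetic data; the guard against constant features is an added assumption, not stated in the text):

```python
import numpy as np

def fit_zscore(train):
    """Compute standardisation statistics from the training set only."""
    mu, sd = train.mean(0), train.std(0)
    sd = np.where(sd == 0, 1.0, sd)   # guard against constant features
    return mu, sd

def encode_angle(deg):
    """Map a periodic angle in degrees to (sin, cos) so 0 and 360 coincide."""
    rad = np.deg2rad(deg)
    return np.stack([np.sin(rad), np.cos(rad)], axis=-1)

rng = np.random.default_rng(2)
train = rng.normal(5.0, 2.0, size=(1000, 3))
test = rng.normal(5.0, 2.0, size=(200, 3))

mu, sd = fit_zscore(train)                  # no statistics from val/test
train_z, test_z = (train - mu) / sd, (test - mu) / sd

course_deg = np.array([0.0, 359.0, 180.0])
print(encode_angle(course_deg).round(3))
```

Unlike a raw degree value, the (sin, cos) pair makes 359° and 1° near neighbours, which is essential for course and heading features.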
  • Dataset splitting and leakage control
We split the 527,225 segments into training ( D t r ), validation ( D v a ), and test ( D t e ) sets with a [7:2:1] ratio. To prevent information leakage in the segment-based split or shared interaction features, all segments sharing the same trajectory identifier (traj_id) or overlapping temporal windows are assigned to the same subset. This ensures that no interaction scene is fragmented across training and evaluation splits, maintaining the integrity of temporal dependencies and vessel-specific behaviour patterns. The validation set D v a is used exclusively for hyperparameter selection and early stopping in Experiment I (Section 6.2), while the test set D t e is held out for final performance reporting. For Experiment II (Section 6.3), clustering is performed on a fixed random sample of 50,000 segments drawn from the combined dataset to ensure computational reproducibility across representations and methods.
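A minimal sketch of this trajectory-grouped split (synthetic traj_id values; the rounding of the 7:2:1 cut points is an implementation choice):

```python
import random
from collections import defaultdict

def grouped_split(seg_traj_ids, ratios=(0.7, 0.2, 0.1), seed=0):
    """Assign all segments sharing a traj_id to the same subset (7:2:1),
    so no trajectory is fragmented across training and evaluation splits."""
    by_traj = defaultdict(list)
    for seg_idx, tid in enumerate(seg_traj_ids):
        by_traj[tid].append(seg_idx)
    traj_ids = sorted(by_traj)
    random.Random(seed).shuffle(traj_ids)
    n = len(traj_ids)
    cut1 = round(ratios[0] * n)
    cut2 = round((ratios[0] + ratios[1]) * n)
    parts = {"train": traj_ids[:cut1],
             "val":   traj_ids[cut1:cut2],
             "test":  traj_ids[cut2:]}
    return {name: [s for t in tids for s in by_traj[t]]
            for name, tids in parts.items()}

# 30 segments drawn from 10 synthetic trajectories (traj_id = index // 3)
seg_tids = [i // 3 for i in range(30)]
splits = grouped_split(seg_tids)
print({k: len(v) for k, v in splits.items()})
```

Because entire trajectories are shuffled rather than individual segments, no traj_id can appear in more than one subset.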
  • Experimental scope and evaluation strategy
The remainder of this section presents two complementary experiments aligned with the dual-pipeline evaluation methodology (Contribution 6).
Experiment I (Section 6.2) implements the three-stage representation learning framework (Contribution 2):
  • Stage 0: self-supervised pretraining with hyperparameter optimisation conducted exclusively on self-supervised validation loss.
  • Stage 1: linear probe evaluation to isolate representation quality from classifier capacity.
  • Stage 2: full fine-tuning as an upper-bound reference.
This workflow benchmarks six encoder architectures under the ship-type classification probing task: three proposed methods and three established baselines. The proposed methods are GMAE-REx, which combines environment-aware learning via REx (Contribution 3) with semantic feature grouping through group-wise masking (Contribution 4); EAE, which provides uncertainty-aware representations via evidential regression; and DAE, which serves as a denoising baseline. The baselines Transformer, TCN, and LiST provide comparative reference points across architectural paradigms.
Experiment II (Section 6.3) employs unsupervised clustering with explicit assessment of intrinsic structure quality (Contribution 6) to validate whether learnt representations naturally organise trajectory segments into coherent behavioural groups aligned with operational context.
Together, these experiments establish representation quality through both discriminative power (supervised probing) and interpretable structure (unsupervised discovery), providing comprehensive validation of the proposed framework’s ability to learn robust representations that generalise across diverse environmental conditions.

6.2. Experiment I: Supervised Probing for Encoder Selection

  • Scope and evaluation protocol
This experiment evaluates the three-stage representation learning framework (Contribution 2) by benchmarking six self-supervised encoder architectures under a unified downstream task: ship-type prediction.
We compare three proposed methods (GMAE-REx, EAE, DAE) against three established baselines (Transformer, TCN, LiST) to establish which architectural approach yields the most discriminative trajectory representations when only limited supervision is available.
  • Feature debiasing for fair encoder comparison
To ensure a fair comparison on the ship-type probing task and avoid trivial shortcuts, ship-type indicative static attributes are removed or neutralised following the debiasing procedure in Section 5.2. The complete list of all 35 input features and the six features excluded under this procedure are detailed in Table A4 (Appendix A). This debiased input is used consistently across all encoders in Pipeline 1. For consistency across pipelines, the same debiasing is also applied when constructing the expert-selected feature representation in Pipeline 2. As a result, performance differences primarily reflect how representations capture motion and interaction patterns, rather than memorising vessel metadata that directly encodes ship type.
  • Environment-aware learning
The benchmarked encoders include GMAE-REx, which combines semantic feature grouping with REx regularisation (Contribution 3) to penalise reconstruction strategies favouring particular environmental conditions, and EAE, which provides uncertainty-aware representations through evidential regression that distinguishes epistemic from aleatoric uncertainty. DAE serves as a denoising reconstruction baseline, providing a strong reference point for evaluating the additional benefits of environment-aware learning (GMAE-REx) and uncertainty quantification (EAE).
Three established baseline encoders—Transformer (attention-based), TCN (convolution-based), and LiST (linear projection)—provide comparative reference points across different architectural paradigms.
These design choices directly address the environmental heterogeneity and domain shift challenges outlined in the introduction, enabling learning of representations that remain valid across distributional shifts between training and deployment scenarios.
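The core of the REx regularisation can be sketched as a variance penalty over per-environment risks. The V-REx-style objective below uses simulated per-sample losses and density4-style pseudo-environment bins, purely for illustration:

```python
import numpy as np

def rex_objective(per_sample_loss, env_ids, lam=0.1):
    """V-REx-style objective: mean per-environment risk plus lam times the
    variance of those risks. Penalising the variance discourages
    reconstruction strategies that favour particular environments."""
    envs = np.unique(env_ids)
    risks = np.array([per_sample_loss[env_ids == e].mean() for e in envs])
    return risks.mean() + lam * risks.var(), risks

rng = np.random.default_rng(3)
# Pseudo-environments, e.g. four traffic-density bins ("density4" style).
env_ids = rng.integers(0, 4, size=1000)
# Simulated per-sample reconstruction losses with an environment shift.
loss = rng.gamma(2.0, 0.5, size=1000) + 0.3 * env_ids

obj, risks = rex_objective(loss, env_ids, lam=0.1)
print(np.round(risks, 3), round(float(obj), 3))
```

With `lam = 0` the objective reduces to ordinary empirical risk minimisation; a positive `lam` pushes the encoder towards reconstruction quality that is uniform across environments.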
  • Hyperparameter optimisation
Following Contribution 2, hyperparameters are optimised using Optuna [62] with TPE sampling over the search space defined in Table 7, with selection based exclusively on validation accuracy. For each encoder, we run 30 Optuna trials and select the configuration with the highest Stage 2 validation accuracy on D v a with early stopping.
Across all runs, we optimise with Rectified Adam (RAdam) [70] (learning rate 1 × 10 3 ), train for up to 50 epochs, and apply early stopping with patience of 5 epochs.
  • Ablation and sensitivity analysis for GMAE-REx
To understand the contribution of the main design choices in GMAE-REx, we conduct a one-factor-at-a-time ablation under the same three-stage protocol (SSL pretraining → linear probe → fine-tune) and report both linear-probe accuracy (representation quality) and fine-tune accuracy (downstream-task performance).
  • Sensitivity to group mask ratio
Table 8 sweeps the group mask ratio under the coarse grouping scheme with env scheme = density4 and λ rex = 0.1 . Overall, moderate masking yields competitive probe accuracy, while downstream fine-tuning remains stable across a wide range of mask ratios.
  • Grouping granularity (levelA vs. levelB)
We define two semantic grouping granularities for group-wise masking. levelA is a coarse partition that aggregates features into a small number of robust blocks (e.g., time, kinematics, geo/map, density, interaction/risk, shape, and categorical/meta), while levelB further refines these blocks into sub-groups (e.g., hour/day/month time cycles, heading/course vs. speed/acceleration motion, absolute vs. relative coordinates, global vs. group-specific density, and risk vs. bearing vs. relative-motion interaction features). This design allows us to control the strength and structure of the masked reconstruction task: levelA emphasises broad cross-group dependencies, whereas levelB provides a more structured masking target and typically yields a richer reconstruction signal for invariance regularisation. We next compare the coarse semantic grouping (levelA) to a finer-grained grouping (levelB) while fixing group mask rate = 0.5, env scheme = density4, and λ rex = 0.1. As shown in Table 9, levelB consistently improves downstream fine-tuning, suggesting that more structured feature partitions provide a stronger reconstruction signal and a more informative invariance regularisation target.
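Group-wise masking at a coarse (levelA-style) granularity can be sketched as follows; the group-to-column assignment is illustrative, not the paper's exact feature table:

```python
import numpy as np

# Coarse "levelA"-style grouping of feature columns. The column indices are
# illustrative; the paper's actual assignment follows its feature table.
GROUPS = {
    "time":        [0, 1],
    "kinematics":  [2, 3, 4],
    "geo":         [5, 6],
    "interaction": [7, 8, 9],
}

def group_mask(x, mask_ratio=0.5, rng=None):
    """Mask whole semantic groups (all their columns, at every time step),
    so reconstruction must rely on cross-group dependencies."""
    rng = rng or np.random.default_rng()
    names = list(GROUPS)
    n_masked = max(1, round(mask_ratio * len(names)))
    chosen = rng.choice(names, size=n_masked, replace=False)
    x_masked = x.copy()
    col_mask = np.zeros(x.shape[-1], dtype=bool)
    for g in chosen:
        x_masked[..., GROUPS[g]] = 0.0   # blank the whole group
        col_mask[GROUPS[g]] = True
    return x_masked, col_mask, {str(g) for g in chosen}

x = np.random.default_rng(4).normal(size=(120, 10))      # (L, F) segment
xm, col_mask, masked_groups = group_mask(x, 0.5, np.random.default_rng(4))
print(sorted(masked_groups), int(col_mask.sum()))
```

Because entire groups are blanked rather than random cells, the model cannot interpolate a masked value from its immediate neighbours within the same group; levelB would simply replace `GROUPS` with a finer partition.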
  • Environment scheme
Table 10 evaluates different pseudo-environment definitions for REx under group scheme = levelB, group mask rate = 0.5, and λ rex = 0.1 . We observe small but consistent differences: incorporating temporal proxies (e.g., densityhour16) slightly improves fine-tune accuracy, while the simpler density4 remains a strong default with minimal partitioning complexity.
  • Strength of REx regularisation
Finally, we ablate λ rex under group scheme = levelB, group mask rate = 0.5, and env scheme = densityhour16. Table 11 suggests that a mild invariance penalty is beneficial ( λ rex = 0.05 ), while stronger regularisation may slightly degrade both probe and fine-tune performance.
  • Selected configuration
Based on the above ablations, we choose group scheme = levelB with strong group masking (group mask rate = 0.5) and the environment partition combining density and hour of day (densityhour16) with λ rex = 0.05 as the default setting. This configuration achieves the best fine-tune accuracy in our sweep and is used as the reference GMAE-REx setup in subsequent experiments.
  • Results and selected encoder
Table 12 reports the best hyperparameters and corresponding validation accuracy for each encoder. GMAE-REx achieves the highest validation accuracy (86.03%), outperforming all other encoders: Transformer-based (DAE: 85.63%, EAE: 85.56%, Transformer: 84.93%), linear (LiST: 85.12%), and convolution-based (TCN: 76.27%). The superior performance of GMAE-REx validates the effectiveness of combining semantic feature grouping (Contribution 4) with environment-aware learning via REx regularisation (Contribution 3), which encourages representations that generalise across varying traffic densities, geographical contexts, and temporal conditions rather than memorising environment-specific statistical cues. GMAE-REx is therefore selected as the final encoder, denoted by (E*, θ*), and used in Experiment II for unsupervised structural analysis.

6.3. Experiment II: Unsupervised Clustering for Structural Discovery

  • Scope and evaluation objective
This experiment implements the unsupervised component of the dual-pipeline evaluation methodology (Contribution 6), employing clustering to validate whether learnt representations naturally organise trajectories into interpretable behavioural groups aligned with operational context—without relying on manual annotations. Clustering serves as an operational probe of representation quality because maritime navigation exhibits multi-modal behaviour arising from heterogeneous factors (traffic regulations, bathymetry, vessel intent), and regions of high trajectory density often correspond to semantically recognisable structures such as traffic lanes, anchorage zones, and manoeuvring areas. We compare learnt embeddings extracted from the encoder selected in Experiment I (GMAE-REx) against expert-selected nautical features (Contribution 7) to assess whether representations learnt via the proposed framework (Contributions 2–5) capture behavioural structure that extends beyond domain-engineered features.
  • Clustering backends and design rationale
We evaluate four clustering methods selected for their complementary strengths in capturing complex, non-linear structures characteristic of maritime navigation, as summarised in Table 13.
kNN-Leiden constructs a k-nearest neighbour graph and applies modularity-based community detection with guaranteed connectivity, suitable for route-network structures with heterogeneous density. HDBSCAN builds a hierarchical density model and selects stable clusters while explicitly identifying noise (anomalous trajectories), enabling multi-scale analysis of local manoeuvres and global patterns. VBGMM employs variational Bayesian inference with automatic component selection, providing probabilistic assignments and uncertainty quantification that align with the framework’s emphasis on uncertainty-aware representations (Contribution 5). FINCH offers parameter-free hierarchical clustering, enabling exploratory multi-scale analysis with minimal tuning.
Table 14 lists the distance metrics and optimisation objectives for each method–representation pair.
  • Two-stage hyperparameter optimisation
For each clustering method and representation, we follow a two-stage optimisation procedure, with random seeds fixed for reproducibility, to balance exploration and computational efficiency. Stage 1 (Rough optimisation): Optuna with TPE sampling explores the search space defined in Table 15 on a subset of 500 samples.
Stage 2 (Fine optimisation): A focused grid search refines the Optuna solution on the same 500-sample subset using the centred grid defined in Table 16. The best configuration from Stage 2 is used to cluster the full fixed pool of 50,000 segments.
  • Evaluation metrics
Following Contribution 6, we incorporate intrinsic clustering metrics that do not require ground-truth labels and are suitable for non-convex, arbitrary-shape clusters. The primary metrics are DBCV (density-based cluster validity), conductance (graph edge-cut quality), and modularity (community structure strength), selected for their alignment with density-based and graph-based clustering paradigms. Detailed descriptions, value ranges, and interpretation guidelines are provided in Appendix C.
Additional metrics (silhouette, Calinski–Harabasz, Davies–Bouldin) are computed (see Appendix D) but are not used for primary comparison due to their restrictive geometric assumptions. Finally, UMAP visualisations are used for further qualitative inspection.
  • Quantitative results
Table 17 presents the clustering metrics and the number of detected clusters for each method–representation pair. Across kNN–Leiden, FINCH, and VBGMM, learnt embeddings consistently achieve stronger intrinsic validity than expert features, reflected in higher DBCV, improved conductance, and increased modularity. In contrast, for HDBSCAN, embeddings yield a higher DBCV (0.112 vs. 0.042) but poorer conductance (0.479 vs. 0.311). This behaviour is expected, as HDBSCAN optimises density-based cluster stability rather than graph boundary quality; consequently, conductance is less informative for density-optimised partitions (see Appendix C for a discussion of metric–method alignment).
kNN-Leiden discovers 47–48 fine-grained communities with high modularity (>0.87), reflecting well-connected route-network structures. VBGMM identifies 28 probabilistic components with consistent ELBO across representations. HDBSCAN detects 2–3 density-separated clusters with explicit noise handling, where embeddings reveal additional density modes (3 clusters including noise) compared to expert features (2 clusters). FINCH produces 8–27 hierarchical partitions, where embeddings yield finer-grained structures (27 partitions), suggesting richer multi-scale behavioural hierarchies. Segment count distributions per cluster are visualised in Appendix E. Pairwise cluster-assignment agreement (ARI, NMI, FMI) across all four methods is reported in Appendix J; the three fully covering algorithms converge to mean NMI  = 0.60 on learnt embeddings vs.  0.37 on expert features, confirming that the identified behavioural groupings are a robust, algorithm-independent property of the representation space.
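The cross-method agreement reported above can be computed from pairs of label lists. A minimal NMI sketch using arithmetic normalisation (the paper's exact normalisation choice is detailed in Appendix J and may differ):

```python
import math
from collections import Counter

def nmi(a, b):
    """Normalised mutual information between two clusterings of the same
    items, with arithmetic-mean normalisation: MI / ((H(a) + H(b)) / 2)."""
    n = len(a)
    pa, pb = Counter(a), Counter(b)
    pab = Counter(zip(a, b))
    mi = sum(c / n * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
             for (x, y), c in pab.items())
    ha = -sum(c / n * math.log(c / n) for c in pa.values())
    hb = -sum(c / n * math.log(c / n) for c in pb.values())
    denom = (ha + hb) / 2
    return mi / denom if denom else 1.0

labels_1 = [0, 0, 1, 1, 2, 2]
labels_2 = ["a", "a", "b", "b", "c", "c"]   # same partition, renamed
labels_3 = [0, 1, 0, 1, 0, 1]               # unrelated partition
print(nmi(labels_1, labels_2))   # 1.0: identical partitions
print(round(nmi(labels_1, labels_3), 3))
```

NMI is invariant to cluster relabelling, which is why it suits comparisons across algorithms that produce incompatible cluster IDs.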
  • Interpretation and operational context alignment
Figure 8 compares the organisational structure of learnt embeddings and expert features through UMAP projections of VBGMM clustering results (28 components). The visual analysis reveals a fundamental difference in how the two representations partition maritime trajectories.
The learnt embeddings (Figure 8a) exhibit well-separated, spatially compact clusters forming distinct communities with clear boundaries and minimal overlap, as shown in the left panel coloured by cluster assignment. Critically, the ship-type overlay (right panel) demonstrates that individual clusters contain mixed vessel types—cargo, passenger, sailing, and other vessels coexist within the same behavioural community. This heterogeneity indicates that the learnt representation does not separate trajectories by vessel class, but instead organises them according to behavioural patterns that transcend vessel type. Section 7 provides SHAP analysis quantifying the specific features responsible for embedding formation and cluster assignments.
In contrast, expert features (Figure 8b) produce a diffuse clustering structure dominated by a single large central cluster (dark red/brown, left panel) with substantial inter-cluster overlap throughout the UMAP space. The ship-type overlay (right panel) reveals fundamentally different organisation: vessel types form more segregated spatial regions, with cargo vessels concentrated in the dominant central mass and passenger/sailing vessels occupying peripheral areas. This pattern suggests that expert features—comprising kinematic descriptors and spatial attributes—primarily cluster vessels by their characteristic motion profiles and physical properties rather than by operational behavioural context.
The quantitative metrics in Table 17 confirm the superior clustering structure of learnt embeddings: DBCV (−0.594 vs. −0.742), conductance (0.193 vs. 0.498), and modularity (0.756 vs. 0.419). This embedding-based organisation directly supports the framework’s objective (Contribution 1): by grouping trajectories according to behavioural patterns rather than vessel attributes, learnt representations enable an autonomous vessel to identify operationally similar navigation scenarios across different vessel types. For instance, a cargo vessel and a passenger ferry executing similar manoeuvring patterns would be grouped together in embedding space despite their different physical characteristics, allowing an autonomous system to learn from both examples and transfer knowledge across vessel classes. Expert features, by separating vessels primarily by kinematic and physical attributes, do not provide this cross-vessel behavioural coherence necessary for context-aware autonomous decision-making.
Figure 8 presents the representative case for VBGMM; complete UMAP projections for all clustering methods are provided in Appendix F, where similar patterns are consistently observed.

7. Discussion

This study presents a systematic evaluation of self-supervised representation learning for maritime trajectory analysis through a dual-pipeline framework. Pipeline 1 employs supervised linear probing to identify the encoder architecture that yields the most discriminative latent representations for vessel-type classification, while Pipeline 2 applies unsupervised clustering to assess whether learnt embeddings naturally expose intrinsic behavioural structure aligned with operational context. The comparative evaluation benchmarks three proposed methods—GMAE-REx (semantic group masking with environment-invariant regularisation), EAE (evidential uncertainty quantification), and DAE (denoising reconstruction baseline)—against three established architectural baselines (Transformer, TCN, LiST). Both pipelines operate on a two-year AIS dataset from the Kiel Fjord region (Contribution 1), processed through a comprehensive preprocessing pipeline that transforms raw positional broadcasts into structured trajectory representations enriched with kinematic, spatial, density, and ship-to-ship interaction features.

7.1. Encoder Selection via Supervised Probing

Experiment I follows the three-stage framework—self-supervised pretraining, linear probe, and full fine-tuning—to compare the three proposed encoders and three baselines under a unified ship-type prediction task. This three-stage protocol identifies GMAE-REx as the optimal encoder architecture, achieving a validation accuracy of 86.03% on the debiased ship-type classification task. This performance surpasses all other methods: DAE achieves 85.63%, EAE reaches 85.56%, the vanilla Transformer attains 84.93%, LiST obtains 85.12%, and TCN achieves 76.27%.
The performance margin over the strongest baselines is modest but consistent across hyperparameter configurations, indicating that the combination of semantic feature-group masking and REx-based regularisation offers a measurable advantage over purely reconstruction- or evidential-based objectives.
The feature debiasing protocol in Pipeline 1—removing vessel dimensions and static attributes—ensures that classification performance reflects behavioural pattern learning rather than memorisation of vessel characteristics. See Section 5.
Under this constraint, the superior performance of GMAE-REx supports the hypothesis that group-wise masking encourages the model to learn cross-category relationships between temporal, kinematic, spatial, density, and interaction features, whilst REx discourages environment-specific overfitting by penalising reconstruction risk variance across traffic-density-based pseudo-environments. The gap between attention-based encoders (GMAE-REx, DAE, EAE, Transformer; 84.93–86.03%) and the convolutional TCN baseline (76.27%) suggests that global temporal dependencies captured by self-attention are more informative for this probing task than local receptive fields constructed via dilated convolutions. At the same time, the competitive performance of the LiST encoder (85.12%) indicates that explicit linear temporal–spatial projections can provide a viable, computationally efficient alternative when deployment conditions favour lightweight models. Based on these results, GMAE-REx is selected as the final encoder (E*, θ*) and used in Pipeline 2 for unsupervised structural analysis.

7.2. Intrinsic Structure of Learnt Embeddings

Experiment II compares the intrinsic clustering structure of GMAE-REx embeddings with that of an expert-selected nautical feature set across four clustering backends. On a fixed pool of 50,000 segments, embeddings consistently outperform expert features for kNN–Leiden, FINCH, and VBGMM across the primary intrinsic metrics DBCV, conductance, and modularity (Table 17).
For kNN–Leiden, DBCV improves from −0.791 (expert) to −0.533 (embeddings), conductance decreases from 0.327 to 0.186, and modularity increases from 0.875 to 0.906, indicating denser communities with cleaner graph cuts and stronger community structure in the embedding space.
For FINCH, embeddings increase DBCV from −0.907 to −0.588 and modularity from 0.671 to 0.756, whilst the number of communities rises from 8 to 27, revealing finer-grained multi-scale structure.
For VBGMM, embeddings improve DBCV from −0.742 to −0.594, reduce conductance from 0.498 to 0.193, and raise modularity from 0.419 to 0.756 with the same number of components (28), suggesting a more coherent probabilistic partition.
HDBSCAN displays a different pattern: embeddings yield higher DBCV ( 0.112 vs. 0.042 ) but worse conductance ( 0.479 vs. 0.311 ) relative to expert features. This behaviour is consistent with HDBSCAN’s optimisation objective, which favours density-based cluster stability and explicit noise identification rather than graph boundary quality; conductance is therefore less informative for assessing HDBSCAN partitions.
Taken together, these results indicate that the latent space learnt by GMAE-REx better captures density-connected and community-structured organisation of trajectories than the expert-crafted representation, especially under graph- and mixture-based clustering methods. The UMAP visualisations of VBGMM components (Figure 8) clarify how embeddings differ from expert features. In the embedding space, 28 components form compact and well-separated clusters with clear boundaries, where each cluster contains mixed ship types (cargo, passenger, sailing, other), indicating organisation by operational navigation modes that transcend vessel class. In contrast, the expert feature space exhibits a dominant central cluster with substantial overlap between components and stronger ship-type segregation, suggesting that clustering primarily follows kinematic and physical similarities (e.g., typical speed profiles per vessel class). This behaviour-centric organisation in the embedding space is advantageous for MASS, as it facilitates recognition of similar navigation scenarios across heterogeneous vessel types.

7.3. Feature-Based Analysis: Operational Context Alignment

Feature-based analysis provides additional evidence that the discovered structure is behaviour-relevant rather than arbitrary. When the embedding space is coloured by trajectory-level mean values of contextual and physical variables (Figure 9), distinct regions align with interpretable factors such as vessel size, water depth, distance to land, and distance to restricted areas.
Cargo trajectories concentrate in two dominant regions on the left side of the embedding space (Figure 9a). Both regions are associated with larger vessel length (Figure 9b) and width (Figure 9c), while they differ systematically in operating context. One region aligns with deeper waters (Figure 9d) and larger distances to land (Figure 9e) and restricted areas (Figure 9f), whilst the other region aligns with shallower waters and closer proximity to coastlines and regulated regions. Passenger and sailing trajectories appear more frequently in other regions (Figure 10) and are associated with smaller vessel dimensions and different distance-to-land patterns. Passenger trajectories (Figure 10b) occur closer to land, consistent with ferry operations in confined port waters, while sailing trajectories (Figure 10c) are more common in central regions farther from the coastline.
These patterns support the interpretation that the embedding space captures meaningful operational context and behaviour-related conditions rather than vessel identity alone.

7.4. Explainability of Cluster Structure via SHAP

To understand which input features drive the cluster structure in embedding space, we apply Kernel SHAP (SHapley Additive exPlanations) analysis to the VBGMM assignments (28 components) on learnt embeddings. SHAP values quantify the marginal contribution of each feature to cluster assignments, providing a principled explainability framework grounded in cooperative game theory. VBGMM is selected as the primary example for detailed discussion due to its probabilistic partition structure and balanced cluster sizes; complete SHAP analyses for kNN-Leiden, FINCH, and HDBSCAN, including per-cluster feature attributions and sensitivity studies, are provided in Appendix H.

7.4.1. Global Feature Importance

Global SHAP values aggregated across all 28 VBGMM clusters reveal that temporal encodings dominate the ranking (Figure 11): day_of_year_cos and day_of_year_sin exhibit the highest mean absolute importance, followed by day_of_week_cos and day_of_week_sin.
This temporal dominance suggests that seasonal and weekly patterns—such as summer ferry traffic versus winter shipping schedules, or weekday commercial operations versus weekend recreational activity—are primary drivers of behavioural variation in the Kiel Fjord dataset. Kinematic features occupy mid-range positions (length, speed, course_cos, course_sin, acc), whilst environmental features (water_depth, dist_to_land, density_all, density_own_group, dist_to_ferry_route) occupy mid-to-lower ranks. Vessel attributes (width, ship_type) and interaction features (dcpa_0, tcpa_0, collision_risk_0) contribute at moderate levels.
This hierarchy indicates that whilst static vessel characteristics and spatial context inform cluster structure, they are not primary differentiators, consistent with the hypothesis that operational context, not vessel identity, defines behavioural modes. Global SHAP rankings for kNN-Leiden, FINCH, and HDBSCAN are provided in Appendix G.
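The sin/cos feature pairs at the top of the ranking are standard cyclic encodings of periodic quantities. As a brief illustration (the period values below are the obvious choices, not taken from the paper's implementation):

```python
import numpy as np

def cyclic_encode(value, period):
    """Encode a periodic quantity (day of year, day of week, course)
    as a sin/cos pair so the two ends of the cycle stay close: with a
    raw day-of-year feature, 31 December and 1 January would appear
    maximally distant, which the encoding avoids."""
    angle = 2.0 * np.pi * np.asarray(value, dtype=float) / period
    return np.sin(angle), np.cos(angle)
```

For example, day_of_year_sin/cos corresponds to `cyclic_encode(day, 365)` and day_of_week_sin/cos to `cyclic_encode(weekday, 7)`.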

7.4.2. Per-Cluster Feature Importance

Per-cluster SHAP rankings reveal heterogeneous feature dependencies across behavioural modes (Figure 12). Whilst temporal features remain consistently important across most clusters, specific clusters exhibit elevated importance for kinematic, environmental, or interaction features. For example, Cluster 0 (n = 2397, the largest cluster) shows temporal features in top positions, followed by rel_bearing_1_cos, course_sin, and dist_to_land, suggesting a high-traffic transit cluster with mixed vessel sizes operating under typical seasonal conditions. Cluster 1 (n = 2, an outlier cluster) exhibits strong importance for rel_bearing_1_cos, course_sin, and course_diff_1_cos, indicating trajectories characterised by specific interaction geometries and directional consistency, potentially corresponding to close-quarters encounters or special manoeuvres. Cluster 2 (n = 10) prioritises course_sin, dist_to_restricted_area, and rel_bearing_1_cos, suggesting navigation near regulated zones with specific heading constraints.
The diversity of per-cluster SHAP rankings indicates that GMAE-REx embeddings encode multi-faceted behavioural representations where different feature categories become salient depending on operational context, rather than relying on a single dominant signal. Complete per-cluster SHAP rankings for all 28 VBGMM clusters, as well as per-cluster analyses for kNN-Leiden (48 clusters), FINCH (27 clusters), and HDBSCAN (3 clusters), are detailed in Appendix H.
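Attributions of this kind can also be approximated without a full Kernel SHAP run. The sketch below uses plain permutation importance, a cheaper and coarser proxy (not SHAP), with a nearest-centroid assigner standing in for the fitted VBGMM: it measures how often shuffling one feature column changes cluster assignments.

```python
import numpy as np

rng = np.random.default_rng(0)

def assign(X, centroids):
    """Nearest-centroid cluster assignment, standing in for the
    fitted VBGMM responsibilities (an illustrative simplification)."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def permutation_importance(X, centroids, n_repeats=10):
    """Fraction of segments whose cluster assignment changes when one
    feature column is shuffled: a cheap proxy for per-feature attribution
    mass, unlike SHAP not grounded in coalitional game theory."""
    base = assign(X, centroids)
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            imp[j] += (assign(Xp, centroids) != base).mean()
    return imp / n_repeats
```

A feature that never influences cluster membership scores zero; a feature the partition depends on scores high.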

7.4.3. Cluster Centroid Profile Interpretation

Cluster centroid profiles visualise normalised mean feature values across seven key maritime dimensions—water depth, distance to land, speed, distance to restricted areas, vessel width, vessel length, and traffic density—providing intuitive geometric representations of each cluster’s characteristic operational signature (Figure 13). For each cluster, feature values are aggregated as means across all member trajectories, then normalised to [0,1] relative to the global minimum and maximum observed across all clusters within the method, enabling direct visual comparison of operational contexts.
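The aggregation and normalisation step described above amounts to a few lines; a minimal sketch (data and cluster labels are placeholders):

```python
import numpy as np

def centroid_profiles(X, labels):
    """Per-cluster mean feature vectors, min-max normalised to [0, 1]
    using the global min/max observed across all cluster centroids, so
    the radar-plot axes are directly comparable between clusters."""
    cents = np.stack([X[labels == c].mean(axis=0) for c in np.unique(labels)])
    lo, hi = cents.min(axis=0), cents.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    return (cents - lo) / span
```

Each row of the result is one cluster's profile over the seven maritime dimensions, ready to be drawn as a radar polygon.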
Clusters with large polygons spanning most axes (e.g., Cluster 7, n = 40) indicate trajectories with consistently high normalised values across multiple dimensions, potentially representing large vessels operating in confined, high-traffic waters near restricted areas. Clusters with balanced mid-range profiles (e.g., Cluster 10, n = 162) suggest typical open-water transit conditions with moderate values across all features. Clusters with irregular, spike-shaped profiles (e.g., Cluster 1, n = 2) reveal specialisation along specific dimensions—such as elevated water depth combined with low traffic density and specific distance-to-land characteristics—consistent with offshore navigation or outlier manoeuvres.
The diversity of radar profile geometries across 28 clusters underscores the embedding’s capacity to capture fine-grained operational nuances beyond coarse vessel-type categories. These centroid visualisations complement the SHAP feature attribution analysis (Figure 11 and Figure 12) by translating cluster centroids into interpretable geometric signatures: whilst SHAP quantifies which features drive cluster assignments, centroid profiles reveal what values those features take within each cluster, enabling domain experts to validate clustering outputs against known maritime operational categories. Individual cluster profiles and cross-method comparisons for kNN-Leiden (47 clusters), FINCH (27 clusters), and HDBSCAN (3 clusters) are detailed in Appendix I.

7.5. Metric Interpretation and Granularity

The absolute values of intrinsic metrics highlight the difficulty of unsupervised behaviour discovery for maritime trajectories. DBCV remains negative for both representations across kNN-Leiden, FINCH, and VBGMM (Table 17), which is consistent with a continuous spectrum of behaviours rather than well-separated, compact clusters.
The consistent gains in DBCV, conductance and modularity indicate that the learnt embeddings provide a better geometric and graph structure for community discovery, even when behaviours remain partially continuous.
Community detection also involves a granularity trade-off. A partition with more communities is not automatically better; a useful representation is expected to support communities that are coherent, well separated in the graph, and stable under intrinsic criteria. The embedding-induced communities achieve higher modularity and lower conductance with comparable community counts for the graph- and mixture-based methods (kNN-Leiden: 47 vs. 48; VBGMM: 28 vs. 28) and a finer, yet still coherent, partition for FINCH (27 vs. 8), which is consistent with a more structured partition rather than a fragmented one.
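The two graph criteria used throughout this comparison have compact definitions. A self-contained sketch on a dense adjacency matrix follows (the actual evaluation would operate on the sparse kNN graph of segments):

```python
import numpy as np

def conductance(A, labels, c):
    """Conductance of cluster c: weight of edges leaving the cluster
    divided by the smaller of the two side volumes. Lower is better
    (cleaner graph cut)."""
    in_c = labels == c
    cut = A[in_c][:, ~in_c].sum()
    vol_c, vol_rest = A[in_c].sum(), A[~in_c].sum()
    return cut / min(vol_c, vol_rest)

def modularity(A, labels):
    """Newman modularity of a partition: intra-cluster edge fraction
    minus its expectation under a degree-preserving null model.
    Higher is better (stronger community structure)."""
    m2 = A.sum()            # equals 2m for an undirected graph
    k = A.sum(axis=1)
    q = 0.0
    for c in np.unique(labels):
        in_c = labels == c
        q += A[in_c][:, in_c].sum() / m2 - (k[in_c].sum() / m2) ** 2
    return q
```

Two disconnected cliques split along their boundary give modularity 0.5 and conductance 0 per cluster, the idealised end of both scales.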

7.6. Comparison with Expert-Engineered Features

Including expert-designed nautical features (SOG, COG, turn rate, proximity to infrastructure, vessel dimensions) as a baseline clarifies what is gained by self-supervised representation learning. For graph-based and mixture models, embeddings consistently achieve higher DBCV and modularity and lower conductance than expert features (Table 17), with relative improvements of 32.6% for kNN-Leiden DBCV, 35.2% for FINCH DBCV, and 19.9% for VBGMM DBCV, indicating more compact and better-separated communities in the embedding space.
The expert feature space retains interpretability advantages for domain practitioners: a nautical expert can directly verify whether a trajectory segment exhibits high SOG or sharp turns, whereas latent embedding dimensions lack direct physical meaning. However, the UMAP projections (Figure 8, Appendix F), feature-mean visualisations (Figure 9 and Figure 10), and intrinsic metrics (Table 17, Appendix D) demonstrate that expert features emphasise vessel-class-specific motion envelopes rather than context-dependent behaviours that generalise across classes. This implies a trade-off: embedding-based clustering better supports tasks that require recognition of similar scenarios across heterogeneous vessels (e.g., identifying congested, shallow-water manoeuvring irrespective of ship type), whereas expert features may be preferable when the primary goal is to diagnose class-specific performance or to communicate results to practitioners using directly interpretable quantities. Hybrid approaches that combine learnt embeddings with a small set of high-level nautical descriptors could offer a promising compromise between structural quality and human interpretability.

7.7. Labels and Vessel-Type Alignment

The community structure is not expected to align perfectly with vessel-type labels. Vessel type is largely static, while navigation behaviour is context-driven and dynamic. Different vessel types can adopt similar manoeuvring strategies under the same encounter geometry and fairway constraints, as evidenced by the mixed-type composition of individual clusters in Figure 8. The same vessel type can also exhibit different behaviours across operational states such as transit, waiting, berthing, and service operations.
In a fully unsupervised setting, vessel type is therefore more appropriate as a post-hoc attribute for interpretation than as a target that must be separated. This design choice aligns with the framework’s objective (Contribution 1): by grouping trajectories according to behavioural patterns rather than vessel attributes, learnt representations enable an autonomous vessel to identify operationally similar navigation scenarios across different vessel types, facilitating cross-vessel knowledge transfer for context-aware autonomous decision-making.

7.8. Cluster-Level Collision Risk Case Studies

The collision risk index (Equation (1)) combines TCPA and DCPA into a single safety-relevant scalar that fires only when both conditions hold simultaneously: a vessel must be on an imminent intercept track (low TCPA) and projected to pass with dangerously small separation (low DCPA). Neither quantity alone is sufficient: a low TCPA with a large DCPA describes parallel traffic that resolves safely; a large TCPA with a small DCPA indicates a future near-miss that current separation will prevent. The composite index therefore provides a more reliable operationalisation of collision exposure than either component in isolation. To link discovered clusters to concrete navigational scenarios, Figure 14 visualises the spatial collision risk profiles for two contrasting kNN-Leiden behavioural clusters. For each cluster, the Kiel Fjord bounding box is rasterised at 10 m × 10 m resolution, and each cell is coloured by the mean collision risk (Equation (1)) averaged over all AIS observations from trajectories belonging to that cluster. This produces a geographic heat map of where vessels in a given behavioural group characteristically experience high or low encounter danger.
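Both the gating logic and the rasterisation step admit short sketches. Equation (1) itself is not reproduced in this section, so `gated_risk` below is a generic placeholder (a product of exponential decays in TCPA and DCPA, with illustrative scales `tau` and `delta`) that merely reproduces the AND behaviour described above; `risk_heatmap` averages whatever per-observation risk values are supplied onto a regular grid.

```python
import numpy as np

def gated_risk(tcpa, dcpa, tau=300.0, delta=100.0):
    """Placeholder AND-gate (NOT the paper's Equation (1)): risk is high
    only when TCPA and DCPA are simultaneously small; tau (s) and
    delta (m) are illustrative decay scales."""
    return np.exp(-np.maximum(tcpa, 0.0) / tau) * np.exp(-np.maximum(dcpa, 0.0) / delta)

def risk_heatmap(x, y, risk, extent, cell=10.0):
    """Rasterise mean collision risk on a regular grid: each cell holds
    the mean risk of all observations falling inside it, NaN where a
    cluster contributes no observations."""
    x0, x1, y0, y1 = extent
    nx = int(np.ceil((x1 - x0) / cell))
    ny = int(np.ceil((y1 - y0) / cell))
    ix = np.clip(((np.asarray(x) - x0) // cell).astype(int), 0, nx - 1)
    iy = np.clip(((np.asarray(y) - y0) // cell).astype(int), 0, ny - 1)
    s, n = np.zeros((ny, nx)), np.zeros((ny, nx))
    np.add.at(s, (iy, ix), risk)   # unbuffered add handles repeated cells
    np.add.at(n, (iy, ix), 1.0)
    return np.where(n > 0, s / np.maximum(n, 1.0), np.nan)
```

A low TCPA alone (parallel traffic) or a low DCPA alone (distant future near-miss) leaves `gated_risk` small; only the conjunction drives it up, matching the semantics described above.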
Cluster 16—high-risk encounter pattern. The spatial profile of Cluster 16 (Figure 14a) shows elevated mean collision risk concentrated in the central navigational channel and around the ferry-terminal approach of Kiel Fjord. These are the geometrically constrained zones where converging vessel tracks naturally produce small DCPA values simultaneously with short TCPA, corresponding to vessels that cannot deviate to increase separation. The cluster therefore captures a distinct behavioural mode: trajectory segments in which the ego vessel is consistently operating in close-quarters situations, regardless of vessel type or speed class. This is consistent with the SHAP analysis (Section 7.4), which identifies interaction features (collision_risk_0, dcpa_0, tcpa_0) as non-trivially contributing features in clusters with elevated encounter density.
Cluster 29—low-risk transit pattern. In contrast, the spatial profile of Cluster 29 (Figure 14b) shows near-zero mean collision risk across the entire Kiel Fjord area. The near-uniform low-risk colouration indicates that trajectory segments in this cluster consistently encounter other vessels either at large spatial separations or with ample temporal margin before closest approach—conditions characteristic of open-water transit or well-separated route following. The geographic spread of non-zero cells confirms that this is not a spatially trivial cluster (e.g., vessels exclusively in peripheral areas with no traffic), but rather a genuine behavioural class defined by how vessels interact with surrounding traffic, not merely by where they happen to be located.
Taken together, these two profiles demonstrate that the kNN-Leiden clusters discovered from GMAE-REx embeddings correspond to verifiable, safety-relevant navigational scenarios. The collision risk spatial maps provide a direct, chart-level validation criterion: if a cluster genuinely represents a cognitively distinct behavioural mode, its mean collision risk surface should show a coherent geographic pattern, not random noise distributed uniformly across the study area. The contrast between Cluster 16 and Cluster 29 confirms this hypothesis, supporting the operational validity of the learnt representation space for maritime situational awareness and risk-screening applications.

7.9. Practical Implications for Maritime Autonomous Systems

The representation and community structure obtained in this work enable several practical applications in maritime systems:
  • Vessel Traffic Services (VTS): Online trajectories can be mapped into the learnt embedding space and compared against community prototypes for behaviour recognition and monitoring. Dominant communities provide baselines for expected-route modelling and traffic-flow analysis, while deviations from typical community regions can be used as indicators of unusual behaviour and for early-stage risk screening.
  • Maritime Autonomous Surface Ships (MASS): A stable and generalisable representation can serve as input to decision-making and risk-assessment modules, reducing reliance on handcrafted features. The embedding also supports label-efficient behaviour modelling via transfer learning, which is particularly valuable under limited supervision.
  • Safety Analysis and Simulation: Community structure supports systematic scenario organisation by grouping trajectories with similar operational context, facilitating structured coverage of operational design domains for simulation, testing, and validation.
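The VTS use case above reduces to nearest-prototype matching in the embedding space. A minimal sketch, with the distance threshold `tau` as an illustrative tuning parameter rather than a value from the paper:

```python
import numpy as np

def community_prototypes(Z, labels):
    """Mean embedding per discovered community."""
    return {c: Z[labels == c].mean(axis=0) for c in np.unique(labels)}

def recognise(z, prototypes, tau=2.0):
    """Assign an online segment embedding to its nearest community
    prototype; flag it as unusual when even the best match is farther
    than the (illustrative) threshold tau, enabling early-stage
    risk screening."""
    dists = {c: float(np.linalg.norm(z - p)) for c, p in prototypes.items()}
    best = min(dists, key=dists.get)
    return best, dists[best], dists[best] > tau
```

Online operation then amounts to encoding each incoming 10 min segment with the frozen encoder and calling `recognise` on the result.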
Offline preprocessing latency was estimated at ≈811 ms per 10 min segment on a single CPU core (see Appendix K for full experimental details), confirming that the per-unit throughput of the pipeline is well within the 10 min update cadence of MASS situational awareness.

8. Conclusions

This section concludes the paper by summarising the main findings and contributions of this work. It also outlines several directions for future research, building on the proposed representation learning framework.

8.1. Summary

This work focuses on the core problem of robust behavioural representation learning from maritime AIS trajectory data. It systematically investigates how to construct representations that can generalise across environments and provide interpretability in maritime scenarios characterised by strong environmental heterogeneity, limited labels, and high safety requirements. To address the limitations of existing methods in environment generalisation, feature structure utilisation, and uncertainty modelling, this paper proposes and validates a complete self-supervised representation learning framework.
First, we develop a comprehensive AIS data processing and feature modelling pipeline that transforms raw, sparse, and noisy vessel broadcast data into structured trajectory representations with multiple semantic layers, including temporal, kinematic, spatial, traffic density, and ship-to-ship interaction information. This pipeline provides a solid foundation for subsequent representation learning and offers a reusable data engineering paradigm for maritime trajectory analysis.
At the methodological level, we propose the GMAE-REx representation learning model. By introducing structured masking at the level of semantic feature groups, the model is explicitly encouraged to learn cross-category relationships between different feature types. In addition, we integrate REx regularisation from the domain generalisation literature into the self-supervised reconstruction objective, effectively reducing overfitting to specific traffic densities or geographical conditions. For comparison, we also systematically evaluate a denoising autoencoder (DAE) and an uncertainty-aware evidential autoencoder (EAE), enabling a unified assessment of different self-supervised strategies for maritime trajectory modelling.
For evaluation, we design a dual-pipeline experimental framework. On the one hand, a frozen-encoder classification task is used to objectively assess the discriminative power of the learned representations on a downstream vessel-type classification task. On the other hand, multiple unsupervised clustering methods are applied to analyse the intrinsic structure of the representation space. Experimental results show that GMAE-REx achieves stable and consistent performance advantages in the vessel classification task, demonstrating its ability to effectively capture vessel behaviour patterns. At the same time, in unsupervised clustering, the learned embeddings significantly outperform expert-engineered features across multiple intrinsic metrics, and the resulting cluster structures reflect operational context and behavioural modes rather than static vessel-type differences.
Furthermore, by combining UMAP visualisation, feature mean analysis, and SHAP-based explainability, this work systematically reveals the physical meaning of different behavioural clusters in the embedding space. The results show that temporal patterns, kinematic states, environmental constraints, and interaction geometry exhibit different levels of importance across clusters, confirming the interpretability of the learned representations and their ability to integrate multiple feature modalities. This behaviour-oriented organisation of trajectories, rather than vessel-type-based separation, supports knowledge transfer and scenario understanding across heterogeneous vessel classes.
From an application perspective, the proposed representation learning and community discovery framework provides a general and scalable foundation for behaviour monitoring in VTS, decision support for MASS, and maritime safety analysis.

8.2. Future Work

Although this work makes systematic progress in robust maritime behavioural representation learning, several directions remain open for further research.
First, vessel-type classification is mainly used in this work as a supervised probe task to validate the discriminative quality of the learned representations under a debiased setting. However, vessel classification is only one of many possible downstream tasks, and its role here is primarily to provide a standard and reproducible evaluation of representation quality. The robust trajectory representations learned in this study are naturally applicable to a wider range of maritime tasks, such as collision risk detection, abnormal behaviour identification, trajectory prediction, and remaining navigation risk assessment. Future work can systematically evaluate the generalisation and practical value of the proposed representations across these diverse task settings.
Second, although rich ship-to-ship interaction features (e.g., TCPA, DCPA, relative bearing, and collision risk indicators) are explicitly constructed during feature modelling, the current representation learning framework still follows a single-trajectory-centred encoding paradigm. These interaction features are naturally suited to graph-based modelling, where vessels can be represented as nodes and their spatial proximity and interaction dynamics as time-varying edges. Future research may combine the proposed robust self-supervised learning principles with graph neural networks to directly learn interaction-aware trajectory representations from spatio-temporal vessel graphs, enabling a deeper understanding of multi-vessel coordination and collective traffic behaviour.
Third, although the clustering analysis in Experiment II reveals that GMAE-REx embeddings exhibit stronger and more consistent inter-algorithm agreement than expert features (mean NMI 0.60 vs. 0.37; see Appendix J), the clustering component was deliberately treated as an off-the-shelf exploratory probe whose purpose is to reveal the intrinsic structure of the representation space, not to optimise partition quality in its own right. Future work may investigate deep clustering methods—such as Deep Embedded Clustering [72] or its improved variant IDEC [73]—that co-optimise representation learning and cluster assignment simultaneously, potentially yielding sharper behavioural segmentation for the AIS domain and enabling principled anomaly detection by scoring trajectory segments against learned normal-behaviour cluster structures [13]. Such an investigation would additionally allow a principled comparison against trajectory-distance baselines (e.g., Dynamic Time Warping) on a curated representative subset, which was out of scope here due to the O(N²) cost of pairwise distance computation at N = 50,000 segments.
Fourth, the preprocessing pipeline, feature engineering layer, and SSL training procedure are designed to be region-agnostic and use-case-agnostic. Every feature is derived from physical or geometric quantities—kinematic states computed from position sequences, CPA geometry, bathymetric depth, and distance to navigational structures—none of which carry location-specific encodings. The data requirements reduce to three inputs: decoded AIS broadcasts, standard nautical geo-map layers for the study area (bathymetry, land mask, ferry routes, restricted areas) that are globally available from public sources such as EMODnet and OpenSeaMap, and a vessel registry or equivalent web service (e.g., MyShipTracking, Equasis) to complete missing static vessel attributes (ship_type, dimensions) for MMSI identifiers whose static AIS reports are absent or incomplete. The CPA calculator for ship-to-ship interaction features is already integrated into the preprocessing pipeline. Applying the pipeline to a new maritime region therefore reduces to supplying the corresponding AIS archive and raster layers, and re-running the SSL training on the new data. A natural next step is to instantiate the framework on AIS archives from complementary environments—such as open-sea shipping lanes, bulk-cargo ports, or coastal archipelagos with different vessel-type mixes and fairway topologies—and compare the resulting community structures across regions, both to assess the generality of the discovered behavioural patterns and to study how environmental geometry shapes vessel behaviour.

8.3. Limitations

The experimental evaluation relies on a single temporal split (approximately 70% training, 20% validation, and 10% test, chronologically ordered within the 2022–2023 Kiel Fjord dataset). While this split reflects realistic temporal deployment conditions and avoids data leakage, it does not yield multi-seed or cross-validation confidence intervals for the linear probe accuracy values reported in Table 12. Performing repeated self-supervised learning pre-training across multiple seeds or folds is computationally prohibitive at this dataset scale: with 527,225 training segments of L = 120 timesteps each, a single GMAE-REx pre-training run requires substantial GPU compute, and replicating this across even two seeds would double the already considerable training cost.
Notwithstanding this constraint, the absolute size of the evaluation set provides substantial intrinsic statistical reliability. Our held-out test partition alone comprises approximately 52,722 trajectory segments from 9948 unique vessels spanning two years of recorded maritime traffic. For context, this figure substantially exceeds the complete training corpora reported in comparable maritime AIS trajectory learning studies [39,40,74], suggesting that performance estimates derived from this large held-out set are inherently more stable than those from smaller benchmarks in the field.
The cluster analysis in Experiment II uses standard off-the-shelf algorithms as an exploratory tool to probe the geometric structure of the learnt representation space. The algorithms were selected for diversity of inductive bias (graph modularity, Bayesian mixture, hierarchical first-neighbour, density-based), not jointly optimised for maximal partition quality. Accordingly, the absolute clustering metrics (silhouette coefficient, DBCV, and mutual-information scores) should be interpreted as indicative rather than as upper bounds. The consistent advantage of learnt embeddings over expert features across all methods and metrics provides strong evidence for the quality of the representation, and pairwise inter-algorithm agreement (mean NMI  = 0.60 on embeddings vs. 0.37 on expert features) confirms that the recovered structure is a genuine, algorithm-independent property of the space (Appendix J). The question of whether jointly trained deep-clustering objectives can further sharpen this structure is left to future work.
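The inter-algorithm agreement statistic quoted above is pairwise NMI between the partitions produced by different clustering backends. A self-contained sketch of the computation (equivalent in spirit to scikit-learn's `normalized_mutual_info_score` with arithmetic normalisation):

```python
import numpy as np

def nmi(a, b):
    """Normalized mutual information between two label assignments:
    I(a; b) divided by the arithmetic mean of the two entropies."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    ua, ia = np.unique(a, return_inverse=True)
    ub, ib = np.unique(b, return_inverse=True)
    cont = np.zeros((len(ua), len(ub)))
    np.add.at(cont, (ia, ib), 1)          # contingency table
    p = cont / n
    pa = p.sum(axis=1, keepdims=True)
    pb = p.sum(axis=0, keepdims=True)
    nz = p > 0
    mi = (p[nz] * np.log(p[nz] / (pa @ pb)[nz])).sum()
    ha = -(pa[pa > 0] * np.log(pa[pa > 0])).sum()
    hb = -(pb[pb > 0] * np.log(pb[pb > 0])).sum()
    return mi / ((ha + hb) / 2) if (ha + hb) else 1.0
```

NMI is invariant to label permutation (two algorithms naming the same groups differently still score 1.0), which is what makes it suitable for cross-algorithm agreement.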

Author Contributions

Conceptualization, G.A.-F., S.G., Z.H. and B.B.; Methodology, G.A.-F., S.G., Z.H. and B.B.; Software, G.A.-F., S.G., Z.H. and B.B.; Validation, G.A.-F., S.G., Z.H. and B.B.; Formal analysis, G.A.-F., S.G., Z.H. and B.B.; Investigation, G.A.-F., S.G., Z.H. and B.B.; Resources, G.A.-F., S.G., Z.H. and B.B.; Data curation, G.A.-F. and B.B.; Writing—original draft, G.A.-F., S.G., Z.H. and B.B.; Writing—review & editing, G.A.-F., S.G., Z.H., B.B., P.K., B.S. and S.T.; Visualization, G.A.-F., S.G., Z.H. and B.B.; Supervision, G.A.-F. and S.T.; Project administration, G.A.-F. and S.T.; Funding acquisition, S.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been partly funded by the German Federal Ministry for Digital and Transport within the project “CAPTN Förde Areal II—Praxisnahe Erforschung der (teil)autonomen, emissionsfreien Schifffahrt im digitalen Testfeld” (45DTWV08D).

Data Availability Statement

The raw AIS data from the Danish Maritime Authority (DMA) used in this study are publicly available via the Danish Maritime Authority (https://www.dma.dk/safety-at-sea/navigational-information/ais-data) [45], accessed on 1 April 2025. The source code for both the AIS processing pipeline and the representation learning and clustering pipeline has been made available on GitHub: https://github.com/CAPTN-sh/marlin-ais-process and https://github.com/CAPTN-sh/marlin-repl (accessed on 28 February 2026). The processed feature dataset is not publicly available due to third-party commercial licensing restrictions on upstream data sources.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
AIS: Automatic Identification System
AMO: Automatic Maritime Operations
ASV: Autonomous Surface Vessel
CNN: Convolutional Neural Network
COG: Course Over Ground
COLREGs: Convention on the International Regulations for Preventing Collisions at Sea
CPA: Closest Point of Approach
CSV: Comma-Separated Values
DAE: Denoising Autoencoder
DBCV: Density-Based Clustering Validation
DCPA: Distance at Closest Point of Approach
DKSML: Danish Mean Sea Level
DMA: Danish Maritime Authority
EAE: Evidential Autoencoder
ELBO: Evidence Lower Bound
EMSA: European Maritime Safety Agency
FEAST: Federated Evidential Learning for Anomaly Detection of Ship Trajectories
FFN: Feed-Forward Network
FINCH: First Integer Neighbour Clustering Hierarchy
GMAE: Grouped Masked Autoencoder
GMM: Gaussian Mixture Model
GNN: Graph Neural Network
GNSS: Global Navigation Satellite System
HDBSCAN: Hierarchical Density-Based Spatial Clustering of Applications with Noise
IMO: International Maritime Organisation
IRM: Invariant Risk Minimisation
lat: latitude
LiST: Linear Spatial–Temporal Feature Extractor
lon: longitude
MAE: Masked Autoencoder
MASS: Maritime Autonomous Surface Ships
ML: Machine Learning
MMSI: Maritime Mobile Service Identity
MSL: Mean Sea Level
MST: Minimum Spanning Tree
NHN: Normalhöhennull
NIG: Normal-Inverse-Gamma
PCHIP: Piecewise Cubic Hermite Interpolating Polynomial
RAdam: Rectified Adam
RepL: Representation Learning
REx: Risk Extrapolation
ROT: Rate-of-Turn
S2E: Ship-to-Environment
S2S: Ship-to-Ship
SA: Situational Awareness
SHAP: SHapley Additive exPlanations
SOG: Speed Over Ground
SOLAS: International Convention for the Safety of Life at Sea
SSL: Self-Supervised Learning
TCN: Temporal Convolutional Network
TCPA: Time to Closest Point of Approach
TPE: Tree-structured Parzen Estimator
UMAP: Uniform Manifold Approximation and Projection
VAE: Variational Autoencoder
VB: Variational Bayesian Inference
VBGMM: Variational Bayesian Gaussian Mixture Model
VCRO: Vessel Conflict Ranking Operator
VHF: Very High Frequency
VTS: Vessel Traffic Services

Appendix A. Post-Processed Features Set

Table A1. Static vessel information (Ship DB). Column (unit): Description.

mmsi: Identifier of the ship
ship_type: Official ship type
ship_group: Ship category (sailing, passenger, cargo, other)
to_bow (m): Distance from GPS antenna to bow
to_stern (m): Distance from GPS antenna to stern
to_port (m): Distance from GPS antenna to port side
to_starboard (m): Distance from GPS antenna to starboard side
crawled: Flag indicating crawled data (otherwise AIS)
Table A2. Dynamic ship features.
Column | Unit | Description
timestamp | ns | Timestamp of the observation
mmsi | – | Identifier of the ship
traj_id | – | Trajectory identifier
time_diff | s | Interpolation time offset
draught | m | Ship draught
geometry | deg | GeoPandas geometry (EPSG:4326)
lon | deg | Longitude
lat | deg | Latitude
heading | deg | Heading [0–360]
course | deg | Course [0–360]
status | – | Navigational status
speed | m/s | Speed
acc | m/s² | Acceleration
angular_difference | deg | Change in course per step [0–180]
dist_to_land | m | Distance to shore
dist_to_ferry_route | m | Distance to nearest ferry route
dist_to_restricted_area | m | Distance to nearest restricted area
water_depth | m | Water depth
density_all | ships/km²/h | Density of all ship groups (log-scaled)
density_sailing | ships/km²/h | Density of sailing vessels (log-scaled)
density_cargo | ships/km²/h | Density of cargo vessels (log-scaled)
density_other | ships/km²/h | Density of other vessels (log-scaled)
density_passenger | ships/km²/h | Density of passenger vessels (log-scaled)
Table A3. Pairwise ship-to-ship interaction features.
Column | Unit | Description
timestamp | ns | Timestamp of the observation
mmsi | – | Identifier of the ego ship
mmsi_other | – | Identifier of the target ship
dist | m | Distance between ships
rel_speed | m/s | Relative speed
course_of_rel_motion | deg | Course of relative motion
course_diff | deg | Difference in course
true_bearing | deg | Bearing from north
rel_bearing | deg | Bearing relative to course
rel_bearing_cat | – | Category (bow, stern, starboard, port)
tcpa | s | Time to closest point of approach
dcpa | m | Distance at closest point of approach
collision_risk | [0–1] | Collision risk score
Table A4. Complete model input feature set and debiasing for Pipeline 1. All features are present at every observation timestamp. Cyclic features use sin/cos encoding; categorical features use one-hot encoding. Features marked “Excluded” are removed under the debiasing procedure (Section 5.2, Equation (26)) to prevent ship-group label leakage in Pipeline 1. The Ship DB dimension columns to_bow, to_stern, to_port, and to_starboard are collapsed into length and width during feature engineering and are therefore covered by their exclusion.
Feature | Group | Encoding | Pipeline 1 Status
Trajectory features
lon | Trajectory | – | Retained
lat | Trajectory | – | Retained
x | Trajectory | – | Retained (projected)
y | Trajectory | – | Retained (projected)
rel_x | Trajectory | – | Retained (segment-relative)
rel_y | Trajectory | – | Retained (segment-relative)
status | Trajectory | one-hot | Retained
speed | Trajectory | – | Retained
acc | Trajectory | – | Retained
course | Trajectory | sin/cos | Retained
angular_difference | Trajectory | sin/cos | Retained
Static vessel features
ship_type | Static | one-hot | Excluded: direct class label
ship_group | Static | one-hot | Excluded: direct class label
length | Static | – | Excluded: class-discriminative dimension
width | Static | – | Excluded: class-discriminative dimension
Map/environment features
water_depth | Map | – | Retained
dist_to_land | Map | – | Retained
dist_to_ferry_route | Map | – | Retained
dist_to_restricted_area | Map | – | Retained
density_all | Map | – | Retained
density_own_group | Map | – | Excluded: label-derived proxy
Ship-to-ship interaction features
dist | Ship2ship | – | Retained
rel_speed | Ship2ship | – | Retained
course_diff | Ship2ship | sin/cos | Retained
rel_bearing | Ship2ship | sin/cos | Retained
rel_bearing_cat | Ship2ship | one-hot | Retained
tcpa | Ship2ship | – | Retained
dcpa | Ship2ship | – | Retained
collision_risk | Ship2ship | – | Retained
ship_type_other | Ship2ship | one-hot | Excluded: encodes class of interaction partner
ship_group_other | Ship2ship | one-hot | Retained
Datetime features
day_of_year | Datetime | sin/cos | Retained
day_of_week | Datetime | sin/cos | Retained
hour_of_day | Datetime | sin/cos | Retained
month_of_year | Datetime | sin/cos | Retained
Total: 35 features in the full set; 29 retained in Pipeline 1 (M = 6 excluded)
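The sin/cos encoding of cyclic features referenced in Table A4 maps a periodic quantity onto the unit circle so that wrap-around neighbours (e.g., 23:00 and 01:00) stay close in feature space. A minimal sketch (the helper name `cyclic_encode` is illustrative, not part of the original pipeline):

```python
import numpy as np

def cyclic_encode(values, period):
    """Map a cyclic feature (e.g. hour_of_day with period 24) onto the
    unit circle so wrap-around neighbours remain close in feature space."""
    angle = 2.0 * np.pi * np.asarray(values, dtype=float) / period
    return np.stack([np.sin(angle), np.cos(angle)], axis=-1)

hours = np.array([0, 6, 12, 23])
enc = cyclic_encode(hours, period=24)
# Euclidean gap between 23:00 and 00:00 is small (~0.26),
# unlike the raw integer gap of 23.
gap = np.linalg.norm(enc[3] - enc[0])
```

The same transform applies to day_of_week (period 7), day_of_year (period 365), and month_of_year (period 12).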

Appendix B. Clustering Methods: Detailed Descriptions and Formulations

This appendix provides detailed mathematical formulations for the four clustering methods used in Experiment II (Section 6.3).
The methods were selected for their complementary strengths in capturing the complex, non-linear structures characteristic of maritime navigation behaviour, including arbitrary-shape clusters, hierarchical patterns, and heterogeneous density distributions.

Appendix B.1. kNN-Leiden: Graph-Based Community Detection

  • Overview
The kNN-Leiden method combines k-nearest neighbour graph construction with the Leiden community detection algorithm [65].
Leiden improves upon the Louvain algorithm [69] by guaranteeing that all detected communities are internally connected, preventing the formation of fragmented or isolated clusters—a critical property for maritime trajectory analysis where spatial and temporal continuity defines coherent behavioural patterns.
  • Mathematical Formulation
For a dataset $X = \{x_1, \ldots, x_n\}$ in feature space $\mathbb{R}^d$, we construct an undirected weighted graph $G = (V, E, W)$, where
  • Vertices V correspond to trajectory segments: $|V| = n$,
  • Edges E connect each vertex to its k nearest neighbours under a distance metric $d(\cdot, \cdot)$,
  • Edge weights are computed as
    $w_{ij} = \exp\left(-\frac{d(x_i, x_j)^2}{2\sigma^2}\right)$,
    where σ is a scale parameter (typically set to the median pairwise distance).
The Leiden algorithm iteratively optimises the modularity quality function
$Q = \frac{1}{2m} \sum_{ij} \left[ w_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$,
where
  • $m = \frac{1}{2}\sum_{ij} w_{ij}$ is the total edge weight,
  • $k_i = \sum_j w_{ij}$ is the weighted degree of node i,
  • $\delta(c_i, c_j) = 1$ if nodes i and j are in the same community, and 0 otherwise.
The resolution parameter γ controls community size: higher values yield more, smaller communities. The final partition $C = \{C_1, \ldots, C_k\}$ assigns each trajectory segment to exactly one community.
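The graph construction and the modularity objective above can be sketched in numpy. This is a minimal illustration only: Leiden's refinement loop itself is typically delegated to a library such as leidenalg and is omitted here, and the two-blob data are synthetic.

```python
import numpy as np

def gaussian_knn_graph(X, k):
    """Symmetric kNN adjacency with Gaussian edge weights; sigma is set
    to the median pairwise distance, as in the text."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    sigma = np.median(D[np.triu_indices_from(D, k=1)])
    W = np.exp(-D**2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)
    # keep only each node's k strongest edges, then symmetrise
    keep = np.zeros_like(W, dtype=bool)
    idx = np.argsort(-W, axis=1)[:, :k]
    keep[np.arange(len(X))[:, None], idx] = True
    return np.where(keep | keep.T, W, 0.0)

def modularity(W, labels):
    """Modularity Q of a hard partition on the weighted graph W."""
    two_m = W.sum()
    deg = W.sum(axis=1)
    same = labels[:, None] == labels[None, :]
    return ((W - np.outer(deg, deg) / two_m)[same]).sum() / two_m

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(3, 0.2, (20, 2))])
W = gaussian_knn_graph(X, k=5)
labels = np.array([0] * 20 + [1] * 20)
Q = modularity(W, labels)  # well-separated blobs give high Q
```

On this toy graph the two blobs form disconnected communities, so Q lands near the theoretical 0.5 for two balanced components.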
  • Key Hyperparameters
Table A5 summarises the key hyperparameters for kNN-Leiden clustering.
Table A5. kNN-Leiden hyperparameters and their interpretations.
Parameter | Symbol | Description
n_neighbors | k | Number of nearest neighbours for graph construction
resolution | γ | Controls granularity of detected communities; higher values yield finer partitions
metric | – | Distance metric: cosine for normalised embeddings, euclidean for expert features

Appendix B.2. HDBSCAN: Hierarchical Density-Based Spatial Clustering of Applications with Noise

  • Overview
Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) [63] extends the classical DBSCAN algorithm by replacing the fixed density threshold with a hierarchical density model.
This enables detection of clusters with varying densities and provides explicit identification of noise points, essential for maritime anomaly detection.
  • Mathematical Formulation
For each point $x_i$, the core distance is defined as the distance to its $m_{\mathrm{pts}}$-th nearest neighbour:
$\mathrm{core}(x_i) = d(x_i, \mathrm{NN}_{m_{\mathrm{pts}}}(x_i))$
The mutual reachability distance between points $x_i$ and $x_j$ is
$d_{\mathrm{mreach}}(x_i, x_j) = \max\{\mathrm{core}(x_i), \mathrm{core}(x_j), d(x_i, x_j)\}$
This distance metric emphasises density by inflating distances in low-density regions.
A Minimum Spanning Tree (MST) is constructed over all points using $d_{\mathrm{mreach}}$ as edge weights. A cluster hierarchy is built by removing edges from the MST in order of decreasing weight (increasing density threshold).
For each candidate cluster C appearing in the hierarchy, its stability is computed, with $\lambda = 1/\epsilon$, as
$S(C) = \sum_{p \in C} \left( \lambda^{C}_{\mathrm{death}} - \lambda^{p}_{\mathrm{birth}} \right)$
where $\epsilon^{p}_{\mathrm{birth}}$ (defining $\lambda^{p}_{\mathrm{birth}} = 1/\epsilon^{p}_{\mathrm{birth}}$) is the threshold at which point p joins C and $\epsilon^{C}_{\mathrm{death}}$ (defining $\lambda^{C}_{\mathrm{death}} = 1/\epsilon^{C}_{\mathrm{death}}$) is the threshold at which C splits or disappears.
The final flat clustering is obtained by selecting clusters that maximise total stability, using a dynamic programming algorithm to avoid selecting a parent and its child simultaneously. Points not assigned to any cluster are labelled as noise (cluster label −1).
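The core and mutual reachability distances above can be sketched in a few lines of numpy. This illustrates only the distance transform, not the full HDBSCAN hierarchy or stability-based extraction:

```python
import numpy as np

def mutual_reachability(X, m_pts):
    """Pairwise mutual reachability distances:
    d_mreach(x_i, x_j) = max{core(x_i), core(x_j), d(x_i, x_j)}."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # core distance = distance to the m_pts-th nearest neighbour
    # (row-wise sort puts the self-distance 0 at index 0)
    core = np.sort(D, axis=1)[:, m_pts]
    M = np.maximum(D, np.maximum(core[:, None], core[None, :]))
    np.fill_diagonal(M, 0.0)
    return M

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
M = mutual_reachability(X, m_pts=5)
# the transform never shrinks a distance: it only inflates
# pairs involving low-density points
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
never_shrinks = bool(np.all(M >= D - 1e-12))
```

An MST over M (e.g., via scipy's `minimum_spanning_tree`) would then yield the cluster hierarchy described in the text.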
  • Key Hyperparameters
Table A6 summarises the key hyperparameters for HDBSCAN clustering.
Table A6. HDBSCAN hyperparameters and their interpretations.
Parameter | Symbol | Description
min_cluster_size | – | Minimum number of points required to form a cluster
min_samples | m_pts | Number of neighbours for core distance computation
cluster_selection_epsilon | ε | Optional minimum threshold for cluster separation
cluster_selection_method | – | eom (Excess of Mass) or leaf (leaf clusters)

Appendix B.3. VBGMM: Variational Bayesian Gaussian Mixture Model

  • Overview
VBGMM employs variational Bayesian inference for the Gaussian Mixture Model (GMM), with automatic component selection via Dirichlet process priors [64]. This probabilistic approach provides soft cluster assignments and principled uncertainty quantification, aligning with the framework’s emphasis on uncertainty-aware representations.
  • Mathematical Formulation
A Gaussian Mixture Model assumes that observations X = { x 1 , , x n } are generated from a mixture of K Gaussian components:
$p(x_i \mid \pi, \mu, \Sigma) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$
where $\pi = \{\pi_1, \ldots, \pi_K\}$ are mixing proportions ($\sum_k \pi_k = 1$), $\mu_k \in \mathbb{R}^d$ is the mean of component k, and $\Sigma_k \in \mathbb{R}^{d \times d}$ is the covariance matrix of component k.
The variational Bayesian approach places priors on all model parameters and approximates the posterior distribution p ( θ , Z | X ) with a factorised variational distribution q ( θ ) q ( Z ) . Prior distributions are specified as shown in Table A7.
Table A7. Prior distributions for VBGMM parameters.
Parameter | Prior Distribution | Hyperparameter
Mixing proportions π | $\mathrm{Dir}(\alpha_0/K, \ldots, \alpha_0/K)$ | $\alpha_0$: concentration parameter
Means $\mu_k$ | $\mathcal{N}(m_0, (\beta_0 \Sigma_k^{-1})^{-1})$ | $\beta_0$: mean precision
Covariances $\Sigma_k$ | $\Sigma_k^{-1} \sim \mathcal{W}(W_0, \nu_0)$ | $W_0, \nu_0$: Wishart parameters
The Evidence Lower Bound (ELBO) is maximised iteratively:
$\mathcal{L} = \mathbb{E}_q[\log p(X, Z \mid \theta)] - \mathbb{E}_q[\log q(Z)] + \mathbb{E}_q[\log p(\theta)] - \mathbb{E}_q[\log q(\theta)]$
The algorithm alternates between updating the variational posterior on assignments q ( Z ) (computing responsibilities r n k = q ( z n = k ) ) and updating the variational posterior on parameters q ( θ ) (computing updated Dirichlet and Gaussian–Wishart parameters).
The Dirichlet process prior with concentration $\alpha_0 \ll 1$ encourages sparsity in the mixing proportions. Components with negligible posterior weight ($\pi_k \approx 0$) are effectively pruned, enabling automatic determination of the effective number of clusters.
The final hard clustering assigns each point to the component with the highest responsibility:
$c_i = \arg\max_k \, r_{ik}$
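As a concrete sketch, the fitting procedure described above can be reproduced with scikit-learn's `BayesianGaussianMixture` (assuming scikit-learn is available; parameter names follow that library, and the two-mode synthetic data are illustrative, not the AIS dataset):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(2)
# two well-separated synthetic modes; n_components = 10 deliberately
# over-estimates K so the Dirichlet process prior can prune
X = np.vstack([rng.normal(0.0, 0.3, (100, 2)),
               rng.normal(4.0, 0.3, (100, 2))])

vbgmm = BayesianGaussianMixture(
    n_components=10,                      # upper bound on K
    weight_concentration_prior=1e-3,      # alpha_0 << 1 -> sparse weights
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    max_iter=500,
    random_state=0,
).fit(X)

resp = vbgmm.predict_proba(X)             # responsibilities r_nk
hard = resp.argmax(axis=1)                # c_i = argmax_k r_ik
n_effective = int((vbgmm.weights_ > 1e-2).sum())  # surviving components
```

With a tiny concentration prior, most of the ten candidate components receive negligible posterior weight and only the genuine modes survive.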
  • Key Hyperparameters
Table A8 summarises the key hyperparameters for VBGMM clustering.
Table A8. VBGMM hyperparameters and their interpretations.
Parameter | Symbol | Description
n_components | K | Maximum number of mixture components
weight_concentration_prior | $\alpha_0$ | Dirichlet concentration; lower values favour fewer components
mean_precision_prior | $\beta_0$ | Prior precision on component means
covariance_type | – | Covariance structure: full, tied, diag, spherical
weight_concentration_prior_type | – | dirichlet_process (sparse) or dirichlet_distribution (uniform)

Appendix B.4. FINCH: First Integer Neighbour Clustering Hierarchy

  • Overview
FINCH [66] is a parameter-free hierarchical clustering algorithm that constructs a cluster hierarchy by iteratively merging clusters based on first-neighbour relations. The algorithm is computationally efficient and requires minimal tuning, making it suitable for exploratory multi-scale analysis of large trajectory datasets.
  • Mathematical Formulation
FINCH constructs a nested hierarchy of partitions through an iterative procedure. Each point is initially assigned to its own cluster: $C^{(0)} = \{\{x_1\}, \{x_2\}, \ldots, \{x_n\}\}$.
At iteration t, for each cluster $C_i^{(t)}$, its first neighbour $C_j^{(t)}$ is defined as the cluster with the smallest minimum pairwise distance:
$C_j^{(t)} = \arg\min_{C_k \neq C_i^{(t)}} \; \min_{x \in C_i^{(t)},\, y \in C_k} d(x, y)$
An undirected graph $G^{(t)}$ is constructed whose vertices are clusters and whose edges connect each cluster to its first neighbour. The connected components of $G^{(t)}$ form the next-level partition $C^{(t+1)}$.
The process terminates when $|C^{(t+1)}| = |C^{(t)}|$ (no further merging is possible), producing a sequence of nested partitions $C^{(0)}, C^{(1)}, \ldots, C^{(T)}$, where each level coarsens the previous one and T is the number of hierarchy levels.
Since FINCH generates multiple hierarchy levels, the partition that maximises a validation metric (silhouette score in Experiment II) is selected:
$C^{*} = \arg\max_{C^{(t)}} \mathrm{Silhouette}(C^{(t)})$
The computational complexity is $O(n \log n)$ per iteration, and the number of iterations is typically logarithmic in n, i.e., $T = O(\log n)$.
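A single FINCH level (first-neighbour linking followed by connected components) can be sketched in pure numpy; subsequent levels, which repeat the step on cluster representatives, are omitted in this illustration:

```python
import numpy as np

def finch_level(X):
    """One FINCH merging step: link every point to its first (nearest)
    neighbour and return the connected components of that graph."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    first = D.argmin(axis=1)            # first-neighbour relation

    parent = np.arange(len(X))          # union-find over the links
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    for i, j in enumerate(first):
        parent[find(i)] = find(j)

    roots = np.array([find(i) for i in range(len(X))])
    _, labels = np.unique(roots, return_inverse=True)
    return labels

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.2, (15, 2)), rng.normal(5, 0.2, (15, 2))])
labels = finch_level(X)  # far-apart blobs never share a first neighbour
```

Because a point's nearest neighbour always lies in its own blob here, no level-one component ever spans both blobs, mirroring how FINCH cannot merge well-separated groups prematurely.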
  • Key Hyperparameters
Table A9 summarises the key hyperparameters for FINCH clustering.
Table A9. FINCH hyperparameters and their interpretations.
Parameter | Symbol | Description
n_neighbors | k | Number of nearest neighbours for initial graph construction (optional)
metric | – | Distance metric for pairwise distance computation
max_levels | T | Maximum number of hierarchy levels; typically set to null for automatic termination

Appendix B.5. Method Comparison and Selection Rationale

Table A10 summarises the key characteristics, computational complexity, and suitability of each method for maritime trajectory clustering.
Table A10. Comparative summary of clustering methods used in Experiment II.
Method | Paradigm | Cluster Shapes | Complexity | Key Advantage for AIS Trajectories
kNN-Leiden | Graph-based | Arbitrary (network communities) | $O(n \log n)$ | Captures route-network structures with heterogeneous density; connectivity guarantee prevents fragmentation
HDBSCAN | Density-based | Arbitrary (density-connected) | $O(n \log n)$ | Explicit noise detection for anomalies; multi-scale hierarchy captures local manoeuvres and global patterns
VBGMM | Model-based | Ellipsoidal (Gaussian) | $O(Knd^2)$ per iteration | Probabilistic assignments support uncertainty quantification; automatic component selection via Dirichlet process
FINCH | Hierarchical | Arbitrary (first-neighbour) | $O(n \log n)$ per level | Parameter-free exploratory clustering; fast and scalable for large datasets
  • Rationale for Method Selection
The four methods provide complementary perspectives on trajectory organisation:
  • kNN-Leiden reveals community structures in the trajectory graph, capturing vessel movements organised into route networks and transit corridors
  • HDBSCAN identifies density-separated behavioural modes while explicitly flagging anomalous trajectories as noise, supporting safety-critical applications
  • VBGMM provides probabilistic cluster assignments with uncertainty estimates, enabling soft boundaries between overlapping behaviours (e.g., vessels transitioning between operational modes)
  • FINCH offers a parameter-free baseline for exploratory analysis, revealing multi-scale behavioural hierarchies without prior assumptions about cluster count or density
This multi-method evaluation ensures that conclusions about representation quality (Contribution 6) are robust across different clustering paradigms and not artefacts of a single methodological choice.

Appendix C. Clustering Validation Metrics

Table A11 summarises the mathematical properties, value ranges, and interpretation guidelines for all clustering validation metrics used in Experiment II (Section 6.3). Metrics are categorised by their underlying assumptions and suitability for different clustering paradigms.
Table A11. Properties and formulations of clustering validation metrics used in Experiment II. Metrics are categorised by their suitability for density-based, graph-based, or centroid-based clustering methods.
Intrinsic density/graph-based metrics (primary):
  • DBCV: Density-Based Clustering Validation [67]. Measures density connectivity within clusters and density separation between clusters using the mutual reachability distance:
    $\mathrm{DBCV} = \frac{1}{|C|} \sum_{i=1}^{|C|} \frac{\mathrm{DSPC}(C_i) - \mathrm{DSC}(C_i)}{\max\{\mathrm{DSC}(C_i), \mathrm{DSPC}(C_i)\}}$
    where $\mathrm{DSC}(C_i)$ is the density sparseness of cluster $C_i$ and $\mathrm{DSPC}(C_i)$ its density separation from the closest other cluster. Range: [−1, 1]. Higher is better; values above 0 indicate dense, well-separated clusters. Best suited for HDBSCAN and density-based methods; the gold standard for arbitrary-shape clusters.
  • Conductance: graph edge-cut quality [65]. Measures the fraction of edge weight leaving a community relative to the community volume:
    $\phi(C) = \frac{|\partial C|}{\min\{\mathrm{vol}(C), \mathrm{vol}(\bar{C})\}}$
    where $|\partial C|$ is the total weight of edges crossing the community boundary. Range: [0, 1]. Lower is better; values below 0.1 indicate excellent separation. Best suited for kNN-Leiden and graph-based community detection.
  • Modularity: community structure strength [69]. Quantifies the difference between actual and expected edge density within communities:
    $Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$
    Range: [−0.5, 1]. Higher is better; values above 0.3 indicate significant structure. Best suited for kNN-Leiden and graph partitioning; the primary optimisation target of the Leiden algorithm.
  • ELBO: Evidence Lower Bound for VBGMM [64]. Variational lower bound on the log-likelihood:
    $\mathcal{L} = \mathbb{E}_q[\log p(X, Z \mid \theta)] - \mathbb{E}_q[\log q(Z)]$ (abbreviated; see Appendix B.3 for the full expression)
    Range: (−∞, 0]. Higher (less negative) is better; balances fit and complexity. Native probabilistic model selection criterion for VBGMM.
Traditional metrics (for completeness, Appendix D):
  • Silhouette: centroid-based cohesion and separation [75]:
    $s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$
    where $a(i)$ is the mean intra-cluster distance and $b(i)$ the mean nearest-cluster distance. Range: [−1, 1]. Higher is better; assumes convex clusters. Best suited for K-means and centroid-based methods; less suitable for arbitrary shapes.
  • Calinski–Harabasz: variance ratio criterion [76]. Ratio of between-cluster to within-cluster variance:
    $CH = \frac{SS_B/(k-1)}{SS_W/(n-k)}$
    Range: [0, ∞). Higher is better; assumes isotropic Gaussian clusters. Best suited for K-means and GMM; biased towards spherical clusters.
  • Davies–Bouldin: average similarity between each cluster and its most similar cluster [77]:
    $DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{\sigma_i + \sigma_j}{d(c_i, c_j)}$
    Range: [0, ∞). Lower is better; assumes spherical, similar-size clusters. Best suited for K-means; not suitable for varying density or arbitrary shapes.
  • Method-Metric Alignment
The choice of primary metrics in Experiment II is aligned with the clustering methods’ optimisation objectives and underlying assumptions. DBCV is the native validation criterion for HDBSCAN and provides a density-aware alternative for evaluating all methods. Conductance and modularity directly measure the quality functions optimised by kNN-Leiden, making them the most faithful diagnostics for graph-based partitions. ELBO is the variational lower bound maximised during VBGMM inference, providing a principled model selection criterion. Traditional metrics (silhouette, Calinski–Harabasz, Davies–Bouldin) assume convex, isotropic clusters and are less informative for maritime trajectory data, which exhibit elongated corridors, branching patterns, and heterogeneous density; these metrics are reported for completeness but not used for primary comparison.
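As a small illustration of the graph metrics above, conductance can be computed directly from a weighted adjacency matrix. The toy graph (two triangles joined by one weak bridge) is illustrative, and the threshold follows the interpretation given in Table A11:

```python
import numpy as np

def conductance(W, mask):
    """Graph conductance phi(C) = cut(C, C-bar) / min(vol(C), vol(C-bar)),
    where the boolean mask selects the community C."""
    cut = W[np.ix_(mask, ~mask)].sum()   # weight crossing the boundary
    vol = W[mask].sum()                  # weighted degree sum of C
    vol_rest = W[~mask].sum()
    return cut / min(vol, vol_rest)

# toy graph: two triangles joined by one weak bridge edge
W = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.1                  # bridge
mask = np.array([True] * 3 + [False] * 3)
phi = conductance(W, mask)               # small phi -> clean separation
```

Here the cut weight is 0.1 against a community volume of 6.1, so phi falls well below the 0.1 "excellent separation" guideline.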

Appendix D. Clustering Quantitative Assessment

Table A12 presents the complete set of intrinsic and traditional clustering validation metrics for all four clustering methods applied to learnt embeddings and expert features. Primary metrics (DBCV, conductance, modularity) align with the optimisation objectives of each method and are discussed in detail in Section 7.2. Traditional metrics (silhouette, Calinski–Harabasz, Davies–Bouldin) are included for completeness but exhibit restrictive geometric assumptions (convexity, isotropy) that limit their validity for maritime trajectory data, as explained in Appendix C. Arrows indicate superior performance (↑ higher is better; ↓ lower is better).
Table A12. Complete clustering quality metrics for all methods on 50,000 trajectory segments. Primary metrics (DBCV, conductance, modularity) are optimised by the respective clustering algorithms. Traditional metrics (silhouette, Calinski–Harabasz, Davies–Bouldin) assume convex clusters and are less informative for arbitrary-shape maritime patterns. Arrows mark the better value when comparing embeddings vs. expert features. Metric properties and interpretation guidelines are detailed in Appendix C.
Method | Repr. | DBCV ↑ | Cond. ↓ | Modul. ↑ | #Clust. | Silh. ↑ | CH ↑ | DB ↓
kNN-Leiden | Embed. | −0.533 ↑ | 0.186 ↓ | 0.906 ↑ | 47 | 0.040 ↑ | 745 | 2.633
kNN-Leiden | Expert | −0.791 | 0.327 | 0.875 | 48 | −0.055 | 928 ↑ | 2.301 ↓
FINCH | Embed. | −0.588 ↑ | 0.205 ↓ | 0.756 ↑ | 27 | 0.010 | 941 | 2.695 ↓
FINCH | Expert | −0.907 | 0.206 | 0.671 | 8 | 0.024 ↑ | 2374 ↑ | 2.986
HDBSCAN | Embed. | 0.112 ↑ | 0.479 | – | 3 | 0.008 ↑ | 710 | 3.116
HDBSCAN | Expert | 0.042 | 0.311 ↓ | – | 2 | −0.078 | 1165 ↑ | 2.766 ↓
VBGMM | Embed. | −0.594 ↑ | 0.193 ↓ | 0.756 ↑ | 28 | 0.058 ↑ | 1251 ↑ | 2.386 ↓
VBGMM | Expert | −0.742 | 0.498 | 0.419 | 28 | −0.044 | 749 | 3.832
  • Metric Interpretation
  • DBCV (Density-Based Clustering Validation): Embeddings achieve higher (less negative) values for all methods, indicating improved density connectivity within clusters and density separation between clusters. For HDBSCAN, embeddings reach positive DBCV (0.112), signalling dense, well-separated clusters.
  • Conductance: Lower values indicate cleaner graph cuts. Embeddings outperform expert features for kNN-Leiden (0.186 vs. 0.327), FINCH (0.205 vs. 0.206), and VBGMM (0.193 vs. 0.498). For HDBSCAN, higher conductance (0.479) is expected as the method optimises density stability, not graph boundaries.
  • Modularity: Higher values signal stronger community structure. Embeddings achieve superior modularity for kNN-Leiden (0.906 vs. 0.875), FINCH (0.756 vs. 0.671), and VBGMM (0.756 vs. 0.419). HDBSCAN does not compute modularity (density-based, not graph-based).
  • Cluster Count: Embeddings yield finer-grained partitions for FINCH (27 vs. 8 clusters) and HDBSCAN (3 vs. 2), revealing multi-scale behavioural structure. kNN-Leiden and VBGMM produce comparable granularity (47 vs. 48; 28 vs. 28).
  • Traditional Metrics: Silhouette, Calinski–Harabasz, and Davies–Bouldin exhibit mixed patterns. These metrics assume convex, isotropic clusters and are less informative for maritime trajectories, which exhibit elongated corridors and heterogeneous density (see Appendix C). For instance, expert features achieve higher Calinski–Harabasz for kNN-Leiden (928 vs. 745) and FINCH (2374 vs. 941), but this does not contradict the superior DBCV/conductance/modularity of embeddings—it reflects the metrics’ differing geometric assumptions.
  • Cross-Reference to Main Text
The primary metrics (DBCV, conductance, modularity) are reported in Table 17 (Section 6.3) and analysed in Section 7.2.

Appendix E. Cluster Size Distributions

Figure A1 compares cluster size distributions for learnt embeddings and expert features across all four clustering methods. Embeddings consistently yield more balanced partitions, whilst expert features produce skewed distributions dominated by a small number of large clusters. For HDBSCAN, embeddings reduce noise assignments from 44,000 to 26,000 segments (88% to 52% of all segments), indicating improved density structure. For FINCH, expert features collapse into 8 coarse-grained clusters with a dominant cluster containing 26% of all segments, whereas embeddings reveal 27 finer-grained behavioural modes with a more uniform size distribution. The quantitative metrics corresponding to these partitions are detailed in Appendix D.
Figure A1. Cluster size distributions for embeddings (left column) vs. expert features (right column). Embeddings produce more balanced partitions: kNN-Leiden (a) yields 47 communities with gradual decay, whereas expert (b) exhibits power-law dominance. FINCH (c) reveals 27 multi-scale clusters, whilst expert (d) collapses into 8 coarse modes with one giant cluster (26% of data). HDBSCAN (e) rejects 52% as noise (grey) vs. 88% for expert (f), indicating embeddings form denser, more stable clusters. VBGMM (g) maintains balanced components, whereas expert (h) concentrates 18% of data in a single dominant component.
  • Interpretation
The balanced cluster size distributions observed for embeddings indicate that GMAE-REx representations support fine-grained behavioural differentiation without collapsing semantically distinct navigation patterns into a few dominant modes. In contrast, expert features exhibit concentration effects where large clusters capture generic transit behaviours, whilst smaller clusters represent edge cases or outliers. This pattern is consistent with the hypothesis that expert-crafted features emphasise kinematic similarities within vessel classes (leading to large homogeneous clusters), whereas learnt embeddings encode context-dependent operational modes that transcend vessel-type boundaries (enabling more granular and balanced partitioning). The substantial reduction in HDBSCAN noise assignments (26,000 vs. 44,000 segments) further demonstrates that embeddings exhibit denser, more coherent manifold structure in high-dimensional space, facilitating density-based community discovery.

Appendix F. UMAP Cluster Projections

Figure A2, Figure A3, Figure A4 and Figure A5 visualise clustering results in two-dimensional UMAP projections for all four methods applied to learnt embeddings and expert features. Each method shows two colourings: cluster assignments (left panels) and ship type (right panels). Embeddings consistently produce well-separated, compact clusters with mixed vessel types within each cluster, indicating organisation by operational behaviour rather than vessel identity. Expert features exhibit substantial overlap between clusters and stronger ship-type segregation, reflecting kinematic similarities within vessel classes. The visual structure corroborates the quantitative metrics in Appendix D and the cluster size distributions in Appendix E.
Figure A2. UMAP projections for kNN-Leiden clustering. Embeddings (a) yield 47 well-separated communities (left panel) with mixed ship types (right panel), whilst expert features (b) produce 48 communities with visible cluster overlap and stronger ship-type segregation. The right panels demonstrate that embedding-based clusters transcend vessel categories, capturing operational modes shared across cargo, passenger, sailing, and other vessel types.
Figure A3. UMAP projections for FINCH hierarchical clustering. Embeddings (a) reveal 27 distinct spatial regions (left panel) with balanced ship-type mixing (right panel), whereas expert features (b) collapse into 8 large overlapping clusters dominated by a giant central component. The finer granularity of embedding-based partitions reflects multi-scale behavioural structure not captured by expert kinematic features.
Figure A4. UMAP projections for HDBSCAN density-based clustering. Embeddings (a) identify 3 dense clusters (green, dark red, purple) with 52% noise rejection (grey), whereas expert features (b) produce only 2 small clusters with 88% noise rejection. The substantial reduction in noise for embeddings indicates improved density structure, consistent with the higher DBCV score (0.112 vs. 0.042) in Table A12.
Figure A5. UMAP projections for VBGMM probabilistic clustering. Embeddings (a) form 28 compact, well-separated components (left panel) with mixed ship types (right panel), whilst expert features (b) exhibit a dominant central cluster with substantial overlap between components and stronger ship-type segregation. This contrast is discussed in detail in Section 7.2 and illustrated in Figure 8 (main text).
  • Interpretation
The UMAP projections provide visual evidence that embeddings encode context-dependent operational modes rather than vessel-identity-driven kinematic profiles. Across all methods, embedding-based clusters exhibit: (i) spatial compactness with clear inter-cluster separation, (ii) mixed ship-type composition within individual clusters, and (iii) distinct regional organisation in the two-dimensional projection, suggesting that the latent space captures behavioural nuances beyond simple speed/course patterns. In contrast, expert features consistently show (i) overlapping cluster boundaries, (ii) stronger alignment between clusters and vessel types (e.g., cargo-dominated regions vs. passenger-dominated regions), and (iii) diffuse central concentrations, indicating that hand-crafted nautical features emphasise within-class kinematic similarities rather than cross-class behavioural patterns. This visual structure aligns with the superior DBCV, conductance, and modularity scores for embeddings reported in Table A12, and supports the hypothesis that GMAE-REx representations facilitate behaviour-centric clustering suitable for MASS applications.

Appendix G. Global SHAP Feature Importance

Figure A6, Figure A7, Figure A8 and Figure A9 present global SHAP feature importance aggregated across all clusters for each of the four clustering methods applied to GMAE-REx embeddings. SHAP values quantify the marginal contribution of each input feature to cluster assignments, providing model-agnostic explainability grounded in cooperative game theory. Temporal encodings (day-of-year, day-of-week, hour-of-day) dominate the rankings for kNN-Leiden, FINCH, and VBGMM, indicating that seasonal and diurnal patterns are primary drivers of behavioural differentiation in the Kiel Fjord dataset. HDBSCAN exhibits a distinct pattern, prioritising ship-to-ship interaction features (relative bearing, distance to land) and density features over temporal encodings, reflecting the method’s focus on local density structure rather than global community organisation. Kinematic features (speed, acceleration, course) and environmental features (water depth, distance to land, traffic density) occupy mid-range positions across all methods, whilst static vessel attributes (ship type, dimensions) contribute at moderate to low levels. Detailed per-cluster SHAP analyses are provided in Appendix H, and the VBGMM results are discussed in detail in Section 7.4.
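The aggregation behind these global rankings reduces to taking the mean absolute SHAP value per feature across all samples and sorting. The following sketch shows that step in plain NumPy, assuming the per-sample SHAP matrix has already been produced by an explainer for a surrogate model predicting cluster assignments; feature names and values are illustrative, not taken from the study:

```python
import numpy as np

def global_shap_ranking(shap_values, feature_names):
    """Rank features by mean |SHAP| across all samples.

    shap_values: (n_samples, n_features) signed contributions of each
    input feature to the cluster-assignment output of a surrogate model.
    """
    importance = np.abs(shap_values).mean(axis=0)  # mean |SHAP| per feature
    order = np.argsort(importance)[::-1]           # descending importance
    return [feature_names[i] for i in order], importance[order]

# Toy example: 4 samples, 3 hypothetical features.
sv = np.array([[ 0.020, -0.001,  0.005],
               [-0.030,  0.002,  0.004],
               [ 0.025,  0.001, -0.006],
               [-0.020, -0.002,  0.005]])
names, imp = global_shap_ranking(sv, ["day_of_year_cos", "ship_type", "speed"])
# names[0] is the globally most important feature.
```

Taking the absolute value before averaging is what makes the ranking direction-agnostic: a feature that pushes some samples towards a cluster and others away still registers as important.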
Figure A6. kNN-Leiden: Temporal features dominate (top 4 positions).
Figure A7. FINCH: Day-of-year encodings rank highest.
Figure A8. HDBSCAN: Interaction features (rel_bearing_1_cos, dist_to_land) dominate.
Figure A9. VBGMM: Temporal encodings occupy top 4 positions (discussed in Section 7.4).
  • Cross-Method Comparison
The divergence between HDBSCAN and the other three methods reveals fundamental differences in clustering paradigms: Graph-based and mixture-based methods (kNN-Leiden, FINCH, VBGMM) partition the embedding space based on global community structure and probabilistic density, leading to temporal-operational clusters that capture when and how vessels navigate (e.g., summer ferry traffic vs. winter commercial operations). Density-based methods (HDBSCAN) identify local density-connected regions, emphasising ship-to-ship interactions and spatial context (relative bearing, distance to land, traffic density) over seasonal patterns. This distinction aligns with the methods’ optimisation objectives: kNN-Leiden maximises modularity (global graph structure), VBGMM maximises ELBO (probabilistic fit), FINCH merges nearest neighbours (hierarchical structure), whilst HDBSCAN maximises cluster stability under varying density thresholds (local persistence).
  • Feature Category Breakdown
Across all methods, the importance hierarchy follows a consistent pattern: (1) Temporal features rank highest for 3/4 methods (day_of_year_cos/sin: 0.0046–0.0028 for VBGMM, day_of_week_cos/sin: 0.0034–0.0013), indicating that behavioural modes are strongly time-dependent. (2) Kinematic features occupy mid-range positions (speed: 0.0019–0.0014, course_cos/sin: 0.0019–0.0014, acc: 0.0018–0.0014), suggesting that while speed/heading inform cluster structure, they are not primary differentiators. (3) Environmental features (water_depth: 0.0017–0.0012, dist_to_land: 0.0016–0.0010, density_all: 0.0017–0.0012) and (4) interaction features (rel_bearing_0_cos/sin, dcpa_0, tcpa_0) contribute at moderate levels, except for HDBSCAN where interaction features dominate (rel_bearing_1_cos: 0.0226, 5× higher than temporal features). (5) Static vessel attributes (ship_type, ship_group_cargo, width, length) rank in the lower quartile for all methods, confirming that clusters organise by operational behaviour rather than vessel identity.
  • Implications for Representation Learning
The global SHAP rankings demonstrate that GMAE-REx embeddings preserve multi-faceted behavioural signals from the input feature space, enabling different clustering methods to discover complementary structures: temporal-operational modes (graph/mixture methods) or spatial-interactional patterns (density methods). This flexibility supports diverse maritime applications, from seasonal traffic analysis (VTS, port planning) to collision risk assessment (real-time MASS navigation).

Appendix H. Per-Cluster SHAP Feature Importance

Figure A10, Figure A11, Figure A12, Figure A13, Figure A14, Figure A15 and Figure A16 present per-cluster SHAP feature importance distributions for all four clustering methods applied to GMAE-REx embeddings. Each subplot displays the top 15 features driving cluster assignments for a single cluster, revealing cluster-specific behavioural signatures. Whilst global SHAP rankings (Appendix G) aggregate importance across all trajectories, per-cluster analysis exposes intra-method heterogeneity. Radar plots comparing cluster centroids on top features are provided in Appendix I.
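Per-cluster importance follows the same mean-|SHAP| aggregation as the global ranking, restricted to the trajectories assigned to each cluster. A minimal NumPy sketch with hypothetical feature names and toy values:

```python
import numpy as np

def per_cluster_shap(shap_values, labels, feature_names, top_k=2):
    """Top-k features per cluster by mean |SHAP| within that cluster."""
    signatures = {}
    for c in np.unique(labels):
        imp = np.abs(shap_values[labels == c]).mean(axis=0)
        order = np.argsort(imp)[::-1][:top_k]
        signatures[int(c)] = [(feature_names[i], float(imp[i])) for i in order]
    return signatures

# Two toy clusters with different dominant features.
sv = np.array([[0.030, 0.010, 0.000],
               [0.020, 0.010, 0.000],
               [0.000, 0.010, 0.040],
               [0.000, 0.020, 0.030]])
labels = np.array([0, 0, 1, 1])
sig = per_cluster_shap(sv, labels,
                       ["day_of_year_cos", "speed", "rel_bearing_1_cos"])
```

Comparing the resulting per-cluster signatures against the global ranking is what exposes the intra-method heterogeneity discussed in this appendix.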
  • kNN-Leiden Observations (47 clusters)
Temporal features (day_of_year_cos/sin) dominate top 2 positions across 45/47 clusters, with importance values 0.026–0.030. Cluster-specific variations emerge in kinematic features: Clusters 0–5 prioritise acceleration and course encodings (ranks 3–5), whilst Clusters 20–25 elevate spatial features (dist_to_ferry_route, water_depth). Sample sizes range from n = 2 (Cluster 1, micro-community) to n = 494 (Cluster 0, dominant hub), reflecting hierarchical graph structure.
  • FINCH Observations (27 clusters)
Day-of-year encodings consistently occupy top 2 positions across all 27 clusters, with importance 0.028–0.040 (highest among all methods). Cluster 5 ( n = 190 ) exhibits unique behaviour: rel_bearing_0_cos elevates to rank 3 (importance: 0.0236), 2× higher than other clusters, indicating specialised ship-to-ship interaction patterns. The hierarchical merging strategy produces clusters with strong temporal homogeneity but varying kinematic profiles.
  • HDBSCAN Observations (3 clusters)
Cluster 0 ( n = 2397 , 80% of non-noise data) prioritises interaction features: rel_bearing_1_cos (0.0332), course_sin (0.0309), dist_to_land (0.0289), reflecting dense maritime traffic navigation patterns. Cluster 1 ( n = 2 , 0.07%) is an outlier micro-cluster dominated by collision-risk features: course_diff_1_cos/sin and dcpa_1 (0.0225), likely representing near-miss scenarios. Cluster 2 ( n = 600 ) emphasises spatial constraints (dist_to_restricted_area: 0.0220, rank 2), suggesting restricted-zone navigation awareness. Temporal features rank below position 10 for all clusters.
Figure A10. kNN-Leiden per-cluster SHAP importance (Clusters 0–23).
Figure A11. kNN-Leiden per-cluster SHAP importance (Clusters 24–45).
Figure A12. FINCH per-cluster SHAP importance (Clusters 0–15).
Figure A13. FINCH per-cluster SHAP importance (Clusters 16–26).
Figure A14. HDBSCAN per-cluster SHAP importance (3 clusters).
  • VBGMM Observations (28 clusters)
Temporal encodings (day_of_year, day_of_week, hour_of_day) occupy 4–6 of the top 10 positions across all clusters. Cluster 4 ( n = 218 ) exhibits the strongest seasonal dependence: day_of_year_cos (0.0353) and day_of_year_sin (0.0320), both 15% higher than the global mean. Vessel attributes (length, width) consistently rank in the top 5–8 for 25/28 clusters, with importance 0.015–0.021, indicating that Gaussian mixture components partially align with vessel size classes. This contrasts with kNN-Leiden and FINCH, where vessel attributes rank below position 15 in most clusters.
Figure A15. VBGMM per-cluster SHAP importance (Clusters 0–15).
Figure A16. VBGMM per-cluster SHAP importance (Clusters 16–27).
  • Cross-Method Summary
Three clustering paradigms emerge: (1) Temporal stratification (kNN-Leiden, FINCH, VBGMM): Day-of-year encodings dominate most clusters (importance 1.5–2.5× kinematic features), organising trajectories by seasonal context before motion characteristics. (2) Density-driven specialisation (HDBSCAN): Interaction features (relative bearing, DCPA, course differences) rank 2–5× higher than temporal features, capturing dense traffic patterns (Cluster 0) and collision-risk scenarios (Cluster 1). (3) Vessel-attribute modulation (VBGMM): Length and width appear in top 10 for 89% of clusters, versus 20% for kNN-Leiden, reflecting probabilistic soft assignment that models vessel-size-dependent behaviour.

Appendix I. Cluster Centroid Feature Profiles

Figure A17, Figure A18, Figure A19 and Figure A20 present radar plots of cluster centroids in the original feature space, visualising normalised mean values for seven key maritime operational features: speed, dist_to_land, water_depth, density_all, dist_to_restricted_area, length, and width. For each cluster, feature values are aggregated as means across all member trajectories, then normalised to [0,1] relative to the global minimum and maximum observed across all clusters within the method. The resulting radar polygons encode geometric behavioural signatures: large filled areas indicate high values across multiple dimensions, whilst asymmetric shapes reveal specialised operational contexts. For kNN-Leiden, FINCH, and VBGMM, only the first 12 clusters (by index) are displayed due to space constraints, whilst HDBSCAN shows all three clusters. These profiles complement the SHAP importance analysis in Appendix H by translating cluster centroids into visual interpretability.
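The normalisation described above can be sketched in a few lines: per-cluster feature means are min-max scaled against the extremes observed across all clusters of the same method. The data below are illustrative, not from the study:

```python
import numpy as np

def radar_profiles(features, labels):
    """Cluster centroids min-max normalised to [0, 1] across clusters.

    features: (n_segments, n_features) raw feature matrix.
    labels:   (n_segments,) cluster assignment per segment.
    """
    clusters = np.unique(labels)
    centroids = np.vstack([features[labels == c].mean(axis=0)
                           for c in clusters])
    lo, hi = centroids.min(axis=0), centroids.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid /0 for constant features
    return clusters, (centroids - lo) / span

# Toy data: two features (e.g. speed, water_depth), two clusters.
X = np.array([[2.0, 5.0], [4.0, 7.0], [10.0, 30.0], [12.0, 34.0]])
y = np.array([0, 0, 1, 1])
clusters, profiles = radar_profiles(X, y)
```

Note that the scaling is relative to the clusters of one method only, so radar values are comparable within a panel but not across methods.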
  • FINCH Observations
Cluster 7 ( n = 40 ) exhibits extreme normalised values for water depth (≈0.95), distance to land (≈0.90), and vessel dimensions (width/length > 0.85), forming a maximal radar polygon that indicates deep-sea operations by large vessels. Cluster 0 ( n = 119 ) shows minimal coverage (<0.3) across all dimensions, including traffic density. Clusters 2 ( n = 600 ), 3 ( n = 134 ), and 6 ( n = 383 ) form a mid-range group with moderate normalised values (0.4–0.6), likely corresponding to regional ferry routes or coastal fishing activity. Cluster 5 ( n = 36 ) demonstrates asymmetric geometry: high dist_to_restricted_area (0.85) but low density_all (<0.2), indicating isolated navigation in unrestricted open waters away from congested zones.
Figure A17. FINCH cluster centroid radar profiles (Clusters 0–11 of 27 total).
  • VBGMM Observations
Vessel size stratification dominates the radar geometry. Clusters 0 ( n = 314 ), 1 ( n = 254 ), and 2 ( n = 247 ) share similar spatial profiles (water depth and distance to land: 0.5–0.7) but exhibit progressive increases in vessel dimensions: length and width rise from 0.3 (Cluster 0) to 0.5 (Cluster 1) to 0.6 (Cluster 2), suggesting that Gaussian mixture components align with ship-size categories (small craft→medium vessels→large ships). Cluster 3 ( n = 272 ) shows low traffic density (0.15) combined with high dist_to_restricted_area (0.7), indicating sparse navigation in permissive maritime zones. Cluster 11 ( n = 152 ) demonstrates balanced normalised values (0.4–0.6) across all features, forming a near-circular radar profile that represents general-purpose maritime operations without behavioural specialisation. Unlike FINCH’s sharp extremes, VBGMM produces smoother, overlapping profiles, reflecting probabilistic soft cluster assignments that model transitional vessel behaviours.
Figure A18. VBGMM cluster centroid radar profiles (Clusters 0–11 of 28 total).
  • HDBSCAN Observations
Cluster 0 ( n = 2397 , 80% of non-noise data) exhibits maximal normalised coverage (0.6–0.9) across all features, forming a large, filled radar polygon that aggregates mainstream maritime traffic. The broad coverage reflects high internal variance: this density-dominant hub encompasses diverse vessel types, operational contexts, and spatial distributions. Cluster 1 ( n = 2 , outlier micro-cluster) shows extreme water depth (1.0) and distance to land (0.95) with minimal traffic density (<0.2), consistent with isolated deep-sea trajectories potentially representing anomalies, measurement errors, or rare offshore operations. Cluster 2 ( n = 10 ) presents mid-range speed (0.5) combined with low spatial and vessel-size values (<0.3), likely representing slow-moving small craft in shallow coastal waters. The density-based paradigm prioritises cluster compactness over granularity, producing fewer, larger aggregations compared to graph-based (kNN-Leiden: 47 clusters) or mixture-based (VBGMM: 28 clusters) methods.
Figure A19. HDBSCAN cluster centroid radar profiles (all 3 clusters).
  • kNN-Leiden Observations
Fine-grained graph partitioning produces highly specialised radar signatures. Cluster 2 ( n = 600 , largest among displayed clusters) demonstrates extreme normalised values for water depth (0.95) and vessel size (width/length: 0.85), indicating deep-water corridors used by large commercial vessels (cargo ships, tankers). Clusters 0 ( n = 119 ) and 8 ( n = 196 ) show compact, low-magnitude radar polygons (<0.4), representing nearshore operations by small craft with limited spatial range. Cluster 10 ( n = 162 ) exhibits asymmetric geometry: high distance to land (0.8) combined with low water depth (0.3), suggesting navigation along shallow offshore routes (e.g., island-hopping trajectories or archipelago transits). Cluster 7 ( n = 40 ) mirrors FINCH Cluster 7, both displaying maximal depth and distance profiles, confirming consistent identification of deep-sea operational modes across hierarchical (FINCH) and graph-based (kNN-Leiden) paradigms. The modularity optimisation strategy partitions the embedding space into 47 micro-behaviours, capturing operational nuances not resolved by coarser methods.
  • Cross-Method Comparative Insights
Three interpretability paradigms emerge from radar geometry: (1) Extreme-behaviour isolation (FINCH, kNN-Leiden): Sharp, non-overlapping radar profiles with maximal or minimal normalised values enable detection of outlier operational modes (deep-sea routes, micro-density clusters, nearshore anomalies). (2) Size-stratified soft partitioning (VBGMM): Smooth, graduated profiles along vessel-dimension axes (length, width) indicate latent Gaussian components aligned with ship-size categories, whilst overlapping spatial features (depth, distance) reflect mixed-fleet operations in shared maritime zones. (3) Density-aggregated summaries (HDBSCAN): Large, filled radar areas with high internal variance prioritise cluster compactness over behavioural granularity, suitable for high-level traffic pattern recognition but limited in discriminating fine operational contexts. Radar visualisation complements SHAP feature importance rankings (Appendix G and Appendix H) by providing geometric interpretability of cluster centroids: whilst SHAP quantifies which features drive cluster assignments, radar plots reveal what values those features take within each cluster, enabling domain experts to validate clustering outputs against known maritime operational categories.
Figure A20. kNN-Leiden cluster centroid radar profiles (Clusters 0–11 of 47 total).

Appendix J. Pairwise Cluster-Assignment Agreement

To assess whether the discovered behavioural groupings are a robust property of the representation space rather than an artefact of a particular algorithm, we compute three standard external cluster-agreement metrics between every pair of the four clustering algorithms, applied to the same fixed pool of 50,000 trajectory segments:
  • Adjusted Rand Index (ARI) [78]: measures pair-wise label co-assignment, corrected for chance; range [−1, 1], higher is better, 0 = random agreement.
  • Normalised Mutual Information (NMI) [79]: measures shared information between partitions, normalised by partition entropy; range [0, 1], higher is better.
  • Fowlkes–Mallows Index (FMI) [80]: geometric mean of pair-wise precision and recall; range [0, 1], higher is better.
All three metrics operate on pair-wise co-membership and are therefore insensitive to cluster relabelling. HDBSCAN noise treatment: HDBSCAN assigns the special label −1 to noise and border points (51.8% of all points for embeddings; 86.7% for expert features), which invalidates any direct pairwise comparison. For all pairs involving HDBSCAN, we therefore exclude noise points and compute the metrics only on the non-noise subset, using the other algorithm's labels on those same rows. Pairs between covering algorithms (kNN-Leiden, VBGMM, FINCH) are evaluated on all 50,000 points.
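The noise-exclusion protocol can be sketched with scikit-learn's implementations of the three metrics; this is an illustrative reconstruction, not the paper's evaluation code, and the toy labelings are hypothetical:

```python
import numpy as np
from sklearn.metrics import (adjusted_rand_score, fowlkes_mallows_score,
                             normalized_mutual_info_score)

def pairwise_agreement(a, b, noise_label=-1):
    """ARI/NMI/FMI between two labelings, dropping rows where either
    side carries the HDBSCAN noise label (-1)."""
    a, b = np.asarray(a), np.asarray(b)
    mask = (a != noise_label) & (b != noise_label)
    a, b = a[mask], b[mask]
    return (adjusted_rand_score(a, b),
            normalized_mutual_info_score(a, b),
            fowlkes_mallows_score(a, b),
            int(mask.sum()))

# Toy labelings: after noise removal the partitions agree up to relabelling.
hdbscan_labels = np.array([-1, 0, 0, 1, 1, -1])
leiden_labels  = np.array([ 3, 2, 2, 5, 5,  4])
ari, nmi, fmi, n = pairwise_agreement(hdbscan_labels, leiden_labels)
```

Because all three metrics score co-membership rather than label values, the relabelled but structurally identical toy partitions score perfectly once the noise rows are masked out.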
Table A13 and Table A14 report all six unique off-diagonal pairs for learnt embeddings and expert features, respectively.
Table A13. Pairwise cluster-assignment agreement (ARI/NMI/FMI) for learnt embeddings (GMAE-REx encoder, 128-d), computed on the fixed 50,000-segment evaluation pool. Algorithm outputs: kNN-Leiden 47 clusters, VBGMM 28 clusters, FINCH 27 clusters, HDBSCAN 4 clusters (51.8% noise). HDBSCAN pairs are computed on the n = 24,081 non-noise points only; see text for noise-handling details. Covering pairs use all 50,000 points.
Pair                     ARI      NMI      FMI      n
Covering pairs — all 50,000 points assigned to a named cluster
kNN-Leiden/VBGMM         0.297    0.595    0.332    50,000
kNN-Leiden/FINCH         0.329    0.638    0.382    50,000
VBGMM/FINCH              0.280    0.573    0.325    50,000
Mean (covering pairs)    0.302    0.602    0.346    —
HDBSCAN pairs — restricted to the n = 24,081 non-noise HDBSCAN points
HDBSCAN/kNN-Leiden       0.001    0.014    0.182    24,081
HDBSCAN/VBGMM            0.001    0.009    0.221    24,081
HDBSCAN/FINCH            0.002    0.005    0.253    24,081
Covering algorithms (kNN-Leiden, VBGMM, FINCH) assign every segment to a named cluster. HDBSCAN is excluded from the covering mean.
Table A14. Pairwise cluster-assignment agreement (ARI/NMI/FMI) for expert features (74-dimensional hand-crafted feature vector), computed on the same 50,000-segment pool. Algorithm outputs: kNN-Leiden 48 clusters, VBGMM 28 clusters, FINCH 8 clusters, HDBSCAN 3 clusters (86.7% noise). HDBSCAN pairs are computed on the n = 6651 non-noise points only. The covering pair mean NMI is 0.37, which is 38% lower than for learnt embeddings (0.60; Table A13), indicating weaker latent structure in the expert feature space.
Pair                     ARI      NMI      FMI      n
Covering pairs — all 50,000 points assigned to a named cluster
kNN-Leiden/VBGMM         0.112    0.369    0.165    50,000
kNN-Leiden/FINCH         0.198    0.476    0.314    50,000
VBGMM/FINCH              0.064    0.250    0.169    50,000
Mean (covering pairs)    0.125    0.365    0.216    —
HDBSCAN pairs — restricted to the n = 6651 non-noise HDBSCAN points
HDBSCAN/kNN-Leiden       0.127    0.381    0.360    6651
HDBSCAN/VBGMM            0.013    0.041    0.507    6651
HDBSCAN/FINCH            0.396    0.553    0.634    6651
See Table A13 for notation.
  • Interpretation
The three covering algorithms—kNN-Leiden (graph community detection), VBGMM (Bayesian mixture), and FINCH (hierarchical first-neighbour)—show moderate-to-strong mutual agreement on the learnt embeddings, with mean NMI = 0.60 across their three unique pairs. This convergence across algorithms with fundamentally different inductive biases (modularity maximisation, generative density modelling, and parameter-free hierarchical partitioning) provides direct evidence that the behavioural structure recovered in the GMAE-REx embedding space is a genuine, algorithm-independent property of the representation rather than an algorithmic artefact.
For expert features, the same three covering pairs yield a mean NMI of 0.37, a 38% reduction relative to learnt embeddings, indicating that the expert feature space contains less consistently recoverable structure. This contrast strengthens the claim that representation learning (Contributions 2–5) yields a more structured and generalisable latent space than hand-crafted feature engineering alone.
HDBSCAN interpretation. After excluding noise points, the HDBSCAN pairs tell qualitatively different stories for the two representation types:
  • Learnt embeddings: the 24,081 non-noise HDBSCAN points show near-zero agreement with all three covering algorithms (NMI 0.005–0.014). HDBSCAN identified only four extremely dense regions in the 128-d embedding space; the structure-based methods (kNN-Leiden, VBGMM, FINCH) further subdivide those same points into 10–30 finer behavioural sub-groups. The disconnect reflects a genuine difference in granularity rather than disagreement about which points are similar.
  • Expert features: the 6651 non-noise HDBSCAN points show moderate-to-high agreement with FINCH (NMI = 0.55) and kNN-Leiden (NMI = 0.38), but low agreement with VBGMM (NMI = 0.04). With only three dense clusters in a 74-d feature space, HDBSCAN and FINCH (8 clusters) identify compatible coarse groupings, while the 28-component VBGMM partitions the same points into many small Gaussian components that do not align with the dense HDBSCAN regions.
In both cases the noise fraction itself is informative: 51.8% (embeddings) and 86.7% (expert) of segments are categorised as noise by HDBSCAN, confirming that trajectory behaviour is broadly continuous and does not naturally decompose into a small number of sharply separated dense clusters at the chosen minimum-cluster-size parameter.

Appendix K. Preprocessing Pipeline: Processing Latency

To assess the computational cost of the preprocessing pipeline described in Section 3, we ran a dedicated profiling experiment on 100 one-hour AIS message files from the Kiel coastal receiver, processed sequentially on a single core of an AMD EPYC 7713 processor. The 100 files represent a timing benchmark, not the complete two-year study archive; because throughput is reported as a per-unit rate, the estimates transfer directly to any dataset size. The experiment covered the complete pipeline from raw message decoding through Kalman smoothing, PCHIP interpolation, and feature engineering (ship-level kinematic, spatial, traffic-density, and temporal features), through to the CPA-based ship-to-ship interaction features (DCPA, TCPA, relative bearing, and collision risk index).
The 100 files yielded 835.63 h of trajectory data and 1,784,074 ship-to-ship interactions in total. Table A15 reports the measured throughputs.
Table A15. Measured preprocessing throughput on a single CPU core (AMD EPYC 7713).
Stage                               Total Time (100 Files)   Per-Unit Rate
Ship-level features                 22.52 min                26.95 ms per trajectory-minute
Ship-to-ship interaction features   45.18 min                1.52 ms per interaction
For a standard 10-min trajectory segment, ship-level processing costs 10 × 26.95 ≈ 270 ms. The dataset averages 1,784,074/(835.63 × 60) ≈ 35.6 interactions per trajectory-minute, yielding ≈356 interactions per 10-min segment and a further 356 × 1.52 ≈ 541 ms for ship-to-ship features. End-to-end preprocessing latency per 10-min segment is therefore 270 + 541 ≈ 811 ms on a single CPU core. The subsequent GMAE-REx encoder forward pass (≈1.0 M parameters, sequence length 120) adds negligible overhead. Since vessels are processed independently, the pipeline scales linearly with the number of available CPU cores.
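The latency arithmetic above can be reproduced directly from the two measured rates (constants hard-coded from Table A15; a back-of-envelope check, not profiling code):

```python
# Back-of-envelope check of the per-segment latency figures, with the
# measured single-core rates hard-coded from Table A15.
SHIP_MS_PER_TRAJ_MIN = 26.95        # ship-level features
INTERACTION_MS = 1.52               # per ship-to-ship interaction
INTERACTIONS_PER_MIN = 1_784_074 / (835.63 * 60)  # ~35.6 per trajectory-minute

segment_minutes = 10
ship_ms = segment_minutes * SHIP_MS_PER_TRAJ_MIN                     # ~270 ms
inter_ms = segment_minutes * INTERACTIONS_PER_MIN * INTERACTION_MS   # ~541 ms
total_ms = ship_ms + inter_ms                                        # ~811 ms
```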
The ship-level kinematic features (speed, course, ROT, temporal encodings) are computable from on-board sensor fusion (Global Navigation Satellite System (GNSS), inertial measurement unit, vessel log), so the framework remains applicable even when external AIS reception is limited; only the ship-to-ship interaction features additionally require a surrounding traffic picture from a VTS feed, radar tracker, or shore-based AIS aggregator.

Note

1. The source code for the AIS processing pipeline and the representation learning and clustering pipeline is available on GitHub at https://github.com/CAPTN-sh/marlin-ais-process and https://github.com/CAPTN-sh/marlin-repl, respectively (accessed on 28 February 2026).

References

  1. United Nations Conference on Trade and Development. Review of Maritime Transport 2025: Staying the Course in Turbulent Waters; Technical Report; United Nations: Geneva, Switzerland, 2025. [Google Scholar]
  2. The International Maritime Organization (IMO). Revised Guidelines for the Onboard Operational Use of Shipborne Automatic Identification Systems (AIS). 2015. Available online: https://bit.ly/4rkX80K (accessed on 29 January 2026).
  3. Burmeister, H.C.; Constapel, M. Autonomous collision avoidance at sea: A survey. Front. Robot. AI 2021, 8, 739013. [Google Scholar] [CrossRef]
  4. Saleh, P.; Armitage, J.; Curtis, P.; Abielmona, R.; Petriu, E. Navigating the annotation bottleneck: Active learning for scalable maritime data analytics. In Oceans 2025-Great Lakes; IEEE: New York, NY, USA, 2025; pp. 1–10. [Google Scholar]
  5. Liang, M.; Weng, L.; Gao, R.; Li, Y.; Du, L. Unsupervised maritime anomaly detection for intelligent situational awareness using AIS data. Knowl.-Based Syst. 2024, 284, 111313. [Google Scholar] [CrossRef]
  6. Xie, Z.; Tu, E.; Fu, X.; Yuan, G.; Han, Y. AIS Data-Driven Maritime Monitoring Based on Transformer: A Comprehensive Review. arXiv 2025, arXiv:2505.07374. [Google Scholar] [CrossRef]
  7. Li, K.; Guo, J.; Li, R.; Wang, Y.; Li, Z.; Miu, K.; Chen, H. The abnormal detection method of ship trajectory with adaptive transformer model based on migration learning. In International Conference on Spatial Data and Intelligence; Springer: Berlin/Heidelberg, Germany, 2023; pp. 204–220. [Google Scholar] [CrossRef]
  8. Izmailov, P.; Kirichenko, P.; Gruver, N.; Wilson, A.G. On feature learning in the presence of spurious correlations. Adv. Neural Inf. Process. Syst. 2022, 35, 38516–38532. [Google Scholar] [CrossRef]
  9. Thombre, S.; Zhao, Z.; Ramm-Schmidt, H.; Garcia, J.M.V.; Malkamäki, T.; Nikolskiy, S.; Hammarberg, T.; Nuortie, H.; Bhuiyan, M.Z.H.; Särkkä, S.; et al. Sensors and AI techniques for situational awareness in autonomous ships: A review. IEEE Trans. Intell. Transp. Syst. 2020, 23, 64–83. [Google Scholar] [CrossRef]
  10. Chan, J.P.; Norman, R.; Pazouki, K.; Golightly, D. Autonomous maritime operations and the influence of situational awareness within maritime navigation. WMU J. Marit. Aff. 2022, 21, 121–140. [Google Scholar] [CrossRef]
  11. European Maritime Safety Agency. Annual Overview of Marine Casualties and Incidents 2023; Technical Report; European Maritime Safety Agency: Lisbon, Portugal, 2023. [Google Scholar]
  12. Subbaswamy, A.; Schulam, P.; Saria, S. Preventing failures due to dataset shift: Learning predictive models that transport. In 22nd International Conference on Artificial Intelligence and Statistics; PMLR: Cambridge, MA, USA, 2019; pp. 3118–3127. [Google Scholar] [CrossRef]
  13. Gao, S.; Huang, Z.; Al-Falouji, G.; Sick, B.; Tomforde, S. Towards cognitive situational awareness in maritime traffic using federated evidential learning. IEEE Trans. Intell. Transp. Syst. 2024; in press. [Google Scholar] [CrossRef]
  14. Al-Falouji, G.; Gao, S.; Haschke, L.; Nowotka, D.; Tomforde, S. Enhancing maritime behaviour analysis through novel feature engineering and digital shadow modelling: A case study in the kiel fjord. In International Conference on Architecture of Computing Systems; Springer: Berlin/Heidelberg, Germany, 2024; pp. 97–111. [Google Scholar] [CrossRef]
  15. International Maritime Organization (IMO). International Convention for the Safety of Life at Sea (SOLAS). Available online: https://bit.ly/3NJ15Oo (accessed on 29 January 2026).
  16. Felski, A.; Zwolak, K. The ocean-going autonomous ship-Challenges and threats. J. Mar. Sci. Eng. 2020, 8, 41. [Google Scholar] [CrossRef]
  17. Endsley, M.R. Toward a theory of situation awareness in dynamic systems. In Situational Awareness; Routledge: London, UK, 2017; pp. 9–42. [Google Scholar] [CrossRef]
  18. Smirnov, N.; Tomforde, S. Navigation Support for an Autonomous Ferry Using Deep Reinforcement Learning in Simulated Maritime Environments. In 2022 CogSIMA; IEEE: New York, NY, USA, 2022; pp. 142–149. [Google Scholar] [CrossRef]
  19. Zhang, L.; Meng, Q.; Xiao, Z.; Fu, X. A novel ship trajectory reconstruction approach using AIS data. Ocean Eng. 2018, 159, 165–174. [Google Scholar] [CrossRef]
  20. Lazarowska, A. Ship’s trajectory planning for collision avoidance at sea based on ant colony optimisation. J. Navig. 2015, 68, 291–307. [Google Scholar] [CrossRef]
  21. Suo, Y.; Chen, W.; Claramunt, C.; Yang, S. A ship trajectory prediction framework based on a recurrent neural network. Sensors 2020, 20, 5133. [Google Scholar] [CrossRef]
  22. Schwinger, R.; Al-Falouji, G.; Tomforde, S. Autonomous Ship Collision Avoidance Trained on Observational Data. In Architecture of Computing Systems; Springer Nature: Cham, Switzerland, 2023; pp. 296–310. [Google Scholar] [CrossRef]
  23. Rong, H.; Teixeira, A.; Soares, C.G. Ship trajectory uncertainty prediction based on a Gaussian Process model. Ocean Eng. 2019, 182, 499–511. [Google Scholar] [CrossRef]
  24. Al-Falouji, G.; Haschke, L.; Nowotka, D.; Tomforde, S. Self-Explanation as a Basis for Self-Integration—The Autonomous Passenger Ferry Scenario. In ACSOS’23 Companion; IEEE: New York, NY, USA, 2023. [Google Scholar] [CrossRef]
  25. IMO. Resolution A.1106(29). 2001. Available online: https://bit.ly/3EIC7H3 (accessed on 29 January 2026).
  26. International Maritime Organization (IMO). Automatic Identification System (AIS). Available online: https://bit.ly/4k6esnC (accessed on 29 January 2026).
  27. Felski, A.; Jaskólski, K. The integrity of information received by means of AIS during anti-collision manoeuvring. TransNav 2013, 7, 95–100. [Google Scholar] [CrossRef]
  28. Jiang, Z.; Yu, Z.; Zhang, D.; Chu, X.; Yang, Q. Characteristics of vessel traffic flow during waterway regulations: A case study in the Yangtze River. In 2019 5th International Conference on Transportation Information and Safety (ICTIS); IEEE: New York, NY, USA, 2019; pp. 364–368. [Google Scholar] [CrossRef]
  29. Schwehr, K.D.; McGillivary, P.A. Marine Ship Automatic Identification System (AIS) for enhanced coastal security capabilities: An oil spill tracking application. In Oceans 2007; IEEE: New York, NY, USA, 2007; pp. 1–9. [Google Scholar] [CrossRef]
  30. Nguyen, D.; Vadaine, R.; Hajduch, G.; Garello, R.; Fablet, R. A multi-task deep learning architecture for maritime surveillance using AIS data streams. In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA); IEEE: New York, NY, USA, 2018; pp. 331–340. [Google Scholar] [CrossRef]
  31. Wu, X.; Mehta, A.L.; Zaloom, V.A.; Craig, B.N. Analysis of waterway transportation in Southeast Texas waterway based on AIS data. Ocean Eng. 2016, 121, 196–209. [Google Scholar] [CrossRef]
  32. Yoo, S.L. Near-miss density map for safe navigation of ships. Ocean Eng. 2018, 163, 15–21. [Google Scholar] [CrossRef]
  33. Shelmerdine, R.L. Teasing out the detail: How our understanding of marine AIS data can better inform industries, developments, and planning. Mar. Policy 2015, 54, 17–25. [Google Scholar] [CrossRef]
  34. Jensen, C.M.; Hines, E.; Holzman, B.A.; Moore, T.J.; Jahncke, J.; Redfern, J.V. Spatial and temporal variability in shipping traffic off San Francisco, California. Coast. Manag. 2015, 43, 575–588. [Google Scholar] [CrossRef]
  35. Al-Falouji, G.; Gruhl, C.; Neumann, T.; Tomforde, S. A Heuristic for an Online Applicability of Anomaly Detection Techniques. In 2022 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C); IEEE: New York, NY, USA, 2022; pp. 107–112. [Google Scholar] [CrossRef]
  36. Al-Falouji, G.; Gruhl, C.; Tomforde, S. Digital Shadows in Self-Improving System Integration: A Concept Using Generative Modelling. In 2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C); IEEE: New York, NY, USA, 2021; pp. 166–171. [Google Scholar] [CrossRef]
  37. Chen, S.; Huang, Y.; Lu, W. Anomaly detection and restoration for AIS raw data. In Wireless Communications and Mobile Computing; John Wiley & Sons: Hoboken, NJ, USA, 2022. [Google Scholar] [CrossRef]
  38. Goller, M.; Thomsen, I.; Al-Falouji, G.; Tomforde, S. Abnormal Behaviour Detection of Self-Adaptive Agents in Traffic Environments. In ACSOS’23 Companion; IEEE: New York, NY, USA, 2023; pp. 1–8. [Google Scholar] [CrossRef]
  39. Kim, H.; Choi, M.; Park, S.; Lim, S. Vessel trajectory classification via transfer learning with deep convolutional neural networks. PLoS ONE 2024, 19, e0308934. [Google Scholar] [CrossRef]
  40. Nguyen, D.; Fablet, R. A transformer network with sparse augmented data representation and cross entropy loss for AIS-based vessel trajectory prediction. IEEE Access 2024, 12, 21596–21609. [Google Scholar] [CrossRef]
  41. Emmens, T.; Amrit, C.; Abdi, A.; Ghosh, M. The promises and perils of Automatic Identification System data. Expert Syst. Appl. 2021, 178, 114975. [Google Scholar] [CrossRef]
  42. Al-Falouji, G.; Haschke, L.; Nowotka, D.; Tomforde, S. A Framework for Surface Vessel Nautical-Behaviour Analysis towards Cognitive Situation Awareness. In Proceedings of the Conference on Cognitive and Com, Boston, MA, USA, 6–9 August 2024; Volume 102, pp. 199–214. [Google Scholar] [CrossRef]
  43. National Marine Electronics Association (NMEA). NMEA 0183 Standard for Interfacing Marine Electronic Devices. Available online: https://www.nmea.org/ (accessed on 29 January 2026).
  44. Jankowski, D.; Lamm, A.; Hahn, A. Determination of AIS position accuracy and evaluation of reconstruction methods for maritime observation data. IFAC-PapersOnLine 2021, 54, 97–104. [Google Scholar] [CrossRef]
  45. Danish Maritime Authority. AIS Data. 2025. Available online: https://www.dma.dk/safety-at-sea/navigational-information/ais-data (accessed on 16 September 2025).
  46. Geodatastyrelsen/Dataforsyningen. Danmarks Dybdemodel (DDM) 50 m Grid Bathymetry Data. 2024. Available online: https://dataforsyningen.dk/data/4817 (accessed on 16 September 2025).
  47. Federal Maritime and Hydrographic Agency (BSH). Hydrographic Surveying and Marine Data. 2025. Available online: https://www.bsh.de/EN/DATA/Marine-use/Hydrographic_surveying/hydrographic_surveying_node.html (accessed on 16 September 2025).
  48. Leibniz Institute for Baltic Sea Research. Bathymetric Survey Data for the Baltic Sea. In Data Collected as Part of Joint Hydrographic Mapping Initiatives; Measurement Period: 1994–2010; Leibniz Institute for Baltic Sea Research: Rostock, Germany, 2010. [Google Scholar]
  49. OpenStreetMap Contributors. OSM Land Polygons. Available online: https://osmdata.openstreetmap.de/data/land-polygons.html (accessed on 16 September 2025).
  50. OpenStreetMap Contributors. OpenStreetMap Data Extracted via Overpass API. Available online: https://overpass-api.de/api/interpreter (accessed on 16 September 2025).
  51. International Telecommunication Union (ITU). Recommendation ITU-R M.1371-5: Technical Characteristics for an Automatic Identification System Using Time Division Multiple Access in the VHF Maritime Mobile Frequency Band. Available online: https://www.itu.int/rec/R-REC-M.1371 (accessed on 29 January 2026).
  52. MyShipTracking.com. Ship Tracking and Maritime Information. Available online: https://api.myshiptracking.com/ (accessed on 16 September 2025).
  53. Rong, H.; Teixeira, A.; Soares, C.G. A framework for ship abnormal behaviour detection and classification using AIS data. Reliab. Eng. Syst. Saf. 2024, 247, 110105. [Google Scholar] [CrossRef]
  54. Kalman, R.E. A new approach to linear filtering and prediction problems. J. Basic Eng. 1960, 82, 35–45. [Google Scholar] [CrossRef]
  55. Fritsch, F.N.; Carlson, R.E. Monotone piecewise cubic interpolation. SIAM J. Numer. Anal. 1980, 17, 238–246. [Google Scholar] [CrossRef]
  56. Harati-Mokhtari, A.; Wall, A.; Brooks, P.; Wang, J. Automatic Identification System (AIS): Data Reliability and Human Error Implications. J. Navig. 2007, 60, 373–389. [Google Scholar] [CrossRef]
  57. Zhao, L.; Fu, X. A method for correcting the closest point of approach index during vessel encounters based on dimension data from AIS. IEEE Trans. Intell. Transp. Syst. 2021, 23, 13745–13757. [Google Scholar] [CrossRef]
  58. Kuwahara, S.; Nishimura, H.; Nakagawa, K.; Yoshinaga, M.; Iseki, S.; Yoshida, R.; Hakoyama, T.; Kutsuna, K.; Nakamura, J. Research and development of collision risk decision method for safe navigation and its verification. ClassNK Tech. J. 2021, 3, 13–40. [Google Scholar]
  59. Chen, Y.; Kang, Y.; Chen, Y.; Wang, Z. Probabilistic forecasting with temporal convolutional neural network. Neurocomputing 2020, 399, 491–501. [Google Scholar] [CrossRef]
  60. Huang, Z.; Gruhl, C.; Sick, B. LiST: An All-Linear-Layer Spatial-Temporal Feature Extractor with Uncertainty Estimation for RUL Prediction. In IEEE Conference on Industrial Electronics and Applications (ICIEA); IEEE: New York, NY, USA, 2024. [Google Scholar] [CrossRef]
  61. Wen, Q.; Zhou, T.; Zhang, C.; Chen, W.; Ma, Z.; Yan, J.; Sun, L. Transformers in time series: A survey. arXiv 2022, arXiv:2202.07125. [Google Scholar] [CrossRef]
  62. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631. [Google Scholar]
  63. Campello, R.J.; Moulavi, D.; Sander, J. Density-based clustering based on hierarchical density estimates. In Pacific-Asia Conference on Knowledge Discovery and Data Mining; Springer: Berlin/Heidelberg, Germany, 2013; pp. 160–172. [Google Scholar]
  64. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006. [Google Scholar]
  65. Traag, V.A.; Waltman, L.; Van Eck, N.J. From Louvain to Leiden: Guaranteeing well-connected communities. Sci. Rep. 2019, 9, 5233. [Google Scholar] [CrossRef] [PubMed]
  66. Sarfraz, S.; Sharma, V.; Stiefelhagen, R. Efficient parameter-free clustering using first neighbor relations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8934–8943. [Google Scholar]
  67. Moulavi, D.; Jaskowiak, P.A.; Campello, R.J.G.B.; Zimek, A.; Sander, J. Density-based clustering validation. In 2014 SIAM International Conference on Data Mining (SDM); Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2014; pp. 839–847. [Google Scholar] [CrossRef]
  68. McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv 2018, arXiv:1802.03426. [Google Scholar]
  69. Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 2008, P10008. [Google Scholar] [CrossRef]
  70. Liu, L.; Jiang, H.; He, P.; Chen, W.; Liu, X.; Gao, J.; Han, J. On the Variance of the Adaptive Learning Rate and Beyond. arXiv 2021, arXiv:1908.03265. [Google Scholar] [CrossRef]
  71. Beal, M. Variational Algorithms for Approximate Bayesian Inference. Ph.D. Thesis, University College London, London, UK, 2003. [Google Scholar]
  72. Xie, J.; Girshick, R.; Farhadi, A. Unsupervised deep embedding for clustering analysis. In International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2016; pp. 478–487. [Google Scholar]
  73. Guo, X.; Gao, L.; Liu, X.; Yin, J. Improved deep embedded clustering with local structure preservation. In Proceedings of the International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 1753–1759. [Google Scholar]
  74. Spadon, G.; Kumar, J.; Eden, D.; van Berkel, J.; Foster, T.; Soares, A.; Fablet, R.; Matwin, S.; Pelot, R. Multi-path long-term vessel trajectories forecasting with probabilistic feature fusion for problem shifting. Ocean Eng. 2024, 312, 119138. [Google Scholar] [CrossRef]
  75. Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65. [Google Scholar] [CrossRef]
  76. Caliński, T.; Harabasz, J. A dendrite method for cluster analysis. Commun. Stat.-Theory Methods 1974, 3, 1–27. [Google Scholar] [CrossRef]
  77. Davies, D.L.; Bouldin, D.W. A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1979, 2, 224–227. [Google Scholar] [CrossRef]
  78. Hubert, L.; Arabie, P. Comparing partitions. J. Classif. 1985, 2, 193–218. [Google Scholar] [CrossRef]
  79. Strehl, A.; Ghosh, J. Cluster ensembles—A knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 2002, 3, 583–617. [Google Scholar]
  80. Fowlkes, E.B.; Mallows, C.L. A method for comparing two hierarchical clusterings. J. Am. Stat. Assoc. 1983, 78, 553–569. [Google Scholar] [CrossRef]
Figure 1. Different navigationally relevant map layers extracted from OpenStreetMap via the Overpass API [50], colour-coded for visual distinction. Commercial ferry routes (green) are widened to improve visual clarity.
Figure 2. Qualitative assessment of the smoothing and interpolation pipeline on two representative 10 min AIS trajectory segments (120 observations at 5 s resolution) from two different sailing vessels in the Kiel Fjord. In each panel, blue dots indicate the raw AIS position reports as broadcast and the red line shows the fully preprocessed trajectory after Kalman smoothing and PCHIP interpolation. Sailing vessels are chosen because their wind-driven tacking behaviour produces frequent, pronounced direction reversals within a single 10 min window—the most demanding scenario for a constant-velocity motion model. (a) A sailing vessel exhibiting high positional scatter in the raw reports; the smoother suppresses the noise while the tacking pattern is faithfully preserved in the interpolated output. (b) A sailing vessel combining substantial raw scatter with strong directional complexity; the curvature-dependent gap tolerance (2 min for turning segments) prevents the Kalman model from being applied across extended high-ROT intervals. In both cases the pipeline preserves the qualitative shape of the manoeuvre while effectively removing positional noise.
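The constant-velocity Kalman smoothing described in the Figure 2 caption can be illustrated with a minimal one-dimensional filter. This is only a sketch, not the authors' implementation: the one-dimensional simplification and the process-noise `q` and measurement-noise `r` values are assumptions, and the subsequent PCHIP interpolation step is omitted.

```python
# Illustrative 1-D constant-velocity Kalman filter for AIS-like position
# smoothing (a sketch, not the paper's pipeline; q and r are assumed values).

def kalman_smooth_1d(zs, dt=10.0, q=1e-3, r=25.0):
    """Filter noisy positions zs (metres) sampled every dt seconds.

    State x = [position, velocity]; returns the filtered positions.
    """
    x = [zs[0], 0.0]                       # initial state from first report
    P = [[r, 0.0], [0.0, 1.0]]             # initial state covariance
    out = []
    for z in zs:
        # Predict: x <- F x, P <- F P F^T + Q, with F = [[1, dt], [0, 1]]
        x = [x[0] + dt * x[1], x[1]]
        P = [[P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q,
              P[0][1] + dt * P[1][1]],
             [P[1][0] + dt * P[1][1],
              P[1][1] + q]]
        # Update with measurement z, observation model H = [1, 0]
        s = P[0][0] + r                    # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s  # Kalman gain
        y = z - x[0]                       # innovation
        x = [x[0] + k0 * y, x[1] + k1 * y]
        P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        out.append(x[0])
    return out
```

For a stationary vessel the filter reproduces the constant position exactly, while for scattered reports it returns a damped track whose excursions stay well inside the raw noise envelope, mirroring the qualitative behaviour shown in Figure 2.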
Figure 3. Comparison of raw and processed AIS data in the Port of Kiel area. (Left) Raw AIS position reports showing noise, outliers, and irregular sampling. (Right) Processed vessel trajectories after filtering, segmentation, smoothing, and interpolation. Each colour represents a distinct trajectory segment with a unique identifier. The preprocessing pipeline successfully removes outliers, segments continuous vessel movements, and produces clean trajectories suitable for downstream analysis.
Figure 4. Daily AIS trajectory step counts after interpolation for the years 2022–2023. Data from Kiel University of Applied Sciences (blue) capture a larger number of vessels when available, resulting in higher daily counts, while data from the DMA (orange) show more consistent coverage across the full two-year period.
Figure 5. Vessel density in the Kiel area at a spatial resolution of 10 × 10 m. The density is computed separately for each ship group (cargo, passenger, sailing, and other) using only AIS trajectories from the training dataset.
Figure 6. Architecture of GroupMAE-REx for AIS trajectory representation learning.
Figure 7. Representation learning pipelines.
Figure 8. UMAP projections of VBGMM clustering (28 components). (a) Learnt embeddings: 28 well-separated clusters with mixed ship types per cluster, organised by operational context. (b) Expert features: 28 overlapping clusters with ship-type segregation, organised by kinematic similarity. Each subfigure shows cluster assignments (left, coloured by cluster ID) and ship-type distribution (right, coloured by vessel category). Learnt embeddings form spatially compact, well-separated clusters containing mixed vessel types, indicating organisation by operational navigation modes (e.g., channel transit, port manoeuvring, and anchorage). Expert features produce diffuse, overlapping clusters with greater ship-type segregation, indicating organisation by kinematic profiles (speed/course characteristics). Quantitative metrics in Table 17 confirm superior cluster quality for embeddings: DBCV (−0.594 vs. −0.742), conductance (0.193 vs. 0.498), modularity (0.756 vs. 0.419). This demonstrates that learnt representations capture operational context essential for autonomous navigation, transcending vessel-specific characteristics. Complete projections for all methods are in Appendix F.
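Of the three partition-quality metrics compared in the Figure 8 caption, conductance has the most compact definition: the fraction of a cluster's edge volume that crosses the cluster boundary, with lower values indicating a cleaner cut. The sketch below computes it for a single cluster of an unweighted graph; it is illustrative only, since the paper's kNN-graph construction and any edge weighting are not reproduced here.

```python
# Illustrative graph conductance for one cluster (a sketch; the paper's
# kNN graph construction and edge weights are not reproduced).

def conductance(edges, cluster):
    """edges: iterable of undirected (u, v) pairs; cluster: set of node ids.

    Returns cut(S) / min(vol(S), vol(V \\ S)), where cut counts boundary
    edges and vol sums node degrees on each side.
    """
    cut = vol_in = vol_out = 0
    for u, v in edges:
        u_in, v_in = u in cluster, v in cluster
        if u_in != v_in:
            cut += 1                      # edge crosses the cluster boundary
        # each edge endpoint adds one degree unit to its side's volume
        vol_in += int(u_in) + int(v_in)
        vol_out += int(not u_in) + int(not v_in)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0
```

On a toy graph of two triangles joined by a single bridge edge, the triangle cluster has cut 1 and volume 7 on each side, giving a conductance of 1/7; tightly knit behavioural clusters with few inter-cluster links score low in exactly this way.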
Figure 9. UMAP visualisation of learnt trajectory embeddings coloured by trajectory-level mean values of selected AIS and contextual features. Panel (a) highlights cargo trajectories, which concentrate in two regions on the left side of the embedding space. Panels (b,c) show that both regions correspond to vessels with larger length and width. Panels (df) indicate differences in operating context: the left region is associated with deeper waters and larger distances to land and restricted areas, while the upper-left region relates to shallower waters and closer proximity to coastlines and regulated regions. This spatial organisation demonstrates that GMAE-REx embeddings encode operational environmental context alongside vessel characteristics.
Figure 10. UMAP visualisation of the learnt trajectory embeddings coloured by ship type. Cargo trajectories in panel (a) mainly occupy the left and upper-left regions. Passenger trajectories in panel (b) appear more often in the lower-right region, consistent with ferry operations close to land (cf. Figure 9e). Sailing trajectories in panel (c) are more frequent in the central region, farther from the coastline. The vessel length and width patterns in Figure 9a,b suggest that passenger and sailing trajectories are associated with smaller physical dimensions. The mixed-type composition of individual clusters in Figure 8 indicates that the embedding organises trajectories by operational behaviour rather than vessel identity.
Figure 11. Global SHAP feature importance aggregated across all 28 VBGMM clusters. Temporal encodings (day_of_year_cos/sin, day_of_week_cos/sin) dominate the top 4 positions, indicating that seasonal and weekly patterns are primary drivers of cluster differentiation. Kinematic features (length, speed, course, acc) occupy mid-range ranks, whilst environmental features (water_depth, dist_to_land, density_all) and interaction features (dcpa_0, tcpa_0) contribute at moderate levels. This hierarchy suggests that GMAE-REx embeddings encode operational context (when and how vessels navigate) rather than vessel identity (what vessel types navigate). Detailed per-cluster SHAP rankings and comparisons with kNN-Leiden, FINCH, and HDBSCAN are provided in Appendix H.
Figure 12. Per-cluster SHAP feature importance showing the top 15 features for each of the 28 VBGMM clusters. Temporal features are consistently important across most clusters, but specific clusters exhibit elevated importance for kinematic, spatial, interaction, or density features, indicating context-dependent feature salience. This heterogeneity demonstrates that the embedding space supports multiple behavioural facets rather than a single canonical pattern. Detailed cluster-specific analyses and feature interaction studies for all clustering methods are provided in Appendix H.
Figure 13. VBGMM cluster centroid radar profiles (Clusters 0–11 of 28 total). Each polygon represents the operational signature of one cluster, computed as the mean across all member trajectories and normalised to [0,1] relative to global feature extrema within the method. Large filled areas indicate high values across multiple dimensions, whilst asymmetric shapes reveal specialised operational contexts. These profiles complement SHAP analysis by visualising cluster centroids in the original feature space rather than quantifying feature importance for cluster assignments. Complete profiles for all 28 VBGMM clusters and comparative analyses for kNN-Leiden (47 clusters), FINCH (27 clusters), and HDBSCAN (3 clusters) are in Appendix I.
Figure 14. Spatial collision risk profiles for two contrasting kNN-Leiden clusters. Each cell in the 10 m × 10 m raster shows the mean collision risk (Equation (1)) averaged over all AIS observations from trajectory segments belonging to the cluster; cells with no observations are transparent. (a) Cluster 16: high-risk encounter pattern, with elevated collision risk concentrated in the main channel and ferry-terminal approaches of Kiel Fjord, corresponding to geometrically constrained zones where simultaneous low TCPA and low DCPA are structurally enforced by converging traffic. (b) Cluster 29: low-risk transit pattern, showing near-zero mean collision risk across the full study area, characterising trajectories that consistently operate with either ample temporal or spatial separation from other vessels. The geographic contrast between the two profiles demonstrates that the discovered behavioural clusters encode verifiable, safety-relevant navigational scenarios.
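The collision-risk raster in Figure 14 is driven by TCPA and DCPA between vessel pairs. Under the usual constant-velocity assumption these follow from simple relative-motion geometry, sketched below; the mapping from (TCPA, DCPA) to the risk score of Equation (1) is defined in the paper and not reproduced here, and the flat local-coordinate treatment is an assumption.

```python
import math

# Illustrative DCPA/TCPA geometry for one vessel pair under a
# constant-velocity assumption (a sketch; Equation (1) of the paper maps
# these quantities to a collision-risk score, which is not reproduced).

def cpa(p_own, v_own, p_tgt, v_tgt):
    """Positions in metres, velocities in m/s, in local planar coordinates.

    Returns (tcpa_seconds, dcpa_metres). TCPA is clamped at 0 so that a
    pair already diverging reports its current range as DCPA.
    """
    dx, dy = p_tgt[0] - p_own[0], p_tgt[1] - p_own[1]      # relative position
    dvx, dvy = v_tgt[0] - v_own[0], v_tgt[1] - v_own[1]    # relative velocity
    dv2 = dvx * dvx + dvy * dvy
    if dv2 == 0.0:                     # identical velocities: range is constant
        return 0.0, math.hypot(dx, dy)
    tcpa = max(0.0, -(dx * dvx + dy * dvy) / dv2)
    dcpa = math.hypot(dx + tcpa * dvx, dy + tcpa * dvy)
    return tcpa, dcpa
```

A head-on pair closing at 10 m/s from 1000 m apart yields TCPA = 100 s and DCPA = 0 m, the simultaneous low-TCPA/low-DCPA condition that the Figure 14a cluster concentrates in constrained channel zones.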
Table 1. Overview of AIS message classes and commonly used message types [51].
Message Class | Description | Message IDs
Class A | Position report | 1, 2, 3
Class A | Static and voyage-related data | 5
Class B | Position report | 18, 19
Class B | Static data report | 24
Table 2. Trajectory table containing dynamic AIS information.
Column | Description | Manually Entered
mmsi | MMSI | No
timestamp | UTC timestamp of the AIS message | No
lon | Longitude (EPSG:4326) | No
lat | Latitude (EPSG:4326) | No
status | Navigational status | Yes
cog | COG | No
heading | True heading | No
draught | Reported vessel draught | Yes
Table 3. Static vessel information table.
Column | Description | Manually Entered
mmsi | MMSI | No
ship_type | AIS vessel type code | Yes
to_bow | Distance from GPS antenna to bow | Yes
to_stern | Distance from GPS antenna to stern | Yes
to_port | Distance from GPS antenna to port side | Yes
to_starboard | Distance from GPS antenna to starboard side | Yes
Table 4. Clustering models considered in Pipeline 2 for behavioural analysis.
Model | Core Assumptions | Fitness for Maritime Behaviour
HDBSCAN [63] | Clusters are high-density regions separated by low density. No assumptions about cluster number or shape. | Excellent for arbitrary-shape behaviour modes and noise handling. Hierarchical structure enables multi-scale analysis of local manoeuvres and global routes.
VBGMM [64] | Gaussian mixture components with VB inference automatically determining the effective number of clusters. | High. Mixture components approximate complex distributions. Probabilistic nature valuable for anomaly detection and modelling speed/heading variability.
k-NN + Leiden [65] | Dense feature-space regions correspond to densely connected graph communities. | Excellent for interconnected route networks. Connectivity guarantee prevents fragmented clusters and improves stability.
FINCH [66] | Shared nearest-neighbour relationships form hierarchical multi-scale patterns. | High. Hierarchical approach suits multi-scale behaviours. Parameter-free (except k) and robust for exploratory clustering.
Table 6. Post-processed dataset summary used for experiments.
Item | Value
Region of interest | Port of Kiel and surrounding waters, (10.12° E, 54.31° N) to (10.33° E, 54.46° N)
Time span | 730 days, 2022–2023
Temporal resolution (post-processing) | 5 s (fixed temporal grid)
Segment length for learning | L = 120 time steps (10 min)
Trajectories (after processing) | 176,787
Unique vessels (MMSI) | 9948
Total interpolated points | 63,448,367
Total segments | 527,225
Interaction range | 2 km
Neighbour retention | Top 2 neighbours by collision-risk score
Feature definitions | See Appendix A
Table 7. Hyper-parameter search space for encoder selection (Experiment I). Following Contribution 2, optimisation is conducted exclusively on self-supervised validation loss.
Hyper-Parameter | Optuna Search Space
Batch size (batch_size) | [32, 64, 128]
Learning rate (lr) | 0.001 (fixed)
Encoder layers (e_layers) | [1, 2, 3, 4, 5]
Decoder layers (d_layers) | [1, 2, 3, 4, 5]
Model dimension (d_model) | [32, 64, 128]
FFN dimension (f_dim) | [64, 128, 256, 512, 1024]
Attention heads (n_heads) | [2, 4, 8]
Dropout (dropout) | [0.0, 0.1, 0.3]
Noise (DAE/EAE only) (noise) | [0.01, 0.02, 0.05, 0.1]
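The categorical grid of Table 7 can be written down directly. The sketch below encodes it as a dictionary and pairs it with a stdlib random sampler standing in for Optuna's `trial.suggest_categorical` [62]; `random_search` and its `objective` argument are hypothetical placeholders for the actual self-supervised training and validation loop.

```python
import random

# The Table 7 search space as a categorical grid. The paper samples it via
# Optuna's trial.suggest_categorical; here a stdlib random sampler stands in,
# and the objective is a hypothetical placeholder for train-and-validate.
SEARCH_SPACE = {
    "batch_size": [32, 64, 128],
    "e_layers": [1, 2, 3, 4, 5],
    "d_layers": [1, 2, 3, 4, 5],
    "d_model": [32, 64, 128],
    "f_dim": [64, 128, 256, 512, 1024],
    "n_heads": [2, 4, 8],
    "dropout": [0.0, 0.1, 0.3],
    "noise": [0.01, 0.02, 0.05, 0.1],  # DAE/EAE only
}

def sample_config(rng):
    """Draw one configuration uniformly from the categorical grid."""
    cfg = {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}
    cfg["lr"] = 0.001  # fixed, per Table 7
    return cfg

def random_search(objective, n_trials=20, seed=0):
    """Return the sampled config minimising the validation objective."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(n_trials)]
    return min(trials, key=objective)
```

Per Contribution 2, the objective in the paper is the self-supervised validation loss, so encoder selection never touches downstream labels; any differentiable or non-differentiable scalar can be plugged in here.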
Table 8. Sensitivity to group mask ratio (levelA, density4), λ_rex = 0.1.
Mask Ratio (Group Mask) | 0.25 | 0.35 | 0.50 | 0.75
Linear Probe (Acc.) | 0.7447 | 0.7447 | 0.7535 | 0.7479
Fine-tune (Acc.) | 0.8514 | 0.8514 | 0.8548 | 0.8524
Table 9. Grouping scheme ablation (group mask rate = 0.5, env scheme = density4, λ_rex = 0.1).
Group Scheme | levelA | levelB
Linear Probe (Acc.) | 0.7447 | 0.7474
Fine-tune (Acc.) | 0.8517 | 0.8556
Table 10. Environment scheme ablation (group scheme = levelB, group mask rate = 0.5, λ_rex = 0.1).
| Env Scheme | density4 | geo4 | densitygeo16 | densityhour16 |
|---|---|---|---|---|
| Linear Probe (Acc.) | 0.7447 | 0.7495 | 0.7444 | 0.7484 |
| Fine-tune (Acc.) | 0.8517 | 0.8508 | 0.8513 | 0.8526 |
Table 11. REx weight ablation (group scheme = levelB, group mask rate = 0.5, env scheme = densityhour16).
| λ_rex | 0.0 | 0.05 | 0.1 | 0.2 |
|---|---|---|---|---|
| Linear Probe (Acc.) | 0.7454 | 0.7510 | 0.7447 | 0.7437 |
| Fine-tune (Acc.) | 0.8515 | 0.8547 | 0.8517 | 0.8513 |
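For intuition on λ_rex: under the variance form of risk extrapolation (V-REx), the penalty is the variance of per-environment losses, so larger λ_rex forces risk to be more uniform across environments (here, e.g., the density4 or densityhour16 groupings). A minimal sketch assuming this variance form; the paper's exact objective may differ in detail:

```python
# Sketch of a V-REx-style objective (assumed form, not the paper's code):
# total loss = mean of per-environment losses + lambda_rex * their variance.

def rex_loss(env_losses, lambda_rex=0.1):
    """Mean per-environment loss plus a lambda-weighted variance penalty."""
    n = len(env_losses)
    mean = sum(env_losses) / n
    var = sum((l - mean) ** 2 for l in env_losses) / n
    return mean + lambda_rex * var

# With lambda_rex = 0 this reduces to plain empirical risk averaging.
losses = [0.9, 1.1, 1.0, 1.4]
assert rex_loss(losses, 0.0) == sum(losses) / 4
assert rex_loss(losses, 0.1) > rex_loss(losses, 0.0)
```

The ablation above is consistent with this reading: a small positive weight (0.05) helps slightly, while larger weights begin to trade off average reconstruction quality.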
Table 12. Best hyperparameter configurations and validation accuracy for each encoder (Experiment I). GMAE-REx (Contributions 3–4) achieves the best performance, outperforming all baseline architectures. Trainable parameter counts (Params) are reported for the best configuration found over the search space in Table 7.
| HParam | GMAE-REx | DAE | EAE | TCN | Transformer | LiST |
|---|---|---|---|---|---|---|
| `batch_size` | 32 | 64 | 128 | 32 | 32 | 32 |
| `lr` | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 |
| `e_layers` | 3 | 4 | 5 | 3 | 3 | 3 |
| `d_layers` | 3 | 4 | 3 | 1 | 1 | 1 |
| `d_model` | 128 | 64 | 64 | 128 | 128 | 128 |
| `f_dim` | 256 | – | – | – | 512 | 512 |
| `n_heads` | 4 | 8 | – | – | 4 | 4 |
| `dropout` | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| `noise` | – | 0.01 | 0.05 | – | – | – |
| Params | ≈1.01 M | ≈0.74 M | ≈0.74 M | ≈25 K | ≈0.58 M | ≈0.60 M |
| Validation Accuracy | 86.03% | 85.63% | 85.56% | 76.27% | 84.93% | 85.12% |

(– indicates the hyper-parameter does not apply to that architecture.)
Table 13. Clustering methods used for Experiment II and their fitness for maritime trajectory analysis. Detailed descriptions and mathematical formulations are provided in Appendix B.
| Method | Key Characteristics | Fitness for AIS Behaviour |
|---|---|---|
| kNN-Leiden | Graph-based community detection via modularity optimisation with guaranteed connectivity [65]. | Excellent for route-network structures and heterogeneous density; connectivity guarantee reduces fragmentation. |
| HDBSCAN | Hierarchical density-based clustering with explicit noise handling via mutual reachability [63]. | Excellent for arbitrary-shape clusters and anomaly detection; multi-scale structure captures local manoeuvres and global patterns. |
| VBGMM | Variational Bayesian Gaussian Mixture with automatic component selection [71]. | High fitness for overlapping behaviours and uncertainty quantification; aligns with Contribution 5. |
| FINCH | Parameter-free hierarchical clustering via first-neighbour relations [66]. | High fitness for exploratory multi-scale analysis; minimal tuning required. |
Table 14. Distance metrics and optimisation objectives for each clustering method and representation (Experiment II).
| Method | Distance Metric | Optimisation Objective |
|---|---|---|
| kNN-Leiden | Cosine | Modularity |
| HDBSCAN | Euclidean | DBCV |
| VBGMM | Probabilistic | ELBO |
| FINCH | Cosine | Silhouette |
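The cosine distance used for kNN-Leiden and FINCH compares embedding directions and ignores magnitude, which suits comparing learnt trajectory embeddings of differing norms. A small self-contained sketch (not the paper's code):

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Parallel vectors have distance 0 regardless of scale ...
assert abs(cosine_distance([1.0, 2.0], [2.0, 4.0])) < 1e-9
# ... while orthogonal vectors have distance 1.
assert abs(cosine_distance([1.0, 0.0], [0.0, 3.0]) - 1.0) < 1e-9
```

Euclidean distance, by contrast, is the natural companion to HDBSCAN's mutual-reachability construction, and VBGMM needs no explicit metric since cluster membership follows from the probabilistic model itself.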
Table 15. Optuna search space for Stage 1 rough hyperparameter optimisation (Experiment II).
Table 15. Optuna search space for Stage 1 rough hyperparameter optimisation (Experiment II).
| Method | Hyperparameter | Optuna Search Space |
|---|---|---|
| kNN-Leiden | n_neighbors | [10, 50] |
| | resolution | [0.5, 2.0], step 0.1 |
| HDBSCAN | min_cluster_size | [20, 200], step 5 |
| | min_samples | [5, 50] |
| | cluster_selection_epsilon | [0.0, 1.0], step 0.05 |
| VBGMM | n_components | [5, 25] |
| | weight_concentration_prior | [0.001, 1.0] |
| | mean_precision_prior | [0.0001, 0.1] |
| FINCH | n_neighbors | [10, 50] |
Table 16. Grid search space for Stage 2 fine hyperparameter optimisation (Experiment II). Each range is centred around the Optuna optimum.
| Method | Hyperparameter | Grid Specification |
|---|---|---|
| kNN-Leiden | n_neighbors | Spread: 8, count: 5, global bounds: [3, 150] |
| | resolution | Spread: 0.5, count: 5, global bounds: [0.1, 5.0] |
| HDBSCAN | min_cluster_size | Spread: 20, count: 5, global bounds: [5, 500] |
| | min_samples | Spread: 10, count: 5, global bounds: [1, 100] |
| | cluster_selection_epsilon | Spread: 0.2, count: 5, global bounds: [0.0, 2.0] |
| VBGMM | n_components | Spread: 3, count: 3, global bounds: [2, 30] |
| | weight_concentration_prior | Fixed grid: [0.01, 0.1, 1.0] |
| | mean_precision_prior | Fixed grid: [0.001, 0.01, 0.1] |
| FINCH | n_neighbors | Spread: 10, count: 5, global bounds: [3, 100] |
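One way to read the "spread / count / global bounds" specification in Table 16: take `count` evenly spaced candidates across ±spread around the Stage 1 Optuna optimum, then clip them to the global bounds. A sketch under that reading (`fine_grid` is our own helper, not the paper's code):

```python
def fine_grid(center, spread, count, bounds):
    """Evenly spaced candidate values around `center`, clipped to `bounds`."""
    lo, hi = bounds
    step = 2 * spread / (count - 1)
    values = [center - spread + i * step for i in range(count)]
    # Clip to the global bounds; clipping can create duplicates, so dedupe.
    return sorted({min(max(v, lo), hi) for v in values})

# e.g. kNN-Leiden n_neighbors (spread 8, count 5, bounds [3, 150])
# around a hypothetical Stage 1 optimum of 20:
assert fine_grid(20, 8, 5, (3, 150)) == [12, 16, 20, 24, 28]
# Near a bound the grid collapses onto the boundary value.
assert fine_grid(5, 8, 5, (3, 150)) == [3, 5, 9, 13]
```

The fixed grids for the VBGMM priors fall outside this pattern and are enumerated directly as listed in the table.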
Table 17. Intrinsic clustering metrics on a fixed random sample of 50,000 trajectory segments (Experiment II, Contribution 6). Higher  ↑ is better for DBCV and modularity; lower  ↓ is better for conductance. Learnt embeddings consistently outperform expert features across kNN-Leiden, FINCH, and VBGMM. Complete metric tables and additional traditional metrics are provided in Appendix D; metric properties and formulations are detailed in Appendix C.
| Method | Expert DBCV ↑ | Expert Conductance ↓ | Expert Modularity ↑ | Expert Communities | Learnt DBCV ↑ | Learnt Conductance ↓ | Learnt Modularity ↑ | Learnt Communities |
|---|---|---|---|---|---|---|---|---|
| kNN-Leiden | −0.791 | 0.327 | 0.875 | 48 | 0.533 ↑ | 0.186 ↓ | 0.906 ↑ | 47 |
| FINCH | −0.907 | 0.206 | 0.671 | 8 | 0.588 ↑ | 0.205 ↓ | 0.756 | 27 |
| HDBSCAN | −0.042 | 0.311 ↓ | – | 2 | 0.112 ↑ | 0.479 | – | 3 |
| VBGMM | −0.742 | 0.498 | 0.419 | 28 | 0.594 ↑ | 0.193 ↓ | 0.756 ↑ | 28 |

(Per-cell arrows mark the better value of each expert/learnt pair; – indicates the metric was not computed for that method.)
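For intuition on the graph metrics above: the conductance of a cluster is the fraction of its edge volume that crosses the cluster boundary, so lower values mean better-separated communities. A toy computation on an undirected graph, purely illustrative rather than the evaluation code:

```python
from collections import defaultdict

# Toy undirected graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

def conductance(edges, cluster):
    """cut(S, V\\S) / min(vol(S), vol(V\\S)) for an undirected edge list."""
    degree = defaultdict(int)
    cut = 0
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if (u in cluster) != (v in cluster):
            cut += 1
    vol_in = sum(d for n, d in degree.items() if n in cluster)
    vol_out = sum(d for n, d in degree.items() if n not in cluster)
    return cut / min(vol_in, vol_out)

# The bridge edge (2, 3) is the only cut edge; each side has volume 7.
assert abs(conductance(edges, {0, 1, 2}) - 1 / 7) < 1e-12
```

Modularity behaves analogously but rewards within-cluster edges relative to a random-graph baseline, which is why kNN-Leiden, whose objective is modularity itself, scores highest on that column.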

Share and Cite

MDPI and ACS Style

Al-Falouji, G.; Gao, S.; Huang, Z.; Biesenbach, B.; Kröger, P.; Sick, B.; Tomforde, S. Representation Learning for Maritime Vessel Behaviour: A Three-Stage Pipeline for Robust Trajectory Embeddings. J. Mar. Sci. Eng. 2026, 14, 507. https://doi.org/10.3390/jmse14050507
