1. Introduction
With the continuous increase in global trade volume, the shipping industry plays a vital role in international transportation. However, maritime traffic safety in busy waterways (i.e., environments characterized by restricted physical topologies, heterogeneous traffic, and highly interdependent dynamic situations prone to cascading risks) remains a critical and persistent challenge for coastal administrations and port authorities [1,2,3,4]. In hub ports and major waterways, the traffic density is typically high and strongly heterogeneous, involving vessels with diverse sizes and operational roles [5,6]. Under such conditions, collision is not an occasional anomaly but a systemic operational concern that directly affects navigational safety and port efficiency [7,8,9]. With the ongoing development of smart shipping, there is a growing demand for intelligent supervision systems capable of providing earlier warnings and more reliable risk interpretation [10,11,12,13,14].
In reality, a fundamental difficulty in collision risk assessment lies in the dynamic and context-dependent nature of multi-ship encounters. In dense traffic, encounter configurations may rapidly shift among crossing, head-on, and overtaking situations, while being further constrained by channel geometry and local navigational regulations [15,16,17,18]. As a result, the risk level of a traffic scene cannot be adequately characterized by an instantaneous snapshot. Instead, it emerges from the temporal evolution of relative motion and the constraints of maneuvering space [19,20]. It is therefore essential to develop methods that explicitly incorporate spatiotemporal history and reflect maritime operational semantics [21,22].
Current research on collision risk assessment is mainly based on pairwise geometric indicators derived from relative motion, such as Distance at Closest Point of Approach (DCPA) and Time to Closest Point of Approach (TCPA) [23,24]. It is important to note that these traditional geometric metrics are highly reliable and fundamental for assessing risk in open-ocean settings and are widely used in practical decision support systems (e.g., digital maritime regulatory platforms, automated Vessel Traffic Management Information System (VTMIS) early-warning modules). However, their reactive and purely kinematic nature presents specific limitations in restricted waterways. Due to narrow fairways and spatial constraints, the geometrically projected Closest Point of Approach (CPA) may sometimes unrealistically fall on shorelines or unnavigable areas. Furthermore, their applicability is structurally limited in complex traffic scenes. Firstly, DCPA and TCPA intrinsically describe pairwise relationships and do not capture system-level interaction topology [15,25]. In practice, a vessel pair that appears low-risk in isolation may still contribute to a hazardous situation by constraining maneuvering options [26,27,28]. Secondly, pairwise indicators treat neighboring vessels independently, making it difficult to represent global traffic structure [29,30]. Thirdly, in dense waters, consideration must be given to the systemic evolution of risk, which denotes a cascading effect wherein local evasive maneuvers dynamically restrict surrounding navigable space, triggering a topological chain reaction of risk [31,32]. This suggests that pairwise risk does not necessarily translate into system-level risk in busy waterways.
With the widespread availability of Automatic Identification System (AIS) data, data-driven approaches have received increasing attention in maritime risk modeling [33,34]. Various time-series learning methods, such as Recurrent Neural Networks (RNNs) and Temporal Convolutional Networks (TCNs), have been developed to capture non-linear vessel motion patterns [35,36,37,38]. However, when multiple vessels interact simultaneously, the central modeling challenge extends to the representation of interaction structure. From this perspective, multi-ship traffic scenes are naturally formulated as graphs [39,40,41]. Graph Convolutional Networks (GCNs) and Spatio-Temporal Graph Convolutional Networks (ST-GCNs) provide a principled framework to aggregate neighborhood information [42,43,44].
Nevertheless, directly applying standard GCN architectures reveals a critical mismatch between generic graph construction and dynamic collision risk. Existing studies commonly construct interaction graphs using Euclidean distance thresholds [45,46]. However, spatial proximity is not equivalent to navigational threat. A relatively distant vessel on a converging course may pose a higher risk than a closer vessel moving in parallel. Consequently, purely distance-driven graphs may allocate modeling capacity to low-threat neighbors while under-representing ship encounters. Therefore, there is a critical need to develop rule-integrated and risk-aware graph constructions enabling learned representations to align more closely with practical navigational rules.
To overcome these shortcomings, this study proposes an Improved Spatio-Temporal Graph Convolutional Network (IST-GCN) framework for dynamic collision risk assessment in complex waterways. The proposed framework leverages historical AIS trajectories to predict near-future scene-level collision risk. Specifically, the IST-GCN extracts non-linear spatiotemporal features, models encounters as dynamically evolving graphs with rule-integrated weighting, and aggregates node-level representations into a graph-level assessment. The objective of this research is to support Vessel Traffic Services (VTS) operators in maintaining the overall safety level of the monitored area. The proposed method aims to identify high-risk multi-ship encounters at the scene level, thereby enabling timely intervention to prevent the escalation of localized conflicts. The contributions of this study are summarized as follows:
Firstly, a predictive collision risk assessment framework named IST-GCN is proposed to jointly model spatiotemporal vessel dynamics and multi-ship interaction topology, structurally embedding maritime operational semantics into the graph learning process.
Secondly, a rule-integrated, dynamically weighted graph construction strategy is developed. Unlike traditional data-driven models that construct interaction graphs relying solely on Euclidean distance thresholds, this strategy adaptively modulates interaction strength based on kinematic threat cues and COLREGs obligations. Crucially, rather than merely pushing alarms earlier in time, it forces the network to accurately allocate attention to functionally threatening vessels, effectively filtering out nuisance alarms and mitigating VTS operator fatigue in complex scenes.
Thirdly, moving beyond isolated and reactive pairwise inference, a hierarchical graph-to-scene aggregation mechanism is designed. By capturing the cascading effects of multi-ship interactions, it converts node-level state representations into a macroscopic risk diagnosis, enabling the proactive forecasting of latent risk accumulation well before critical geometric conflicts manifest.
Finally, comprehensive case studies are conducted using real-world AIS data from the core waters of Zhoushan Port (and an extended generalization scenario in Rizhao Port). The results validate the superior capability of the proposed approach in pinpointing genuine collision threats and its spatial robustness against conventional geometric methods and representative learning-based baselines.
The remainder of this paper is organized as follows. Section 2 elaborates on the proposed IST-GCN framework, detailing the rule-integrated graph construction strategy and the hierarchical spatio-temporal learning architecture. Section 3 presents the experimental setup and conducts a comprehensive performance evaluation using real-world AIS data from Ningbo–Zhoushan Port, including comparative benchmarking and ablation studies. Section 4 discusses the methodological implications, highlighting the shift towards proactive forecasting and analyzing current limitations. Finally, Section 5 summarizes the main contributions of this study and outlines avenues for future research.
2. Materials and Methods
This section elucidates the proposed methodology, specifically focusing on the systematic translation of rule-aware multi-ship interaction modeling into a computable predictive framework. To accommodate the highly dynamic interaction topologies and inherent cascading effects within congested maritime traffic, we formulate a spatio-temporal graph model designed to characterize the evolutionary patterns of multi-vessel encounter sequences. Departing from conventional static screening approaches, the developed framework exploits historical motion trajectories to proactively forecast risk levels over a subsequent short-term horizon. This process culminates in a robust, interpretable, and scene-level classification of collision risk. The comprehensive schematic of the proposed methodological architecture is depicted in Figure 1.
2.1. Problem Formulation
We consider a bounded region of interest (ROI) within a congested waterway. At any discrete time step t, the traffic scene is populated by a set of vessels, denoted by $\mathcal{V}_t = \{1, \dots, N_t\}$. In contrast to conventional instantaneous risk screening, our objective is to formulate a predictive model capable of anticipating the near-future scene-level risk within a specified horizon, conditioned on historical spatiotemporal observations and the underlying interaction topology.
Node Representation. Each vessel $i \in \mathcal{V}_t$ is represented as a graph node characterized by a multidimensional feature vector $\mathbf{x}_i^t \in \mathbb{R}^{d}$ derived from Automatic Identification System (AIS) reports. This vector encapsulates fundamental kinematics, including projected Cartesian coordinates $(x_i^t, y_i^t)$, speed over ground (SOG), and course over ground (COG), alongside temporal dynamics such as short-term derivatives $\Delta\mathrm{SOG}_i^t$ and $\Delta\mathrm{COG}_i^t$. These derivative features are estimated from successive AIS messages to effectively capture maneuvering trends. By concatenating the individual node features, we obtain the scene-level feature matrix $\mathbf{X}_t \in \mathbb{R}^{N_t \times d}$. To model the evolutionary trajectory of encounters, the framework utilizes a sliding historical window of length H as its input:
$$\mathcal{X}_t = \{\mathbf{X}_{t-H+1}, \dots, \mathbf{X}_t\}.$$
Rule-Aware Multi-Ship Interaction Graph. The complex topological dependencies among vessels are operationalized as a time-varying directed weighted graph $\mathcal{G}_t = (\mathcal{V}_t, \mathcal{E}_t)$, governed by an adjacency matrix $\mathbf{A}_t = [a_{ij}^t] \in \mathbb{R}^{N_t \times N_t}$. Here, the edge weight $a_{ij}^t$ quantifies the interaction intensity exerted from vessel j upon vessel i at time t. Departing from simplistic proximity-based graph constructions, the weights are conditioned on collision-risk descriptors (such as DCPA/TCPA-derived cues) and COLREGs-inspired encounter semantics. Consequently, the graph emphasizes functionally threatening neighbors over those that are merely spatially proximal.
Predictive Learning Objective. Let $T_p$ denote the prediction horizon. The primary task is to forecast the maximum collective risk level manifested within the forthcoming interval $(t, t + T_p]$, utilizing the historical sequence $\mathcal{X}_t$. We formalize the output as an ordinal three-class label $y_t \in \{0, 1, 2\}$, corresponding to Low, Medium, and High risk levels, respectively. To establish objective ground-truth supervision, we first define a pairwise collision risk index (CRI) for vessels i and j at any time $\tau$ as a bounded score derived via a monotonic mapping $f(\cdot)$ of DCPA and TCPA:
$$\mathrm{CRI}_{ij}(\tau) = f\big(\mathrm{DCPA}_{ij}(\tau), \mathrm{TCPA}_{ij}(\tau)\big).$$
The scene-level risk $R_t$ is then determined by aggregating the maximum CRI observed across all internal vessel pairs throughout the prediction window:
$$R_t = \max_{\tau \in (t,\, t + T_p]} \; \max_{i \neq j} \; \mathrm{CRI}_{ij}(\tau).$$
Finally, the discrete risk label $y_t$ is assigned based on domain-informed thresholds $\theta_1$ and $\theta_2$:
$$y_t = \begin{cases} 0, & R_t < \theta_1, \\ 1, & \theta_1 \le R_t < \theta_2, \\ 2, & R_t \ge \theta_2. \end{cases}$$
Accordingly, the proposed IST-GCN aims to optimize a mapping function $F_\Theta$ such that $\hat{y}_t = F_\Theta\big(\mathcal{X}_t, \{\mathbf{A}_\tau\}_{\tau = t-H+1}^{t}\big) \approx y_t$.
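The labeling rule described above (take the largest pairwise CRI over the prediction window, then bin it with two thresholds) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary layout of the CRI values, and the threshold values 0.3 and 0.6 are all placeholders chosen for the example, since the paper only states that the thresholds are domain-informed.

```python
def scene_risk(cri_by_time):
    """Largest pairwise CRI over all vessel pairs and all future time steps.

    cri_by_time: list over future time steps; each entry maps a vessel pair
    (i, j) to its CRI value at that step. Empty scenes yield a risk of 0.0.
    """
    return max((max(step.values()) for step in cri_by_time if step), default=0.0)


def risk_label(risk, theta_low=0.3, theta_high=0.6):
    """Ordinal label: 0 = Low, 1 = Medium, 2 = High.

    The default thresholds are hypothetical placeholders.
    """
    if risk < theta_low:
        return 0
    if risk < theta_high:
        return 1
    return 2


# Three future time steps for a scene with vessels 0, 1, and 2 (made-up values).
window = [
    {(0, 1): 0.10, (0, 2): 0.25, (1, 2): 0.05},
    {(0, 1): 0.15, (0, 2): 0.45, (1, 2): 0.05},
    {(0, 1): 0.20, (0, 2): 0.40, (1, 2): 0.10},
]
risk = scene_risk(window)
print(risk, risk_label(risk))
```

Because the label is driven by a maximum, a single hazardous pair at a single future step is enough to raise the scene label, which matches the scene-level supervision goal stated in the text.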
2.2. Rule-Integrated Graph Construction
Traditional proximity-based adjacency metrics, such as fixed-radius or k-nearest neighbors, often prove suboptimal as they fail to account for relative motion trends and navigational obligations. We therefore formulate a risk-aware, rule-integrated, and time-varying weighted adjacency matrix $\mathbf{A}_t = [a_{ij}^t]$, where the edge weights $a_{ij}^t$ quantify the functional threat relevance between vessels rather than simple geometric proximity.
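Pairwise relative motion descriptors. For any vessel pair $(i, j)$ at time t, let their respective positions and velocities in a local Cartesian frame be denoted by $(\mathbf{p}_i^t, \mathbf{v}_i^t)$ and $(\mathbf{p}_j^t, \mathbf{v}_j^t)$. The relative position vector is defined as $\mathbf{r}_{ij}^t = \mathbf{p}_j^t - \mathbf{p}_i^t$, and the relative velocity vector is $\mathbf{u}_{ij}^t = \mathbf{v}_j^t - \mathbf{v}_i^t$. Under the assumption of short-term constant velocity, the time to closest point of approach (TCPA) is derived by projecting the relative position onto the relative velocity vector:
$$\mathrm{TCPA}_{ij}^t = -\frac{\mathbf{r}_{ij}^t \cdot \mathbf{u}_{ij}^t}{\lVert \mathbf{u}_{ij}^t \rVert^2 + \varepsilon},$$
where a positive value indicates that the vessels are converging. The distance at closest point of approach (DCPA) then represents the magnitude of the relative position vector at the time of closest approach:
$$\mathrm{DCPA}_{ij}^t = \big\lVert \mathbf{r}_{ij}^t + \mathrm{TCPA}_{ij}^t \, \mathbf{u}_{ij}^t \big\rVert.$$
To ensure numerical robustness, we apply $\max(\mathrm{TCPA}_{ij}^t, 0)$ to focus on future collision potential and incorporate a small $\varepsilon > 0$ to avoid division by zero.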
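The constant-velocity CPA computation described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: positions are taken to be in metres and velocities in metres per second in a shared local Cartesian frame, and the function name `cpa` and the default `eps` are hypothetical choices, not values from the paper.

```python
import math


def cpa(p_i, v_i, p_j, v_j, eps=1e-9):
    """Return (tcpa, dcpa) for two vessels moving at constant velocity.

    TCPA is clipped at zero so that only future approaches count, and eps
    keeps the division finite when the relative speed is (near) zero.
    """
    rx, ry = p_j[0] - p_i[0], p_j[1] - p_i[1]   # relative position
    ux, uy = v_j[0] - v_i[0], v_j[1] - v_i[1]   # relative velocity
    tcpa = -(rx * ux + ry * uy) / (ux * ux + uy * uy + eps)
    tcpa = max(tcpa, 0.0)
    dcpa = math.hypot(rx + tcpa * ux, ry + tcpa * uy)
    return tcpa, dcpa


# Vessel i heads east at 5 m/s; vessel j starts 1000 m ahead with a 200 m
# lateral offset and heads west at 5 m/s (made-up example values).
tcpa, dcpa = cpa((0, 0), (5, 0), (1000, 200), (-5, 0))
print(round(tcpa, 3), round(dcpa, 3))
```

In this example the closing speed is 10 m/s over 1000 m of along-track separation, so the closest approach occurs after about 100 s at the 200 m lateral offset. For a pair that is already opening, the clipped TCPA is zero and the DCPA reduces to the current range.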
Encounter type and rule obligation indicators. Each vessel pair is categorized into a specific encounter type based on the relative bearing and course difference. To incorporate navigational semantics into the interaction topology, we introduce a simplified rule obligation indicator $o_{ij}^t \in \{0, 1\}$, where $o_{ij}^t = 1$ identifies vessel i as the give-way vessel relative to vessel j under COLREGs-inspired logic. This indicator enables the graph to encapsulate asymmetric interaction responsibilities, which are vital for characterizing real-world collision avoidance behaviors. In instances where rule information is unavailable or ambiguous, $o_{ij}^t$ is conservatively set to zero.
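A simplified encounter classifier of this kind might look like the sketch below. It is an assumption-laden illustration rather than the paper's rule set: the paper does not give its angular limits, so the 6° head-on tolerance is a hypothetical parameter, the 112.5° limit reflects the COLREGs notion of approaching from more than 22.5° abaft the beam, the crossing rule follows the convention that the vessel with the other on its starboard side gives way, and relative speed is ignored. The function names and the "overtaken" label are introduced here for illustration only.

```python
def wrap180(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def encounter_and_obligation(bearing_ij, cog_i, cog_j, head_on_tol=6.0):
    """Classify the encounter as seen from vessel i; return (type, give_way).

    bearing_ij: true bearing of vessel j from vessel i, in degrees.
    cog_i, cog_j: courses over ground in degrees.
    give_way is 1 when vessel i is treated as the give-way vessel towards j.
    """
    rel_bearing = wrap180(bearing_ij - cog_i)          # > 0: j on i's starboard side
    course_diff = abs(wrap180(cog_j - cog_i))
    bearing_ji = wrap180(bearing_ij + 180.0 - cog_j)   # i as seen from j
    if course_diff > 180.0 - head_on_tol and abs(rel_bearing) < head_on_tol:
        return "head-on", 1        # both vessels are expected to act
    if abs(bearing_ji) > 112.5:
        return "overtaking", 1     # i approaches j from abaft j's beam
    if abs(rel_bearing) > 112.5:
        return "overtaken", 0      # j approaches i from abaft i's beam
    if rel_bearing > 0:
        return "crossing", 1       # j on i's starboard side: i gives way
    return "crossing", 0


# Vessel i heads north; vessel j lies 45 degrees on i's starboard bow, heading west.
print(encounter_and_obligation(45.0, 0.0, 270.0))
```

The same crossing seen from vessel j returns a give-way value of 0, which is the asymmetry the rule obligation indicator is meant to encode.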
Risk-aware threat scoring. The aforementioned geometric and semantic descriptors are fused into a differentiable pairwise threat score $s_{ij}^t$, serving as a soft measure of interaction significance. We first define continuous gating functions to capture temporal and spatial proximity, $g_T(\mathrm{TCPA}_{ij}^t; \tau_T)$ and $g_D(\mathrm{DCPA}_{ij}^t; \sigma_D)$, where $\tau_T$ and $\sigma_D$ calibrate the sensitivity to temporal imminence and spatial separation. An encounter-type modulation term $\gamma_{ij}^t$ is then applied to reflect navigational criticality, following a priority order over encounter types. The final threat score $s_{ij}^t$ combines the two gates, the modulation term, and the rule obligation indicator, with a coefficient $\lambda$ that amplifies the influence of rule-based obligations. Crucially, $s_{ij}^t$ acts as a relative interaction weight that steers the focus of subsequent message passing rather than functioning as a deterministic alarm.
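One way such a fused score could be realized is sketched below. Every functional choice here is an assumption made for illustration: exponential decay gates for TCPA and DCPA, a multiplicative fusion, an obligation boost of the form (1 + lam * give_way), the ordering head-on > crossing > overtaking in the modulation table, and all numeric defaults. The sketch only demonstrates the qualitative behavior the text describes: imminent, close, rule-relevant encounters score higher.

```python
import math

# Hypothetical encounter-type weights; the ordering is an assumed example.
DEFAULT_MODULATION = {"head-on": 1.0, "crossing": 0.8, "overtaking": 0.6}


def threat_score(tcpa, dcpa, encounter, give_way,
                 tau_t=300.0, sigma_d=500.0, lam=0.5, modulation=None):
    """Soft pairwise threat score from CPA cues and rule semantics.

    tau_t (seconds) and sigma_d (metres) set how fast the gates decay;
    lam scales the boost applied when vessel i is the give-way vessel.
    """
    table = DEFAULT_MODULATION if modulation is None else modulation
    g_time = math.exp(-tcpa / tau_t)       # close to 1 when the approach is imminent
    g_dist = math.exp(-dcpa / sigma_d)     # close to 1 when the passing distance is small
    gamma = table.get(encounter, 0.5)      # fallback weight for unlisted types
    return g_time * g_dist * gamma * (1.0 + lam * give_way)


near = threat_score(tcpa=60.0, dcpa=100.0, encounter="crossing", give_way=1)
far = threat_score(tcpa=600.0, dcpa=100.0, encounter="crossing", give_way=1)
print(near > far)
```

Because the score is smooth in TCPA and DCPA, it can serve as a relative weight for message passing rather than a hard alarm threshold.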
Dynamic adjacency, sparsification, and normalization. To mitigate the noise associated with dense connectivity in high-traffic scenes, the graph is sparsified by retaining only the M most threatening neighbors for each target vessel i: $\mathcal{N}_i^t = \operatorname{TopM}_{j \neq i}\big(s_{ij}^t\big)$, where $s_{ij}^t$ is the pairwise threat score. The adjacency weights are then normalized using a softmax-based approach:
$$a_{ij}^t = \begin{cases} \dfrac{\exp(s_{ij}^t)}{\sum_{k \in \mathcal{N}_i^t} \exp(s_{ik}^t)}, & j \in \mathcal{N}_i^t, \\ 0, & \text{otherwise}, \end{cases}$$
yielding a row-stochastic, dynamically weighted matrix that emphasizes high-threat interactions. Self-loops can be incorporated by adding an identity matrix during graph convolution normalization. Through this iterative construction, the resulting graph topology evolves adaptively, focusing model capacity on vessels most likely to influence near-future collision risk and providing a physically grounded foundation for spatio-temporal graph learning.
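The sparsify-then-normalize step can be sketched in plain Python as follows. The function name `build_adjacency` and the nested-list matrix layout are illustrative assumptions; the logic follows the description above: keep the M highest-scoring neighbors per row, then apply a softmax over the retained entries only.

```python
import math


def build_adjacency(scores, m):
    """Row-stochastic adjacency from a pairwise threat score matrix.

    scores[i][j] is the threat that vessel j poses to vessel i. For each row,
    only the m highest-scoring neighbours (excluding i itself) are kept, and a
    softmax over the kept scores gives the edge weights.
    """
    n = len(scores)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        keep = sorted(others, key=lambda j: scores[i][j], reverse=True)[:m]
        if not keep:
            continue  # a lone vessel has no neighbours
        peak = max(scores[i][j] for j in keep)   # subtract the peak for stability
        exps = {j: math.exp(scores[i][j] - peak) for j in keep}
        total = sum(exps.values())
        for j in keep:
            adj[i][j] = exps[j] / total
    return adj


scores = [
    [0.0, 2.0, 1.0],
    [0.5, 0.0, 0.1],
    [3.0, 1.0, 0.0],
]
for row in build_adjacency(scores, m=2):
    print([round(w, 3) for w in row])
```

Note that the resulting matrix is generally asymmetric, since the threat that j poses to i need not equal the threat that i poses to j.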
2.3. IST-GCN Model Encoder
The IST-GCN architecture is operationalized through a dual-encoder framework designed to extract and integrate multi-dimensional representations of ship encounters. The comprehensive schematic of the proposed methodological architecture is depicted in Figure 2. This section elucidates the structural components of the spatial and temporal encoders, which jointly transform raw kinematic sequences into actionable risk assessments.
2.3.1. Spatial Interaction Encoder
The spatial interaction encoder is engineered to characterize the instantaneous topological dependencies and functional threats among multiple vessels within the region of interest. At each discrete time step t, the node embeddings undergo iterative updates through a rule-integrated message-passing mechanism. This process is designed to map raw kinematic features into a high-dimensional latent space that reflects the complex interaction structure of the maritime traffic scene. To stabilize feature propagation and mitigate the numerical instability inherent in dynamically evolving graphs, we apply a symmetric normalization scheme to the self-loop augmented adjacency matrix $\tilde{\mathbf{A}}_t = \mathbf{A}_t + \mathbf{I}$. The resulting normalized matrix $\hat{\mathbf{A}}_t = \tilde{\mathbf{D}}_t^{-1/2} \tilde{\mathbf{A}}_t \tilde{\mathbf{D}}_t^{-1/2}$, where $\tilde{\mathbf{D}}_t$ is the diagonal degree matrix of $\tilde{\mathbf{A}}_t$, ensures that the scale of node embeddings remains consistent across varying traffic densities.
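The self-loop augmentation and symmetric normalization mentioned above can be sketched as follows. This is the standard GCN-style normalization written in plain Python; using row sums as the degree for a directed weighted graph is an assumption of this sketch, and the function name is hypothetical.

```python
def normalize_adjacency(adj):
    """Symmetric normalization of a self-loop augmented adjacency matrix.

    Computes D^{-1/2} (A + I) D^{-1/2}, where D holds the row sums of A + I.
    Weights are assumed non-negative, so every degree is at least 1.
    """
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    inv_sqrt = [sum(row) ** -0.5 for row in a_tilde]
    return [[inv_sqrt[i] * a_tilde[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


# Two vessels that each attend fully to the other.
for row in normalize_adjacency([[0.0, 1.0], [1.0, 0.0]]):
    print(row)
```

With no edges at all, the normalized matrix reduces to the identity, so an isolated vessel simply keeps its own features during propagation.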
The core of this encoder lies in its multi-branch (multi-head) spatial convolution architecture, which facilitates the simultaneous extraction of diverse interaction semantics. Here, $K$ denotes the number of parallel attention branches, and each branch $k \in \{1, \dots, K\}$ is parameterized by distinct rule-gating or threat-weighting configurations.
Unlike standard graph convolutional networks that aggregate information based on uniform spatial decay, this multi-branch design enables the network to adaptively prioritize information flow from vessels posing higher functional threats—defined by the interplay between COLREGs obligations and the CRI. Specifically, each branch can be optimized to focus on different navigational sub-contexts, such as give-way responsibilities or imminent geometric conflicts. By aggregating these refined message-passing outputs, the final spatial layer generates a comprehensive node embedding that encapsulates not only the immediate encounter semantics but also the local interaction constraints imposed by the surrounding traffic. This interaction-aware representation provides a robust spatial foundation for subsequent temporal evolution modeling.
2.3.2. Temporal Evolution Encoder
The temporal evolution encoder is responsible for modeling the dynamic progression of multi-ship encounters over the historical observation window of length $H$. It operates on the sequence of interaction-aware embeddings generated by the spatial encoder to identify latent risk precursors and characterize the temporal dependencies inherent in vessel maneuvers. To effectively capture non-linear motion trends and cascading maneuver responses, we employ a gated temporal convolutional network (TCN) architecture. This mechanism utilizes causal convolutions to ensure that the risk assessment at time t is strictly conditioned on past observations, thereby preserving the temporal causality of maritime traffic evolution. The gated activation units are operationalized using one-dimensional causal convolutions. To concisely capture both feature extraction and noise filtration, the final temporal representation $\mathbf{Z}$ is formulated as:
$$\mathbf{Z} = \tanh(\mathbf{W}_f * \mathbf{H}) \odot \sigma(\mathbf{W}_g * \mathbf{H}),$$
where $\mathbf{H}$ is the sequence of spatial embeddings, ∗ denotes the temporal convolution operator, and $\mathbf{W}_f$ and $\mathbf{W}_g$ represent the trainable weight matrices for the feature and gate branches, respectively. Here, ⊙ is the element-wise Hadamard product and $\sigma$ is the sigmoid function. The tanh function extracts complex motion features, while the sigmoid function acts as a temporal gate to suppress noise from irregular AIS reporting intervals.
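The gated causal convolution can be illustrated on a single scalar feature sequence. This is a deliberately reduced sketch: real embeddings are vectors and the kernels are learned, whereas here the kernels are fixed example values and the function names are hypothetical. The point is the structure, namely a tanh feature branch multiplied element-wise by a sigmoid gate, with both branches seeing only current and past inputs.

```python
import math


def causal_conv(seq, kernel):
    """1-D causal convolution: output[t] depends only on seq[t], seq[t-1], ...

    The sequence is left-padded with zeros so the output has the same length.
    kernel[0] multiplies the current sample, kernel[1] the previous one, etc.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(seq)
    return [sum(kernel[m] * padded[t + k - 1 - m] for m in range(k))
            for t in range(len(seq))]


def gated_tcn(seq, w_feat, w_gate):
    """Gated activation: tanh(feature branch) * sigmoid(gate branch)."""
    feat = causal_conv(seq, w_feat)
    gate = causal_conv(seq, w_gate)
    return [math.tanh(f) * (1.0 / (1.0 + math.exp(-g)))
            for f, g in zip(feat, gate)]


history = [1.0, 2.0, 3.0, 4.0]
print([round(z, 4) for z in gated_tcn(history, [0.5, 0.2], [0.1, 0.3])])
```

Changing the last element of `history` leaves all earlier outputs untouched, which is the causality property the text relies on.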
To synthesize a unified scene-level assessment from these heterogeneous temporal representations, a permutation-invariant aggregation mechanism is required. We implement an attention-based pooling strategy to prioritize risk-dominant vessels, effectively mirroring the cognitive heuristic where maritime operators focus on the most threatening ships in a complex scene. The aggregation is succinctly formulated as:
$$\mathbf{h}_{\mathrm{scene}} = \sum_{i} \alpha_i \mathbf{z}_i, \qquad \alpha_i = \frac{\exp(e_i)}{\sum_{j} \exp(e_j)},$$
where $\mathbf{z}_i$ is the temporal representation of vessel i and $\alpha_i$ reflects the learned significance of vessel i in driving the collective risk, derived from its latent attention score $e_i$. This hierarchical encoding process culminates in an MLP prediction head followed by a softmax layer, which outputs the probability distribution $\hat{\mathbf{p}}$ over discrete risk levels (Low, Medium, and High). By coupling the rule-integrated spatial interaction topology with long-term temporal evolution, the IST-GCN encoder facilitates proactive collision risk forecasting that is both physically grounded and computationally efficient for real-time VTS applications.
2.4. IST-GCN Model Decoder
The functionality of the scene-level risk decoder extends beyond simple feature aggregation; it serves as a semantic bridge that distills microscopic, node-level interaction states into a macroscopic, actionable risk diagnosis. Given the heterogeneous node embeddings produced by the spatio-temporal encoders, the decoder faces two fundamental modeling challenges: the time-varying cardinality of the vessel set (due to ships entering or leaving the ROI) and the absence of a canonical node ordering. To address these constraints, we engineer a permutation-invariant decoding architecture that synthesizes a fixed-length scene representation, ensuring that the risk assessment remains structurally robust regardless of traffic density fluctuations.
2.4.1. Risk-Focused Attention Aggregation
Conventional pooling operators, such as global average pooling, often dilute critical risk signals by treating all vessels uniformly, effectively “averaging out” the threat posed by a single dangerous target in a sea of safe vessels. Conversely, global max pooling may ignore secondary but significant threats. To overcome these limitations and mirror the cognitive heuristics of VTS operators—who selectively focus their attention on high-threat targets amidst complex backgrounds—we implement a parameterized, risk-focused attention mechanism.
Each vessel node is first projected into a latent scalar score representing its contribution to the global risk context. This is operationalized through a learned alignment function:
$$e_i = \mathbf{q}^{\top} \tanh(\mathbf{W}_a \mathbf{z}_i + \mathbf{b}_a),$$
where $\mathbf{z}_i$ is the spatio-temporal embedding of vessel i, $\mathbf{W}_a$ and $\mathbf{b}_a$ are trainable projection parameters, and $\mathbf{q}$ serves as a learnable “context query vector” that encapsulates the global pattern of a high-risk scenario. The scalar scores are subsequently normalized via a softmax function to produce the attention distribution $\{\alpha_i\}$, ensuring $\sum_i \alpha_i = 1$:
$$\alpha_i = \frac{\exp(e_i)}{\sum_{j} \exp(e_j)}.$$
The final scene-level embedding is then synthesized as a weighted sum $\mathbf{h}_{\mathrm{scene}} = \sum_i \alpha_i \mathbf{z}_i$. This mechanism allows the model to perform a “soft selection” of salient features, effectively amplifying signals from vessels involved in rule-violating or geometrically converging encounters while suppressing noise from non-interacting traffic.
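The attention pooling described in this subsection can be sketched as follows. The additive scoring form (a query vector applied to a tanh-activated projection) is an assumed, commonly used choice, and the parameter values are arbitrary examples rather than learned weights. The sketch shows the two properties the decoder needs: it accepts any number of vessels, and the result does not depend on their order.

```python
import math


def attention_pool(node_embeddings, w, b, q):
    """Pool a variable-size set of node embeddings into one scene vector.

    Score per node: q . tanh(W z + b). The scores are softmax-normalized into
    attention weights, and the scene vector is the weighted sum of embeddings.
    Returns (scene_vector, attention_weights).
    """
    scores = []
    for z in node_embeddings:
        hidden = [math.tanh(sum(w_rc * z_c for w_rc, z_c in zip(row, z)) + b_r)
                  for row, b_r in zip(w, b)]
        scores.append(sum(q_r * h_r for q_r, h_r in zip(q, hidden)))
    peak = max(scores)                       # subtract the peak for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    dim = len(node_embeddings[0])
    scene = [sum(a * z[c] for a, z in zip(alpha, node_embeddings))
             for c in range(dim)]
    return scene, alpha


nodes = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]   # three vessels, 2-D embeddings
w = [[0.5, -0.5], [1.0, 1.0]]                  # example projection weights
b = [0.0, 0.1]
q = [1.0, 2.0]                                 # example context query vector
scene, alpha = attention_pool(nodes, w, b, q)
print([round(a, 3) for a in alpha], [round(v, 3) for v in scene])
```

The returned attention weights are the quantities that could be visualized per vessel to show which targets dominate the scene-level prediction.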
2.4.2. Probabilistic Prediction and Interpretability
The synthesized scene embedding $\mathbf{h}_{\mathrm{scene}}$, which now encapsulates the most salient spatiotemporal risk characteristics, is fed into a prediction head. This component consists of a Multi-Layer Perceptron (MLP) designed to project the high-dimensional graph representation into the target class space. The final output is generated via a softmax activation layer, yielding a normalized probability distribution over the discrete risk categories (Low, Medium, High):
$$\hat{\mathbf{p}} = \mathrm{softmax}\big(\mathbf{W}_2 \, \phi(\mathbf{W}_1 \mathbf{h}_{\mathrm{scene}} + \mathbf{b}_1) + \mathbf{b}_2\big),$$
where $\mathbf{W}_1, \mathbf{W}_2$ and $\mathbf{b}_1, \mathbf{b}_2$ denote the weights and biases of the projection and classification layers, respectively, and $\phi$ is the hidden-layer activation.
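A prediction head of this shape can be sketched as a two-layer perceptron followed by a softmax. The ReLU hidden activation, the layer sizes, and the weight values below are assumptions made for the example; the paper does not specify them.

```python
import math


def predict_risk(scene, w1, b1, w2, b2):
    """Two-layer MLP head with softmax output over (Low, Medium, High)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, scene)) + bias)
              for row, bias in zip(w1, b1)]          # projection layer + ReLU
    logits = [sum(w * h for w, h in zip(row, hidden)) + bias
              for row, bias in zip(w2, b2)]          # classification layer
    peak = max(logits)                               # subtract the peak for stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Example weights only; a trained model would supply these.
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w2, b2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]
probs = predict_risk([1.0, 2.0], w1, b1, w2, b2)
print([round(p, 3) for p in probs])
```

The output is a probability vector, so an operator-facing system could report both the most likely risk level and the confidence attached to it.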
This hierarchical formulation aligns the model’s predictive output directly with the operational requirements of maritime supervision, providing a probabilistic measure of confidence for each risk level. Crucially, the learned attention weights offer intrinsic interpretability to the decision support system. By visualizing these weights as a “risk saliency map,” VTS authorities can immediately identify which specific vessels or interaction clusters are driving the predicted alarm, thereby transforming the model from a “black box” into a transparent tool for proactive safety intervention.
4. Discussion
The empirical results presented in Section 3 substantiate that the proposed Improved Spatio-Temporal Graph Convolutional Network (IST-GCN) constitutes a robust and physically grounded framework for maritime collision risk assessment. Beyond the quantitative performance gains in accuracy and recall, these findings hold significant implications for intelligent maritime supervision and the future operation of Vessel Traffic Services (VTS). This section critically discusses the methodological advantages of the proposed approach, particularly its role in shifting the paradigm from reactive detection to proactive forecasting, while also addressing current limitations and prospective research directions.
4.1. Proactive Forecasting via Rule-Integrated Topology
A pivotal contribution of this study is the methodological shift from reactive, geometry-based risk detection to proactive, rule-integrated risk forecasting. Conventional collision avoidance systems predominantly rely on instantaneous snapshots of DCPA/TCPA metrics, which are inherently reactive: they trigger alarms only after a safety domain has been violated or when a hazardous encounter is already geometrically inevitable. Such approaches often fail to account for the large inertia and delayed hydrodynamic response of merchant vessels, leaving insufficient time for effective intervention. In contrast, the IST-GCN leverages a 60-s historical observation window to forecast the risk category over a subsequent 30-s horizon. This look-ahead capability effectively internalizes the temporal trends of vessel maneuverability, enabling VTS operators to identify emerging hazardous states before they manifest as critical spatial conflicts.
However, the reliability of such proactive forecasting is fundamentally contingent on the quality of the underlying interaction graph. Traditional proximity-based graph models tend to allocate uniform attention to all spatially adjacent vessels, resulting in the indiscriminate aggregation of noise from navigationally benign neighbors (e.g., parallel sailing). The proposed IST-GCN addresses this by embedding a rule-integrated semantic filter directly into the topological construction. By synthesizing COLREGs obligations with the CRI, the model selectively amplifies interactions that are kinematically converging or rule-violating while suppressing those that are operationally safe. In addition, in practical VTS deployment, the IST-GCN framework is strategically optimized to balance True Positives and False Positives. It maximizes TP through a weighted loss function to ensure maritime safety, while simultaneously suppressing FP via the rule-integrated graph to prevent operator alarm fatigue. Furthermore, the additional generalization experiment in Rizhao Port confirms that this optimal FP/TP balance is highly robust across different waterway topologies without overfitting.
This synergistic integration of temporal forecasting and rule-based spatial filtering yields a dual advantage: it achieves a high sensitivity to genuine threats (90.8% Recall) while significantly mitigating the false alarm problem endemic to dense traffic monitoring (8.5% FAR). Consequently, the framework not only provides an early warning buffer for timely decision-making but also ensures structural alignment with maritime operational logic, fostering greater trust among human operators who require automated tools that operate in accordance with established navigational rules. Furthermore, the practical deployment of the IST-GCN in VTS centers is strongly supported by its real-time computational efficiency. As empirical tests indicate, the model achieves an average inference time of 149.5 ms per scene. Given that standard AIS data update intervals typically range from 2 to 10 s, this processing time is well below the update interval, so each risk estimate can be refreshed before the next AIS report arrives. In an operational context, this computational lightness allows the framework to be efficiently integrated into existing Vessel Traffic Management Information Systems (VTMIS) as a backend module. It ensures that the proactive risk alerts generated by the model can be delivered to VTS operators synchronously with live traffic feeds, without imposing prohibitive hardware burdens on port authorities.
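The real-time argument above rests on simple arithmetic, which the following sketch makes explicit using only the figures quoted in the text (149.5 ms per scene, AIS update intervals of 2 to 10 s). The helper name `latency_share` is hypothetical, and the calculation assumes scenes are processed one at a time.

```python
def latency_share(inference_s, interval_s):
    """Fraction of one AIS update interval consumed by a single inference."""
    return inference_s / interval_s


INFERENCE_S = 0.1495   # average inference time per scene, as reported in the text

for interval_s in (2.0, 10.0):
    share = latency_share(INFERENCE_S, interval_s)
    print(f"{interval_s:.0f} s interval: inference uses {share:.1%} of the period")
```

Even at the shortest quoted interval, a single inference occupies well under a tenth of the update period, leaving headroom for data ingestion and graph construction.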
4.2. Limitations and Future Research Directions
Despite the demonstrated efficacy of the IST-GCN framework in proactive risk forecasting, several intrinsic limitations warrant objective acknowledgment to guide subsequent research efforts. The primary constraint arises from the model’s dependence on unimodal Automatic Identification System (AIS) data streams. While AIS provides essential kinematic baselines, it remains vulnerable to signal packet loss, multipath interference, and intentional spoofing, particularly in congested waterways or satellite-denied environments. In scenarios involving non-cooperative targets where transponders are disabled or malfunctioning, the structural integrity of the interaction graph is compromised, leading to topological fragmentation and potential risk underestimation. To mitigate this vulnerability, future iterations of the framework must pivot towards multi-modal sensor fusion strategies. The integration of heterogeneous data sources—specifically shore-based marine radar for non-cooperative target tracking and CCTV visual feeds for near-field semantic verification—would significantly bolster system robustness against single-source data degradation.
Furthermore, the framework’s applicability to autonomous navigation is currently limited. First, its representation of COLREGs is simplified, relying on basic encounter classifications rather than complex maneuvering behaviors. Second, it focuses solely on risk prediction without generating actionable collision avoidance or route planning strategies. Future work will address this by integrating fine-grained rule representations and coupling the model with decision-making modules for closed-loop autonomous control. Finally, while the current IST-GCN framework focuses on localized, short-term risk forecasting in port areas rather than entire routes, frequent local evasive maneuvers can induce cumulative navigational delays. Therefore, future research could integrate these micro-level risk assessments with macroscopic scheduling systems to optimize Estimated Time of Arrival (ETA) and Requested Time of Arrival (RTA) strategies, thereby enhancing overall port logistical efficiency.
5. Conclusions
To achieve proactive identification of emerging hazardous states and provide adequate early-warning lead time, this study proposes an Improved Spatio-Temporal Graph Convolutional Network (IST-GCN) framework for the short-term forecasting of ship collision risk in complex maritime environments. By explicitly integrating maritime domain knowledge (i.e., CRI and COLREGs) into a spatiotemporal graph learning framework, the proposed approach aims to bridge the gap between individual trajectory analysis and system-level interaction topology.
A case study using AIS data from the core waters of Ningbo–Zhoushan Port is conducted to validate the effectiveness of the proposed methodology. Results show that the proposed IST-GCN achieves a peak accuracy of 92.4% and an F1-score of 0.911 in the three-tier risk classification task. Compared to the representative deep learning baselines (e.g., standard ST-GCN), the proposed methodology yields an improvement of approximately 7.5 percentage points. This suggests that the proposed IST-GCN framework can consistently outperform both traditional geometric indicators and generic deep learning models.
Findings of this study reveal that the incorporation of rule-aware dynamic adjacency weighting could substantially reduce unnecessary warnings in complex encounter scenarios. Specifically, the False Alarm Rate (FAR) is reduced to 8.5% where proximity-based models tend to be overly conservative. This shows that the proposed methodology could provide a more reliable and accurate estimate for the vessel collision risk in dense traffic waters. Moreover, results show that the average inference time is 149.5 ms per scene with a processing throughput of 214 frames per second. This indicates that the proposed framework is well suited for real-time deployment in operational maritime supervision systems, such as Vessel Traffic Services (VTS).
Nevertheless, the forecasting performance is currently dependent on the quality of single-source AIS data. Therefore, future work will focus on integrating multi-source sensing data (e.g., radar) to improve robustness. Furthermore, integrating the proposed method with the collision avoidance decision-making functions of Maritime Autonomous Surface Ships (MASS) would be an interesting direction for future research.