Article

MF-GCN: Multimodal Information Fusion Using Incremental Graph Convolutional Network for Ship Behavior Anomaly Detection

1 Tianjin Research Institute for Water Transport Engineering, M.O.T., Tianjin 300456, China
2 School of Electrical Automation and Information Engineering, Tianjin University, Tianjin 300072, China
3 DaGukou Maritime Safety Administration, Tianjin 300211, China
4 Transportation Development Center of Henan Province, Zhengzhou 450003, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2026, 14(1), 87; https://doi.org/10.3390/jmse14010087
Submission received: 28 November 2025 / Revised: 26 December 2025 / Accepted: 30 December 2025 / Published: 1 January 2026
(This article belongs to the Special Issue Emerging Computational Methods in Intelligent Marine Vehicles)

Abstract

Ship behavior anomaly detection is critical for intelligent perception and early warning in complex inland waterways, where single-source sensing (e.g., AIS-only or vision-only) is often fragile under occlusion, illumination variation, and signal noise. This study proposes MF-GCN, a multimodal (heterogeneous) information fusion framework based on an Incremental Graph Convolutional Network (IGCN) to detect and provide early warning of anomalous ship behaviors by jointly modeling AIS, video imagery, LiDAR point clouds, and water level signals. We first extract modality-specific features and enforce temporal–spatial consistency via timestamp and geo-referencing alignment, then construct an evolving graph in which nodes represent multimodal features and edges encode temporal dependency and semantic similarity. MF-GCN integrates a Semantic Clustering-based GCN (S-GCN) to inject historical semantic context and an Attentive Fusion-based GCN (A-GCN) to learn dynamic cross-modal correlations using multi-head attention. Experiments on our constructed real-world datasets demonstrate that MF-GCN achieves accuracies of 93.8%, 93.8%, and 93.3% with F1-scores of 93.6%, 93.6%, and 93.3% for ship deviation warning, bridge-crossing warning, and inter-ship collision warning, respectively, consistently outperforming representative baselines. These results verify the effectiveness of the proposed method for robust multimodal anomaly detection and early warning in inland-waterway scenarios.

1. Introduction

Ship behavior prediction serves as essential technical support for forecasting and early warning of maritime accidents, significantly improving the efficiency of maritime traffic supervision and reducing traffic risks. However, ship motion is influenced by numerous factors, and the challenges of information transmission at sea lead to noise in AIS data, making ship trajectory prediction more difficult compared to other fields [1]. With increasing maritime traffic density, the navigational environment has become more complex, demanding higher standards for maritime safety oversight [2]. In this context, the detection of anomalous ship behavior has become a vital tool for ensuring safety and combating illicit activities at sea [3]. By leveraging algorithms to effectively identify deviations in ships’ navigation and behavioral patterns, this approach enhances overall waterway safety and strengthens the capacity to monitor and issue early warnings against potential violations. It provides an essential scientific and technical foundation for intelligent maritime governance.
An anomaly refers to an unusual occurrence or a deviation from the norm. Normal ship behavior typically involves movements or patterns consistent with standard maritime operations, such as steady navigation, docking, and routine port activities. In contrast, anomalous behavior is characterized by significant deviations from these expectations—for instance, erratic course changes, the disappearance of a vessel’s track, or unusual navigation routes. Such behaviors often indicate potential illegal activities or breaches of maritime safety regulations. Traditional methods for detecting ship anomalies primarily relied on manual monitoring by land-based personnel. However, with continuously growing vessel traffic and increasingly complex environmental factors, manual monitoring faces numerous limitations and is struggling to meet the demands of modern, intelligent maritime management [4].
This paper focuses on the multimodal data required for ship behavior perception. These data, oriented toward the same task or object, are derived from diverse perspectives or domains, with each modality offering distinct advantages and inherent limitations; underlying correlations also exist among the modalities. By fusing these complementary modalities and exploiting their intrinsic interrelationships, the integrated system gains robustness and accuracy.
(1) Multimodal Data Fusion
Current ship behavior anomaly detection primarily relies on AIS (Automatic Identification System), video surveillance, radar, and other modalities [5]. AIS broadcasts a ship's static and dynamic information via a time-division self-organized dynamic network to nearby shore-based stations or other AIS-equipped vessels. While widely adopted, AIS inherently depends on the installation and proper operation of onboard terminals, rendering it a "passive" monitoring mechanism [6]. Video surveillance offers intuitive, reliable, and information-rich monitoring at low cost, particularly in inland waterways; however, as unstructured data, video feeds only indicate visual ship presence without granular ship-specific metadata, and video quality is highly susceptible to environmental factors such as illumination variations [7]. Maritime radar detects and tracks ships by analyzing target echoes and is widely applied in coastal ports, but its utility in inland waterways is limited by signal obstruction from mountainous terrain and curved shorelines along narrow channels [8]. LiDAR (Light Detection and Ranging), typically deployed at waterway checkpoints or navigable bridges, enables precise measurement of ship distances and 3D exterior profiles, but its operational range is constrained, hindering long-distance detection, and LiDAR point clouds often exhibit sparsity and spatial sampling inhomogeneity [9]. In summary, single-source methods for detecting anomalous ship behavior suffer from inherent limitations in data completeness and reliability [10]. A comparative analysis of these sensor modalities is summarized in Table 1.
(2) Ship Anomaly Detection Technologies
Current research in ship anomaly detection primarily focuses on AIS data-driven methodologies, encompassing classification/clustering algorithms, geometric feature extraction, and deep learning innovations [11]. Classification approaches, such as the K-means clustering framework by Oruc et al. [12], address class imbalance issues while eliminating reliance on inter-ship distance metrics to enhance safety response times for autonomous ships. Geometric feature-based methods, exemplified by Wijaya et al. [13], analyze trajectory redundancy and curvature to distinguish normal and anomalous ship tracks.
In recent years, with the rapid development of artificial intelligence technology, neural network-based algorithms have gained increasing attention in ship anomaly behavior detection [14,15]. Deep learning techniques further advance the field: Rong et al. [16] employ sliding window algorithms for anomaly detection, while Seong et al. [17] integrate YOLOv7 and StrongSORT algorithms within a video-based graph framework to identify deviations from normal routes, particularly in narrow coastal zones. Zhang et al. [18] explored semi-supervised learning with Graph Convolutional Networks (GCNs). Liu et al. [19] introduced a spatiotemporal multi-GCN for ship trajectory prediction, while Zhao et al. [20] developed a framework combining k-GCNs and LSTM for ship speed prediction. More recently, Wang et al. [21] proposed a deep attention-aware spatiotemporal GCN for predicting ship trajectories. A key limitation of conventional GCNs is their use of fixed weights, which forces them to treat all neighboring nodes equally. This approach hinders the model’s ability to distinguish important nodes and to effectively leverage the diverse features present in AIS data.
Despite recent progress in ship anomaly detection, two key gaps remain for inland-waterway early-warning applications. First, many existing approaches still rely on a single sensing source (typically AIS or video) [22], which limits robustness under real-world complexities such as AIS dropouts/noise, dynamic occlusion, and illumination changes. Second, even when graph-based methods are introduced, conventional GCNs often apply static neighborhood aggregation and do not explicitly model time-varying cross-modal semantic relations or support efficient online updates. These limitations hinder real-time and reliable anomaly detection in complex and evolving inland-waterway environments. The integration of multimodal data—AIS, visual feeds, and LiDAR point clouds—offers enriched feature representation, mitigates single-source limitations, and enhances detection robustness, positioning multimodal fusion as a critical frontier for improving anomaly detection efficiency and reliability [23]. To bridge these gaps, we propose MF-GCN, which constructs an incremental multimodal graph aligned in time and space and learns dynamic correlations across AIS, video, LiDAR, and water level signals for accurate detection and early warning.
The main contributions of this work are summarized as follows.
(1) Incremental multimodal graph formulation: an incremental graph construction strategy aligning heterogeneous sensor streams with water level signals to enable online anomaly detection.
(2) Two-branch fusion architecture: S-GCN for semantic clustering-based historical context injection and A-GCN for attention-driven cross-modal relation learning.
(3) Real-world dataset and comprehensive evaluation: based on the constructed real-world dataset, a comprehensive evaluation was conducted, including systematic experiments and ablation analyses on three specific warning tasks: ship deviation warning, bridge-crossing warning, and inter-ship collision warning. The results demonstrate that the proposed method consistently and significantly outperforms representative baseline models.

2. Methods

2.1. General Workflow

This study processed ship perception data (AIS, LiDAR point clouds, video imagery) along with waterway elevation data into corresponding feature representations [24]. LiDAR-derived features encapsulate the 3D spatial information of ships and their surrounding environments, while AIS features include real-time positional coordinates, velocity, and heading. Video data provides ship contour dimensions and relative positioning, and waterway elevation data is analyzed to extract water level trends and fluctuations. Initial preprocessing of video, LiDAR, and AIS data extracted critical ship attributes such as position, speed, and geometric profiles. Subsequently, LiDAR, video, and AIS features were temporally and spatially aligned with water level data using synchronized timestamps and geospatial coordinates. This alignment ensured spatiotemporal consistency across different modalities. As a result, it generated fused multimodal feature datasets [25]. These integrated datasets unified 3D spatial context, kinematic trajectories, real-time dynamics, and environmental hydrology. Finally, the fused multimodal features were input into MF-GCN, as illustrated in Figure 1, Figure 2, Figure 3, Figure 4 and Figure 5. MF-GCN leveraged these comprehensive features to detect anomalies such as ship deviations, collisions, and bridge strikes, thereby enhancing navigational safety and operational reliability.

2.2. MF-GCN Architecture

The Incremental Graph Convolutional Network (IGCN) can be represented as a sequence of graphs $g_1(V_1, E_1), g_2(V_2, E_2), g_3(V_3, E_3), \ldots$, where $g_t(V_t, E_t)$ denotes the graph at time $t$. Nodes in the graph represent multimodal feature data, with $V_t$ being the node set and $E_t$ being the edge set. Here, $V_t = \{v_i\}_{i=1}^{N}$ denotes the set of nodes at time $t$, where $v_i$ corresponds to the feature vector of sensor $i$ [26]. In $E_t$, an undirected edge $(v_i, v_j)$ represents the relationship between sensors $i$ and $j$. Edges are defined in two ways:
Temporally Ordered Edges: when sensors $i$ and $j$ exhibit explicit temporal dependencies (e.g., sensor $i$'s data $v_i$ precedes sensor $j$'s data $v_j$), the edge weight is set to 1. To avoid reliance on hardware-level precise synchronization, we adopt a soft synchronization strategy based on GPS timestamps to achieve controllable alignment. All sensors are calibrated to a unified UTC time reference. Using the sensor with the lowest sampling rate, typically AIS, as the reference timeline, we perform nearest-neighbor interpolation for the higher-rate modalities at each AIS timestamp: at each reference time $t$, the frame or point cloud whose timestamp is closest to $t$ is selected for alignment. A temporal deviation threshold is additionally applied for quality control. Given the relatively slow movement of ships, alignment errors remain consistently within 100 ms, which satisfies the precision requirements for behavioral analysis.
Semantic Edges: for nodes $v_i$ and $v_j$ without temporal order, edge weights are computed using the cosine similarity between node features. To mitigate noise from weakly correlated edges, edges with similarity below a threshold of 0.75 are pruned [27].
This framework dynamically constructs graph structures at any timestep based on node features and edge weights [28].
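For illustration, the following minimal Python sketch (our own simplified example; array names and shapes are assumptions, not the deployed pipeline) implements the two edge-construction rules: nearest-neighbor alignment of higher-rate modalities to the AIS reference timeline under the 100 ms quality-control threshold, and semantic-edge computation via cosine similarity with the 0.75 pruning threshold.

```python
import numpy as np

def align_to_reference(ref_ts, other_ts, max_dev=0.1):
    """Soft synchronization: for each reference (AIS) timestamp, select the
    nearest frame/point-cloud timestamp; drop pairs deviating beyond 100 ms."""
    pairs = []
    for k, t in enumerate(ref_ts):
        j = int(np.argmin(np.abs(other_ts - t)))
        if abs(other_ts[j] - t) <= max_dev:
            pairs.append((k, j))          # aligned (reference, modality) pair
    return pairs

def semantic_edges(feats, threshold=0.75):
    """Cosine-similarity edges between nodes without temporal order;
    edges weaker than the 0.75 threshold are pruned as noise."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T
    n = len(f)
    return [(i, j, float(sim[i, j]))
            for i in range(n) for j in range(i + 1, n)
            if sim[i, j] >= threshold]
```

Temporally ordered pairs returned by the alignment step receive edge weight 1; the remaining node pairs are scored and pruned by the semantic rule.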
The IGCN captures temporal evolution and cross-modal correlations, enabling more effective modeling of dynamic multimodal data compared to traditional architectures [29,30]. By treating anomaly detection as a node classification task, IGCN identifies events through integrated spatiotemporal patterns [31].
LiDAR detects ship targets by emitting laser beams and generates three-dimensional point cloud data by measuring multi-line distances between the laser probes and the ship targets. In this paper, LiDAR point cloud data processing is accomplished through four components: Constraint of Action Range, Constraint of Echo Intensity, Constraint of Geometric Shape, and Feature Extraction of Ship Targets.
By utilizing the Optimization of Detection Network Architecture, Fusion of Scale Diversity, Design of Prior Anchor Boxes, and Attention Feature Extraction Network from our prior research on ship object detection in low-illumination environments [7], the model can more accurately extract ship features from video image data.
AIS data is structured. However, due to communication or hardware failures, raw AIS data may contain aberrant records and usually requires cleaning before it can be used properly. AIS data cleaning primarily addresses duplicated identification data, position drift, missing trajectory points, and abnormal steering data.
In our earlier work [32], we incorporated a multi-head attention mechanism into the GRU model, enabling it to assign feature weights to critical factors such as temporal and spatial elements in waterway water level sequence data. This allows the model to focus on the key factors influencing changes in waterway water levels.
A multimodal data fusion method is proposed, which integrates LiDAR point cloud, video image, and AIS data to extract key ship features such as position, speed, and shape, and then calibrates and correlates them with corresponding spatiotemporal waterway water level information. This method treats water level data as a dynamic environmental background, combining real-time ship status with environmental conditions through spatiotemporal matching to analyze the impact of water level changes on ship navigation. This fusion architecture enables more accurate monitoring of ship status, assists in adjusting navigation strategies in response to water level variations, and ultimately enhances the reliability of ship monitoring and navigation safety. The specific fusion architecture is shown in Figure 5.
Key Challenges and Solutions:
(1) Temporal Dependency Propagation: how to discover latent inter-node correlations at time t and propagate them to t + 1.
(2) Incremental Feature Learning: how to leverage prior temporal data to guide feature extraction for new nodes in evolving graphs.
To address these challenges, the proposed MF-GCN employs two specialized GCNs: Semantic Clustering-based GCN (S-GCN) identifies latent correlations through semantic similarity clustering, and Attentive Fusion-based GCN (A-GCN) dynamically fuses features using attention mechanisms. As illustrated in Figure 6, S-GCN and A-GCN operate synergistically to enhance detection robustness. Subsequent sections detail their designs.

2.3. Branch Network I: Semantic Clustering-Based GCN (S-GCN)

This subsection introduces the workflow of S-GCN, which focuses on learning newly acquired ship features by leveraging semantic correlations between ship attributes and water level data. S-GCN first clusters historical ship features based on sensor-specific semantics, reconstructs an enhanced graph, and then performs graph learning to derive refined node features that embed sensor-level contextual information [33]. The workflow is illustrated in Figure 7.
Consider a system with $K$ sensors. For the IGCN, the graph $g_t = (V_t, E_t)$ represents the monitoring state at time $t$. The vertex set $V_{t-1}$ from time $t-1$ is aggregated into $K$ clusters, one per sensor, via mean pooling. The fused feature for the $k$-th sensor is denoted $c_k^{t-1}$. Edges between fused vertices $c_k^{t-1}$ and new vertices $v_i^t$ are recomputed based on the cosine similarity between their feature vectors. The updated graph $G_t(V_t', E_t')$ is derived as follows:
$$V_t = \left\{ V_{t-1}, V_t^{new} \right\}$$
$$V_t' = \left\{ C_K^{t-1}, V_t^{new} \right\} = \left\{ c_1^{t-1}, c_2^{t-1}, \ldots, c_K^{t-1}, V_t^{new} \right\}$$
where $V_t^{new}$ denotes the set of newly added nodes at time $t$, and $V_{t-1}$ represents the node set from time $t-1$. The fused cluster set $C_K^{t-1}$, derived from the $K$ sensors via mean pooling, combines with $V_t^{new}$ to form the updated node set $V_t'$.
The edge set $E_t'$ is computed as
$$E_t = \left\{ E_{t-1}, E_t^{new} \right\} = \left\{ E_{t-1}, E_t^{intra}, E_t^{inter} \right\}$$
$$E_t' = \left\{ \left( c_m^{t-1}, c_n^{t-1} \right) \mid c \in C_K^{t-1} \right\} \cup \left\{ \left( c_m^{t-1}, v \right) \mid v \in V_t^{new} \right\} \cup E_t^{intra}$$
Here:
$E_t^{intra}$: intra-edges among nodes in $V_t^{new}$;
$E_t^{inter}$: inter-edges connecting $V_t^{new}$ to existing nodes;
$(c_m^{t-1}, c_n^{t-1})$: edges between fused sensor clusters;
$(c_m^{t-1}, v)$: edges between fused clusters and new nodes.
S-GCN operates on $G_t(V_t', E_t')$ to refine node features by integrating ship attributes and water level data. The graph convolution is formulated as
$$h_{k,i}^{l} = \rho\left( \tilde{A}_{ik} W_k^{l} y_i^{l} + b_k^{l} \right)$$
where
$\tilde{A}_{ik}$: normalized adjacency matrix for the $k$-th sensor cluster;
$W_k^{l}$, $b_k^{l}$: trainable weights and biases at layer $l$;
$\rho$: non-linear activation function.
This process enriches new nodes with semantic context from historical clusters, enhancing detection accuracy by mitigating feature sparsity and noise [34].
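A minimal PyTorch sketch of this step, under our own naming assumptions (fuse_clusters and SGCNLayer are illustrative, not the authors' released code): historical nodes are mean-pooled into one cluster per sensor, and the layer then applies $h = \rho(\tilde{A} W y + b)$ on the enhanced graph.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_clusters(prev_feats, sensor_ids, num_sensors):
    # Mean-pool the time t-1 node set into K fused cluster vectors c_k^{t-1};
    # sensor_ids is an integer tensor giving each node's source sensor.
    return torch.stack([prev_feats[sensor_ids == k].mean(dim=0)
                        for k in range(num_sensors)])

class SGCNLayer(nn.Module):
    """One graph convolution on the enhanced graph G_t(V'_t, E'_t)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)   # W_k^l and b_k^l

    def forward(self, a_norm, y):
        # a_norm: normalized dense adjacency; y: cluster + new-node features
        return F.relu(a_norm @ self.lin(y))     # rho(A~ W y + b)
```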

2.4. Branch Network II: Attentive Fusion-Based GCN (A-GCN)

As perception data is continuously collected, the number of nodes in the IGCN increases dynamically over time. At any timestep $t$, this section employs an Attentive Fusion GCN (A-GCN) with multi-head attention mechanisms to explore latent correlations between multimodal data [35]. The workflow is illustrated in Figure 8, and the methodology involves three key steps:
The multi-head attention mechanism captures diverse latent relationships between nodes by computing parallel attention heads, generating multiple feature representations for each node. These representations comprehensively reflect complex interdependencies among multimodal data [36]. During graph learning, the GCN updates node features to better encapsulate the system's global state at time $t$. The fused features provide robust support for subsequent event detection and early warning.
While edge weights in the original graph $g_t(V_t, E_t)$ are computed via cosine similarity, such static measures fail to capture nuanced semantic relationships between sensor data. To address this, multi-head attention augments the original graph by generating $n$ fully connected subgraphs [37]. For each attention head $s$ ($s = 1, 2, \ldots, n$), we initialize paired transformation matrices $W_s^{Q}$ (query) and $W_s^{K}$ (key).
The standard multi-head attention is defined as
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left( \frac{Q K^{T}}{\sqrt{d_k}} \right) V$$
where $Q$, $K$, and $V$ represent the query, key, and value matrices, respectively, and $\sqrt{d_k}$ is the scaling factor. Multiple heads ($n$) enable diverse relational modeling, mitigating biases from single-attention perspectives [38].
To adapt this mechanism for graph augmentation, we substitute $V$ with the adjacency matrix $A$, yielding
$$\tilde{A}_s = \mathrm{softmax}\left( \frac{\left( Q W_s^{Q} \right) \left( K W_s^{K} \right)^{T}}{\sqrt{d_k}} \right) A$$
Here, $Q = K = H_t$ (the node features) and $A$ is the original adjacency matrix of $g_t$. Each $\tilde{A}_s$ generates a fully connected subgraph of $g_t$, encoding distinct semantic relationships. Replacing $V$ with $A$ preserves the graph's structural priors while injecting attention-driven semantics.
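Per attention head, the substitution of $V$ by $A$ can be sketched as follows (a minimal illustration; dense tensors and variable names are our assumptions):

```python
import torch
import torch.nn.functional as F

def attention_adjacency(h, a, w_q, w_k):
    """One head of the attention-based augmentation:
    A~_s = softmax((H W_Q)(H W_K)^T / sqrt(d_k)) A.
    h: node features H_t (N x d); a: original adjacency of g_t (N x N);
    w_q, w_k: this head's query/key projection matrices (d x d_k)."""
    d_k = w_q.shape[1]
    scores = (h @ w_q) @ (h @ w_k).T / d_k ** 0.5
    return F.softmax(scores, dim=-1) @ a        # V is replaced by A
```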
The $n$ fully connected subgraphs generated via multi-head attention undergo iterative feature refinement through graph convolutional layers [39]. Let $i$ denote the index of a vertex in the graph. To enhance vertex embedding, the update of vertex $i$ at layer $l$ incorporates both its initial features and aggregated historical embeddings from preceding layers, formulated as
$$y_i^{l} = \left[ v_i; h_i^{1}; \ldots; h_i^{l-1} \right]$$
where
$y_i^{l}$: concatenated feature vector of vertex $i$ at layer $l$, combining its raw input feature $v_i$ with embeddings from layers 1 to $l-1$;
$h_i^{k}$: embedding of vertex $i$ at layer $k$.
The residual feature concatenation (Equation (8)) preserves historical context to mitigate gradient vanishing and enhance feature stability across deep layers.
For each vertex $i$, the feature update at layer $l$ integrates information from all $n$ subgraphs:
$$h_{s,i}^{l} = \rho\left( \sum_{j=1}^{n} \tilde{A}_{ij}^{(s)} W_s^{l} y_i^{l} + b_s^{l} \right)$$
where
$s \in \{1, 2, \ldots, n\}$: index of the subgraph;
$\tilde{A}^{(s)}$: attention-augmented adjacency matrix of the $s$-th subgraph;
$W_s^{l}$, $b_s^{l}$: trainable weight matrix and bias term for the $s$-th subgraph at layer $l$;
$\rho$: ReLU activation function.
Following graph embedding, each node obtains $n$ distinct embeddings from the $n$ subgraphs. To unify these representations, mean pooling is employed to fuse the embeddings, producing a consolidated feature vector for each node:
$$Z_i = \frac{1}{n} \sum_{s=1}^{n} h_{s,i}^{l}$$
This fusion step finalizes the multimodal association and integration within the GCN framework.
The final detection stage concatenates features from both branch networks: S-GCN (capturing correlations between water levels and ship attributes) and A-GCN (encoding cross-modal attention dynamics). The concatenated features $\left[ Z_{S\text{-}GCN}; Z_{A\text{-}GCN} \right]$ are processed through a fully connected layer followed by a softmax classifier for anomaly prediction.
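The aggregation and classification stages can be sketched together in PyTorch as below (layer sizes, linear graph filters, and naming are our simplifying assumptions; the S-GCN branch embedding z_sgcn is assumed to share the hidden dimension):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionHead(nn.Module):
    """One graph convolution per attention subgraph, mean pooling over the
    n head embeddings, then concatenation with the S-GCN branch and a
    softmax classifier over event labels."""
    def __init__(self, in_dim, hid_dim, n_heads, n_classes):
        super().__init__()
        self.convs = nn.ModuleList(nn.Linear(in_dim, hid_dim)
                                   for _ in range(n_heads))
        self.fc = nn.Linear(2 * hid_dim, n_classes)

    def forward(self, adjs, y, z_sgcn):
        # adjs: n attention-augmented adjacencies A~_s (each N x N)
        # y: concatenated vertex features y_i^l (N x in_dim)
        heads = torch.stack([F.relu(a @ conv(y))
                             for a, conv in zip(adjs, self.convs)])
        z_agcn = heads.mean(dim=0)              # Z_i = (1/n) sum_s h_{s,i}
        logits = self.fc(torch.cat([z_sgcn, z_agcn], dim=-1))
        return F.softmax(logits, dim=-1)        # anomaly class probabilities
```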

2.5. Training and Testing Procedures

The objective of the training process is to learn the parameters $\theta_1$ of the S-GCN model and $\theta_2$ of the A-GCN model. The pseudocode is summarized in Algorithm 1, where
$f_t^{U}$: S-GCN function leveraging ship attributes and water level data;
$f_t^{S}$: A-GCN function for cross-modal feature integration.
Algorithm 1: Training process
Input: Graph sequence $g_t(V_t, E_t)$, $t = 1, \ldots, K$; initial parameters $\theta_1$ and $\theta_2$
Output: The learned GCN parameters $\theta_1^{*}$ and $\theta_2^{*}$
1:  for $t = 1 \ldots K$ do
2:    if $t = 1$ then
3:      $g(v, e) = g_t(V_t, E_t)$;
4:      $\hat{\theta}_1 = \arg\max_{\theta_1} f_t^{U}(g, \theta_1)$;
5:      $\theta_1^{*} = \hat{\theta}_1$;
6:      $t = t + 1$, continue;
7:    else
8:      $g(v, e) = g_t(V_t, E_t)$;
9:      $V_t' \leftarrow V_t$ by Equation (1);
10:     $E_t' \leftarrow E_t$ by Equation (2);
11:     $g_t'(v, e) = G_t(V_t', E_t')$;
12:     $\hat{\theta}_1, \hat{\theta}_2 = \arg\max_{\theta_1, \theta_2} \left( f_t^{U}(g', \theta_1) + f_t^{S}(g', \theta_2) \right)$;
13:     $\theta_1^{*} = \hat{\theta}_1$, $\theta_2^{*} = \hat{\theta}_2$;
14:     $t = t + 1$;
15: return the final parameters $\theta_1^{*}$ and $\theta_2^{*}$;
During inference, the framework processes incremental sensor data at each timestep to perform real-time event detection. The pseudocode for this phase is outlined in Algorithm 2:
Algorithm 2: Testing process
Input: Graph sequence $g_t(V_t, E_t)$, $t = 1, \ldots, K$; the trained parameters $\theta_1$ and $\theta_2$
Output: The label prediction for each vertex
1:  for $t = 1 \ldots K$ do
2:    if $t = 1$ then
3:      $g(v, e) = g_t(V_t, E_t)$;
4:      $V^{new} = f_t^{U}(g, \theta_1)$;
5:      $Y = \mathrm{softmax}\left( fc\left( V^{new} \right) \right)$;
6:      $t = t + 1$, continue;
7:    else
8:      $g(v, e) = g_t(V_t, E_t)$;
9:      $V_t' \leftarrow V_t$ by Equation (1);
10:     $E_t' \leftarrow E_t$ by Equation (2);
11:     $g_t'(v, e) = G_t(V_t', E_t')$;
12:     $V_1^{new} = f_t^{U}(g', \theta_1)$;
13:     $V_2^{new} = f_t^{S}(g', \theta_2)$;
14:     $V^{new} = \left[ V_1^{new}; V_2^{new} \right]$;
15:     $Y = \mathrm{softmax}\left( fc\left( V^{new} \right) \right)$;
16:     $t = t + 1$;
17: return the predicted labels $Y$ for each vertex;
During each timestep, the framework focuses exclusively on the newly added node set $V^{new}$ to predict event labels for incremental perception data. Here, $f$ denotes the classic graph convolution operation using the pre-trained parameters $\theta_1$ and $\theta_2$, $fc$ denotes the fully connected layer mapping fused features to event labels, and $Y$ is the set of predicted labels for the new sensor data.
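A single testing timestep can be sketched as follows (f_u, f_s, and fc are stand-ins for the trained S-GCN branch, A-GCN branch, and fully connected layer; the sketch mirrors lines 12 through 15 of Algorithm 2):

```python
import torch

def infer_step(graph_t, f_u, f_s, fc):
    """Embed only the newly added nodes with both pre-trained branches,
    concatenate the two embeddings, and classify them."""
    v1_new = f_u(graph_t)                        # S-GCN embeddings of V_new
    v2_new = f_s(graph_t)                        # A-GCN embeddings of V_new
    fused = torch.cat([v1_new, v2_new], dim=-1)  # [V1_new; V2_new]
    probs = torch.softmax(fc(fused), dim=-1)     # per-node event probabilities
    return probs.argmax(dim=-1)                  # predicted labels Y
```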

3. Experiments and Results

3.1. Dataset Construction

Common anomalous ship behaviors in inland waterways include navigation deviations, collisions, overloading, and obscured ship identification [40]. Due to the absence of public datasets for inland ship anomaly detection, this study collected multi-source data from the Jiangsu section of the Yangtze River, Hujiashen Line, and Hangshen Line (Jiaxing section) to construct a dedicated Ship Anomaly Event Detection Dataset. The dataset supports three critical tasks: ship deviation warning, ship bridge-crossing warning, and inter-ship collision warning. Data acquisition covered diverse environmental conditions (daytime, nighttime, and dusk) [41]. A total of 500 multimodal event segments were constructed, including 47 abnormal segments and 453 normal segments. Each segment spans approximately 5 min and contains aligned AIS records, video frames, LiDAR point clouds, and water level measurements with one-to-one correspondence to the same ship event. Although abnormal events are inherently rare in real inland waterways, the proposed MF-GCN does not treat each 5 min segment as a single independent sample. Instead, the incremental graph formulation discretizes each segment into a sequence of time increments and performs event learning and prediction in a node-/timestep-wise manner. Therefore, each segment contributes multiple supervised graph updates (approximately 300/T updates for a 5 min segment under time increment interval T), which increases the effective number of learning instances beyond the raw segment count while preserving event-level labeling consistency. The sensor specifications used are as follows:
Video (HIKVISION, Hangzhou, China): 1920 × 1080 resolution @ 30 fps [42];
LiDAR (RoboSense, Shenzhen, China): 192-line scanning, 300 m operational range, ±3 cm accuracy at maximum distance [43];
AIS (Dalian Ninghang Communication & Navigation Co., Ltd., Dalian, China): dual-channel (161.975/162.025 MHz) reception compliant with international standards, providing dynamic (position, speed) and static (ship ID, dimensions) data [44].
(1) Ship Deviation Warning Detection
Ship deviation warning detection primarily targets anomalous behaviors such as deviation from the planned waterway or from the buoy area during navigation. Part of the dataset is shown in Figure 9. For ship deviation warning detection, data collection was conducted at Hongxitang East Bridge and Xiadianmiao Bridge on the Hangshen Line Waterway, as well as the waterway upstream of Nanjing Yangtze River Bridge, with the data collection period ranging from 19 February to 2 July 2024.
(2) Ship Bridge-Crossing Warning Detection
For ships passing through bridges, full-course monitoring is required to prevent anomalous behaviors such as deviating from the planned route and colliding with bridge piers or decks. Part of the dataset is shown in Figure 10. For ship bridge-crossing warning detection, the dataset was collected at Beiyue Bridge on the Hangshen Line Waterway, Dongcai Bridge on the Hujiashen Line Waterway, and Yuhui Bridge on the Yechi Waterway, spanning from 6 March to 2 July 2024.
(3) Inter-Ship Collision Warning Detection
Inter-ship collision warning primarily targets anomalous behaviors such as an excessively small separation or closest point of approach between multiple ships during navigation, which can easily lead to collisions. Part of the dataset is shown in Figure 11. For inter-ship collision warning detection, field data acquisition was carried out in the waters near Nanjing Baguazhou Bridge over the Yangtze River, the navigation sections both upstream and downstream of Nanjing Yangtze River Bridge, and the waters adjacent to the Longtan Navigation Mark, with the data collection window spanning from 8 May to 2 July 2024.

3.2. Event Detection Performance Evaluation

The MF-GCN algorithm was implemented using the PyTorch 1.9 deep learning framework in this study. A sliding window size was set to five to obtain optimal experimental results. During training, the Adam optimizer was employed, with a weight decay of 0.0001 and a learning rate of 0.0001. Network parameters were randomly initialized at the beginning of training. An ε-greedy strategy was adopted during training, with a total of 30 epochs conducted. In the first 10 epochs, ε was fixed at 1.0 to allow the agent to gradually learn the model parameters. Thereafter, ε was set to 0.1, enabling the agent to adjust model parameters based on learning experiences from its own decisions [45]. Parameter updates were performed using stochastic gradient descent and backpropagation algorithms, with Dropout regularization applied during training. All experiments were conducted on a server configured with an NVIDIA 3090 GPU (Santa Clara, CA, USA) and an Intel i9 CPU (Santa Clara, CA, USA) [46].
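The reported optimizer and schedule correspond to the following training-loop sketch (a toy stand-in network and random data replace MF-GCN here purely for illustration; the epsilon value is only computed to show the schedule, since the agent's decision logic is not reproduced):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(),
                      nn.Dropout(p=0.5), nn.Linear(32, 2))  # MF-GCN stand-in
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

x, y = torch.randn(128, 64), torch.randint(0, 2, (128,))    # toy data
for epoch in range(30):
    epsilon = 1.0 if epoch < 10 else 0.1   # epsilon-greedy schedule
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()                        # backpropagation
    optimizer.step()                       # gradient-descent update
```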
To ensure fair evaluation for imbalanced anomaly detection, we report Accuracy, Precision, Recall, and F1-score in our comparisons (Table 2, Table 3 and Table 4) [47]. In particular, Precision/Recall and F1-score are more informative than Accuracy when abnormal events are rare. In addition, since each event segment is discretized into multiple timesteps under the incremental graph formulation, data splitting is performed at the event-segment level to avoid temporal leakage (i.e., all timesteps derived from the same 5 min segment are assigned to the same subset), and the abnormal/normal ratio is kept consistent across subsets to the extent possible. The proposed method is evaluated on three warning tasks (ship deviation warning, ship bridge-crossing warning, and inter-ship collision warning) and compared with representative state-of-the-art methods; the experimental results are summarized in Table 2, Table 3 and Table 4.
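Segment-level splitting can be sketched with scikit-learn's GroupShuffleSplit (our choice for illustration; note it does not by itself stratify the abnormal/normal ratio, which is balanced separately):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy per-timestep samples: every timestep inherits its 5-minute segment id,
# so no segment contributes to both the training and test subsets.
X = np.random.randn(3000, 64)                 # per-timestep features
y = np.random.randint(0, 2, 3000)             # per-timestep labels
segment_ids = np.repeat(np.arange(100), 30)   # 100 segments x 30 timesteps

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=segment_ids))
assert set(segment_ids[train_idx]).isdisjoint(segment_ids[test_idx])
```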
MF-GCN is designed for real-time monitoring via incremental inference. At each timestep, the framework focuses on the newly added node set and its local connections, rather than recomputing representations over the entire historical graph. For a typical GCN layer with feature dimension $d$, the message-passing cost is proportional to the number of edges involved in the current update (approximately $O(|E_{sub}| \cdot d)$), where $E_{sub}$ denotes the subgraph edges participating in the incremental update. Because a fixed sliding window is adopted in our implementation, the size of the subgraph participating in each update is bounded, and the per-step computation remains stable over long monitoring periods. This incremental-update property contrasts with full-graph recomputation, whose cost grows with the accumulated graph size, and therefore supports scalable deployment in continuous inland-waterway surveillance settings.
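The bounded-window behavior can be sketched as a small container that retains only the most recent timestep batches (naming is illustrative, not the authors' implementation):

```python
from collections import deque

class SlidingGraphWindow:
    """Keeps the last w timestep batches, so each incremental update touches
    a bounded subgraph (per-step cost ~ O(|E_sub| * d)) regardless of how
    long monitoring has been running."""
    def __init__(self, w=5):
        self.batches = deque(maxlen=w)           # (nodes, edges) per timestep

    def add_step(self, nodes, edges):
        self.batches.append((nodes, edges))      # oldest step drops out

    def active_subgraph(self):
        nodes = [n for ns, _ in self.batches for n in ns]
        edges = [e for _, es in self.batches for e in es]
        return nodes, edges
```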
First, the proposed model was evaluated on the ship deviation warning detection dataset, with results shown in Table 2. In the ship deviation warning detection task, the proposed method demonstrated significant advantages in all key metrics: it achieved an accuracy of 93.8%, a precision of 93.5%, a recall of 93.7%, and an F1-score of 93.6%, comprehensively outperforming the other advanced models. Our model not only led by more than 2 percentage points in accuracy but also exceeded the others by 2.2 and 2.3 percentage points in precision and recall, respectively, demonstrating its comprehensive advantages in reducing false alarms and capturing true deviation events.
Next, the results on the ship bridge-crossing warning detection dataset are shown in Table 3. In the ship bridge-crossing warning detection task, the proposed method still demonstrated excellent performance in accuracy and F1-score. The method achieved an accuracy of 93.8% and an F1-score of 93.6%, significantly outperforming other models.
Finally, the results on the inter-ship collision warning detection dataset are shown in Table 4. In the inter-ship collision warning detection task, the proposed method also demonstrated significant advantages in all evaluation metrics, achieving an accuracy of 93.3%, a precision of 93.8%, a recall of 92.8%, and an F1-score of 93.3%, representing significant improvements over other models. The performance improvement in the proposed method is attributed to the effective modeling of multimodal information correlations, enabling the model to more accurately detect events in complex multi-sensor data environments and verifying the effectiveness and superiority of the proposed method in ship warning detection tasks.

3.3. Ablation Experiments

To better explore the effectiveness of each component in the MF-GCN algorithm, an ablation study was conducted in this section. The experimental results for ship deviation warning detection are shown in Table 5, where “−” indicates the removal of the corresponding component and “+” indicates its inclusion in the event detection and warning task.
The following conclusions can be drawn from the experimental results:
(1) A-GCN outperforms S-GCN: A-GCN achieved a higher F1-score than S-GCN because its attention mechanism extracts more semantic information from each sensor's data about the new data, whereas S-GCN primarily focuses on semantic-level correlations between sensor data. This study used only a mean-based method to fuse information from multiple sensors, leading to some information loss, which will be addressed in future work.
(2) Combined Model Achieves Optimal Results: The combined model demonstrated the best performance, indicating that the attention-based fusion GCN can compensate for the limitations of the semantic clustering GCN. Additionally, the multimodal information GCN provides a novel perspective for generating new data embeddings and captures information overlooked by the semantic clustering GCN.

3.4. Impact of Time Increment Interval on Model Performance

The effect of the time increment interval $T$ on model performance was investigated in this section. The interval $T$ determines the frequency of adding new perceptual data to the new graph $g_t^{new}$, i.e., the frequency of adding new nodes to the graph structure. The optimal interval was selected through comparative experiments in which $T$ was set to 5 s, 10 s, 20 s, and 30 s in sequence, with results shown in Table 6.
In Table 6, the column “S-GCN” represents the accuracy of ship deviation warning detection using only the S-GCN, and the column “A-GCN” represents the accuracy using only the A-GCN, while “Accuracy” and “F1” denote the experimental results of using both GCN layers simultaneously. The experimental results show that event detection performance increases within a certain range as the time interval increases. When T = 5 s, the overall model achieved an accuracy of 92.7%.
Model performance improved as T increased, reaching its optimum at T = 10 s with an accuracy of 93.8%. However, when T > 10 s, performance declined because the frequency of new sensor data added to the graph was too low to capture sufficient information, affecting graph learning.
The performance of S-GCN increased within a certain range as the update frequency of node numbers in the graph increased, but more new nodes also introduced more invalid potential correlations. Due to the lack of semantic information in these new nodes, they could not provide effective influence, thus negatively impacting graph embedding operations. Therefore, excessive perceptual data does not provide additional effective information for event detection.
A-GCN also demonstrated good performance, achieving 92.5% accuracy at $T$ = 10 s, as the new graph $g_t^{new}$ considered the correlations between multimodal information. However, when $T$ > 10 s, model performance did not significantly improve because the new data update frequency was too low to provide sufficient dynamic information. At $T$ = 5 s, more new nodes increased the impact of new sensor data, which introduced uncertainty and affected final performance, consistent with the behavior of S-GCN. Therefore, it is necessary to concatenate the feature vectors of S-GCN and A-GCN to leverage their combined advantages.

3.5. Impact of Multimodal Information Volume on A-GCN Performance

The influence of the number of graphs in the multi-head attention mechanism of A-GCN, i.e., the volume of multimodal information, was further discussed in this section [55]. This parameter affects the update of node features during graph convolution, with the number of graphs denoted as parameter n. To determine the optimal parameter, experiments were conducted to explore the impact of n on model performance. Parameter n was varied from 2 to 10, and the results of ship deviation warning detection experiments are shown in Figure 12.
As shown in Figure 12, as n increased from 2 to 10, the performance of A-GCN generally exhibited an initial increase followed by a decline. When n = 2, the model achieved an accuracy of 90.5%. Performance gradually improved with increasing n, peaking at n = 8 with an accuracy of 93.8%. However, performance began to decline when n > 8. This phenomenon can be attributed to two main factors:
(1) Increased Potential Relationships: More fully connected graphs provide additional potential correlations between sensor data for feature learning, leading to improved experimental results as n increases.
(2) Introduction of Noise: As the number of graphs increases, excessive potential relationships introduce noise, negatively impacting feature updates and causing performance degradation at higher n.
In summary, the multi-head attention mechanism effectively enhances latent correlations in embeddings, but performance does not strictly improve with increasing graph count.

3.6. Impact of Fusion Methods on S-GCN Performance

The influence of different fusion methods in S-GCN on the overall performance was examined in this section. The mean pooling method was employed to generate sensor features in the new graph $g_t^{new}$. Additionally, other information fusion methods, such as max pooling and min pooling, were evaluated on ship deviation warning detection [56]. The experimental results are shown in Table 7.
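The three candidate fusion operators differ only in how one sensor's historical node features are reduced, as in the following snippet (random features used purely for illustration):

```python
import torch

feats = torch.randn(12, 64)            # 12 historical nodes x 64-dim features
mean_fused = feats.mean(dim=0)         # mean pooling: smooths outliers (adopted)
max_fused = feats.max(dim=0).values    # max pooling: keeps salient peaks
min_fused = feats.min(dim=0).values    # min pooling: noise-tolerant but weak
```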
During the experiments, the mean pooling method was first used to fuse data from different sensors, and the results showed that it performed best in terms of accuracy and F1 score, reaching 93.8% and 93.6%, respectively. This is because mean pooling can more evenly integrate data from each sensor, smooth outliers, and thus provide a more stable feature representation. Then, the max pooling method was tested, yielding an accuracy of 92.9% and an F1 score of 92.8%. The max pooling method tends to select the maximum value from each sensor’s data, which can capture significant features in some cases but may also ignore partial detailed information, resulting in slightly inferior performance compared to mean pooling. Finally, experiments with the min pooling method showed accuracies and F1 scores of 91.8% and 91.2%, respectively. The min pooling method tends to select the minimum value from each sensor’s data, which has certain advantages in handling noise and outliers but has poor overall feature representativeness; hence, its performance is inferior to the first two methods.
Based on these experimental data, the mean pooling method was found to be the most effective among different fusion methods, as it balances the influence of data from each sensor and provides a more reliable and stable feature representation. Therefore, the mean pooling was ultimately selected as the primary fusion method to improve the accuracy and reliability of ship event warning detection.

3.7. Impact of Edge Computation Formulas on Model Performance

This subsection investigates how edge computation formulas affect the proposed model. The similarity metric determining edge weights directly influences feature learning and model efficacy. We conducted experiments comparing three common similarity measures: Euclidean Distance (ED), Cosine Distance (CD), and Manhattan Distance (MD). Results are shown in Figure 13.
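The three candidate measures can be written as a single edge-weight helper (a sketch; the two distances are negated so that larger values always indicate greater similarity):

```python
import torch
import torch.nn.functional as F

def edge_weight(a, b, metric="CD"):
    """Edge weight between two node feature vectors a and b."""
    if metric == "CD":                        # cosine distance (adopted)
        return F.cosine_similarity(a, b, dim=0)
    if metric == "ED":                        # Euclidean distance
        return -torch.dist(a, b, p=2)
    if metric == "MD":                        # Manhattan distance
        return -torch.dist(a, b, p=1)
    raise ValueError(f"unknown metric: {metric}")
```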
Experimental results (Figure 13) reveal that the choice of similarity metric, ED, CD, or MD, exerts only marginal influence on the model, with all three achieving >92% accuracy and F1-score (CD: 93.8%/93.6%, MD: 92.2%/92.6%, ED: 92.7%/92.3%). Notably, CD delivers optimal performance due to its directional sensitivity and invariance to feature magnitude, which better captures semantic relationships in high-dimensional multimodal data. Consequently, CD is adopted for edge weight computation in the final framework to maximize detection accuracy and stability while maintaining robustness against metric variations.

3.8. Visualization Analysis

The proposed MF-GCN was employed to visualize ship anomaly alerts. For a selected video segment, the algorithm analyzes each frame to output probabilities of ship deviation, ship bridge-crossing, and inter-ship collision, as demonstrated in Figure 14.
For ship deviation warning, four sequential frames (first row in Figure 14) show increasing deviation probabilities: 55% → 66% → 82% → 92%. This progression indicates escalating deviation risk, culminating in near-collision with a navigation buoy, likely caused by erroneous course adjustments. Such visual analytics enable real-time perception of deviation trends, providing critical alerts for maritime supervision, especially in narrow channels where timely corrections prevent hazardous entries.
For ship bridge-crossing warning, the probabilities of bridge collision across frames are 37% → 67% → 62% → 5% (second row in Figure 14). The fluctuating risk reflects dynamic ship-bridge distance variations influenced by ship speed, heading, and water levels. High-risk periods (e.g., 67%) occur during bridge approach/transit, while post-transit risk dissipates (5%). This capability enhances situational awareness for infrastructure safety management.
For inter-ship collision warning, collision probabilities rise from 35% → 55% → 65% → 69% (third row in Figure 14), signaling decreasing ship separation due to converging paths or operational errors. Visual trajectory overlays clarify spatial relationships and motion trends, enabling proactive collision avoidance.
The visualization framework intuitively quantifies three critical risks—ship deviation, bridge-crossing, and inter-ship collision—through probability visualization. By translating multimodal fusion results into actionable spatial–temporal insights, the system significantly enhances navigational safety decision making for maritime authorities.

4. Discussion

4.1. Interpretation of Results and Why MF-GCN Works

The experimental results across three inland-waterway warning tasks demonstrate that MF-GCN consistently outperforms representative baselines. In particular, MF-GCN achieves 93.8% accuracy for both ship deviation warning and bridge-crossing warning and 93.3% accuracy for inter-ship collision warning, with corresponding F1-scores of 93.6%, 93.6%, and 93.3%. This advantage is largely attributed to the complementary nature of heterogeneous sensing: AIS provides structured kinematic attributes; video offers intuitive visual cues; LiDAR supplies accurate geometric distance and 3D profile information; and water level signals capture environmental dynamics that can affect clearance and risk near bridges. By aligning these sources and learning their correlations within an incremental graph, MF-GCN reduces the failure modes of single-modality systems under occlusion, illumination changes, or AIS noise. The ablation study further validates the necessity of combining semantic context modeling (S-GCN) and attention-driven cross-modal fusion (A-GCN), while parameter studies show stable performance within a reasonable range of settings.

4.2. Limitations and Failure Modes

Although MF-GCN enhances the robustness of the system, several limitations remain. Under extreme adverse weather conditions that simultaneously affect multiple sensors, dense fog can significantly attenuate LiDAR returns, and heavy rain can severely degrade video visibility, potentially leading to a decline in model performance. Although the current dataset is collected from multiple waterways and covers various lighting conditions, it still reflects the practical scarcity of real abnormal event samples in real-world scenarios, meaning that rare edge cases may not be sufficiently represented. Additionally, cross-region generalization may be impacted by multiple forms of domain shifts, including differences in camera viewpoints, background clutter, variations in vessel distribution characteristics, and local hydrodynamic conditions.

4.3. Scalability and Practical Deployment Considerations

A key practical advantage of MF-GCN is its incremental inference mechanism. Instead of rebuilding a full graph when new sensor data arrives, the framework focuses on newly added nodes and their local connections at each timestep, which is more suitable for real-time monitoring. For deployment at scale, engineering factors include reliable multi-sensor timestamping and soft synchronization, communication latency and packet loss (especially for AIS), and edge–cloud collaboration. Model compression and hardware-aware optimization can further reduce computational requirements and facilitate deployment on mid-range GPUs or high-performance embedded devices.

4.4. Future Work: Simulation-Based Data Augmentation and Continual Learning

To address the limited availability of rare abnormal events, a promising direction is simulation-based dataset generation. We plan to construct a digital-twin inland-waterway environment to generate controllable abnormal scenarios including collision, grounding, and bridge-strike events under diverse weather and traffic conditions. Such synthetic data can complement real-world observations and improve coverage of edge cases. In addition, MF-GCN is compatible with continual learning: newly collected labeled events can be incrementally incorporated to update the model over time.

5. Conclusions

The experimental results validate the effectiveness of MF-GCN across three tasks: ship deviation warning, bridge-crossing warning, and inter-ship collision warning. The method significantly outperforms existing models, achieving accuracies of 93.8%, 93.8%, and 93.3%, respectively. Additionally, the study investigates the impact of key parameters on model performance, including the time increment interval T, the number of graphs n in the multi-head attention mechanism, fusion methods, and edge weight calculation formulas. The results indicate that the algorithm achieves optimal performance when T = 10 s, n = 8, mean pooling is applied for fusion, and cosine distance is selected for calculating edge weights. These findings confirm the accuracy and effectiveness of MF-GCN in complex scenarios, providing new insights for multimodal data fusion and dynamic feature learning. This study not only proposes an innovative multimodal graph fusion framework at the algorithmic level, advancing the development of heterogeneous perceptual information fusion and spatiotemporal modeling techniques, but also holds significant practical importance:
(1) Enhancing vessel navigation safety and abnormal behavior detection capabilities in complex environments: By fusing multi-source data such as AIS, video, LiDAR, and water level information, the system can more reliably identify abnormal behaviors such as deviation and collision risks. It demonstrates stronger robustness particularly under challenging conditions like occlusion and adverse weather, providing effective technical support for real-time vessel safety assistance and autonomous decision making.
(2) Supporting intelligent dynamic maritime supervision and waterway operational safety: The proposed architecture enables real-time alignment and incremental inference of multi-sensor information, contributing to the development of a more extensive and responsive intelligent monitoring system. It offers a scalable technical solution for maritime authorities to conduct comprehensive dynamic supervision, waterway traffic scheduling, and emergency incident management, thereby enhancing the overall safety and efficiency of waterway operations.

Author Contributions

Conceptualization, R.M. and W.N.; methodology, R.M., J.Z., N.G., H.W. and A.L.; software, R.M.; validation, R.M. and W.N.; formal analysis, R.M., J.Z., W.N. and N.G.; investigation, R.M.; writing—original draft preparation, R.M., J.Z., W.N. and N.G.; writing—review and editing, R.M., J.Z., W.N., N.G., H.W. and A.L.; visualization, R.M., N.G., H.W. and A.L.; supervision, R.M.; project administration, R.M.; funding acquisition, R.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Tianjin Municipal Science and Technology Plan Project (Grant No. 25ZYJDJC00050), Guangxi Key Science and Technology Special Program (Grant No. AA23062052), National Key R&D Program of China (Grant No. 2023YFB2603800), the Basic Research Fund of Central-level Nonprofit Scientific Research Institutes (No. TKS20250307).

Data Availability Statement

The datasets presented in this article are not readily available because the data are subject to copyright restrictions by the data provider and can only be accessed upon request or after the decryption period.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

AIS: Automatic Identification System
A-GCN: Attentive Fusion-based Graph Convolutional Network
S-GCN: Semantic Clustering-based Graph Convolutional Network
GCN: Graph Convolutional Network
IGCN: Incremental Graph Convolutional Network
LiDAR: Light Detection and Ranging
MF-GCN: Multimodal Information Fusion using Incremental Graph Convolutional Network
CI: Confidence Interval
FLOPs: Floating Point Operations
FPS: Frames Per Second

References

1. Liu, J.; Shi, G.; Zhu, K. Vessel Trajectory Prediction Model Based on AIS Sensor Data and Adaptive Chaos Differential Evolution Support Vector Regression (ACDE-SVR). Appl. Sci. 2019, 9, 2983.
2. Xin, X.; Liu, K.; Loughney, S.; Wang, J.; Yang, Z. Maritime Traffic Clustering to Capture High-Risk Multi-Ship Encounters in Complex Waters. Reliab. Eng. Syst. Saf. 2023, 230, 108936.
3. Durlik, I.; Miller, T.; Dorobczyński, L.; Kozlovska, P.; Kostecki, T. Revolutionizing Marine Traffic Management: A Comprehensive Review of Machine Learning Applications in Complex Maritime Systems. Appl. Sci. 2023, 13, 8099.
4. Wang, L.; Chen, P.; Chen, L.; Mou, J. Ship AIS Trajectory Clustering: An HDBSCAN-Based Approach. J. Mar. Sci. Eng. 2021, 9, 566.
5. Liu, R.W.; Liang, M.; Nie, J.; Lim, W.Y.B.; Zhang, Y.; Guizani, M. Deep Learning-Powered Vessel Trajectory Prediction for Improving Smart Traffic Services in Maritime Internet of Things. IEEE Trans. Netw. Sci. Eng. 2022, 9, 3080–3094.
6. Farahnakian, F.; Nicolas, F.; Farahnakian, F.; Nevalainen, P.; Sheikh, J.; Heikkonen, J.; Raduly-Baka, C. A Comprehensive Study of Clustering-Based Techniques for Detecting Abnormal Vessel Behavior. Remote Sens. 2023, 15, 1477.
7. Ma, R.; Bao, K.; Yin, Y. Improved Ship Object Detection in Low-Illumination Environments Using RetinaMFANet. J. Mar. Sci. Eng. 2022, 10, 1996.
8. Zamanzadeh Darban, Z.; Webb, G.I.; Pan, S.; Aggarwal, C.; Salehi, M. Deep Learning for Time Series Anomaly Detection: A Survey. ACM Comput. Surv. 2025, 57, 1–42.
9. Ma, R.; Yin, Y.; Chen, J.; Chang, R. Multi-Modal Information Fusion for LiDAR-Based 3D Object Detection Framework. Multimed. Tools Appl. 2024, 83, 7995–8012.
10. Feng, X.; Song, R.; Yin, W.; Yin, X.; Zhang, R. Multimodal Transportation Network with Cargo Containerization Technology: Advantages and Challenges. Transp. Policy 2023, 132, 128–143.
11. Li, H.; Jiao, H.; Yang, Z. AIS Data-Driven Ship Trajectory Prediction Modelling and Analysis Based on Machine Learning and Deep Learning Methods. Transp. Res. Part E Logist. Transp. Rev. 2023, 175, 103152.
12. Furkan Oruc, M.; Altan, Y.C. Predicting the Risky Encounters without Distance Knowledge between the Ships via Machine Learning Algorithms. Expert Syst. Appl. 2023, 221, 119728.
13. Wijaya, W.M.; Nakamura, Y. Unexpected Trajectory Detection Based on the Geometrical Features of AIS-Generated Ship Tracks. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 1442–1450.
14. Gamage, C.; Dinalankara, R.; Samarabandu, J.; Subasinghe, A. A Comprehensive Survey on the Applications of Machine Learning Techniques on Maritime Surveillance to Detect Abnormal Maritime Vessel Behaviors. WMU J. Marit. Aff. 2023, 22, 447–477.
15. Wang, S.; Zhang, X.; Qin, Y.; Song, W.; Li, B. Marine Target Magnetic Anomaly Detection Based on Multitask Deep Transfer Learning. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1501705.
16. Rong, H.; Teixeira, A.P.; Guedes Soares, C. A Framework for Ship Abnormal Behaviour Detection and Classification Using AIS Data. Reliab. Eng. Syst. Saf. 2024, 247, 110105.
17. Seong, N.; Kim, J.; Lim, S. Graph-Based Anomaly Detection of Ship Movements Using CCTV Videos. J. Mar. Sci. Eng. 2023, 11, 1956.
18. Zhang, H.; Lu, G.; Zhan, M.; Zhang, B. Semi-Supervised Classification of Graph Convolutional Networks with Laplacian Rank Constraints. Neural Process. Lett. 2022, 54, 2645–2656.
19. Liu, R.W.; Liang, M.; Nie, J.; Yuan, Y.; Xiong, Z.; Yu, H.; Guizani, N. STMGCN: Mobile Edge Computing-Empowered Vessel Trajectory Prediction Using Spatio-Temporal Multigraph Convolutional Network. IEEE Trans. Ind. Inf. 2022, 18, 7977–7987.
20. Zhao, J.; Yan, Z.; Chen, X.; Han, B.; Wu, S.; Ke, R. K-GCN-LSTM: A k-Hop Graph Convolutional Network and Long–Short-Term Memory for Ship Speed Prediction. Phys. A Stat. Mech. Its Appl. 2022, 606, 128107.
21. Wang, S.; Li, Y.; Xing, H.; Zhang, Z. Vessel Trajectory Prediction Based on Spatio-Temporal Graph Convolutional Network for Complex and Crowded Sea Areas. Ocean Eng. 2024, 298, 117232.
22. Guo, S.; Zhang, H.; Guo, Y. Toward Multimodal Vessel Trajectory Prediction by Modeling the Distribution of Modes. Ocean Eng. 2023, 282, 115020.
23. Cai, Y.; Xu, J.; Jiao, S. Intelligent Prediction of Urban Road Network Carrying Capacity and Traffic Flow Based on Deep Learning. IEEE Trans. Veh. Technol. 2025, 74, 2067–2079.
24. Liu, T.; Zhang, Z.; Du, P.; Wang, W.; Yan, H.; Qiang, B.; Xu, S. Recognition of Building Group Patterns Using GCN and Knowledge Graph. Geocarto Int. 2025, 40, 2436906.
25. Guo, Y.; Liu, R.W.; Qu, J.; Lu, Y.; Zhu, F.; Lv, Y. Asynchronous Trajectory Matching-Based Multimodal Maritime Data Fusion for Vessel Traffic Surveillance in Inland Waterways. IEEE Trans. Intell. Transp. Syst. 2023, 24, 12779–12792.
26. Nie, W.; Chang, R.; Ren, M.; Su, Y.; Liu, A. I-GCN: Incremental Graph Convolution Network for Conversation Emotion Detection. IEEE Trans. Multimed. 2022, 24, 4471–4481.
27. Yu, H.; Zhang, Y.; Zhao, J.; Liao, Y.; Huang, Z.; He, D.; Gu, L.; Jin, H.; Liao, X.; Liu, H.; et al. RACE: An Efficient Redundancy-Aware Accelerator for Dynamic Graph Neural Network. ACM Trans. Archit. Code Optim. 2023, 20, 1–26.
28. Feng, Y.; Tang, Z.; Xu, Y.; Hu, Q. Predicting Vacant Parking Space Availability Zone-Wisely: A Graph Based Spatio-Temporal Prediction Approach. IEEE Trans. Veh. Technol. 2025, 74, 2503–2512.
29. Ren, Y.; Lan, Z.; Liu, L.; Yu, H. EMSIN: Enhanced Multistream Interaction Network for Vehicle Trajectory Prediction. IEEE Trans. Fuzzy Syst. 2025, 33, 54–68.
30. Song, L.; Jin, Y.; Lin, T.; Zhao, S.; Wei, Z.; Wang, H. Remaining Useful Life Prediction Method Based on the Spatiotemporal Graph and GCN Nested Parallel Route Model. IEEE Trans. Instrum. Meas. 2024, 73, 3511912.
31. Cheng, X.; He, X.; Qiao, M.; Li, P.; Chang, P.; Zhang, T.; Guo, X.; Wang, J.; Tian, Z.; Zhou, G. Multi-View Graph Convolutional Network with Spectral Component Decompose for Remote Sensing Images Classification. IEEE Trans. Circuits Syst. Video Technol. 2025, 35, 3–18.
32. Ma, R.; Yin, Y.; Bao, K. Water Level Prediction of Inland Waterways Based on MHA-BiGRU. J. Dalian Marit. Univ. 2024, 50, 46–56.
33. Zhang, J.; Wang, H.; Cui, F.; Liu, Y.; Liu, Z.; Dong, J. Research into Ship Trajectory Prediction Based on an Improved LSTM Network. J. Mar. Sci. Eng. 2023, 11, 1268.
34. Alam, M.M.; Spadon, G.; Etemad, M.; Torgo, L.; Milios, E. Enhancing Short-Term Vessel Trajectory Prediction with Clustering for Heterogeneous and Multi-Modal Movement Patterns. Ocean Eng. 2024, 308, 118303.
35. Xie, F.; Li, G.; Hu, W.; Fan, Q.; Zhou, S. Intelligent Fault Diagnosis of Variable-Condition Motors Using a Dual-Mode Fusion Attention Residual. J. Mar. Sci. Eng. 2023, 11, 1385.
36. Jiang, Z.; Liu, D.; Cui, L. Deep Adaptively Dynamic Edge Graph Convolution Network with Attention Weight and High-Dimension Affinity Feature Graph for Rotating Machinery Fault Diagnosis. Meas. Sci. Technol. 2025, 36, 026104.
37. Bläser, N.; Magnussen, B.B.; Fuentes, G.; Lu, H.; Reinhardt, L. MATNEC: AIS Data-Driven Environment-Adaptive Maritime Traffic Network Construction for Realistic Route Generation. Transp. Res. Part C Emerg. Technol. 2024, 169, 104853.
  35. Xie, F.; Li, G.; Hu, W.; Fan, Q.; Zhou, S. Intelligent Fault Diagnosis of Variable-Condition Motors Using a Dual-Mode Fusion Attention Residual. JMSE 2023, 11, 1385. [Google Scholar] [CrossRef]
  36. Jiang, Z.; Liu, D.; Cui, L. Deep Adaptively Dynamic Edge Graph Convolution Network with Attention Weight and High-Dimension Affinity Feature Graph for Rotating Machinery Fault Diagnosis. Meas. Sci. Technol. 2025, 36, 026104. [Google Scholar] [CrossRef]
  37. Bläser, N.; Magnussen, B.B.; Fuentes, G.; Lu, H.; Reinhardt, L. MATNEC: AIS Data-Driven Environment-Adaptive Maritime Traffic Network Construction for Realistic Route Generation. Transp. Res. Part C Emerg. Technol. 2024, 169, 104853. [Google Scholar] [CrossRef]
  38. Wang, W.; Liu, C.; Liu, G.; Wang, X. CF-GCN: Graph Convolutional Network for Change Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5607013. [Google Scholar] [CrossRef]
  39. Kilic, U.; Karadag, O.O.; Ozyer, G.T. AGMS-GCN: Attention-Guided Multi-Scale Graph Convolutional Networks for Skeleton-Based Action Recognition. Knowl.-Based Syst. 2025, 311, 113045. [Google Scholar] [CrossRef]
  40. Wang, Y.; Liu, J.; Liu, R.W.; Liu, Y.; Yuan, Z. Data-Driven Methods for Detection of Abnormal Ship Behavior: Progress and Trends. Ocean. Eng. 2023, 271, 113673. [Google Scholar] [CrossRef]
  41. Gan, L.; Gao, Z.; Zhang, X.; Xu, Y.; Liu, R.W.; Xie, C.; Shu, Y. Graph Neural Networks Enabled Accident Causation Prediction for Maritime Vessel Traffic. Reliab. Eng. Syst. Saf. 2025, 257, 110804. [Google Scholar] [CrossRef]
  42. Liu, Q.; Chen, H.; Zhao, F. Maritime Target Detection Algorithm Based on Fusion of Visible and Infrared Images. J. Supercomput. 2025, 81, 22. [Google Scholar] [CrossRef]
  43. Cong, R.; Sheng, H.; Zhao, M.; Yang, D.; Wang, T.; Chen, R.; Shen, J. Multimodal Perception Integrating Point Cloud and Light Field for Ship Autonomous Driving. IEEE Trans. Intell. Transport. Syst. 2024, 25, 12477–12489. [Google Scholar] [CrossRef]
  44. Pu, Z.; Hong, Y.; Hu, Y.; Jiang, J. Research on Ship-Type Recognition Based on Fusion of Ship Trajectory Image and AIS Time Series Data. Electronics 2025, 14, 431. [Google Scholar] [CrossRef]
  45. Wang, X.; Zhang, W.; Wang, C.; Gao, Y.; Liu, M. Dynamic Dense Graph Convolutional Network for Skeleton-Based Human Motion Prediction. IEEE Trans. Image Process. 2024, 33, 1–15. [Google Scholar] [CrossRef]
  46. Ma, Y.; Liu, Q.; Yang, L. Machine Learning-Based Multimodal Fusion Recognition of Passenger Ship Seafarers’ Workload: A Case Study of a Real Navigation Experiment. Ocean. Eng. 2024, 300, 117346. [Google Scholar] [CrossRef]
  47. Zhang, Z.; Zhang, L.; Wu, J.; Guo, W. Optical and Synthetic Aperture Radar Image Fusion for Ship Detection and Recognition: Current State, Challenges, and Future Prospects. IEEE Geosci. Remote Sens. Mag. 2024, 12, 132–168. [Google Scholar] [CrossRef]
  48. Pei, Z.; Ge, M.; Li, H.; He, J.; Wang, C. Environmental factors influencing HDL-C in middle-aged and elderly Chinese population based on random forest model. J. Geo-Inf. Sci. 2022, 24, 1286–1300. [Google Scholar]
  49. Li, X.; Cheng, K.; Tan, S.; Huang, T.; Yuan, D. Fault diagnosis method of nuclear power plant based on AdaBoost algorithm. Nucl. Power Eng. 2022, 43, 118–125. [Google Scholar]
  50. Zhang, S.; Yuan, Y.; Yao, Z.; Wang, X.; Lei, Z. Improvement of the Performance of Models for Predicting Coronary Artery Disease Based on XGBoost Algorithm and Feature Processing Technology. Electronics 2022, 11, 315. [Google Scholar] [CrossRef]
  51. Chai, T.; Xue, H.; Sun, K.; Weng, J. Ship Accident Prediction Based on Improved Quantum-Behaved PSO-LSSVM. Math. Probl. Eng. 2020, 2020, 8823322. [Google Scholar] [CrossRef]
  52. Yuan, Z.; Liu, J.; Liu, Y.; Zhang, Q.; Liu, R.W. A Multi-Task Analysis and Modelling Paradigm Using LSTM for Multi-Source Monitoring Data of Inland Vessels. Ocean. Eng. 2020, 213, 107604. [Google Scholar] [CrossRef]
  53. Liu, R.W.; Guo, Y.; Nie, J.; Hu, Q.; Xiong, Z.; Yu, H.; Guizani, M. Intelligent Edge-Enabled Efficient Multi-Source Data Fusion for Autonomous Surface Vehicles in Maritime Internet of Things. IEEE Trans. Green Commun. Netw. 2022, 6, 1574–1587. [Google Scholar] [CrossRef]
  54. Dong, X.; Cai, W.; Tan, Q.; Wang, D.; Li, S.; Wang, Q. Fusing Dynamic Dual Attention with Multi-Modal Feature Pyramid for Small Target Detection Algorithm in Cruise Ship Collision Avoidance and Experimental Validation. Ship Eng. 2025, 47, 8–19. [Google Scholar] [CrossRef]
  55. Wang, X.; Wang, X.; Yin, X.; Li, K.; Wang, L.; Wang, R.; Song, R. Distributed LSTM-GCN-Based Spatial–Temporal Indoor Temperature Prediction in Multizone Buildings. IEEE Trans. Ind. Inf. 2024, 20, 482–491. [Google Scholar] [CrossRef]
  56. Zhang, Y.; Xu, W.; Ma, B.; Zhang, D.; Zeng, F.; Yao, J.; Yang, H.; Du, Z. Linear Attention Based Spatiotemporal Multi Graph GCN for Traffic Flow Prediction. Sci. Rep. 2025, 15, 8249. [Google Scholar] [CrossRef]
Figure 1. LiDAR point cloud data processing.
Figure 2. Video data processing.
Figure 3. AIS data processing.
Figure 4. Waterway water level data processing.
Figure 5. Multimodal data feature fusion and event detection.
Figure 6. MF-GCN algorithm architecture.
Figure 7. S-GCN workflow.
Figure 8. A-GCN process.
Figure 9. A subset of images from the ship deviation detection dataset.
Figure 10. A subset of images from the ship bridge-crossing detection dataset.
Figure 11. A subset of images from the inter-ship collision detection dataset.
Figure 12. The influence of the number of graphs on performance in A-GCN.
Figure 13. The impact of edge computation formulas on model performance.
Figure 14. Visualization of ship anomaly event detection.
Table 1. Comparative Analysis of Sensor Modalities.

| Sensor | Advantages | Limitations | Remarks |
|---|---|---|---|
| AIS | (1) Operational range of 15–30 nautical miles with extensive coverage; (2) all-weather capability, unaffected by meteorological conditions; (3) contains ship attribute data; (4) high tracking stability and reliability; (5) satellite-derived structured positioning data with high accuracy. | (1) Passive signal reception, limited to ships with active AIS transponders; (2) network-dependent latency in data transmission; (3) information reliability depends on manual input or external sensors. | Provides detailed ship metadata but lacks intuitive situational awareness. |
| Video | (1) Active detection independent of target material; (2) operational range of 2–5 km with high-resolution imaging; (3) rich visual feature extraction for rapid situational awareness. | (1) Sensitivity to illumination and visibility conditions; (2) distance measurement inaccuracy due to perspective-induced scale variations; (3) limited efficacy in detecting small targets at long ranges. | Enables visual confirmation of ship presence but lacks detailed metadata. |
| LiDAR | (1) Active target detection; (2) all-weather operation, effective in darkness and low visibility; (3) high precision (centimeter-level), particularly for speed and offshore distance measurements. | (1) Limited operational range (~300 m); (2) high deployment and maintenance costs. | Delivers precise distance/azimuth data but lacks visual contextualization. |
Table 2. Experimental results of ship deviation warning detection.

| Model | Accuracy (%) | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|---|
| Random Forest [48] | 88.6 | 89.2 | 90.5 | 89.8 |
| Adaboost [49] | 88.2 | 87.3 | 88.5 | 87.9 |
| GBDT [50] | 90.3 | 89.6 | 90.8 | 90.2 |
| SVM [51] | 89.7 | 89.3 | 89.8 | 89.5 |
| LSTM [52] | 87.7 | 86.3 | 87.4 | 86.8 |
| YOLOX-s [53] | 88.5 | 87.8 | 88.3 | 88.0 |
| DDA-YOLO [54] | 92.5 | 91.8 | 92.4 | 92.1 |
| MF-GCN (ours) | 93.8 | 93.5 | 93.7 | 93.6 |
Table 3. Experimental results of ship bridge-crossing warning detection.

| Model | Accuracy (%) | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|---|
| Random Forest [48] | 88.8 | 89.4 | 90.7 | 90.0 |
| Adaboost [49] | 88.0 | 87.5 | 88.3 | 87.9 |
| GBDT [50] | 90.5 | 89.8 | 91.0 | 90.4 |
| SVM [51] | 89.5 | 89.1 | 89.6 | 89.3 |
| LSTM [52] | 87.5 | 86.5 | 87.6 | 87.0 |
| YOLOX-s [53] | 90.8 | 90.2 | 90.6 | 90.4 |
| DDA-YOLO [54] | 92.1 | 91.5 | 91.8 | 91.6 |
| MF-GCN (ours) | 93.8 | 93.3 | 93.9 | 93.6 |
Table 4. Experimental results of inter-ship collision warning detection.

| Model | Accuracy (%) | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|---|
| Random Forest [48] | 87.5 | 88.3 | 88.2 | 88.2 |
| Adaboost [49] | 87.8 | 89.0 | 87.8 | 88.4 |
| GBDT [50] | 89.5 | 90.2 | 90.5 | 90.3 |
| SVM [51] | 88.9 | 89.2 | 90.1 | 89.6 |
| LSTM [52] | 86.9 | 87.2 | 86.9 | 87.0 |
| YOLOX-s [53] | 89.9 | 90.2 | 89.5 | 89.8 |
| DDA-YOLO [54] | 91.8 | 91.7 | 90.9 | 91.3 |
| MF-GCN (ours) | 93.3 | 93.8 | 92.8 | 93.3 |
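For readers reproducing the comparisons in Tables 2–4, all four metrics follow the standard binary confusion-matrix definitions. The snippet below is a minimal illustrative sketch of that computation, not the evaluation code used in our experiments; the function name and the example counts are hypothetical.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn: true/false positives and negatives for the warning class.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for one warning-detection run:
print(classification_metrics(tp=468, fp=32, fn=31, tn=469))
```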
Table 5. The functions of each component in the algorithm.

| S-GCN | A-GCN | Accuracy | F1 |
|---|---|---|---|
| + |  | 92.8% | 92.5% |
|  | + | 92.5% | 92.7% |
| + | + | 93.8% | 93.6% |
Table 6. The impact of time intervals on model performance.

| T | S-GCN | A-GCN | Accuracy | F1 |
|---|---|---|---|---|
| 5 | 91.8% | 92.2% | 92.7% | 92.2% |
| 10 | 92.8% | 92.5% | 93.8% | 93.6% |
| 20 | 91.6% | 91.8% | 92.2% | 92.5% |
| 30 | 90.6% | 90.9% | 91.8% | 91.9% |
Table 7. The impact of different fusion methods on performance in S-GCN.

| Fusion Method | Accuracy | F1 |
|---|---|---|
| Max pooling | 92.9% | 92.8% |
| Min pooling | 91.8% | 91.2% |
| Mean pooling | 93.8% | 93.6% |
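To make the comparison in Table 7 concrete, the sketch below illustrates how max, min, and mean pooling each collapse a stack of feature vectors into a single fused vector. It is a generic illustration under the assumption that the vectors to be fused are stacked row-wise into one array; it is not the exact S-GCN implementation, and the function name and example values are hypothetical.

```python
import numpy as np

def pool_features(features: np.ndarray, method: str = "mean") -> np.ndarray:
    """Aggregate a (num_vectors, feature_dim) stack into one fused vector.

    Each row is one feature vector to be fused; pooling is applied
    independently along every feature dimension.
    """
    if method == "max":
        return features.max(axis=0)
    if method == "min":
        return features.min(axis=0)
    if method == "mean":  # best-performing variant in Table 7
        return features.mean(axis=0)
    raise ValueError(f"unknown pooling method: {method}")

# Three hypothetical 4-dimensional feature vectors:
feats = np.array([[0.2, 0.9, 0.4, 0.1],
                  [0.6, 0.3, 0.5, 0.2],
                  [0.4, 0.6, 0.9, 0.3]])
print(pool_features(feats, "mean"))  # -> [0.4 0.6 0.6 0.2]
```

Mean pooling retains a contribution from every vector rather than only the extreme responses, which is consistent with its stronger accuracy and F1 in Table 7.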
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
