4.1. Data Collection and Preprocessing
Accident report data is acquired from the official website of China MSA (
https://www.msa.gov.cn accessed on 24 February 2026). There was a total of 312 ship-collision investigation reports. A total of 90 reports were manually annotated and used in
Section 3.2 (LeBERT entity recognition model enhanced by domain vocabularies); 292 reports (including the 90) were used to construct the ship-collision knowledge-triple dataset for knowledge injection; 20 reports (non-overlapping with the 292) were used in
Section 3.4 (K-BERT-based entity recognition model). All 312 reports were used for knowledge graph construction and graph-structure-based severity classification.
Based on the structural characteristics of ship-collision investigation reports, the data is categorized into semi-structured and unstructured components. For semi-structured data, which primarily involves tabular information regarding vessel and personnel characteristics, an automated extraction framework is implemented using the DeepSeek-V3 LLM accessed via its official API. Specifically, the temperature was set to 0 to enforce greedy decoding, thereby eliminating sampling randomness and reducing the risk of hallucinated content. Both the frequency penalty and presence penalty were set to 0, as information extraction tasks require the precise replication of source terminology without penalizing vocabulary repetition. A “five-step prompt strategy” is designed based on the features of semi-structured text data, as detailed in
Table 5, encompassing task requirement, data description, sample data, scenario information, and standard output. First, the task requirement prompt is utilized to clearly define the extraction objectives, data fields, and attribute values, providing precise instructions to the LLM. Second, the data description prompt explicitly describes the detailed raw text format, key–value relationships, and other metadata, establishing a unified template for entity fields, relationship types, and attribute units to enhance extraction stability. Then, the sample data prompt provides instances of raw semi-structured text using formatted separators to differentiate between entities and attributes, thereby eliminating potential parsing conflicts and ensuring accurate parsing by the LLM. Furthermore, domain-specific prior knowledge is injected through the scenario information prompt to enhance task comprehension, while historical dialogue context is maintained within the API message buffer to allow for the iterative optimization of prompting strategies through contextual feedback. Finally, the standard output prompt explicitly regulates the output format to ensure it meets structured storage requirements for subsequent knowledge graph construction or database integration. By activating the LLM’s recognition of data parsing and conversion rules through this strategy, structured data is automatically generated.
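To make the strategy concrete, the sketch below assembles the five prompt components into an OpenAI-compatible chat request of the kind DeepSeek’s API accepts. The function name, bracketed section labels, and sample field values are illustrative assumptions, not the paper’s exact prompts.

```python
# Sketch: the "five-step prompt strategy" assembled into a deterministic
# chat-completion request (temperature and penalties set to 0, as in the paper).
# Section labels and sample values are illustrative assumptions.

def build_five_step_prompt(task_requirement: str, data_description: str,
                           sample_data: str, scenario_info: str,
                           standard_output: str, raw_text: str) -> dict:
    """Combine the five prompt components plus the raw report text into
    a request body in the OpenAI-compatible format used by DeepSeek."""
    system = "\n\n".join([
        "[Task requirement] " + task_requirement,
        "[Data description] " + data_description,
        "[Sample data] " + sample_data,
        "[Scenario information] " + scenario_info,
        "[Standard output] " + standard_output,
    ])
    return {
        "model": "deepseek-chat",   # DeepSeek-V3 endpoint name (assumed)
        "temperature": 0,            # greedy decoding
        "frequency_penalty": 0,
        "presence_penalty": 0,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": raw_text},
        ],
    }

req = build_five_step_prompt(
    "Extract vessel attributes as entity-relation-entity triples.",
    "Input is semi-structured key-value text taken from PDF tables.",
    "Vessel Name ||| OMEGA",
    "Domain: ship-collision investigation reports (China MSA).",
    "Return CSV rows: entity,relation,entity.",
    "Vessel Name: OMEGA\nIMO Number: 1234567",
)
```

In an iterative session, previous turns would be appended to the `messages` list, matching the paper’s use of the API message buffer for contextual feedback.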
Vessel feature data is utilized as validation data, where attributes such as Vessel Name, Former Name, Vessel Type, Port of Registry, IMO Number, and Call Sign are stored in semi-structured text within PDF files. By applying the five-step prompt strategy to inject instructions and constraints into the LLM, vessel feature extraction and conversion are realized, automatically generating standardized “entity-relation-entity” CSV files. The resulting structured data for the vessel “OMEGA”, following the completion of this extraction task, is presented in
Table 6.
The preprocessing of unstructured text data includes entity and relationship annotation. During standardization and cleaning of the large volume of collected text, punctuation marks were uniformly normalized, English characters were converted to half-width form, and redundant information was removed. The BIO (Beginning, Inside, Outside) annotation scheme was employed to label entity sequences in the text. Specifically, “B” indicates the beginning token of an entity, “I” denotes the subsequent tokens within the same entity, and “O” represents tokens that do not belong to any entity. Based on this strategy, the entities in the water transportation domain, including accident, vessel, vessel feature, vessel dynamics, equipment, personnel, personnel feature, organization, time, location, environment, cause, consequence, laws and regulations, and recommendations, are annotated. After completing the entity annotation, the semantic relationships between entities are further annotated. The RE dataset is formed in the {sentence, entity 1, relationship, entity 2} format. The annotated data are used to train the automatic entity and relation recognition models in the following experiments.
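A minimal sketch of the BIO labelling step, assuming character-level entity spans given in (start, end, type) form; the span format is an illustrative assumption.

```python
# Sketch: convert character-level entity spans into per-character BIO tags.
# Entity type names follow the paper's schema; span format is assumed.

def to_bio(text: str, spans: list[tuple[int, int, str]]) -> list[str]:
    """Emit one BIO tag per character. Each span is (start, end, type)
    with `end` exclusive."""
    tags = ["O"] * len(text)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # remaining characters
    return tags

# Toy example: "OMEGA collided" with "OMEGA" labelled as a Vessel entity.
tags = to_bio("OMEGA collided", [(0, 5, "Vessel")])
```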
4.2. Entity Recognition Based on Domain Vocabulary Enhancement
The dataset constructed in this experiment is derived from 90 ship-collision accident reports and contains 8,556 text segments. About 74.6% are under 100 words, 23.8% are between 100 and 200 words, and only a small fraction exceeds 200 words. For the LeBERT experiments, this dataset was split 9:1 into training and test subsets for model development and evaluation.
Table 7 shows the parameter configurations for the various components of the LeBERT-BiLSTM-CRF model, which incorporates the domain vocabulary enhancement mechanism used in the experiment. With forward–backward concatenation, the BiLSTM produces an output whose dimension is two times the hidden size. The base version of the Chinese RoBERTa with Whole Word Masking-Extended (Chinese RoBERTa-wwm-ext), developed by Harbin Institute of Technology and iFLYTEK Research (HFL), is used in this experiment and has approximately 125 million parameters. To improve training efficiency, this pre-trained model was fine-tuned during training using a small learning rate to balance model performance and resource consumption.
Table 8 lists the hyperparameter settings used for training the LeBERT-BiLSTM-CRF model. Training used the Adam optimizer and minimized the CRF negative log-likelihood (NLL) as a sequence-level objective over the gold label sequences. Gradients were updated by backpropagation. The maximum sequence length was 512; the training/validation batch sizes were 32/16; the learning rates were 3 × 10⁻⁵ for the encoder and 3 × 10⁻³ for the CRF layer.
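The two learning rates are typically realized in PyTorch as optimizer parameter groups. The grouping logic is sketched below framework-free, keyed on parameter names; the names and the `crf.` prefix are illustrative assumptions.

```python
# Sketch: split model parameters into two optimizer groups with different
# learning rates (3e-5 for the encoder, 3e-3 for the CRF layer), mirroring
# torch.optim's [{'params': ..., 'lr': ...}] parameter-group format.
# Parameter names are illustrative assumptions.

def make_param_groups(named_params: dict, encoder_lr: float = 3e-5,
                      crf_lr: float = 3e-3) -> list[dict]:
    groups = [{"params": [], "lr": encoder_lr},   # LeBERT/BiLSTM encoder
              {"params": [], "lr": crf_lr}]       # CRF transition parameters
    for name, p in named_params.items():
        groups[1 if name.startswith("crf.") else 0]["params"].append(p)
    return groups

# Toy stand-ins for model.named_parameters()
params = {"bert.embeddings.weight": 1, "bilstm.weight_ih": 2,
          "crf.transitions": 3}
groups = make_param_groups(params)
```

With a real model, `groups` would be passed directly to `torch.optim.Adam(groups)`.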
The NER experiments were run under the Windows 11 operating system, with the PyTorch 2.1.0 deep learning framework and CUDA 12.4 for model construction and training. In order to verify the effectiveness of the proposed LeBERT-BiLSTM-CRF model enhanced by domain vocabulary information on ship collision NER, several typical models are selected for comparison experiments, including the classical CRF model, BiLSTM combined with the CRF model and pre-trained language models with multiple variants of the BiLSTM-CRF combination. All the models are trained and tested based on the ship collision accident dataset, and the experimental results are shown in
Table 9. It can be seen that the precision, recall and F1-score of the CRF model are 68.1%, 70.1% and 69.4%, respectively. The results show that solely relying on the sequence annotation mechanism makes it difficult to effectively capture the complex entity relationships and contextual semantic information in ship collision scenarios. After introducing the BiLSTM structure on top of the CRF, the precision of the model slightly improves to 69.7%. However, the recall decreases significantly to 63.0%, reducing the overall F1-score to 66.1%. Although the BiLSTM structure increases the complexity and parameter count of the model, it fails to effectively improve the model’s generalization in capturing domain features of ship collision accidents. After introducing the pre-trained language model BERT on top of the CRF, the model performance improves significantly, with precision, recall and F1-score reaching 72.8%, 72.1% and 71.9%, respectively, an improvement of about 3–6% over the CRF and BiLSTM-CRF models. This result reflects the clear advantages of the BERT pre-trained model in feature extraction and semantic understanding.
BERT-BiLSTM-CRF leverages BERT to deeply model linguistic features and contextual dependencies, and BiLSTM to better capture long-distance dependencies and sequential context, which effectively improves entity-boundary recognition and the overall performance of the model, with precision, recall, and F1-score reaching 80.7%, 79.4% and 80.1%, respectively. Furthermore, Chinese-BERT-WWM and RoBERTa were selected for comparison with BERT. The Chinese-BERT-WWM-BiLSTM-CRF model achieves 85.2%, 85.8% and 85.5% in precision, recall and F1-score, respectively, while the RoBERTa-BiLSTM-CRF model performs even better, exceeding 85.6% on every metric. This result indicates that pre-trained models with more refined masking strategies and training methods perform better on ship collision domain data. The LeBERT-BiLSTM-CRF model introduces a specially designed Lexicon Adapter structure to effectively fuse lexical information with character features. This model achieves the best performance, with precision, recall and F1-score reaching 86.3%, 87.5% and 86.8%, respectively, significantly better than the other models. This demonstrates that the incorporation of lexical information plays a key role in improving entity recognition performance in the field of ship collision accidents.
In order to more comprehensively evaluate the recognition performance of the LeBERT-BiLSTM-CRF model in the ship collision accident dataset, this paper further compares and analyzes the recognition accuracy of each model on different entity categories, as shown in
Table 10. The models relying only on CRF or BiLSTM-CRF for sequence annotation perform differently across entity types. The F1-score for “Vessel” is 89.7%, while the F1-scores for categories with sparse data or abstract semantics, such as “Environment” and “Recommendation”, are as low as 20–40%. This suggests that the basic models are unable to fully learn feature representations when dealing with data-scarce, semantically ambiguous, or context-dependent entities. With the introduction of pre-trained language models in BERT-CRF and BERT-BiLSTM-CRF, the overall recognition performance is significantly improved owing to richer semantic and representational capabilities. Especially on high-frequency categories such as “Vessel”, “Personnel”, and “Vessel Feature”, the F1-score improves to over 80%. Meanwhile, the recognition performance on some low-frequency categories, such as “Environment” and “Agency”, also improves, with F1-scores of about 50–70%. With the introduction of stronger pre-trained models such as Chinese-BERT-WWM and RoBERTa, model performance continues to improve. Especially on low-frequency entity categories such as “Recommendation” and “Event Cause”, the F1-score increases significantly to 44–75%, indicating that even with relatively scarce training samples, the models can still effectively capture semantic features and improve recognition accuracy. By introducing domain vocabulary into the RoBERTa pre-trained language model, recognition of low-frequency entity classes improves significantly. The F1-score for the least frequent entities, in the “Recommendation” category, is 57.5%, which is much higher than that of the other models. Likewise, for “Cause”, “Consequence”, “Equipment”, and “Environment”, the corresponding F1-scores increase significantly to more than 75%.
This result suggests that by incorporating external lexical knowledge enhancement strategies, the model was able to capture the boundaries of named entities more accurately, resulting in improved recognition of low-frequency categories and improved overall performance.
The ablation experiment aims to evaluate the contribution of individual modules to the overall performance of the model. In order to verify the actual effect of the pre-trained language module and lexical enhancement on model performance, ablation experiments are conducted with BERT-BiLSTM-CRF, RoBERTa-BiLSTM-CRF, and LeBERT-BiLSTM-CRF (using general-domain vocabulary information), as shown in
Table 11. BERT-BiLSTM-CRF, as a baseline model, used BERT-base-Chinese to generate dynamic word vectors at the embedding layer and achieved 80.7% precision, 79.4% recall, and an 80.1% F1-score, which verifies the efficacy of the BERT module in contextual semantic modelling. When the BERT module is replaced with RoBERTa in RoBERTa-BiLSTM-CRF, semantic modelling is significantly enhanced, with the F1-score improved from 80.1% to 85.6%. This may be related to RoBERTa’s whole-word masking strategy and larger-scale training corpus. The LeBERT-BiLSTM-CRF model is built on RoBERTa-BiLSTM-CRF. The results show that the F1-score improves from 85.6% to 86.3%, indicating that the lexical enhancement strategy compensates, to a certain extent, for the deficiency of relying only on character representations and can better determine entity boundaries and improve entity recognition performance. After replacing the generic vocabulary information with the ship collision accident domain vocabulary, performance improves to 87.5% recall and an 86.8% F1-score. This result shows that introducing domain vocabulary enhances the recognition of entities with fuzzy boundaries in ship collision accident texts, provides the model with more targeted semantic features to expand coverage of domain entities, and strengthens its capability to capture low-frequency and proprietary entities. The ablation experiments reveal that the NER method fusing vocabulary information has strong practicality for fine-grained entity recognition of ship collision accidents.
To further verify the robustness and generalization capability of the LeBERT-BiLSTM-CRF model incorporating domain vocabulary information in the NER task, a 5-fold cross-validation method was employed for assessment. To ensure the consistency and comparability of the evaluation, the model architecture and hyperparameter settings were maintained identically to the configurations specified in
Table 7 and
Table 8.
Table 12 presents the specific performance metrics across five independent experiments. The results indicate highly consistent performance under different data partitions, with mean precision, recall, and F1-score reaching 86.28%, 87.13%, and 86.70%, respectively. Notably, the standard deviation of the F1-score is as low as 0.78 percentage points; such minimal fluctuation demonstrates the strong robustness of the model. Furthermore, the highest F1-score of 87.63% was achieved in Fold 4, showcasing the model’s peak performance.
Table 13 details the classification performance for 15 distinct entity types. Under the rigorous testing of 5-fold validation, the model maintained high recognition accuracy for categories such as “Vessel” and “Laws and Regulations”. Even for sparse categories like “Recommendation” and “Cause,” the average performance remained robust due to the injection of domain-specific lexical information. In conclusion, the 5-fold cross-validation results not only confirm the outstanding robustness of the LeBERT-BiLSTM-CRF model but also prove its reliable generalization capability through sustained high-level performance across multiple unseen data subsets.
Beyond the empirical robustness demonstrated by the cross-validation, it is essential to clarify the minimum data requirement for this extraction pipeline. When fine-tuning high-parameter architectures such as RoBERTa and LeBERT, the corpus scale must be evaluated at the sentence level rather than the document level. As outlined in
Section 4.1, the manually annotated dataset used for this specific task was constructed from 90 representative collision reports, yielding a total of 8556 sentence-level text segments. The existing literature on domain-specific NER indicates that leveraging pre-trained language models substantially reduces the dependency on massive annotated datasets. Empirical evidence demonstrates that a high-quality, domain-specific corpus of a few thousand sentences is typically sufficient to effectively fine-tune these models and reach a performance plateau [
74]. Therefore, our dataset of 8556 text segments comfortably exceeds this functional threshold.
Furthermore, statistical analysis reveals that approximately 74.6% of the extracted text segments contain fewer than 100 characters, and 23.8% range between 100 and 200 characters. This predominance of short text segments aligns well with the 512-token sequence limit inherited from the RoBERTa architecture [
75,
76]. More importantly, confining the input segments within this optimal processing window effectively prevents context dilution, frequently described as the “lost in the middle” phenomenon. This is a prevalent issue where the attention mechanism’s efficacy severely degrades when processing overly long documents [
77]. Coupled with the explicit injection of domain lexicons acting as a strong inductive bias, this specific data scale and length distribution collectively ensure rapid convergence and mitigate the risk of overfitting in the proposed pipeline.
4.3. Analysis of Relationship Extraction
Using the same 90 reports as the domain-vocabulary-enhanced LeBERT-BiLSTM-CRF NER experiments, the RE dataset comprises 11,086 labeled triples, split 8:2 into 8866 training and 2220 validation instances. The BERT-MLP_rule model is adopted for RE in ship collision accidents. The configuration of the BERT-MLP_rule model used in this article is detailed in
Table 14. The pre-trained language model used was the base version of Chinese-RoBERTa-wwm-ext. The input to the multilayer perceptron module consisted of the concatenation of the entire sentence context vector output by BERT and the embedding representations of the two entities, resulting in an input dimension three times that of the BERT hidden layer. The MLP architecture included a single hidden layer with 128 neurons. Given the 38 relation types involved in the experiment, the number of nodes in the model’s output layer was set to 38.
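The dimensionality of this classification head can be sketched with random placeholder weights; the layer sizes follow the configuration above, while the ReLU activation is an assumption for illustration.

```python
import numpy as np

# Sketch of the BERT-MLP_rule head: the sentence context vector and the two
# entity embeddings (each of BERT hidden size 768) are concatenated into a
# 3*768-dimensional input, passed through one 128-unit hidden layer, and
# projected to the 38 relation types. Weights are random placeholders.

HIDDEN, N_REL = 768, 38
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3 * HIDDEN, 128)) * 0.01   # input -> hidden
W2 = rng.standard_normal((128, N_REL)) * 0.01        # hidden -> 38 relations

def relation_logits(cls_vec, ent1_vec, ent2_vec):
    x = np.concatenate([cls_vec, ent1_vec, ent2_vec])  # (2304,)
    h = np.maximum(x @ W1, 0.0)                        # ReLU hidden layer
    return h @ W2                                      # (38,) relation logits

logits = relation_logits(*(rng.standard_normal(HIDDEN) for _ in range(3)))
```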
The hyperparameter configuration is shown in
Table 15. The text length and entity length hyperparameters are used to standardize the data input. The Chinese RoBERTa-wwm-ext is employed to reduce training cost and improve convergence efficiency. During training, the model loss is calculated using the cross-entropy function, the MLP layer weights are updated by backpropagation, and the parameters are optimized using the Adam optimizer.
The performance of the model in recognizing each category of relationships is shown in
Figure 8. For categories with more sufficient training data, the model shows better performance, such as “of_PersonFeature” and “at_Time”, each with an F1-score of 0.98. For categories with fewer samples, it still performs relatively well, such as “rescue” and “occur”, with F1-scores of 0.75 and 0.73, respectively. This indicates that the model reliably recognizes categories with sufficient training data. Notably, even for the small-sample relationships “manipulate_of_NavigationStatus”, “on_of_EngineStatus”, and “at_of_EngineStatus”, the recognition accuracies remain high, at 98.5%, 98.4% and 98.1%, respectively. This may be because their semantic and contextual features are more distinctive, enabling the model to capture these relationships accurately even with limited data. The Chinese RoBERTa-wwm-ext model has strong contextual semantic capture capability, which helps RE on sparse data. In short, the model is able to efficiently identify and extract large-scale domain entity relationships in ship collision accidents.
To ensure the prediction quality and authenticity in the RE task, a quality control mechanism based on the correlation analysis between performance metrics and confidence scores was established. As illustrated in
Figure 9, the reliability of the large-scale extraction task is quantitatively evaluated by benchmarking the F1-score against the average confidence on the validation set. The experimental data reveal that as the training converges, the F1-score stabilizes above 0.93, while the peak average confidence reaches 0.987. This strong positive correlation demonstrates that the model possesses excellent self-calibration capability, and its confidence scores serve as a reliable benchmark for prediction veracity. Based on this validation, by utilizing high-confidence thresholds as an automated verification criterion, the quality of the subsequent knowledge graph construction can be effectively ensured, providing quantitative evidence for the authenticity of the knowledge graph.
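One way such a confidence threshold could serve as an automated verification criterion is sketched below; the threshold value, relation names, and tuple format are illustrative assumptions.

```python
# Sketch of the confidence-based quality gate: extracted triples whose
# predicted-class probability falls below a threshold are routed to manual
# review rather than written into the knowledge graph.

def gate_triples(triples_with_conf, threshold=0.95):
    """Split (head, relation, tail, confidence) tuples into accepted
    and to-review lists based on the confidence threshold."""
    accepted, review = [], []
    for h, r, t, conf in triples_with_conf:
        (accepted if conf >= threshold else review).append((h, r, t))
    return accepted, review

accepted, review = gate_triples([
    ("OMEGA", "of_VesselFeature", "IMO 1234567", 0.987),
    ("OMEGA", "occur", "collision", 0.62),
])
```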
4.4. Entity Recognition Based on K-BERT-BiLSTM-CRF
Knowledge injection uses a domain triple set extracted from 292 ship-collision investigation reports, including the 90 reports used to train the domain-vocabulary-enhanced LeBERT-BiLSTM-CRF model. Entities are recognized by the domain-vocabulary-enhanced LeBERT-BiLSTM-CRF model, and relations by the BERT-MLP_rule model. The 20-report dataset used to train/evaluate the K-BERT model is disjoint from this triple pool.
Table 16 shows the parameter configurations for each module of the K-BERT-BiLSTM-CRF model used in the actual experiment. Because the BiLSTM module concatenates the outputs of the forward and backward LSTM models, the hidden layer dimension of the module output is twice that of a single LSTM hidden layer. The base version of the Chinese RoBERTa-wwm-ext pre-trained model used in this experiment has approximately 125 million parameters. To improve training efficiency, the RoBERTa-wwm-ext pre-trained model was fine-tuned during training using a small learning rate to balance model performance and resource consumption.
Hyperparameter settings for the K-BERT-based entity recognition model are shown in
Table 17. The Adam optimizer was chosen to update the model parameters during the training process, and NLL was used as the loss function to measure the difference between the model output and the true labels. The training process was set with a maximum sequence length of 512, a training batch size of 16, a validation batch size of 8, a learning rate of 1 × 10⁻⁵ for the BERT layer, a learning rate of 1 × 10⁻³ for the CRF layer, and 100 epochs.
In order to verify the effectiveness of the proposed K-BERT-BiLSTM-CRF model, it is compared with BERT-BiLSTM-CRF, RoBERTa-BiLSTM-CRF, and LeBERT-BiLSTM-CRF. RoBERTa-BiLSTM-CRF improves on the BERT-BiLSTM-CRF architecture by replacing the Chinese-BERT-Base module with the Chinese RoBERTa with Whole Word Masking-Extended (Chinese RoBERTa-wwm-ext) as the pre-trained language model. LeBERT-BiLSTM-CRF introduces domain vocabulary information. The K-BERT-BiLSTM-CRF model takes the self-constructed ship collision accident knowledge graph as external knowledge and injects it into BERT for training. As shown in
Table 18, the K-BERT-BiLSTM-CRF model achieves the best performance, with precision, recall, and F1-score of 84.5%, 84.4%, and 84.7%, respectively. This indicates the effectiveness of introducing the domain knowledge graph for improving NER performance. BERT-BiLSTM-CRF, without a domain enhancement mechanism, has a relatively low F1-score of only 78.0%. By using Chinese RoBERTa-wwm-ext instead of BERT-base-Chinese as the encoder, the model performance is significantly improved, with an F1-score of 81.0%, suggesting that a stronger pre-trained language model helps improve performance. LeBERT-BiLSTM-CRF improves the F1-score to 83.5%, indicating that semantic enhancement has a positive impact on NER. The proposed K-BERT-BiLSTM-CRF model is better at recognizing domain terms and complex entities thanks to the injected domain knowledge of ship collision accidents. It not only retains the semantic modelling capability of Chinese RoBERTa-wwm-ext but also uses the knowledge triples injected by K-BERT to support context modelling. K-BERT-BiLSTM-CRF thus has significant advantages for NER in the field of water transportation.
4.5. Knowledge Graph of Ship Collision Accidents
For knowledge-graph construction, a validated extraction pipeline was adopted: LeBERT-BiLSTM-CRF (domain-vocabulary enhanced) for NER and Chinese RoBERTa-wwm-ext-MLP for RE, applied to all 312 ship-collision investigation reports. The resulting ship collision prevention and control knowledge graph contains 35,000 entities and 320,000 relationships. The distributions of entities and relationships of the ship collision prevention and control knowledge graph are shown in
Figure 10 and
Figure 11, respectively. In addition to common entity types such as time, personnel and location, the graph integrates entity types specific to ship collision incident knowledge, including ship dynamics, environment, and recommendations. This provides a comprehensive description of the evolution of ship collision incidents.
To further validate the effectiveness of the constructed knowledge graph, this paper compares the number of entities and relationships with other knowledge graphs in the water transportation domain, as shown in
Table 19. The constructed knowledge graph for ship collision prevention and control surpasses the existing knowledge graphs in the water transportation domain in terms of entity and relation types, entity and relation volume, and the data types considered. It supports both semi-structured and unstructured data, covering 38 relation types and 15 entity types, with the highest number of entities and relations. Through this fine-grained division of entity and relationship types, it can facilitate applications in different scenarios, for instance, association queries and analyses spanning the subject–space–time–behavior-driven accident evolution process, ship activity, and accident causation.
Knowledge graphs usually represent entities and their semantic relationships in the form of triples, which have good expressive capability for characterizing structured knowledge. However, the triple representation may lead to increased computational complexity in large-scale applications, limiting the operational efficiency of graphs. To address this problem, knowledge graph embedding techniques are used to map graph data into a real vector space, making it easier for intelligent algorithms (e.g., machine learning and deep learning) to utilize the information hidden in the graph data. This paper uses the t-distributed Stochastic Neighbor Embedding (t-SNE) method to reduce the dimension of the graph embeddings from 768 to a two-dimensional plane. The K-means clustering method is used to analyze and visualize the embedded graph nodes, as shown in
Figure 12. The clustering results align with the 15 types of entities in the ship collision prevention and control knowledge graph, indicating that the knowledge representation effectively encodes the semantic concepts of knowledge entities related to ship collision incidents. Furthermore, nodes such as “Accident,” “Vessel,” and “Vessel feature” formed distinct clusters, indicating that these nodes are similar in high-dimensional embedding space and maintain this similarity when reduced to two-dimensional space.
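The embed-reduce-cluster analysis can be illustrated with a toy K-means run on synthetic 2-D points standing in for the t-SNE-reduced node embeddings; the real pipeline clusters 768-D embeddings after t-SNE, and this sketch uses a deterministic initialization for reproducibility.

```python
import numpy as np

# Sketch: minimal K-means on 2-D points that stand in for the
# t-SNE-reduced knowledge-graph node embeddings. Data are synthetic.

def kmeans(points, k, iters=20):
    # deterministic init for the sketch: evenly spaced sample points
    centers = points[np.linspace(0, len(points) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # assign each point to its nearest center
        dists = ((points[:, None] - centers) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        # recompute centers as cluster means
        centers = np.stack([points[labels == j].mean(0) for j in range(k)])
    return labels

# Two well-separated synthetic "entity-type" clusters of 20 points each.
pts = np.vstack([np.random.default_rng(1).normal(0, 0.1, (20, 2)),
                 np.random.default_rng(2).normal(5, 0.1, (20, 2))])
labels = kmeans(pts, k=2)
```

With real data, the same call would run on the t-SNE output, and the resulting labels would be compared against the 15 entity types.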
The shortest path between the two ships involved in the accident can demonstrate the comprehensive ship collision accident knowledge contained in the constructed knowledge graph. The shortest-path query takes the following form (Cypher syntax):
MATCH p = shortestPath((a)-[*..n]-(b)) RETURN p
where p represents the matched query path, [*..n] constrains the path length to at most n, nodes a and b usually represent two different entities, and RETURN p returns the matched query path information. Taking the “Ansheng 22” ship and “Minshiyu 06256” ship collision accident as an example, the shortest path in
Figure 13 shows related entities such as the involved vessels, the companies of the involved vessels, vessel dynamics, accident locations, personnel on board, vessel equipment, accident causes, consequences of the accident, violated laws and regulations, and recommendations, as well as their relationships. This allows for a comprehensive association analysis of the entire process of a vessel collision incident.
Maritime supervisors can quickly obtain the overall overview of ship collision accidents through the shortest path query. Based on the subject–space–time–behavior analysis of the spatial and temporal process of the accident,
Figure 14 shows the ship dynamics of the ZTE 2 (ship name) during the collision accident and can support the ship collision situation awareness combined with the relevant AIS information. As shown in
Figure 15, the knowledge graph also supports queries of ship inspection activities, so that maritime supervisors can check whether a ship has violated mandatory inspection requirements. The ship collision prevention and control knowledge graph, combined with NLP technology, can improve the accuracy of NER in the field of water traffic accidents, support more efficient analysis of water traffic accidents, and provide knowledge support for intelligent transportation decision-making.
4.6. Classification of Accidents Based on the Constructed Knowledge Graph
Accident severity can be determined according to the relevant standards of the Maritime Safety Administration of the Ministry of Transport of China. The classification criteria are mainly based on factors such as injuries and deaths, economic losses, and the degree of environmental pollution caused by the accident. Although current maritime regulations clearly define accident severity levels, the existing classification mechanism relies mainly on manually recorded accident consequences, making it difficult to satisfy data-driven automatic identification of accident severity levels in practical application scenarios such as emergency response, risk warning, and supervisory assistance. A classification model based on a knowledge graph can quickly complete NER upon receiving the initial accident text and then realize intelligent identification of the accident severity level with good real-time performance, accuracy, and scalability.
The LSTM deep learning classification model is employed to predict the severity of injuries in ship collisions based on a knowledge graph. By introducing the topological information of the knowledge graph, the model is able to capture the complex relationship between the accident features so as to improve the accuracy. The topological features include the number of nodes, the sum of in-degrees, the sum of out-degrees, betweenness centrality, and closeness centrality. The Z-score normalization method was employed to address the differences in numerical magnitude between different topological features. To meet the model input requirements, the accident severity category variable was encoded using one-hot encoding with 0 (minor accident), 1 (ordinary accident), 2 (relatively serious accident), and 3 (serious accident). The Synthetic Minority Oversampling Technique (SMOTE) is an oversampling method designed to address class imbalance issues. It generates new synthetic samples by interpolating between existing minority class samples, thereby enhancing the learning effectiveness and classification performance of the classification model. The experiment is completed on a Core i9 CPU system equipped with 32GB RAM, and the stochastic gradient descent optimization algorithm is used for parameter updating. The hyperparameters are shown in
Table 20.
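The feature preprocessing described above can be sketched as follows; the topological feature values are illustrative, and SMOTE oversampling (e.g., via the imblearn package) would be applied afterwards and is omitted here.

```python
import numpy as np

# Sketch: Z-score normalization of the graph-topology features and one-hot
# encoding of the four severity levels. Feature values are illustrative.

def zscore(X):
    """Normalize each feature column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def one_hot(labels, n_classes=4):
    """Encode integer severity labels 0-3 as one-hot rows."""
    out = np.zeros((len(labels), n_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

# Columns: node count, in-degree sum, out-degree sum,
# betweenness centrality, closeness centrality (toy values).
X = np.array([[12, 30, 28, 0.10, 0.45],
              [40, 95, 90, 0.32, 0.61],
              [25, 60, 55, 0.21, 0.52]], dtype=float)
Xn = zscore(X)
y = one_hot([0, 3, 2])   # minor, serious, relatively serious
```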
This paper uses LSTM-RNN to classify the water transportation accident data and compare it with MLP, Random Forest (RF), and XGBoost models.
Figure 16 shows the loss changes in the LSTM-RNN model during the training process. After 200 rounds of iterative training, the LSTM-RNN model already had a good performance. In the first 50 rounds of iterative training, the loss decreased slowly. The loss values decreased rapidly in the 50–125 rounds of iterative training, gradually slowed down and eventually stabilized after about 125 rounds of iterative training.
Figure 17 shows the trend in the accuracy of the LSTM-RNN model over the 200 training iterations. During the first 50 iterations, accuracy increased slowly; between iterations 50 and 125, it rose rapidly; after approximately 125 iterations, the improvement on both the training and test sets gradually slowed and eventually stabilized.
Figure 18,
Figure 19,
Figure 20 and
Figure 21 show the confusion matrices of the LSTM-RNN, MLP, XGBoost, and RF models on the test set. The LSTM-RNN model demonstrated the best overall performance in recognizing the four accident types. Its prediction accuracy for ordinary accidents was the highest, with 41 samples correctly identified and only 3 misclassified into other categories. For relatively serious accidents, 23 samples were correctly predicted, with only 1 misclassified. Of the 18 serious accident samples, 16 were correctly classified, with only 2 misclassified into other categories. Additionally, 16 minor accidents were accurately classified. The XGBoost model performed well in identifying relatively serious and serious accidents: its confusion matrix shows that it correctly recognized all 24 relatively serious accident samples and made 17 correct predictions for serious accidents, with only 1 misclassification. However, its performance was slightly weaker on minor and ordinary accidents, with 4 and 6 misclassifications, respectively. The random forest model also achieved 100% accuracy in identifying relatively serious accidents, but its performance on ordinary accidents was slightly inferior to that of the LSTM-RNN (35 correct, 9 misclassified). For minor accidents, 15 samples were correctly classified. Notably, the RF model misclassified 8 serious accidents as relatively serious, indicating a relatively weak ability to distinguish between adjacent severity levels. The MLP model performed worst among the four, especially in identifying ordinary and serious accidents, where its accuracy was relatively low: only 29 ordinary accidents were correctly predicted, with as many as 15 misclassified, and for serious accidents, 15 were correctly identified and 3 misclassified.
While the MLP model achieved some effectiveness in recognizing relatively serious accidents (21 samples) and minor accidents (16 samples), its overall stability was insufficient, and it struggled to classify multi-category accident data accurately.
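For reference, a confusion matrix of the kind read off in the figures above can be computed as follows; the labels here are illustrative, not the actual test-set predictions:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=4):
    """Rows are true severity classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Illustrative labels (0 = minor, 1 = ordinary, 2 = relatively serious, 3 = serious).
y_true = [1, 1, 1, 2, 3, 0, 3]
y_pred = [1, 1, 2, 2, 3, 0, 1]
cm = confusion_matrix(y_true, y_pred)
# Diagonal entries count correct predictions; off-diagonal entries count
# samples misclassified into other severity levels.
```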
In order to further evaluate the comprehensive performance of different models in water traffic accident severity classification, this paper introduces four indicators, namely, accuracy, precision, recall and F1-score, to quantitatively analyze the prediction performance, as shown in
Table 21. Except for the LSTM-RNN, none of the models exceeds 90% overall accuracy in accident severity classification. The LSTM-RNN model performs best, with an accuracy of 92.31%, significantly higher than the other models, whereas the MLP model performs weakly at 77.08%; the XGBoost and RF models reach an intermediate level between 80% and 90%. In terms of precision, the LSTM-RNN model again leads at 91.84%, indicating high reliability in its predictions and an ability to effectively reduce false alarms. XGBoost and RF achieve 89.12% and 82.40%, respectively, while the MLP model reaches only 79.00%. As for recall, the LSTM-RNN still leads at 91.70%, followed by XGBoost (89.65%), while the RF and MLP models stand at 79.61% and 81.84%, respectively, suggesting that both are somewhat deficient in recognition completeness. The F1-score, as the harmonic mean of precision and recall, is an important indicator of a model's comprehensive performance. The LSTM-RNN model's F1-score reaches 91.73%, well above the other models, indicating that it achieves a good balance between accuracy and generalization ability and possesses stronger predictive capability.
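The four indicators in Table 21 can be reproduced from predicted labels as in the following sketch; macro averaging over the four classes is an assumption here, since the paper reports single precision, recall, and F1 values per model:

```python
import numpy as np

def macro_scores(y_true, y_pred, n_classes=4):
    """Return accuracy plus macro-averaged precision, recall, and F1-score."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    acc = float(np.mean(y_true == y_pred))
    prec, rec, f1 = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0  # precision for class c
        r = tp / (tp + fn) if tp + fn else 0.0  # recall for class c
        prec.append(p)
        rec.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)  # harmonic mean
    return acc, float(np.mean(prec)), float(np.mean(rec)), float(np.mean(f1))
```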
According to the model performance curves shown in
Figure 22 and
Figure 23, a more in-depth analysis can be conducted of the discriminative capability and accuracy of the different models. From the macro-averaged ROC curves, both the LSTM-RNN and XGBoost models achieve an AUC of 0.98, demonstrating the strongest classification ability; these two models attain high true positive rates and low false positive rates when classifying waterway traffic accident severity. The RF model yields an AUC of 0.97, slightly lower but still excellent. By comparison, the MLP model's AUC of 0.95 is lower than that of the other models, suggesting a relatively weaker capability to distinguish between categories. The macro-averaged PR curves further show that LSTM-RNN and XGBoost also lead with a PR-AUC of 0.94, indicating that both maintain high prediction precision while preserving recall, which suits real-world tasks demanding accurate recognition. The PR-AUC of RF is 0.90, with good overall stability, while that of the MLP model is only 0.86, showing weaker performance on imbalanced data and a greater tendency toward missed detections or misjudgments. Based on both the ROC and PR metrics, the LSTM-RNN and XGBoost models exhibit superior overall discriminative capability, predictive accuracy, and stability.
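A macro-averaged one-vs-rest ROC AUC of the kind reported above can be computed from class-probability outputs. The sketch below uses the rank-sum (Mann–Whitney) formulation of AUC; the equal-weight averaging over classes is an assumption, as the paper does not state its averaging scheme:

```python
import numpy as np

def binary_auc(y_true, scores):
    """ROC AUC for one binary problem via the rank-sum statistic."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    correct = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return correct / (len(pos) * len(neg))

def macro_roc_auc(y_true, proba, n_classes=4):
    """One-vs-rest AUC per severity class, averaged with equal class weight."""
    y_true = np.asarray(y_true)
    aucs = [binary_auc((y_true == c).astype(int), proba[:, c])
            for c in range(n_classes)]
    return float(np.mean(aucs))
```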
Considering the results from the confusion matrices, accuracy, precision, recall, F1-score, and the ROC and PR curves, the graph-feature-driven LSTM-RNN model leverages its strong sequence-modelling capability and robustness for multi-class, imbalanced accident severity classification. Compared with traditional methods that rely solely on text vectors or statistical features for accident severity prediction, the ship collision prevention and control knowledge graph approach extracts entities and relationships from accident reports and forms a semantic structure [
67,
84]. The knowledge graph has complex network topology features. To evaluate the effect of incorporating knowledge graph topological features on accident severity prediction, this study systematically compares the performance of the LSTM-RNN model under various feature combinations. As shown in
Table 22, the experimental results indicate that, using only the Nodes characteristic as input, the model achieves an accuracy of 83.65% and an F1-score of 84.63%. The incremental introduction of individual topological components led to consistent performance gains: combining the Nodes characteristic with Betweenness centrality, Degree centrality, and Closeness centrality improved the F1-score to 87.66%, 89.56%, and 89.65%, respectively. Notably, Closeness centrality provided the largest boost to accuracy, reaching 90.38%, suggesting that the global proximity of entities within the knowledge graph is a critical factor in discriminating accident severity levels. Finally, when the Nodes characteristic and the full set of topological features are jointly input, the accuracy peaks at 92.31% and the F1-score improves to 91.73%. This validates the significant advantage of the knowledge graph's topological structure and demonstrates a clear synergistic effect among the topological dimensions in enhancing accident severity classification performance.
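As a sketch of how the Nodes characteristic and topological features could be extracted from an accident knowledge graph, the following uses the networkx library on a toy directed graph with hypothetical entity names; the per-graph aggregation (mean centrality over nodes) is an assumption, as the paper does not specify how node-level centralities are pooled:

```python
import networkx as nx

def graph_features(edges):
    """Return the five topological features used as classifier input:
    node count, in-degree sum, out-degree sum, mean betweenness
    centrality, and mean closeness centrality."""
    G = nx.DiGraph(edges)
    n = G.number_of_nodes()
    in_sum = sum(d for _, d in G.in_degree())
    out_sum = sum(d for _, d in G.out_degree())
    betw = sum(nx.betweenness_centrality(G).values()) / n
    clos = sum(nx.closeness_centrality(G).values()) / n
    return [n, in_sum, out_sum, betw, clos]

# Toy accident graph; entity names are illustrative, not from the dataset.
feats = graph_features([("VesselA", "Collision"),
                        ("VesselB", "Collision"),
                        ("Collision", "HullDamage")])
```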
Although the optimized model achieved high accuracy on the test set, 6 samples were misclassified. To investigate the underlying causes of these errors, two representative misclassified cases (Case 74 and Case 91) were selected from the test set for qualitative analysis. This analysis is strictly based on the core features extracted from the knowledge graphs, namely the Nodes characteristic (representing the scale of entity nodes) and the Topological features (including degree centrality, betweenness centrality, and closeness centrality).
The analysis in
Table 23 shows that the primary source of error is the nonlinear mapping bias between the graph’s structural representation (Nodes characteristic and topological features) and the actual accident severity. Overestimation bias occurs when a “Relatively serious” accident has a complex narrative structure, leading to an inflated Nodes characteristic and betweenness centrality. This structural “busyness” misleads the model into predicting a higher severity level. Conversely, underestimation bias occurs when a “Serious” accident involves few entities and a simple causal chain, resulting in a restricted Nodes characteristic along with low degree centrality and closeness centrality. This structural “simplicity” masks the true severity of the accident’s consequences.
These rare edge cases highlight the inherent limits of relying exclusively on pure topological structure for classification. However, even without any additional semantic weighting, the current LSTM-RNN model classified the vast majority (over 91%) of real-world test samples correctly. This robust performance validates that decoding unstructured accident texts into quantifiable knowledge graph topological features is a highly effective paradigm for accident severity assessment, one that captures the complex physical and causal networks underlying marine accidents. While mitigating the mapping bias in extreme long-tail samples, where structural complexity and semantic severity mismatch, would require integrating deep semantic embeddings, the current pure-structure framework has already demonstrated clear superiority over the traditional baseline models. It serves as a robust, scalable, and accurate tool for practical maritime traffic safety management.