1. Introduction
Maritime transportation serves as the backbone of global trade, orchestrating the movement of the vast majority of international merchandise [1]. The maritime industry encompasses a wide range of vessels, from colossal container ships to modest fishing vessels, each navigating the complex network of seafaring routes. Nevertheless, collisions between commercial and fishing vessels represent a particularly acute challenge within this maritime milieu. These incidents are not merely isolated events but expose significant vulnerabilities in existing maritime safety protocols and navigation practices. Therefore, understanding the dynamics and underlying causes of such collisions requires more than routine analysis; it calls for a systematic approach that mitigates future risks, enhances maritime safety management, and supports the safe and sustainable coexistence of coastal navigation and fishing activities.
Fishing activity also plays a vital role in global food supply and economic development. However, commercial fishing remains one of the most hazardous occupations worldwide, with high fatality rates reported in the fisheries sector [2]. The safety risks associated with fishing vessels are often linked to relatively limited structural protection, unsafe crew practices, and difficulties in implementing and enforcing safety regulations [3]. As a result, fishing vessel accidents frequently lead to serious consequences, including property loss and casualties [4].
The contrast between commercial and fishing vessels is pronounced, as each adheres to distinct operational paradigms. Commercial vessels are generally designed for transport efficiency and operate along prescribed shipping routes, whereas fishing vessels often display flexible and irregular navigational patterns shaped by fishing operations and working domains [5]. These differences increase encounter complexity and collision risk in shared coastal waters [5,6]. Fishing vessels are also generally more vulnerable in accidents: accident severity has been shown to be associated with vessel size, vessel age, and weather-related conditions such as wind speed [7,8].
The convergence of these disparate operational modes within shared maritime spaces creates favorable conditions for accidents, further exacerbated by variable weather, uneven access to navigational technology, and human error.
Collisions between commercial and fishing vessels can result in severe consequences, including vessel damage, sinking, and serious threats to crew safety. The resulting casualties remain a major concern within the maritime industry, as reflected in global statistics. According to the International Labour Organization, the annual mortality rate for fishermen worldwide is approximately 80 per 100,000 individuals [7]. A substantial proportion of these casualties arises from ship collisions, particularly those involving commercial and fishing vessels [9].
The complexity inherent in maritime collision scenarios, particularly those involving commercial and fishing vessels, poses significant challenges to conventional analytical methodologies. Traditional linear or deterministic models, while informative, often fail to capture the intricate interplay among dynamic and multi-dimensional factors. This limitation highlights the need for more advanced and flexible approaches capable of analyzing the multifactorial nature of maritime accidents.
Collision accidents between merchant and fishing vessels refer to traffic accidents that occur on water, involving vessels in the transportation industry and those engaged in fishing operations [6]. Collisions between commercial and fishing vessels typically arise from a combination of human, vessel-related, and environmental factors [10]. The complexity and unpredictability of these elements make experimental replication of accidents impractical, creating major challenges for causal analysis. Moreover, the influence of each factor can vary considerably, while maritime accident data often exhibit randomness, multi-dimensionality, and ambiguity. Recent Bayesian analyses of fishing vessel accidents [11] have revealed that vessel characteristics, human errors, and environmental conditions exert heterogeneous influences on accident severity, underscoring the effectiveness of probabilistic modeling in maritime casualty assessment. Therefore, employing robust probabilistic models to identify key associated factors and support more targeted risk mitigation strategies is essential.
China’s rapid economic growth has intensified maritime activity, positioning the country as a major commercial and fishing nation. Its coastal waters host both dense shipping routes and rich fishing grounds, where operational overlaps are common. The intersection of commercial navigation lanes and traditional fishing areas amplifies environmental complexity and substantially heightens the risk of vessel collisions.
To address these challenges, this study employs a Tree-Augmented Bayesian Network (TAN-BN) to characterize the probabilistic dependency patterns among vessel, human liability, temporal, and environmental factors in a complex socio-technical setting. Compared with conventional factor-ranking methods, Bayesian networks are better suited to maritime accident analysis because they can capture uncertainty, interdependence, and conditional relationships among vessel, human, and environmental factors. Methods such as DEMATEL and HFACS are useful for identifying and ranking influential factors [
12], and CASMET provides a structured framework for casualty analysis [
13]; however, these approaches are less capable of representing conditional probabilistic relationships and conducting inference across multiple interacting variables under uncertainty. Moreover, TAN-BN was adopted because it relaxes the strong conditional independence assumption of naive Bayes while remaining more parsimonious and interpretable than a fully unrestricted Bayesian network. This makes Bayesian networks particularly suitable for casualty analysis in commercial–fishing vessel collisions, where severe outcomes usually emerge from combinations of factors and where probabilistic inference is needed to interpret scenario-specific risk configurations. Although the empirical analysis is restricted to China’s coastal waters, the analytical framework may provide a useful reference for similar report-based maritime safety studies. However, its applicability to other regions or transport contexts would require separate validation with external data.
While TAN structures and mutual information analysis have been widely used in risk studies, their application in this study is directed at a more specific question that has received limited attention in the existing literature, namely how casualty severity emerges in collisions between commercial and fishing vessels. Most previous studies have concentrated on collision occurrence, navigational risk, or individual accident cases, whereas this study focuses on casualty outcomes and the interplay among vessel, environmental, temporal, and human liability factors behind them. In addition, the analysis is based on a database of 137 official accident reports rather than AIS trajectories or a small number of representative cases, which makes it possible to incorporate contextual information such as defect conditions, liability attribution, and accident circumstances. The contribution of this study is therefore analytical and empirical rather than methodological: it uses an established but appropriate probabilistic framework to generate new evidence on casualty severity patterns in a specific and underexamined collision context. More specifically, the study contributes by (1) positioning casualty severity, rather than collision occurrence alone, as the analytical target; (2) focusing on the asymmetric socio-technical setting of commercial–fishing vessel collisions; and (3) revealing interpretable dependency patterns from structured official investigation reports that are not readily observable from descriptive summaries or occurrence-oriented studies alone. The rest of this paper is organized as follows:
Section 2 presents a comprehensive literature review, summarizing current research on collisions between commercial and fishing vessels, data sources and their applications, and key influencing factors.
Section 3 outlines the research methodology.
Section 4 describes the model development process, including data acquisition, factor selection, and construction of the TAN-BN model.
Section 5 presents and discusses the findings, beginning with an analysis of the TAN-BN results, followed by a sensitivity analysis based on the mutual information (MI) index and a comparison of factor influences under different casualty scenarios using the Most Probable Explanation (MPE) approach.
Finally, Section 6 concludes the study and summarizes the main insights.
2. Literature Review
2.1. Collision Between Commercial and Fishing Vessels
Existing studies relevant to maritime collisions and casualty outcomes can be broadly grouped into four streams: (1) descriptive and case-based analyses of commercial–fishing vessel collisions, (2) statistical and econometric studies of accident severity and casualty outcomes, (3) Bayesian-network-based studies of maritime accident mechanisms, and (4) human and organizational factor studies. Although these streams have each contributed useful insights, they differ in analytical focus, data basis, and explanatory power. More importantly, they do not address equally well the specific question examined in this study, namely how casualty severity is associated with interacting socio-technical factors in collisions between commercial and fishing vessels.
A first stream consists of descriptive, legal, and regional studies of collisions between commercial and fishing vessels. These studies are valuable because they document the operational tensions that arise when large, route-following commercial ships interact with smaller, task-oriented, and often less protected fishing vessels in shared waters [5,6,9,14,15,16,17,18,19,20,21,22,23]. They have improved understanding of accident characteristics, local navigational conditions, and region-specific operational problems. However, much of this work remains descriptive, geographically localized, or based on individual cases. As a result, although such studies are useful for identifying recurrent practical issues, they are less effective in clarifying whether the same factors remain important across a broader set of collision events and different coastal settings. In other words, this literature is strong in contextual detail but weaker in cross-case synthesis and comparative explanation.
A second stream addresses accident severity and casualty outcomes more broadly in maritime transportation. These studies provide a more systematic basis for understanding accident consequences by showing that severity is associated with interacting vessel, environmental, operational, and crew-related factors [7,8,24,25]. For example, Wang et al. [24] found that accident severity is related to accident type, weather and sea conditions, ship type, ship age, and crew-related deficiencies, while Wang et al. [25] showed that accident severity depends not only on accident circumstances but also on vessel characteristics, water depth, distance from shore, and seafarer-related factors. Zhou et al. [26] further highlighted the value of modeling casualty outcomes explicitly through a Bayesian-network-based framework. This body of work is important because it shifts attention from accident occurrence alone to the consequences of accidents. However, most of these studies examine maritime accidents in general rather than the specific collision context of commercial–fishing vessel encounters. As a result, they do not fully resolve whether the determinants of casualty severity in this highly asymmetric collision class differ from those identified in broader maritime accident datasets.
A third stream focuses on probabilistic and Bayesian approaches to maritime safety analysis [27,28,29,30,31]. These studies have shown that Bayesian methods are well suited to accident analysis because they can represent uncertainty, conditional dependence, and interactions among vessel, environmental, and human-related factors. Foundational applications emphasized systemic and organizational risk factors [27], while later studies expanded toward accident-type analysis, accident severity analysis, and data-driven quantification of maritime risk [28,29,30,31]. This stream demonstrates clear methodological value, especially for situations in which accident outcomes do not arise from isolated factors but from combinations of conditions. Nevertheless, most existing Bayesian studies either address maritime accidents at a broad level, focus on occurrence risk rather than casualty severity, or use data sources that do not preserve the same degree of contextual detail as official investigation reports. Thus, while Bayesian methods are clearly established in maritime safety research, their use has less often been directed toward casualty severity in commercial–fishing vessel collisions using multi-case official report data with explicit contextual coding.
A fourth stream examines human and organizational factors in maritime accidents [18,32,33,34,35]. This literature has consistently shown that unsafe acts, organizational deficiencies, communication failures, and deficiencies in decision-making or watchkeeping are central to accident development. Review studies such as Wu et al. [32] demonstrate that human and organizational factors have become a major direction in maritime safety research, while HFACS-based and -related empirical studies [18,33,34,35] provide more structured interpretations of accident pathways. This stream is especially useful in explaining how accidents develop. However, it also has two limitations in relation to the present study. First, many such studies focus on causal pathways or responsibility structures rather than on casualty severity as the main analytical outcome. Second, human-factor information in accident reports is often rich narratively but difficult to code in a fully standardized way across many cases, which complicates direct cross-case quantitative modeling. Accordingly, this literature is highly relevant to interpretation, but less often provides an integrated probabilistic account of how human-related factors combine with vessel, environmental, and temporal conditions to shape casualty severity outcomes.
Taken together, the existing literature has substantially improved understanding of maritime collisions, accident causation, severity, and organizational risk. However, it remains uneven in relation to the specific problem addressed here. First, many studies on commercial–fishing vessel collisions are geographically narrow or case-based, which limits the generalizability of their conclusions. Second, a substantial portion of the literature focuses on collision occurrence, legal responsibility, or qualitative causal analysis rather than on casualty severity as the primary analytical target. Third, even where probabilistic approaches are used, relatively few studies combine multiple official investigation reports with an interpretable analytical framework that preserves contextual information on vessel characteristics, environmental exposure, accident circumstances, and liability attribution. The present study is positioned in relation to these limitations. It does not claim methodological novelty; rather, it applies an established TAN-BN framework to examine casualty severity in a specific and underexamined collision context using structured information extracted from official accident investigation reports.
Commercial–fishing vessel collisions have attracted sustained attention because they often occur in operationally complex waters where dense traffic routes intersect with fishing grounds. Existing studies show that this collision type is characterized by asymmetry in vessel size, maneuverability, operating purpose, and navigational behavior, which creates a distinctive socio-technical risk environment.
Several studies have examined the geographic distribution and contextual features of such accidents. Zhang et al. [14] identified a high incidence of collisions involving commercial and fishing vessels in global accident statistics, while Przywarty et al. [15] and Oh et al. [16] analyzed fishing vessel collisions in the Polish Exclusive Economic Zone and Korean waters, respectively. These studies improve understanding of accident characteristics in particular regions, but they are primarily descriptive and do not fully resolve how casualty severity is shaped by interacting factors across multiple cases.
Other studies have adopted broader data-driven approaches. Jiang et al. [
31] developed a Bayesian network model using a large global maritime accident database and found that location, ship type, age, gross tonnage, and deadweight tonnage are important risk-influencing factors. Li et al. [
17] also used a TAN-BN framework at the global scale and showed that ship operation, voyage segment, ship speed, sea conditions, wind, ship type, and human factors strongly affect collision-related risk. More recently, Zhou et al. [
26] explicitly addressed maritime casualty analysis from a global perspective, further demonstrating the value of probabilistic modeling for consequence-oriented accident analysis. However, these studies remain broad in accident scope and do not focus specifically on casualty severity in commercial–fishing vessel collisions.
Human and organizational analyses provide another important perspective. Wang et al. [
35] examined 443 historical merchant–fishing vessel collision accidents in China using an HFACS-BN framework and identified 56 human and organizational factors, including several factors specific to this collision type. Ma et al. [
18] similarly used HFACS, DEMATEL, and fuzzy cognitive mapping to identify key human-related contributors such as lookout negligence, poor safety management practices, and failure to take effective avoidance action. These studies strengthen understanding of human and organizational pathways, but they focus mainly on causal structures or accident development paths rather than on casualty severity outcomes.
Case-based and region-specific studies remain valuable because they reveal operational detail that large databases may overlook. Tang et al. [19], Zhang [20], Su [21], Wu et al. [22], Yang et al. [23], and Yi [9] provide insights into particular accidents, local navigational conditions, and legal or managerial issues in China’s coastal waters. However, conclusions from single accidents or limited sea areas are often difficult to generalize because marine conditions, traffic composition, and fishing practices differ substantially across regions.
With respect to data sources, AIS-based studies can capture trajectories, encounter geometry, and collision location, but they often lack direct information on weather conditions, vessel deficiencies, organizational context, and responsibility attribution. This is particularly problematic in commercial–fishing vessel collisions because smaller fishing vessels may have incomplete AIS coverage or inconsistent AIS usage. By contrast, official accident investigation reports provide richer contextual information, including accident sequence, vessel condition, weather and sea conditions, and liability assessment. Recent work has also shown the value of insurance claim data and narrative accident records for severity analysis in fishing vessel safety research [
36,
37,
38].
Overall, the literature confirms that commercial–fishing vessel collisions are an important but still insufficiently synthesized research area. Existing studies have improved understanding of accident occurrence, human and organizational contributors, and regional collision patterns. Yet three shortcomings remain: (1) many studies are geographically narrow or case-based; (2) casualty severity is less frequently modeled as the primary outcome; and (3) few studies combine multi-report official accident data with an interpretable probabilistic framework specifically tailored to commercial–fishing vessel collisions. More specifically, the gap addressed in this study is not the absence of probabilistic approaches in maritime safety or accident severity research. Rather, it lies in the comparatively limited quantitative analysis of casualty severity in commercial–fishing vessel collisions based on multiple official investigation reports that preserve contextual information on vessel characteristics, environmental exposure, accident circumstances, and liability attribution. This gap is important because collisions between commercial and fishing vessels occur within a highly asymmetric socio-technical setting: the two vessel types differ substantially in size, maneuverability, operational purpose, and structural vulnerability, and these differences may shape casualty outcomes in ways not fully captured by broader maritime accident datasets. Accordingly, the present study focuses specifically on casualty severity in this collision context and applies a Tree-Augmented Bayesian Network to examine the associated probabilistic dependency patterns.
2.2. Selection of Influencing Factors
The international literature suggests that casualty severity in maritime accidents is shaped by interacting environmental, vessel-related, temporal, and human/organizational factors. Accordingly, the variables selected in this study are intended to represent not all possible contributors, but rather the key socio-technical dimensions that can be coded consistently from official accident reports and that are well supported by prior research.
Environmental conditions are among the most widely documented contributors to maritime risk and severity. Studies of marine accident severity have shown that wind, sea state, current, and visibility affect both accident development and accident consequences by altering maneuverability, perception, and response capacity [
24,
25]. In the fishing vessel literature, Rezaee et al. [
37] and Zhang et al. [
39] found that harsh weather conditions increase incident rates relative to fishing activity levels, especially for smaller vessels. Recent Bayesian severity studies likewise confirm that visibility, wind, and location are important contextual variables in consequence-oriented accident analysis. Therefore, wind speed [
37,
40], visibility [
28,
40,
41], and location [
30,
41] are retained in this study as core environmental indicators.
Human and organizational factors remain central in the maritime safety literature [42,43]. Broad reviews and empirical studies consistently show that accident development often involves deficiencies in voyage preparation (e.g., inadequate voyage planning) [30,38,44], the judgment and decision-making phase (e.g., poor hazard judgment, improper lookout) [42,43,45], navigation maneuvering (e.g., failure to maintain a safe speed, delayed action, incorrect vessel handling) [42,46], and the management process [30,43,45]. In merchant–fishing vessel collisions specifically, HFACS-based studies also indicate that unsafe acts, inadequate organizational processes, and communication failures play major roles. In the present study, human factors are operationalized through a liability-based variable derived from official accident reports. This is necessarily a simplified representation, but it remains useful because responsibility attribution in investigation reports captures which side’s operational errors were judged to be dominant in the accident sequence.
Vessel characteristics are also important because they influence both exposure and vulnerability. Previous studies have reported associations between severity outcomes and vessel size, age, defects, and technical condition. For example, Wang et al. [
24] found that ship type and age were significant in marine accident severity, while Uğurlu et al. [
38] showed that fishing vessel accidents are significantly related to vessel length and age. In commercial–fishing vessel collisions, the asymmetry between large commercial vessels and relatively small fishing vessels is especially relevant because it affects impact consequences, survivability, and post-collision rescue conditions. On this basis, the present study includes commercial vessel ship length [
29,
47], ship gross tonnage [
29,
48,
49], ship age [
47,
48,
50], and ship defects [
2,
30,
51].
Temporal and contextual variables are also supported by prior research. Time of day [
8,
52,
53] and season [
8,
47,
52] may shape traffic density, visibility, fatigue, fishing behavior, and weather exposure. Prior studies have shown that nighttime operations, seasonal conditions, and adverse weather frequently co-occur with severe maritime outcomes, while recent fishing vessel research also suggests that the effects of such variables may vary across operational contexts and incident types. Therefore, time and season are retained as auxiliary contextual factors that may help explain how casualty patterns differ across accident scenarios.
In summary, the factor set used in this study reflects a synthesis of the broader international literature and the practical constraints of accident report coding. It combines environmental exposure, vessel vulnerability, temporal context, and human liability information into one operational framework. This choice is consistent with current maritime safety research, which increasingly views accident outcomes as emergent properties of socio-technical systems rather than as the result of isolated single factors.
While the preceding review positions the study within the broader state of the art, the purpose of Table 1 is narrower: it summarizes the factor-level literature used to support variable selection for the empirical model.
Based on the above literature, the gap addressed in this study is not that probabilistic approaches have never been used in maritime safety or accident severity research. Rather, the gap lies in the comparatively limited quantitative analysis of casualty severity in commercial–fishing vessel collisions using multiple official investigation reports that preserve contextual information on vessel characteristics, environmental exposure, accident circumstances, and liability attribution. This gap is relevant because collisions between commercial and fishing vessels occur in a highly asymmetric socio-technical setting: the two vessel types differ substantially in size, maneuverability, operational purpose, and vulnerability, which may shape casualty outcomes differently from those observed in broader maritime accident datasets. Accordingly, the contribution of the present study is not the introduction of a new Bayesian technique, but the generation of interpretable empirical evidence on how casualty severity is associated with interacting vessel, environmental, temporal, and liability-related factors in this specific collision context. The present study therefore focuses specifically on casualty severity in this collision context and applies a Tree-Augmented Bayesian Network to examine the associated probabilistic dependency patterns.
3. Materials and Methods
The data used in this study are drawn from accident reports issued by the China Maritime Safety Administration. These reports provide detailed records, including weather conditions at the time of the accident and determinations of human liability. Using these data, we apply data mining techniques to construct a TAN-BN model. The model is then subjected to sensitivity analysis, and the MPE is used to infer plausible causes of casualties. Further methodological details are provided in the remainder of Section 3.
3.1. TAN-BN Structure Construction
For the analysis of risk-influencing factors (RIFs), two common approaches can be used to learn a BN structure. The first approach relies on expert knowledge to specify causal relationships. The second approach is data-driven and infers dependencies among RIFs using structure-learning algorithms within the BN framework [
54]. This study developed the BN structure using the latter data-driven approach.
First, raw maritime accident reports were manually coded, yielding a database comprising 137 reports involving 274 ships. The resulting database provides an empirical basis for data-driven risk analysis, although the sample size remains moderate relative to the dimensionality of the final discretized TAN-BN. Following Friedman et al. [
55], we adopted a Tree-Augmented Bayesian Network as a parsimonious and interpretable probabilistic structure for analyzing dependency patterns among accident-related factors. TAN captures strong attribute dependencies by connecting them in a tree, thereby improving classification performance over standard naive Bayes without incurring prohibitive computational complexity.
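Formally, a BN encodes a joint probability distribution over a set of random variables $U$ represented by an annotated directed acyclic graph (DAG). Let $U = \{X_1, \ldots, X_n, C\}$, where $n$ is the number of RIFs, $X_1, \ldots, X_n$ denote the RIF variables, and $C$ is the class variable (accident type).
Consider a graph structure where the class variable is the root, i.e., $\Pi_C = \emptyset$ ($\Pi_C$ represents the set of the parent elements of $C$ in $U$), and every RIF has the class variable as its only parent, i.e., $\Pi_{X_i} = \{C\}$ for $1 \le i \le n$. A BN defines the unique joint probability distribution over $U$ as expressed by the following formula:
$$P(X_1, \ldots, X_n, C) = P(C) \prod_{i=1}^{n} P(X_i \mid \Pi_{X_i}),$$
which for the structure above reduces to $P(C) \prod_{i=1}^{n} P(X_i \mid C)$.
The DAG on $\{X_1, \ldots, X_n\}$ is a tree if $\Pi_{X_i}$ contains exactly one parent for every $X_i$ except for one variable with no parent (called the root). A function $\pi : \{1, \ldots, n\} \to \{0, \ldots, n\}$ defines a tree on $X_1, \ldots, X_n$ if there is exactly one $i$ such that $\pi(i) = 0$ (i.e., the root of the tree), and there is no sequence $i_1, \ldots, i_k$ such that $\pi(i_j) = i_{j+1}$ for $1 \le j < k$ and $\pi(i_k) = i_1$ (no cycle). Such a function defines a tree network where $\Pi_{X_i} = \{X_{\pi(i)}\}$ if $\pi(i) > 0$, and $\Pi_{X_i} = \emptyset$ if $\pi(i) = 0$. In a TAN, every RIF additionally retains the class variable $C$ as a parent.
Learning the TAN structure is an optimization problem, which is solved using the conditional mutual information between attributes [56]. This function is defined as
$$I_P(X_i; X_j \mid C) = \sum_{k, l, m} P(x_{ik}, x_{jl}, c_m) \log \frac{P(x_{ik}, x_{jl} \mid c_m)}{P(x_{ik} \mid c_m)\, P(x_{jl} \mid c_m)},$$
where $I_P$ indicates conditional mutual information, $x_{ik}$ is the $k$th state of RIF $X_i$, $x_{jl}$ is the $l$th state of RIF $X_j$, and $c_m$ is the $m$th state of accident type. Learning the TAN structure therefore amounts to finding a tree-defining function $\pi$ over $X_1, \ldots, X_n$ that maximizes the log-likelihood.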
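The structure-learning step can be illustrated with a minimal Python sketch. It assumes the standard construction of Friedman et al., in which conditional mutual information is estimated from empirical frequencies and a maximum weighted spanning tree is built over the attributes. The toy records, the attribute names (`visibility`, `wind`, `season`, `severity`), and the function names are hypothetical and are not taken from the study's database or software.

```python
from collections import Counter
from itertools import combinations
from math import log


def conditional_mutual_information(records, i, j, c):
    """Estimate I(X_i; X_j | C) from empirical frequencies (natural log)."""
    n = len(records)
    n_ijc = Counter((r[i], r[j], r[c]) for r in records)
    n_ic = Counter((r[i], r[c]) for r in records)
    n_jc = Counter((r[j], r[c]) for r in records)
    n_c = Counter(r[c] for r in records)
    total = 0.0
    for (xi, xj, ck), count in n_ijc.items():
        # P(xi, xj | c) / (P(xi | c) P(xj | c)) written in terms of counts
        ratio = count * n_c[ck] / (n_ic[(xi, ck)] * n_jc[(xj, ck)])
        total += (count / n) * log(ratio)
    return total


def learn_tan_tree(records, attributes, c):
    """Return the tree parent of each attribute (None for the root).

    Prim-style maximum weighted spanning tree with conditional mutual
    information as edge weight; edges are directed away from the root.
    """
    weights = {
        frozenset(pair): conditional_mutual_information(records, pair[0], pair[1], c)
        for pair in combinations(attributes, 2)
    }
    parent = {attributes[0]: None}  # the first attribute is taken as root
    while len(parent) < len(attributes):
        u, v = max(
            ((u, v) for u in parent for v in attributes if v not in parent),
            key=lambda edge: weights[frozenset(edge)],
        )
        parent[v] = u
    return parent


# Hypothetical toy data: wind is fully determined by visibility within each
# severity class, while season is independent of both.
records = [
    {"visibility": v, "wind": w, "season": s, "severity": c}
    for c in ("minor", "serious")
    for (v, w) in (("good", "calm"), ("poor", "strong"))
    for s in ("summer", "winter")
]
attributes = ["visibility", "wind", "season"]
tree = learn_tan_tree(records, attributes, "severity")
print(tree)
```

In the resulting TAN, each attribute's parent set is the class variable plus its tree parent, if it has one. The choice of root only affects edge directions, not which attribute pairs are connected.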
3.2. Expectation–Maximization Parameter Learning
After constructing the Bayesian network, the next step is parameter learning. For each node, a conditional probability distribution must be estimated. These distributions can be learned from compiled data or specified from the empirical knowledge of experts. However, expert knowledge may be imprecise, biased, and subjective, which can reduce the accuracy of the resulting model. Therefore, in this study, the conditional probability distributions were learned from the accident report data.
The Expectation–Maximization (EM) algorithm is a classical method for maximum-likelihood estimation with incomplete or latent data, first formalized by Dempster et al. [
55]. It iteratively estimates the probability parameters in the Bayesian network, enabling the model to maximize the likelihood function based on the observed data.
The EM algorithm consists of two main steps:
E-Step: The posterior probability of the hidden variables is calculated for each sample based on the current parameter estimate. Given the observed data $Y$ and the current parameter $\theta^{(t)}$, calculate the expectation of the log-likelihood function for the complete data $(Y, Z)$, where $Z$ denotes the hidden variables. The joint probability distribution is $P(Y, Z \mid \theta)$; the conditional probability distribution is $P(Z \mid Y, \theta^{(t)})$. For this purpose, the expectation of the log-likelihood function is defined as
$$Q(\theta, \theta^{(t)}) = \sum_{Z} P(Z \mid Y, \theta^{(t)}) \log P(Y, Z \mid \theta).$$
M-Step: The posterior probability of the hidden variables calculated in the E-step is used to update the parameter estimate. For each node, the parameter values are re-estimated from these posterior probabilities using maximum-likelihood estimation or Bayesian estimation, the latter adding a prior distribution for regularization. Maximizing the expectation of the log-likelihood function under complete data means selecting the parameter $\theta^{(t+1)}$ that satisfies
$$\theta^{(t+1)} = \arg\max_{\theta} Q(\theta, \theta^{(t)}).$$
Compared with gradient learning and count learning, the EM algorithm has broader applicability, especially when the data are incomplete or contain unobserved values. Although the EM algorithm converges relatively slowly, it is widely used in practical applications and has been reported to give more accurate results [57]. Therefore, this paper uses the EM parameter learning method.
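The two EM steps can be made concrete with a small sketch. It assumes a hypothetical two-node binary network C -> X in which `None` marks a missing value; the function name, the starting parameters, and the records are invented for illustration and are unrelated to the accident dataset. The E-step spreads each incomplete record over its possible completions in proportion to the current parameters, and the M-step renormalizes the resulting expected counts.

```python
def em_two_node(records, n_iter=50):
    """EM for a binary network C -> X; records are (c, x) pairs, None = missing."""
    p_c = [0.5, 0.5]                          # P(C = c)
    p_x_given_c = [[0.6, 0.4], [0.4, 0.6]]    # P(X = x | C = c), indexed [c][x]
    for _ in range(n_iter):
        # E-step: expected counts of every completion (c, x) of every record
        counts = [[0.0, 0.0], [0.0, 0.0]]
        for c_obs, x_obs in records:
            weights = {}
            for c in (0, 1):
                for x in (0, 1):
                    consistent = (c_obs is None or c_obs == c) and (
                        x_obs is None or x_obs == x
                    )
                    if consistent:
                        weights[(c, x)] = p_c[c] * p_x_given_c[c][x]
            total = sum(weights.values())
            for (c, x), w in weights.items():
                counts[c][x] += w / total     # posterior share of this completion
        # M-step: maximum-likelihood update from the expected counts
        n = sum(sum(row) for row in counts)
        p_c = [sum(counts[c]) / n for c in (0, 1)]
        p_x_given_c = [
            [counts[c][x] / sum(counts[c]) for x in (0, 1)] for c in (0, 1)
        ]
    return p_c, p_x_given_c


# Hypothetical records: some have X missing, one has C missing.
records = (
    [(0, 0)] * 3 + [(0, 1)] + [(1, 1)] * 4 + [(0, None)] * 2 + [(None, 0)]
)
p_c, p_x_given_c = em_two_node(records)
print(p_c, p_x_given_c)
```

This sketch uses plain maximum-likelihood updates; adding pseudo-counts to `counts` before the M-step would correspond to the Bayesian (prior-regularized) variant mentioned in the text.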
3.3. Mutual Information
In Bayesian networks, MI measures the statistical dependence between two variables. It quantifies the reduction in uncertainty of one variable when the value of another variable is known, indicating the degree of information sharing between them. The mutual information index is calculated as follows:

$$I(X; Y) = H(X) - H(X \mid Y) = \sum_{x,\, y} P(x, y) \log \frac{P(x, y)}{P(x)\, P(y)}$$

where $H$ represents entropy, and $P$ represents the probability of a random event in a specific state space. Given the two entropy terms, $I(X; Y)$ can be obtained once the probabilities of the random events are known.
The MI index therefore compares the entropy of the target variable before and after the causative event is observed. The greater the value of MI, the stronger the association between the accident outcome and the causative event and the higher the degree of information sharing. When the MI is 0, the two variables are statistically independent.
MI in Bayesian networks can be used to measure the association between variables, infer conditional independence, learn network structure, and select features. Interpreting mutual information results helps clarify the relationships between variables in a Bayesian network and supports probabilistic inference, prediction, and decision-making [58].
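As a small worked illustration of the MI index, the sketch below computes mutual information in bits from a joint probability table, using the "reduction in uncertainty" form H(X) - H(X | Y). The function names and the two example tables are assumptions made for illustration only.

```python
from math import log2


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


def mutual_information(joint):
    """I(X; Y) in bits, where joint[x][y] = P(X = x, Y = y)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(column) for column in zip(*joint)]
    h_xy = entropy([p for row in joint for p in row])
    h_x_given_y = h_xy - entropy(p_y)      # chain rule: H(X|Y) = H(X,Y) - H(Y)
    return entropy(p_x) - h_x_given_y      # reduction in uncertainty about X


independent = [[0.25, 0.25], [0.25, 0.25]]   # X and Y share no information
identical = [[0.5, 0.0], [0.0, 0.5]]         # knowing Y fully determines X
print(mutual_information(independent), mutual_information(identical))
```

The two tables bracket the possible range for binary variables: independent variables give an MI of zero, while fully dependent variables give an MI equal to the entropy of either variable.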
3.4. Most Probable Explanation
To examine the connections between the relevant nodes in the BN and identify the most likely states within the nodes, the BN model can provide the MPE given the number of casualties caused by the accident as evidence.
MPE is defined as follows: given a set of evidence variables E, find the maximum-probability assignment of O = X \ E. Formally, given E = e, calculate $o^{*} = \arg\max_{o} P(O = o \mid E = e)$, where O = X \ E denotes the variables in the variable set X other than the evidence variables E.
The goal of MPE inference is to determine the values of the unobserved variables that maximize their joint conditional probability given the observation. It aims to find the most likely explanation, which is the combination of unobserved variable values with the highest probability for a given observation [59].
An optimal estimate of the unobserved variable can be obtained by calculating the MPE. This estimation is useful for scenario-based probabilistic interpretation and decision support.
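MPE inference can be sketched by brute-force enumeration on a tiny network. The example assumes a hypothetical structure in which an outcome node is the parent of two factor nodes; all variable names, states, and probabilities are invented for illustration and do not come from the study. Because P(O | E = e) is proportional to P(O, e), it is enough to maximize the joint probability over the assignments that agree with the evidence.

```python
from itertools import product

# Hypothetical toy network: outcome -> factor_a, outcome -> factor_b
states = {
    "outcome": ["low", "high"],
    "factor_a": ["a1", "a2"],
    "factor_b": ["b1", "b2"],
}
p_outcome = {"low": 0.8, "high": 0.2}
p_factor_a = {"low": {"a1": 0.7, "a2": 0.3}, "high": {"a1": 0.4, "a2": 0.6}}
p_factor_b = {"low": {"b1": 0.55, "b2": 0.45}, "high": {"b1": 0.1, "b2": 0.9}}


def joint(assignment):
    """Joint probability of a full assignment under the toy network."""
    outcome = assignment["outcome"]
    return (
        p_outcome[outcome]
        * p_factor_a[outcome][assignment["factor_a"]]
        * p_factor_b[outcome][assignment["factor_b"]]
    )


def mpe(evidence):
    """Return the most probable full assignment consistent with the evidence."""
    free = [name for name in states if name not in evidence]
    best, best_p = None, -1.0
    for values in product(*(states[name] for name in free)):
        candidate = {**evidence, **dict(zip(free, values))}
        p = joint(candidate)
        if p > best_p:
            best, best_p = candidate, p
    return best, best_p


print(mpe({"outcome": "high"}))
```

Enumeration grows exponentially with the number of unobserved variables, so practical Bayesian network tools use max-product style propagation instead; the enumeration is shown only because it makes the definition explicit.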
3.5. Research Workflow
To improve the transparency and reproducibility of the study, the overall data-processing and analytical procedure is summarized in Table 2. The table outlines the main steps from accident report collection and case screening to variable coding, model construction, parameter learning, and subsequent probabilistic analyses.
The workflow summarized in Table 2 provides an overview of how the raw accident reports were transformed into a structured analytical dataset and subsequently used for TAN-BN modeling and inference. The main methodological components are described in detail in the following section.
5. Results and Discussion
5.1. Marginal Probability Distribution of Key Variables
Using a TAN-BN model developed from 137 investigation reports on collisions between commercial and fishing vessels, this study identifies the key socio-technical factors associated with casualty severity, examines their marginal probability distributions and relative statistical associations, and further characterizes the typical risk scenarios and factor configurations underlying different casualty outcomes through MPE inference. In this way, the analysis provides new empirical evidence on how casualty severity is structured in this specific collision class, rather than merely applying a Bayesian model to reproduce already well-established patterns.
Figure 3 presents the marginal probability distribution of each variable node in the TAN-BN model, calculated using EM-based parameter learning on the dataset. These values represent the overall likelihood of each state for a given factor, independent of any observed evidence or conditional settings. The model indicates that, among collisions between commercial and fishing vessels of general grade or above in China’s coastal waters, the probability of an accident resulting in 1 to 3 casualties is the highest at 54%, while the probability of more than 10 casualties is relatively low at 2.92% (in both categories, each additional three serious injuries are counted as one casualty).
- (1) Nodal analysis of hull factors
According to the marginal distributions obtained from the TAN-BN model, 10.9% of commercial vessels involved in the dataset were more than 20 years old, and 47.4% had a gross tonnage exceeding 3000 tons. The proportion of commercial vessels with documented technical or structural defects was 6.57%. In contrast, among fishing vessels involved in collisions, 23.4% were over 20 years old and 59.8% had a length greater than 24 m. The proportion of defective fishing vessels was 8.03%. These marginal probabilities describe the overall distribution of vessel characteristics in the dataset, independent of casualty outcomes or other conditional influences.
- (2) Nodal analysis of environmental factors
The probabilistic profile of the model reveals that 72.2% of collision cases occurred during nighttime hours, indicating a strong temporal bias toward non-daylight operations. Additionally, 69.3% of the accidents took place under moderate wind conditions (Beaufort 4–6), and 68.6% occurred during good visibility. These results suggest that many collisions between commercial and fishing vessels occur under typical operating conditions rather than extreme weather or poor visibility.
Seasonally, only 15.3% of collisions were associated with summer months, which may reflect the influence of fishing moratorium policies that reduce vessel density during this period. However, these proportions are derived from the model’s marginal distributions and do not, by themselves, imply causal relationships with casualty severity.
- (3) Nodal analysis of human liability factors
The accident reports indicate that both commercial and fishing vessel crews were involved in different forms of human-related error across the recorded collision events, including deficiencies in lookout, judgment, maneuvering, and voyage management. However, because such details are not described in a sufficiently standardized form across all reports, the present study does not model them as separate variables. Instead, it uses a liability-based categorical indicator to denote whether primary human-related responsibility in the official report was attributed mainly to the commercial vessel, the fishing vessel, or both parties.
Based on the marginal probabilities derived from the TAN-BN model, commercial vessels were identified as the primarily liable party in 50.4% of cases, fishing vessels in 26.3% of cases, and both parties in 23.3% of cases. These probabilities should not be interpreted as a complete characterization of human causation. Rather, they describe the distribution of reported liability configurations within the dataset and provide a simplified way to retain one human-related dimension in the probabilistic analysis.
5.2. Sensitivity Analysis Based on Mutual Information
To assess the relative statistical association of each input variable with casualty outcomes, a sensitivity analysis was conducted based on the MI index. Within the Bayesian network framework, MI measures the reduction in uncertainty of the target variable (casualties) given knowledge of a predictor variable’s state. This metric offers a theoretically grounded and model-consistent approach for assessing the relative importance of contributing factors in the probabilistic inference process.
Figure 4 illustrates the MI values between each risk factor and the casualty outcome node. The node sizes are scaled proportionally to the magnitude of shared information. Among all variables, the length of the fishing vessel (F Length) shows the strongest association with casualty severity, exhibiting the highest MI score (0.322), followed by time of day, wind speed, visibility, and season. These five variables represent the most significant contributors to the variation in casualty levels.
A closer examination of the results reveals that vessel-related characteristics of fishing vessels (length, age, and structural integrity) showed consistently stronger associations with casualty outcomes than the corresponding attributes of commercial vessels. This suggests that, in collision scenarios, fishing vessels constitute the more vulnerable party, a pattern consistent with structural limitations exacerbating the consequences of collision impact. The model captures this asymmetry by allocating greater informational weight to fishing vessel-related variables.
In contrast, the liability-based human-factor variable exhibits a relatively low mutual information value in the marginal sense. This should not be interpreted as evidence that human factors are unimportant in maritime collisions. Rather, it reflects the fact that the present model represents the human dimension only through a simplified liability-based proxy and not through a richer set of directly coded human and organizational variables. The result therefore suggests that the marginal explanatory role of the liability variable is limited within this coding framework, while broader human-factor mechanisms may still be embedded indirectly in other contextual or operational patterns represented in the model.
Overall, the MI-based sensitivity analysis underscores the structural vulnerability and environmental exposure of fishing vessels as primary determinants of casualty severity in commercial–fishing vessel collisions. Within the present dataset, these findings suggest that risk mitigation strategies in China’s mixed commercial–fishing coastal waters should focus not only on navigational behavior but also on improving the design, robustness, and protective measures of small fishing vessels operating in complex traffic environments. From a practical perspective, the MI results suggest that safety interventions should prioritize the factors most strongly associated with casualty severity, especially the structural vulnerability of fishing vessels and the heightened exposure associated with nighttime operation, poor visibility, and adverse wind conditions.
5.3. Analysis of External Environmental Influences Under the MPE Model
To examine which combinations of factor states are most strongly associated with different levels of collision severity within the learned network, MPE inference was applied within the TAN-BN model. MPE identifies the combination of variable states that maximizes the joint probability of the entire network under a given evidence condition—in this case, a fixed number of casualties.
Two scenarios were selected for comparative analysis:
- (1) Low-severity outcome (C1): no casualties and fewer than three serious injuries;
- (2) High-severity outcome (C4): more than ten casualties.
The resulting most probable configurations for each scenario are illustrated in Figure 5 and Figure 6, with a summary provided in Table 7. Key patterns identified from the results are summarized as follows.
These scenario-specific risk profiles indicate systematic probabilistic patterns across meteorological, temporal, spatial, and operational dimensions. They show how different reported conditions and liability configurations are associated with different casualty outcomes within the dataset, although they should not be interpreted as direct evidence of causal mechanisms.
- (1) Wind Conditions: In both low- and high-casualty scenarios, wind speeds predominantly fall within Beaufort 3–6 (gentle to strong breeze). Notably, severe winds (Beaufort ≥ 7) are rarely observed in the MPE configurations, suggesting that operational restrictions commonly imposed on both commercial and fishing vessels during extreme weather may reduce exposure to hazardous conditions. Thus, although moderate winds are not extreme, they may still pose dynamic challenges for vessel control and response time, particularly in confined waters.
- (2) Visibility: A clear divergence emerges with respect to visibility. In low-severity incidents (no casualties and fewer than three serious injuries), good visibility (≥5 nautical miles) is the most likely condition. In contrast, high-casualty events (more than ten casualties) are strongly associated with poor visibility. This pattern underscores visibility as a critical risk amplifier: even in the presence of human error, adequate visual conditions may allow for last-minute evasive actions, whereas reduced visibility severely impairs situational awareness and collision avoidance, significantly increasing casualty risk.
- (3) Time of day: Nighttime is a dominant factor in both configurations, indicating a persistent vulnerability during dark hours. This aligns with known challenges such as reduced visual detection capability, circadian fatigue, and diminished crew alertness. Its consistent presence, particularly in high-severity cases, reinforces the role of nighttime as a key compounding factor, especially when combined with poor visibility or complex traffic environments.
- (4) Seasonality: The seasonal distribution reveals a notable divergence: low-casualty incidents are most probable in autumn, whereas high-casualty events peak in spring. Both seasons are periods of intensified fishing activity along China’s coast, but spring may entail additional risks due to transitional weather, increased vessel traffic density from seasonal fisheries, and potential mismatches in crew readiness or vessel preparedness after winter layup. This indicates that severe collisions are influenced not only by traffic intensity but also by seasonal operational stressors.
- (5) Location: Spatial context plays a decisive role in outcome severity. Collisions resulting in no or minimal casualties are most likely to occur in designated shipping lanes or regulated waterways, where navigational rules are enforced, traffic flow is structured, and vessel behavior is more predictable. Conversely, open waters are the predominant location for fatal collisions. These areas typically lack formal traffic separation schemes, suffer from weaker surveillance and enforcement, and involve greater uncertainty in fishing vessel movements. Together, these factors reduce collision avoidance effectiveness and increase the likelihood of catastrophic outcomes.
These MPE-based scenario patterns indicate that maritime safety management may benefit from more targeted, scenario-specific interventions, such as enhanced monitoring and warning measures during nighttime, low-visibility, and open-water operations.
5.4. Impact Analysis of Human Liability Under the MPE Model
The MPE-based analysis not only reveals the influence of human liability on casualty levels but also facilitates further exploration of the environmental context in which each party’s errors are most likely to occur.
Figure 7 and Figure 8 present the MPE outcomes under two conditions: HE1 (commercial vessels as the primary liable party) and HE2 (fishing vessels as the primary liable party).
When human error is attributed to crews of commercial vessels (HE1), the MPE model, conditioned on HE1 as the primary liability factor, reveals a highly consistent scenario characterized by winter conditions (S4 = 100%), nighttime operations (T2 = 100%), good visibility (≥5 nautical miles = 100%), moderate-to-strong winds (WS2 = 100%), and navigation within traffic-regulated shipping lanes (L1 = 100%). This configuration is associated with large-scale casualties (C3 = 100%). In other words, when commercial vessels are assigned primary liability, the most probable configuration pairs clear but dark and windy winter nights in congested channels with severe casualty outcomes. This pattern may reflect the combined effects of high vessel momentum, constrained maneuvering space, and reduced crew vigilance under heavy operational workload.
In contrast, when fishing vessel crews are identified as primarily liable (HE2), the MPE solution (conditioned on HE2) shifts markedly toward less severe outcomes: no casualties (C1 = 100%) or at most 1–3 casualties (C2 = 66.8%). These incidents occur under similar environmental and spatial conditions—winter, night, good visibility, and regulated shipping lanes—but are associated with calmer-to-moderate wind conditions (WS1). This suggests that, although fishing vessels operate in the same high-density, clear-night winter environments, their lower mass, reduced speed, and typically more cautious operational profiles contribute to mitigated casualty severity.
Collectively, these liability-specific risk profiles indicate that casualty outcomes are not determined solely by vessel type or the attribution of fault, but rather by the interaction of human error with seasonal, temporal, meteorological, and spatial factors. Commercial vessels emerge as “high-risk agents” when operational errors coincide with winter darkness and elevated wind conditions in narrow, high-traffic corridors. In contrast, fishing vessels, even when committing errors in comparable settings, rarely result in large-scale loss events, underscoring the moderating role of vessel dynamics and operational behavior in accident consequences. The liability-specific results further suggest that prevention measures should be differentiated across vessel types, with particular attention to bridge watchkeeping, lookout management, and collision avoidance practices on commercial vessels operating in mixed-traffic waters.
5.5. Discussions
The directed relationships identified by the TAN-BN model represent learned dependency patterns among key factors, allowing for a clearer interpretation of how these variables are statistically associated within the dataset. Several directional dependencies merit particular attention. First, the learned link between “season” and “human liability” indicates that the distribution of liability attribution varies across seasonal contexts in the observed cases. This pattern may be related to seasonal differences in weather, visibility, traffic conditions, and operational context, although the present model does not establish these factors as direct causal mechanisms of human error. In addition, seasonal contexts may coincide with different navigational conditions, such as ice formation, dense fog, or intensified fishing activities, further compounding operational risks.
Second, the learned dependency between “wind speed” and “fishing vessel length” indicates that smaller fishing vessels are more frequently represented in higher-wind accident contexts within the dataset. This pattern may be consistent with the greater operational vulnerability often reported for smaller vessels under adverse conditions. Third, the learned dependency between “season” and “visibility” indicates that visibility conditions vary across seasonal settings in the observed cases. This pattern is consistent with broader seasonal variation in meteorological conditions, although the present model does not identify a direct causal pathway from season to collision severity. Collectively, these learned relationships provide an interpretive description of how environmental and operational variables are statistically patterned in the accident dataset.
The sensitivity analysis based on the mutual information (MI) index further identified five factors showing relatively strong associations with casualty severity, namely fishing vessel length, time of day, wind speed, visibility, and season, all of which correspond to prominent arcs in the TAN-BN network. Among these, fishing vessel length demonstrated the strongest mutual information with casualty outcomes, indicating that it is the variable most strongly associated with casualty severity in the present model. One possible interpretation is that smaller fishing vessels are more frequently associated with severe casualty outcomes, a pattern that is consistent with lower structural protection and survivability in collision contexts, although the present analysis does not test these mechanisms directly. This association may also be related to the greater exposure of smaller vessels to adverse operating conditions such as strong winds and rough seas. This interpretation is also broadly consistent with the learned dependency between wind speed and fishing vessel length in the network.
Time of day also shows a relatively strong association with collision severity, which is consistent with prior research [7, 52]. In the present dataset, more severe outcomes are more frequently linked to nighttime conditions, a pattern that may be related to fatigue, reduced vigilance, or greater operational complexity during night navigation. This pattern may also be consistent with the greater operational difficulty of maintaining situational awareness and navigation discipline during nighttime operations, although the model itself does not isolate these possible explanations directly.
Consistent with earlier studies [40, 61], wind speed emerged as a critical factor affecting collision outcomes. This association may be related to the greater handling difficulty often reported under strong-wind and rough-sea conditions, especially for smaller fishing vessels. Similarly, visibility demonstrated a high MI value, corroborating findings from previous research [37, 61]. This pattern may be consistent with the reduced detectability of approaching vessels under low-visibility conditions, although the present analysis does not isolate this operational mechanism directly.
The seasonal association observed in this study may also reflect broader traffic dynamics along China’s coastline, a pattern that is consistent with earlier studies [46, 52]. One possible interpretation is that seasonal variation in fishing activity and vessel traffic density may help explain why casualty patterns differ across the year. In this sense, season should be understood as a contextual variable that co-varies with other operational and environmental conditions, rather than as an independently identified causal factor.
Regarding human liability, the analysis suggests that its association with casualty severity varies across environmental and operational contexts. Commercial-vessel-liability configurations are more frequently associated with severe casualty outcomes when they coincide with winter darkness, high winds, or congested waterways. Conversely, fishing-vessel-liability configurations under similar conditions are more often associated with less severe outcomes. This asymmetry suggests that differences in vessel characteristics and operational contexts may help explain the observed variation in casualty severity. Although the MI value for human liability was not among the most prominent, this may result from the structural definition of nodes within the model. Even after disaggregating human factors into multiple variables, the same five dominant factors persisted. The relatively low MI for human liability may be related to its dependence on broader seasonal and environmental contexts, suggesting that its association with casualty severity is not expressed as a uniformly strong marginal pattern.
The practical implications of this study should be interpreted directly in relation to the probabilistic patterns identified by the TAN-BN model. First, because fishing vessel length showed the strongest association with casualty severity in the mutual information analysis, the results suggest that casualty-reduction efforts should give particular attention to the structural vulnerability of fishing vessels, especially smaller or less protected vessels. In practical terms, this points to the importance of strengthening survivability-oriented safety management for fishing vessels operating in mixed-traffic waters, including closer attention to lifesaving equipment, hull condition, emergency preparedness, and safety inspection.
Second, because the model indicates that more severe casualty outcomes are more strongly associated with nighttime operation, poor visibility, and adverse wind conditions, the findings support more targeted risk management under these conditions. This may include strengthened lookout requirements, more proactive warning and communication measures, and intensified monitoring of collision-prone areas during periods of reduced visibility and nighttime navigation.
Third, the MPE results show that severe casualty configurations are more likely to occur in open-water settings than in regulated waterways. This suggests that collision-risk prevention should not focus only on traffic-regulated routes, but should also strengthen surveillance, encounter warning, and risk communication in open-water zones where commercial navigation and fishing activity overlap and vessel behavior may be less predictable.
Fourth, the liability-specific MPE analysis indicates that severe casualty scenarios are more frequently associated with configurations in which commercial vessels are the primary liable party. Although this should not be interpreted as a causal judgment beyond the official reports themselves, it does suggest the need for stronger bridge watchkeeping, lookout discipline, and collision avoidance practice among commercial vessel crews operating in mixed commercial–fishing traffic environments.
Overall, these recommendations are intended as risk-informed implications derived from probabilistic model results rather than as direct evidence-based prescriptions validated through intervention analysis. They reflect the main empirical patterns identified in the analysis—namely fishing vessel vulnerability, adverse environmental exposure, nighttime operational risk, and open-water high-severity configurations—and indicate where safety management efforts may be more effectively prioritized. From a scientific-contribution perspective, the value of this study lies not in methodological innovation, but in the extraction of domain-specific knowledge from a structured set of official investigation reports. The analysis moves beyond descriptive summaries by showing how casualty severity in commercial–fishing vessel collisions is probabilistically associated with a distinct combination of fishing vessel vulnerability, environmental exposure, temporal conditions, and liability-related operational context. In this sense, the study contributes new empirical understanding of consequence formation in a collision class that has more often been examined from the perspective of occurrence, legal responsibility, or individual cases than from casualty severity itself.
The main strength of the study lies in combining multiple official accident reports with a probabilistic framework that captures interdependence among vessel, environmental, temporal, and human liability factors. Compared with single-case analysis or AIS-only data, this approach provides richer contextual insight into casualty-related patterns. At the same time, several limitations should be acknowledged. The dataset is restricted to China’s coastal waters, the sample size remains moderate relative to the model dimensionality, and human factors are represented through a simplified liability-based proxy. In addition, TAN-BN supports association and conditional dependency analysis rather than formal causal inference. Despite these limitations, the results offer useful implications for the quantitative investigation of socio-technical drivers of casualty severity and for the design of more targeted maritime safety interventions.
6. Conclusions
This study examined casualty severity in collisions between commercial and fishing vessels in China’s coastal waters using 137 official accident reports and a TAN-BN framework. In doing so, it addressed the research question by identifying the socio-technical factors most strongly associated with casualty severity, revealing their probabilistic dependency patterns, and characterizing the typical configurations associated with different casualty outcomes.
The results indicate that fishing vessel length shows the strongest association with casualty severity, followed by time of day, wind speed, visibility, and season. Overall, more severe casualty outcomes were more frequently associated with smaller fishing vessels, nighttime conditions, adverse weather, and reduced visibility. The scenario-based MPE analysis further shows that severe casualty configurations are more likely when human liability is primarily attributed to commercial vessel crews, whereas fishing vessel liability is more often associated with less severe outcomes. Taken together, these findings suggest that casualty severity in commercial–fishing vessel collisions is strongly associated with the combined presence of vessel vulnerability, environmental exposure, and liability-related operational conditions.
From a practical perspective, this study provides evidence for improving maritime safety governance in shared coastal waters, particularly through better protection of vulnerable fishing vessels, enhanced night-navigation management, and stronger precautionary measures under adverse weather conditions. Within the scope of China’s coastal waters, these findings are relevant to the resilience and sustainability of coastal maritime socio-technical systems in which commercial shipping and fishing activities coexist.
Nevertheless, the results should be interpreted with caution: the dataset is restricted to China’s coastal waters, so the findings are context-specific rather than universally generalizable; the human liability variable is a simplified proxy for human factors; and the probabilistic model supports association and conditional dependency analysis rather than formal causal inference. Accordingly, the findings related to human liability indicate associations between reported responsibility configurations and casualty severity, but they do not provide a comprehensive explanation of underlying human causation. In addition, the TAN-BN structure imposes a simplified tree-structured dependency pattern and may not fully represent more complex interactions among variables. The learned dependency structure and conditional probabilities may also be influenced by sample size and variable discretization, particularly for low-frequency state combinations. Future research could extend the dataset, refine human-factor representation, and incorporate additional validation and uncertainty assessment.