Identification of Risk Influential Factors for Fishing Vessel Accidents Using Claims Data from Fishery Mutual Insurance Association

Abstract: This research aims to identify and analyze the significant risk factors contributing to accidents involving fishing vessels, a crucial step towards enhancing safety and promoting sustainable practices in the fishing industry. Using a data-driven Bayesian network (BN) model that incorporates feature selection through the random forest (RF) method, we explore these key factors and their interconnected relationships. A review of past academic studies and accident investigation reports from the Fishery Mutual Insurance Association (FMIA) revealed 17 such factors. We then used the random forest model to rank these factors by importance, selecting 11 critical ones to build the Bayesian network model. The data-driven BN model is further utilized to delve deeper into the central factors influencing fishing vessel accidents. Upon validation, the study results show that incorporating the random forest feature selection method enhances the simplicity, reliability, and precision of the BN model. This finding is supported by a thorough performance evaluation and scenario analysis.


Introduction
Marine fishing is an inherently perilous occupation, with fishermen recording some of the highest mortality rates among different occupational groups [1][2][3]. The most significant risk factor they face is fishing vessel accidents, which can lead to injuries, fatalities, significant damage to vessels, and even complete vessel loss. These incidents threaten crew safety and negatively impact the overall financial stability of the fishing industry. As such, it is crucial to give risk management measures for fishing vessels top priority and fortify these strategies to reduce casualties and limit property damage. Understanding the factors that contribute to fishing vessel accidents is crucial for developing proactive measures to prevent such incidents. Studies in this area help identify the root causes and risk factors associated with these accidents, which in turn facilitate the development and implementation of specially targeted safety policies, regulatory measures, and training initiatives. Identifying the most influential factors allows the targeted allocation of limited resources towards precise interventions that yield maximum accident reduction and return on investment.
Having access to accurate data on fishing vessel accidents is key for analysis. Prior studies often rely on information from regulatory bodies, but these datasets are often incomplete. Fishing vessel accidents are frequent, yet many are never disclosed publicly, which skews the available data. To address this, our research utilizes data from the Fishery Mutual Insurance Association (FMIA) of China. This national non-profit organization represents both groups and individuals in fishing-related services, offering mutual insurance to the sector. The Ningbo Fishery Mutual Insurance Association, established in 1996 as China's first local fishery mutual insurance association, provides coverage for over 98% of fishing vessel insurance in the region. Every fishing vessel accident, regardless of its size, is thoroughly investigated by the association, generating comprehensive data for analysis. Utilizing FMIA data enhances the breadth and authenticity of the data, leading to a more accurate understanding of accident scenarios and outcomes. This wealth of information supports risk assessment, economic impact analysis, and the development of targeted safety interventions, promoting a data-driven approach to accident analysis.
The application of Bayesian models for identifying risks associated with fishing vessel accidents is not new, given their established reliability as documented in references [4][5][6][7]. However, the dominant effects of some factors can mask those of others and complicate modeling. In response to this complexity, this paper presents an enhanced methodology with a novel Bayesian network model that integrates random forest feature selection. This facilitates retaining only the most influential factors for modeling.
A key innovation is our new database of 3448 FMIA fishing vessel accident reports from 2018 to 2022. This unparalleled and extensive collection, which covers small and large vessel accidents, stands out as the most comprehensive resource for studying fishing vessel incidents.
This paper, therefore, proposes a new Bayesian model using this unique database. The paper is structured as follows: Section 2 reviews past studies and Bayesian network techniques. Section 3 outlines the accident database and risk factor identification. Section 4 details our research methodology, including the random forest algorithm and Bayesian network learning. Section 5 presents sensitivity analysis and model validation. Section 6 concludes with discussions and study findings.

Literature Review
Fishing vessels are generally perceived as the least safe type of vessel, prompting extensive research on their accidents. A comprehensive literature review using databases, including Web of Science, identified vessel-related, environment-related, and accident-related factors as primary influences.
Vessel-related factors significantly impact fishing vessel accidents. Studies by Jin [8], Jin et al. [2,9], Cakir et al. [10], Uğurlu et al. [11], Wang et al. [12], and others have highlighted the correlation between factors such as vessel age, size, and type, and the incidence and severity of accidents. Recent studies like Obeng et al. [13] and Li et al. [7] used Bayesian network models to demonstrate the importance of vessel factors. Lazakis et al. [14] found trawlers had more occupational accidents than other types of fishing vessels.
Environmental factors also play a crucial role in fishing vessel accidents. Several studies have cited adverse weather conditions as a key cause of fishing vessel accidents, often exceeding human error [15,16]. Studies by Jin et al. [8,9], Davis et al. [17], Heij and Knapp [18], Weng et al. [19], and others have identified correlations between adverse weather conditions, seasons, and the occurrence and severity of accidents. By employing advanced machine learning methods, Rezaee et al. [20], Liu et al. [21], and Wang et al. [22] further clarified the relationship between weather conditions and accident severity. Özaydın et al. [23] used BN and association rule mining (ARM) methods to demonstrate the impact of adverse weather and sea conditions on fishing vessel accidents.
Accident-related factors affect consequences. Jin et al. [2,8,9], Wang et al. [22], and Wróbel et al. [24] linked accidents to geographic area and distance from the coast. Weng et al. [19] showed accidents further from the harbor had more fatalities. Jin [8] found that accident type impacted severity. Liu et al. [21] found that collision accidents tended to result in serious accident consequences. Cao et al. [5] showed that the type of accident was the most influential factor affecting the severity level of accidents, and capsize/submerge, mechanical damage, and collision were the accident types most likely to result in a "very serious accident". Human error contributes to 70-85% of incidents, per multiple studies [25][26][27]. Various methodologies have been employed to examine these human errors. For instance, Wang et al. [22], Wróbel et al. [24], and Kose et al. [28] employed an accident tree analysis, HFACS methodology, and logistic regression model, respectively, to highlight the predominance of human error in maritime mishaps. Furthermore, Obeng et al. [29] concluded that inadequate training and a lack of experience were key contributors to these incidents. Celik and Cebi [30] used HFACS to identify the hierarchical structure and internal relationships of human factors in ship accidents.
While BNs are widely used in the analysis of maritime accidents, including those involving fishing vessels, current research focuses on assessing the severity of accidents. These models demonstrate good classification but can be improved in the area of feature selection. Building models with a multitude of parameters not only amplifies computational complexity but also raises the risk of overfitting. Thus, it is crucial to prioritize the identification of critical parameters when devising a risk prediction model for fishing vessels.
Another limitation of previous research is the constraint of data availability. Therefore, this study establishes a comprehensive database of fishing vessel accidents. Random forest will first be used to identify key parameters. These selected variables will then be used to construct the BN model. Subsequently, a BN model will be employed to predict fishing vessel risks, with its performance compared against a BN model that does not use feature selection. The reliability and feasibility of the RF-BN model will be assessed, analyzing the primary factors influencing fishing vessel accidents and offering technical support for safe fishing.

Database
Most prior literature is based on data from various global databases that predominantly document significant fishing vessel accidents. However, numerous minor incidents are either not documented or lack thorough investigation, given the higher frequency of fishing vessel accidents compared to commercial vessels. To analyze accident causation in a more precise and comprehensive manner, this study primarily draws data from the FMIA, a chief provider of insurance coverage for fishing vessels, which covers more than 98% of fishing vessels in the study area. These vessels are incorporated into an organizing management system called fishery organizing companies, which assist in safety management by monitoring vessel entries/exits, inspecting safety equipment, organizing safety training for crew members, using information monitoring platforms to check the operation of fishing vessels in real time, and providing emergency assistance. The companies also verify vessel certificates and inspection dates.
This study developed a comprehensive and robust five-year (2018-2022) fishing vessel accident database for the Ningbo region, China, sourcing diverse data on accidents including date, location, vessel information (operational mode, age, material, dimensions, tonnage, and power), accident type, personnel and vessel certifications, casualties, losses, and brief descriptions of accident causation. Some accident reports provide environmental conditions, but the details vary. Vessel data was sourced by referencing the fishery system using the Maritime Mobile Service Identity (MMSI) and Beidou ID. Environmental data gaps were addressed via retrospective marine meteorological assessments. This rigorous data collection approach culminated in 3448 accident samples, methodically organized in an Excel framework, archiving pertinent details derived from accident reports. Figure 1 illustrates the spatial distribution of these accidents with locations indicated by red dots.

Risk Influential Factors
Risk influential factors (RIFs) are variables that influence the safety of fishing vessels. Through a rigorous examination of the relevant literature and an in-depth analysis of accident report archives, 17 RIFs pertinent to maritime accidents were identified. These encompass vessel, environmental, and other factors as shown in Table 1. Since this study focuses solely on fishing vessels, the vessel type is considered irrelevant and thus excluded from the RIFs. Fishing vessels utilize unique operational methods, including single trawl, double trawl, purse seine, and fishing transport, among others. These methods have significant implications for their associated risks. For example, anecdotal evidence from fishermen and regulatory bodies suggests double trawlers have lower operational risks than single trawlers. Therefore, this study introduces the operation mode of fishing vessels as a novel influential factor. The finalized list consists of 17 RIFs, as shown in Table 2.
Table 1. RIFs in the retrieved literature.

The primary aim of this study is to discern the RIFs impacting the outcomes of fishing vessel accidents. These outcomes are stratified into four severity categories based on considerations such as human casualties, property damage, and equipment impairment. For categorization purposes, we use the designated deemed losses (DL, in RMB). The classifications are as follows: general (DL < 10,000), severe (10,000 ≤ DL < 100,000), major (100,000 ≤ DL < 1,000,000), and critical (DL ≥ 1,000,000).
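The four severity bands can be expressed as a simple threshold lookup. The following is a minimal sketch, not the authors' code: the function name `severity_category` is hypothetical, and it assumes that a loss of exactly 1,000,000 RMB falls in the critical band.

```python
def severity_category(deemed_loss_rmb: float) -> str:
    """Map a deemed loss (DL, in RMB) to one of the four severity categories."""
    if deemed_loss_rmb < 0:
        raise ValueError("deemed loss must be non-negative")
    if deemed_loss_rmb < 10_000:
        return "general"
    if deemed_loss_rmb < 100_000:
        return "severe"
    if deemed_loss_rmb < 1_000_000:
        return "major"
    # Assumption: a loss of exactly 1,000,000 RMB is treated as critical.
    return "critical"
```

For example, `severity_category(25_000)` falls in the severe band, since 10,000 ≤ 25,000 < 100,000.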

Research Methodology
This study first utilizes the random forest algorithm to evaluate and rank the importance of risk influential factors (RIFs). The most important RIFs are then chosen to construct a Bayesian model, employing the tree-augmented naive Bayes (TAN) classifier specifically. After model construction, sensitivity analysis, validation, and performance assessment are conducted, as depicted in Figure 2.

Random Forest
The literature review and accident data analysis revealed 17 potential risk factors contributing to fishing vessel accidents. Due to computational complexity and overfitting risks, it is crucial to screen these factors before modeling. The aim is to identify those with the most significant impact. The selected factors will then become input variables for the Bayesian model.
Breiman [39] introduced random forest in 2001 as an ensemble statistical learning technique using classification and regression tree (CART) models. It handles multi-collinearity and high dimensionality by accurately capturing the impact of multiple explanatory variables. As a result, it is widely regarded as one of the best algorithms [40]. It draws multiple subsets from the dataset and builds a comparatively weak decision tree model on each subset. These are amalgamated into a potent composite model via a voting mechanism, significantly reducing the over-fitting issues of single-tree models such as ID3, C4.5, and CART [41]. Any decision tree can be used as a sub-model. This study uses the CART tree. Nodes are split by minimizing the Gini index, which quantifies the purity of classification and is also used to evaluate the efficacy of random forest feature selection. For node $K$ with sample set $D$ of $e$ categorized samples $D_1, D_2, \ldots, D_e$, the Gini index of node $K$ can be computed according to Equation (1):
$$\mathrm{Gini}(K) = 1 - \sum_{i=1}^{e} P_i^{2} \quad (1)$$
where $P_1, P_2, \ldots, P_e$ are the probabilities corresponding to each classified sample.
Equation (1) shows that the Gini index denotes the likelihood of randomly selecting two samples of different classes from the dataset. As such, when selecting attributes to partition node $K$, minimizing the Gini index determines the optimal partitioning. If attribute $F$ partitions node $K$ into child nodes $\{K_1, K_2, \ldots, K_m\}$, the post-partition Gini index can be computed according to Equation (2):
$$\mathrm{Gini}(K, F) = \sum_{i=1}^{m} \frac{G_{ki}}{l}\,\mathrm{Gini}(K_i) \quad (2)$$
where $G_{ki}$ represents the number of samples partitioned to the $i$th child node from node $K$, and $l$ is the total number of samples in node $K$.
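To make the split criterion of Equations (1) and (2) concrete, the sketch below computes the Gini index of a node and the size-weighted Gini index after a candidate split. It is an illustration only: the function names and the toy labels are hypothetical and are not taken from the accident database.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(children):
    """Post-partition Gini index: child Gini values weighted by child size."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * gini(child) for child in children if child)


# Hypothetical node with 6 "general" and 4 "severe" accidents.
node = ["general"] * 6 + ["severe"] * 4
left, right = node[:5], node[5:]      # one candidate partition of the node

parent_impurity = gini(node)                 # 1 - (0.6^2 + 0.4^2)
child_impurity = split_gini([left, right])   # 0.5 * 0 + 0.5 * 0.32
```

A split is attractive when the post-partition value is well below the parent value, as it is for this toy partition.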
Every sample in the test set is assessed using each decision tree, leading to corresponding class predictions $C_1(X), C_2(X), \ldots, C_T(X)$, with $X$ being a random variable that signifies the sampled instance. As trees function independently, their $T$ output results are aggregated through a voting process. The class with the most votes from the $T$ weak decision trees is the final class prediction.
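The voting step can be sketched in a few lines. This is a minimal illustration with a hypothetical helper name; how ties are broken is an assumption here (the class seen first among the tied ones wins), since the text does not specify it.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the most trees (ties: first seen wins)."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical predictions of T = 5 trees for one accident record.
votes = ["severe", "general", "severe", "major", "severe"]
final_class = majority_vote(votes)
```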
During training, a random sampling method is implemented to create datasets. Each weak classifier uses only a fraction of the samples, and the samples it leaves out form its out-of-bag (OOB) data. The generated random forest's performance can be assessed using this OOB data. If we assume the total number of OOB data points to be $Q$, these $Q$ OOB data points serve as input and are fed into the pre-established random forest classifier. The classifier then categorizes each of the $Q$ data points. Let $C$ be the number of incorrect classifications; the sample OOB error is given by Equation (3), and the total OOB error for the random forest is computed according to Equation (4):
$$O_{err} = \frac{C}{Q} \quad (3)$$
$$O_{err}^{total} = \frac{1}{N} \sum_{i=1}^{N} O_{erri} \quad (4)$$
where $N$ is the number of samples. The OOB data can not only be used to calculate the model's error but also to assess the importance of features [42,43]. Equation (5) calculates the importance $q_t$ of feature $t$:
$$q_t = \frac{1}{N} \sum_{i=1}^{N} \left( O_{errt,i} - O_{erri} \right) \quad (5)$$
where $O_{errt,i}$ is the OOB error obtained by using feature $t$ to classify the samples and $O_{errt,i} - O_{erri}$ represents the change in OOB error as a result of variations in the feature variable $t$. Larger values indicate a greater decrease in OOB accuracy and therefore higher importance.
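The OOB quantities described above reduce to simple averages. The sketch below is illustrative only: the function names and numbers are hypothetical, and it assumes the common permutation scheme in which the error "with feature t" is the OOB error measured after the values of feature t have been perturbed.

```python
def oob_error(n_misclassified, n_oob):
    """Sample OOB error: share of the Q out-of-bag points classified wrongly."""
    return n_misclassified / n_oob


def total_oob_error(errors):
    """Total OOB error: mean of the individual OOB errors."""
    return sum(errors) / len(errors)


def feature_importance(errors_feature_perturbed, baseline_errors):
    """Mean increase in OOB error attributed to feature t.

    Assumption: errors_feature_perturbed[i] is the i-th OOB error after
    feature t is perturbed; baseline_errors[i] is the unperturbed error.
    """
    diffs = [e_t - e for e_t, e in zip(errors_feature_perturbed, baseline_errors)]
    return sum(diffs) / len(diffs)


# Hypothetical values: 5 of 50 OOB points misclassified.
err = oob_error(5, 50)
q_t = feature_importance([0.30, 0.20], [0.10, 0.10])
```

A feature whose perturbation barely changes the OOB error receives an importance close to zero under this scheme.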

Tree-Augmented Naive Bayes (TAN)
Introduced by Friedman et al. [44], the tree-augmented naive Bayes (TAN) classifier is a Bayesian network classifier that incorporates a tree-like structure. TAN expands naive Bayes by integrating directed arcs between strongly dependent attributes while restricting attribute connectivity. This leads to a graphical model with a tree structure that depicts attribute dependencies. Compared to standard naive Bayes, TAN better leverages attribute dependencies while avoiding the exponential computation required by unrestricted dependency structures, which improves classification performance. In the TAN tree, the class variable is at the root without any parent nodes. Each attribute node has the class variable as a parent and may have at most one other attribute as a second parent, so no attribute has more than two parent nodes.
When using this model for classification, the posterior probability of each class for an unclassified instance is calculated using the Bayesian formula. The class label with the maximum probability is chosen as the assigned class, as per Equation (6):
$$c^{*} = \arg\max_{c} P(c) \prod_{i=1}^{n} P\left(x_i \mid \Pi_{x_i}\right) \quad (6)$$
where the parent set $\Pi_{x_i}$ of each attribute $x_i$ is obtained based on the constructed TAN structure. The learning of TAN involves an optimization problem, and its mathematical formulation is well documented in [38,45]. Once the qualitative structure of the TAN network is established, the next step is parameter learning to determine the conditional probability tables (CPTs) for each node. Standard methods for learning parameters from data samples include maximum likelihood estimation, Bayesian estimation for complete datasets, and the expectation-maximization algorithm for incomplete datasets [46].
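The classification rule can be illustrated with a tiny hand-built TAN. Everything in the sketch is hypothetical: the two attributes, the tree arc from `wind` to `accident_type`, and all probabilities are made-up values chosen for readability, not figures from the study.

```python
def tan_classify(instance, prior, cpts, parent):
    """Score each class as P(c) times the product of P(x_i | parents of x_i)."""
    scores = {}
    for c, p_c in prior.items():
        score = p_c
        for attr, value in instance.items():
            pa = parent[attr]  # the attribute's single attribute-parent, or None
            key = (c, instance[pa]) if pa is not None else (c,)
            score *= cpts[attr][key][value]
        scores[c] = score
    return max(scores, key=scores.get), scores


# Hypothetical two-class, two-attribute TAN: class -> wind -> accident_type.
prior = {"general": 0.7, "severe": 0.3}
parent = {"wind": None, "accident_type": "wind"}
cpts = {
    "wind": {
        ("general",): {"low": 0.9, "high": 0.1},
        ("severe",): {"low": 0.6, "high": 0.4},
    },
    "accident_type": {
        ("general", "low"): {"collision": 0.5, "fire": 0.5},
        ("general", "high"): {"collision": 0.7, "fire": 0.3},
        ("severe", "low"): {"collision": 0.4, "fire": 0.6},
        ("severe", "high"): {"collision": 0.8, "fire": 0.2},
    },
}

instance = {"wind": "high", "accident_type": "collision"}
best_class, scores = tan_classify(instance, prior, cpts, parent)
```

With these toy numbers the unnormalized scores are 0.7 × 0.1 × 0.7 for "general" and 0.3 × 0.4 × 0.8 for "severe", so the second class is selected.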
Given the extensive database constructed in this study and the higher accuracy of Bayesian estimation over maximum likelihood estimation [47], Bayesian estimation is chosen for parameter learning. The implementation of Bayesian networks (BN) in maritime risk analysis generally follows a series of recognized steps, including data collection, variable identification, structure learning, model validation, and sensitivity analysis. This study mirrors such an approach, segmented into four parts: database construction, model design, model validation, and model output. The method selected for this research is the tree-augmented naive Bayes (TAN) classifier, and a BN visualization software, GeNIe modeler, which is a graphical user interface allowing for interactive model building and learning [21,23], was used to develop the BN structure.
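The practical difference between maximum likelihood and Bayesian estimation of a CPT column can be seen on a small count table. This sketch assumes a symmetric Dirichlet prior with pseudo-count `alpha = 1`, which is a common default but is not stated in the text; the counts are hypothetical, and the code does not reproduce GeNIe's internal procedure.

```python
def mle_cpt(counts):
    """Maximum likelihood estimate: relative frequencies of the observed counts."""
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}


def bayesian_cpt(counts, alpha=1.0):
    """Posterior-mean estimate under a symmetric Dirichlet(alpha) prior (assumed)."""
    total = sum(counts.values())
    k = len(counts)
    return {state: (n + alpha) / (total + alpha * k) for state, n in counts.items()}


# Hypothetical counts of accident consequences for one parent configuration.
counts = {"general": 8, "severe": 2, "major": 0, "critical": 0}
mle = mle_cpt(counts)
bayes = bayesian_cpt(counts)
```

The Bayesian estimate keeps unseen states at a small non-zero probability, whereas maximum likelihood assigns them exactly zero, which can block inference for rare outcomes.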

Model Validation
To examine the comprehensive impact of multiple influencing factors on fishing accidents and confirm the accuracy of the BN model, the sensitivity analysis inference process should satisfy at least the following two axioms [48]:
Axiom 1: Any increase or decrease in the probability values of each parent node should result in a corresponding relative increase or decrease in its child node.
Axiom 2: The cumulative impact resulting from a combination of probability shifts in evidence should not be less pronounced than the impact arising from a subset of that evidence.
It is important to emphasize that these axioms act as benchmarks to gauge the accuracy and reliability of the BN model when assessing the multifaceted influences on accidents.

Sensitivity Analysis
Sensitivity analysis is a commonly used method for uncertainty analysis. When analyzing maritime accident risks using BN, the goal of sensitivity analysis is to identify risk factors that exert a significant impact on the target variable. Recognizing these factors helps implement appropriate measures to mitigate risks associated with fundamental factors. To ensure a thorough evaluation, both mutual information and sensitivity analysis methods are employed.
(1) Mutual Information
Mutual information is used to identify the importance and priority of risk factors in influencing the target node. Information entropy is a statistical metric denoting the level of uncertainty in a random variable. A higher entropy value signifies greater uncertainty in the variable. The information entropy is calculated according to Equation (7):
$$H(X) = -\sum_{x} p(x) \log p(x) \quad (7)$$
Mutual information quantifies the shared information between two variables, acting as an indicator of their interdependence. It measures the reduction in information entropy of a query node based on the probability distribution of evidence nodes. The mutual information for two discrete random variables, $X$ and $Y$, can be defined using Equation (8):
$$I(X;Y) = \sum_{x} \sum_{y} P(x,y) \log \frac{P(x,y)}{p(x)\,p(y)} \quad (8)$$
where $P(x,y)$ is the joint probability distribution function of variables $X$ and $Y$, and $p(x)$ and $p(y)$ are the marginal probability distribution functions of $X$ and $Y$, respectively. The mutual information is used to identify the most influential factors that have the highest degree of dependence on the query node.
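Entropy and mutual information are straightforward to compute from a discrete joint distribution. The sketch below is an illustration with hypothetical function names and toy tables; it assumes base-2 logarithms (results in bits), which the text does not specify.

```python
import math


def entropy(dist):
    """Shannon entropy of a discrete distribution, in bits (base 2 assumed)."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def mutual_information(joint):
    """Mutual information of X and Y from a table with joint[i][j] = P(X=i, Y=j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )


# Toy joint tables: fully independent versus fully dependent variables.
independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.5, 0.0], [0.0, 0.5]]

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
```

Independent variables share no information, while in the fully dependent table knowing one variable removes all uncertainty about the other, so the mutual information equals the entropy of either variable.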
(2) Sensitivity Analysis
Sensitivity analysis is a technique that helps validate the probability parameters of Bayesian networks, focusing on how small changes in the numerical parameters of the model (i.e., prior probabilities and conditional probabilities) influence the output parameters (e.g., posterior probabilities). Parameters that are particularly sensitive have a more pronounced influence on the inference results. The precision of the numerical parameters is crucial for calculating the target posterior probabilities. A large derivative of the target posterior with respect to a parameter $p$ means that small changes in $p$ can lead to a significant change in the posterior probability of the target. Conversely, a small derivative suggests that even notable changes in the parameter may only minimally affect the posterior.
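The derivative-based notion of sensitivity can be illustrated on a two-node network X → Y, where the parameter of interest is the prior p = P(X = 1). The network, the parameter values, and the finite-difference approach are all assumptions made for this sketch; the text does not describe the exact algorithm used by the software.

```python
def posterior(p, a, b):
    """P(X=1 | Y=1) for prior p = P(X=1), a = P(Y=1 | X=1), b = P(Y=1 | X=0)."""
    return p * a / (p * a + (1 - p) * b)


def sensitivity(f, p, h=1e-6):
    """Central finite-difference estimate of df/dp at p."""
    return (f(p + h) - f(p - h)) / (2 * h)


# Hypothetical parameter values.
a, b, p = 0.8, 0.2, 0.3
numeric = sensitivity(lambda x: posterior(x, a, b), p)

# Closed-form derivative of the same posterior, used here as a cross-check.
analytic = a * b / (p * a + (1 - p) * b) ** 2
```

A derivative above 1, as in this toy case, means the posterior moves faster than the parameter itself, so the parameter would need to be estimated with particular care.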

Model Prediction Performance Evaluation
This study evaluates the prediction accuracy and reliability of the BN model using a confusion matrix and various performance evaluation metrics. We partitioned the new database randomly into training and testing datasets. The former facilitated model construction, while the latter enabled model evaluation.
Overall accuracy is a simple and effective metric for evaluating the prediction accuracy of the constructed model, defined as the percentage of correctly predicted samples out of the total samples. However, it is not suitable for measuring results with imbalanced samples. To address this issue, precision, recall, F-value, specificity, and false positive rate (FPR) were employed to assess the reliability and robustness of the model.
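Precision represents the probability that a positive prediction is a true positive, i.e., the share of true positives among all predicted positive samples. Recall refers to the probability of a positive sample being predicted as positive among all actual positive samples, also known as sensitivity. Precision assesses the exactness of the model's positive predictions, while recall evaluates their completeness. However, they are mutually constrained. The F-value, calculated as the harmonic mean of precision and recall, therefore offers a balanced assessment of both. Specificity represents the proportion of correctly predicted negative samples to all actual negative samples. A higher specificity value is desirable. Conversely, a lower false positive rate (FPR) value is preferable. The detailed confusion matrix is available in Table 3, and the mathematical definitions of the metrics are provided in Equations (9)-(13), with $TP$, $FP$, $FN$, and $TN$ denoting the numbers of true positives, false positives, false negatives, and true negatives, respectively:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (9)$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (10)$$
$$F = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (11)$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \quad (12)$$
$$\mathrm{FPR} = \frac{FP}{FP + TN} \quad (13)$$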
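For a multi-class target such as accident consequence, these metrics are usually computed per class in a one-vs-rest fashion. The sketch below assumes that convention and a confusion matrix indexed as `matrix[true][predicted]`; the function names and the 3 × 3 toy matrix are hypothetical. It does not guard against empty denominators.

```python
def one_vs_rest_counts(matrix, k):
    """TP, FP, FN, TN for class k, with matrix[i][j] = count(true i, predicted j)."""
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp
    fp = sum(row[k] for row in matrix) - tp
    tn = sum(map(sum, matrix)) - tp - fn - fp
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Precision, recall, F-value, specificity, and FPR from the four counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f_value": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
        "fpr": fp / (fp + tn),
    }


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
matrix = [
    [40, 5, 5],
    [4, 30, 6],
    [1, 2, 7],
]
counts = one_vs_rest_counts(matrix, 0)
result = metrics(*counts)
```

Specificity and FPR always sum to one, so reporting both is a matter of convenience rather than additional information.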

Model Consistency Verification
Cohen's kappa statistic measures agreement between categorical variables. For example, kappa can assess the consistency of different raters in classifying subjects into one of several groups. Kappa can also be used to assess the agreement between different methods of categorical assessment.
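In this study, Cohen's kappa statistic was used to verify the consistency of the model's predictive performance for each consequence of fishing vessel accidents. Kappa is calculated from the observed and expected frequencies on the diagonal of a square contingency table. In this context, the square contingency table is the confusion matrix, as shown in Table 3. The kappa statistic is defined in Equation (14):
$$k = \frac{p_0 - p_e}{1 - p_e} \quad (14)$$
where $k$ is the kappa statistic, and $p_0$ indicates the relative agreement between the true and predicted values. The value of $p_0$ is defined in Equation (15), with $TP$, $FP$, $FN$, and $TN$ being the true positive, false positive, false negative, and true negative counts of the confusion matrix:
$$p_0 = \frac{TP + TN}{TP + FP + FN + TN} \quad (15)$$
$p_e$ indicates the hypothetical probability of chance agreement. The value of $p_e$ is defined in Equation (16):
$$p_e = \frac{(TP + FP)(TP + FN) + (FN + TN)(FP + TN)}{(TP + FP + FN + TN)^{2}} \quad (16)$$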
The kappa statistic takes values $k \in [-1, 1]$. A value closer to 1 indicates stronger model consistency. Studies [49,50] suggest that the model is considered almost perfect when $k \in [0.81, 1]$.
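Kappa can be computed directly from a square confusion matrix of any size. The sketch below uses the general row-total and column-total form of chance agreement, which reduces to the binary case for a 2 × 2 table; the function name and the example counts are hypothetical.

```python
def cohen_kappa(matrix):
    """Cohen's kappa from a square confusion matrix of raw counts."""
    n = sum(map(sum, matrix))
    p0 = sum(matrix[i][i] for i in range(len(matrix))) / n   # observed agreement
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    pe = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2  # chance agreement
    return (p0 - pe) / (1 - pe)


# Hypothetical 2 x 2 confusion matrix: observed agreement 0.85, chance agreement 0.5.
kappa = cohen_kappa([[45, 5], [10, 40]])
```

With these toy counts, kappa is (0.85 − 0.5) / (1 − 0.5), which would fall below the "almost perfect" band quoted above.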

RF-Based RIF Selection
This study utilizes a sample of 3448 accident records collected from 2018 to 2022 in Ningbo. The dependent variable in this model is the consequence of the fishing vessel accident. The training set comprises 80% of the data, equating to 2758 samples, while the remaining 690 samples form the test set. The significance of variables can be evaluated through metrics like a reduction in impurity or a decrease in accuracy. In this study, the significance of variables was evaluated using the mean decrease in the Gini coefficient, which is a measure of how each variable contributes to the homogeneity of the nodes and leaves in the resulting random forest. The higher the value of the mean decrease Gini score, the higher the importance of the variable in the model, as shown in Table 4. Of all the explanatory variables, the operation mode of the fishing vessel ranks fourth and has a significant impact on the consequences. Its importance exceeds that of most other variables, indicating its distinct role in determining accident severity. Variables with importance scores greater than 0.04 are ranked in this order: season, accident type, human factors, operation mode, wind, age, gross tonnage, length, accident locations, time of day, and power. Conversely, factors like sea condition, hull type, visibility, crew, width, and equipment are less critical as influencing factors.
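The 80/20 split and the threshold-based selection can be sketched as follows. The helper names, the random seed, and the three importance scores are hypothetical placeholders for illustration; they are not values from Table 4. Only the sample count (3448) and the 0.04 threshold come from the text.

```python
import random


def train_test_split(indices, train_fraction=0.8, seed=0):
    """Shuffle record indices and split them into training and test parts."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def select_features(importance, threshold=0.04):
    """Keep features scoring above the threshold, ranked from high to low."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score > threshold]


train, test = train_test_split(range(3448))

# Hypothetical importance scores, for illustration only.
toy_importance = {"visibility": 0.02, "wind": 0.07, "season": 0.12}
selected = select_features(toy_importance)
```

Truncating 0.8 × 3448 = 2758.4 to an integer reproduces the 2758/690 split reported above.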

Bayesian Model Structure Learning
The random forest model identified 11 highly impactful factors: operation mode of the fishing vessel, season, age, accident type, accident locations, wind, time of day, human factors, power, gross tonnage, and length of the vessel. Their high importance scores led to their selection for the Bayesian network (BN) model.
To create a purely data-driven model, the tree-augmented naive Bayes (TAN) [5,22] approach was employed for training without integrating prior knowledge. The final training outcomes are depicted in Figure 3. The trained BN structure consists of 12 nodes interconnected by 21 links. Nodes highlighted in light blue represent vessel-related factors; those in orange correspond to accident-related factors; and the ones in bright blue signify environment-related factors. Links were established by examining the correlations among all influencing factors as per accident records.
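As a quick consistency check on the reported structure, the link count follows from the TAN construction, under the usual assumption that the class node points to every attribute and that the attributes form a single spanning tree. With n = 11 selected attributes:

```latex
\[
\underbrace{n}_{\text{class-to-attribute arcs}}
+ \underbrace{(n - 1)}_{\text{attribute tree arcs}}
= 2n - 1 = 2 \cdot 11 - 1 = 21 .
\]
```

This agrees with the 12 nodes (11 attributes plus the consequence node) and 21 links of the trained network.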
The interaction strength between nodes within the model as derived from the training results is illustrated in Figure 4. Strong interactions signify substantial causal connections, while weaker interactions are termed weak causal relationships. The thickness of the connecting lines in Figure 4 symbolizes the intensity of these causal relationships, with bold links indicating substantial relationships. Strength is calculated via the Jensen-Shannon divergence. The substantial causal relationships include connections between length and power, length and gross tonnage, gross tonnage and operation mode, gross tonnage and age, gross tonnage and accident type, consequence of fishing vessel accident and accident type, accident type and wind, human factors and accident type, consequence of fishing vessel accident and accident locations, and accident type and season.
Table 5 shows strong correlations among vessel factors. Logically, larger vessels with greater length and width need more power. For environmental factors, wind has robust relationships with the accident type and consequence, with windy conditions increasing collision/contact risks and severity. Accident factors also connect strongly, with about 90% of collision/contact accidents caused by human errors like improper operation and negligence.
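The Jensen-Shannon divergence used as the strength measure can be sketched for two discrete distributions. How the software applies it to each arc is not detailed in the text, so the code below only shows the divergence itself; the function names are hypothetical and base-2 logarithms are assumed, which bounds the result between 0 and 1.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 vanish."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetrized KL against the mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy cases: identical distributions versus distributions with disjoint support.
same = js_divergence([0.5, 0.5], [0.5, 0.5])
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
```

Unlike the plain Kullback-Leibler divergence, this measure is symmetric and stays finite even when one distribution assigns zero probability to a state.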

Bayesian Model Probabilistic Learning
The probabilities were computed via the Bayesian method, and each node's conditional probability tables (CPTs) were developed using the GeNIe software. The probability distributions for the nodes were determined based on historical accident data. The final Bayesian model is shown in Figure 5.
The node posterior probability distributions in Figure 5 provide initial observations. For fishing vessels, single trawl operations are most frequently linked with accidents, accounting for 52% of accidents, followed by double trawl at 19% and gillnet at 13%. Vessels over 24 m in length, with a power above 136 kW and a tonnage between 100 and 200 tons, are most prone to accidents. The age of the vessel also plays a major role, with nearly 74% of accidents involving vessels over ten years old. As age increases, so does accident likelihood. For example, vessels aged between 10 and 20 years account for 36% of accidents, while those aged over 20 years account for 38%.
Regarding environmental factors, accident likelihoods in spring, autumn, and winter are 34%, 30%, and 22%, respectively. The share of accidents happening in the summer is only 14%, which is likely due to the fishing ban during this period. The daytime accident probability is 62% versus 38% at night. Interestingly, the impact of wind speed is notably different from previous research. Accidents occurring at wind speeds below scale 7 account for 89% of the total, possibly because weather forecasts now mitigate wind risks.
The most frequent types of fishing vessel accidents are collisions (47%), contact damage (26%), and mechanical failures (12%). Wind damage and fires account for 2% and 4%, respectively. Accidents were most likely to occur in the operational sea area (65%), versus 35% near-shore. Human errors such as improper operation and negligence are involved in 51% of accidents.
over 24 m in length, with a power above 136 kW and a tonnage between 100 and 200 tons, are most prone to accidents.The age of the vessel also plays a major role, with nearly 74% of accidents involving vessels over ten years old.As age increases, so does accident likelihood.For example, vessels aged between 10 and 20 years account for 36% of accidents, while those aged over 20 years account for 38%.

Mutual Information
A sensitivity analysis of variables was performed employing the mutual information (MI) technique. MI quantifies the mutual dependence between two variables as the reduction in the entropy of one variable once the other is known. A large reduction signifies strong dependence, while a small reduction implies weak dependence. The analysis was executed on the target nodes, with results presented in Table 6 and Figure 6. Mutual information was computed between the "Consequence" target node and its influencing factors. The blue line represents the mutual information values, while the orange line indicates differences between adjacent values. A higher mutual information value signifies greater factor influence on the "Consequence". Table 6 outlines the mutual information values, entropy percentages, and belief differences.
Power exerts significant influence with a mutual information value of 0.08309. Subsequent influential factors ranked by decreasing mutual information are length, tonnage, operation mode, accident type, and human factors, with values of 0.07446, 0.06431, 0.04053, 0.03541, and 0.02873, respectively. This affirms the significant impact of "Operation mode" on the "Consequence", validating its inclusion as a RIF.
The 6th to 9th mutual information values show minor variation, with changes of 0.00141, 0.00127, and 0.00304 between adjacent values. Based on their mutual information and variation rate, the first six RIFs are recognized as the most significantly varying factors.
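The mutual information between a factor and the "Consequence" node can be sketched directly from paired observations, as below. This is a plain empirical estimate in bits under the assumption of discrete states; the observation pairs are hypothetical and the function is not the routine used by GeNIe.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Empirical mutual information I(X; Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical (power class, consequence) observations.
observations = [
    (">136 kW", "severe"), (">136 kW", "severe"), (">136 kW", "general"),
    ("<44 kW", "general"), ("<44 kW", "general"), ("<44 kW", "severe"),
]
print(f"I(power; consequence) = {mutual_information(observations):.4f} bits")
```

Ranking the factors by this value in decreasing order, and then looking at the differences between adjacent values, reproduces the kind of comparison shown by the two lines in Figure 6.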

Sensitivity Analysis
Within the scope of Bayesian modeling for risk analysis of fishing vessel accidents, recognizing the risk determinants that substantially influence "Consequences" is critical for applying specific mitigation measures.
Utilizing GeNIe software, a sensitivity analysis was carried out, designating all 11 variables as target nodes. Results show high sensitivity for fishing vessel power, length, tonnage, and consequences. This is consistent with the results of the mutual information analysis. After the sensitivity test of the Bayesian model, the top 10 scenarios under each of the three states of power were selected for impact analysis; the distributions are shown in Figures 7-9. Each bar shows the range of change in the target state as the parameter varies within its range (±10%). The color of the bar shows the direction of the change in the target state: red denotes a negative change and green a positive change.
Figure 7 shows that vessels with power less than 44 kW account for 8.59% of accidents. Severe accidents usually occur in this range, with a peak sensitivity value of 0.126. In Figure 8, vessels with power between 44 and 136 kW comprise 2.8% of accidents, where the sensitivity value of having a severe accident is 9%. Figure 9 shows that vessels with power more than 136 kW account for 88.6% of accidents, and the sensitivity of having a severe accident rises to 91.16%.
The sensitivity analysis shows that fishing vessels generally have severe and general accidents. If only the power of fishing vessels is considered, the probability of major and critical accidents is relatively low.
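The idea behind the ±10% bars can be illustrated with a brute-force one-way sensitivity check on a two-node fragment (power and consequence). The sketch assumes the ±10% is a relative change applied to a single conditional probability, which the text does not specify, and all numbers are hypothetical rather than taken from the model's CPTs. GeNIe's own sensitivity algorithm is derivative-based and is not reproduced here.

```python
def target_probability(prior, cond):
    """P(target) = sum over parent states of P(parent) * P(target | parent)."""
    return sum(prior[state] * cond[state] for state in prior)


def one_way_sensitivity(prior, cond, state, spread=0.10):
    """Range of P(target) when cond[state] is scaled by (1 - spread) and (1 + spread)."""
    results = []
    for factor in (1 - spread, 1 + spread):
        perturbed = dict(cond)
        perturbed[state] = min(1.0, cond[state] * factor)  # keep a valid probability
        results.append(target_probability(prior, perturbed))
    return min(results), max(results)


# Hypothetical prior over power classes and P(severe | power).
power_prior = {"<44 kW": 0.1, "44-136 kW": 0.1, ">136 kW": 0.8}
p_severe_given_power = {"<44 kW": 0.5, "44-136 kW": 0.6, ">136 kW": 0.7}

baseline = target_probability(power_prior, p_severe_given_power)
low, high = one_way_sensitivity(power_prior, p_severe_given_power, ">136 kW")
print(f"baseline={baseline:.3f}, range=[{low:.3f}, {high:.3f}]")
```

The width of the resulting range plays the role of a bar length in Figures 7-9: parameters attached to frequent parent states move the target probability more than parameters attached to rare ones.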
In order to understand the causal effects of RIFs, their variation was further explored under the different consequences of fishing vessel accidents. Since the BN comprises 11 RIFs with extensive scenarios, simulating all potential state combinations is challenging. Based on the mutual information, the first six variables, power, length, gross tonnage, operation mode, accident type, and human factors, were chosen for additional sensitivity analysis to identify their nuanced influence on the "Consequence". The probabilities for each variable's state were progressively increased to 100%, yielding the joint probabilities depicted in Table 7. It shows the shifts in corresponding consequences when individual node states become 100%, with the respective probability changes. The upward arrow indicates that the probability of the target node increases, and the downward arrow indicates that the probability of the target node decreases. For example, when operation mode is 100% single trawling, the consequences shift from (general: 23%, severe: 68%, major: 7%, critical: 2%) to (general: 19%, severe: 72%, major: 7%, critical: 2%), with changes of −4%, 4%, 0%, and 0%, respectively.
Table 7 highlights that "other" accident types have the highest "Major" consequence probability, possibly including shipwrecks. Similarly, fire incidents show the greatest "Major" likelihood. Moreover, gillnet vessels with under 44 kW of power have significantly higher "Severe" consequence chances.
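The arithmetic behind each row of Table 7 can be sketched as a comparison between the baseline consequence distribution and the distribution obtained once a node state is fixed at 100%. The two distributions below are the single-trawling figures quoted in the text; the function name and the direction labels are illustrative choices.

```python
def consequence_shift(baseline, scenario):
    """Percentage-point change per consequence level, with the direction of the change."""
    shift = {}
    for level, base in baseline.items():
        delta = scenario[level] - base
        direction = "up" if delta > 0 else "down" if delta < 0 else "none"
        shift[level] = (delta, direction)
    return shift


# Values quoted in the text for operation mode = 100% single trawling (in %).
baseline = {"general": 23, "severe": 68, "major": 7, "critical": 2}
single_trawl = {"general": 19, "severe": 72, "major": 7, "critical": 2}

shift = consequence_shift(baseline, single_trawl)
for level, (delta, direction) in shift.items():
    print(f"{level}: {delta:+d} points ({direction})")
```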

Model Accuracy
Additional sensitivity analyses were conducted to investigate the cumulative effects of multiple variables and confirm the accuracy of the model. The first six RIFs were taken as variable sets. Small increases of 5% were made to their prior probabilities towards the extreme states impacting "Consequence". This was implemented sequentially from the first node, resulting in cumulative changes in values for power, length, gross tonnage, operation mode,

Model Prediction Performance Test
A randomly chosen subset of 690 accidents (20% of the total) was used as the test dataset to assess the prediction capability of the model. This assessment is reflected in the confusion matrix shown in Table 9. Based on the matrix, the overall accuracy of the model is 84.6% (584 out of 690). The detailed accuracy values in Table 9 show that the most accurate predictions were for "Major" accidents at 87.5%. Accuracy was 80.6% for "General", 85.9% for "Severe", and 75.0% for "Critical" accidents. According to Section 4.5, five performance metrics were calculated for each consequence level and are presented in Table 10. For "Severe" accidents, the BN model achieves 96.1% accuracy. For "General", "Major", and "Critical" accidents, the model's recall rate is over 80%. Higher specificity and a lower false-positive rate (FPR) are preferable. Table 10 shows that the specificity is over 90% for all types, while the FPR is under 9%. These results further confirm the excellent performance and reliability of the constructed model.
To compare the predictive accuracy, a traditional Bayesian model using 17 factors without feature selection was built. With 20% randomly selected test data, its overall accuracy was 62.1% (428/690). The accuracy of the Bayesian model constructed through feature selection (84.6%) was significantly higher, further confirming the benefit of feature selection.
Furthermore, we calculated the kappa coefficient for the Bayesian model built without feature selection, which was 0.7633. This is significantly lower than that of the Bayesian model proposed in this research. The model's validation provides further evidence of its enhanced performance.
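Per-class metrics of the kind reported in Table 10 can be derived from a multi-class confusion matrix by treating each consequence level one-vs-rest. The sketch below assumes the five metrics are accuracy, precision, recall, specificity, and FPR (the text names only four of them explicitly), and it uses a small toy matrix rather than the values of Table 9.

```python
def per_class_metrics(matrix, labels):
    """One-vs-rest metrics from a square confusion matrix (rows = actual, columns = predicted)."""
    total = sum(sum(row) for row in matrix)
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp
        fp = sum(row[i] for row in matrix) - tp
        tn = total - tp - fn - fp
        metrics[label] = {
            "accuracy": (tp + tn) / total,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
            "fpr": fp / (fp + tn) if fp + tn else 0.0,
        }
    return metrics


labels = ["general", "severe", "major", "critical"]
# Toy confusion matrix for illustration only (not the data in Table 9).
toy_matrix = [
    [8, 2, 0, 0],
    [1, 17, 2, 0],
    [0, 1, 6, 1],
    [0, 0, 1, 1],
]

metrics = per_class_metrics(toy_matrix, labels)
for label in labels:
    print(label, {name: round(value, 3) for name, value in metrics[label].items()})
```

Specificity and FPR always sum to one for a given class, which is why the text treats high specificity and low FPR as two views of the same property.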

Model Consistency Test
As per Equation (15) and the confusion matrix in Table 8, p_e is determined to be 0.132. The overall accuracy p_0 equals 0.846. Using Equation (14), we calculate the kappa coefficient to be 0.8333. It is commonly accepted that when the kappa coefficient (k) falls within the range [0.81, 1], the agreement is considered almost perfect. This further underlines the strong consistency displayed by the developed model.
The kappa coefficient for the Bayesian model without RIF selection was significantly lower at 0.7633. This validates the enhanced performance of the proposed model incorporating RIF selection.
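For reference, Cohen's kappa can be computed from a confusion matrix as in the sketch below, which generalizes the binary expressions for p_0 and p_e to any number of classes. The matrix is a toy example chosen so the result is easy to check by hand; it is not the study's confusion matrix.

```python
def cohen_kappa(matrix):
    """Cohen's kappa from a square confusion matrix (rows = actual, columns = predicted)."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of cases on the diagonal.
    p0 = sum(matrix[i][i] for i in range(size)) / total
    # Expected agreement: sum over classes of (row total * column total) / total^2.
    pe = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / total ** 2
    return (p0 - pe) / (1 - pe)


# Toy binary example: p0 = 0.85, pe = 0.50, so kappa = 0.35 / 0.50 = 0.70.
toy_matrix = [
    [40, 10],
    [5, 45],
]
kappa = cohen_kappa(toy_matrix)
print(f"kappa = {kappa:.2f}")
```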

Case Verification
To further demonstrate the effectiveness of the model, a recent fishing vessel collision accident in 2023 was selected for evaluation. On 4th March 2023, the fishing vessel "Zhe Xiang Yu *****" grounded in the East China Sea. The accident report outlines 11 relevant parameters in Table 11. The Bayesian network (BN) model we developed was used to simulate this incident (Figure 10). The simulation concluded with an 87.0% probability of the consequence of the accident falling into the "General" category. This result aligns with the evaluation performed by the Ningbo Mutual Insurance Association, which assessed a loss of 45,256 yuan. The effectiveness of the BN model in this study is thus further supported.


Discussion
Building an extensive database is paramount for fishing vessel accident analysis. In our research, we harnessed claims data from a mutual insurance association for the first time, creating a dedicated database for fishing vessel incidents with broader coverage than previous research. We then applied the RF-BN model framework to simulate how different variables affect the consequences. The framework uses the random forest (RF) method to identify crucial factors based on feature importance, reducing them from 17 to 11 while maintaining the model's precision. This significantly eases computational complexity and reduces overfitting risk. Further validation was conducted through real-world scenarios, demonstrating the model's solid generalizability.
Additionally, the TAN Bayesian model identified associations between the selected essential factors and the severity of accidents, as well as the mutual relations among independent variables. We ultimately pinpointed three categories of influential factors contributing to the consequences. These insights offer valuable knowledge of the factors shaping the consequences of accidents and are discussed as follows.
In this study, operation mode was the most vital variable according to RF feature importance. Sensitivity analysis further highlighted its significant influence on severity. Single trawling had the highest probability of 54% versus other fishing operations. The introduction of the operation mode as a new risk factor was reinforced by both RF and Bayesian model analysis. Length, power, and tonnage are interconnected factors that considerably affect the consequences. The sensitivity analysis of the TAN model pointed to these as the three most sensitive variables, aligning with previous studies [2,9,11,12]. Our research also unveiled a strong correlation between vessel age and the consequence of an accident, congruent with prior studies [10,11]. Older vessels, especially those over 20 years old, have higher accident risks. Therefore, the safety of older fishing vessels should receive more focus. Our research also identified a higher accident likelihood for gillnet fishing vessels. Policies could encourage safer vessel construction, like double trawlers.
The three identified environmental factors are wind, time of day, and season. Prior studies [8,9] have substantiated the significance of wind as an influencing factor on the consequences, which aligns with our findings. Similarly, the season is a pivotal factor affecting accident severity, consistent with previous research [16]. The probability of accidents happening in the summer is the lowest, reflecting reality as the study area has a fishing ban from 1st May to 17th September, resulting in fewer active fishing vessels and, hence, a lower accident probability. Additionally, the time of day significantly impacts accidents and emerges as one of the most influential factors, consistent with previous research. Numerous researchers, including Jin et al. [8], have indicated that fatal fishing vessel accidents are more likely to occur at night. The increased probability of severe accidents during nighttime might be attributed to the challenge sailors face in judging distances and estimating visibility at night, leading to heightened confusion as visibility naturally diminishes.
Accident type, location, and human factors were the accident-related factors identified. The significance of these three factors has been verified in previous studies [25][26][27]. The importance of accident locations is also consistent with prior findings [8,9]. Past studies [9,16] identified accident type as a key factor in determining fishing vessel accident severity, which is in line with this study's findings.
Building upon the critical factors selected by RF, this study constructs a Bayesian model and conducts a sensitivity analysis. Six key factors influencing the severity of fishing vessel accidents are identified: power, length, gross tonnage, operation mode, accident type, and human factors.

Conclusions
This study demonstrates the utilization of insurance data and Bayesian network modeling to analyze fishing vessel accidents. An accident database was developed from rare FMIA data, risk factors were identified, and previous research was integrated to discern influencing factors.
The research presents a data-driven Bayesian model for fishing vessel accidents using influential factors selected via a random forest algorithm. Despite the use of a finite set of indicators, the model exhibits excellent performance and enhanced predictive abilities, as validated through testing and assessments. The findings provide invaluable accident prevention insights.
(1) The 11 predominant factors for fishing vessel accidents were identified, including operation mode, season, age, accident type, wind, gross tonnage, time of day, human factors, power, accident locations, and length. Post-model sensitivity analysis further distilled the core factors to six: power, length, gross tonnage, operation mode, accident type, and human factors.
(2) The data-driven Bayesian model incorporates an enhanced approach, achieving an impressive 89% prediction accuracy based on real-world case studies. This makes it a reliable tool for accident prevention.
(3) The model provides comprehensive information for regulatory authorities and other stakeholders, delivering crucial insights for monitoring the conditions of fishing vessels and creating pertinent policies to ensure maritime safety during fishing operations.
Whilst the RF-BN model developed in this study explains the relevant factors affecting the consequences of fishing vessel accidents, there are still some limitations. First, the sample consists of fishing vessel accident data from a single city on the coast of China, so the conclusions drawn in this paper may not be applicable to the analysis of fishing vessel accidents in other regions. Second, constructing a BN model under the assumption that the samples and variables are independent of each other requires determining the relationships between nodes during the structural learning process, which often requires further discussion in practical applications. In this study, the relationships between nodes were constructed on a data-driven basis, and the results may be biased if there are irrelevant connections between nodes. Therefore, in further studies, researchers can use the BN model in combination with other methods to improve the reliability of the results. In addition, data related to human factors were collected and assessed subjectively; a more rigorous approach to human factor analysis might help improve the quantitative analysis of fishing vessel accidents.

Figure 1 illustrates the spatial distribution of these accidents with locations indicated by red dots.

Figure 1. The distribution of accident locations.

$$p_0 = \frac{TP + TN}{TP + FP + FN + TN} \quad (15)$$

$$p_e = \frac{(TP + FP)(TP + FN) + (FN + TN)(FP + TN)}{(TP + FP + FN + TN)^2}$$

The trained BN structure consists of 12 nodes interconnected by 21 links. Nodes highlighted in light blue represent vessel-related factors; those in orange correspond to accident-related factors; and the ones in bright blue signify environment-related factors. Links were established by examining the correlations among all influencing factors as per accident records.

Figure 3. Bayesian network structure diagram.

Figure 4. Strength map of interaction pairs (links) in BN.

Figure 6. Mutual information values and variance.

Figure 7. Sensitivity analysis of fishing vessels with power less than 44 kW.

Figure 8. Sensitivity analysis of fishing vessels with power between 44 kW and 136 kW.

Figure 9. Sensitivity analysis of fishing vessels with power more than 136 kW.

Table 1. RIFs in the retrieved literature.

Table 2. Definition and status of risk influencing factors (RIFs).

Table 3. The confusion matrix.

Table 4. Importance score of factors influencing the "Consequences".

Table 5. Parent node and child node influence strength.

Table 6. Mutual information between the node "Consequences" and the parent node.

Table 9. Confusion matrix for prediction results.

Table 10. Performance results for different consequences.

Table 11. Details of the fishing vessel accident that occurred in 2023.