1. Introduction
The chemical industry is pivotal to China’s economy, but its safety remains a concern [1,2]. The industry’s high temperatures and pressures and the presence of flammable, explosive, toxic, and harmful substances often result in serious accidents [3]. Incidents in Xiangshui, Jiangsu, and Yima, Henan, caused significant casualties and economic losses. Complex chemical production processes and many interrelated risks make accident causation intricate, requiring in-depth research and solutions [4]. Accident causes involve organizational management, regulatory mechanisms, and personnel operations, each contributing differently. According to the principle of the critical few, prevention and control should target core causes, making accurate identification of key factors vital for effective chemical safety governance [5].
Chemical enterprises have accumulated a massive number of accident case reports, which are valuable but unstructured information sources. Traditional manual analysis of these reports is inefficient, prone to subjectivity, and unable to fully uncover the deep patterns they contain [6]. Text mining technology offers a new way to address this problem [7]. At present, leveraging text mining on accident reports to extract risk evolution patterns from vast amounts of unstructured data, combined with intelligent detection technologies based on sensor networks, computer vision, and AI algorithms (such as video behavior analysis, real-time equipment condition monitoring, and automatic gas leak identification), is crucial for achieving accurate risk identification and rapid response, thereby promoting the transformation of chemical safety governance into a data-driven model [8].
This study addresses the complex and coupled nature of accident causes by developing an analytical framework that combines unstructured text mining with Bayesian networks. This framework is applied to extract key contributing factors from accident investigation reports and to quantify risk transmission paths.
In terms of cause analysis of chemical accidents, one study proposed an analysis method based on the organizational-level accident triangle, which divided accidents into different levels and used the Spearman correlation coefficient to verify the rationality of the classification. This method helps to quickly identify risk factors, but it focuses mainly on organization-level classification, and its exploration of underlying accident causes remains shallow [9]. Another study analyzed representative accidents in chemical Small and Medium-sized Enterprises (SMEs) through a revised Human Factors Analysis and Classification System (HFACS) framework and provided comprehensive recommendations for preventing major chemical accidents [10]. However, that research focused mainly on human factors, with limited analysis of equipment, environment, and other factors, making it difficult to fully reveal the complex causal relationships of accidents. Further research [11] applied complex network theory and the HFACS framework to analyze 109 accidents, identifying critical nodes through Gephi topology analysis. However, this study simplified the application of complex networks and did not investigate the critical nodes in depth. Another study layered risk perception factors and constructed a system dynamics feedback loop based on an interpretive structural model, revealing that risk experience was the basic driving factor [12]. However, the factor variables in that system dynamics model were difficult to quantify, and no empirical measurement method was designed. Existing research on accident cause analysis mainly focuses on a single factor or specific link, making it difficult to fully reveal the complex causal relationships of accidents. In addition, existing methods fall short in capturing the dynamic evolution of accidents and the nonlinear relationships among factors, which affects the accuracy and reliability of the results.
In terms of quantitative analysis, one study developed a semi-quantitative method that integrates safety, economic, and aging factors to prioritize risk in industrial chemical plants, but specific high-risk projects require fully quantitative tools for more accurate assessment [13]. Another study proposed a quantitative analysis method based on network evolution to construct the security risk network of chemical enterprises. However, the computational complexity of this method is high when dealing with complex networks, which limits its application to large-scale datasets [14]. Further research analyzed 271 thermal runaway events, quantified the accident characteristics, and proposed a four-dimensional prevention strategy. However, the study’s classification of accident causes was relatively simple and failed to deeply explore the interactions between different factors [15]. Additionally, a study applied an improved HFACS model combined with the Bayesian method to study unsafe behavior in hazardous chemical storage, but its factor system did not cover macro variables, resulting in limited prediction scenarios [16]. One study proposed an improved fuzzy Bayesian network model to increase the accuracy and reliability of tank accident prediction, but the model’s efficiency needs improvement when handling large-scale data [17]. Another study used Bayesian belief networks to quantify the risk of refinery fire and explosion events, but it relied mainly on expert knowledge and offered insufficient support for data-driven analysis [18]. Overall, the efficiency and accuracy of existing quantitative methods need improvement on large-scale data, and their shallow treatment of factor interactions makes it difficult to effectively predict the development trends and potential risks of accidents.
In the application of text mining technology, one study developed a semi-automatic method based on natural language processing to construct a knowledge graph of chemical accidents, but its efficiency and accuracy in processing large-scale text data need improvement [19]. Another study developed a chemical accident case text mining method based on word embeddings and deep learning, but its classification accuracy for different accident types leaves room for improvement [20]. Further research proposed a method for accident consequence prediction and investigation based on natural language processing, but its prediction accuracy and timeliness in practical applications require further verification [21]. Another study proposed an improved text mining method to extract risk factors from reports and build a Bayesian network model, but this method is not robust enough to handle complex causal relationships [22]. Additionally, one study applied text mining and BERT models to identify the potential consequences of refinery accidents, but this approach has limited adaptability to accidents in other industries [23]. Another work applied NLP and text mining techniques to analyze pipeline accident texts, but did not go far enough in extracting dependencies between influencing factors [24]. Existing text mining approaches thus lack efficiency and accuracy on large-scale text data, need better prediction accuracy and robustness, and fall short in cross-industry accident analysis and factor dependence extraction, making it difficult to effectively support comprehensive causal analysis and risk prediction.
Although many studies have explored the causes of chemical accidents, shortcomings remain. Some studies focus on single factors, making it difficult to fully reveal the complex causal relationships of accidents [25]. Others have limitations in data collection and analytical methods that affect the accuracy and reliability of their results. Quantitative analysis methods in some studies require improvement in efficiency and accuracy when processing large-scale data, and text mining methods in other works suffer from similar limitations on large-scale text data. To address these limitations, this paper employs text mining to extract 29 key factors from a large number of chemical safety accident cases, constructs a complex network model of accident causes, quantitatively analyzes factor importance to determine the core factors, and uses Bayesian networks for quantitative analysis, revealing the key influencing factors, clarifying the accident causation mechanism, and identifying critical association paths. The identification of these core factors and paths contributes to the theoretical understanding of accident causation mechanisms.
2. Research Methods and Data Selection
The causes of safety accidents in chemical production are characterized by complexity, systematicity, and unstructured information. Faced with large volumes of unstructured Chinese accident investigation reports, traditional manual analysis and single-factor statistical methods show clear limitations in revealing multi-factor coupling and risk propagation paths. To enhance the readability and extensibility of the analytical framework in this study, we briefly outline the logical connections and applications of text mining, complex network analysis, and Bayesian networks:
Text mining is used to automatically extract key contributing factors from unstructured reports, overcoming the subjectivity and inefficiency of manual annotation and forming a quantifiable set of discrete factors. Complex network analysis constructs a systemic structural model based on factor associations, identifying core nodes and vulnerable links in the network through topological metrics to reveal the structural characteristics of risk propagation. Bayesian networks further quantify dependencies and causal strengths among factors within a probabilistic framework, supporting uncertainty reasoning and dynamic risk prediction.
These three methods form a progressive analytical chain of “information extraction → structural modeling → relationship quantification,” collectively enabling a systematic deconstruction of accident causation from textual information to network structure and finally to probabilistic mechanisms. This integrated approach not only provides an operable path for analyzing multi-source heterogeneous data in the field of chemical safety but also offers methodological insights for interdisciplinary complex system research.
2.1. Complex Networks
The development of complex network theory has provided new analytical tools for the study of real-world complex systems. This research method integrates theories and techniques from multiple disciplines, such as mathematics and computer science, reflecting its interdisciplinary nature. It aims to reveal the network topology, evolution mechanisms, and functional characteristics of complex systems [26]. At the end of the 20th century, the introduction of small-world networks and scale-free networks marked a significant breakthrough in complex network research [27], providing theoretical support for the analysis of system behavior patterns. The theory has been widely applied to Internet topology [28], social networks [29], infrastructure networks [30], and transportation systems [31], among others. Meanwhile, complex network research focuses on the dynamic evolution of systems, tracking changes such as the addition or removal of nodes and the reconfiguration of edges to explain the internal logic of system development, thereby providing theoretical support for understanding the operational mechanisms of complex systems.
Degree centrality measures a node’s local influence by calculating the number of its direct connections. In a chemical safety risk network, this metric can be used to evaluate the likelihood that a specific risk factor may lead to an accident. Betweenness centrality and closeness centrality, on the other hand, assess a node’s global influence within the network and its sensitivity to external factors from different perspectives. Such a multi-indicator evaluation approach helps to comprehensively identify the key roles of risk factors in the accident propagation process. The degree centrality of a node can be calculated according to Equation (1):

$C_D(i) = \frac{k_i}{N-1}$ (1)

where $C_D(i)$ represents the normalized degree centrality of node $i$; $k_i$ represents the total number of connections (sum of in-degree and out-degree) of node $i$; and $N$ represents the total number of nodes in the network.
Closeness centrality measures the accessibility of a node to the other nodes in the network. By quantifying the shortest path distances between nodes, it reflects the efficiency and breadth of a node’s influence in information transmission. In a chemical safety risk network, nodes with higher closeness centrality values possess more significant influence, indicating that these nodes can more rapidly establish associations with other risk factors, occupy critical positions within risk transmission paths, and thus hold greater reference value for the formulation of risk warning and prevention strategies. The closeness centrality of a node is quantified according to Equation (2):

$C_C(i) = \frac{N-1}{\sum_{j \neq i} d_{ij}}$ (2)

where $C_C(i)$ represents the closeness centrality of node $i$; $d_{ij}$ represents the shortest path length from node $i$ to node $j$; and $N$ represents the total number of nodes in the network.
Betweenness centrality measures the frequency with which a node acts as a “bridge” in the shortest paths of the network, reflecting its control over information flow or risk transmission. The betweenness centrality of a node is calculated as shown in Equation (3):

$C_B(i) = \sum_{s \neq i \neq t} \frac{\sigma_{st}(i)}{\sigma_{st}}$ (3)

where $C_B(i)$ represents the betweenness centrality of node $i$; $\sigma_{st}$ represents the total number of shortest paths from node $s$ to node $t$; and $\sigma_{st}(i)$ represents the number of those paths that pass through node $i$.
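As a minimal illustration of Equations (1) and (2), the following sketch computes normalized degree centrality and closeness centrality on a toy undirected network of five hypothetical factor nodes (the labels a3, y4, z1, z3, and x3 are illustrative placeholders, not the study’s actual network); betweenness centrality per Equation (3) can be added analogously, e.g., via Brandes’ algorithm.

```python
from collections import deque

# Toy undirected risk-factor network (hypothetical labels).
edges = [("a3", "y4"), ("y4", "z3"), ("z3", "z1"), ("a3", "z1"), ("z1", "x3")]
nodes = sorted({v for e in edges for v in e})
adj = {v: set() for v in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
N = len(nodes)

# Degree centrality: k_i / (N - 1), Equation (1).
degree = {v: len(adj[v]) / (N - 1) for v in nodes}

def bfs_dist(src):
    """Shortest-path lengths from src in an unweighted graph."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return dist

# Closeness centrality: (N - 1) / sum of shortest-path lengths, Equation (2).
closeness = {v: (N - 1) / sum(bfs_dist(v).values()) for v in nodes}
```

In this toy network the hub node z1 scores highest on both metrics, which is exactly the “core node” signal the network analysis in Section 3 looks for.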
2.2. Bayesian Network
As a graphical probabilistic model, the Bayesian network [32], based on Bayes’ theorem, effectively describes the probabilistic relationships among random variables. This model can achieve autonomous information learning and structure optimization under conditions of limited information and uncertainty and output precise reasoning conclusions.
In the Bayesian network system, the conditional probability table (CPT) is an important tool for representing the probability distribution of a node given the states of its parent nodes, usually presented in tabular form. This table completely records the conditional probability values of the node under various combinations of its parent node states. Taking node A with two parent nodes, B and C, as an example, its conditional probability can be calculated using Formula (4):

$P(A \mid B, C) = \frac{P(A, B, C)}{P(B, C)}$ (4)

The CPT provides the probability distribution of the node for each combination of parent node states. Through this formula, the probability distribution characteristics of node A under different combinations of the states of parent nodes B and C can be characterized.
- 2.
Joint Probability Distribution
For a Bayesian network containing random variables $X_1, X_2, \dots, X_n$, the joint probability distribution $P(X_1, X_2, \dots, X_n)$ completely describes the probability characteristics of the various state combinations of the nodes. This distribution reflects the statistical properties of all variable state combinations given the network structure. Within the Bayesian network system, the joint probability distribution can be calculated from the network’s topological structure and the conditional probability distributions of each node. Specifically, the joint probability distribution can be expressed by Formula (5):

$P(X_1, X_2, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}(X_i))$ (5)

where $\mathrm{Pa}(X_i)$ denotes the set of parent nodes of $X_i$.
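This factorization can be made concrete with a small sketch: a hypothetical three-node network B → A ← C, with assumed prior and conditional probabilities, whose joint distribution is the product of each node’s (conditional) probability. The numbers are illustrative, not estimated from the accident data.

```python
from itertools import product

# Hypothetical parameters for node A with parents B and C (binary states).
p_b = {1: 0.3, 0: 0.7}                 # P(B)
p_c = {1: 0.2, 0: 0.8}                 # P(C)
p_a_given_bc = {                       # CPT: P(A=1 | B, C)
    (1, 1): 0.9, (1, 0): 0.6,
    (0, 1): 0.5, (0, 0): 0.1,
}

def joint(a, b, c):
    """P(A=a, B=b, C=c) via the factorization of Formula (5)."""
    pa1 = p_a_given_bc[(b, c)]
    return p_b[b] * p_c[c] * (pa1 if a == 1 else 1 - pa1)

# Sanity check: the joint distribution sums to 1 over all state combinations.
total = sum(joint(a, b, c) for a, b, c in product([0, 1], repeat=3))

# Marginal P(A=1) is obtained by summing out the parent states.
p_a1 = sum(joint(1, b, c) for b, c in product([0, 1], repeat=2))
```

The same summing-out operation underlies the probabilistic inference used later for risk prediction: fixing evidence variables and marginalizing the rest.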
2.3. Data Sources
The data used in this study are primarily derived from official Chinese chemical accident investigation reports published between 2010 and 2023. Specifically, the database comprises 537 reports released by government agencies such as the State Administration of Work Safety and the Ministry of Emergency Management, covering various safety incidents in the chemical industry. The source of the reports is limited to publicly accessible information on the Internet, and some reports contain missing information. During data preprocessing, such incomplete cases or reports were excluded, resulting in a final sample of 422 accident investigation reports for this study. These reports provide detailed records of accident types, contextual backgrounds, causal analyses, and outcomes across different chemical enterprises, offering a rich empirical foundation for the research. By systematically organizing and analyzing these official reports, we were able to extract the primary contributing factors of accidents and further construct a Bayesian network model for causal inference. The official reports are highly authoritative and reliable, encompassing long-term safety data in the chemical industry, and they capture the characteristics and trends of different types of accidents over time, thus providing invaluable first-hand information for accident analysis.
To assess the robustness of model parameter estimation, this study employs a non-parametric bootstrap method. By performing 1000 resamplings with replacement from the original accident case data, the Bayesian network is retrained each time, and path coefficients as well as sensitivity indices are recalculated. This yields an empirical distribution for each estimated parameter, based on which the 95% confidence interval and standard error are computed.
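The resampling procedure described above can be sketched as follows. The data and the statistic (the occurrence rate of a single factor across cases) are illustrative placeholders rather than the study’s actual path coefficients, but the CI and standard-error logic is the same.

```python
import random
import statistics

random.seed(42)

# Hypothetical per-case indicator: 1 if a given factor was present, else 0
# (422 cases, illustrative counts).
cases = [1] * 260 + [0] * 162

def bootstrap_ci(data, stat=statistics.mean, n_boot=1000, alpha=0.05):
    """Non-parametric bootstrap: resample with replacement, recompute the
    statistic, and read the 95% CI and SE off the empirical distribution."""
    boots = sorted(
        stat(random.choices(data, k=len(data))) for _ in range(n_boot)
    )
    lo = boots[int(n_boot * alpha / 2)]
    hi = boots[int(n_boot * (1 - alpha / 2)) - 1]
    se = statistics.stdev(boots)
    return lo, hi, se

lo, hi, se = bootstrap_ci(cases)
```

In the actual study the resampled statistic would be a retrained network’s path coefficient or sensitivity index rather than a simple occurrence rate.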
2.4. Element Selection
2.4.1. Text Preprocessing and Word Segmentation
Accident reports generally include an incident summary, organizational information, and event details. To reduce redundancy, this study extracted three core components—accident process, causal analysis, and responsibility determination—to construct a text-mining corpus. All reports were standardized and saved as “.txt” files.
Among several mature Chinese word segmentation tools (e.g., Jieba, HanLP, PKUSeg, THULAC), Jieba (version 0.42.1) was selected for its balanced efficiency, accuracy, and applicability. Although Jieba provides a basic dictionary, general-purpose lexicons are insufficient for chemical safety texts. Therefore, this study incorporated domain-specific vocabularies from the Sogou Cell Lexicon, including the Specialized Lexicon for Production Safety, Chemical and Chemical Engineering Vocabulary, and the Registered Safety Engineer Lexicon, to enhance segmentation precision.
When selecting the Chinese word segmentation tool, we conducted a preliminary comparison of Jieba (version 0.42.1), HanLP (version 2.2.0), PKUSeg (version 0.3.1), and THULAC (version 0.3.2). Based on 50 randomly sampled accident reports (totaling 18,240 characters), the segmentation accuracy of the four tools was evaluated as follows (Table 1):
Based on the evaluation results presented in Table 1, Jieba achieved the highest scores in precision (0.923), recall (0.912), and F1-score (0.917), demonstrating its superior accuracy and consistency in segmenting texts related to process safety, accident reports, regulatory penalty documents, and investigation reports from safety supervision administrations. The specific word segmentation rules are detailed in the Supplementary Materials. Therefore, Jieba was selected as the Chinese word segmentation tool for this study.
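To illustrate why domain lexicons matter for segmentation, the following sketch implements forward maximum matching, a simplified dictionary-based baseline (not Jieba’s actual algorithm, which combines a prefix dictionary with HMM-based discovery of out-of-vocabulary words). With only a general lexicon, the domain term 危险化学品 (“hazardous chemicals”) is split apart; adding it to the lexicon keeps it whole, which is the effect the Sogou domain vocabularies provide.

```python
def fmm_segment(text, lexicon, max_len=6):
    """Forward maximum matching: at each position, take the longest
    lexicon word starting there, falling back to a single character."""
    tokens, i = [], 0
    while i < len(text):
        for span in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + span]
            if span == 1 or word in lexicon:
                tokens.append(word)
                i += span
                break
    return tokens

general = {"危险", "化学品", "泄漏"}
domain = general | {"危险化学品"}   # domain lexicon adds the full term

print(fmm_segment("危险化学品泄漏", general))  # ['危险', '化学品', '泄漏']
print(fmm_segment("危险化学品泄漏", domain))   # ['危险化学品', '泄漏']
```

Keeping multi-character domain terms intact matters downstream: TF-IDF and LDA treat each token as a unit, so over-segmented terms dilute the feature signal.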
2.4.2. Feature Selection for Text Analysis
After word segmentation, the chemical accident reports generated a large vocabulary; however, only a limited proportion of these words carried meaningful information relevant to safety themes. Feature selection aims to filter representative terms from the text collection. Analysis revealed that approximately 90% of the words lacked distinctive feature information. To improve the accuracy of keyword identification and topic classification, this study calculated feature values for the segmented terms and extracted key feature words that capture the essential characteristics of chemical safety events.
The Term Frequency–Inverse Document Frequency (TF-IDF) method is a widely used technique for feature extraction in Chinese text analysis. It evaluates the importance of a term based on the product of term frequency (TF) and inverse document frequency (IDF). Term frequency reflects how often a term appears in a single document, calculated as shown in Equation (6), while inverse document frequency measures the distribution of the term across the entire document collection, computed as the logarithm of the ratio of the total number of documents to the number of documents containing the term, as shown in Equation (7). This method simultaneously considers the significance of a term within individual documents and its uniqueness across the corpus.
The TF-IDF method plays an important role in text analysis and information retrieval applications. By calculating the weight of each term, this method effectively supports natural language processing tasks such as keyword identification and document similarity analysis. In feature extraction, TF-IDF provides an efficient quantitative approach for evaluating term significance.
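A minimal sketch of Equations (6) and (7) on a toy corpus of hypothetical segmented tokens: a term concentrated in few documents receives a higher TF-IDF weight than an equally frequent term spread across the corpus.

```python
import math

# Toy corpus of segmented report fragments (hypothetical tokens).
docs = [
    ["violation", "operating", "procedures", "explosion"],
    ["training", "inadequate", "explosion"],
    ["training", "supervision", "inadequate"],
]
N = len(docs)

def tf_idf(term, doc):
    tf = doc.count(term) / len(doc)              # term frequency, Equation (6)
    df = sum(1 for d in docs if term in d)       # document frequency
    idf = math.log(N / df)                       # inverse doc. freq., Eq. (7)
    return tf * idf

# "explosion" appears in 2 of 3 documents, "violation" in only 1, so
# "violation" scores higher despite equal within-document frequency.
```

This discriminative weighting is what lets the feature-selection step discard the roughly 90% of segmented terms that carry no distinctive information.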
Table 2 lists a selection of terms with high TF-IDF values extracted from chemical enterprise production safety accident reports.
2.4.3. Estimation of the Optimal Number of Topics
Determining the number of causal themes in chemical accidents is essential for topic analysis. An appropriate number of topics improves classification accuracy and prevents semantic overlap or omission of key themes. For large text datasets, manual selection is inefficient and error-prone. Therefore, quantitative estimation methods are typically used.
This study employs perplexity to optimize topic number selection. In natural language processing, lower perplexity indicates better model performance, and in topic modeling, it generally corresponds to more coherent topic clustering. The perplexity is calculated as shown in Equation (8):

$\mathrm{Perplexity}(D) = \exp\left(-\frac{\sum_{d=1}^{M} \log p(w_d)}{\sum_{d=1}^{M} N_d}\right)$ (8)

where $D$ denotes the text dataset; $M$ is the total number of documents in the dataset; $p(w_d)$ represents the probability, under the model, of the word sequence $w_d$ in document $d$; and $N_d$ is the number of words in document $d$.
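Equation (8) reduces to a simple computation once per-document log-likelihoods are available; the values below are hypothetical placeholders standing in for a fitted topic model’s output.

```python
import math

# Hypothetical per-document word log-likelihoods log p(w_d) under a
# fitted topic model, and the corresponding document lengths N_d.
log_p_docs = [-120.0, -95.0, -210.0]
n_words = [40, 30, 70]

# Equation (8): exp of the negative total log-likelihood per word.
perplexity = math.exp(-sum(log_p_docs) / sum(n_words))
```

Repeating this computation while sweeping the number of topics, and taking the minimum, is exactly the selection procedure used below to arrive at five topics.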
The LDA topic model employs a two-stage generative process. First, the topic distribution $\theta_d$ for document $d$ is drawn from a Dirichlet distribution with parameter $\alpha$, and the topic $z_{d,n}$ for the $n$-th word is sampled from $\theta_d$. Second, the word distribution $\varphi_k$ for topic $k$ is drawn from a Dirichlet distribution with parameter $\beta$, and the word $w_{d,n}$ in document $d$ is generated from the multinomial distribution $\varphi_{z_{d,n}}$ conditional on $z_{d,n}$. By iterating this process, the entire document corpus is generated. The core mathematical expression of the model is shown in Equation (9).
Parameter estimation in LDA topic models is commonly performed using Gibbs sampling, which is simple to implement and computationally efficient. As a special case of the Metropolis–Hastings algorithm, Gibbs sampling is based on the Markov chain principle, iteratively updating the value of each dimension while keeping others fixed until convergence is achieved. This makes it a practical choice for topic model inference.
Based on the results of Gibbs sampling, analytical expressions for $\theta$ and $\varphi$ can be derived. For a specific term $w$, the corresponding topic distribution $\theta_{d,k}$ and word distribution $\varphi_{k,w}$ are given in Equations (10) and (11), respectively:

$\theta_{d,k} = \frac{n_{d,k} + \alpha}{\sum_{k'} (n_{d,k'} + \alpha)}$ (10)

$\varphi_{k,w} = \frac{n_{k,w} + \beta}{\sum_{w'} (n_{k,w'} + \beta)}$ (11)

In the formulas, $n_{k,w}$ represents the number of times term $w$ is assigned to topic $k$, and $n_{d,k}$ denotes the number of times topic $k$ appears in document $d$.
In this study, perplexity was employed as the model performance metric, and the number of topics $K$ was systematically optimized (α = 1/K, β = 0.01, number of iterations: 1000). As shown in Figure 1, the perplexity reaches its minimum when the number of topics is five, indicating that the LDA model achieves the best topic clustering at this point. This result determines the optimal number of topic categories for chemical accident analysis.
2.4.4. Causal Factor Identification Results
Based on the determined optimal number of topics, parameters were set, and the top 10 feature words with the highest weights under each topic were selected.
The LDA topic modeling algorithm clusters semantically related feature words, with those appearing at the top of each cluster typically having the highest probability within the corresponding topic. However, the model only performs clustering and probability ranking of feature words and cannot automatically generate topic names. Therefore, the assignment of topic names still requires researchers to summarize and define them manually, combining domain knowledge and practical context.
Based on TF-IDF values and the LDA topic model, with reference to the existing literature and the mechanisms of accident causation, and combined with statistical analysis of phrases in accident reports, the contributing factors of safety accidents in chemical enterprises were identified, as shown in Table 3.
Unsafe acts (e.g., a4 (violation of operating procedures), a3 (weak safety awareness)), preconditions for unsafe acts (e.g., a2 (insufficient personnel qualification), x3 (equipment defects)), unsafe supervision (e.g., y4 (inadequate training), z3 (failure to implement primary responsibility)), and organizational influences (e.g., z1 (imperfect safety management system), y1 (lack of organizational emphasis on safety)) all correspond to the classical levels of the HFACS framework and are consistent with findings from existing chemical safety literature. Building upon the HFACS framework, this study further identifies three new factors with distinct contextual features that have received less attention in the existing literature through text mining and network analysis: w1 (insufficient supervision by social departments), z5 (formalism in the hidden danger investigation system), and y2 (illegal organization of production).
2.5. Flowchart
Based on the research methodology described above, the flowchart for this study is presented below (Figure 2).
3. Results
At present, research on the coupling effect among influencing factors of safety accidents in chemical enterprises remains relatively limited. Many studies focus solely on overall statistical identification of contributing factors without adequately distinguishing between core and peripheral factors of accidents. Furthermore, the causes of accidents in the field of chemical production safety are diverse, and the impact of each factor on accidents exhibits significant heterogeneity.
When analyzing the coupling relationships among factors in complex systems, complex network methods have demonstrated substantial advantages. By analyzing the complex network model established based on the results of association rule mining, the correlation strength among various factors can be accurately reflected. This study constructs a causal model of safety accidents in chemical enterprises based on the results of association rule mining and quantifies the intrinsic risk propagation mechanism of accidents through its statistical indicators.
3.1. Mining Association Rules of Contributing Factors in Chemical Safety Accidents
This study was conducted in a Python (version 3.12) programming environment using the mlxtend package, specifically its Apriori and Association Rules modules, to mine frequent patterns and association rules of accident contributing factors. By adjusting the support threshold in the Apriori algorithm, frequent itemsets of contributing factors were systematically extracted. The association rules were further filtered and optimized based on confidence, lift, and support, enabling the identification of strong associations.
The selection of support and confidence thresholds directly affects the quality of the mined rules. The literature indicates no consensus on standard threshold values, with typical ranges of 0.03–0.1 for support and 0.1–0.7 for confidence. To assess parameter sensitivity, multiple combinations of support and confidence were tested, and the resulting number of association rules was recorded, as shown in the surface plot (Figure 3).
The surface plot visually illustrates the variation in the number of association rules generated under different support and confidence combinations. It can be observed that in regions with low support and low confidence, the number of rules increases significantly; as both parameters increase, the number of rules gradually declines. In the parameter range with support around 0.07 and confidence around 0.5, the surface exhibits a relatively flat trend, indicating stable rule counts in this region, making it suitable for threshold selection. Using these parameters, this study generated 363 association rules. After further filtering with a lift threshold greater than 1.25, 134 strong association rules were retained. These rules reveal the interaction mechanisms among key accident contributing factors and clarify how associations among risk factors influence accident probability.
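The support, confidence, and lift metrics used for this filtering can be sketched in pure Python (the mlxtend modules compute the same quantities at scale). The six toy “reports” and factor codes below are illustrative, not drawn from the study’s dataset.

```python
# Toy transactions: each set holds the factor codes found in one report.
reports = [
    {"x1", "a2", "y4"}, {"x1", "a2"}, {"x1", "a2", "z3"},
    {"a2", "y4"}, {"x1", "z3"}, {"y4", "z3"},
]
n = len(reports)

def support(itemset):
    """Fraction of reports containing every factor in the itemset."""
    return sum(1 for r in reports if itemset <= r) / n

def rule_metrics(antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    sup = support(antecedent | consequent)
    conf = sup / support(antecedent)
    lift = conf / support(consequent)
    return sup, conf, lift

sup, conf, lift = rule_metrics({"x1"}, {"a2"})
```

A lift above 1 means the consequent occurs more often with the antecedent than its base rate would predict, which is why the study keeps only rules with lift greater than 1.25.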
Table 4 presents the top 10 association rules ranked by lift, reflecting significant positive relationships between antecedent and consequent factors. For example, the rule “{x1 Lack of Certified Operation Management} → {a2 Insufficient Personnel Qualification}” has the highest lift value of 3.27. This result indicates that neglecting certification requirements for operational staff often leads to insufficient personnel qualifications, thereby significantly increasing the risk of accidents.
The top 10% of rules with the highest support were selected from the 134 association rules. As shown in Table 5, the antecedent and consequent of each rule, along with their corresponding support, confidence, and lift values, are presented.
- 2.
High-Confidence Association Rules
The top 10% of rules ranked by confidence were selected from the association rule set (Table 6).
Table 6 shows that the support of the association rules ranges from 0.07 to 0.12, confidence from 0.82 to 0.94, and lift from 1.32 to 1.60. These metrics indicate that the mined rules possess high statistical significance and practical relevance.
For example, Rule 1 in Table 6, “{a3 Weak Safety Awareness of Operators, z11 Lack of On-Site Safety Management} → {y4 Inadequate Safety Training},” has a confidence of 0.94, indicating that when both a3 and z11 are present, the probability of y4 (Inadequate Safety Training) occurring reaches 94%. This finding aligns closely with real-world safety management: enterprises exhibiting both weak personnel safety awareness and poor on-site management typically also have severe deficiencies in safety training. This empirical evidence supports the validity and reliability of the mined association rules.
3.2. Construction of the Network Model for the Causes of Safety Accidents in Chemical Enterprises
By constructing a complex network model, the interaction relationships among various factors can be effectively characterized, and the key causal elements and their associated mechanisms can be identified. Based on the association rules mined by the Apriori algorithm, this study uses Gephi (version 0.10.1) to construct a directed weighted network of the causes of chemical accidents (Figure 4). This model quantifies the association strength between factors using lift and identifies core causes and peripheral factors through structural analysis, revealing the interaction mechanisms among the elements in the accident system. In the network diagram, nodes represent the causal elements of the accident, and edges indicate direct causal relationships based on the association rules. For specific node definitions, please refer to Table 3. Node size is determined by eigenvector centrality, which reflects the relative importance of a node in the network.
3.3. Identification of Core and Peripheral Risk Factors for Safety Accidents in Chemical Enterprises
3.3.1. Identification of Core Contributing Factors for Safety Accidents in Chemical Enterprises
Given that the network nodes exhibit a hub-and-spoke distribution characteristic and that core contributing factors play a critical and decisive role in accident occurrence, effectively suppressing these core factors can significantly mitigate the severity of accidents in prevention efforts. Node importance evaluation methods in complex networks primarily encompass three categories: metrics based on neighborhood connections (node degree), metrics based on paths (betweenness centrality), and metrics based on network characteristics (eigenvector centrality). This study integrates these three network structure indicators to quantitatively assess and rank the importance of each causative factor in chemical safety accidents, thereby identifying the core contributing factors for safety accidents in chemical enterprises.
- (1)
Node Degree
In the network model, the number of connection edges associated with each causative factor node constitutes its degree centrality indicator, which directly reflects the importance of the node within the model structure.
Figure 5 illustrates the degree value distribution of each node in the chemical accident causative factor network, and the top ten nodes are selected here for in-depth analysis.
As depicted in
Figure 5, ten high-degree contributing factors have been identified in the safety accidents of chemical enterprises, including “inadequate safety education and training”, “failure to implement the primary responsibility for safety production”, and “lack of establishment or enforcement of the hidden danger investigation system”.
- (2)
Node Betweenness Centrality
The node betweenness centrality index quantifies the extent to which a specific node acts as an intermediary in the shortest paths of the network, and its numerical value directly reflects the node’s influence on network connectivity. As depicted in
Figure 6, the top ten nodes with the highest betweenness centrality in the association network of chemical accidents play a critical central role. These nodes are pivotal in the transmission of accident risks. Conducting an in-depth analysis of these high-betweenness nodes can facilitate the identification of key control points for accident prevention.
As shown in
Figure 6, in the safety accidents of chemical enterprises, ten accident-causing factors, including “inadequate safety education and training,” “failure to implement the primary responsibility for production safety,” “insufficient supervision by social departments,” and “lack of establishment or enforcement of the hidden danger investigation system,” exhibit high betweenness centrality.
- (3)
Node Eigenvector Centrality
The eigenvector centrality index evaluates the criticality of a node by considering both the importance of the node itself and its adjacent nodes, thereby reflecting the network characteristic that “the more critical the neighboring nodes, the more important the target node becomes”. As illustrated in
Figure 7, the distribution of eigenvector centrality among nodes in the chemical accident causation network is presented, with emphasis on the top ten core nodes.
As depicted in
Figure 7, in the safety accidents of chemical enterprises, ten accident-causing factors, including “inadequate safety education and training,” “failure to implement the primary responsibility for production safety,” “lack of establishment or enforcement of the hidden danger investigation system,” and “violation of operating procedures,” exhibit high eigenvector centrality.
By comprehensively considering the three network indicators—node degree, betweenness centrality, and eigenvector centrality—the top 10 factors in each indicator are screened. The causes that appear in all three indicators are identified as the core accident-causing factors. The specific results are presented in
Table 7. This multi-dimensional assessment method ensures the comprehensiveness and reliability of the identification of core causes.
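The three indicators can be combined exactly as described: rank nodes under each metric, take the top-k lists, and intersect them. The sketch below, on a small hypothetical graph (the node codes echo the paper's factor labels, but the edges are invented), computes degree and eigenvector centrality in pure Python, the latter by power iteration; betweenness centrality, which the paper also uses, is omitted here for brevity.

```python
import math

# Hypothetical undirected toy graph; the paper's actual network is built in
# Gephi from the mined association rules.
edges = [("y4", "z3"), ("y4", "a4"), ("y4", "w1"), ("y4", "a3"),
         ("z3", "a4"), ("z3", "w1"), ("a4", "a3"), ("a3", "w2")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

degree = {n: len(nbrs) for n, nbrs in graph.items()}

def eigenvector_centrality(graph, iters=200):
    """Power iteration on the adjacency structure, L2-normalised each step."""
    x = {n: 1.0 for n in graph}
    for _ in range(iters):
        x_new = {n: sum(x[m] for m in graph[n]) for n in graph}
        norm = math.sqrt(sum(v * v for v in x_new.values()))
        x = {n: v / norm for n, v in x_new.items()}
    return x

eig = eigenvector_centrality(graph)

def top(scores, k):
    """Nodes with the k highest scores."""
    return {n for n, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:k]}

# The paper intersects the top-10 lists of three indicators; k=3 on this toy graph.
core = top(degree, 3) & top(eig, 3)
print(core)
```

On this toy graph the hub node y4 survives the intersection, mirroring how the paper's core factors must rank highly under every indicator at once.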
3.3.2. Identification of Peripheral Contributing Factors Related to Core Contributing Factors
In safety accidents in chemical enterprises, core contributing factors are often numerous and difficult to control. If these core factors are not properly mitigated, risks may propagate along association pathways to connected peripheral factors, potentially generating new safety hazards. Specifically, ineffective hazard inspection can reduce the likelihood of inspectors detecting unsafe operations, thereby increasing accident risk. Based on the accurate identification of primary contributing factors, it is necessary to systematically analyze related secondary factors and implement preventive measures from a holistic perspective.
Eigenvector centrality can accurately measure the closeness of nodes to core nodes, effectively identifying other nodes strongly associated with the target node. In this study, a local network framework of core contributing factors was constructed based on eigenvector centrality to analyze associated peripheral factors. Taking a4 (Unsafe Operation) as an example, the antecedent and consequent elements of association rules were treated as network nodes, with their logical relationships represented as edges, forming the corresponding network topology. The individual central networks of each core causative factor are illustrated in
Figure 8.
To avoid interference from other key factors, only a4, “Unsafe Operation”, was retained as the core node during network construction. The resulting network is shown in
Figure 8a, where node size directly reflects eigenvector centrality. The study identified that the core factor a4 together with the five nodes with the highest eigenvector centrality, a1, a3, z4, y5, and w2, constitute the causative factor set. These associated nodes form the relevant causative cluster for “Unsafe Operation.”
In safety accidents within chemical enterprises, unsafe operations reflect deficiencies in the implementation of the company’s safety management system. Analysis of multiple accident cases indicates that violations of safety procedures by operators are often rooted in the failure to enforce primary safety responsibilities, the absence of an effective system of safety behavior norms, and a disconnect between operating procedures and actual production requirements, with supervision often being merely formal. Specifically, inadequate safety training may leave operators unaware of process risks; performance assessments that prioritize production over safety may indirectly encourage short-cutting operational steps; and ineffective supervision mechanisms allow unsafe behaviors to go undetected and uncorrected. Additionally, the lack of a comprehensive process safety information management system may prevent timely updates of operating procedures, making them incompatible with actual production conditions. These management shortcomings increase the likelihood that employees will overlook safety controls during operations, ultimately leading to accidents caused by unsafe practices.
Therefore, systematic prevention and control of the “Unsafe Operation” causative cluster in chemical enterprises is critical. Strengthening the implementation of safety responsibilities, improving operational procedures, enhancing behavioral safety management, and establishing effective supervision mechanisms can reduce human errors, improve intrinsic safety, and prevent major accidents. Applying the same approach to other core contributing factors allows identification of the associated factor sets for each core element, with detailed results presented in
Table 8.
3.4. Construction of Bayesian Network Model Based on Correlation Rules
This study previously applied association rule mining to analyze risk contributing factors in chemical enterprise safety accidents and constructed a complex network based on the filtered results. Building on this, a Bayesian network model was developed by integrating text-mined data with the complex network topology.
The model captures risk propagation mechanisms among contributing factors through three steps: mapping complex network nodes to Bayesian variables, establishing conditional dependencies from association rules, and refining the structure via expert input and parameter learning. This approach preserves network topology while enabling probabilistic inference, allowing quantitative identification of critical risk pathways for accident prevention.
Key thresholds for rule mining were optimized: minimum support = 0.015, minimum confidence = 0.3, and rules limited to a single antecedent. Using lift > 1, 209 statistically significant strong rules were extracted (
Table 9). This configuration balances comprehensive pattern capture with noise reduction, ensuring robust and reliable results.
3.5. Bayesian Network Structure Optimization and Learning
3.5.1. Network Structure Optimization Based on Search Scoring
Based on the extracted strong association rules, a Bayesian network model was constructed, where nodes correspond to the antecedents and consequents of the rules, each representing a specific accident-causative factor. Directed edges encode causal relationships, with edge directions determined by the logical structure of the rules—antecedents as parent nodes and consequents as child nodes—thereby forming the complete network topology. Conditional probability tables were initialized using the confidence values of the association rules and further refined with expert knowledge.
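The rule-to-structure mapping described above amounts to reading each single-antecedent rule as a directed edge and seeding the corresponding conditional-probability entry with the rule's confidence. A minimal sketch, with hypothetical rules (the factor codes are illustrative, not the paper's actual mined rules):

```python
# Hypothetical single-antecedent rules (antecedent, consequent, confidence);
# in the paper these come from Apriori with support >= 0.015, confidence >= 0.3.
rules = [
    ("w1", "z3", 0.62),
    ("z3", "y6", 0.48),
    ("y6", "a4", 0.55),
    ("a4", "Accident", 0.71),
]

parents = {}     # child -> set of parent nodes (the network topology)
cpt_seed = {}    # (parent, child) -> initial P(child present | parent present)
for ante, cons, conf in rules:
    parents.setdefault(cons, set()).add(ante)   # antecedent becomes parent node
    cpt_seed[(ante, cons)] = conf               # confidence initialises the CPT entry

print(parents["a4"])            # {'y6'}
print(cpt_seed[("y6", "a4")])   # 0.55
```

These seeded entries are only a starting point; in the paper they are subsequently refined by expert knowledge and EM parameter learning.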
Bayesian network structure optimization can be achieved via two main approaches: score-based search and constraint-based methods. Both aim to derive the optimal network topology from observed data. Constraint-based methods, however, involve iterative searches and are computationally intensive, reducing efficiency for large-scale networks. To address this, the present study employs a score-based method, specifically the K2 algorithm, for structure learning. The rationale is twofold: (1) K2 limits the number of parent nodes, reducing search space complexity and accommodating the medium-scale network in this study; (2) it allows the incorporation of expert knowledge as node ordering constraints, complementing domain-specific insights into chemical accident causation.
Score-based algorithms systematically explore candidate network structures, evaluating each using predefined criteria such as the BIC (Bayesian Information Criterion), maximum-likelihood, or K2 scores. Iteratively comparing candidate scores identifies the network configuration that best balances model complexity and data fit, enhancing generalization. K2’s minimal prior knowledge requirement and ability to learn network structures autonomously without complex assumptions make it practical across various disciplines. The mathematical formulation of K2 is presented in Equation (12):
$$P[B,D] = c \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}! \qquad (12)$$
where: $P[B,D]$ denotes the joint probability of the network structure B and the observed data D; c is a normalizing constant; n is the total number of nodes in the network; $q_i$ is the number of possible parent configurations for node i; $r_i$ is the number of possible states of node i; $N_{ij}$ is the total number of observations in which node i’s parents are in their j-th configuration; $N_{ijk}$ is the number of observations in which node i takes its k-th value while its parents are in the j-th configuration. This formulation quantifies the fit between the network structure and the observed data, providing a mathematical basis for model evaluation.
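Numerically, the K2 score is evaluated in log space, since the factorials overflow quickly; math.lgamma gives log N! as lgamma(N + 1). The sketch below (toy binary data, invented for illustration) scores a single binary node under two candidate parent sets and confirms that the data-supported parent structure wins:

```python
import math
from itertools import product

def log_k2_node(i_col, parent_cols, data, r=2):
    """Log of prod_j [(r-1)!/(N_ij+r-1)!] * prod_k N_ijk! for one node.

    data: list of dicts mapping column name -> 0/1; r: number of node states.
    """
    score = 0.0
    for config in product(range(r), repeat=len(parent_cols)):
        n_ijk = [0] * r
        for row in data:
            if all(row[p] == c for p, c in zip(parent_cols, config)):
                n_ijk[row[i_col]] += 1
        n_ij = sum(n_ijk)
        score += math.lgamma(r) - math.lgamma(n_ij + r)   # log (r-1)!/(N_ij+r-1)!
        score += sum(math.lgamma(n + 1) for n in n_ijk)   # log prod_k N_ijk!
    return score

# Toy binary cases (hypothetical): y4 tracks z3 closely, so the structure
# z3 -> y4 should score higher than y4 with no parent.
data = [{"z3": 1, "y4": 1}] * 6 + [{"z3": 0, "y4": 0}] * 5 + [{"z3": 1, "y4": 0}]
with_parent = log_k2_node("y4", ["z3"], data)
no_parent = log_k2_node("y4", [], data)
print(with_parent > no_parent)
```

The K2 search greedily adds, for each node in a given ordering, the parent that most increases this score, stopping at a parent-count cap, which is why the algorithm scales to medium-sized networks like the one in this study.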
The K2 algorithm employs a local search strategy, offering superior computational efficiency compared with other score-based methods for large datasets. In this study, the K2 optimization was implemented using GeNIe software, and the resulting optimized Bayesian network for chemical accidents is shown in
Figure 9.
3.5.2. Bayesian Network Parameter Learning
Based on the established association network of accidents, this study employed the GeNIe software platform to perform parameter learning, thereby quantifying the probabilistic dependencies among risk factors and between risk factors and accident outcomes. The training data for learning were derived from prior text mining results and processed into a binary feature matrix: rows correspond to individual accident cases, columns represent contributing factors, and matrix elements take values of 0 or 1, indicating the absence or presence of a factor in a specific accident. This data representation effectively preserves key relational features from the original text, providing a reliable basis for parameter estimation. The Expectation Maximization (EM) algorithm [
33] was applied to conduct parameter learning. The primary rationale lies in the fact that the encoded accident text data are incomplete, with some nodes exhibiting missing values or parent-node combinations occurring with insufficient frequency. Direct application of Maximum Likelihood Estimation would produce numerous zero probabilities in the Conditional Probability Tables, thereby compromising the inference stability of the network. The EM algorithm addresses this by estimating the expectations of the missing data in the E-step and maximizing the likelihood in the M-step, enabling stable convergence of probability learning even under conditions of incomplete or sparse data. The method is particularly well suited to structures characterized by “multiple parent nodes and high-dimensional CPTs,” effectively mitigating the probability bias or non-convergence caused by data sparsity. As one of the most mature parameter learning methods in engineering safety, reliability analysis, and risk inference, EM offers significant advantages in model robustness and inference accuracy when dealing with incomplete and uncertain accident data. The combined K2 and EM approach therefore makes full use of both the prior knowledge and the textual data structure available in this study, ensuring that the final model is credible and interpretable in terms of theoretical logic, structural stability, and parameter reliability. Parameter learning ultimately generated the conditional probability tables for all causal nodes, and the Bayesian network was updated accordingly.
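The E-step/M-step alternation is easiest to see on a single node with no parents. In the hypothetical sketch below, None marks cases where the factor's presence could not be coded from the report text; for this simple missing-at-random case, EM converges to the observed-data proportion (4 of 6 observed cases, i.e., 2/3):

```python
# Minimal EM sketch for estimating P(y4 = 1) from partly missing data.
# E-step fills each missing value with its current expected probability;
# M-step re-estimates the parameter from the completed counts.
observations = [1, 1, None, 0, 1, None, 0, 1]  # hypothetical, partly missing

p = 0.5  # initial guess for P(y4 = 1)
for _ in range(50):
    # E-step: expected count of y4 = 1 (each missing entry contributes p)
    expected_ones = sum(p if o is None else o for o in observations)
    # M-step: maximise the likelihood given the expected counts
    p = expected_ones / len(observations)
print(round(p, 4))  # 0.6667
```

With parent nodes and high-dimensional CPTs the expectations are computed per parent configuration, but the alternation is the same; this is what lets the learning remain stable where raw MLE would emit zero-probability entries.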
Figure 10 shows the Bayesian network after updating the conditional probabilities.
To quantify the uncertainty of the Conditional Probability Table (CPT) parameters, this study obtained the posterior distribution based on the Dirichlet-Multinomial model and employed posterior sampling to calculate 95% credible intervals (CIs). Specifically, for each set of parent node values, the conditional probability vector θ was assigned a Dirichlet prior, which, together with the sample frequencies, formed the Dirichlet posterior distribution. Through 10,000 rounds of posterior sampling (with the first 2000 discarded as burn-in), the 2.5% and 97.5% percentiles were taken as the parameter CI. To evaluate the predictive performance of the model, Leave-One-Out Cross-Validation (LOO-CV) was applied to the Bayesian network. Using the validation module in GeNIe, the LOO-CV mode was selected to calculate the prediction accuracy for each node. Detailed validation results are presented in
Table 10. This approach ensures the rigor and reliability of the model evaluation.
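The Dirichlet-Multinomial update for a single CPT column can be sketched directly: add the observed counts to the prior pseudo-counts and sample the posterior. The counts below are hypothetical, and one simplification is worth noting: direct Dirichlet sampling yields independent draws, so this sketch needs no burn-in (burn-in matters for MCMC-style samplers such as the one used in the study).

```python
import random

random.seed(0)

def dirichlet_sample(alphas):
    """Draw one sample from Dirichlet(alphas) via normalised Gamma draws."""
    draws = [random.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

# Hypothetical counts for one CPT column: factor present in 18 of 60 cases
# matching this parent configuration; uniform Dirichlet(1, 1) prior.
counts, prior = [18, 42], [1.0, 1.0]
posterior_alphas = [c + a for c, a in zip(counts, prior)]

samples = sorted(dirichlet_sample(posterior_alphas)[0] for _ in range(10_000))
lo, hi = samples[int(0.025 * len(samples))], samples[int(0.975 * len(samples))]
print(f"95% CI for P(factor present): [{lo:.3f}, {hi:.3f}]")
```

For the two-state case this posterior is simply Beta(19, 43); the interval brackets the empirical proportion 0.30 and narrows as the count of matching cases grows.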
Table 10 shows that the highest node prediction accuracy reached 0.9060, with most nodes exceeding 0.7. Overall, the network achieved a prediction accuracy of 0.7647, indicating that the constructed Bayesian network model for chemical accidents demonstrates strong predictive performance and is suitable for causal analysis and inference of accident factors.
The sensitivity analysis of Bayesian networks quantifies the extent to which changes in parent nodes influence the probabilities of child nodes, thereby enabling the identification of key factors within the model. In the GeNIe software, sensitivity analysis was performed with the Accident node as the target variable.
Table 11 presents the variables with a maximum absolute sensitivity index (MSI) greater than 0.01. The analysis results visually illustrate the degree to which different factors affect the probability of accident risk.
It is evident that in the production process of chemical enterprises, employees may trigger subsequent accidents due to insufficient safety protection, violation of operational commands, and other factors. Similarly, organizational influences and regulatory factors also play a critical role in ensuring safety. Equipment defects or malfunctions escalate risks in the chemical production process, potentially leading to severe consequences such as leaks and explosions. Moreover, inadequate equipment management may result in insufficient maintenance, further increasing the likelihood of accidents. The failure to implement primary responsibility for safety production, coupled with insufficient supervision by social departments, introduces additional safety hazards at the management and oversight levels, thereby contributing to accidents. Therefore, it is essential to implement effective regulatory and control measures for these highly sensitive factors.
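For a binary factor feeding the Accident node, the marginal accident probability is linear in that factor's prior, so its sensitivity reduces to a difference of CPT entries; a numeric sketch (all probabilities hypothetical, not the model's learned values):

```python
# Sensitivity sketch for a single parent F of the Accident node: the marginal
# P(Accident = 1) is linear in P(F = 1), so the sensitivity equals the CPT
# difference P(A=1 | F=1) - P(A=1 | F=0).
def p_accident(p_factor, p_acc_given_f1, p_acc_given_f0):
    """Marginalise the accident probability over the binary factor F."""
    return p_acc_given_f1 * p_factor + p_acc_given_f0 * (1 - p_factor)

cpt = {"f1": 0.40, "f0": 0.08}                  # hypothetical CPT entries
base = p_accident(0.30, cpt["f1"], cpt["f0"])
bumped = p_accident(0.31, cpt["f1"], cpt["f0"])
sensitivity = (bumped - base) / 0.01            # numeric derivative
print(round(base, 4), round(sensitivity, 4))    # 0.176 0.32
```

In a real multi-parent network the derivative is taken holding the other parents' distributions fixed, which is what GeNIe's sensitivity module computes; factors whose CPT rows differ most strongly dominate the ranking in Table 11.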
- 2.
Path Analysis
Based on the sensitivity analysis results, this study employs the path analysis method to quantify the correlation characteristics among contributing factors. By constructing a structural equation model, the path coefficients of each variable are calculated, and the absolute value of these coefficients directly reflects the intensity of the causal relationships. The statistical test results are presented in
Figure 11, where “*”, “**”, and “***” denote significance levels of
p < 0.1,
p < 0.05, and
p < 0.001, respectively, indicating the statistical reliability of each path relationship. This method realizes the quantitative representation of the accident-causing mechanism.
It is evident that there are 13 significant association paths for safety accidents in chemical enterprises. Among these, the effect level (i.e., the product of path coefficients) of the causal path “w1 (Insufficient Supervision) → z3 (Failure to Implement Primary Responsibility) → y6 (Illegal Construction) → a4 (Violation of Operating Procedures) → Accident” is the highest, at 0.0085 (0.024 × 0.427 × 0.239 × 0.2624). This indicates that this path is the key causal path for safety accidents in chemical enterprises, as depicted in
Figure 12. Specifically, this path highlights that supervision by social departments, implementation of the primary responsibility for production safety, prevention of illegal construction, and compliance with operating procedures are critical factors in avoiding such accidents.
4. Discussion
This study systematically investigates the risk propagation mechanisms of safety accidents in Chinese chemical enterprises by integrating text mining, complex network analysis, and Bayesian network modeling. Compared to previous qualitative studies based on limited typical cases, the analytical framework of this research overcomes the limitations of traditional linear approaches. Through systematic mining of 422 accident reports, it not only identifies 29 key factors spanning personnel, organizational, and regulatory dimensions but also reveals the nonlinear coupling and risk transmission mechanisms among accident contributing factors. The analysis reveals that the identified accident factors exhibit nonlinear coupling characteristics within a complex network, supporting a systemic rather than single-factor view of accident causation.
By incorporating network centrality indicators, this study not only differentiates core from peripheral contributing factors but also constructs a structured network of accident factors. It finds that insufficient safety training and unfulfilled production safety responsibilities play a dominant role in the risk propagation network, which aligns with the findings of previous research. However, through quantitative analysis using Bayesian networks, this study further reveals that these surface-level factors systematically lead to accidents through the transmission pathway of “regulatory gaps” to “behavioral violations”, achieving a transition from “factor identification” to “mechanism analysis.” Sensitivity analysis confirms the high influence of critical nodes such as violations of command and equipment defects, while the Bayesian network constructed based on strong association rules provides data-driven support for probabilistic inference of accident association paths.
The findings offer a scientific basis for optimizing the allocation of safety resources, ensuring accountability, and improving training systems. In particular, by proposing a “core–peripheral” factor system along with corresponding leading indicators, control measures, and audit standards, this study provides actionable decision-making tools for enterprises to implement targeted risk management under resource constraints. Additionally, the study emphasizes the need to strengthen regulatory supervision and enhance safety governance mechanisms, promoting a paradigm shift in chemical safety management from “compliance-driven” to “performance-driven.” Overall, this study establishes a comprehensive empirical framework that expands the theoretical boundaries of accident causation research and provides novel academic insights and practical guidance for chemical safety management and accident prevention.
5. Conclusions
In this study, a multidimensional causality system covering three dimensions of personnel, organization, and supervision was constructed with 29 factors. Through complex network modeling, it is revealed that factors such as “insufficient safety education” and “failure to perform primary safety duties” play key roles in the risk propagation network, and 13 important association paths are identified through Bayesian network analysis. This study bridges the theoretical and practical gaps in chemical safety research by systematically analyzing the accident causation network and quantifying risk pathways. Theoretically, it enhances the understanding of nonlinear and system-wide interactions in accident causation. Practically, it offers actionable recommendations for enterprises and policymakers to prioritize intervention strategies and optimize safety investments. Future research should investigate the dynamic and situational adaptability of the proposed framework to further improve its applicability.
By integrating text mining, complex network analysis, and Bayesian network modeling, this study systematically reveals the causal structure and critical pathways of safety accidents in Chinese chemical enterprises, providing a data-driven analytical framework and practical insights for chemical safety governance. However, the generalization and application of the research findings require careful consideration of its inherent limitations. First, the empirical foundation of this study relies entirely on publicly available accident investigation reports within China. The conclusions profoundly reflect China’s unique safety management systems, regulatory environment, and cultural context, and their applicability to chemical safety practices in other countries and regions requires further validation. Second, although the text mining method employed efficiently processes unstructured information, it may not fully capture the complex context and implicit correlations of accident causes due to inconsistencies in the quality of original reports and variations in semantic expression. Furthermore, the static association network constructed based on historical data can depict the structural relationships among risk factors but struggles to reflect their dynamic evolution over time or under external interventions. The edge weights in the network rely solely on statistical association strength and do not incorporate domain expert knowledge or calibration with multi-source data, which may oversimplify the practical managerial implications of causal influences. In the Bayesian network analysis, the conditional probability parameters are entirely learned from observational data, which essentially represents statistical association inference rather than rigorous validation of risk propagation mechanisms. Additionally, the binary treatment of node states fails to capture the continuous variation and nonlinear response characteristics of risk factor intensities. 
Finally, the methodological integration across the three stages—text mining, complex networks, and Bayesian networks—still faces challenges related to information loss and computational efficiency. Further optimization of the integrated framework is needed when dealing with larger-scale, multimodal data. Future research could be expanded in areas such as cross-regional comparisons, dynamic risk evolution modeling, multi-source data fusion, and interventional causal validation to enhance the generalizability, timeliness, and explanatory power of the conclusions, thereby promoting the continuous advancement of chemical safety governance toward intelligent and precise directions.