3.1. Extracting Causal Chains to Construct Causal Networks
In coal mine safety management, the accident cause chain is an important tool for revealing the causes and processes of coal mine accidents. In this study, 101 complete accident reports were collected and analyzed on the basis of the characteristics of the overall mining system and in combination with classic accident causation theories, so that the factors leading to the accidents can be identified and understood more clearly. These data were selected primarily because publicly available accident investigation reports cover the critical contributing factors that govern the occurrence and progression of accidents, are well documented, and are strongly representative. Through systematic collation, induction, and in-depth analysis of multiple accident cases, the core causative factors can be identified, and by reconstructing the entire sequence of accident initiation and evolution, the dominant accident causation chains can be established. During the extraction and definition of accident causes, rigorous screening and deduplication were applied to eliminate redundant information as far as possible. This reduces the computational redundancy that repetitive content would otherwise introduce into the modeling and search procedures, which supports the convergence speed and stability of the subsequent analysis. The specific process is shown in Figure 2.
First, in constructing the accident cause chain, the accident factors are classified into four system categories: human factors, physical factors, environmental factors, and management factors, as shown in Table 1.
Then, an accident chain arises from the mutual influence and interaction between a given event, the surrounding disaster-causing environment, and other causes. The occurrence of an accident in turn affects the environment, creating conditions for further accidents. This chain reaction among the event, the disaster-causing environment, and the causes forms the accident chain. The accident cause chains in each accident report were extracted by manual reading so as to explain as fully as possible why the accident occurred, as shown in Table 2. Since a gas explosion accident is ultimately triggered by a specific factor, this final cause is taken as the endpoint of each accident causation chain in this paper. Each causation chain is analyzed according to the accident occurrence process and its causal factors. The length of a causation chain is determined by the accident itself, so lengths vary across chains: statistical analysis shows that the maximum chain length is 9, the minimum is 4, and the average is approximately 6.
After the cause chains of the accidents were sorted systematically, the data were analyzed further and the cause–cause directed co-word matrix was constructed, as shown in Table 3. The key to constructing this matrix is to quantify the relationship between each pair of causes, which helps to reveal the role of different causes in the accident process and their interaction patterns. Counting the co-occurrence frequency of each cause with the other factors in the accident chains makes the interaction and influence between causes easier to identify. Further analysis showed that some causes have a higher co-occurrence frequency, indicating that they may be more closely related in the accidents, while other causes have relatively weak links with the remaining factors. To visualize these associations, Gephi was used to draw an accident cause association network diagram, as shown in Figure 3.
In this figure, the causes are marked with different colors according to the categories, which facilitates the intuitive identification of the relationship and influence between different categories. The color coding makes it easy to distinguish the causes of similar categories and helps to understand their role in the accident. This visualization method not only shows the location and connection of each cause, but also highlights their integrity and dependence in the whole network. Through the network diagram, the interaction between each causative factor and other factors and their importance in the causal chain can be clearly seen. At the same time, the diagram reveals the relationship between different categories of factors, which helps to understand the mechanism of the accident and provides theoretical support for subsequent preventive measures.
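The construction of the directed co-word matrix from the extracted chains can be illustrated with a minimal sketch. It assumes that each pair of consecutive causes in a chain contributes one count to the corresponding matrix cell; the paper does not state the exact counting rule, and the chains and the endpoint label `X` below are hypothetical placeholders rather than data from the accident reports.

```python
from collections import Counter

def build_directed_cooccurrence(chains):
    """Count how often cause `src` is immediately followed by cause `dst`.

    Assumption: only consecutive pairs within a chain are counted.
    """
    counts = Counter()
    for chain in chains:
        for src, dst in zip(chain, chain[1:]):
            counts[(src, dst)] += 1
    labels = sorted({cause for chain in chains for cause in chain})
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for (src, dst), n in counts.items():
        matrix[index[src]][index[dst]] = n  # row = source, column = target
    return labels, matrix

# Hypothetical chains; "X" stands for the final cause at the chain endpoint.
chains = [
    ["M1", "H2", "E3", "X"],
    ["M1", "H2", "W1", "X"],
    ["M4", "E3", "X"],
]
labels, matrix = build_directed_cooccurrence(chains)
m1_to_h2 = matrix[labels.index("M1")][labels.index("H2")]
```

The resulting matrix is asymmetric by design, so the direction of influence between two causes is preserved when it is exported as a weighted edge list for a tool such as Gephi.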
3.2. Analysis of Node Indicators
In the study of complex networks, static topological indices are of great significance. Among them, the degree is a fundamental indicator of node connectivity: the degree of a node is the total number of edges directly connected to it. In a directed network, the degree is further divided into two components, out-degree and in-degree, as shown in Figure 4. The out-degree is the number of directed edges originating from the node, representing the node's connections to other nodes in the network. Conversely, the in-degree is the number of directed edges pointing to the node, reflecting the connections of other nodes to it. This distinction is particularly important in the analysis of directed networks, as it enables a more detailed understanding of the influence, functional role, and position of a node within the overall topology of the network.
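The degree definitions above can be sketched in a few lines. The edge list is a hypothetical example, and the function name `degree_indices` is introduced here only for illustration.

```python
from collections import defaultdict

def degree_indices(edges):
    """Return in-degree, out-degree and total degree for every node.

    `edges` is an iterable of (source, target) pairs; duplicate pairs
    are counted once, since degree counts distinct directed edges.
    """
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for src, dst in set(edges):
        out_deg[src] += 1
        in_deg[dst] += 1
    nodes = set(in_deg) | set(out_deg)
    return {
        n: {"in": in_deg[n], "out": out_deg[n], "total": in_deg[n] + out_deg[n]}
        for n in nodes
    }

# Hypothetical directed edges between cause labels.
edges = [("M1", "H2"), ("M4", "H2"), ("H2", "E3"), ("E3", "X")]
degrees = degree_indices(edges)
```

In this toy example `H2` receives two edges and emits one, so it is mainly a target of other causes, which is the kind of reading the in-degree and out-degree comparison supports.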
The total degree of the causal network shows the distribution of connections and influences among the nodes, as shown in Figure 4. Analyzing the total degree reveals the importance of each node, which helps to identify key nodes and their roles in the overall network structure. Nodes with higher total degree usually play more crucial roles.
Figure 4 presents the static topological index analysis of the network and illustrates the four key indicators for each node. Apart from the ultimate factor at the end of the accident chain, nodes such as M4, E3, E7 and M1 have a high out-degree, while nodes such as M1, H2 and W1 have a high in-degree. Specifically, the out-degree of E7 is 21 while its in-degree is only 8, which means that it may affect 21 other causes and thus requires attention. H2 has the highest in-degree, 27, indicating that 27 causes may lead to H2, so its transmission paths also need attention. Nodes with a high total degree span personnel, objects, environment and management, suggesting that gas explosion accidents result from the combination of multiple factors and paths. Therefore, prevention and control should adopt a comprehensive approach.
The figure also includes the betweenness centrality index, which is used to assess the relative importance of nodes or edges in the network, reflecting the critical role of nodes in information transmission. It reveals the role that nodes play in influencing the efficiency of the network. Nodes with high betweenness centrality are usually the core of the network, have strong influence, and are indispensable parts.
Figure 4 shows the distribution of betweenness centrality for each node.
Through in-depth analysis of the network structure, it can be concluded that the betweenness centrality of M3 and E3 nodes is higher than that of other nodes, indicating that their influence in the whole network is more prominent. This means that M3 and E3 not only play a key role in connecting other nodes in the network, but also may be the main hub of information flow, and any path or information transmission related to these two nodes may have a significant impact on the whole network. At the same time, M1 and W2 are also key connecting points. These nodes have a significant impact on information flow and network connectivity, so they must be prioritized for risk management and control.
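For readers who wish to reproduce the betweenness ranking outside Gephi, the following is a minimal sketch of Brandes' algorithm for an unweighted directed graph. It returns unnormalized values, so the absolute numbers may differ from Gephi's output by a normalization factor; the adjacency dictionary is a hypothetical example.

```python
from collections import deque

def betweenness_centrality(adj):
    """Unnormalized betweenness for a directed, unweighted graph.

    `adj` maps each node to an iterable of its successors.
    """
    nodes = set(adj) | {v for vs in adj.values() for v in vs}
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        stack = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# Hypothetical graph: two management causes feed H2, which leads to E3 and X.
adj = {"M1": ["H2"], "M4": ["H2"], "H2": ["E3"], "E3": ["X"]}
bc = betweenness_centrality(adj)
```

Here `H2` lies on four shortest paths between other node pairs and `E3` on three, so both act as bridges in the toy network while the source and endpoint nodes score zero.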
The clustering coefficients of the factor network nodes in Figure 4 show that some nodes, such as E1, E15, and W3, have a high clustering coefficient, close to 0.8. This indicates that the neighbors of these nodes are closely connected to one another and that the local network around them is tightly aggregated. In contrast, nodes such as H3 and M11 have a low clustering coefficient, meaning that few of their neighbors are connected to each other and that the causes linked to them are more dispersed, which suggests that the impact of these nodes is relatively independent. A low clustering coefficient thus indicates that the relationships among the causes connected to the node are loose, and the influence of such nodes is relatively weak.
The average clustering coefficient of the network is calculated to be 0.428, which indicates a moderate degree of dispersion among the nodes. Although the neighborhoods of individual nodes are not densely connected and the links are spread over the whole network, the nodes still interact with and relate to one another. In other words, the nodes in the causal network are not isolated; they are interconnected to a certain extent and form a specific network structure. The network therefore retains some flexibility while preserving the intrinsic connections among its nodes.
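The local and average clustering coefficients can be illustrated as follows. This sketch assumes the standard definition on the undirected projection of the network (edge direction ignored); Gephi's treatment of directed graphs may differ, so the values are indicative only. Node labels are generic placeholders.

```python
from itertools import combinations

def clustering_coefficients(edges):
    """Local clustering coefficient per node, ignoring edge direction."""
    nbrs = {}
    for a, b in edges:
        if a == b:
            continue  # self-loops do not contribute to clustering
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    coeffs = {}
    for node, ns in nbrs.items():
        k = len(ns)
        if k < 2:
            coeffs[node] = 0.0  # fewer than two neighbors: no possible triangle
            continue
        links = sum(1 for u, v in combinations(ns, 2) if v in nbrs[u])
        coeffs[node] = 2 * links / (k * (k - 1))
    return coeffs

# Hypothetical edges: a, b, c form a triangle; d hangs off a.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]
coeffs = clustering_coefficients(edges)
average_clustering = sum(coeffs.values()) / len(coeffs)
```

In the example, `b` and `c` sit inside a closed triangle and score 1, `a` scores 1/3 because only one of its three neighbor pairs is linked, and the pendant node `d` scores 0.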
3.3. Overall Index Analysis of the Network
- (1) Average path length
According to the Gephi results, the average path length of the network model is 2.329, which reflects a short connection distance between nodes and indicates that causes propagate efficiently through the whole network. On average, any two nodes need only two or three connections to transmit information or influence, which means that a cause can spread quickly from its source to the key factors that eventually lead to the accident. Because the paths are short, there is little obstruction or attenuation during propagation, which enhances the overall connectivity and propagation ability of the network. The network therefore exhibits a high transmission rate and strong connectivity. In such a highly connected topology, a disturbance at any local node or link may trigger cascading reactions in non-adjacent nodes within a very short path, causing systemic chain effects that eventually lead to disasters or accidents.
- (2) Network density
According to the Gephi platform, the overall density of the network is 0.239, which belongs to the medium level. This result shows that the network has a certain connectivity efficiency as a whole, and there are direct connections between some of the accident cause nodes, so that the causal influence can be rapidly transmitted in the local scope. At the same time, the connection between the relationships does not reach a high concentration level; some nodes still lack a tight connection between cause and result in the network, which in a global perspective, shows the characteristics of a relatively sparse structure. Such structural characteristics indicate that the evolution process of accidents does not rely on extensive and random connections, but tends to advance in a local subnetwork composed of key nodes along a number of relatively fixed paths.
Because the overall connectivity of the network is limited and its paths are markedly direct, accurately identifying and removing the specific causes on the critical paths is a structurally feasible and operationally targeted strategy for cutting off the evolution chain of an accident in risk governance practice.
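Both overall indices can be computed directly from the adjacency structure. The sketch below assumes that the average path length is taken over ordered node pairs that are actually reachable, and that the density of a directed network without self-loops is m / (n(n − 1)), where m is the number of edges and n the number of nodes. The example graph is hypothetical.

```python
from collections import deque

def all_nodes(adj):
    return set(adj) | {v for vs in adj.values() for v in vs}

def average_path_length(adj):
    """Mean shortest-path length over reachable ordered pairs."""
    total, pairs = 0, 0
    for s in all_nodes(adj):
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            for w in adj.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0

def directed_density(adj):
    """Existing edges divided by the n*(n-1) possible directed edges."""
    n = len(all_nodes(adj))
    m = sum(len(set(vs)) for vs in adj.values())
    return m / (n * (n - 1)) if n > 1 else 0.0

# Hypothetical graph with four nodes and four directed edges.
adj = {"a": ["b", "c"], "b": ["c"], "c": ["d"]}
apl = average_path_length(adj)
density = directed_density(adj)
```

For this toy graph the six reachable pairs have a combined distance of 8, giving an average path length of about 1.33, and 4 of the 12 possible directed edges exist, giving a density of about 0.33.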
3.5. Model Robustness Analysis
In the framework of robustness analysis, an important application is to identify and analyze the complex associations among the causes of gas explosion accidents. By introducing network percolation theory, a propagation model of the causes of gas explosions can be constructed, and the changes in network structure under different attack strategies can then be simulated. There are two typical types of attack on a network. One is a random attack, which removes nodes or edges indiscriminately. The other is a deliberate attack, which selectively removes critical nodes or links according to their importance. These two attack methods make it possible to evaluate whether the remaining nodes can still maintain the basic functions of the network, and whether its overall connectivity is preserved, after some cause nodes fail.
To systematically evaluate the robustness and vulnerability of the network, this section designs two types of simulation experiments: random attack and deliberate attack. The experimental scheme is as follows. First, the random attack simulates random failures in the network: the nodes in the model are randomly sampled many times to realize the attack. Second, the deliberate attack uses two strategies based on the topological characteristics of the network. One is a degree-based attack, which preferentially removes the nodes with the highest degree. The other is a betweenness-based attack, which preferentially removes the nodes with the highest betweenness centrality. The indicators and core characteristics of these two deliberate attack strategies are shown in Table 5.
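The attack experiments can be sketched as follows. This is an illustrative reconstruction rather than the simulation code used in the study, and it makes several assumptions: network efficiency is taken to be the global efficiency (the mean of 1/d over ordered node pairs), it is normalized by the original node count so that curves are comparable across removal steps, and the degree ranking is computed once on the intact network rather than recomputed after each removal. The graph is hypothetical.

```python
import random
from collections import deque

def global_efficiency(adj, alive, n_ref):
    """Sum of 1/d(i, j) over surviving ordered pairs, divided by n_ref*(n_ref-1)."""
    total = 0.0
    for s in alive:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj.get(v, ()):
                if w in alive and w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n_ref * (n_ref - 1))

def attack_curve(adj, order):
    """Efficiency relative to the intact network after each node removal."""
    nodes = set(adj) | {v for vs in adj.values() for v in vs}
    base = global_efficiency(adj, nodes, len(nodes))
    if base == 0:
        raise ValueError("network has no edges")
    alive = set(nodes)
    curve = [1.0]
    for node in order:
        alive.discard(node)
        curve.append(global_efficiency(adj, alive, len(nodes)) / base)
    return curve

def degree_order(adj):
    """Nodes sorted by total degree (descending), ties broken by label."""
    deg = {}
    for u, vs in adj.items():
        for v in vs:
            deg[u] = deg.get(u, 0) + 1
            deg[v] = deg.get(v, 0) + 1
    return sorted(deg, key=lambda n: (-deg[n], n))

# Hypothetical network with one hub "h".
adj = {"a": ["h"], "b": ["h"], "h": ["c", "d"], "c": ["d"]}
degree_curve = attack_curve(adj, degree_order(adj)[:2])
rng = random.Random(0)  # fixed seed so the random attack is repeatable
random_order = rng.sample(sorted(set(adj) | {"d"}), 2)
random_curve = attack_curve(adj, random_order)
```

A betweenness-based attack follows the same pattern with the removal order taken from a betweenness ranking, and averaging `attack_curve` over many random orders gives the random-attack baseline.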
By comparing the degradation patterns of network performance under random and deliberate attacks, this study aims to reveal the potential risks of the network. Based on the identified attack methods, a dynamic simulation model is implemented in Python using PyCharm (v2025.1). The steps and logical relationships of the entire workflow are shown in Figure 5. The ultimate goal of this process is to identify the key node groups that meet specific requirements through a series of operations. Regarding computational complexity, the network model constructed in this paper has relatively few nodes and is far from the scale of large network models. The overall computational complexity of the algorithm is therefore within a controllable and reasonable range and does not cause excessive consumption of computing resources or excessively long running times. The current method is thus suitable for the small-scale network model established in this paper and meets the requirements of the analysis.
First, an independent network efficiency evolution graph is generated for each attack mode by executing the customized analysis code.
Then, the core data in these graphs are extracted and integrated to produce the comprehensive comparison chart shown in Figure 6. The chart compares the effects of the different attacks side by side, shows clearly that deliberate attack is the dominant factor in the sharp decline of network efficiency, and provides key evidence for formulating risk prevention strategies.
Figure 6 presents the simulation results, which show that random attacks damage network performance far less than either deliberate strategy. In the figure, the Y-axis represents network efficiency, which is used to assess whether the remaining nodes can maintain normal operational performance and network connectivity when causal nodes fail. In practice, taking measures to control some accident causes (that is, removing the nodes that represent those causes and breaking the accident chain) reduces the probability of accidents. The comparison confirms the effectiveness of the degree-based and betweenness-based attack methods in identifying the critical nodes of the network. In both deliberate attack scenarios, the successive failure of 24 to 26 nodes reduced network efficiency to 47 ± 2% of its initial value, indicating a marked decline in the overall operational performance of the network.
However, under the same conditions, network efficiency remains at about 70% under random attacks, which shows the inherent robustness of the network to random failures. From the perspective of safety management, this has important implications: if only random or blind control measures are taken, it is difficult to cut off the critical paths in the accident-causing chain, and risk can still circulate in the network, leaving a high probability of accidents. In contrast, the simulation results show that the whole risk propagation network can be effectively disintegrated by focusing control on the identified key causative factors (that is, the “important nodes” in the network), which provides a theoretical basis and quantitative support for the safety management strategy of “grasping the key and ensuring the whole situation” in practice.
Comparing the three attack methods, the effect from strongest to weakest is: deliberate attack based on betweenness centrality > deliberate attack based on degree > random attack. Although both deliberate attacks outperform the random attack, they still require a relatively large number of nodes to be controlled before the network is effectively disrupted. Therefore, to achieve precise control, a genetic algorithm is used to optimize the structure of the established complex network model and increase network invulnerability. The specific indicators are shown in Table 6.
At the same time, the key nodes are fully mined during the optimization process to provide a basis for key control and reduce the probability of accidents.
The optimization is first carried out for betweenness centrality. The core goal of the genetic algorithm is to optimize the attack sequence of nodes with high betweenness centrality. By simulating the “selection–crossover–mutation” process of biological evolution, the algorithm finds the node removal sequence (attack sequence) that makes network efficiency decline the fastest. The specific implementation steps are as follows:
The first step is label coding based on the “attack node sequence”. In the genetic algorithm, each “individual” corresponds to an “attack sequence” and is encoded sequentially: each individual is a fixed-length list whose elements are the labels of the nodes to be removed and whose order is the order of node removal. A single individual is a list of length num, where num is the total number of nodes to be removed (the number of nodes in the network × the removal ratio), and each element is a node label in the network. The list contains no duplicates, since the same node cannot be attacked twice, and all nodes come from the node set of the original network. The second step is population initialization by weighted sampling that favors nodes with high betweenness, so that the initial population provides a good basis for optimization. If betweenness centrality values are available, the relative betweenness of all nodes is calculated first, the weights are raised to the third power to enlarge the advantage of high-betweenness nodes, and a lower limit of 0.1 is imposed on the weights so that low-betweenness nodes are not ignored completely. If betweenness centrality is invalid, for example because the network has too few nodes, equal-weight sampling is used. Each individual is then generated by sampling nodes from the network without replacement according to these weights to form an attack sequence.
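The encoding and initialization steps can be illustrated with a short sketch. It assumes that “relative betweenness” means each value divided by the maximum betweenness in the network, which the text does not specify; the cube and the 0.1 floor follow the description above. Function names, the seed, and the betweenness values are hypothetical.

```python
import random

def init_weights(betweenness, power=3, floor=0.1):
    """Sampling weights: (relative betweenness) ** power, floored at `floor`.

    Falls back to equal weights when no node has positive betweenness.
    """
    top = max(betweenness.values(), default=0.0)
    if top <= 0:
        return {n: 1.0 for n in betweenness}
    return {n: max((b / top) ** power, floor) for n, b in betweenness.items()}

def sample_sequence(weights, num, rng):
    """Draw `num` distinct nodes, one at a time, proportionally to weight."""
    pool = dict(weights)
    sequence = []
    for _ in range(min(num, len(pool))):
        nodes = list(pool)
        pick = rng.choices(nodes, weights=[pool[n] for n in nodes], k=1)[0]
        sequence.append(pick)
        del pool[pick]  # without replacement: a node is attacked only once
    return sequence

def init_population(betweenness, num, pop_size, seed=0):
    rng = random.Random(seed)
    weights = init_weights(betweenness)
    return [sample_sequence(weights, num, rng) for _ in range(pop_size)]

# Hypothetical betweenness values for five nodes.
betweenness = {"a": 10.0, "b": 5.0, "c": 0.0, "d": 2.0, "e": 8.0}
weights = init_weights(betweenness)
population = init_population(betweenness, num=3, pop_size=4)
```

With these values, node `b` (half the maximum betweenness) receives weight 0.125, while `c` and `d` are lifted to the 0.1 floor, so they remain selectable but rarely appear early in a sequence.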
The third step is to design the appropriate fitness function. The fitness function is the core component of the genetic algorithm, which is used to quantify the pros and cons of each attack sequence. The goal of this paper is to find the optimal attack sequence, so the higher the fitness value, the stronger the damage ability of the sequence to the network efficiency, and the more in line with the target expectation. Therefore, the core of the fitness function designed in this paper is a comprehensive score of “efficiency decline + high betweenness node reward”.
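The paper describes the fitness only qualitatively, as “efficiency decline + high betweenness node reward”. The sketch below shows one possible form under explicit assumptions: the efficiency decline is supplied as a precomputed fraction, a node counts as “high betweenness” if it falls in the top `top_fraction` of the ranking, and each such node adds a fixed `reward`. Both parameters and their default values are hypothetical and are not taken from the study.

```python
def fitness(sequence, efficiency_drop, betweenness, top_fraction=0.2, reward=0.05):
    """Score an attack sequence: efficiency decline plus a betweenness bonus.

    efficiency_drop: fraction of the initial network efficiency removed
                     by the sequence (0.0 to 1.0), computed elsewhere.
    top_fraction, reward: illustrative assumptions, not values from the paper.
    """
    ranked = sorted(betweenness, key=betweenness.get, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    high_betweenness = set(ranked[:k])
    hits = sum(1 for node in sequence if node in high_betweenness)
    return efficiency_drop + reward * hits

# Hypothetical values: only "a" is in the top 20% of five nodes.
betweenness = {"a": 10.0, "b": 5.0, "c": 0.0, "d": 1.0, "e": 2.0}
score_with_hub = fitness(["a", "c"], 0.40, betweenness)
score_without_hub = fitness(["d", "c"], 0.40, betweenness)
```

Two sequences that cause the same efficiency decline are thus ranked differently if one of them removes a high-betweenness node, which is the behavior the reward term is meant to encourage.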
In the iterative process of the genetic algorithm, selection operation is the key link to realize the “survival of the fittest” of the population. Its core goal clearly points to “retain excellent individuals with high fitness and eliminate inferior individuals with low fitness”, so as to screen out the high-quality basis for population evolution. In order to balance the necessary randomness in the evolution process (to avoid premature homogenization of the population) and the convergence efficiency (to ensure that the algorithm steadily moves closer to the optimal solution), the two mechanisms of probabilistic selection and deterministic retention are combined in this operation.
Among them, the deterministic retention mechanism is the core safeguard that ensures that “high-quality genes are not lost”. It identifies the individual with the highest fitness in the current population and copies it directly into the next generation without any random screening. The purpose of this design is to prevent subsequent crossover, mutation and other random operations from destroying the structure of an excellent individual, so that the population always retains the best result found so far. The probabilistic selection mechanism plays an important role in “maintaining the diversity of the population”. It calculates the selection probability from the fitness of each individual (usually, individuals with higher fitness are more likely to be selected, as in roulette wheel selection or tournament selection) and randomly selects two individuals from the screened population as the father and the mother. These two parents serve as the “gene donors” for the subsequent crossover operation, providing the genetic basis for new individuals, while the random selection prevents the population from falling into a single evolutionary path.
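The two selection mechanisms can be sketched as follows. The example uses roulette wheel selection for the probabilistic part, which is one of the options the text mentions; whether the study used it or tournament selection is not stated. The fitness shift by the population minimum is an assumption made here to keep all weights positive, and the population and fitness values are hypothetical.

```python
import random

def elite(population, fitnesses):
    """Deterministic retention: copy of the individual with the highest fitness."""
    best = max(range(len(population)), key=fitnesses.__getitem__)
    return list(population[best])

def select_parents(population, fitnesses, rng):
    """Roulette wheel selection of a father and a mother.

    Fitness values are shifted so that every weight is positive. The two
    draws are independent, so the same individual may be picked twice.
    """
    low = min(fitnesses)
    weights = [f - low + 1e-9 for f in fitnesses]
    father, mother = rng.choices(population, weights=weights, k=2)
    return father, mother

# Hypothetical population of attack sequences with their fitness values.
population = [["a", "b"], ["c", "d"], ["a", "d"]]
fitnesses = [0.20, 0.90, 0.55]
rng = random.Random(42)  # fixed seed for repeatability
best = elite(population, fitnesses)
father, mother = select_parents(population, fitnesses, rng)
```

Copying the elite individual before any crossover or mutation guarantees that the best fitness in the population never decreases from one generation to the next.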
The crossover operation, which follows selection, is the core link of “gene recombination”; it aims to fuse the excellent traits (i.e., attack strategies) of the father and the mother to generate new individuals. This operation consists of two steps: point selection and gene recombination. Sequence repair is then applied to remove duplicate nodes, ensuring the validity of the new individual. After crossover, a mutation operation is performed, which introduces new genes and prevents the algorithm from becoming trapped in a local optimum. In this paper, the mutation probability for each node in an individual is set to 25%. The random mutation based on node replacement is completed in two stages, mutation triggering and node replacement, and the mutated sequence must still contain no duplicate nodes so that the attack sequence remains valid. Finally, the above process is repeated over multiple generations to gradually improve the quality of the solutions.
In the stages of initialization, crossover completion and mutation replacement, the GA gives priority to nodes with high betweenness so that the direction of evolution is consistent with node importance. In the fitness function, an additional reward is given for effective attacks on nodes with high betweenness, which directly improves the fitness of such attack sequences. The elite retention strategy ensures that the best high-damage sequence of each generation is not lost, which accelerates convergence of the algorithm toward the global optimum. During the optimization attack, network efficiency is used as the termination criterion, with the threshold set to 10%. When the efficiency drops below this threshold, the network is considered to have entered a state of total failure, and the optimization process stops. In other words, the algorithm does not search indefinitely; it treats the search as successful and terminates once the evaluation function detects that network functionality has fallen below about 10%. The same procedure is applied to the optimization attack based on degree.
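The crossover, repair, mutation and termination steps can be sketched as follows. This is a simplified illustration: it uses single-point crossover and fills gaps with uniformly chosen unused nodes, whereas the text states that the GA prefers high-betweenness nodes when completing and mutating sequences. The 25% per-node mutation rate and the 10% efficiency threshold follow the description above; all function names and example sequences are hypothetical.

```python
import random

def crossover(father, mother, nodes, rng):
    """Single-point crossover followed by duplicate repair.

    Requires sequences of length >= 2 drawn from `nodes`.
    """
    point = rng.randint(1, len(father) - 1)
    child = father[:point] + mother[point:]
    seen, repaired = set(), []
    for node in child:  # keep the first occurrence of each node
        if node not in seen:
            seen.add(node)
            repaired.append(node)
    spare = [n for n in nodes if n not in seen]
    rng.shuffle(spare)
    while len(repaired) < len(father):  # refill to the fixed length
        repaired.append(spare.pop())
    return repaired

def mutate(sequence, nodes, rng, rate=0.25):
    """Replace each position with an unused node with probability `rate`."""
    result = list(sequence)
    for i in range(len(result)):
        if rng.random() < rate:
            unused = [n for n in nodes if n not in result]
            if unused:
                result[i] = rng.choice(unused)
    return result

def reached_failure(efficiency_ratio, threshold=0.10):
    """Termination test: efficiency below 10% of its initial value."""
    return efficiency_ratio < threshold

nodes = list("abcdefgh")  # hypothetical node labels
rng = random.Random(1)
child = crossover(["a", "b", "c", "d"], ["c", "a", "e", "f"], nodes, rng)
mutated = mutate(child, nodes, rng)
```

Both operators preserve the two invariants of the encoding, fixed length and no repeated node, so every individual produced during the generational loop remains a valid attack sequence.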