Article

Establishment of a Key Hidden Danger Factor System for Electric Power Personal Casualty Accidents Based on Text Mining

1 Economic and Technological Research Institute, State Grid Henan Electric Power Company, Zhengzhou 450000, China
2 College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
3 Suzhou China International Engineering Consulting Corporation, Suzhou 215000, China
* Author to whom correspondence should be addressed.
Information 2021, 12(6), 243; https://doi.org/10.3390/info12060243
Submission received: 9 April 2021 / Revised: 31 May 2021 / Accepted: 31 May 2021 / Published: 10 June 2021

Abstract
Based on actual safety management difficulties and needs, this paper aims to screen and extract the key accident potential factors of personal injuries and deaths within the electric power industry to provide a reference for electric power companies’ accident prevention efforts. First, this paper sorts out and analyzes all of the causes and influencing elements that may lead to the occurrence of electric personal injuries and deaths, based on which rough accident potential factors are initially identified in combination with the definition of accident potentials. Second, this paper mines and analyzes relevant accident report texts using text-mining technologies such as term count, word cloud, and term frequency–inverse document frequency (TF-IDF), and thus a system of key accident potential factors for personal injuries and deaths within the electric power industry, including three key factors (human, material, and management), is finally constructed. Workers’ habitual violation behavior, in particular, carries a larger risk than the other key accident potential factors, implying that additional steps should be taken to eradicate this type of critical accident potential in time.

1. Introduction

Power generation safety is becoming increasingly crucial as China’s power system grows larger and more complex and as power demand rises with China’s economic development. An accident is defined as an incident that causes a personal casualty or direct economic loss in production and operation activities under China’s Regulations on the Reporting, Investigation, and Handling of Work Safety Accidents [1]. Personal accidents, as a subset of accidents, are accidents that result in a personal casualty, excluding property damage and equipment damage. Accordingly, an electric power personal casualty accident is defined as an event that causes a personal casualty in the electric power industry’s production, engineering construction, marketing, and industrial fields, typically involving electric shock, falling accidents, being struck by objects, collapsing, and burial [2]. According to statistics from the National Energy Administration’s Electric Power Safety Supervision Department, 44 electric power personal casualty accidents occurred in China in 2019, resulting in 54 casualties, endangering the lives of personnel and affecting the development of electric power enterprises and the national economy. It is therefore urgent to determine the causes of these accidents and prevent them from recurring.
According to research, hazard sources and hidden threats are commonly regarded as the causes of accidents. There are three types of danger sources in the theory of accident causes. The first type of danger source is the energy carrier or hazardous substance itself, which is the material source of the disaster and cannot be eliminated. The second class relates to harmful conditions and unsafe human behavior, while the third class relates to organizational issues that do not correspond to safety standards. The first category includes dangers that cannot be avoided through management, whereas the second and third categories include dangers that may be eliminated and must be addressed by power companies. A class I hazard source is the material root of an accident that cannot be eliminated [3], whereas a hidden danger is defined as the hazardous state of an object as well as people’s risky behaviors and management defects [4] that may cause an accident, including class II and III hazard sources. Addressing these issues is an important way for safety management to play its part and lower the probability of an accident. As a result, it is required to objectively determine the influencing variables of electric power personal injury accidents from the standpoint of standardizing language and focusing on the latter two kinds of danger sources, build a more comprehensive and effective system for the key hidden risk causes of electric power personal injury accidents, and offer a reference for electric power companies to improve the level of production and management safety.
Most scholars and practitioners develop hidden danger factor systems through literature study, expert interviews, and questionnaire surveys, which makes the resulting systems highly subjective. However, in recent years, the power industry has developed a reasonably uniform method and system for accident investigation and recording, and a large number of accident reports have accumulated, in which much useful information about accident patterns is hidden. Text information, however, is weakly structured, and manual processing is time-consuming and labor-intensive. Applying text mining technology to accident report text can reduce subjectivity and improve the utilization rate of this information.
The remainder of this paper is structured as follows. Section 2 introduces the current status of the research. Section 3 identifies the hidden danger factors of electric power personal injury accidents and provides the factor index system. Section 4 proposes a method for extracting key hidden danger factors through document mining, which involves document pre-processing, word cloud creation, and term frequency counting. Section 5 presents an empirical investigation to validate the soundness and effectiveness of the proposed method. Section 6 then establishes the key hidden danger factor system for electric power personal casualty accidents based on the empirical analysis. Section 7 concludes the paper by discussing the implications and limitations of our findings as well as future research directions.
In summary, the following are the research’s main contributions:
(1) The hidden danger factors of electric power personal injury accidents are preliminarily determined through literature research and based on the definitions of the three types of hazard sources, by removing the factors that cannot be eliminated by management means.
(2) This study develops a text mining approach for the development of a hidden danger factor system for electric power personal injury accidents, and it picks the essential hidden danger factors based on absolute and relative word frequency. We employ the average maximum TF-IDF value of the proper terms associated with each hidden danger factor as one of the assessment indices during the process.

2. Related Works

2.1. Research on the Extraction of Influencing Factors of Power Accidents

In the study of factors influencing power accidents, both domestic and international experts have produced a wealth of research results. Williamson and Feyer [5] looked at the causes of electrical death at work from four perspectives: the environment, unsafe state of objects, human physiological state, and human unsafe behavior, and then divided the causes of accidents into eight categories: environmental factors, environmental factors, equipment factors, work habits factors, regulatory factors, training factors, work errors factors, physical condition and other factors (including alcohol/drug abuse, etc.). Zhao [6] used a literature review and expert consultation method to create an influencing factor system with 15 factors to analyze the causes of electrical fatalities in the construction industry in the United States of America, including operation time, task type, safety training, operation space, and voltage level. Chi [7] and others investigated the causes of 255 electrical fatal accidents in Taiwan’s construction industry using personal factors (age, gender, and experience), task factors (tasks performed), environmental factors (humid areas and confined spaces), and management factors (company size), as well as accident mode classification. Wei Xiuning [8] and others employed safety behavior, psychology, and accident cause theory to separate the elements impacting human safety in power supply operations into five categories: people, equipment, materials, procedures, and environment. Sun Huiling [9] argued that there are more than 20 risk factors that affect the safety production of power enterprises, and she classified them into five categories, including technical risk, environmental risk, management risk, and so on. The first three types of risk variables are the most important risk variables affecting power plant safety.
Three conclusions can be drawn from the current research findings on the causes and influencing factors of power accidents. First, most of the literature expounds its own viewpoints and selects terms at random to describe the causes and influencing factors of power accidents, without distinguishing the specific connotation of these terms, resulting in a lack of unified expression standards in the relevant research. Second, most studies just list and summarize all conceivable accident causes and influencing factors without considering whether or not they can be removed or managed, resulting in an influencing factor system that lacks practical reference value for safety management. Finally, the majority of the literature relies on prior experience, various safety theories, or expert consultation and other subjective qualitative approaches to determine the causes and influencing factors of accidents. These studies do not extract information from accident report text, which contains a wealth of useful data, and they do not use big data mining technology effectively.

2.2. Research on Identifying Accident-Influencing Factors Based on Text Mining

In the process of power production, electric power firms have amassed a huge amount of accident text data, which is an essential source for companies to study accident patterns and summarize work experience. Because text information is unstructured or semi-structured, its data value is more difficult to exploit fully than that of numerical information. As a result, text mining should be used to identify the contributing factors more scientifically and efficiently.
In many areas of accident investigation, domestic and international researchers have made significant progress in identifying influencing elements utilizing text mining technology. Li Jie and Wang Jianping [10] chose 151 subway construction safety risk accident reports as text data from 2002 to 2015. They performed text preprocessing using the R language and text mining methods, such as word segmentation, feature extraction, spatial vector model development, and so on. The primary risk elements and general risk factors of subway construction safety were mined, and the results of text mining were visualized using visualization methods including word clouds and network structure diagrams, providing useful reference information for future subway construction safety. In 2018, Wu Chen and Jiang Fucai [11] used the R language and text mining algorithm to obtain a set of high-dimensional feature space vectors, then successfully reduced the dimension using statistical methods, and successfully constructed a Bayesian network structure of a “ship environment management” system. The difficulty of being unable to recognize professional jargon is solved by enhancing the TF-IDF algorithm, and the confidence rate of risk factors for ship collision accidents is enhanced. Human factors are thought to be the leading cause of ship collision accidents. Zhang Lu et al. [12] analyzed 41,791 hidden danger records of a coal mining enterprise, using a word cloud and TF to extract coal mine safety risks. From 156 accident reports, Li et al. [13] developed a lexicon and used document frequency (i.e., DF value) to identify 15 high-occurring safety risk variables and three participants.
After reviewing prior research, it was discovered that text mining technology has not yet been fully used in the electricity industry, particularly in China. Moreover, the existing research has not taken into account the factors that can help to eliminate the hidden hazard. As a result, we proposed a three-step strategy for the construction of a major hidden danger factors system applicable to China’s electric power personal injury occurrences. The first stage is to identify hidden danger elements that can be eliminated through management by conducting literature research and integrating it with the three-class theory of danger sources. The second step is to design the text mining process, which includes document preprocessing, word cloud building, and term frequency counting, as well as to screen the most important hidden danger factors using the parameters of hidden danger factor proper terms in the text that are related to the hidden danger factors obtained in the first step. The system of the main hidden danger factors of power personal injury accidents is developed in the last step, using the results of the previous literature research and text mining.

3. Preliminary Establishment of Hidden Danger Factor Index System of Electric Power Personal Casualty Accidents

3.1. Analysis on Influencing Factors of Electric Power Personal Casualty Accidents

The relevant literature is first analyzed in this paper to summarize the most common causes and influencing factors of electric power personal casualty accidents. There are 24 related works (1992 to 2019) that have been collected and analyzed, including 14 Chinese and 10 English papers. The study and statistical results of the causes and influencing factors of accidents can be found in Table 1.
After integrating the results of the related literature, all influencing factors can be grouped into six categories, as shown in Table 1: human factors, equipment factors, tool factors, method factors, environmental factors, and management factors, with 59 distinct factors in total. Human factors account for the largest share (17 out of 59), with work skill level/knowledge level/ability and quality, working habits, and physical health/physical condition appearing the most frequently. In most of the literature, the arrival of work leaders/assignment of safety officers and guardians in method factors and the arrival of work leaders/assignment of safety officers and guardians in management factors are also considered key aspects.

3.2. Preliminary Identification of Hidden Danger Factors of Electric Power Personal Casualty Accidents

Many of the 59 causes and influencing factors of electric power personal casualty accidents listed in Table 1 are class I hazards that cannot be eliminated by safety management measures, such as working height type and working power outage type, which are determined by the work task and cannot be changed (high-altitude work, for example, necessitates climbing and cannot be accomplished without climbing or by lowering the climbing height). By definition, therefore, the influencing factors that belong to class I hazard sources are not hidden danger factors. By analyzing and judging the nature of each factor, the hidden danger factors of electric power personal casualty accidents are preliminarily identified and categorized into four aspects: people, object, environment, and management [4]; see Table 2 for the results.
Table 2 shows that there are 42 hidden danger factors in electric power personal casualty accidents, which can be split into four groups, with the people and management groups containing the most, at 13 each. $AP_i$ denotes the $i$-th hidden danger factor group, and $AP_{ij}$ denotes the $j$-th hidden danger factor in the $i$-th group. Management refers to the activity process in which managers in a particular organization coordinate other people’s activities by implementing the planning, organizing, leading, coordinating, and controlling functions, so that others can achieve the established goals alongside them. In the final analysis, management is carried out by people, so the hidden dangers of personnel and management are essentially those of human factors; the difference is that the hidden dangers of personnel focus on the unsafe state and behavior of workers, whereas the hidden dangers of management focus on the unsafe behavior of managers and the unsafe systems they established. Furthermore, the hidden danger factors of the object are the fewest and fall primarily into four aspects: working equipment, control system, working instruments, and personal protective equipment; the hidden danger factors of the environment comprise natural and humanistic working environment components.

4. Extraction Method of Key Hidden Danger Factors of Electric Power Personal Casualty Accidents Based on Document Mining

4.1. Experimental Scheme of Document File

Document information mining has become an important direction of data mining with the application of the Internet and social networks, and it is frequently employed in disciplines such as emotion analysis [34] and public opinion recognition [35].
Figure 1 depicts the experimental scheme for this paper by document mining, which consists of six steps: document collection, document storage, document preprocessing, document mining, analysis and comparison of experimental results, and final establishment of key hidden danger factor system of electric power personal casualty accidents. This paper collects 225 electric power casualty accident reports from the Compilation of National Electric Power Accidents and Electric Power Safety Events (2014 to 2018) [2,36,37,38,39] provided by the National Energy Administration’s Electric Power Safety Supervision Department; document mining experiments are carried out for the accident brief and cause in each accident report.
The cases are all in Chinese and come from the State Grid Corporation of China. We chose Chinese cases for the following reasons: (1) Chinese and English have distinct text characteristics and cannot be mixed during text mining. (2) The safety management models and cultures of power firms vary among countries. Studying Chinese case texts in depth allows us to better target the characteristics of China’s power safety management.

4.2. Document Preprocessing

Because the document data in electric power personal casualty accident reports are voluminous, redundant, incomplete, and complex, the content must be pre-processed first to improve the effect of document mining [40]. Pre-processing mainly includes the following steps.
  • Stopword filtration: the original accident report documents contain numerous punctuation marks such as “,”, “.”, and “...”, as well as phrases that carry no meaning for the experiment, such as “of”, “very”, “limited company”, and “power supply station”, all of which should be removed. As a result, in order to filter the document content, a stop list must first be specified.
  • Segmentation of Chinese words: some specific words embedded in the running text of an electric power personal casualty accident report express the precise connotation of the key hidden danger factors of the accident. For example, the sentence “XiaoX Li, an untrained masonry worker, has weak self-protection awareness and inadequate safety knowledge” indicates that the accident is related to work skill level/knowledge level/ability and quality among the personnel hidden dangers. As a result, Chinese word segmentation must be applied to the document content, as well as to the terms connected with each hidden danger factor, in order to prepare for the subsequent feature calculation and reduction.
  • Create your own dictionary: Chinese sentences are made up of content words and function words that help form the sentence grammar; because there are no spaces between words, Chinese sentences cannot be split directly on spaces and punctuation as English sentences can. As a result, a dictionary or an algorithm is often needed to complete the word segmentation [40]. Jieba, a Python-based word segmentation tool, is used in this work to segment the electric power personal casualty accident report documents. To ensure the accuracy and completeness of the segmentation results related to the hidden danger factors, a custom dictionary was imported that fixes the collocation and combination of the related words. This prevents proper terms such as “self-protection awareness” and “safety awareness” from being split into fragments such as “...... weak/self/protection/consciousness, poor/safety/consciousness” and helps the follow-up experiments run well.
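The pre-processing steps above can be illustrated with a small sketch. It does not use Jieba; instead it assumes a simple forward maximum-matching segmenter as a stand-in, so that the example runs with the standard library only. The stop list, the custom dictionary, and the sample sentence are all hypothetical and are not taken from the accident reports.

```python
# Toy stop list and custom dictionary; every entry is made up for illustration.
STOPWORDS = {"的", "，", "。"}
CUSTOM_DICT = {"工人", "安全意识", "自我保护意识", "淡薄", "差"}
MAX_WORD_LEN = max(len(word) for word in CUSTOM_DICT)


def segment(text, dictionary=CUSTOM_DICT, max_len=MAX_WORD_LEN):
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def preprocess(text):
    """Segment the text, then drop stop words and punctuation."""
    return [tok for tok in segment(text) if tok not in STOPWORDS]


# Hypothetical sentence, roughly: "The worker's safety awareness is weak,
# and self-protection awareness is poor."
sample = "工人的安全意识淡薄，自我保护意识差。"
tokens = preprocess(sample)
print(tokens)
```

Because “安全意识” (safety awareness) and “自我保护意识” (self-protection awareness) are in the dictionary, they survive as single tokens, which is the effect the custom dictionary is meant to achieve.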

4.3. Word Cloud Building

A word cloud, also known as word cloud analysis, is a technique that highlights the “keywords” that appear frequently in document data and renders them as a cloud-like color picture, allowing readers to grasp the main idea conveyed by the document at a glance; it relies primarily on document mining and visualization technology. Based on the term count results for words that directly reflect the connotation of hidden danger factors, this paper draws a word cloud for the accident cause documents in the electric power personal casualty accident reports to show the general situation of the important terms related to hidden danger factors; see Figure 2 for the experimental process.

4.4. Term Frequency Counting

Term frequency counting refers to the statistics of how many times a term appears in a document. In general, the absolute term frequency (i.e., term count) counts the number of times a term appears in a document. However, this creates a problem: a term may appear more often in a long document than in a short one, but this does not imply that the term is more essential or deserves more weight. As a result, the normalized term frequency, also known as relative term frequency, is frequently used as the weight of the term. Term frequency–inverse document frequency (TF-IDF), a popular weighting method in information retrieval and data mining, is based on the principle of relative term frequency. TF (term frequency) is the frequency with which a term appears in a document, normalized by the number of terms in that document to avoid bias towards long documents, whereas IDF (inverse document frequency) measures the general importance of a term across the document set [41,42].
Because the 225 electric power casualty accident reports chosen for this paper span a long period, are numerous, and were written by many authors, their language style is difficult to unify, their description methods are diverse, and their lengths differ considerably, which puts a simple term count experiment at a disadvantage. For example, some keywords that directly express the meaning of hidden danger factors may be overestimated because they appear frequently in exceedingly long accident cause documents of little importance, whereas other such keywords may be underestimated because the documents describing them are too short, which affects the final screening of key hidden danger factors. Therefore, in this paper, the term count experiment is first carried out on the terms related to hidden danger factors, and the important terms are screened based on the term count results; the TF-IDF method is then introduced to identify the important term with the highest TF-IDF value in each accident cause document, together with that value. After classification and merging, the comprehensive average maximum TF-IDF value of the important terms of each hidden danger factor across the 225 electric power personal casualty accident reports can be obtained, reflecting the importance and influence of each hidden danger factor on an electric power personal casualty accident.
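The length bias described above can be seen in a minimal sketch. The two token lists are hypothetical documents, and the helper names `term_count` and `relative_tf` are introduced here only for illustration.

```python
from collections import Counter


def term_count(tokens, term):
    """Absolute term frequency: how many times the term occurs."""
    return Counter(tokens)[term]


def relative_tf(tokens, term):
    """Relative term frequency: occurrences divided by document length."""
    return term_count(tokens, term) / len(tokens) if tokens else 0.0


# Hypothetical documents: a long one and a short one.
long_doc = ["violation"] * 4 + ["filler"] * 96    # 100 tokens
short_doc = ["violation"] * 2 + ["filler"] * 8    # 10 tokens

for name, doc in (("long", long_doc), ("short", short_doc)):
    print(name, term_count(doc, "violation"), relative_tf(doc, "violation"))
```

The long document has the higher raw count, yet the term makes up a larger share of the short document, which is why a normalized weight is used alongside the raw count.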
The accident cause document set in the 225 electric power personal casualty accident reports is denoted $D = \{D_1, D_2, \ldots, D_m\}$, where $D_r$ ($r = 1, 2, \ldots, m$) is the $r$-th accident cause document in the document set. The term set appearing in the accident cause document set is denoted $W = \{W_1, W_2, \ldots, W_n\}$, where $W_s$ ($s = 1, 2, \ldots, n$) is the $s$-th term in the term set. The set of proper terms of hidden danger factors appearing in the accident cause document set is denoted $R = \{R_1, R_2, \ldots, R_l\}$, where $R_t$ ($t = 1, 2, \ldots, l$) is the $t$-th proper term of hidden danger factors.
The influence of specific hidden danger factors is calculated and screened using the following steps:
Step (1): calculate the term frequency of the proper term of hidden danger factors $R_t$:
$$TC_t = \sum_{r=1}^{m} x_{tr}, \quad t = 1, 2, \ldots, l$$
$TC_t$ means the term frequency of $R_t$ in the document set $D$; $x_{tr}$ means the frequency of the occurrence of $R_t$ in $D_r$.
Step (2): set a term frequency threshold to screen the important proper terms of hidden danger factors:
The proper terms of hidden danger factors whose term frequency is greater than or equal to the threshold $\alpha$ form the set $U = \{U_1, U_2, \ldots, U_p\}$, where $U_k$ ($k = 1, 2, \ldots, p$; $p < l$) denotes the $k$-th important proper term of hidden danger factors. The screening discriminant is as follows:
$$\Phi = \begin{cases} R_t \in U, & TC_t \ge \alpha \\ R_t \notin U, & TC_t < \alpha \end{cases}, \quad t = 1, 2, \ldots, l$$
The threshold is generally set empirically according to the text analysis field. If $TC_t$ reaches the threshold, $R_t$ is considered an important proper term of hidden danger factors and is placed in set $U$; otherwise, it is removed.
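Steps (1) and (2) can be sketched as follows, assuming each accident cause document has already been segmented into a list of tokens. The documents, the proper terms, and the function names are hypothetical; only the summation and the comparison against the threshold follow the text.

```python
from collections import Counter


def total_term_counts(docs, proper_terms):
    """Step (1): TC_t is the sum over documents D_r of the count x_tr of R_t."""
    totals = Counter()
    for doc in docs:
        counts = Counter(doc)
        for term in proper_terms:
            totals[term] += counts[term]
    return dict(totals)


def screen_terms(totals, alpha):
    """Step (2): keep the proper terms whose total count reaches the threshold."""
    return [term for term, tc in totals.items() if tc >= alpha]


# Hypothetical tokenized accident cause documents.
docs = [
    ["violation", "absence", "violation", "work"],
    ["absence", "violation", "fatigue"],
    ["violation", "work"],
]
proper_terms = ["violation", "absence", "fatigue"]

totals = total_term_counts(docs, proper_terms)
important_terms = screen_terms(totals, alpha=2)
print(totals, important_terms)
```

The threshold of 2 is chosen only to suit the toy data; as the text notes, the threshold is set empirically for the corpus at hand.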
Step (3): determine whether there is a logical correlation between the important proper term of hidden danger factors ($U_k$) and each specific hidden danger factor ($AP_{ij}$) in Table 2. The determination formula is:
$$A_{ijk} = \begin{cases} 1, & \text{there is a logical correlation between } U_k \text{ and } AP_{ij} \\ 0, & \text{there is no logical correlation between } U_k \text{ and } AP_{ij} \end{cases}, \quad k = 1, 2, \ldots, p;\ i = 1, 2, \ldots, 4;\ j = 1, 2, \ldots, 13$$
Step (4): calculate the term frequency of important proper terms of hidden danger factors related to each specific hidden danger factor ($AP_{ij}$):
$$C_{ijk} = \sum_{k=1}^{p} A_{ijk}, \quad i = 1, 2, \ldots, 4;\ j = 1, 2, \ldots, 13$$
$C_{ijk}$ means the number of important proper terms of hidden danger factors that have a logical correlation with $AP_{ij}$.
$$Q_{ijk} = \sum_{k=1}^{p} TC_k \cdot A_{ijk}, \quad i = 1, 2, \ldots, 4;\ j = 1, 2, \ldots, 13$$
$Q_{ijk}$ means the term frequency of important proper terms of hidden danger factors related to $AP_{ij}$; $TC_k$ means the term frequency of $U_k$ in the document set $D$.
For a specific hidden danger factor $AP_{ij}$, the larger $Q_{ijk}$ is, the more influence the specific hidden danger factor $AP_{ij}$ will have on electric power personal casualty accidents.
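Steps (3) and (4) can be sketched with a mapping from each hidden danger factor to the terms judged to be logically correlated with it, which plays the role of the indicator $A_{ijk}$. The mapping, the counts, and the factor label "on-site supervision" are hypothetical; in the paper the correlation is a manual judgment against Table 2.

```python
def factor_statistics(term_totals, factor_terms):
    """For each factor, return C (number of related important terms)
    and Q (sum of the total counts TC_k of those terms)."""
    stats = {}
    for factor, terms in factor_terms.items():
        # A_ijk = 1 exactly for the important terms listed under this factor.
        related = [t for t in terms if t in term_totals]
        stats[factor] = {
            "C": len(related),
            "Q": sum(term_totals[t] for t in related),
        }
    return stats


# Hypothetical totals TC_k for the screened important terms.
term_totals = {"violation": 4, "absence": 2, "unsupervised": 3}

# Hypothetical manual mapping of factors to correlated terms.
factor_terms = {
    "working habits": {"violation"},
    "on-site supervision": {"absence", "unsupervised"},
}

stats = factor_statistics(term_totals, factor_terms)
print(stats)
```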
Step (5): calculate the TF-IDF value of each important proper term of hidden danger factors ($U_k$) in each accident cause text ($D_r$):
$$TF_{kr} = \frac{x_{kr}}{\sum_{s=1}^{n} x_{sr}}, \quad k = 1, 2, \ldots, p;\ r = 1, 2, \ldots, m$$
$TF_{kr}$ means the TF value of the $k$-th important proper term of hidden danger factors ($U_k$) in the $r$-th accident cause document ($D_r$); $x_{kr}$ means the frequency of the occurrence of $U_k$ in the $r$-th accident cause document; $\sum_{s=1}^{n} x_{sr}$ means the number of terms in the $r$-th accident document.
$$IDF_k = \log_{10} \frac{m}{\left|\{r : U_k \in D_r\}\right|}, \quad k = 1, 2, \ldots, p$$
$IDF_k$ means the IDF value of the $k$-th important proper term of hidden danger factors ($U_k$), $m$ means the number of documents in the accident cause document set, and $\left|\{r : U_k \in D_r\}\right|$ means the number of accident cause documents containing the $k$-th important proper term of hidden danger factors ($U_k$).
$$TF\text{-}IDF_{kr} = TF_{kr} \times IDF_k$$
$TF\text{-}IDF_{kr}$ means the TF-IDF value of the $k$-th important proper term of hidden danger factors ($U_k$) in the $r$-th accident cause document.
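A minimal sketch of Step (5) follows, using the unsmoothed base-10 form of the formulas above. The tokenized documents and the function name `tf_idf` are hypothetical, and skipping terms that occur in no document is an assumption made here to avoid division by zero.

```python
import math
from collections import Counter


def tf_idf(docs, important_terms):
    """Return {(term, r): TF-IDF_kr} with TF = x_kr / sum_s x_sr
    and IDF = log10(m / number of documents containing the term)."""
    m = len(docs)
    doc_freq = {t: sum(1 for d in docs if t in d) for t in important_terms}
    scores = {}
    for r, doc in enumerate(docs):
        counts = Counter(doc)
        for t in important_terms:
            if doc_freq[t] == 0 or not doc:
                continue  # assumption: undefined cases are skipped
            tf = counts[t] / len(doc)
            idf = math.log10(m / doc_freq[t])
            scores[(t, r)] = tf * idf
    return scores


# Hypothetical tokenized accident cause documents.
docs = [
    ["violation", "violation", "absence", "work"],
    ["absence", "work"],
    ["absence", "fatigue", "work", "work"],
]
scores = tf_idf(docs, ["violation", "absence"])
print(scores[("violation", 0)])  # 0.5 * log10(3)
print(scores[("absence", 1)])    # IDF is 0 because every document contains the term
```

Note that with this unsmoothed IDF, a term that occurs in every document receives a weight of zero everywhere, however often it appears.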
Step (6): determine whether the TF-IDF value of each important proper term of hidden danger factors ($U_k$) in each accident cause text $D_r$ is the largest:
$$B_{kr} = \begin{cases} 1, & TF\text{-}IDF_{kr} \ge TF\text{-}IDF_{qr},\ \forall q \in \{1, 2, \ldots, p\} \\ 0, & \text{otherwise} \end{cases}, \quad k = 1, 2, \ldots, p;\ r = 1, 2, \ldots, m$$
Step (7): calculate the average highest TF-IDF value of the important proper terms of hidden danger factors ($U_k$) that are correlated with each specific hidden danger factor $AP_{ij}$ in the accident cause text set:
$$\overline{TF\text{-}IDF_{ij}^{\max}} = \frac{\sum_{r=1}^{m} \sum_{k=1}^{p} TF\text{-}IDF_{kr} \cdot B_{kr} \cdot A_{ijk}}{\sum_{r=1}^{m} \sum_{k=1}^{p} B_{kr} \cdot A_{ijk}}, \quad i = 1, 2, \ldots, 4;\ j = 1, 2, \ldots, 13$$
$\overline{TF\text{-}IDF_{ij}^{\max}}$ means the comprehensive average maximum TF-IDF value of important proper terms of hidden danger factors.
It has been found that if an important term related to a hidden danger factor appears more frequently in a specific accident cause document and less frequently in the entire set of accident cause documents, the hidden danger factor has a greater importance and influence on the occurrence of the specific accident.
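Steps (6) and (7) can be sketched as below. The per-document scores and the factor-to-term mapping are hypothetical, and returning 0.0 for a factor whose terms never reach a document maximum is an assumption, since the formula is undefined when its denominator is zero.

```python
def average_max_tfidf(doc_scores, factor_terms):
    """doc_scores: one {term: TF-IDF} dict per document.
    factor_terms: {factor: set of correlated important terms}."""
    sums = {factor: 0.0 for factor in factor_terms}
    hits = {factor: 0 for factor in factor_terms}
    for scores in doc_scores:
        if not scores:
            continue
        top = max(scores.values())
        # Step (6): B_kr = 1 for every term tied at the document maximum.
        top_terms = [t for t, value in scores.items() if value == top]
        for factor, terms in factor_terms.items():
            for t in top_terms:
                if t in terms:  # A_ijk = 1
                    sums[factor] += top
                    hits[factor] += 1
    # Step (7): average of the maxima attributed to each factor.
    return {f: (sums[f] / hits[f] if hits[f] else 0.0) for f in factor_terms}


# Hypothetical TF-IDF values for three accident cause documents.
doc_scores = [
    {"violation": 0.30, "absence": 0.10},
    {"violation": 0.10, "absence": 0.20},
    {"violation": 0.50, "absence": 0.00},
]
factor_terms = {
    "working habits": {"violation"},
    "on-site supervision": {"absence"},
}
averages = average_max_tfidf(doc_scores, factor_terms)
print(averages)
```

Read literally, the Step (6) definition also sets the indicator to 1 for all terms of a document in which every score is zero; the sketch follows that literal reading, and the toy data avoid the case.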

5. Analysis on Document Mining Results of Electric Power Personal Casualty Accident Report

5.1. Data Sources

As stated in Section 4.1, the accident texts used in this study are 225 electric power personal casualty accident reports from the Compilation of National Electric Power Accidents and Electric Power Safety Events (2014 to 2018) compiled by the National Energy Administration’s Electric Power Safety Supervision Department. Each report is divided into three sections: a brief account of the accident, the causes of the accident, and suggestions for rectification. Because the compilation was produced by a single body and the state has relevant provisions on the terms and format of power accident reports, the 225 texts are well suited to the text mining method and process of this study. This study begins with a simple term count of accident type terms in the accident brief section to determine the distribution of the various types of accidents. The text of the accident causes is then mined using word cloud and TF-IDF technologies.

5.2. Document Mining of Accident Brief

The term count experiment on the accident brief in 225 electric power personal casualty accident reports (2014 to 2018) was carried out using the document mining method; see Figure 3 for the results.
As illustrated in Figure 3, the six most common types of accident from 2014 to 2018 were falling, electric shock, mechanical injury, collapse, object striking, and derrick falling down, which together account for 87 percent of all accidents. As the frequency of falling and electric shock accidents is significantly higher than that of other accident types, effectively limiting the occurrence of these two categories of accidents is the focus and challenge of accident prevention work in power enterprises.

5.3. Document Mining of Accident Causes

5.3.1. Word Cloud of Accident Hidden Danger Factors

The important terms related to the hidden danger factors cited as accident causes in the 225 electric power personal injury accident reports (2014 to 2018) are graphically represented using the word cloud analysis approach described in Section 4.3; see Figure 4 for the resulting word cloud. Terms such as “absence”, “work leader”, “management”, “safety management”, “supervisor”, “inspection”, and “violation” appear most frequently in the accident cause documents, and their frequent occurrence also indicates that the related hidden danger factors are highly likely to be present in common accidents.

5.3.2. Screening Results and Analysis of Key Hidden Danger Factors Based on Term Count and TF-IDF

To address the inherent defects in the collected 225 electric power personal casualty accident reports (2014 to 2018), the approach combining term count with TF-IDF developed in Section 4.4 was used. First, the term count threshold for hidden danger factors was set to 5, and 389 important terms linked to hidden danger factors were screened from them. Second, three characteristic values of the important terms associated with each hidden danger factor were calculated: the count of the important terms related to each hidden danger factor ($C_{ijk}$), the comprehensive cumulative frequency of the important terms related to each hidden danger factor ($Q_{ijk}$), and the comprehensive average maximum TF-IDF value of the important terms related to each hidden danger factor ($\overline{TF\text{-}IDF_{ijk}^{max}}$). Finally, based on the overall distribution of the experimental results, the thresholds were set as follows: for a specific hidden danger factor $AP_{ij}$ ($i = 1, 2, \ldots, 4$; $j = 1, 2, \ldots, 13$), if $C_{ijk} \geq 10$, $Q_{ijk} \geq 200$, and $\overline{TF\text{-}IDF_{ijk}^{max}} \geq 0.1$, it is considered a key hidden danger factor of electric power personal casualty accidents. See Table 3 for the screening results.
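The screening procedure can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the TF-IDF weight uses the common form tf × ln(N/df), which may differ from the formula in Section 4.4; the three statistics are interpreted as the number of important terms (C), the sum of their corpus counts (Q), and the mean of their per-term maximum TF-IDF values (T); and `term_to_factor`, a mapping from each term to its hidden danger factor, is a hypothetical input.

```python
import math
from collections import Counter, defaultdict

def screen_key_factors(tokenized_docs, term_to_factor,
                       min_term_count=5, c_min=10, q_min=200, t_min=0.1):
    """Return C, Q, T and a key-factor flag for every hidden danger factor."""
    n_docs = len(tokenized_docs)
    doc_counts = [Counter(tokens) for tokens in tokenized_docs]
    total = Counter()     # corpus-wide count of each term
    doc_freq = Counter()  # number of documents containing each term
    for counts in doc_counts:
        total.update(counts)
        doc_freq.update(counts.keys())

    # Maximum TF-IDF of each term over all documents (assumed tf * ln(N/df)).
    max_tfidf = defaultdict(float)
    for tokens, counts in zip(tokenized_docs, doc_counts):
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            max_tfidf[term] = max(max_tfidf[term], tf * idf)

    # Aggregate the important terms (count >= min_term_count) per factor.
    stats = defaultdict(lambda: {"C": 0, "Q": 0, "tfidf_sum": 0.0})
    for term, factor in term_to_factor.items():
        if total[term] >= min_term_count:
            entry = stats[factor]
            entry["C"] += 1
            entry["Q"] += total[term]
            entry["tfidf_sum"] += max_tfidf[term]

    result = {}
    for factor, entry in stats.items():
        t_avg = entry["tfidf_sum"] / entry["C"]
        is_key = entry["C"] >= c_min and entry["Q"] >= q_min and t_avg >= t_min
        result[factor] = {"C": entry["C"], "Q": entry["Q"], "T": t_avg, "key": is_key}
    return result
```

The default thresholds mirror the values quoted above (5, 10, 200, and 0.1); factors with no term reaching the term count threshold are simply absent from the result.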
In Table 3, for a specific hidden danger factor $AP_{ij}$ ($i = 1, 2, 3$; $j = 1, 2, \ldots, 5$), if $Q_{ijk}$ increases as $C_{ijk}$ increases, the hidden danger factor appears frequently in electric power personal casualty accidents in general; if $\overline{TF\text{-}IDF_{ijk}^{max}}$ is greater, the hidden danger factor has a stronger impact on the occurrence of an electric power personal casualty accident. Given the thresholds set above, the key hidden danger factors screened out in this paper have two features: first, they are present in the majority of electric power personal casualty accidents, emphasizing their prevalence; second, their appearance has a significant impact on the occurrence of an electric power personal casualty accident, emphasizing their harmfulness. The risk concept combines these two characteristics; based on the formula (risk = possibility × influence), it can be seen that the occurrence possibility and harmfulness of the working habit (a key hidden danger factor) in the personnel hidden danger are above average, so it is a high-risk key hidden danger factor and the “top priority” in the key hidden danger factor system.
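One way to make the risk formula concrete is to multiply the two Table 3 columns that the paragraph above interprets as prevalence and harmfulness. The paper does not state that risk is computed this way, so treating the cumulative frequency as the possibility term and the average maximum TF-IDF as the influence term is an assumption of this sketch; the values are transcribed from Table 3 with shortened factor names, and `rank_by_risk` is a hypothetical helper.

```python
# (factor, C, Q, average maximum TF-IDF), transcribed from Table 3.
TABLE_3 = [
    ("Work skill level", 105, 1486, 0.2156),
    ("Working habit", 179, 2958, 0.2354),
    ("Equipment design and quality", 47, 727, 0.3602),
    ("Equipment nameplate quality", 21, 344, 0.2948),
    ("Instruments configuration", 13, 211, 0.2666),
    ("Personal protective equipment configuration", 35, 493, 0.2102),
    ("Investigation before working", 82, 1621, 0.2319),
    ("Working instruction basis", 109, 2231, 0.2238),
    ("Arrival of work leaders", 93, 1925, 0.2119),
    ("Implementation of site safety control measures", 72, 1767, 0.1829),
    ("Training selection/safety education training", 37, 989, 0.1793),
]

def rank_by_risk(rows):
    """Sort factors by the assumed score Q * TF-IDF, highest first."""
    scored = [(name, q * tfidf) for name, _c, q, tfidf in rows]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Print the three factors with the highest assumed risk score.
for name, score in rank_by_risk(TABLE_3)[:3]:
    print(f"{name}: {score:.1f}")
```

Under this assumed score, the working habit ranks first, which agrees with its designation as the top priority in the text.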

6. Establishment of Key Hidden Danger Factor System of Electric Power Personal Casualty Accidents Based on Empirical Analysis

In this paper, the key hidden danger factor system of electric power personal casualty accidents is ultimately formed, the names of key hidden danger factors are defined and standardized, and their specific connotations are reinforced; see Table 4 for specific contents.
Table 4 shows that, in contrast to other industrial fields where production accidents are frequently attributed to personnel, object, environment, and management, the key hidden danger factors of electric power personal casualty accidents extracted based on document mining in this paper exclude the hidden danger factors of the environment and only focus on hidden dangers of personnel, object, and management. This is exactly consistent with the actual situation in which many electric power enterprises adjust working arrangements based on environmental weather conditions (e.g., not working on hot days or heavy rain and typhoon days). Therefore, the hidden danger factors of the environment can be ignored.

7. Conclusions

The significant information in accident reports has not been effectively exploited in the safety management of China’s power system. The first step toward realizing the digitalization, automation, and intelligence of power grid safety management is to extract important information from power grid accident reports using machine learning and text mining. Starting from the current state of China’s power grid safety management, this study builds an index system for hidden danger factors through text mining of power grid safety accident reports, employing safety management theory and text mining methods. This method not only reduces the subjectivity of identifying and judging potential safety threats, but also lays the groundwork for the future implementation of digital and intelligent safety management.
This study has the following limitations. First, because the majority of electric power personal injury accidents are minor-level accidents and there are few significant accidents, the 225 electric power personal injury accidents collected were all minor-level accidents (level 4). Second, because the analysis is based on Chinese-language text, it addresses only power grid safety management in China.

Author Contributions

Conceptualization, D.L. and C.M.; methodology, C.M. and Y.W.; software, C.Z.; validation, X.X.; formal analysis, Y.W.; resources, C.X.; data curation, D.L.; writing, Y.W. and C.Z.; supervision, C.M.; project administration, D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the science and technology project of the headquarters of the State Grid Corporation of China (52170018000T).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. State Council of the PRC. Report on Production Safety Accident and Regulations of Investigation and Treatment. 28 March 2007. Available online: http://www.gov.cn/zwgk/2007-04/19/content_588577.htm (accessed on 4 June 2020).
  2. Electric Power Safety Supervision Department of National Energy Administration. Compilation of National Power Accidents and Power Security Incidents (2014); China Power Media Group Co., Ltd.: Beijing, China, 2015.
  3. Tian, S.C. Study on Identification and Control of the Third Type of Hazard; Beijing Institute of Technology: Beijing, China, 2001.
  4. State Administration of Work Safety. Interim Provisions on the Investigation and Control of Safety Accidents. 28 December 2007. Available online: https://www.mem.gov.cn/fw/flfgbz/gz/200801/t20080110_233408.shtml (accessed on 4 June 2020).
  5. Williamson, A.; Feyer, A.M. The causes of electrical fatalities at work. J. Saf. Res. 1998, 29, 187–196.
  6. Zhao, D.; Thabet, W.; McCoy, A.; Kleiner, B. Electrical deaths in the US construction: An analysis of fatality investigations. Int. J. Inj. Control. Saf. Promot. 2014, 21, 278–288.
  7. Chi, C.F.; Yang, C.C.; Chen, Z.L. In-depth accident analysis of electrical fatalities in the construction industry. Int. J. Ind. Ergon. 2009, 39, 635–644.
  8. Wei, X.N.; Liu, Y.G.; Huang, Y.L.; Chen, G.S.; Xiao, J.H. Prediction on human casualty hazard in power-supplying operation based on multiple linear regression. J. Saf. Sci. Technol. 2015, 11, 178–184.
  9. Sun, H.L. Study on Risk Management of the Electric Power Enterprise Production Safety; Qingdao University of Technology: Shandong, China, 2018.
  10. Li, J.; Wang, J.P.; Xu, N.; Zhou, Z. Analysis of safety risk factors for metro construction based on text mining method. Tunn. Constr. 2017, 2, 33–42.
  11. Wu, J.; Jiang, F.C. An analysis and risk forecasting of inland ship collision based on text mining. J. Transp. Inf. Saf. 2018, 3, 154–161.
  12. Tan, Z.L.; Chen, X.; Song, Q.Z.; Chen, X.C. Analysis for the potential hazardous risks of the coal mines based on the so-called text mining. J. Saf. Environ. 2017, 4, 1262–1266.
  13. Feng, J.; Gong, C.; Li, X.; Lau, R.Y. Automatic approach of sentiment lexicon generation for mobile shopping reviews. Wirel. Commun. Mob. Comput. 2018, 10, 232–240.
  14. Castillo-rosa, J.; Suarez-cebador, M.; Rubio-romero, J.C.; Aguado, J.A. Personal factors and consequences of electrical occupational accidents in the primary, secondary and tertiary sectors. Saf. Sci. 2017, 91, 286–297.
  15. Cheng, C.W.; Leu, S.S.; Lin, C.C.; Fan, C. Characteristic analysis of occupational accidents at small construction enterprises. Saf. Sci. 2010, 48, 698–707.
  16. Dokov, W. Assessment of risk factors for death in electrical injury. Burns 2009, 35, 114–117.
  17. Yang, Y.Q. Evaluation Model of Power Operation of Personal Risk Quantification; South China University of Technology: Guangdong, China, 2015.
  18. Suarez-cebador, M.; Rubio-romero, J.C.; Lopez-arquillos, A. Severity of electrical accidents in the construction industry in Spain. J. Saf. Res. 2014, 48, 63–70.
  19. Khanzode, V.V.; Maiti, J.; Ray, P.K. Occupational injury and accident research: A comprehensive review. Saf. Sci. 2012, 50, 1355–1367.
  20. Paivinen, M. Electricians’ perception of work-related risks in cold climate when working on high places. Int. J. Ind. Ergon. 2006, 36, 661–670.
  21. Tang, J.X. Research on Human Reliability Analysis and Accident Pre-control of Power Grid; Zhejiang University: Zhejiang, China, 2015.
  22. Zhang, J.W. Research on Human Error of Power Accidents Based on HFACS; Soochow University: Jiangsu, China, 2016.
  23. Ren, C.Y. Application on Safety Management of Human Factors Engineering Theory in Electric Power Company; Tianjin University: Tianjin, China, 2016.
  24. Wu, S.S. Research and Application on Safety Human Factors in Electricity Enterprises; Beijing Jiaotong University: Beijing, China, 2013.
  25. Wu, S.P. Study on the Influencing Factors and Management of Human Errors in the Production of Electricity Enterprises; Beijing Jiaotong University: Beijing, China, 2009.
  26. Yang, Z.L. Electric Power Production Safety Risk Causes and Pre-Control Method Research; North China Electric Power University: Beijing, China, 2016.
  27. Ye, J.G. Safety Risk Identification and Control System of Power Supply Enterprise; North China Electric Power University: Beijing, China, 2014.
  28. Mellen, P.F.; Weedn, V.W.; Kao, G. Electrocution: A review of 155 cases with emphasis on human factors. J. Forensic Sci. 1992, 37, 1016–1022.
  29. Zhu, L.N. Research on the Early Warning of Production Safety Accidents in Power Generation Enterprises Based on Bayesian Network; North China Electric Power University: Beijing, China, 2019.
  30. Editorial board of “Construction and Practice of Intrinsic Safety Capability of Power Grid Enterprises”. Construction and Practice of Intrinsic Safety Capability of Power Grid Enterprises; China Electric Power Press: Beijing, China, 2018.
  31. Shen, Y.L.; Yang, Z.X.; Li, H.L.; Zhang, K.; Liu, Y.Z. A review of the safety psychology and physiology related research in electric power human accident. Telecom World 2019, 26, 368–369.
  32. Li, Y.B.; Han, Y.; Zhang, R.; Li, Y. Research on impact model of meteorological factors on the power accidents. Power Syst. Technol. 2013, 37, 1683–1687.
  33. Yu, X.; Xu, C.; Lu, D.; Zhu, Z.; Zhou, Z.; Ye, N.; Mi, C. Design and application of a case analysis system for handling power grid operational accidents based on case-based reasoning. Information 2020, 11, 91.
  34. Yang, Y.J.; Yuan, H.H.; Wang, Y.L. Sentiment analysis method for comment text. J. Nanjing Univ. Sci. Technol. 2019, 43, 280–285.
  35. Tan, J.Y.; Ma, S.C. Research on network microblogging lyrics topic recognition and tracking technology based on clustering. J. Chongqing Univ. Technol. 2019, 33, 176–181.
  36. Electric Power Safety Supervision Department of National Energy Administration. Compilation of National Power Accidents and Power Security Incidents (2015); China Power Media Group Co., Ltd.: Beijing, China, 2016.
  37. Electric Power Safety Supervision Department of National Energy Administration. Compilation of National Power Accidents and Power Security Incidents (2016); China Power Media Group Co., Ltd.: Beijing, China, 2017.
  38. Electric Power Safety Supervision Department of National Energy Administration. Compilation of National Power Accidents and Power Security Incidents (2017); China Power Media Group Co., Ltd.: Beijing, China, 2018.
  39. Electric Power Safety Supervision Department of National Energy Administration. Compilation of National Power Accidents and Power Security Incidents (2019); China Power Media Group Co., Ltd.: Beijing, China, 2019.
  40. Huang, C.N.; Zhao, H. Chinese word segmentation: A decade review. J. Chin. Inf. Process. 2007, 3, 8–19.
  41. Yang, B.; Han, Q.W.; Lei, M.; Zhang, Y.P.; Liu, X.G.; Yang, Y.Q.; Ma, X.F. Short text classification algorithm based on improved TF-IDF weight. J. Chongqing Univ. Technol. 2016, 30, 108–113.
  42. Zhang, J. A Method of intelligence key words extraction based on improved TF-IDF. J. Intell. 2014, 33, 153–155.
Figure 1. Document mining of electric power personal casualty accident report.
Figure 2. Word cloud of important words related to hidden danger factors of electric power personal casualty accidents.
Figure 3. Statistical results of the main types of electric power personal casualty accidents (2014 to 2018).
Figure 4. Term cloud of important terms related to hidden danger factors of electric power personal casualty accidents (2014 to 2018).
Table 1. Analysis and Statistical Results of Causes and Influencing Factors of Electric Power Personal Casualty Accidents in the Related Literature.
| Factor Category | No. | Specific Causes and Influencing Factors of Accidents | Mentioned Paper | Quantity of Mentioned Paper |
|---|---|---|---|---|
| Personnel factor | 1 | Gender | Literature [6,7,14,15,16] | 5 |
| | 2 | Nationality | Literature [14] | 1 |
| | 3 | Age | Literature [6,7,8,14,15,16,17,18,19,20] | 10 |
| | 4 | Education background/education level/intelligence level | Literature [15,19,21,22] | 4 |
| | 5 | Working years/work experience level/familiarity/qualifications | Literature [7,8,14,15,17,18,19,20,21,23] | 10 |
| | 6 | Time for current work | Literature [20] | 1 |
| | 7 | Contract type | Literature [15] | 1 |
| | 8 | Work skill level/knowledge level/ability and quality | Literature [7,8,9,15,17,19,21,22,23,24,25,26,27,28] | 15 |
| | 9 | Working habit | Literature [5,6,7,8,9,15,17,19,21,22,26,27,28,29,30] | 16 |
| | 10 | Absenteeism | Literature [19] | 1 |
| | 11 | Handedness | Literature [19] | 1 |
| | 12 | Personality trait/character trait | Literature [20,24,25,26,28] | 5 |
| | 13 | Physical health/physical condition | Literature [5,8,15,17,19,20,21,22,23,24,25,26,27,29,31] | 15 |
| | 14 | Lifestyle | Literature [5,16,19,28] | 4 |
| | 15 | Mental state | Literature [8,15,17,19,20,21,22,23,24,25,26,27] | 12 |
| | 16 | Work mood/psychological condition/psychological quality | Literature [8,17,19,21,22,23,24,25,26,27,29,31] | 12 |
| | 17 | Communication level/team cooperation/interpersonal relationship/coordination degree | Literature [9,15,19,21,23,24,27,29] | 8 |
| Equipment factor | 1 | Equipment design and quality/equipment state | Literature [5,7,9,15,17,18,20,21,26,27,29,30] | 12 |
| | 2 | Equipment nameplate quality/safety sign | Literature [15,21,30] | 3 |
| | 3 | Equipment location/site layout | Literature [21,27,29,30] | 4 |
| | 4 | Control system state | Literature [29] | 1 |
| Tool factor | 1 | Instrument configuration | Literature [5,6,7,8,9,15,17,18,20,21,26,27,30] | 13 |
| | 2 | Personal protective equipment configuration | Literature [5,7,8,9,15,17,18,20,21,27,29,30] | 12 |
| Method factor | 1 | Working height type | Literature [7,8,17,19,20,27] | 6 |
| | 2 | Work power outage type | Literature [7,8,17,19,27] | 5 |
| | 3 | Working potential type | Literature [7,8,17,19,27] | 5 |
| | 4 | Working voltage class | Literature [6] | 1 |
| | 5 | Working equipment type | Literature [6,7,8,9,17,19,20,27] | 7 |
| | 6 | Working nature/technical environment/technical nature | Literature [6,8,9,14,15,17,18,19,20,22,27] | 12 |
| | 7 | Working personnel allocation/personnel scale/personnel arrangement/personnel assignment | Literature [8,17,21,27] | 6 |
| | 8 | Working organization form | Literature [8,17,21,27] | 4 |
| | 9 | Working resource allocation/resource management condition | Literature [19,22] | 2 |
| | 10 | Working difficulty | Literature [21,27] | 2 |
| | 11 | Working duration/workload/working intensity/working pressure | Literature [8,17,19,20,21,27] | 6 |
| | 12 | Working time | Literature [5,6,8,16,17,18,19,24] | 8 |
| | 13 | Working instruction basis | Literature [5,6,8,9,15,17,20,21,22,23,24,27,29,30] | 14 |
| Environmental factor | 1 | Working weather | Literature [8,9,15,16,17,19,20,26,27,29,32,33] | 11 |
| | 2 | Working temperature | Literature [15,19,20,21,23,26,27,29,32] | 9 |
| | 3 | Working air force | Literature [15,19,20,26,29,32] | 6 |
| | 4 | Working humidity | Literature [5,7,15,19,20,23,29,32] | 8 |
| | 5 | Working air quality/dust | Literature [15,19,20,26,29] | 5 |
| | 6 | Working ventilation | Literature [15,19,20,30] | 4 |
| | 7 | Working noise/interference | Literature [15,19,20,21,23,26,29] | 7 |
| | 8 | Working radiation | Literature [15,19,20,26,29] | 5 |
| | 9 | Working illumination/lighting | Literature [5,15,19,20,21,27,29,30] | 8 |
| | 10 | Working area/working site | Literature [5,7,8,9,15,17,18,19,20,27,29] | 11 |
| | 11 | Working space | Literature [5,6,7,8,9,15,18,19,20,23,27] | 11 |
| | 12 | Working atmosphere/safety culture atmosphere/organizational atmosphere/organizational safety commitment | Literature [9,19,21,22,24,29] | 6 |
| | 13 | Company size | Literature [7,18] | 1 |
| | 14 | Social environment/policy environment | Literature [30] | 1 |
| Management factor | 1 | Investigation before working/analysis of hazardous points in working/safety inspection/site investigation | Literature [21,23,27,29] | 4 |
| | 2 | Arrival of work leaders/assignment of safety officers and guardians | Literature [5,6,8,9,15,17,18,19,21,22,23,27,29,30] | 14 |
| | 3 | Implementation of site safety control measures/effective supervision by work leaders | Literature [8,9,15,19,23] | 5 |
| | 4 | Training selection/safety education training | Literature [5,6,15,21,26,29,30] | 7 |
| | 5 | Accident handling ability | Literature [29] | 1 |
| | 6 | Reward and punishment mechanism state/investigation degree of accountability for accidents | Literature [29] | 1 |
| | 7 | Management consulting service mode | Literature [18] | 1 |
| | 8 | Project jurisdiction ownership | Literature [15] | 1 |
| | 9 | Project contract amount | Literature [15] | 1 |
Table 2. Preliminary Identification Results of Hidden Danger Factors of Electric Power Personal Casualty Accidents.
| Hidden Danger Factor Category $AP_i$ ($i = 1, 2, \ldots, 4$) | Specific Hidden Danger Factor $AP_{ij}$ ($i = 1, 2, \ldots, 4$; $j = 1, 2, \ldots, 13$) |
|---|---|
| Hidden danger of personnel $AP_1$ | Education background/education level/intelligence level $AP_{11}$ |
| | Working years/work experience level/familiarity/qualifications $AP_{12}$ |
| | Time for current work $AP_{13}$ |
| | Contract type $AP_{14}$ |
| | Work skill level/knowledge level/ability and quality $AP_{15}$ |
| | Working habit $AP_{16}$ |
| | Absenteeism $AP_{17}$ |
| | Personality trait/character trait $AP_{18}$ |
| | Physical health/physical condition $AP_{19}$ |
| | Lifestyle $AP_{1,10}$ |
| | Mental state $AP_{1,11}$ |
| | Work mood/psychological condition/psychological quality $AP_{1,12}$ |
| | Communication level/team cooperation/interpersonal relationship/coordination degree $AP_{1,13}$ |
| Hidden danger of object $AP_2$ | Equipment design and quality/equipment state $AP_{21}$ |
| | Equipment nameplate quality/safety sign $AP_{22}$ |
| | Equipment location/site layout $AP_{23}$ |
| | Control system state $AP_{24}$ |
| | Instrument configuration $AP_{25}$ |
| | Personal protective equipment configuration $AP_{26}$ |
| Hidden danger of environment $AP_3$ | Working weather $AP_{31}$ |
| | Working temperature $AP_{32}$ |
| | Working air force $AP_{33}$ |
| | Working humidity $AP_{34}$ |
| | Working air quality/dust $AP_{35}$ |
| | Working ventilation $AP_{36}$ |
| | Working noise/interference $AP_{37}$ |
| | Working radiation $AP_{38}$ |
| | Working illumination/lighting $AP_{39}$ |
| | Working atmosphere/safety culture atmosphere/organizational atmosphere/organizational safety commitment $AP_{3,10}$ |
| Hidden danger of management $AP_4$ | Investigation before working/analysis of hazardous points in working/safety inspection/site investigation $AP_{41}$ |
| | Working instruction basis $AP_{42}$ |
| | Working organization form $AP_{43}$ |
| | Working personnel allocation/personnel scale/personnel arrangement/personnel assignment $AP_{44}$ |
| | Working resource allocation/resource management condition $AP_{45}$ |
| | Working time $AP_{46}$ |
| | Working duration/workload/working intensity/working pressure $AP_{47}$ |
| | Arrival of work leaders/assignment of safety officers and guardians $AP_{48}$ |
| | Implementation of site safety control measures/effective supervision by work leaders $AP_{49}$ |
| | Training selection/safety education training $AP_{4,10}$ |
| | Accident handling ability $AP_{4,11}$ |
| | Reward and punishment mechanism state/investigation degree of accountability for accidents $AP_{4,12}$ |
| | Management consulting service mode $AP_{4,13}$ |
Note: Although factors such as gender, nationality, and age can be controlled through management means (e.g., recruitment conditions), they will not be considered during the development of the hidden danger factor system that can be eliminated through various safety management means in this paper because they are linked to the sensitive social issue of employment discrimination.
Table 3. Screening Results of Key Hidden Danger Factors of Electric Power Personal Casualty Accidents Based on Term Frequency Count.
| $i$ | $j$ | $AP_{ij}$ | $C_{ijk}$ | $Q_{ijk}$ | $\overline{TF\text{-}IDF_{ijk}^{max}}$ |
|---|---|---|---|---|---|
| 1 | 1 | Work skill level/knowledge level/ability and quality | 105 | 1486 | 0.2156 |
| | 2 | Working habit | 179 | 2958 | 0.2354 |
| 2 | 1 | Equipment design and quality/equipment state | 47 | 727 | 0.3602 |
| | 2 | Equipment nameplate quality/safety sign | 21 | 344 | 0.2948 |
| | 3 | Instruments configuration | 13 | 211 | 0.2666 |
| | 4 | Personal protective equipment configuration | 35 | 493 | 0.2102 |
| 3 | 1 | Investigation before working/analysis of hazardous points in working/safety inspection/site investigation | 82 | 1621 | 0.2319 |
| | 2 | Working instruction basis | 109 | 2231 | 0.2238 |
| | 3 | Arrival of work leaders/assignment of safety officers and guardians | 93 | 1925 | 0.2119 |
| | 4 | Implementation of site safety control measures/effective supervision by work leaders | 72 | 1767 | 0.1829 |
| | 5 | Training selection/safety education training | 37 | 989 | 0.1793 |
| Average | | | 72 | 1341 | 0.2375 |
Table 4. Key Hidden Danger Factor System of Electric Power Personal Casualty Accident.
| Hidden Danger Factor Category | Key Hidden Danger Factors | Specific Connotation |
|---|---|---|
| Hidden danger of personnel | Safety quality and ability of workers | Means workers’ safety awareness ability and safety production skill level (including the work skills, skills mastering working safety devices and facilities, and knowledge and skills of proper handling and emergency rescue in case of emergencies) |
| | Executive ability of rules and regulations of workers | Means whether the workers have good working habits, whether they follow numerous power sector safety production rules and regulations while working, and whether they have habitual violations |
| Hidden danger of object | Equipment design and quality | Means whether the design parameters of the equipment fulfill industry and national standards, whether the commissioning time of the equipment exceeds the maximum working life, and whether the equipment has component damage and function loss |
| | Equipment nameplate configuration and quality | Means whether the equipment nameplate is missing, damaged or unclear |
| | Configuration and quality status of operation instruments | Means whether the instruments are missing, unqualified and damaged during working |
| | Personal protective equipment configuration and quality | Means whether the personal protective equipment is missing, unqualified, damaged, or exceeds the shelf life during working |
| Hidden danger of management | Site investigation of managers | Means whether the managers have conducted site investigation before working, and whether the site investigation is accurate |
| | Formulation of working instruction basis | Means whether the managers have developed the necessary working instruction base (containing three measures and one scheme, labor discipline, working drawings, various rules and regulations, and so on), and whether the formulation is accurate |
| | Arrival of managers | Means whether the managers arrive and whether they perform guardianship duties on the worksite |
| | Implementation of site safety control measures by managers | Means whether the managers have implemented safety control measures during working and whether the implementation is accurate |
| | Safety education training of enterprise | Means whether electric power firms have provided safety education training to their personnel as part of their daily safety management job, and whether the effect has been positive |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Lu, D.; Xu, C.; Mi, C.; Wang, Y.; Xu, X.; Zhao, C. Establishment of a Key Hidden Danger Factor System for Electric Power Personal Casualty Accidents Based on Text Mining. Information 2021, 12, 243. https://doi.org/10.3390/info12060243

Chicago/Turabian Style

Lu, Dan, Changqing Xu, Chuanmin Mi, Yijing Wang, Xiangmin Xu, and Chufan Zhao. 2021. "Establishment of a Key Hidden Danger Factor System for Electric Power Personal Casualty Accidents Based on Text Mining" Information 12, no. 6: 243. https://doi.org/10.3390/info12060243
