Establishment of a Key Hidden Danger Factor System for Electric Power Personal Casualty Accidents Based on Text Mining

Abstract: Based on actual safety management difficulties and needs, this paper aims to screen and extract the key accident potential factors of personal injuries and deaths within the electric power industry to provide a reference for electric power companies' accident prevention efforts. First, this paper sorts out and analyzes all of the causes and influencing elements that may lead to electric power personal injuries and deaths; on this basis, and in combination with the definition of accident potentials, rough accident potential factors are initially identified. Second, this paper mines and analyzes relevant accident report texts using text-mining technologies such as term count, word cloud, and term frequency–inverse document frequency (TF-IDF), and a system of key accident potential factors for personal injuries and deaths within the electric power industry, covering three groups of key factors (human, material, and management), is finally constructed. Workers' habitual violation behavior, in particular, carries a larger risk than the other key accident potential factors, implying that additional steps should be taken to eradicate this type of critical accident potential in time.


Introduction
Power generation safety is becoming increasingly crucial as China's power system grows larger and more complex and as economic development drives up power demand. Under China's Regulations on the Reporting, Investigation, and Handling of Work Safety Accidents [1], an accident is defined as an incident that causes a personal casualty or direct economic loss in production and operation activities. Personal accidents, as a subset of accidents, are those that result in a personal casualty, excluding property damage and equipment damage. Accordingly, an electric power personal casualty accident is defined as an event that causes a personal casualty in the electric power industry's production, engineering construction, marketing, and industrial fields, typically involving electric shock, falling, being struck by objects, collapse, and burial [2]. According to statistics from the National Energy Administration's Electric Power Safety Supervision Department, 44 electric power personal casualty accidents occurred in China in 2019, resulting in 54 casualties, endangering the lives of personnel and affecting the development of electric power enterprises and the national economy. It is therefore urgent to determine the causes of these events and prevent them from occurring again.
According to research, hazard sources and hidden threats are commonly regarded as the causes of accidents. There are three types of danger sources in the theory of accident causes. The first type of danger source is the energy carrier or hazardous substance itself, which is the material source of the disaster and cannot be eliminated. The second class relates to harmful conditions and unsafe human behavior, while the third class relates to organizational issues that do not correspond to safety standards. The first category includes dangers that cannot be avoided through management, whereas the second and third categories include dangers that may be eliminated and must be addressed by power companies. A class I hazard source is the material root of an accident that cannot be eliminated [3], whereas a hidden danger is defined as the hazardous state of an object as well as people's risky behaviors and management defects [4] that may cause an accident, including class II and III hazard sources. Addressing these issues is an important way for safety management to play its part and lower the probability of an accident. As a result, it is required to objectively determine the influencing variables of electric power personal injury accidents from the standpoint of standardizing language and focusing on the latter two kinds of danger sources, build a more comprehensive and effective system for the key hidden risk causes of electric power personal injury accidents, and offer a reference for electric power companies to improve the level of production and management safety.
Most scholars and practitioners develop a highly subjective hidden danger factor system using literature study, expert interviews, and questionnaire surveys. In recent years, however, the power industry has developed a reasonably uniform method and system for accident investigation and recording, and a large number of accident reports have accumulated, in which many useful accident patterns and much useful information are hidden. Text information, on the other hand, is less structured, and manual processing is time-consuming and labor-intensive. If text mining technology can be applied to mine accident report text, subjectivity can be reduced and the utilization rate of information enhanced.
The following is how this document is structured. Section 2 mostly introduces the research's current status. Section 3 identifies the hidden danger factors of electric power personal injury accidents and provides the factor index method. Section 4 proposes a method for extracting critical hidden hazard indicators from document mining, which involves document pre-processing, word cloud creation, and term frequency counting. Section 5 includes an empirical investigation to validate the scientific nature and effectiveness of the suggested method. Section 6 then establishes the primary hidden danger factor system for electric power personal casualty accidents based on empirical analysis. Section 7 summarizes the report by discussing the implications and limitations of our findings as well as future research proposals.
In summary, the following are the research's main contributions: (1) The hidden danger factors of electric power personal injury accidents are preliminarily determined through literature research and based on the definitions of the three types of hazard sources, by removing the factors that cannot be eliminated by management means. (2) This study develops a text mining approach for the development of a hidden danger factor system for electric power personal injury accidents, and it picks the essential hidden danger factors based on absolute and relative word frequency. We employ the average maximum TF-IDF value of hidden danger factor proper phrases associated with hidden danger factors as one of the assessment indices during the process.

Research on the Extraction of Influencing Factors of Power Accidents
In the study of factors influencing power accidents, both domestic and international experts have produced a wealth of research results. Williamson and Feyer [5] looked at the causes of electrical death at work from four perspectives: the environment, unsafe state of objects, human physiological state, and human unsafe behavior, and then divided the causes of accidents into eight categories: environmental factors, equipment factors, work habits factors, regulatory factors, training factors, work errors factors, physical condition and other factors (including alcohol/drug abuse, etc.). Zhao [6] used a literature review and expert consultation method to create an influencing factor system with 15 factors to analyze the causes of electrical fatalities in the construction industry in the United States of America, including operation time, task type, safety training, operation space, and voltage level. Chi et al. [7] investigated the causes of 255 electrical fatal accidents in Taiwan's construction industry using personal factors (age, gender, and experience), task factors (tasks performed), environmental factors (humid areas and confined spaces), and management factors (company size), as well as accident mode classification. Wei Xiuning et al. [8] employed safety behavior, psychology, and accident cause theory to separate the elements impacting human safety in power supply operations into five categories: people, equipment, materials, procedures, and environment. Sun Huiling [9] argued that there are more than 20 risk factors that affect the safe production of power enterprises, and she classified them into five categories, including technical risk, environmental risk, management risk, and so on. The first three types of risk variables are the most important risk variables affecting power plant safety.
Based on the current research findings on the causes and influencing factors of power accidents, three conclusions can be drawn. First, most of the literature expounds its own viewpoints and selects terms at random to describe the causes and influencing factors of power accidents, without distinguishing the specific connotation of these terms, resulting in a lack of unified expression standards in the relevant research. Second, most studies just list and summarize all conceivable accident causes and influencing factors without considering whether or not they can be removed or managed, resulting in an influencing factor system that lacks practical reference value for safety management. Finally, the majority of the literature relies on prior experience, various safety theories, or expert consultation and other subjective qualitative approaches to determine the causes and influencing factors of accidents. These studies do not extract information from the accident report text, which contains a wealth of useful data, and they do not use big data mining technology effectively.

Research on Identifying Accident-Influencing Factors Based on Text Mining
In the process of power production, electric power firms have amassed a huge number of accident text data, which is an essential source for companies to research accident rules and summarize work experience. Due to its unstructured or semi-structured characteristics, text information is more difficult to fully utilize its data value than digital information. As a result, text mining should be used to identify the contributing aspects more scientifically and efficiently.
In many areas of accident investigation, domestic and international researchers have made significant progress in identifying influencing elements utilizing text mining technology. Li Jie and Wang Jianping [10] chose 151 subway construction safety risk accident reports as text data from 2002 to 2015. They performed text preprocessing using the R language and text mining methods, such as word segmentation, feature extraction, spatial vector model development, and so on. The primary risk elements and general risk factors of subway construction safety were mined, and the results of text mining were visualized using visualization methods including word clouds and network structure diagrams, providing useful reference information for future subway construction safety. In 2018, Wu Chen and Jiang Fucai [11] used the R language and text mining algorithm to obtain a set of high-dimensional feature space vectors, then successfully reduced the dimension using statistical methods, and successfully constructed a Bayesian network structure of a "ship environment management" system. The difficulty of being unable to recognize professional jargon is solved by enhancing the TF-IDF algorithm, and the confidence rate of risk factors for ship collision accidents is enhanced. Human factors are thought to be the leading cause of ship collision accidents. Zhang Lu et al. [12] analyzed 41,791 hidden danger records of a coal mining enterprise, using a word cloud and TF to extract coal mine safety risks. From 156 accident reports, Li et al. [13] developed a lexicon and used document frequency (i.e., DF value) to identify 15 high-occurring safety risk variables and three participants.
After reviewing prior research, it was discovered that text mining technology has not yet been fully used in the electricity industry, particularly in China. Moreover, the existing research has not taken into account the factors that can help to eliminate the hidden hazard. As a result, we proposed a three-step strategy for the construction of a major hidden danger factors system applicable to China's electric power personal injury occurrences. The first stage is to identify hidden danger elements that can be eliminated through management by conducting literature research and integrating it with the three-class theory of danger sources. The second step is to design the text mining process, which includes document preprocessing, word cloud building, and term frequency counting, as well as to screen the most important hidden danger factors using the parameters of hidden danger factor proper terms in the text that are related to the hidden danger factors obtained in the first step. The system of the main hidden danger factors of power personal injury accidents is developed in the last step, using the results of the previous literature research and text mining.

Analysis on Influencing Factors of Electric Power Personal Casualty Accidents
The relevant literature is first analyzed in this paper to summarize the most common causes and influencing factors of electric power personal casualty accidents. There are 24 related works (1992 to 2019) that have been collected and analyzed, including 14 Chinese and 10 English papers. The study and statistical results of the causes and influencing factors of accidents can be found in Table 1.  All influencing elements may be grouped into six groups, as shown in Table 1, after integrating the study results of associated literatures, which are human factors, equipment factors, tool factors, method factors, environmental factors, and management factors, with 59 unique components. Personnel factors account for the majority of the factors (17 out of 59), with work skill level/knowledge level/ability and quality, working habits, and physical health/physical condition appearing the most frequently; in most of the literature, the arrival of work leaders/assignment of safety officers and guardians in method factors and the arrival of work leaders/assignment of safety officers and guardians in management factors are also considered key aspects.

Preliminary Identification of Hidden Danger Factors of Electric Power Personal Casualty Accidents
Many of the 59 causes and influencing factors of electric power personal casualty accidents listed in Table 1 are class I hazards that cannot be eliminated by safety management measures, such as working height type and working power outage type, which are determined by the work task and cannot be changed (high-altitude work, for example, necessitates climbing and cannot be accomplished by not climbing or by lowering the climbing height). Therefore, according to the definition, the influencing variables of a class I hazard source are not hidden danger factors. The hidden danger factors of electric power personal casualty accidents are preliminarily identified and categorized from the four aspects of persons, objects, environment, and management [4] by analyzing and judging the nature of each factor; see Table 2 for the results. Table 2 shows that there are 42 hidden danger factors in electric power personal casualty accidents, which may be split into four groups, with the people and management groups accounting for the most, at 13 each. $AP_i$ means the i-th hidden danger factor group, and $AP_{ij}$ means the j-th hidden danger factor belonging to the i-th group. Management refers to the activity process in which managers in a particular organization coordinate other people's activities by implementing the planning, organizing, leading, coordinating, and controlling functions, so that others can achieve the established goals alongside them. In the final analysis, management is carried out by people, so the hidden dangers of personnel and management are essentially those of human factors, with the difference being that the hidden dangers of personnel focus on the unsafe state and behavior of workers, whereas the hidden dangers of management focus on the unsafe behavior of managers and the unsafe systems they establish.
Furthermore, the hidden danger factors of the object are the fewest, primarily in four aspects: working equipment, control system, working instruments, and personal protective equipment; the hidden danger factors of the environment comprise natural and humanistic working environment components. Note: Although factors such as gender, nationality, and age can be controlled through management means (e.g., recruitment conditions), they are not considered in this paper during the development of the hidden danger factor system that can be eliminated through safety management means, because they are linked to the sensitive social issue of employment discrimination.

Experimental Scheme of Document File
Document information mining has become an important direction of data mining with the application of the Internet and social networks, and it is frequently employed in disciplines such as emotion analysis [34] and public opinion recognition [35]. Figure 1 depicts the document mining experimental scheme of this paper, which consists of six steps: document collection, document storage, document preprocessing, document mining, analysis and comparison of experimental results, and final establishment of the key hidden danger factor system of electric power personal casualty accidents. This paper collects 225 electric power casualty accident reports from the Compilation of National Electric Power Accidents and Electric Power Safety Events (2014 to 2018) [2,36–39] provided by the National Energy Administration's Electric Power Safety Supervision Department; document mining experiments are carried out on the accident brief and accident cause in each accident report.
The cases are all in Chinese and come from the State Grid Corporation of China. We chose Chinese cases for two reasons: (1) Chinese and English have distinct text characteristics and cannot be mixed during text mining. (2) The safety management models and cultures of power firms vary among countries. Case texts in Chinese are studied in depth in order to better target the characteristics of China's power safety management.

Document Preprocessing
Due to the redundancy, incompleteness, and complexity of the large quantity of document data in the electric power personal casualty accident reports, the content must be pre-processed first to improve the effect of document mining [40]. Pre-processing mainly includes the following steps:
1. Stopword filtration: In the original accident report documents, there are numerous punctuation marks such as ",", ".", and "...", as well as phrases that are meaningless for the purposes of the experiment, such as "of", "very", "limited company", and "power supply station", that should be removed. Therefore, in order to filter the document content, a stop list must first be specified.
2. Segmentation of Chinese words: Some specific words embedded in the running text of an electric power personal casualty accident report express the precise connotation of the key hidden danger factors of the accident. For example, the sentence "XiaoX Li, an untrained masonry worker, has weak self-protection awareness and inadequate safety knowledge" reflects that the accident is related to the work skill level/knowledge level/ability and quality factor among the personnel hidden dangers. Therefore, Chinese word segmentation must be applied to the document content, as well as to the terms connected with each hidden danger factor, in preparation for the subsequent feature calculation and reduction.
3. Creation of a custom dictionary: Chinese sentences are made up of meaningful content words and function words that help form the sentence grammar; because there are no spaces between words, Chinese sentences cannot be split directly by spaces and punctuation as English sentences can. Therefore, a dictionary or an algorithm is often needed to complete the word segmentation [40]. Jieba, a Python-based word segmentation tool, is used in this work to segment the electric power personal casualty accident report documents. A custom dictionary was imported to fix the collocation and combination of words related to the various hidden danger factors. This ensures the accuracy and completeness of the segmentation results for these words and prevents proper terms such as "self-protection awareness" and "safety awareness" from being split into fragments such as "weak/self/protection/consciousness, poor/safety/consciousness", so that the follow-up experiments run well.
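The role of the stop list and the custom dictionary can be illustrated with a minimal sketch. It does not use Jieba; instead it uses a simple forward maximum matching segmenter as a stand-in, so that the example is self-contained. The Chinese phrases, the dictionary entries, and the stop list below are hypothetical illustrations and are not taken from the accident reports.

```python
def segment(text, dictionary, max_len=6):
    """Forward maximum matching: at each position, take the longest
    dictionary entry that matches; fall back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def remove_stopwords(tokens, stopwords):
    """Drop punctuation and other tokens listed in the stop list."""
    return [t for t in tokens if t not in stopwords]


# Hypothetical custom dictionary: proper terms that must stay whole,
# e.g. "self-protection awareness" and "safety awareness".
custom_dictionary = {"自我保护意识", "安全意识", "淡薄", "不足"}
# Hypothetical stop list (punctuation and a function word).
stopwords = {"，", "。", "的"}

sample = "自我保护意识淡薄，安全意识不足。"
tokens = remove_stopwords(segment(sample, custom_dictionary), stopwords)
print(tokens)
```

Without the two dictionary entries for the proper terms, the segmenter would fall back to single characters, which is the kind of over-splitting the custom dictionary is meant to prevent.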

Word Cloud Building
A word cloud, also known as word cloud analysis, is a document mining and visualization technique that highlights the "keywords" that appear frequently in document data and renders them as a cloud-like color picture, allowing readers to grasp the main idea conveyed by the document at a glance. Based on the term count results of the words directly reflecting the connotation of hidden danger factors, this paper draws a word cloud for the accident cause documents in the electric power personal casualty accident reports and shows the general situation of the important terms related to hidden danger factors; see Figure 2 for the experimental process.
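As a rough illustration of how term counts drive a word cloud, the sketch below maps each term's count to a font size. The linear scaling rule, the size range, and the sample tokens are assumptions made for illustration; the paper does not state which rendering tool or scaling it used.

```python
from collections import Counter


def font_sizes(tokens, min_size=10, max_size=48):
    """Map each term's count to a font size by linear scaling
    (an assumed rule, chosen only for illustration)."""
    counts = Counter(tokens)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        term: min_size + (count - lo) * (max_size - min_size) / span
        for term, count in counts.items()
    }


# Hypothetical segmented terms from accident cause documents.
sample_tokens = ["violation"] * 3 + ["absence"] * 2 + ["inspection"]
sizes = font_sizes(sample_tokens)
for term, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{term}: {size:.0f} pt")
```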

Term Frequency Counting
The term frequency count refers to the statistics of how many times a term appears in a document. In general, the absolute term frequency (i.e., term count) is the number of times a term appears in a document. However, this creates a problem: a term may appear more frequently in a long document than in a short one, but this does not imply that the term is more essential or has more weight. Therefore, the normalized term frequency, also known as relative term frequency, is frequently used as the weight value of the term. Term frequency–inverse document frequency (TF-IDF), a popular weighting method for information retrieval and data mining, is based on the basic principle of relative term frequency. TF (term frequency) refers to the frequency of a certain term appearing in the document, normalized by the number of terms in the document to avoid bias towards long documents, whereas IDF (inverse document frequency) is a measure of the universal relevance of a term [41,42].
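The bias of raw term counts towards long documents can be seen in a small sketch. The two token lists below are hypothetical; the point is only that the longer document has the higher count while the shorter one has the higher relative frequency.

```python
from collections import Counter


def term_count(tokens, term):
    """Absolute term frequency: number of occurrences of `term`."""
    return Counter(tokens)[term]


def relative_tf(tokens, term):
    """Relative term frequency: occurrences divided by document length."""
    return Counter(tokens)[term] / len(tokens) if tokens else 0.0


# Hypothetical documents: a short one and a long one padded with other terms.
short_doc = ["violation", "supervisor"]
long_doc = ["violation", "violation"] + ["filler"] * 18

for name, doc in (("short", short_doc), ("long", long_doc)):
    print(name, term_count(doc, "violation"), relative_tf(doc, "violation"))
```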
Because the 225 electric power casualty accident reports chosen for this paper span a long period, are numerous, and were written by many authors, their language style is difficult to unify, their description methods are diverse, and their lengths differ considerably, which puts the simple term count experiment at a disadvantage. For example, some high-frequency keywords directly expressing the meaning of hidden danger factors are overestimated because they appear many times in exceedingly long accident cause documents, whereas other keywords directly representing the meaning of hidden danger factors may be underestimated because their description documents are too short, which affects the final screening result of important hidden danger factors. Therefore, in this paper, the term count experiment is carried out first on the terms related to hidden danger factors, and the relevant important terms are then screened based on the term count results; additionally, the TF-IDF method is introduced to find the relevant important terms with the highest TF-IDF value in each accident cause document, together with their TF-IDF values. After classification and merging, the comprehensive average maximum TF-IDF value of the relevant important terms of each hidden danger factor in the 225 electric power personal casualty accident reports can be obtained, reflecting the importance and influence of each hidden danger factor on electric power personal casualty accidents.
The accident cause document set in the 225 electric power personal casualty accident reports is denoted as $D = \{D_1, D_2, \ldots, D_m\}$, where $D_r$ ($r = 1, 2, \ldots, m$) means the r-th accident cause document in the document set. The term set appearing in the accident cause document set is denoted as $W = \{W_1, W_2, \ldots, W_n\}$, where $W_s$ ($s = 1, 2, \ldots, n$) means the s-th term in the term set. The proper term set of hidden danger factors appearing in the accident cause document set is denoted as $R = \{R_1, R_2, \ldots, R_l\}$, where $R_t$ ($t = 1, 2, \ldots, l$) means the t-th proper term of hidden danger factors.
The influence of specific hidden danger factors is calculated and screened using the following steps:
Step (1): calculate the term frequency of each proper term of hidden danger factors $R_t$:
$$TC_t = \sum_{r=1}^{m} x_{tr}, \quad t = 1, 2, \ldots, l$$
where $TC_t$ means the term frequency of $R_t$ in the document set $D$, and $x_{tr}$ means the frequency of occurrence of $R_t$ in $D_r$.
Step (2): set a threshold of term frequency to screen the important proper terms of hidden danger factors. The proper terms of hidden danger factors whose term frequency is greater than the threshold $\alpha$ form the set $U = \{U_1, U_2, \ldots, U_p\}$, where $U_k$ ($k = 1, 2, \ldots, p$; $p < l$) means the k-th important proper term of hidden danger factors. The screening discriminant is as follows:
$$R_t \in U \ \text{if} \ TC_t > \alpha, \qquad R_t \notin U \ \text{otherwise}, \quad t = 1, 2, \ldots, l$$
The threshold is generally set empirically according to the text analysis field. If $TC_t$ exceeds the threshold, $R_t$ is considered an important proper term of hidden danger factors and is classified into set $U$; otherwise, it is removed.
Step (3): determine whether there is a logical correlation between each important proper term of hidden danger factors ($U_k$) and each specific hidden danger factor ($AP_{ij}$) in Table 2. The determination formula takes the value 1 if there is a logical correlation between $U_k$ and $AP_{ij}$ and 0 if there is not, for $k = 1, 2, \ldots, p$; $i = 1, 2, \ldots, 4$; $j = 1, 2, \ldots, 13$.
Step (4): calculate the term frequency of the important proper terms of hidden danger factors related to each specific hidden danger factor ($AP_{ij}$), denoted $Q_{ijk}$. For a specific hidden danger factor $AP_{ij}$, the larger $Q_{ijk}$ is, the more influence $AP_{ij}$ has on electric power personal casualty accidents.
Step (5): calculate the TF-IDF value of each important proper term of hidden danger factors ($U_k$) in each accident cause document ($D_r$):
$$TF_{kr} = \frac{x_{kr}}{\sum_{s=1}^{n} x_{sr}}, \quad k = 1, 2, \ldots, p; \ r = 1, 2, \ldots, m$$
where $TF_{kr}$ means the TF value of the k-th important proper term of hidden danger factors ($U_k$) in the r-th accident cause document ($D_r$), $x_{kr}$ means the frequency of occurrence of $U_k$ in the r-th accident cause document, and $\sum_{s=1}^{n} x_{sr}$ means the number of terms in the r-th accident cause document.
$$IDF_k = \log_{10} \frac{m}{|\{r : U_k \in D_r\}|}, \quad k = 1, 2, \ldots, p$$
where $IDF_k$ means the IDF value of the k-th important proper term of hidden danger factors ($U_k$), $m$ means the number of documents in the accident cause document set, and $|\{r : U_k \in D_r\}|$ means the number of accident cause documents containing $U_k$.
$$\text{TF-IDF}_{kr} = TF_{kr} \times IDF_k$$
where $\text{TF-IDF}_{kr}$ means the TF-IDF value of the k-th important proper term of hidden danger factors ($U_k$) in the r-th accident cause document.
Step (6): determine whether the TF-IDF value of each important proper term of hidden danger factors ($U_k$) is the largest in each accident cause document $D_r$:
$$B_{kr} = \begin{cases} 1, & \text{TF-IDF}_{kr} \ge \text{TF-IDF}_{qr} \ \text{for all} \ q \in \{1, 2, \ldots, p\} \\ 0, & \text{otherwise} \end{cases}, \quad k = 1, 2, \ldots, p; \ r = 1, 2, \ldots, m$$
Step (7): calculate the average highest TF-IDF value of the important proper terms of hidden danger factors ($U_k$) that are correlated with each specific hidden danger factor $AP_{ij}$ in the accident cause document set. If an important term related to a hidden danger factor appears more frequently in a specific accident cause document and less frequently in the entire set of accident cause documents, the hidden danger factor has a greater importance and influence on the occurrence of the specific accident.
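The steps above can be sketched end to end on a toy corpus. This is a minimal illustration under stated assumptions, not the authors' implementation: the documents, the term-to-factor mapping (standing in for the logical correlation of Step 3), and the threshold value are hypothetical, and because the formula for Step (7) is not given in the text, the averaging rule used here (the mean of the document-level maximum TF-IDF values attributed to each factor) is an assumption.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, already segmented accident cause documents D_1..D_m.
docs = [
    ["violation", "supervisor", "absence", "violation"],
    ["inspection", "violation", "training"],
    ["supervisor", "absence", "inspection", "inspection"],
]
# Hypothetical mapping of proper terms R_t to hidden danger factors (Step 3).
term_to_factor = {
    "violation": "habitual violation",
    "absence": "supervision",
    "supervisor": "supervision",
    "inspection": "inspection",
    "training": "training",
}
alpha = 1  # hypothetical term count threshold

m = len(docs)
counts = [Counter(doc) for doc in docs]

# Step 1: TC_t, the total count of each proper term over all documents.
tc = {term: sum(c[term] for c in counts) for term in term_to_factor}

# Step 2: keep only the terms whose count exceeds the threshold.
important = [term for term in term_to_factor if tc[term] > alpha]


def tf_idf(term, r):
    """Step 5: TF-IDF of `term` in document r (base-10 logarithm).
    Every important term occurs somewhere, so df is never zero."""
    tf = counts[r][term] / len(docs[r])
    df = sum(1 for c in counts if c[term] > 0)
    return tf * math.log10(m / df)


# Steps 6 and 7: per document, find the term(s) with the largest TF-IDF,
# then average those maxima per hidden danger factor (assumed rule).
max_values = defaultdict(list)
for r in range(m):
    scores = {term: tf_idf(term, r) for term in important}
    best = max(scores.values())
    for term, score in scores.items():
        # score > 0 is an added guard so that absent terms never count as maxima
        if score >= best and score > 0:
            max_values[term_to_factor[term]].append(score)

avg_max = {factor: sum(v) / len(v) for factor, v in max_values.items()}
print(tc)
print(important)
print(avg_max)
```

In this toy corpus, "training" is screened out in Step (2), and the "supervision" terms never reach the maximum TF-IDF value in any document, so that factor receives no average maximum value under the assumed rule.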

Data Sources
As stated in Section 4.1, the accident text used in this study consists of 225 electric power personal casualty accident reports from the Compilation of National Electric Power Accidents and Electric Power Safety Events (2014 to 2018) compiled by the National Energy Administration's Electric Power Safety Supervision Department. Each report is divided into three sections: a brief account of the accident, the causes of the accident, and suggestions for rectification. Because the compilation was produced by a single body and the state has relevant provisions on the terms and format of power accident reports, the 225 texts are well suited to the text mining method and process of this study. This study begins with a simple term count of accident type phrases in the accident brief section to determine the distribution of the various types of accidents. The accident cause text is then mined using word cloud and TF-IDF technologies.

Document Mining of Accident Brief
The term count experiment on the accident brief in 225 electric power personal casualty accident reports (2014 to 2018) was carried out using the document mining method; see Figure 3 for the results. As illustrated in Figure 3, the six most common types of accident from 2014 to 2018 were falling, electric shock, mechanical injury, collapse, object striking, and derrick falling down, accounting for 87 percent of all accident kinds. As the frequency of falling and electric shock accidents is significantly higher than that of other accident kinds, how to effectively limit the occurrence of these two categories of accidents is the emphasis and challenge of accident prevention work in power enterprises.

Screening Results and Analysis of Key Hidden Danger Factors Based on Term Count and TF-IDF
To address the inherent defects in the 225 collected electric power personal casualty accident reports (2014 to 2018), the approach combining term count with TF-IDF developed in Section 3.4 was used. First, the term count threshold for hidden danger factors was set to 5, and 389 important terms linked to hidden danger factors were screened from them. Second, three characteristic values of the important terms associated with each hidden danger factor were calculated: the count of the important terms related to each hidden danger factor ($C_{ijk}$), the comprehensive cumulative frequency of the important terms related to each hidden danger factor ($Q_{ijk}$), and the comprehensive average maximum TF-IDF value of the important terms related to each hidden danger factor ($\text{TF-IDF}_{ijk}^{\max}$). Finally, based on the overall distribution of the experimental results, the thresholds were set as follows: a specific hidden danger factor $AP_{ij}$ ($i = 1, 2, \ldots, 4$; $j = 1, 2, \ldots, 13$) is considered a key hidden danger factor of electric power personal casualty accidents if $C_{ijk} \geq 10$, $Q_{ijk} \geq 200$, and $\text{TF-IDF}_{ijk}^{\max} \geq 0.1$. See Table 3 for the screening results. In Table 3, for a specific hidden danger factor $AP_{ij}$ ($i = 1, 2, 3$; $j = 1, 2, \ldots, 5$), a $Q_{ijk}$ that grows with increasing $C_{ijk}$ shows that the hidden danger factor appears frequently in electric power personal casualty accidents in general; a larger $\text{TF-IDF}_{ijk}^{\max}$ shows that the hidden danger factor has a stronger impact on the occurrence of an electric power personal casualty accident.
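The three-threshold screening rule stated above can be sketched as follows. This is an illustration under stated assumptions: the class `FactorStats`, the function `is_key_factor`, and the factor names and values are all hypothetical, and the characteristic values are taken as given rather than computed, since their calculation is defined in Section 3.4.

```python
from dataclasses import dataclass


@dataclass
class FactorStats:
    name: str
    term_count: int   # C_ijk: count of important terms for the factor
    cum_freq: float   # Q_ijk: comprehensive cumulative frequency
    tfidf_max: float  # comprehensive average maximum TF-IDF value


def is_key_factor(s, c_min=10, q_min=200, tfidf_min=0.1):
    """A factor is key only if all three thresholds are met."""
    return (
        s.term_count >= c_min
        and s.cum_freq >= q_min
        and s.tfidf_max >= tfidf_min
    )


# Hypothetical values for illustration; these are not the Table 3 results.
factors = [
    FactorStats("AP_a (hypothetical)", 14, 350, 0.18),
    FactorStats("AP_b (hypothetical)", 12, 240, 0.06),  # fails the TF-IDF threshold
    FactorStats("AP_c (hypothetical)", 7, 210, 0.15),   # fails the term count threshold
]

key_factors = [f.name for f in factors if is_key_factor(f)]
print(key_factors)
```

Requiring all three conditions at once means that a factor whose terms are frequent but undistinctive, or distinctive but rare, is not screened in.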
The key hidden danger factors of electric power personal casualty accidents screened out in this paper share two features that follow from the thresholds set: first, they are present in the majority of electric power personal casualty accidents, which reflects their prevalence; second, their appearance has a significant impact on the occurrence of an electric power personal casualty accident, which reflects their harmfulness. The concept of risk combines these two characteristics. Based on the formula risk = possibility × influence, both the occurrence possibility and the harmfulness of working habit (a key hidden danger factor among the personnel hidden dangers) are above average, so it is a high-risk key hidden danger factor and the "top priority" in the key hidden danger factor system.
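The risk reasoning above can be illustrated with a short sketch. The text does not state how possibility and influence are quantified, so the numeric scores, the factor names other than working habit, and the helper functions `risk_scores` and `above_average_on_both` are all assumptions made for illustration.

```python
from statistics import mean


def risk_scores(factors):
    """Map each factor name to risk = possibility * influence."""
    return {name: p * i for name, (p, i) in factors.items()}


def above_average_on_both(factors):
    """Return factors whose possibility and influence both exceed the mean."""
    p_avg = mean(p for p, _ in factors.values())
    i_avg = mean(i for _, i in factors.values())
    return [
        name
        for name, (p, i) in factors.items()
        if p > p_avg and i > i_avg
    ]


# Hypothetical (possibility, influence) scores; not results from this paper.
factors = {
    "working habit": (0.8, 0.9),
    "factor B (hypothetical)": (0.6, 0.3),
    "factor C (hypothetical)": (0.2, 0.5),
}

scores = risk_scores(factors)
high_risk = above_average_on_both(factors)
print(max(scores, key=scores.get), high_risk)
```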

Establishment of Key Hidden Danger Factor System of Electric Power Personal Casualty Accidents Based on Empirical Analysis
In this paper, the key hidden danger factor system of electric power personal casualty accidents is finally established, the names of the key hidden danger factors are defined and standardized, and their specific connotations are clarified; see Table 4 for the details. Table 4 shows that, in contrast to other industrial fields, where production accidents are frequently attributed to personnel, object, environment, and management, the key hidden danger factors of electric power personal casualty accidents extracted by document mining in this paper exclude the environment and cover only the hidden dangers of personnel, object, and management. This is consistent with the actual situation, in which many electric power enterprises adjust working arrangements to environmental and weather conditions (e.g., not working on hot days or on days with heavy rain or typhoons). Therefore, the hidden danger factors of the environment can be ignored.

Conclusions
The important information contained in accident reports has not been effectively exploited in the safety management of China's power system. The first step toward realizing the digitalization, automation, and intelligence of power grid safety management is to extract this information from power grid accident reports using machine learning and text mining. Starting from the current state of China's power grid safety management, this study builds an index system of hidden danger factors by applying safety management theory and text mining methods to power grid safety accident reports. This method not only reduces the subjectivity of identifying and judging potential safety threats but also lays the groundwork for the future implementation of digital and intelligent safety management.
This study has the following limitations. As the majority of electric power personal casualty accidents are minor-level accidents and significant accidents are few, the 225 accidents collected were all minor-level accidents (level 4). In addition, because the analysis relied on Chinese-language text, it addresses only power grid safety management in China.

Conflicts of Interest:
The authors declare no conflict of interest.