Text Mining of Hazard and Operability Analysis Reports Based on Active Learning

Abstract: In the field of chemical safety, a named entity recognition (NER) model based on deep learning can mine valuable information from hazard and operability analysis (HAZOP) text, which can guide experts to carry out a new round of HAZOP analysis, help practitioners optimize the hidden dangers in the system, and be of great significance to improve the safety of the whole chemical system. However, due to the standardization and professionalism of chemical safety analysis text, it is difficult to improve the performance of traditional models. To solve this problem, an improved method based on active learning is proposed in this study, and three novel sampling algorithms are designed: Variation of Token Entropy (VTE), HAZOP Confusion Entropy (HCE), and Amplification of Least Confidence (ALC), which improve the ability of the model to understand HAZOP text. In this method, a part of the data is used to establish the initial model. The sampling algorithm is then used to select high-quality samples from the data set. Finally, these high-quality samples are used to retrain the whole model to obtain the final model. The experimental results show that the performance of the VTE, HCE, and ALC algorithms is better than that of the random sampling algorithm. In addition, compared with other methods, the performance of the traditional model is improved effectively by the method proposed in this paper, which proves that the method is reliable and advanced.


Introduction
In the field of chemical safety analysis, hazard and operability analysis (HAZOP) [1] is the most popular method to prevent chemical accidents [2]. HAZOP is an evaluation method based on qualitative analysis of hazards, which is used to identify potential hazards and hidden dangers in equipment and processes, analyze the causes of accidents, and seek corresponding countermeasures. The analysis process is shown in Figure 1. According to the deviation of each node in the production process, with the help of the analysis of the expert team, the causes and adverse consequences of the deviation are obtained, and the corresponding risk level and protection measures are formulated; the analysis results are finally retained in the form of report text. HAZOP plays an important role in improving the safety of personnel, reducing the economic loss of accidents, and analyzing the causes of accidents, which greatly improves the safety and reliability of the factory. HAZOP has been listed as a standard analytical method in some countries. In China, the State Administration of Work Safety has promoted HAZOP vigorously and issued a series of application guidelines. There is an abundance of information in HAZOP text, such as details on complex equipment, materials, and causality. This information can guide experts to carry out a new round of HAZOP analysis, help practitioners optimize the hidden dangers in the system, and is of great significance to improve the safety of the whole chemical system. Therefore, how to effectively extract key information from HAZOP text is a valuable research topic.
Named entity recognition (NER) [3] can achieve the goal of mining key information from HAZOP reports. It is the most indispensable part of natural language processing, and its purpose is to extract information from unstructured text. In NER, information with specific meaning is regarded as an entity, such as time, place name, and so on. Other specific technical terms can also be classified as named entities, such as cells in the biological field [4], terms in the freight field [5], and bridge damage types [6]. There are three methods to build a NER model: rule dictionary [7], statistical machine learning [8], and deep learning [9]. Due to its strong stability and generalization, deep learning has become the most popular method. However, HAZOP text is professional and normative; the traditional model based on deep learning cannot extract key information effectively from HAZOP reports [10]. To solve this problem, active learning [11] is introduced into the field of chemical safety in this study. In the theory of active learning, each sample in the data set contains a different amount of information.
The samples with more information are called high-quality samples, which means that the uncertainty of these samples is large, and they play a positive role in the training of the model. Samples with less information are called low-quality samples, which means that the uncertainty of these samples is small, and they will hinder the fitting of the model. In addition, there will be similar samples in the data set that contain similar information, making it difficult to further improve the performance of the model. Therefore, active learning can help to train the deep learning model by mining high-quality samples in the data set through the sampling algorithm, which can reduce the limitation of professional and standard HAZOP text on the model and improve the stability and generalization ability of the model. Samples with high uncertainty can be filtered from the data set by the sampling algorithm, which is the core of active learning. Different sampling algorithms affect the final training results of the model. For example, the performance of a model constructed with an uncertainty-based sampling algorithm is better than that of a model constructed with random sampling [12], because the latter selects samples randomly, with no strategy, and cannot guarantee that the selected samples are of high quality, while the former uses the prediction information of the model to screen out more diverse samples [13].
Deep learning based on active learning is widely used in computer vision and the natural language processing tasks of artificial intelligence. In the field of computer vision, Li et al. [14] proposed a novel active learning method. This method eliminates redundant information through uncertainty, selects key samples to train image classifiers, and achieves good results in the tasks of object recognition and scene recognition. Vununu et al. [15] applied active learning to the task of human epithelial type 2 (Hep-2) cell classification, which helped construct the Hep-2 cell classification system and assisted the computer in the diagnosis of autoimmune diseases. Nath et al. [16] introduced active learning into the task of medical image dataset segmentation, which improved the performance of the model. In natural language processing, the deep learning method based on active learning has also achieved good results in professional fields. For example, in the field of materials, Liu et al. [17] introduced active learning into the hypereutectic Al-Si alloy material entity recognition task to build a data-driven model, which can obtain valuable material data from public resources. In the field of chemistry, Wang et al. [18] used active learning to construct the corresponding electrolyte material model, which accelerated the molecular screening of solvate ionic liquids. In the field of health informatics, Bi et al. [19] developed a human activity recognition model that combines active learning, to exclude similar feature data from the data set, with deep learning, to identify human activities and judge the health status of a person. Such a model has a wide range of application scenarios.
In the field of chemical safety, a named entity recognition model based on deep learning can mine key information from HAZOP reports; according to this information, engineers can complete a new round of risk analysis and staff can improve the efficiency of industrial system optimization. The model can improve the safety of chemical systems and has high application value. However, the HAZOP report is normative and professional, the information density of the text is very high, the relevance between the pieces of information is strong, and there is a lot of polysemy and semantic ambiguity, as well as entities nested within each other. These characteristics prevent the recognition performance of the model from reaching the ideal level, so key information cannot be efficiently mined from HAZOP reports. To solve this problem, an improved method based on active learning is proposed, and three novel sampling algorithms have been designed: Variation of Token Entropy (VTE), HAZOP Confusion Entropy (HCE), and Amplification of Least Confidence (ALC). The high-quality samples in the data set can be effectively screened out by these three algorithms, which improves the ability of the model to understand HAZOP text. In this method, a small amount of data is used to establish the initial model. The sampling algorithm is then used to select high-quality samples from the data set. Finally, these high-quality samples are used to retrain the whole model to obtain the final model. The performance of this method was evaluated and compared. The main contributions of this paper are as follows:

• An improved text mining method is proposed to mine the key information in HAZOP reports, and active learning is introduced into chemical safety entity recognition tasks for the first time;
• Three novel sampling algorithms, VTE, HCE, and ALC, are proposed. High-quality samples can be effectively screened out by these three algorithms, which improves the ability of the model to understand HAZOP text;
• Experiments prove that the method and the three algorithms are reliable and advanced.
The second part of this paper is titled "Related Research", wherein the content of the deep learning model and the characteristics of HAZOP reports are mainly described. Three novel sampling algorithms and the training process of the model are introduced in the third part. The fourth part describes the evaluation experiment and comparative experiment, and the experimental results are discussed and analyzed. The "Conclusions" are found in the fifth part.

NER Model
At present, the mainstream NER structures based on deep learning are the bidirectional long short-term memory-conditional random field model (BiLSTM-CRF) [20] and iterated dilated convolution-conditional random field model (IDCNN-CRF) [21], which represent the recurrent neural network (RNN) [22] and convolutional neural network (CNN) [23], respectively. In this paper, the skip-gram algorithm in Word2vec [24] is used to convert text input into word embedding in the two models because skip-gram has a good performance in semantic modeling tasks [25]. In addition, CRF is used as the decoder of these two models to predict the final label because CRF can reduce the invalid tags in the prediction results [26]. The main difference between the two models is their encoding layer. For the BiLSTM-CRF model, BiLSTM is used as the feature extractor. BiLSTM is composed of forward LSTM and backward LSTM, which can capture the dependency between words in a longer sequence [27] and encode the context information. For the IDCNN-CRF model, dilated convolution [28] is used as the encoder to extract text features through convolution operation, which can overcome the problem that the peripheral neurons lose information after convolution operation of traditional CNN. The structure of the model is shown in Figure 2, where "#" is the separator, the left side of "#" is the BiLSTM-CRF model, and the right side of "#" is the IDCNN-CRF model.

Figure 2. The network structure of the model, where "#" is the separator, the left side of "#" is the BiLSTM-CRF model, and the right side of "#" is the IDCNN-CRF model.


The Characteristics of HAZOP Report
Due to the standardization and professionalism of HAZOP reports, they share the following characteristics:
1. There are a lot of proprietary expressions about process, equipment, and materials, such as "Fischer Tropsch reactor", "Rich liquid flash tank", and so on;
2. There is a great variety of professional words, and the formation of these words is different from the general field. They have low causality, fuzzy semantic information, and different expressions for the same entity;
3. Different types of entities are nested within each other. For example, "recycle gas compressor" corresponds to different entity types in different situations. It may be "recycle gas" material or "compressor" equipment, which can be a great obstacle for the understanding ability of the model;
4. Polysemy is common. For example, the part of speech of "interrupt" is different between "interrupt device" and "device interrupt". In some scenes, the guiding words of "low", "small", and "little" have the same meaning;
5. Compared with entities in the general domain, chemical safety entities contain more characters, such as "antifreeze dehydration tower overhead air cooler", "release gas compressor reflux cooler", and "ethylene glycol recovery tower overhead knockout tank", etc., which have negative effects on model fitting.

Sampling Algorithm
In this paper, the uncertainty-based AL framework was used to train the NER model. The classical sampling algorithms include Token Entropy (TE) [29] and Least Confidence (LC) [30]. However, HAZOP reports are normative and professional, which means these algorithms are not suitable for the field of chemical safety, and the improvement of model performance is not obvious. To solve this problem, three novel sampling algorithms based on TE and LC are proposed in this study. The following is an analysis of these algorithms.
In machine learning (ML), entropy [31] is a measure of a disordered state; it quantifies the amount of information in an event with multiple states, i.e., the expected information content over the event's probability distribution. For the discrete random variable Y, the information entropy can be calculated by Formula (1), H(Y) = −∑_i P(y_i) log P(y_i).
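The entropy of Formula (1) can be sketched in a few lines of Python; the function name `entropy` is illustrative, not from the original implementation:

```python
import math

def entropy(probs):
    """Shannon entropy of Formula (1): H(Y) = -sum_i P(y_i) * log P(y_i)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# A uniform distribution carries the most information (highest entropy);
# a near-certain prediction carries almost none.
uniform = [0.25, 0.25, 0.25, 0.25]
confident = [0.97, 0.01, 0.01, 0.01]
print(entropy(uniform) > entropy(confident))  # True
```

This is the quantity the TE algorithm accumulates over the label marginals of each token.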
According to the meaning of entropy, the amount of information is inversely proportional to the probability of an event. Based on the posterior entropy of the model, the TE algorithm was proposed by Burr et al. According to Formula (1), the TE algorithm relies on the predicted value P(y_i) of the sample label, but the recognition accuracy of the initial model based on HAZOP text is not high enough, so the algorithm does not perform well. Therefore, a Variation of Token Entropy (VTE) algorithm based on TE is proposed in this study.
Formula (2) is the calculation formula of VTE, where T is the length of the sequence and l is the label type. P(y_t = l) is shorthand for the marginal probability, i.e., the probability of label l at position t, and Q(y_t = l) is the probability that label l is not at position t. The variable Q(y_t = l) is introduced by the VTE algorithm, so that Q(y_t = l) and P(y_t = l) are considered by the model at the same time. The false prediction value of samples is used to constrain the real prediction value, removing the limitation that the algorithm relies too heavily on the prediction ability of the initial model. The value of ϕ_VTE is compared with a given threshold τ, and the samples whose ϕ_VTE is greater than τ (that is, the samples with rich information) are selected. The curve of the VTE algorithm is shown in Figure 3, where the shadow layer represents the set threshold τ. The first derivative of VTE is shown in Figure 4. It can be seen from Figures 3 and 4 that the value of ϕ_VTE is inversely proportional to the prediction value of the label: the lower the prediction value of the label, the richer the corresponding sample information, the higher the value of ϕ_VTE, and the greater the gap with the threshold, so the sampling error can be reduced. When the prediction value of the label is in the range of 0 to 0.2, the value of ϕ_VTE changes dramatically, and samples of different quality are separated effectively, which helps the algorithm select high-quality samples. When the prediction value of the label is in the range of 0.8 to 1, low-quality samples can be eliminated well. Therefore, through the VTE algorithm, the iterative efficiency of model training can be improved, the negative impact of the accuracy of the initial model on the sampling process is weakened, and the sampling error is reduced.
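Formula (2) itself is not reproduced in this excerpt, so the following is only a hypothetical sketch of the description above: the "false prediction value" Q = 1 − P is assumed to weight the surprisal of each predicted label, giving a score that falls steeply as the prediction value rises, and sequences scoring above τ are kept. The names `vte_score` and `select_by_vte` are illustrative, not from the original implementation:

```python
import math

def vte_score(marginals):
    """Hypothetical VTE-style score for one sequence.

    marginals: list of P(y_t = l_t), the model's marginal probability for
    the predicted label at each position t.  Q = 1 - P (the "false
    prediction value") constrains the surprisal -log P, so tokens the
    model is already sure about contribute almost nothing.
    """
    T = len(marginals)
    return sum((1.0 - p) * -math.log(max(p, 1e-12)) for p in marginals) / T

def select_by_vte(sequences, tau):
    """Keep sequences whose score exceeds the threshold tau (rich information)."""
    return [seq for seq, marg in sequences if vte_score(marg) > tau]
```

Note that the score is monotone decreasing in the prediction value, changes sharply near 0, and vanishes near 1, matching the behavior described for Figures 3 and 4.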
In machine learning, cross-entropy can be used to measure the difference between two probability distributions and eliminate the uncertainty of the system. For sample x, the cross-entropy based on the current model can be calculated by Formula (3), where T is the length of the sequence and l is the label type. Considering the posterior entropy and cross-entropy of the current model, the cross-entropy term is used to reduce the sampling uncertainty and improve the sampling efficiency. In order to adapt the two different entropies and make the sampling algorithm fit the HAZOP text, the coefficients α and β were introduced in this study, as shown in Formula (4). α and β lie in the interval [0, 1] and control the posterior entropy term and the cross-entropy term, respectively, so that the two entropies are in a relatively balanced state.
where P(y_t = l) is the probability that the label at position t is l. A large number of experiments show that the effect of the algorithm is the most obvious when α = 1/3 and β = 2/3: the cross-entropy term is given the higher weight, and the system uncertainty is reduced by jointly measuring the positive and negative prediction values of the sample label, as shown in Formula (5).
The correlation between entities in a HAZOP report is relatively small, and the length of the sequence will have a negative impact on the sampling results. In this paper, Formula (6) is normalized and its coefficients are removed. The final result is shown in Formula (7).
Formula (7) is the calculation formula of the HAZOP Confusion Entropy (HCE) algorithm proposed in this paper. The characteristics of HAZOP text are considered by HCE, the posterior entropy is given a low weight to reduce the dependence of active learning on the recognition performance of the initial model, and the cross-entropy term is introduced to improve the stability of sampling.
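Since Formulas (3)–(7) are not reproduced in this excerpt, the sketch below is a hypothetical reading of the HCE description: a posterior-entropy term and a cross-entropy term over the positive (P) and negative (Q = 1 − P) prediction values are combined with weights α = 1/3 and β = 2/3 and normalized by the sequence length T, mirroring the length normalization of Formula (7). The name `hce_score` is illustrative:

```python
import math

def hce_score(marginals, alpha=1 / 3, beta=2 / 3):
    """Hypothetical HCE-style score for one sequence.

    marginals: list of P(y_t = l_t) for the predicted label at each
    position.  The posterior-entropy term keeps a low weight (alpha) to
    reduce dependence on the initial model; the cross-entropy term over
    P and Q = 1 - P gets the higher weight (beta).  Division by T removes
    the bias toward long HAZOP sequences.
    """
    T = len(marginals)
    eps = 1e-12
    posterior = -sum(p * math.log(max(p, eps)) for p in marginals)
    cross = -sum(p * math.log(max(p, eps)) + (1 - p) * math.log(max(1 - p, eps))
                 for p in marginals)
    return (alpha * posterior + beta * cross) / T
```

Under this reading, an uncertain sequence (marginals near 0.5) scores far higher than a confidently predicted one, so ranking by the score surfaces high-quality samples.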
Culotta et al. [32] used least confidence (LC) to measure the informativeness of text and applied LC to NER. According to the most uncertain probability predicted by the current model, the samples were sorted in descending order.
Formula (8) is the calculation formula of LC. However, for some sequences, there are some cases where the difference between samples is small. For example, in the text of "light oil level is too low, light oil tank is in trouble", "light oil" and "light oil tank" should be classified into different entity types. The current model may have errors in the process of predicting sample labels, which would lead to the failure of the LC algorithm. In this paper, the algorithm of amplification of least confidence (ALC) is proposed. Firstly, ALC makes the difference between samples more obvious through exponential operation. The samples are then sorted in ascending order according to the value of ALC output from the model. The calculation formula of ALC is shown in Formula (9). The brief proof and description are as follows.
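Formulas (8) and (9) are likewise not shown here; the following is a hedged sketch of LC and the ALC transform as described, assuming the exponential is applied to the sequence probability and the samples are then sorted in ascending order. The helper names are illustrative:

```python
import math

def least_confidence(seq_prob):
    """Classical LC (Formula (8)): uncertainty is 1 minus the probability of
    the most likely label sequence; samples are ranked by this value in
    descending order."""
    return 1.0 - seq_prob

def alc_value(seq_prob):
    """Hypothetical ALC transform (Formula (9)): the exponential amplifies
    small differences between sequence probabilities, after which samples
    are sorted in ascending order of this value."""
    return math.exp(seq_prob)

def select_top_k(seq_probs, k):
    """Indices of the k most uncertain sequences under ALC (lowest e^P first)."""
    return sorted(range(len(seq_probs)), key=lambda i: alc_value(seq_probs[i]))[:k]
```

The exponential preserves the LC ordering (e^x is monotone) while widening the gaps between near-tied sequences such as the "light oil" / "light oil tank" example, which is the point of inequality (12) below.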
The predicted value P(y|{x_ij}) of the sample label transformed by the SoftMax module is also in the interval (0, 1), so relationship (12) can be obtained:

|e^P(y_m|{x_ij}) − e^P(y_n|{x_ij})| > |P(y_m|{x_ij}) − P(y_n|{x_ij})| (12)

It can be seen from inequality (12) that, through the ALC algorithm, the probability values calculated by the current model for the sequence are amplified, the differences between the sequences are adjusted, and the sequence samples are sampled more effectively.
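Inequality (12) follows in one line from the mean value theorem applied to e^x:

```latex
\left| e^{P(y_m \mid \{x_{ij}\})} - e^{P(y_n \mid \{x_{ij}\})} \right|
  = e^{c}\,\left| P(y_m \mid \{x_{ij}\}) - P(y_n \mid \{x_{ij}\}) \right|,
\qquad c \in (0, 1),
```

where c lies between the two probabilities; since both lie in (0, 1), we have c > 0 and hence e^c > 1, so the amplified gap is strictly larger than the original one.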

Training Process
The algorithms in Section 3.1 were used to train the model in the framework of active learning. The training process is shown in Figure 5. The concrete process is as follows:
1. Partial samples are used to train the initial model;
2. The label probabilities of the remaining samples are calculated by the initial model. According to these probabilities, high-quality samples are selected from the data set by the sampling algorithm;
3. The selected samples are added to the training set;
4. Steps 1-3 are repeated until the size of the training set reaches the specified value;
5. The final training set is used to retrain the model.
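The five steps above can be sketched as a generic uncertainty-based loop; `train`, `score`, and the batch size are placeholders (any of VTE, HCE, or ALC would slot in as the scoring function), and the function name is illustrative:

```python
def active_learning_loop(labeled, unlabeled, train, score, batch_size, target_size):
    """Generic uncertainty-based active learning loop for steps 1-5.

    labeled / unlabeled: lists of samples; train(data) -> model;
    score(model, sample) -> uncertainty value (higher = more informative).
    """
    while len(labeled) < target_size and unlabeled:
        model = train(labeled)                                  # step 1: (re)train
        ranked = sorted(unlabeled, key=lambda s: score(model, s),
                        reverse=True)                           # step 2: rank by uncertainty
        picked, unlabeled = ranked[:batch_size], ranked[batch_size:]
        labeled.extend(picked)                                  # step 3: grow training set
        # step 4: loop until the training set reaches target_size
    return train(labeled)                                       # step 5: final retraining
```

In the paper's setting, `train` would fit the BiLSTM-CRF or IDCNN-CRF model and `score` would be ϕ_VTE, the HCE score, or the ALC value (with the sort direction flipped for ALC).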


Experimental Data
The HAZOP reports of the 4 million tons/year indirect coal liquefaction project were collected from the Shenhua Ningxia Coal Industry Group. A rule-based method was used to preprocess the HAZOP reports to obtain the original HAZOP text. The materials (such as light oil, circulating gas, etc.) and equipment (such as Fischer Tropsch reactor, distillation column, etc.) in the text were regarded as named entities, and these entities were manually marked as "B-X", "I-X", or "O", where "B-X" represents the beginning of an entity of type X, "I-X" represents the rest of an entity of type X, and "O" represents a non-entity. In this way, the boundaries of different entities could be defined clearly, which is helpful for model training. Materials and equipment were marked as "MAT" and "EQU", respectively. A specific labeling example is shown in Table 1. The marked data constituted the HAZOP data set, which was divided into training, test, and validation sets in the ratio of 8:1:1. In the example, "the level controller" was regarded as a piece of equipment: its first character was marked as "B-EQU", and the other characters were marked as "I-EQU". The last two characters of the text were non-entities and were marked "O".
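The B/I/O scheme described above can be sketched as a small tagging helper; the helper name is illustrative, and it works at token level for readability, whereas the original labels characters:

```python
def bio_tags(tokens, entities):
    """Assign B-X / I-X / O tags given (start, end, type) spans, end exclusive.

    Mirrors the MAT/EQU labeling scheme of Table 1: the first unit of an
    entity gets "B-X", the rest get "I-X", and everything else is "O".
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in entities:
        tags[start] = "B-" + etype
        for i in range(start + 1, end):
            tags[i] = "I-" + etype
    return tags

tokens = ["the", "level", "controller", "is", "malfunctioning"]
print(bio_tags(tokens, [(0, 3, "EQU")]))
# ['B-EQU', 'I-EQU', 'I-EQU', 'O', 'O']
```

The explicit B/I distinction is what lets the model recover entity boundaries even when two entities of the same type are adjacent.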
In this paper, the precision (P), recall (R), and F1-score (F1) were used to measure the recognition performance of the model. The calculation formulas are shown in (13)-(15). The relevant information in these formulas is shown in Table 2.
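Formulas (13)-(15) are the standard entity-level metrics, P = TP/(TP + FP), R = TP/(TP + FN), and F1 = 2PR/(P + R); a minimal sketch with illustrative counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Formulas (13)-(15): precision, recall, and their harmonic mean F1."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return p, r, f1

# e.g. 90 entities correctly recognized, 10 spurious predictions, 20 missed
p, r, f1 = precision_recall_f1(90, 10, 20)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.9 0.818 0.857
```

Because F1 is the harmonic mean, it rewards models that balance precision and recall, which is why it is the headline number in Tables 3-6.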

Experimental Results
Based on the BiLSTM-CRF model, the reliability and effectiveness of the proposed method in this study were tested, and the experimental results are shown in Table 3. Figure 6 shows the improvement effect of VTE, HCE, ALC, and the algorithm baseline RAND [30] on the performance of this model. According to Table 3, the F1 scores of the BiLSTM-CRF model based on VTE, HCE, and ALC were 94.62%, 94.43%, and 93.64%, respectively, significantly higher than that of the plain BiLSTM-CRF model (88.82%); the performance of the model was improved, which proves the effectiveness of the proposed method. It can be seen from Figure 6 that the performance of the three algorithms was obviously better than RAND, which made the F1 score of the model more stable and had a positive effect on active learning training. The highest F1 score (94.62%) was obtained by the VTE-BiLSTM-CRF model, which could better adapt to text in the field of chemical safety. When samples of different quality are mixed together, the VTE algorithm can effectively screen out high-quality samples and eliminate samples with less information, improving the iterative efficiency of model training. Compared with RAND, the trend of the F1 score of the HCE-BiLSTM-CRF model was more stable, and the uncertainty of the NER system could be eliminated by the HCE algorithm. In addition, the F1 score of the ALC-BiLSTM-CRF model was 4.82% higher than that of the basic model; the difference between HAZOP samples can be amplified by the ALC algorithm to improve the recognition effect of the model.
Based on the IDCNN-CRF model, the reliability and effectiveness of the proposed method in this study were tested, and the experimental results are shown in Table 4. Figure 7 shows the improvement effect of VTE, HCE, ALC, and the algorithm baseline RAND on the performance of this model.
According to Table 4, the F1 score of the IDCNN-CRF model based on VTE, HCE, and ALC was 3.34%, 3.5%, and 3.33% higher than that of the basic model, respectively. The F1 score of the model was improved greatly, which had a positive effect on the performance of the model; the generalization of the proposed method is proved by these experiments. As can be seen from Figure 7, based on the IDCNN-CRF model, the performance of the three algorithms was still obviously better than RAND.
Based on the CRF model [33], the effectiveness of the proposed method in this study was tested, and the experimental results are shown in Table 5 and Figure 8. The results show that the F1 scores of the VTE-CRF model, HCE-CRF model, and ALC-CRF model were 90.01%, 89.62%, and 89.46%, respectively, while that of the CRF model was only 71.49%. In this study, the performance of the BiLSTM-CRF model was better than that of the IDCNN-CRF model and CRF model, so based on the BiLSTM-CRF model, different methods to improve the recognition effect of the model are further discussed, and the proposed method is compared with other methods. The comparison results are shown in Table 6. Among them, in the CNN-BiLSTM-CRF model [34], a convolutional neural network module is introduced and the text features of HAZOP are further extracted, which makes up for the deficiency that some important information may be discarded by the BiLSTM module due to the capacity problem. In the IDCNN-BiLSTM-CRF model, BiLSTM and IDCNN are integrated together, which can not only alleviate the symptom that the peripheral neurons easily lose data information after the convolution operation of a traditional CNN, but also endow the word vector with the characteristics of context. In the BiLSTM-Attention-CRF model, the self-Attention [35] module is introduced. Through the weighted summation operation of the output vector of each time step, the relationship between the words in the sentence is established and the ability of the model to grasp long-distance information is strengthened. In the BERT-BiLSTM-CRF model [36], the pretrained language model BERT, with strong generalization ability, is used to improve the model's understanding of text semantic information, and the dependence between texts is alleviated.
VTE-BiLSTM-CRF, HCE-BiLSTM-CRF, and ALC-BiLSTM-CRF are the proposed models, in which the VTE, HCE, and ALC algorithms are used to select high-quality samples to train the models. In the first three methods, an additional neural network structure was added to the base model to improve its recognition effect through a richer deep neural network. In the fourth method, a pre-trained language model is used to assist the neural network with feature extraction so as to improve the performance of the model. In the proposed method, from the perspective of data optimization and sample selection, high-quality samples are provided to the model, and its performance is improved effectively. It can be seen from Table 6 that, compared with the base model, the recognition effect was not improved significantly by the first three methods, and the recognition effect of the BiLSTM-Attention-CRF model was even lower than that of the base model. In HAZOP reports, there are not many long-distance texts with connections, so self-Attention does not play an ideal role; on the contrary, while long-distance information is pursued by the model, local information of the text is ignored, which led to a decrease of 1.24 percentage points in the F1 score of the model. In addition, the IDCNN-BiLSTM-CRF model is slightly better than the CNN-BiLSTM-CRF model because a dilated CNN can expand the scope of information processing and overcome the limitations of the traditional CNN layer.
The F1 score of the BERT-BiLSTM-CRF model was 93.31%, which is 4.49 percentage points higher than that of the base model. Although BERT, with its large number of parameters and strong feature-extraction ability, can help the base model learn rich semantic features from the text, the concrete common sense and reasoning contained in language are not acquired, and HAZOP reports record mostly causal relationships between materials and equipment. The F1 scores of the BiLSTM-CRF model based on VTE, HCE, and ALC were 94.62%, 94.43%, and 93.64%, respectively; the performance of the model was improved to an advanced level by the method and algorithms in this study.
In summary, the performance of the VTE, HCE, and ALC algorithms was obviously better than that of RAND, and the recognition performance of the NER model was improved greatly, which solves the problem that HAZOP text information could not be mined effectively by previous models and shows that the three algorithms are advanced. In addition, compared with other methods, the proposed method helped the model achieve higher recognition performance, which proves its effectiveness and reliability.
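The select-and-retrain workflow summarized above can be sketched generically: train an initial model on a small labeled set, score the unlabeled pool with a sampling algorithm, and move the most informative samples into the training set. This toy illustration assumes placeholder trainer and score functions (neither is the paper's implementation):

```python
import random

def active_learning_round(model_train, score_fn, labeled, unlabeled, k):
    """One round: train on the labeled data, score the pool, and move
    the k highest-uncertainty samples into the labeled set."""
    model = model_train(labeled)
    ranked = sorted(unlabeled, key=lambda s: score_fn(model, s), reverse=True)
    selected, rest = ranked[:k], ranked[k:]
    return labeled + selected, rest

# Toy demo: "samples" are numbers; uncertainty is distance from 0.5.
random.seed(0)
samples = [random.random() for _ in range(20)]
labeled, pool = samples[:5], samples[5:]
labeled, pool = active_learning_round(
    model_train=lambda data: None,            # placeholder trainer
    score_fn=lambda model, s: -abs(s - 0.5),  # toy uncertainty score
    labeled=labeled, unlabeled=pool, k=3)
assert len(labeled) == 8 and len(pool) == 12
```

Swapping in a VTE-, HCE-, or ALC-style score function and a real NER trainer would recover the kind of loop the proposed method describes; after the final round, the whole model is retrained on the accumulated high-quality samples.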

Conclusions
In the field of chemical safety, a named entity recognition model based on deep learning can mine key information from HAZOP reports; with this information, engineers can complete a new round of risk analysis and staff can improve the efficiency of industrial system optimization. This task can improve the safety of chemical systems and has high application value. However, due to the standardization and professionalism of HAZOP reports, the recognition performance of the model cannot reach the ideal level: the F1 scores of the BiLSTM-CRF, IDCNN-CRF, and CRF models were only 88.82%, 89.95%, and 71.49%, respectively. To solve this problem, an improved text mining method is proposed, and active learning is introduced into chemical safety entity recognition tasks for the first time. Three novel sampling algorithms, VTE, HCE, and ALC, are proposed. The experimental results show that VTE, HCE, and ALC are obviously superior to RAND. Through these three algorithms, the F1 score of the BiLSTM-CRF model was improved to 94.62%, 94.43%, and 93.64%, respectively; that of the IDCNN-CRF model to 93.39%, 93.45%, and 93.38%; and that of the CRF model to 90.01%, 89.62%, and 89.56%. The results show that the recognition performance of the model is improved greatly. In addition, compared with other methods, this method is more effective, which proves that the method and the three algorithms are reliable and advanced.
Future research should focus on extracting the causal relationships between all kinds of information in HAZOP text and on designing a knowledge graph of chemical safety analysis. Such a knowledge graph could guide staff in eliminating the hidden dangers of the whole process system, which has positive significance for improving the safety of the industrial process.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: Not applicable.
Acknowledgments: Our special thanks go to Dong Gao and Beike Zhang, School of Beijing University of Chemical Technology, for their insightful suggestions.

Conflicts of Interest:
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "Text Mining of Hazard and Operability Analysis Reports Based on Active Learning".